string
A text string.
Usage
Enclose string literals in double quotes. Values of the string type are sequences of non-null Unicode characters encoded in UTF-8 format. Note that UTF-8 is a variable-width encoding and a character can occupy from 1 to 4 bytes of storage. The characters in the 7-bit ASCII character set are a subset of UTF-8 and occupy a single byte each.
Although string types are discussed as though they are primitive types, they are actually reference types. However, EPL’s string objects are immutable. For example, a statement such as s:=s+" suffix"; creates a new string object and changes the variable s to refer to that new string object. Any other references to the old value continue to point to the old value.
Operations that can return a different string value, such as concatenation, case folding, or trimming white space, always create new strings rather than modifying the existing value in place. The previous value’s storage is recovered later by the EPL runtime garbage collector.
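The immutability described above can be illustrated with a minimal sketch. The monitor name ImmutabilitySketch and the variable names are hypothetical and chosen only for this illustration.

```epl
monitor ImmutabilitySketch {   // hypothetical monitor name
    action onload() {
        string s := "prefix";
        string t := s;          // t holds the same value as s
        s := s + " suffix";     // concatenation creates a new string; s now refers to it
        print s;                // prints: prefix suffix
        print t;                // prints: prefix (the old value is untouched)
    }
}
```

Because the old value is never modified in place, code that still holds t does not need to guard against changes made through s.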
The length of a string is limited by the memory available at runtime, which can be multiple gigabytes. In practice, you are unlikely to exceed the limit in a single string. (The total address space available to the EPL runtime system is limited to roughly four gigabytes when running on a 32-bit system.)
Use the backslash character (\) to enter special characters in string literals:
To enter this...       Insert this...
" (double quote)       \"
\ (backslash)          \\
newline character      \n
tab character          \t
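As a short illustration of these escape sequences, the following sketch prints one literal for each of them. The monitor name EscapeSketch and the sample text are assumptions made for this example.

```epl
monitor EscapeSketch {   // hypothetical monitor name
    action onload() {
        print "She said \"hello\"";   // prints: She said "hello"
        print "C:\\temp\\log.txt";    // prints: C:\temp\log.txt
        print "name\tvalue";          // prints name and value separated by a tab character
        print "line one\nline two";   // prints two lines of output
    }
}
```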
Operators
The table below lists the EPL operators available for use with string values.
Operator    Description                                Result Type
<           Less-than string comparison                boolean
<=          Less-than or equal string comparison       boolean
=           Equal string comparison                    boolean
!=          Not equal string comparison                boolean
>=          Greater-than or equal string comparison    boolean
>           Greater-than string comparison             boolean
+           String concatenation                       string
When you compare two strings for equality, the result is true if the strings are the same length and each character in one string is identical to the corresponding character at the same position in the other string.
When you compare two strings for less than or greater than, the characters in the strings are compared pairwise according to the numerical values of their Unicode code points. The comparison is case-sensitive, so capital letters are not equal to their lower case equivalents. Characters earlier in the character set sort before characters later in the character set. To order two unequal strings, the earliest difference is considered. For example, "abcXdef" sorts earlier than "abcYdef", and "abc" sorts earlier than "abcXYZ". The empty string sorts earliest of all.
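The comparison rules above can be sketched as follows. The monitor name ComparisonSketch and the variable names are hypothetical. The last comparison relies on the fact that the code point of "Z" (U+005A) is lower than that of "a" (U+0061).

```epl
monitor ComparisonSketch {   // hypothetical monitor name
    action onload() {
        boolean firstDifference := "abcXdef" < "abcYdef";
        boolean prefixFirst := "abc" < "abcXYZ";
        boolean emptyFirst := "" < "a";
        boolean sameIgnoringCase := "Apple" = "apple";
        boolean upperBeforeLower := "Zebra" < "apple";

        print firstDifference.toString();    // prints: true
        print prefixFirst.toString();        // prints: true
        print emptyFirst.toString();         // prints: true
        print sameIgnoringCase.toString();   // prints: false (comparison is case-sensitive)
        print upperBeforeLower.toString();   // prints: true (code point order, not dictionary order)
    }
}
```

If a case-insensitive comparison is needed, one option is to apply toLower() to both operands before comparing them.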
Methods
The following methods may be called on values of string type:
*canParse() — returns true if the string argument can be successfully parsed.
*clone(string) — returns a reference to the specified string. When called on a string, the clone() method does not make a copy of the string since strings are immutable.
*find(substring) — returns an integer indicating the index position of the substring passed as parameter to the method. If the string parameter does not exist as a substring within the string, the method returns -1. Note that in EPL string indices (the position of a character within the string) count upwards from 0.
*findFrom(substring, fromIndex) — behaves like the find() method, but starts searching for the specified substring with the character indicated by fromIndex. For example, if the value of fromIndex is 7, the search begins with the character that has an index of 7.
*intern() — marks the string it is called on as interned. Subsequent incoming events that contain a string that is identical to an interned string use the same string object. The intern() method takes no arguments and returns the interned version of the string it is called on. For example:
print "hello world";
print "hello world".intern();
Both statements print:
hello world
The benefit of using the intern() method is that it reduces the amount of memory used and the amount of work the garbage collector must do. A disadvantage is that you cannot free memory used for an interned string.
If there are a limited number of strings that will be used many times then calling intern() on these strings speeds the handling of events that use them. You might want to call intern() on the names of products or stock symbols, which are all used frequently. For example, invoking "APMA".intern() might make sense if you are expecting a large number of incoming events of the form Tick("APMA", ...). You would not want to call intern() on order IDs, because there are so many and each one is likely to be unique.
Calling intern() on a string is a global operation. That is, all contexts can then use the same string object. Any strings already in use by the correlator are not affected, even if they match the string intern() is called on.
If you use correlator persistence, details of which strings have been interned are not stored in the recovery datastore. If the correlator shuts down and restarts, you must call intern() again on the pertinent strings.
*join(sequence<string> s ) — concatenates the strings in s using the string it is called on as a separator. The single parameter must be a sequence type that contains strings. You cannot specify a variable number of string parameters. For example:
sequence<string> s :=
   ["Something", "Completely", "Different"];
print ", ".join(s);
This prints the following:
Something, Completely, Different
*length() — returns an integer indicating the length of the string.
*ltrim() — returns a string where all white space characters at the beginning have been removed. White space characters are space, new line and tab characters.
*parse() — returns the string value represented by the string argument without enclosing that value in quotation marks. You can call this method on the string type or on an instance of a string type. The more typical use is to call parse() directly on the string type.
The parse() method takes a single string as its argument. The string must adhere to the format described in Deploying and Managing Apama Applications, Event file format.
Use the following format to specify the string you want to parse:
"your_string_with_escape_characters"
Use a backslash to escape each quotation mark or backslash in your string, including quotation marks that enclose your string. For example, to parse "Hello World", specify it as "\"Hello World\"". In other words, if you are writing literal strings in EPL, you must precede all backslashes and quotation marks with a backslash. For example:
string a := "\".\\\\.\"";
string b := string.parse(a);
print a;
print b;
This prints the following:
".\\."
.\.
The string.parse() method is useful when you have a string that contains backslash escape characters and you want to obtain a string without them.
More examples:
string a := string.parse("\"Hello World\"");
string b := string.parse("\"\\\"\"");
print a;
print b;
This prints the following:
Hello World
"
You can specify the parse() method after an expression or type name. If the correlator is unable to parse the string, it is a runtime error and the monitor instance that the EPL is running in terminates. For example, the following is an error and causes the monitor instance to terminate:
a := string.parse("Hello World");
The parse() method cannot parse the result of a toString() method. This is because the toString() method does not enclose its result in quotation marks, nor does it escape any special characters. For example, the following is false:
   x = string.parse(x.toString())
If a string contains no special characters (for example, " or \) then the following equality does hold true:
   x = string.parse("\""+x.toString()+"\"")
*replaceAll(string, string) — takes two string arguments, string1 and string2. For the string the method is called on, the replaceAll() method makes a copy of that string, replaces instances of string1 with instances of string2 and returns the revised string. For example:
string x := "XYZ";
print x.replaceAll("Y","y");
print x;
This prints the following:
XyZ
XYZ
Notice that x itself is unchanged. If string1 is an empty string then the monitor instance dies. If instances of string1 overlap then the method replaces only the first instance in the overlapping instances.
*rtrim() — returns a string where all whitespace characters at the end have been removed. Whitespace characters are space, new line and tab characters.
*split(string) — returns a sequence of strings that represent the string argument split at occurrences of the string that the method is called on. The returned sequence always contains at least one string. The split() method is useful for separating a string that contains newline characters into individual lines or for dividing comma-separated values in a single string into multiple strings. For example:
Method Call             Returned Sequence
",".split("x,y,z")      ["x","y","z"]
",".split("")           [""]
",".split(",x,,y")      ["","x","","y"]
*substring(integer, integer) — returns the substring indicated by the integer parameters. The parameters indicate the start and end positions of the substring; the first is inclusive and the second is exclusive. If a parameter is zero or positive, it is taken to be the position of a character going from left to right, counting upwards from 0. If a parameter is negative, it is taken to be the position of a character going from right to left, counting downwards from -1. Therefore, if
string s;   
s := "goodbye";
then
s.substring(0, 0) is ""   
s.substring(0, 2) is "go"
s.substring(2, 4) is "od"
s.substring(0, 7) is "goodbye"
s.substring(0, -1) is "goodby"
s.substring(-4, -1) is "dby"
s.substring(-7, -1) is "goodby"
s.substring(-7, 7) is "goodbye"
*toBoolean() — returns true if the string is "true" and false in all other cases. This method is case sensitive.
*toDecimal() — returns a decimal representation of the string, if the string starts with one or more numeric characters. The numeric characters can optionally have amongst them a decimal point or exponent symbol. Returns 0.0 if there are no such characters.
*toFloat() — returns a float representation of the string, if the string starts with one or more numeric characters. The numeric characters can optionally have amongst them a decimal point or exponent symbol. Returns 0.0 if there are no such characters.
*toInteger() — returns an integer representation of the string, if the string starts with one or more numeric characters. Returns 0 if there are no such characters.
*tokenize(string) — the format for invoking this method is delimiters.tokenize(text). The tokenize() method categorizes each character in the text argument as either part of a delimiter (the character appears in the delimiters string) or part of a token (any other character) and then divides the text argument into tokens separated by delimiters. The method returns the tokens as a sequence of strings. If you try to tokenize an empty string, the returned sequence is empty. The tokenize() method is useful for extracting words that are separated by whitespace or punctuation. For example:
string s := "     This   is\na test!  See? ";
print " ".tokenize(s).toString();
print " .,:;!?\n\t".tokenize(s).toString();
This prints the following:
["This","is\na","test!","See?"]
["This","is","a","test","See"]
*toLower() — returns an all-lowercase string representation of the string.
*toUpper() — returns an all-uppercase string representation of the string.
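Several of the methods above have no example of their own. The following sketch exercises a few of them together, based only on the behavior described in this section. The monitor name MethodsSketch, the variable names, and the sample text are hypothetical.

```epl
monitor MethodsSketch {   // hypothetical monitor name
    action onload() {
        string padded := "  Hello, World  ";
        string trimmed := padded.ltrim().rtrim();   // remove leading, then trailing white space

        print trimmed;                               // prints: Hello, World
        print trimmed.length().toString();           // prints: 12
        print trimmed.find("World").toString();      // prints: 7 (indices count from 0)
        print trimmed.find("xyz").toString();        // prints: -1 (substring not present)
        print trimmed.findFrom("o", 5).toString();   // prints: 8 (search starts at index 5)
        print trimmed.toUpper();                     // prints: HELLO, WORLD
        print trimmed.toLower();                     // prints: hello, world

        print "42 widgets".toInteger().toString();   // prints: 42 (leading numeric characters)
        print "true".toBoolean().toString();         // prints: true
        print "True".toBoolean().toString();         // prints: false (toBoolean() is case sensitive)
    }
}
```

Each call returns a new value, so padded still contains its leading and trailing spaces after the sketch runs.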
Copyright © 2013 Software AG, Darmstadt, Germany and/or Software AG USA Inc., Reston, VA, USA, and/or Terracotta Inc., San Francisco, CA, USA, and/or Software AG (Canada) Inc., Cambridge, Ontario, Canada, and/or, Software AG (UK) Ltd., Derby, United Kingdom, and/or Software A.G. (Israel) Ltd., Or-Yehuda, Israel and/or their licensors.