string

Enclose string literals in double quotes. Values of the string type are sequences of non-null Unicode characters encoded in UTF-8. UTF-8 is a variable-width encoding, so a character can occupy from 1 to 4 bytes of storage. The characters in the 7-bit ASCII character set are a subset of UTF-8 and occupy a single byte each.

Although string types are discussed as though they are primitive types, they are actually reference types. However, EPL's string objects are immutable. For example, a statement such as s:=s+" suffix"; creates a new string object and changes the variable s to refer to that new string object. Any other references to the old value continue to point to the old value.

Operations that can return a different string value, such as concatenation, case folding, or trimming white space, always create new strings rather than modifying the existing value in place. The previous value's storage is recovered later by the EPL runtime garbage collector.
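The reference behavior described above can be sketched as follows. This is a minimal illustration: the monitor and variable names are invented for the example, and the values in the comments follow from the description above rather than from a captured run.

```epl
monitor StringImmutabilitySketch {
    action onload() {
        string s := "prefix";
        string t := s;          // t refers to the same string value as s
        s := s + " suffix";     // creates a new string; s now refers to it
        print s;                // prefix suffix
        print t;                // prefix (the old value is untouched)
    }
}
```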

The length of a string is limited by the memory available at runtime, which can be multiple gigabytes. In practice, you are unlikely to exceed the limit in a single string.

To enter this...	Insert this...
" (double quote)	\"
\ (backslash)	\\
newline character	\n
tab character	\t

Operator	Description	Result Type
<	Less-than string comparison	boolean
<=	Less-than or equal string comparison	boolean
=	Equal string comparison	boolean
!=	Not equal string comparison	boolean
>=	Greater-than or equal string comparison	boolean
>	Greater-than string comparison	boolean
+	String concatenation	string

When you compare two strings for equality, the result is true if the strings are the same length and each character in one string is identical to the corresponding character at the same position in the other string.

When you compare two strings for less than or greater than, the characters in the strings are compared pairwise according to the numerical values of their Unicode code points. The comparison is case-sensitive, so capital letters are not equal to their lowercase equivalents. Characters earlier in the character set sort before characters later in the character set. To order two unequal strings, the earliest difference is considered. For example, "abcXdef" sorts earlier than "abcYdef", and "abc" sorts earlier than "abcXYZ". The empty string sorts earliest of all.
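The comparison rules can be tried out with a short monitor such as the one below. It is a sketch only: the monitor and variable names are made up for illustration, and the results in the comments are derived from the rules stated above.

```epl
monitor StringComparisonSketch {
    action onload() {
        boolean sameText    := ("abc" = "abc");          // true: same length, same characters
        boolean caseDiffers := ("abc" = "ABC");          // false: comparison is case-sensitive
        boolean earlier     := ("abcXdef" < "abcYdef");  // true: X precedes Y
        boolean prefixFirst := ("abc" < "abcXYZ");       // true
        boolean upperFirst  := ("Zebra" < "apple");      // true: Z (U+005A) precedes a (U+0061)
        boolean emptyFirst  := ("" < "a");               // true: the empty string sorts earliest
        print sameText.toString();
        print caseDiffers.toString();
        print earlier.toString();
        print prefixFirst.toString();
        print upperFirst.toString();
        print emptyFirst.toString();
    }
}
```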

clone(string) — Returns a reference to the specified string. When called on a string, the clone() method does not make a copy of the string since strings are immutable.

find(substring) — Returns an integer indicating the index position of the substring passed as the parameter to the method. If the parameter does not occur as a substring within the string, the method returns -1. Note that EPL string indices (the position of a character within the string) count upwards from 0.

findFrom(substring, fromIndex) — Behaves like the find() method, but starts searching for the specified substring with the character indicated by fromIndex. For example, if the value of fromIndex is 7, the search begins with the character that has an index of 7.
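As an illustration of find() and findFrom() used together, the sketch below locates two occurrences of a substring. The monitor name, variable names, and sample text are assumptions made for the example; the index values in the comments follow from the 0-based indexing described above.

```epl
monitor StringFindSketch {
    action onload() {
        string s := "banana";                           // b=0 a=1 n=2 a=3 n=4 a=5
        integer first := s.find("an");                  // 1
        integer second := s.findFrom("an", first + 1);  // 3: search resumes at index 2
        integer missing := s.find("xyz");               // -1: not a substring
        print first.toString();
        print second.toString();
        print missing.toString();
    }
}
```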

intern() — Marks the string it is called on as interned. Subsequent incoming events that contain a string that is identical to an interned string use the same string object. The intern() method takes no arguments and returns the interned version of the string it is called on. For example:
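A minimal sketch of such a call, assuming a stock symbol that is expected to recur in many incoming events (the monitor and variable names are illustrative only):

```epl
monitor InternSketch {
    action onload() {
        // intern() takes no arguments and returns the interned
        // version of the string it is called on.
        string symbol := "APMA".intern();
        print symbol;   // APMA
    }
}
```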

The benefit of using the intern() method is that it reduces the amount of memory used and the amount of work the garbage collector must do. A disadvantage is that you cannot free memory used for an interned string.

If there are a limited number of strings that will be used many times then calling intern() on these strings speeds the handling of events that use them. You might want to call intern() on the names of products or stock symbols, which are all used frequently. For example, invoking "APMA".intern() might make sense if you are expecting a large number of incoming events of the form Tick("APMA", ...). You would not want to call intern() on order IDs, because there are so many and each one is likely to be unique.

Calling intern() on a string is a global operation. That is, all contexts can then use the same string object. Any strings already in use by the correlator are not affected, even if they match the string intern() is called on.

If you use correlator persistence, details of which strings have been interned are not stored in the recovery datastore. If the correlator shuts down and restarts, you must call intern() again on the pertinent strings.

join(sequence<string> s) — Concatenates the strings in s using the string it is called on as a separator. The single parameter must be a sequence type that contains strings. You cannot specify a variable number of string parameters. For example:
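A sketch of a join() call, with an invented monitor name and sample data. The string the method is called on acts as the separator, so the result in the comment follows directly from the description above.

```epl
monitor JoinSketch {
    action onload() {
        sequence<string> words := ["Something", "Completely", "Different"];
        // ", " is the separator; words supplies the strings to concatenate.
        string joined := ", ".join(words);
        print joined;   // Something, Completely, Different
    }
}
```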

ltrim() — Returns a string in which all whitespace characters at the beginning have been removed. Whitespace characters are space, newline, and tab characters.

parse(string) — Returns the string value represented by the string argument without enclosing that value in quotation marks. You can call this method on the string type or on an instance of a string type. The more typical use is to call parse() directly on the string type.

Use a backslash to escape each quotation mark or backslash in your string, including quotation marks that enclose your string. For example, to parse "Hello World", specify it as "\"Hello World\"". In other words, if you are writing literal strings in EPL, you must precede all backslashes and quotation marks with a backslash. For example:
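The escaping rule can be illustrated with the sketch below, which parses the "Hello World" value mentioned above. The monitor and variable names are assumptions for the example.

```epl
monitor ParseSketch {
    action onload() {
        // The EPL literal contains the text "Hello World" including its
        // enclosing quotation marks, each escaped with a backslash.
        string a := string.parse("\"Hello World\"");
        print a;   // Hello World (no enclosing quotation marks)
    }
}
```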

You can specify the parse() method after an expression or type name. If the correlator is unable to parse the string, it is a runtime error and the monitor instance that the EPL is running in terminates. For example, the following is an error and causes the monitor instance to terminate:

The parse() method cannot parse the result of a toString() method. This is because the toString() method does not enclose its result in quotation marks, nor does it escape any special characters. For example, the following is false:

replaceAll(string1, string2) — Makes a copy of the string, replaces instances of string1 with instances of string2 and returns the revised string. For example:

Notice that x itself is unchanged. If string1 is an empty string then the monitor instance dies. If instances of string1 overlap then the method replaces only the first instance in the overlapping instances.
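The behavior described above can be sketched as follows. The sample strings and the monitor name are invented for illustration; the variable x plays the role of the original string, and the results in the comments are derived from the description rather than from a captured run.

```epl
monitor ReplaceAllSketch {
    action onload() {
        string x := "red fish, red boat";
        string y := x.replaceAll("red", "blue");
        print y;   // blue fish, blue boat
        print x;   // red fish, red boat (x itself is unchanged)

        // Overlapping instances: "aa" occurs at index 0 and at index 1
        // of "aaa". Only the first of the overlapping instances is replaced.
        print "aaa".replaceAll("aa", "b");   // ba
    }
}
```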

split(string input) — Returns a sequence of the strings that result from splitting the input string on every occurrence of the delimiter string that the method is called on. The size of the returned sequence is always one more than the total number of occurrences of the delimiter string. Consecutive delimiters in the input string result in empty strings in the returned sequence. The split() method is useful for separating a string that contains newline characters into individual lines or for dividing comma-separated values in a single string into multiple strings. For example:

Method Call	Returned Sequence
",".split("x,y,z")	["x","y","z"]
",".split("")	[""]
",".split(",x,,y")	["","x","","y"]
"\r\n".split("line1\r\nline2\r\n\r\n")	["line1","line2", "", ""]

substring(integer, integer) — Returns the substring indicated by the integer parameters. The parameters indicate the positions of the first and last characters of the substring; the first is inclusive and the second is exclusive. If a parameter is zero or positive, it is taken to be the position of a character going from left to right, counting upwards from 0. If a parameter is negative, it is taken to be the position of a character going from right to left, counting downwards from -1. For example, if s is "goodbye":

s.substring(0, 0) is ""
s.substring(0, 2) is "go"
s.substring(2, 4) is "od"
s.substring(0, 7) is "goodbye"
s.substring(0, -1) is "goodby"
s.substring(-4, -1) is "dby"
s.substring(-7, -1) is "goodby"
s.substring(-7, 7) is "goodbye"
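
Two common uses of these index rules are sketched below: taking the last few characters of a string, and dropping its first and last characters. The monitor and variable names are illustrative, and the results in the comments are worked out from the rules above.

```epl
monitor SubstringSketch {
    action onload() {
        string s := "goodbye";
        // Last three characters: start at -3, end (exclusive) at the length.
        string tail := s.substring(-3, s.length());   // "bye"
        // Everything except the first and last characters.
        string middle := s.substring(1, -1);          // "oodby"
        print tail;
        print middle;
    }
}
```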

tokenize(string input) — Returns a sequence of all the non-empty strings (tokens) that result from splitting the input string on occurrences of any character from the string that the method is called on. The returned sequence never contains any empty strings, and has no elements if the input string is empty or contains only delimiters. The tokenize() method is useful for extracting words separated by whitespace. For example:
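The sketch below contrasts tokenize() with split() on input that contains consecutive delimiters. The monitor name and sample text are assumptions for the example; the sizes in the comments follow from the two method descriptions.

```epl
monitor TokenizeSketch {
    action onload() {
        // Every character of the string the method is called on is a
        // delimiter: here, space and tab.
        sequence<string> words := " \t".tokenize("  the quick\tbrown   fox ");
        print words.size().toString();   // 4
        string w;
        for w in words {
            print w;                     // the, quick, brown, fox (one per line)
        }

        // split() treats its whole string as one delimiter and keeps
        // the empty strings between consecutive delimiters.
        sequence<string> parts := ",".split(",x,,y");
        print parts.size().toString();   // 4: "", "x", "", "y"
    }
}
```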

toDecimal() — Returns a decimal representation of the string, if the string starts with one or more numeric characters. The numeric characters can optionally have among them a decimal point or exponent symbol. Returns 0.0 if there are no such characters.

toFloat() — Returns a float representation of the string, if the string starts with one or more numeric characters. The numeric characters can optionally have among them a decimal point or exponent symbol. Returns 0.0 if there are no such characters.

toInteger() — Returns an integer representation of the string, if the string starts with one or more numeric characters. Returns 0 if there are no such characters.
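The conversion methods can be sketched as follows. The sample strings and all names are invented for illustration, and the values in the comments are what the descriptions above imply, not output from a captured run.

```epl
monitor StringToNumberSketch {
    action onload() {
        integer count   := "42 items".toInteger();   // 42: leading numeric characters
        integer none    := "items".toInteger();      // 0: no leading numeric characters
        float weight    := "3.5kg".toFloat();        // 3.5
        decimal price   := "12.99".toDecimal();      // 12.99
        decimal noPrice := "n/a".toDecimal();        // 0.0
        print count.toString();
        print none.toString();
        print weight.toString();
        print price.toString();
        print noPrice.toString();
    }
}
```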

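You can use the following methods to search and replace strings using regular expressions. You call them on values of string type. In the method signatures, regex stands for the regular expression pattern against which the string is to be matched.

matches(string regex) — Returns true if the string matches the specified regular expression. For example: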

Method Call	Returned Boolean
"bit".matches("[a-f]it")	true
"sit".matches("[^a-f]it")	true
"Muenchen".matches("M(ü|ue)nchen")	true
"αθηνα".matches("^[α-ω]+$")	true
"$12.99".matches("\\$\\d+\\.\\d{2}")	true
"3,45€".matches("\\d+[\\.\\,]\\d{2}\\s?€")	true
"xyzqxyzqqyz".matches("(x(yz))q\\1qq\\2")	true

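A typical use of matches() is validating an input string before acting on it. The sketch below reuses the dollar-amount pattern from the table above; the monitor and variable names are illustrative only.

```epl
monitor MatchesSketch {
    action onload() {
        string price := "$12.99";
        // In an EPL string literal, each backslash of the regular
        // expression is written as \\.
        if price.matches("\\$\\d+\\.\\d{2}") then {
            print price + " looks like a dollar amount";
        } else {
            print price + " is not a dollar amount";
        }
    }
}
```
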
search(string regex) — Returns a sequence<string> containing all non-overlapping matches that have been found for the specified regular expression. For example:

Method Call	Returned Sequence
"The dog plays in the big bad wolf's top yard".search("\\b\\w{3}\\b")	["The", "dog", "the", "big", "bad", "top"]
"price=[9.01, 8.73, 8.37, 8.88]".search("\\d+\\.\\d{2}")	["9.01", "8.73", "8.37", "8.88"]
"{data: [-1.60e-19C, 9.1094E-31kg, -1.0011597μB]}".search("[-+]?[1-9]\\.?[0-9]*([eE][-+]?[0-9]+)?kg")	["9.1094E-31kg"]

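Because search() returns strings, numeric matches have to be converted before they can be used in arithmetic. The sketch below sums the prices from the second row of the table above; the monitor and variable names are assumptions for the example.

```epl
monitor SearchSketch {
    action onload() {
        sequence<string> found :=
            "price=[9.01, 8.73, 8.37, 8.88]".search("\\d+\\.\\d{2}");
        float total := 0.0;
        string p;
        for p in found {
            total := total + p.toFloat();   // convert each match to a float
        }
        print found.size().toString();      // 4
        print total.toString();             // approximately 34.99
    }
}
```
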
replace(string regex, string replacement) — Returns a new string in which the first match found for the specified regular expression has been replaced with the specified replacement string. To replace all matches, put (!g) before the regular expression. For example, to replace all A characters and ignore case at the same time, specify the following as the regex string: (!g)(?i)A

Method Call	Returned String
"string your string okay string".replace("string", "STR")	"STR your string okay string"
"StriNG your string okay string".replace("string", "STR")	"StriNG your STR okay string"
"StriNG your string okay string".replace("(?i)string", "STR")	"STR your string okay string"
"StriNG your string okay string".replace("(!g)string", "STR")	"StriNG your STR okay STR"
"StriNG your string okay string".replace("(!g)(?i)string", "STR")	"STR your STR okay STR"
"convert text message".replace("(!g)a|e|i|o|u", "")	"cnvrt txt mssg"
"Lässt grössenordnungsmässig grösstmöglich messgerät".replace("(!g)ss", "ß")	"Läßt größenordnungsmäßig größtmöglich meßgerät"
"capture yz and format".replace("(yz)", "### $1 ###")	"capture ### yz ### and format"

EPL uses IBM's International Components for Unicode (ICU) to implement regular expressions (see http://site.icu-project.org/). You can therefore use all of the regular expression syntax described in the ICU User Guide with the above methods; see http://userguide.icu-project.org/strings/regexp for detailed information. In addition to the ICU regular expression syntax, Apama provides the option (!g) for the replace method, which replaces all matches rather than just the first one. This option must be the first part of the regular expression.
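
The effect of the (!g) and (?i) options on replace() can be sketched as below. The first three calls repeat rows from the table above. The last call, which collapses runs of whitespace, is an additional illustration that assumes the ICU \s character class. The monitor and variable names are invented for the example.

```epl
monitor RegexReplaceSketch {
    action onload() {
        string s := "StriNG your string okay string";
        // Default: first match only, case-sensitive.
        print s.replace("string", "STR");           // StriNG your STR okay string
        // (!g) must come first; (?i) then switches on case-insensitive matching.
        print s.replace("(!g)(?i)string", "STR");   // STR your STR okay STR
        // $1 in the replacement refers to the first capture group.
        print "capture yz and format".replace("(yz)", "### $1 ###");
                                                    // capture ### yz ### and format
        // Collapse every run of whitespace into a single space.
        print "a   b \t c".replace("(!g)\\s+", " "); // a b c
    }
}
```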