<Default Package>
Type string
UTF-8-encoded string type.
Enclose string literals in double quotes. Values of the string type are sequences of non-null Unicode characters encoded in UTF-8 format. Note that UTF-8 is a variable-width encoding and a character can occupy from 1 to 4 bytes of storage. The characters in the 7-bit ASCII character set are a subset of UTF-8 and occupy a single byte each.
Although string types are discussed as though they are primitive types, they are actually reference types. However, EPL's string objects are immutable. For example, a statement such as s:=s+" suffix"; creates a new string object and changes the variable s to refer to that new string object. Any other references to the old value continue to point to the old value.
Operations that can return a different string value, such as concatenation, case folding, or trimming white space, always create new strings rather than modifying the existing value in place. The previous value's storage is recovered later by the EPL runtime garbage collector.
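The reference semantics described above can be sketched as follows. The monitor name ImmutabilitySketch and the variable names are hypothetical, chosen only for illustration.

```epl
monitor ImmutabilitySketch {
    action onload() {
        string s := "prefix";
        string old := s;         // a second reference to the same string value
        s := s + " suffix";      // creates a new string; s now refers to it

        log "s   = " + s at INFO;    // prefix suffix
        log "old = " + old at INFO;  // prefix (the old value is unchanged)
    }
}
```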
The length of a string is limited by the memory available at runtime, which can be multiple gigabytes. In practice, you are unlikely to exceed the limit in a single string.
Special characters are encoded with a backslash (\) as follows:
\" | double quote |
\\ | backslash |
\n | newline |
\t | tab |
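A short sketch of the escape sequences listed above, used inside string literals. The monitor name and the literal contents are illustrative assumptions.

```epl
monitor EscapeSketch {
    action onload() {
        string quoted := "She said \"hello\"";     // embedded double quotes
        string path   := "C:\\temp\\data.txt";     // embedded backslashes
        string table  := "name\tvalue\nalpha\t1";  // a tab and a newline

        log quoted at INFO;
        log path at INFO;
        log table at INFO;
    }
}
```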
The following operators are supported with strings:
< | Less-than string comparison |
> | Greater-than string comparison |
<= | Less-than or equal string comparison |
>= | Greater-than or equal string comparison |
= | Equal string comparison |
!= | Not-equal string comparison |
+ | String concatenation |
When you compare two strings for equality, the result is true if the strings are the same length and each character in one string is identical to the corresponding character at the same position in the other string.
When you compare two strings for less than or greater than, the characters in the strings are compared pairwise according to the numerical values of their Unicode code points. The comparison is case-sensitive, so capital letters are not equal to their lowercase equivalents. Characters earlier in the character set sort before characters later in the character set. To order two unequal strings, the earliest difference is considered. For example, "abcXdef" sorts earlier than "abcYdef", and "abc" sorts earlier than "abcXYZ". The empty string sorts earliest of all.
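The ordering rules above can be checked with a small sketch such as the following. The monitor and variable names are hypothetical; the expected results in the comments follow from the rules and examples given in the text.

```epl
monitor ComparisonSketch {
    action onload() {
        boolean a := "abcXdef" < "abcYdef";  // true: first difference is X vs Y
        boolean b := "abc" < "abcXYZ";       // true: a prefix sorts earlier
        boolean c := "" < "a";               // true: the empty string sorts first
        boolean d := "Apple" = "apple";      // false: comparison is case-sensitive
        boolean e := "Z" < "a";              // true: U+005A precedes U+0061

        log a.toString() + " " + b.toString() + " " + c.toString() + " "
            + d.toString() + " " + e.toString() at INFO;
    }
}
```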
The default value of a string is the empty string ("").
Strings can be parsed, routed and compared. They are not cyclic.
canParse
boolean static canParse(string s)
Check if the string argument can be parsed as a string.
-
Parameters:
-
s - The string to test for parseability.
-
Returns:
- True if the string could be parsed as a string, false otherwise.
-
See Also:
-
string#parse() - See the parse method for what is parseable.
clone
string clone()
Get a new reference to this string.
Because strings are immutable, clone() returns another reference to the same string and does not create a copy.
-
Returns:
- A reference to the same string.
find
integer find(string needle)
Locate a string within this string.
-
Parameters:
-
needle - The string to search for.
-
Returns:
- The index (starting from 0) of the first character of needle within this string. Returns -1 if needle is not found.
-
See Also:
-
string#search() - See search() if you want to search using a regular expression rather than a string literal.
findFrom
integer findFrom(string needle, integer fromIndex)
Locate a string within this string, starting from a given index.
findFrom behaves like find, but it begins searching from the specified index.
-
Parameters:
-
needle - The string to search for.
-
fromIndex - The index in the string to start searching from.
-
Returns:
- The index of the first character of needle within this string. Returns -1 if needle is not found after fromIndex.
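As an illustration of find and findFrom used together, the following sketch locates two occurrences of a substring. The monitor name and the sample text are assumptions made for the example.

```epl
monitor FindSketch {
    action onload() {
        string s := "one,two,one";

        integer first   := s.find("one");                 // 0
        integer second  := s.findFrom("one", first + 1);  // 8
        integer missing := s.find("three");               // -1: not found

        log first.toString() + " " + second.toString() + " "
            + missing.toString() at INFO;
    }
}
```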
hash
integer hash()
Get an integer hash representation of the underlying object.
This method returns an integer that is evenly distributed over the whole integer range, which makes it suitable for partitioning or indexing. Multiple different objects can resolve to the same hash value.
-
Returns:
- An integer representation of the underlying object.
intern
string intern()
Mark the string it is called on as interned.
Subsequent incoming events that contain a string that is identical to an interned string use the same string object.
The benefit of using the intern() method is that it reduces the amount of memory used and the amount of work the garbage collector must do. A disadvantage is that you cannot free memory used for an interned string.
If there are a limited number of strings that will be used many times, then calling intern() on these strings speeds the handling of events that use them. You might want to call intern() on the names of products or stock symbols, which are all used frequently. For example, invoking "APMA".intern() might make sense if you are expecting a large number of incoming events of the form Tick("APMA", ...). You would not want to call intern() on order IDs because there are so many and each one is likely to be unique.
Calling intern() on a string is a global operation. That is, all contexts can then use the same string object. Any strings already in use by the correlator are not affected, even if they match the string intern() is called on.
If you use correlator persistence, the set of strings that have been interned is stored in the recovery datastore, so there is no need to call intern() again after a restart.
The interned version of the string is returned. The original will be garbage collected when all references to it have been dropped.
-
Returns:
- The interned version of the string.
join
string join(sequence<string> args)
Concatenate the sequence argument using this string as a separator.
For example: string s := ", ".join(seq);
-
Parameters:
-
args - The sequence to join.
-
Returns:
- A string with args joined together separated by this string.
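A minimal sketch of join, expanding the one-line example above into a complete monitor. The monitor name and the sequence contents are illustrative.

```epl
monitor JoinSketch {
    action onload() {
        sequence<string> seq := ["a", "b", "c"];

        // The string the method is called on is the separator.
        string s := ", ".join(seq);

        log s at INFO;  // a, b, c
    }
}
```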
length
integer length()
Get the length of the string.
-
Returns:
- The length of the string.
ltrim
string ltrim()
Strip whitespace from the start of the string.
Whitespace characters are spaces, newlines and tabs.
-
Returns:
- A copy of the string where all the whitespace characters at the start have been removed.
matches
boolean matches(string regex)
Test whether the string matches the specified regular expression.
EPL uses IBM's International Components for Unicode (ICU) to implement regular expressions (see http://site.icu-project.org/). You can therefore use any regular expression syntax described in the ICU User Guide with the matches, replace and search methods; see http://userguide.icu-project.org/strings/regexp for detailed information. In addition to the ICU regular expression syntax, Apama provides the (!g) option for the replace method, which replaces all matches rather than just the first one. This option must appear at the very start of the regular expression.
-
Parameters:
-
regex - The regular expression to compare with.
-
Returns:
- True if the entire string matches the given regular expression, false otherwise.
-
Throws:
- IllegalArgumentException if the regular expression is invalid.
-
See Also:
-
string#search() - See search to get access to the matching parts of the string.
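The following sketch illustrates that matches tests the entire string, not a substring. The monitor name and patterns are assumptions for illustration; character classes are used so that no backslashes need escaping in the string literals.

```epl
monitor MatchesSketch {
    action onload() {
        boolean whole   := "abc123".matches("[a-z]+[0-9]+");  // true
        boolean partial := "abc123".matches("[a-z]+");        // false: only "abc" matches

        log whole.toString() + " " + partial.toString() at INFO;
    }
}
```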
parse
string static parse(string s)
Parse the argument string as a string.
The parse method takes a string in the form used for event files. String arguments must be enclosed in double quotes. All escape characters will be converted to the natural character.
Note that parse cannot parse the output of toString() on a string, since that does not enclose the string in double quotes, nor does it add escaping to characters.
-
Parameters:
-
s - The string to parse. Must be enclosed in double quotes.
-
Returns:
- The parsed string.
-
Throws:
- ParseException if s cannot be parsed as a string.
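A hedged sketch of parse and canParse follows. The monitor name and inputs are hypothetical. Note that the EPL literal needs its own escaping, so the value passed to parse is the text "tab\there" including the enclosing quotes and a backslash-t pair.

```epl
monitor ParseSketch {
    action onload() {
        // Event-file form: enclosed in double quotes, with \t written as two characters.
        string eventForm := "\"tab\\there\"";

        if string.canParse(eventForm) {
            // parse removes the enclosing quotes and converts the escape sequence.
            string parsed := string.parse(eventForm);
            log parsed at INFO;
        }

        // Not enclosed in double quotes, so it does not meet the requirement above.
        log string.canParse("no quotes").toString() at INFO;
    }
}
```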
replace
string replace(string regex, string replacement)
Search and replace regular expressions within this string.
By default, only the first matching substring is replaced. If the regular expression starts with (!g), then all matching substrings are replaced instead.
The replacement string can contain references to parts of the matched expression. Capture groups are referenced with $1 for the first group, $2 for the second group, and so on.
-
Parameters:
-
regex - The regular expression to search for.
-
replacement - The replacement string, which may contain placeholders such as $1, $2, etc.
-
Returns:
- A copy of the string with string(s) matching regex replaced by replacement.
-
Throws:
- IllegalArgumentException if the regular expression is invalid.
-
See Also:
-
string#matches() - See the documentation on matches for regular expression syntax.
-
string#replaceAll() - If you wish to replace string literals rather than regular expressions.
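The difference between the default behavior, the (!g) option and group references can be sketched as follows. The monitor name, inputs and patterns are illustrative assumptions.

```epl
monitor ReplaceSketch {
    action onload() {
        string s := "cat hat bat";

        string firstOnly := s.replace("[a-z]at", "X");      // X hat bat
        string all       := s.replace("(!g)[a-z]at", "X");  // X X X

        // $1 and $2 refer to the first and second capture groups.
        string swapped := "key=value".replace("([a-z]+)=([a-z]+)", "$2=$1");  // value=key

        log firstOnly at INFO;
        log all at INFO;
        log swapped at INFO;
    }
}
```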
replaceAll
string replaceAll(string needle, string replace)
Search and replace string literals within this string.
Searches for each instance of needle in the string and creates a copy of the string with each one replaced with the replace string.
-
Parameters:
-
needle - The string to search for. This is a string literal, not a regular expression.
-
replace - The string to replace needle with.
-
Returns:
- A copy of the string with each instance of needle replaced with replace.
-
See Also:
-
string#replace() - If you wish to replace regular expressions rather than string literals.
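Because replaceAll treats needle literally, characters that are special in regular expressions need no escaping. A minimal sketch, with a hypothetical monitor name and input:

```epl
monitor ReplaceAllSketch {
    action onload() {
        string s := "a.b.c";

        // "." is matched as a literal full stop, not as a regex wildcard.
        string dashed := s.replaceAll(".", "-");

        log dashed at INFO;  // a-b-c
    }
}
```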
rtrim
string rtrim()
Strip whitespace from the end of the string.
Whitespace characters are spaces, newlines and tabs.
-
Returns:
- A copy of the string where all the whitespace characters at the end have been removed.
search
sequence<string> search(string regex)
Find all the substrings matching a specified regular expression.
-
Parameters:
-
regex - The regular expression to search for.
-
Returns:
-
A sequence of each (non-overlapping) match for the regular expression within this string.
Note that this method returns matches for the entire regex (not for any regex groups that may be present), so the length of the sequence is equal to the number of substrings that matched the regex, and is not affected by any regex groups.
-
Throws:
- IllegalArgumentException if the regular expression is invalid.
-
See Also:
-
string#matches() - See the documentation on matches for regular expression syntax.
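The sketch below shows search returning one element per whole match, regardless of groups in the pattern. The monitor name and inputs are assumptions for illustration.

```epl
monitor SearchSketch {
    action onload() {
        // Three non-overlapping matches: "1", "22" and "333".
        sequence<string> nums := "a1b22c333".search("[0-9]+");
        log nums.size().toString() at INFO;  // 3

        // Groups do not add elements: two whole matches, "a1" and "b22".
        sequence<string> pairs := "a1b22".search("([a-z])([0-9]+)");
        log pairs.size().toString() at INFO;  // 2
    }
}
```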
split
sequence<string> split(string input)
Split a string using a delimiter.
For example: sequence<string> items := ",".split("a,b,c");
Returns a sequence of the strings that result from splitting the input string on every occurrence of the delimiter string that the method is called on. The size of the returned sequence is always one more than the total number of occurrences of the delimiter string. Consecutive delimiters in the input string result in empty strings in the returned sequence. The split() method is useful for separating a string that contains newline characters into individual lines or for dividing comma-separated values in a single string into multiple strings.
-
Parameters:
-
input - The string which should be split.
-
Returns:
- A sequence containing the input string split using this string as a delimiter.
-
See Also:
-
string#join() - This method performs the inverse of join.
-
string#tokenize() - tokenize has slightly different behavior.
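The size rule described above (one more element than the number of delimiters) can be illustrated with a sketch like this one. The monitor name and inputs are hypothetical.

```epl
monitor SplitSketch {
    action onload() {
        sequence<string> items := ",".split("a,b,c");  // "a", "b", "c"
        sequence<string> gaps  := ",".split("a,,c");   // "a", "", "c"
        sequence<string> whole := ",".split("abc");    // "abc" (no delimiter found)

        log items.size().toString() at INFO;  // 3
        log gaps.size().toString() at INFO;   // 3: the middle element is empty
        log whole.size().toString() at INFO;  // 1
    }
}
```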
substring
string substring(integer start, integer end)
Extract part of this string.
The parameters indicate the position of the first and last characters of the substring, the first being inclusive, while the second is exclusive. If a parameter is a positive value, it is taken to be the position of a character going from left to right, counting upwards from 0. If a parameter is a negative value, it is taken to be the position of a character going from right to left, counting downwards from -1.
Examples:
"Apama".substring( 1, 4 ) : returns "pam"
"Apama".substring( 2, 6 ) : throws IndexOutOfBoundsException
"Apama".substring( -6, -3 ) : throws IndexOutOfBoundsException
-
Parameters:
-
start - The first character, inclusive.
-
end - The last character, exclusive.
-
Returns:
- A new string containing the specified range from this string.
-
Throws:
- IndexOutOfBoundsException if the magnitude of start or end is larger than the length of the original string.
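A sketch of substring with in-range and out-of-range arguments follows. The monitor name is hypothetical, and the catch clause assumes the general com.apama.exceptions.Exception type and its getMessage() method are available in the Apama release in use.

```epl
monitor SubstringSketch {
    action onload() {
        log "Apama".substring(1, 4) at INFO;  // pam
        log "Apama".substring(0, 5) at INFO;  // Apama (end index is exclusive)

        try {
            // 6 is larger than the string length (5), so this call throws.
            string bad := "Apama".substring(2, 6);
            log bad at INFO;  // not reached
        } catch (com.apama.exceptions.Exception e) {
            log "substring failed: " + e.getMessage() at ERROR;
        }
    }
}
```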
toBoolean
boolean toBoolean()
Convert the string to a boolean.
This method is case-sensitive.
-
Returns:
- True if the string is "true", false otherwise.
-
See Also:
-
boolean#parse() - Use boolean.parse for case-insensitive parsing.
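The case sensitivity noted above can be illustrated with a short sketch. The monitor name and inputs are hypothetical.

```epl
monitor ToBooleanSketch {
    action onload() {
        boolean exact := "true".toBoolean();  // true
        boolean upper := "TRUE".toBoolean();  // false: the comparison is case-sensitive
        boolean other := "yes".toBoolean();   // false: only "true" yields true

        log exact.toString() + " " + upper.toString() + " "
            + other.toString() at INFO;
    }
}
```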
toDecimal
decimal toDecimal()
Convert the string to a decimal.
Returns a decimal representation of the string if the string starts with one or more numeric characters. The numeric characters can optionally have among them a decimal point or mantissa symbol. Returns 0.0 if there are no such characters.
-
Returns:
- The decimal representation of the string.
toFloat
float toFloat()
Convert the string to a float.
Returns a float representation of the string if the string starts with one or more numeric characters. The numeric characters can optionally have among them a decimal point or mantissa symbol. Returns 0.0 if there are no such characters.
-
Returns:
- The float representation of the string.
toInteger
integer toInteger()
Convert the string to an integer.
The string this method is invoked on should be of the form:
[PREFIX][SIGN][BASE]INTEGER[SUFFIX]
Where:
PREFIX is zero or more whitespace characters (space, tab).
SIGN is zero or one sign characters (+ or -).
BASE is either empty (for base 10), 0b/0B for base 2 (binary) or 0x/0X for base 16 (hex).
INTEGER is a sequence of one or more digits according to the base (i.e. 0 or 1 for base 2, 0-9 for base 10 and 0-F for base 16).
SUFFIX is zero or more other characters (whitespace, letters, symbols, digits outside the allowed set).
-
Returns:
- The integer representation of the string, or 0 if the string does not conform to the above conditions.
-
See Also:
-
integer#parse() - Use integer.parse for stricter parsing of integers.
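The accepted form described above can be sketched with a few inputs. The monitor name and sample strings are illustrative, and the values in the comments follow from the PREFIX, SIGN, BASE, INTEGER and SUFFIX rules.

```epl
monitor ToIntegerSketch {
    action onload() {
        integer plain  := "42".toInteger();            // 42
        integer padded := "  -17 apples".toInteger();  // -17: prefix and suffix ignored
        integer hex    := "0x1F".toInteger();          // 31
        integer binary := "0b101".toInteger();         // 5
        integer none   := "apples".toInteger();        // 0: does not conform

        log plain.toString() + " " + padded.toString() + " " + hex.toString()
            + " " + binary.toString() + " " + none.toString() at INFO;
    }
}
```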
tokenize
sequence<string> tokenize(string input)
Split a string into a sequence using an arbitrary set of delimiters.
Returns a sequence of all the non-empty strings (tokens) that result from splitting the input string on occurrences of any character from the string that the method is called on. The returned sequence never contains any empty strings, and will have no elements if the input string is empty or contains only delimiters. The tokenize() method is useful for extracting words that are separated by whitespace.
-
Parameters:
-
input - The string to tokenize.
-
Returns:
- A sequence of strings containing the tokenized values.
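In contrast to split, tokenize treats every character of the delimiter string as a separator and drops empty tokens. A minimal sketch, with a hypothetical monitor name and inputs:

```epl
monitor TokenizeSketch {
    action onload() {
        // Delimiters are space and tab; runs of delimiters produce no empty tokens.
        sequence<string> words := " \t".tokenize("  the quick\tbrown  fox ");
        log words.size().toString() at INFO;  // 4: the, quick, brown, fox

        // Input containing only delimiters yields an empty sequence.
        sequence<string> nothing := ",".tokenize(",,,");
        log nothing.size().toString() at INFO;  // 0
    }
}
```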
toLower
string toLower()
Convert the string to lowercase.
-
Returns:
- A copy of the string with all the characters converted to lowercase.
toString
string toString()
Return a reference to this string.
This method does not escape or enclose the string in quotes. The output is unsuitable for passing to string.parse.
-
Returns:
- The string, verbatim.
toUpper
string toUpper()
Convert the string to uppercase.
-
Returns:
- A copy of the string with all the characters converted to uppercase.