Defining content-dependent windows
The contents of the window can also depend on the content of individual items in the stream. Currently the only content-dependent window operator is the with unique clause, which limits the window to containing only the most recent item for each key value. The with unique clause can be added to a within or a retain window by following it with:
with unique keyExpr
The keyExpr follows the same rules as a partition key expression. That is, it is an expression that should contain at least one reference to the input item and must return a comparable type. (See Comparable types in the "Types" section of the Apama EPL Reference.) Some examples are in the following table.
If you add a with unique clause and more than one item in the window has the same value for the key identified by keyExpr, only the most recently received item is considered to be in the window. Note that with unique processing happens after the rest of the window processing. Consider the following query:
from p in pairs retain 3 with unique p.letter select sum(p.number)
If the most recent two events have the same letter, there will be only two events over which the sum is calculated. This is illustrated in the following diagram:
The query before the diagram corresponds to the aggregate projection. The three queries shown here are:
Projection | Query
Simple istream Projection | from p in pairs retain 3 with unique p.letter select p
Simple rstream Projection | from p in pairs retain 3 with unique p.letter select rstream p
Aggregate Projection | from p in pairs retain 3 with unique p.letter select sum(p.number)
As you can see, when the last three items received all have a unique letter, the query behaves like a retain 3 window. When the last three items received do not all have a unique letter, the duplicate that arrived first is removed from the window. In this example, the arrival of c,5 causes the removal of c,3 even though c,3 was one of the last three items received. In other words, the with unique clause can cause an item to be removed from the window, and therefore from the sum, earlier than it would otherwise be removed.
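The behavior described above can be traced with a small simulation. The following is a minimal sketch in Python, not EPL, and it is not how the correlator implements windows. It only models the rule stated in this section: the retain 3 step runs first, and with unique then keeps the most recent item for each key. The function name retain_with_unique and the input sequence are hypothetical; only the items c,3 and c,5 come from the example above.

```python
from collections import deque


def retain_with_unique(items, size=3):
    """Return (window, total) after each arrival for 'retain size with unique letter'."""
    recent = deque(maxlen=size)  # models plain "retain 3": the last `size` arrivals
    results = []
    for letter, number in items:
        recent.append((letter, number))
        # "with unique" is applied after the retain step:
        # record the position of the newest item for each letter.
        newest = {}
        for pos, (key, _) in enumerate(recent):
            newest[key] = pos
        # Keep an item only if it is the newest one for its letter.
        window = [item for pos, item in enumerate(recent) if newest[item[0]] == pos]
        results.append((window, sum(n for _, n in window)))
    return results


# Hypothetical input; c,3 and c,5 mirror the example in the text.
stream = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("c", 5)]
for window, total in retain_with_unique(stream):
    print(window, total)
# Last line printed: [('d', 4), ('c', 5)] 9
```

When c,5 arrives, the last three items are c,3, d,4 and c,5. The with unique step then drops c,3, so the sum covers only two items (9), whereas a plain retain 3 window would have summed all three (12).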
The difference between a partitioned window and a window that is using a with unique clause can be described as “using partition by gives you the last 3 values for each key” and “using with unique gives you one value of each key, from the last 3”. You can combine both partition by and with unique if you are using different key expressions in each clause.
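To make the contrast concrete, the following Python sketch computes the final window contents under each rule for the same input. It is an illustration under stated assumptions, not EPL: the function names partitioned_retain and unique_retain and the input data are hypothetical, and the code models only the two descriptions quoted above.

```python
from collections import defaultdict, deque


def partitioned_retain(items, size=3):
    """Model 'partition by key retain size': the last `size` values for each key."""
    partitions = defaultdict(lambda: deque(maxlen=size))
    for key, value in items:
        partitions[key].append(value)
    return {key: list(values) for key, values in partitions.items()}


def unique_retain(items, size=3):
    """Model 'retain size with unique key': one value per key, from the last `size` items."""
    recent = list(items)[-size:]
    # Building a dict keeps the latest value when a key repeats.
    return dict(recent)


# Hypothetical input data.
stream = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5), ("a", 6)]
print(partitioned_retain(stream))  # {'a': [3, 4, 6], 'b': [2, 5]}
print(unique_retain(stream))       # {'a': 6, 'b': 5}
```

The partitioned window holds up to three values per key, so it can contain more than three items in total. The with unique window never holds more than three items and holds at most one per key.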
Note that you cannot specify within followed by retain followed by with unique.