Partitioning streams
The partition by clause splits a stream into partitions, based on one or more key values. The subsequent window operators are applied to the partitioned stream; the behavior is as if the window operators had been applied separately to each partition. The result of using partition by followed by a window operator is referred to as a partitioned window. You use a query with a partitioned window to retain particular items for each partition specified by the partition by clause.
Partitioning is introduced with the following syntax:
partition by partitionByExpr[, partitionByExpr]...
The partition by clause precedes other window operators, so a complete query would be:
from a in sA partition by a.x retain 2 select sum(a.y);
Each
partitionByExpr is an expression that should contain at least one reference to the input item and must return a comparable type. See
Comparable types. Some examples are in the following table. Assume that each
partition by clause in the table starts with the following:
from a in all A() ...
Definition | Description |
partition by a.x | Partition on a single primitive type field of the input event. This is likely to be the most common case. |
partition by a | Partition on an event's field values. The events that have identical values for all fields are in the same partition. For example: from a in all A() partition by a retain 2 select a; Given the following input events: A(1,1) A(1,2) A(1,1) The first and third events are in the same partition, the second is not. In this case, the event type A must itself be a comparable type. |
partition by 1 | This is a valid partition expression, but it is not recommended. A partition expression should reference the input item in some way. |
partition by f(a) | This is a valid partition expression if f() is a function that returns an appropriate type. |
partition by a.x*globaldict[a.y] | Another valid partition expression. |
Example:
from t in all Tick()
partition by t.symbol retain 1
select rstream t;
This query creates a separate partition for each new stock symbol it finds. Each partition contains the most recent Tick event for that symbol. The query output, for each encountered symbol, is the previous Tick event for that symbol. Note that it is possible for this query to consume a large quantity of memory.