Partitions and aggregate functions
The partition by clause creates several partitions within the window. However, a stream query has other parts in addition to the window. The other parts include the projection and optional join or where elements. These other parts of the query operate on a single window that contains all items from all partitions.
Likewise, when you partition a stream, any specified aggregate functions aggregate over all partitions. If you want to generate separate aggregate values for different groups of events, you must specify a group by clause. See Grouping output items. A common use case is to specify matching partition by and group by clauses.
Consider the following stream query:
from a in all A() partition by a.x retain 2 select sum(a.y);
The window definition is retain 2, and this is partitioned by a.x, where x is the first field in A. There is one retain 2 partition for each value of x. Suppose this stream query receives the following input events:
A(1,1)
A(1,2)
A(2,1)
A(2,2)
A(1,3)
A(2,3)
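To trace these events through the partitioned window, here is a minimal Python sketch. It is not EPL. It models only the window contents after all six events have arrived, not the output generated as each event arrives. The function name partitioned_retain and the (x, y) tuple representation of A events are assumptions made for illustration.

```python
from collections import defaultdict, deque

def partitioned_retain(events, size):
    """Model `partition by x retain size`: keep the last `size` items per key.

    Hypothetical helper for illustration; not part of any EPL API.
    """
    partitions = defaultdict(lambda: deque(maxlen=size))
    for x, y in events:
        partitions[x].append(y)  # the oldest item in that partition drops out
    return {x: list(ys) for x, ys in partitions.items()}

# The six A(x, y) input events, in arrival order.
events = [(1, 1), (1, 2), (2, 1), (2, 2), (1, 3), (2, 3)]
partitions = partitioned_retain(events, 2)

# Without group by, the aggregate sees the items of every partition together.
total = sum(y for ys in partitions.values() for y in ys)

# With a matching group by, there would be one aggregate per key instead.
per_key = {x: sum(ys) for x, ys in partitions.items()}

print(partitions)  # {1: [2, 3], 2: [2, 3]}
print(total)       # 10
print(per_key)     # {1: 5, 2: 5}
```

The single total corresponds to the query as written. The per-key sums are what a matching group by a.x clause would be expected to produce.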
After these events have all arrived, one partition contains A(1,2) and A(1,3) while a second partition contains A(2,2) and A(2,3). However, the parts of the query following the window definition operate on the collection of all items in all partitions. In this example, the sum() aggregate function therefore generates 10 (2 + 3 + 2 + 3). It does not generate a lot that contains two values of 5, one for each partition.
Now consider the following query:
from t in all Tick()
partition by t.symbol retain 10
group by t.symbol
select mean(t.price)
This query returns one mean value per symbol, which is the mean of the last 10 ticks for that symbol (or of fewer ticks if fewer than 10 have arrived for that symbol). If you do not want the means for all symbols in one lot, you might prefer to spawn monitors so that you have an instance of the following query for each symbol:
from t in all Tick(symbol=X)
retain 10
select mean(t.price)
If you do want the averages for all the symbols in the same stream, then you can specify the group key in the select clause in order to later differentiate between the output events, as in the following example:
from t in all Tick()
partition by t.symbol retain 10
group by t.symbol
select Output(t.symbol, mean(t.price))
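The combined effect of matching partition by and group by clauses, with the group key carried in the output, can be modeled with the following Python sketch. It is not EPL, and it shows only the state after all ticks have arrived, not which groups generate output on each arrival. The function name grouped_means, the symbols, and the prices are hypothetical, and the window size is reduced to 2 to keep the example short.

```python
from collections import defaultdict, deque
from statistics import mean

def grouped_means(ticks, size=10):
    """Model `partition by symbol retain size group by symbol select (symbol, mean(price))`.

    Hypothetical helper for illustration; not part of any EPL API.
    """
    windows = defaultdict(lambda: deque(maxlen=size))
    for symbol, price in ticks:
        windows[symbol].append(price)  # one retain window per symbol
    # One output item per group, tagged with its group key.
    return [(symbol, mean(prices)) for symbol, prices in windows.items()]

# Made-up (symbol, price) ticks in arrival order.
ticks = [("AAA", 10.0), ("BBB", 20.0), ("AAA", 12.0), ("BBB", 22.0), ("AAA", 14.0)]
result = grouped_means(ticks, size=2)

print(result)  # [('AAA', 13.0), ('BBB', 21.0)]
```

Because each result carries its symbol, a consumer of the output can tell the means apart even though they share one stream.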
As you can see, the partition by clause is often used in conjunction with the group by clause.
Tip:
In EPL, it is common to use spawn in a monitor to create separate monitor instances. For example, each monitor instance might process a separate stock symbol. Spawning separate monitor instances might be preferable to using a single monitor instance whose stream query specifies partition by in order to process, for example, all stock symbols. Spawning separate monitor instances can be more efficient because your application processes only the subset of symbols that are of interest. Also, the subset of symbols of interest can change through the day, and appropriate monitor instances and queries can be created as required.