Defining batched windows
The default behavior is that the contents of a window change upon the arrival of each item. The every keyword can be used to control when the contents of the window change: it causes the items to be added to the window in batches. Time-based windows can be controlled to update only every p seconds and size-based windows can be controlled to update only after every m events.
The syntax for a batched window is one of the following:
within windowDurationExpr every batchPeriodExpr
| retain windowSizeExpr every batchSizeExpr
| within windowDurationExpr every batchPeriodExpr retain windowSizeExpr
Here, windowDurationExpr and windowSizeExpr retain their meaning from the previous sections. The batchPeriodExpr is an expression that returns the time, p, between updates as a float value. The batchSizeExpr is an expression that returns the number of events between updates, m, as an integer value.
When you specify within followed by every followed by retain, the every keyword always indicates a number of seconds. That is, the window updates its content every p seconds.
If no items have arrived or expired since the previous window update, the window content is unchanged and consequently the query does not execute. The correlator executes the query only when the window content changes.
Here is an example of a stream query that defines a batched, time-based window. The correlator creates the query at t=0.0.
from v in values within 1.5 every 1.0 select sum(v)
The following diagram illustrates how this works in practice.
The query before the diagram corresponds to the aggregate projection. The three queries shown here are:
Simple istream Projection | from v in values within 1.5 every 1.0 select v |
Simple rstream Projection | from v in values within 1.5 every 1.0 select rstream v |
Aggregate Projection | from v in values within 1.5 every 1.0 select sum(v) |
The important things to note about the behavior of these queries is that the window content changes only every second. Nothing appears on any insert or remove stream between those points. This means that the items 10.0, 20.0 and 40.0 are not in the window at the moment they arrive, but are kept until the next multiple of 1.0 second. Item lifetimes are calculated from the item arrival time, not the point at which the batching allows the item into the window. Consequently, the lifetime of the items in the window is also affected by the batching. In these examples, you can see that the items that were delayed entering the window are only in the window for one second because they were already 0.5 seconds old at the point they entered the window. For contrast, the item with the value 30.0 remains in the window for 2.0 seconds because after 1.5 seconds the batching has not occurred, and so the window cannot change until the next multiple of 1.0 second.
In the examples given here the batch period is smaller than the duration of the window. If the batch period is larger than the duration of the window then some items can never enter the window, if they would have already expired by the time the next batch arrives in the window.
Batched size-based windows behave similarly to batched time-based windows, except that the batch criteria is waiting for a number of items to arrive. In that case, items always arrive in the window as a multiple of the batch size.
Batched windows produce multiple items at one time. A single group of items flowing between queries together is called a lot. A lot can contain one item or several items. A batched window is one way of producing a lot that contains several items.