Filtering items before projection

In a stream query, after the window definition and any join clause, you can optionally specify a where clause to filter the items produced by the window or join. The where clause specifies an arbitrary EPL expression and can filter items based on any criteria available to EPL. The syntax of the where clause is as follows:

where booleanExpr

Replace booleanExpr with a Boolean expression. This expression is referred to as the where predicate. Only those items for which the where predicate evaluates to true are passed by the filter. For example:

from t in ticks retain 100

where t.price*t.volume>threshold

select mean(t.price)

where t.price*t.volume>threshold

select mean(t.price)

To calculate the mean price, this query operates on only the items whose value (t.price * t.volume) is greater than the specified threshold.

Performance

The filtering performed by the where clause happens after any window, with or join operations. In some cases, it is possible to rephrase the query to improve operational efficiency. For example:

from t in ticks within 60.0

where t.price*t.volume>threshold

select mean(t.price)

where t.price*t.volume>threshold

select mean(t.price)

This query maintains a window of Tick items. Now consider this revision:

from p in

(from t in ticks where t.price*t.volume>threshold select t.price)

within 60.0

select mean(p)

(from t in ticks where t.price*t.volume>threshold select t.price)

within 60.0

select mean(p)

In the first example, the within window contains all Tick events received in the last minute. In the second example, the where clause is before the window definition so the filtering happens before items enter the window. Consequently, the window contains only float items for which the where predicate is true. These types of optimization are of particular benefit in queries that include both a where clause and a join operation (equi-join or cross-join). However, care must be taken when refactoring queries, particularly when size-based windows are involved. For example, consider the two queries below:

from t in ticks retain 100 where t.price*t.volume>threshold

select mean(t.price)

from p in

(from t in ticks where t.price*t.volume>threshold select t.price)

retain 100 select mean(p)

select mean(t.price)

from p in

(from t in ticks where t.price*t.volume>threshold select t.price)

retain 100 select mean(p)

These queries are not equivalent. The first query generates the mean of a subset of the last 100 items. The where predicate evaluated to true for only the items in the subset. The second query generates the mean of the last 100 items for which the where predicate evaluated to true.