Filtering items in projections
In a stream query, as part of an aggregate projection definition, you can optionally specify a having clause to filter the items produced by the projection. The having clause specifies an arbitrary EPL expression and can filter items based on any criteria available to EPL. The syntax of the having clause is as follows:
having booleanExpr
Replace booleanExpr with a Boolean expression. This expression is referred to as the having predicate. The having predicate is evaluated for each lot that arrives. When the having predicate evaluates to false, the projection does not generate output.
Unlike the where clause, the having clause:
* Is part of the projection
* Filters the output of the projection rather than what comes into the projection
* Cannot refer to individual items
* Can refer only to the group key or aggregates
A having clause can only be in an aggregate projection; it cannot be in a simple projection. Each aggregate projection must contain at least one aggregate in a having clause or in the select clause. Values for aggregates, whether in having expressions or select expressions, are always calculated over the same window(s). See Grouping output items.
For example:
from t in all Temperature() within 60.0
   having count() > 10
   select mean(t.value)
This query calculates a rolling average of temperatures over the last minute. The having clause permits the average to be output only when it is a reliable measure: the count() aggregate function ensures that the previous 60 seconds contain enough measurements (more than 10) to compensate for any noise or one-off errors in the readings.
Because the having clause filters the output of the projection rather than its input, the aggregate state is still maintained in the background, and the average can be output the very moment the measurement passes the reliability criterion. In the previous example, this means that as soon as the window contains more than ten items, the average of all values in the last minute is output.
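To make the contrast with the where clause concrete, the following is a minimal sketch of a complete monitor that uses both clauses in one query. The Temperature event definition (with zone and value fields), the monitor name, the 0.0 threshold, and the log message are assumptions made for illustration; they are not part of the original example.

```epl
// Assumed event definition; the original example only shows t.zone and t.value.
event Temperature {
    string zone;
    float value;
}

monitor WhereVersusHaving {
    action onload() {
        // where filters the input: only readings above 0.0 (a hypothetical
        // threshold) reach the aggregates.
        // having filters the output: a mean is produced only while more than
        // 10 such readings are in the 60-second window.
        from t in all Temperature() within 60.0
            where t.value > 0.0
            having count() > 10
            select mean(t.value) as rollingMean {
            log "Reliable rolling mean: " + rollingMean.toString() at INFO;
        }
    }
}
```

Note that the where clause can test t.value on each individual item, whereas the having clause refers only to the count() aggregate.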
Filtering grouped aggregate projections
If you specify the group by clause, the having clause operates separately on each group, just as the select clause operates separately on each group. For example, the following code changes the previous code so that it outputs a reliable rolling average for each zone:
from t in all Temperature() within 60.0
   group by t.zone
   having count() > 10
   select ZoneAverage(t.zone, mean(t.value))
Just as a distinct mean is output for each group (each zone), the criteria in the having expression are applied separately to each group. A rolling average for a zone is output only when count() > 10 is true for that zone.
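The grouped query could be placed in a monitor along the following lines. This is a sketch only: the field layouts of Temperature and ZoneAverage, the monitor name, and the log message are assumptions, since the original shows just the query itself.

```epl
// Assumed event definitions; only the type names and the zone and value
// fields are implied by the query above.
event Temperature {
    string zone;
    float value;
}

event ZoneAverage {
    string zone;
    float average;
}

monitor ReliableZoneAverages {
    action onload() {
        // Each zone has its own count() and mean(), so a quiet zone with
        // few readings stays silent while a busy zone keeps reporting.
        from t in all Temperature() within 60.0
            group by t.zone
            having count() > 10
            select ZoneAverage(t.zone, mean(t.value)) as za {
            log "Zone " + za.zone + " rolling average: " + za.average.toString() at INFO;
        }
    }
}
```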
Performance
The stream network can avoid some calculations in a select clause when the having clause evaluates to false. Because maintaining aggregates can be expensive, this can be a useful optimization. When you know that a condition often evaluates to false, you can obtain better performance by specifying it in a having clause in the stream query rather than testing it in the listener body, as a query like this does:
from t in all Ticks(symbol="APMA") within 60.0 * 10.0
   select MeanStddev(mean(t.value), stddev(t.value)) as avg_sd {
   if (shouldOutput()) {
      send avg_sd to "output";
   }
}
This query computes the rolling average and standard deviation of a stock's tick values over the last ten minutes, and sends them to the "output" channel, where a dashboard or similar consumer can receive them. The output feed can optionally be turned off, which is indicated by the return value of the shouldOutput() action. However, even when the output is turned off, Ticks events still come in and the stream network still calculates the rolling average and standard deviation.
You can rewrite the code such that turning off the output terminates the query and turning on the output restarts the query. This option loses the state of the window and introduces a 10-minute lag before accurate output is available. A better option is to add a having clause so that turning off the output removes the performance penalty without losing state. For example:
from t in all Ticks(symbol="APMA") within 60.0 * 10.0
   having shouldOutput()
   select MeanStddev(mean(t.value), stddev(t.value)) as avg_sd {
   send avg_sd to "output";
}
The mean() and stddev() aggregates continue to accumulate state when shouldOutput() returns false, but they do not fully calculate the rolling average and standard deviation for each incoming item.
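For context, one way the having-based query might sit inside a monitor is sketched below. The event definitions, the SetOutput control event, the outputEnabled flag, and the body of shouldOutput() are all hypothetical; the original does not show how shouldOutput() is implemented. The result event is named MeanStddev here, as in the first query of this section.

```epl
// Assumed event definitions for illustration.
event Ticks {
    string symbol;
    float value;
}

event MeanStddev {
    float average;
    float deviation;
}

// Hypothetical control event used to switch the output feed on or off.
event SetOutput {
    boolean enabled;
}

monitor SwitchableTickStatistics {
    boolean outputEnabled := true;

    // Hypothetical implementation: report the most recently requested setting.
    action shouldOutput() returns boolean {
        return outputEnabled;
    }

    action onload() {
        on all SetOutput() as s {
            outputEnabled := s.enabled;
        }

        // The query keeps running while output is disabled, so the
        // 10-minute window state is preserved across off/on switches.
        from t in all Ticks(symbol="APMA") within 60.0 * 10.0
            having shouldOutput()
            select MeanStddev(mean(t.value), stddev(t.value)) as avg_sd {
            send avg_sd to "output";
        }
    }
}
```

With this arrangement, sending SetOutput(false) suppresses results without terminating the query, and sending SetOutput(true) resumes output without waiting for the window to refill.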