Group and Analyze Rows with Row Detail
The other RAQL query clause that you can use to perform analysis is the Over clause. Like Group By, the Over clause segments the rows of the dataset into different groups, known as partitions. The primary differences between Group By and Over are:
All rows of the dataset that meet the conditions of the Where clause, if any, are returned from a query with an Over clause, rather than just one row per group.
You can also define
windows within a partition. A window consists of the current row within a partition and the number of preceding and following rows you define.
Analytic functions are applied to either the full partition or to each window within the partition. The results of the analysis is added as a new column to either each row in the partition or to the current row for each window.
The results of analytic functions can also be running calculations, such as running totals, including the current row and all preceding rows.
Let’s look at a simple partition example. The following mashup loads the stock dataset from an In-Memory Store (stored earlier in Getting Started).
The Select clause selects each column that will be used in the calculation or in the definition of the partition (symbol, open and close).
Lastly, Select uses the built-in corr analytic function to determine if there is a linear correlation between opening and closing prices for each symbol. The Over clause defines the partitions that this analytic function is applied to.
The Where clause filters the rows that are included in each partition to specific stock symbols based on a name pattern. With this dataset, this limits the results to the symbols DISH and NFLX.
The results of this query, shown below, include each row for the selected symbols and a new column, coefficient, that includes the result of this analysis function:
RAQL has a set of built-in analytic functions that you can use in Over clauses, or you can write and add your own analytic functions. See
Built-In
RAQL
Functions for more information.
The complete EMML for this mashup is:
<mashup name='stockPriceCorrelation'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:schemaLocation='http://www.openmashup.org/schemas/v1.0/EMML/../schemas/EMMLPrestoSpec.xsd'
xmlns='http://www.openmashup.org/schemas/v1.0/EMML'
xmlns:macro='http://www.openmashup.org/schemas/v1.0/EMMLMacro'
xmlns:presto='http://www.jackbe.com/v1.0/EMMLPrestoExtensions'>
<output name='result' type='document' />
<variable name='stocks' type='document' stream='true' />
<loadfrom cache='stocks2011' variable='stocks'/>
<raql outputvariable='result'>
select symbol, open, close,
corr(open,close) over (partition by symbol) as coefficient
from stocks
where symbol like 'D%' or symbol like 'N%'
</raql>
</mashup>