MashZone NextGen Analytics : RAQL Queries : Multi-Level Group Calculations
Multi-Level Group Calculations
The example of a Group By clause in Getting Started, Group and Analyze Rows, used a composite group containing two group columns to segment stock data into sets by symbol and then year within symbol.
Once groups are defined, you determine what analysis occurs and is returned using analytic functions in the Select clause. Each group returns one row of data, so the functions you choose must be aggregate analytic functions that return a single value for all rows in the group. See Built-In Analytic Functions: Aggregate and Window for the list of built-in Business Analytics analytic functions that you may use with Group By clauses.
For more examples, see:
*Single Level Group
*Three Group Levels
*Groups Using an Aggregate Analytic Function
* Aggregate on Multiple Groups with ROLLUP, CUBE and GROUPING SETS
* Special Grouping Sets: ROLLUP and CUBE
*Using composite group criteria
* Null values in Grouping Set aggregations
*Combining and Merging Grouping Sets
Single Level Group
You can use a single column group such as this example which counts the number of legislators in each house of the US Congress:
SELECT chamber, COUNT(chamber) FROM congress GROUP BY chamber
Chamber
COUNT_chamber
senate
100
house
439
Three Group Levels
With each level of group, you add columns to the expression. This example groups stock data by symbol, year and quarter within year to derive the average low price for each quarter:
SELECT symbol,extract_year(datetime), quarter(datetime), round(avg(low)) AS qtrLow
FROM stocks
GROUP BY symbol,extract_year(datetime), quarter(datetime)
ORDER BY symbol,extract_year(datetime) DESC, quarter(datetime)
Symbol
Extract_year_datetime
Quarter_datetime
QtrLow
AAPL
2011
1
337.0
AAPL
2011
2
335.0
AAPL
2010
1
206.0
AAPL
2010
2
242.0
AAPL
2010
3
253.0
AAPL
2010
4
305.0
AAPL
2009
1
89.0
AAPL
2009
2
126.0
AAPL
2009
3
161.0
AAPL
2009
4
192.0
AAPL
2008
1
130.0
AAPL
2008
2
168.0
...
...
...
...
The columns that appear in the Select clause must either be used for group in the Group By clause or be used in aggregate calculations. It is also quite common to sort by the columns used in grouping.
Groups Using an Aggregate Analytic Function
This example calculates the standard deviation, using the stddev() aggregate analytic function, for the high prices of each stock and year:
SELECT symbol, extract_year(datetime) AS yr, stddev(high) AS highStdDev
FROM stocks
GROUP BY symbol, extract_year(datetime)
ORDER BY symbol, yr
Symbol
Yr
HighStdDev
AAPL
1984
1.6057616847575154
AAPL
1985
4.628989380498321
AAPL
1986
5.626270136472927
AAPL
1987
14.69298548409742
AAPL
1988
2.4555231784570233
AAPL
1989
4.2820135239165875
AAPL
1990
4.529251236509766
AAPL
1991
7.4271879581244775
AAPL
1992
7.62725912432657
AAPL
1993
13.459208620188244
AAPL
1994
4.371574862654053
AAPL
1995
4.069107853754948
AAPL
1996
3.336661126725468
...
...
...
The columns that appear in the Select clause must either be used for group in the Group By clause or be used in aggregate calculations. It is also quite common to sort by the columns used in grouping.
Aggregate on Multiple Groups with ROLLUP, CUBE and GROUPING SETS
The examples above partitioned the data and calculated an aggregate value for each partition based only on a single group criterion. However, let's assume you are not only interested in the standard deviation of high prices for each stock and year, but additionally would like to see the overall standard deviation of high prices per stock, and the total standard deviation of high prices over all stocks. You could run three separate GROUP BY queries and combine their results as follows:
SELECT symbol, extract_year(datetime) AS yr, stddev(high) AS highStdDev FROM stocks
GROUP BY symbol, extract_year datetime)
UNION ALL
SELECT symbol, CAST(NULL AS DATE) AS yr, stddev(high) AS highStdDev FROM stocks
GROUP BY symbol
UNION ALL
SELECT CAST(NULL AS STRING) AS symbol, CAST(NULL AS DATE) AS yr, stddev(high) AS
highStdDev FROM stocks ORDER BY symbol NULLS FIRST, yr DESC NULLS FIRST
-- we specify an ordering here to see the aggregated symbol and year columns
in the example output --
MashZone NextGenfully supports the SQL syntax for complex GROUPING SETS, which allows to specify a single set of group expressions in the group-by clause, and to specify multiple sets that the data is partitioned and aggregated upon. Using this syntax, the above query could easily be rewritten to:
SELECT symbol, extract_year(datetime) AS yr, stddev(high) AS highStdDev FROM stocks
GROUP BY GROUPING SETS ( (symbol, extract_year(datetime)), symbol, () )
ORDER BY symbol NULLS FIRST, yr DESC NULLS FIRST
-- we specify an ordering here to see the aggregated symbol and year columns
in the example output --
Note:  
Please note, that this query is not only much shorter, but may also be evaluated more efficiently by the query engine.
The above GROUPING SETS clause defines three sets of group expressions (symbol, extract_year(datetime)), symbol, and the empty grouping set (), so that in the result of the query we will find a row for every unique pair of (symbol, extract_year(datetime)) values, concatenated by rows for each unique value of symbol, and a single row representing the empty group. For each such group row, the values of the group expressions that are not considered in the current grouping are filled with NULL, e. g. the yr column is NULL for the second grouping by symbol (see also section Null values in Grouping Set aggregations).
Symbol
Yr
HighStdDev
95.99022287735269
This row indicates the HighStdDev aggregated over all stocks and all years.
AAPL
66.38352453765187
This row indicates the HighStdDev for symbol AAPL aggregated over all years.
AAPL
2011
7.78515914570644
AAPL
2010
37.77321347891091
AAPL
2009
40.274807396460886
AAPL
2008
33.6190395688354
AAPL
2007
37.83959132285136
AAPL
2006
9.55659436868603
AAPL
2005
15.564417219126476
AAPL
2004
13.782231301956635
AAPL
2003
3.4495406114241156
AAPL
2002
4.4393911367617065
AAPL
2001
2.6334977228792624
AAPL
2000
43.679940128685516
...
...
...
Special Grouping Sets: ROLLUP and CUBE
It is very common, to have hierarchical dimensions on which to partition the data. For instance, year and month could be such a dimension hierarchy. If you are interested in the highest price values among all stocks for each dimension level, you could use the GROUPING SETS syntax to specify the following four sets of group expressions:
SELECT extract_year(datetime) AS y, extract_month(datetime) AS m,
max(high) AS maxHigh
FROM stocks
GROUP BY GROUPING SETS ( (extract_year(datetime), extract_month(datetime)),
(extract_year(datetime)),
() )
ORDER BY y NULLS FIRST, m NULLS FIRST
-- we specify an ordering here to see the aggregated symbol and year columns
in the example output --
However, for queries like this, where the grouping set dimensionality is reduced from right to left, Business Analytics supports the SQL keyword ROLLUP , which - given a list of group expressions - rolls-up the list by consecutively removing one expression from the end of the list and calculating the next higher level aggregate, up to the overall aggregate.
Hence, the above query is equivalent to the following query using the ROLLUP syntax instead:
SELECT extract_year(datetime) AS y, extract_month(datetime) AS m, max(high) AS maxHigh
FROM stocks
GROUP BY ROLLUP (extract_year(datetime), extract_month(datetime))
ORDER BY y NULLS FIRST, m NULLS FIRST
-- we specify an ordering here to see the aggregated symbol and year columns
in the example output --
Note:  
Please note, that the roll-up of the grouping columns is reflected in the result by a NULL value in the corresponding group column (shown as an empty column).
Y
M
MaxHigh
747.24
This row indicates the MaxHigh aggregated over all years and all months.
1984
128.5
This row indicates the HighStdDev for year 1984 aggregated over all months.
1984
9
128.5
1984
10
127.75
1984
11
127.75
1984
12
124.5
1985
158.75
This row indicates the HighStdDev for year 1985 aggregated over all months.
1985
1
137.63
1985
2
138.25
1985
3
136.38
1985
4
130.38
1985
5
133.25
1985
6
130.63
1985
7
132.75
...
...
...
...
In addition to the ROLLUP keyword, which enumerates a hierarchy of grouping sets,MashZone NextGensupports the SQL keyword CUBE, to enumerate the power set of grouping sets, that is the set of all subsets. As an example, consider the group expressions chamber, state and gender. Then the clause CUBE(party, state, gender) translates into the following grouping sets:
(party, state, gender)
(party, state)
(party, gender)
(state, gender)
(party)
(state)
(gender)
()
In comparison, these would be the grouping sets produced by ROLLUP(party, state, gender) :
(party, state, gender)
(party, state)
(party)
()
That is the query:
SELECT party, state, gender, count(*) FROM congress
WHERE chamber = 'house' AND state LIKE 'K%' -- only to limit the result size here --
GROUP BY GROUPING SETS ( (party, state, gender),
(party, state), (party, gender), (state, gender)
(party), (state), (gender),
() )
It yields the exact same results as the much shorter query:
SELECT party, state, gender, count(*) FROM congress
WHERE chamber = 'house' AND state LIKE 'K%' -- only to limit the result size here --
GROUP BY CUBE(party, state, gender)
ORDER BY party NULLS FIRST, state NULLS FIRST, gender NULLS FIRST
Party
State
Gender
COUNT_
10
Overall number of congressmen representing Kansas or Kentucky
F
1
Overall number of female congressmen representing Kansas or Kentucky
M
9
Overall number of male congressmen representing Kansas or Kentucky
KS
4
Number of congressmen representing Kansas
KS
F
1
Number of female congressmen representing Kansas
KS
M
3
Number of male congressmen representing Kansas
KS
6
Overall number of congressmen representing Kentucky
KS
M
6
Number of male congressmen representing Kentucky
D
2
Number of democratic congressmen representing Kansas or Kentucky
D
M
2
Number of male democratic congressmen representing Kansas or Kentucky
D
KS
2
Number of democratic congressmen representing Kentucky
D
KS
M
2
Number of male democratic congressmen representing Kentucky
R
8
Number of republican congressmen representing Kansas or Kentucky
R
F
1
Number of female republican congressmen representing Kansas or Kentucky
...
...
...
...
...
Using composite group criteria
With the CUBE and ROLLUP syntax it is also possible, to combine multiple columns into a group expression, that is treated as if it was a single column. For instance, the expression ROLLUP(a, (b,c), d) translates into the grouping sets (a, (b,c), d), (a, (b,c)), (a), (). I.e., c is not rolled-up from (b,c).
Note:  
It is a best practice, to use ROLLUP and CUBE only in cases, when all calculated groupings are relevant. If only a subset of the produced groupings is of interest, it is more efficient to use the corresponding GROUPING SETS expression generating only those relevant groupings.
Null values in Grouping Set aggregations
In the result of a grouping query over multiple grouping sets, NULL values in one of the group columns may have different meanings:
1. If the group column contains NULL values in its source table, all those NULL values are treated equally and grouped into a separate NULL group.
2. If the group column is aggregated, i. e., it is omitted in the grouping set for this grouping’s row, then the group column is also NULL.
Consequently, it is not possible to tell the two different meanings of NULL apart only from looking at the result row itself. Therefore, a special SQL function GROUPING is provided, the purpose of which is to tell for a single group expression, whether or not this expression has been aggregated in the result (result = 1) or not (result = 0).
SELECT symbol, GROUPING(symbol) AS grp_symbol, extract_year(datetime) AS yr,
GROUPING(extract_year(datetime)) AS grp_year,
stddev(high) AS highStdDev
FROM stocks
GROUP BY ROLLUP (symbol, extract_year(datetime))
ORDER BY symbol NULLS FIRST, yr DESC NULLS FIRST
Symbol
Grp_symbol
Yr
Grp_year
HighStdDev
1
1
95.99022287735269
NULL values here indicate that symbol and yr have been aggregated.
AAPL
0
1
66.38352453765187
NULL value in column yr here indicates that yr has been aggregated.
AAPL
0
2011
0
7.78515914570644
AAPL
0
2010
0
37.77321347891091
AAPL
0
2009
0
40.274807396460886
AAPL
0
2008
0
33.6190395688354
...
...
...
...
...
...
Additionally, this function is overloaded to also take a number of group expressions as parameters. In this case, the result will be a numerical id for each resulting grouping. In order to make this id unique among all groupings, all group expressions should be given as parameters to the function.
SELECT symbol, extract_year(datetime) AS yr,
GROUPING(symbol, extract_year(datetime)) AS grp,
stddev(high) AS highStdDev
FROM stocks
WHERE extract_year(datetime) BETWEEN 2009 AND 2011 -- only to limit the result set here --
GROUP BY ROLLUP (symbol, extract_year(datetime))
ORDER BY symbol NULLS FIRST, yr DESC NULLS FIRST
Symbol
Yr
Grp
HighStdDev
3
151.89615745136808
There is only one row in this grouping representing the overall aggregate.
AAPL
1
82.34565115212739
This row belongs to the grouping 1 which relates to the grouping set (symbol).
AAPL
2011
0
7.785159145706435
This row belongs to the grouping 0 which relates to the grouping set (symbol, extract_year(datetime)).
AAPL
2010
0
37.773213478910954
AAPL
2009
0
40.274807396460936
AAPL
1
41.95096980230709
This row also belongs to the grouping 1 which relates to the grouping set (symbol).
...
...
...
...
...
The GROUPING(col1, … colN) result is internally computed by combining the GROUPING results for each single column into a bit set which is finally interpreted as a LONG value. Example: Assume the following GROUPING values for single group columns a, b and c:
GROUPING(a) = 1, GROUPING(b) = 0 and GROUPING(c) = 1
Then GROUPING(a,b,c) yields the bit set 101 which is interpreted as the LONG value 5.
To make things clear, the following query shows all the distinct groupings resulting from a group by CUBE over three columns. Note how the single GROUPING values for each column are concatenated into a bit set representing the final GROUPING value.
SELECT DISTINCT
GROUPING(symbol) AS grpSmbl, GROUPING(extract_year(datetime)) AS grpYr,
GROUPING(extract_month(datetime)) AS grpMnth,
CAST(GROUPING(symbol) AS STRING) || CAST(GROUPING(extract_year(datetime)) AS STRING) ||
CAST(GROUPING(extract_month(datetime)) AS STRING) AS bits,
GROUPING(symbol, extract_year(datetime), extract_month(datetime)) AS grp
FROM stocks
GROUP BY CUBE (symbol, extract_year(datetime), extract_month(datetime))
ORDER BY grp ASC
grpSymbol
grpYear
grpMonth
bits
grp
0
0
0
000
0
0
0
1
001
1
0
1
0
010
2
0
1
1
011
3
1
0
0
100
4
1
0
1
101
5
1
1
0
110
6
1
1
1
111
7
Note:  
Please note that each unique GROUPING id only indicates the grouping set, which produced the groups in this grouping. It does not uniquely identify a group within this grouping!
Combining and Merging Grouping Sets
The group by clause may not only take simple group expressions or grouping sets, but also any combination of them, e. g., GROUP BY a, ROLLUP(b,c), GROUPING SETS((d,e), (f)). The resulting grouping sets are then defined by the Cartesian product of the involved grouping sets, i.e. the example clause translates into the following grouping sets:
This is the result of the Cartesian product of {(a)} x {(b, c), (b), ()} x {(d, e), (f)}
(a, b, c, d, e)
(a, b, c, f)
(a, b, d, e)
(a, b, f)
(a, d, e)
(a, f)
Likewise, the GROUPING SETS clause may take nested ROLLUP , CUBE and GROUPING SET clauses, e. g. GROUP BY GROUPING SETS (a, ROLLUP(b,c), GROUPING SETS((d,e), (f)) ). In this case, the nested grouping sets are "flattened" and merged into the surrounding grouping set, i.e., the example clause translates into the following grouping sets:
This is the result of the grouping sets (a), followed by all grouping sets (b,c), (b), () resulting from the rollup clause, followed by the grouping sets (d,e) and (f).
(a)
(b, c)
(b)
()
(d,e)
(f)
Copyright © 2013-2016 Software AG, Darmstadt, Germany.

Product LogoContact Support   |   Community   |   Feedback