Multi-Level Group Calculations

The example of a Group By clause in Getting Started, Group and Analyze Rows, used a composite group containing two group columns to segment stock data into sets by symbol and then year within symbol.

Once groups are defined, you determine what analysis occurs and is returned using analytic functions in the Select clause. Each group returns one row of data, so the functions you choose must be aggregate analytic functions that return a single value for all rows in the group. See Built-In Analytic Functions: Aggregate and Window for the list of built-in Business Analytics analytic functions that you may use with Group By clauses.

Chamber	COUNT_chamber
senate	100
house	439

SELECT symbol,extract_year(datetime), quarter(datetime), round(avg(low)) AS qtrLow
FROM stocks
GROUP BY symbol,extract_year(datetime), quarter(datetime)
ORDER BY symbol,extract_year(datetime) DESC, quarter(datetime)

Symbol	Extract_year_datetime	Quarter_datetime	QtrLow
AAPL	2011	1	337.0
AAPL	2011	2	335.0
AAPL	2010	1	206.0
AAPL	2010	2	242.0
AAPL	2010	3	253.0
AAPL	2010	4	305.0
AAPL	2009	1	89.0
AAPL	2009	2	126.0
AAPL	2009	3	161.0
AAPL	2009	4	192.0
AAPL	2008	1	130.0
AAPL	2008	2	168.0
...	...	...	...

The columns that appear in the Select clause must either be used for group in the Group By clause or be used in aggregate calculations. It is also quite common to sort by the columns used in grouping.

Symbol	Yr	HighStdDev
AAPL	1984	1.6057616847575154
AAPL	1985	4.628989380498321
AAPL	1986	5.626270136472927
AAPL	1987	14.69298548409742
AAPL	1988	2.4555231784570233
AAPL	1989	4.2820135239165875
AAPL	1990	4.529251236509766
AAPL	1991	7.4271879581244775
AAPL	1992	7.62725912432657
AAPL	1993	13.459208620188244
AAPL	1994	4.371574862654053
AAPL	1995	4.069107853754948
AAPL	1996	3.336661126725468
...	...	...

The examples above partitioned the data and calculated an aggregate value for each partition based only on a single group criterion. However, let's assume you are not only interested in the standard deviation of high prices for each stock and year, but additionally would like to see the overall standard deviation of high prices per stock, and the total standard deviation of high prices over all stocks. You could run three separate GROUP BY queries and combine their results as follows:

SELECT symbol, extract_year(datetime) AS yr, stddev(high) AS highStdDev FROM stocks
GROUP BY symbol, extract_year datetime)
UNION ALL
SELECT symbol, CAST(NULL AS DATE) AS yr, stddev(high) AS highStdDev FROM stocks
GROUP BY symbol
UNION ALL
SELECT CAST(NULL AS STRING) AS symbol, CAST(NULL AS DATE) AS yr, stddev(high) AS
highStdDev FROM stocks ORDER BY symbol NULLS FIRST, yr DESC NULLS FIRST
-- we specify an ordering here to see the aggregated symbol and year columns
in the example output --

MashZone NextGenfully supports the SQL syntax for complex GROUPING SETS, which allows to specify a single set of group expressions in the group-by clause, and to specify multiple sets that the data is partitioned and aggregated upon. Using this syntax, the above query could easily be rewritten to:

SELECT symbol, extract_year(datetime) AS yr, stddev(high) AS highStdDev FROM stocks
GROUP BY GROUPING SETS ( (symbol, extract_year(datetime)), symbol, () )
ORDER BY symbol NULLS FIRST, yr DESC NULLS FIRST
-- we specify an ordering here to see the aggregated symbol and year columns
in the example output --

Note:

Please note, that this query is not only much shorter, but may also be evaluated more efficiently by the query engine.

The above GROUPING SETS clause defines three sets of group expressions (symbol, extract_year(datetime)), symbol, and the empty grouping set (), so that in the result of the query we will find a row for every unique pair of (symbol, extract_year(datetime)) values, concatenated by rows for each unique value of symbol, and a single row representing the empty group. For each such group row, the values of the group expressions that are not considered in the current grouping are filled with NULL, e. g. the yr column is NULL for the second grouping by symbol (see also section Null values in Grouping Set aggregations).

Symbol	Yr	HighStdDev
		95.99022287735269	This row indicates the HighStdDev aggregated over all stocks and all years.
AAPL		66.38352453765187	This row indicates the HighStdDev for symbol AAPL aggregated over all years.
AAPL	2011	7.78515914570644
AAPL	2010	37.77321347891091
AAPL	2009	40.274807396460886
AAPL	2008	33.6190395688354
AAPL	2007	37.83959132285136
AAPL	2006	9.55659436868603
AAPL	2005	15.564417219126476
AAPL	2004	13.782231301956635
AAPL	2003	3.4495406114241156
AAPL	2002	4.4393911367617065
AAPL	2001	2.6334977228792624
AAPL	2000	43.679940128685516
...	...	...

It is very common, to have hierarchical dimensions on which to partition the data. For instance, year and month could be such a dimension hierarchy. If you are interested in the highest price values among all stocks for each dimension level, you could use the GROUPING SETS syntax to specify the following four sets of group expressions:

SELECT extract_year(datetime) AS y, extract_month(datetime) AS m,
max(high) AS maxHigh
FROM stocks
GROUP BY GROUPING SETS ( (extract_year(datetime), extract_month(datetime)),
(extract_year(datetime)),
() )
ORDER BY y NULLS FIRST, m NULLS FIRST
-- we specify an ordering here to see the aggregated symbol and year columns
in the example output --

However, for queries like this, where the grouping set dimensionality is reduced from right to left, Business Analytics supports the SQL keyword ROLLUP , which - given a list of group expressions - rolls-up the list by consecutively removing one expression from the end of the list and calculating the next higher level aggregate, up to the overall aggregate.

SELECT extract_year(datetime) AS y, extract_month(datetime) AS m, max(high) AS maxHigh
FROM stocks
GROUP BY ROLLUP (extract_year(datetime), extract_month(datetime))
ORDER BY y NULLS FIRST, m NULLS FIRST
-- we specify an ordering here to see the aggregated symbol and year columns
in the example output --

Note:

Please note, that the roll-up of the grouping columns is reflected in the result by a NULL value in the corresponding group column (shown as an empty column).

Y	M	MaxHigh
		747.24	This row indicates the MaxHigh aggregated over all years and all months.
1984		128.5	This row indicates the HighStdDev for year 1984 aggregated over all months.
1984	9	128.5
1984	10	127.75
1984	11	127.75
1984	12	124.5
1985		158.75	This row indicates the HighStdDev for year 1985 aggregated over all months.
1985	1	137.63
1985	2	138.25
1985	3	136.38
1985	4	130.38
1985	5	133.25
1985	6	130.63
1985	7	132.75
...	...	...	...

In addition to the ROLLUP keyword, which enumerates a hierarchy of grouping sets,MashZone NextGensupports the SQL keyword CUBE, to enumerate the power set of grouping sets, that is the set of all subsets. As an example, consider the group expressions chamber, state and gender. Then the clause CUBE(party, state, gender) translates into the following grouping sets:

SELECT party, state, gender, count(*) FROM congress
WHERE chamber = 'house' AND state LIKE 'K%' -- only to limit the result size here --
GROUP BY GROUPING SETS ( (party, state, gender),
(party, state), (party, gender), (state, gender)
(party), (state), (gender),
() )

SELECT party, state, gender, count(*) FROM congress
WHERE chamber = 'house' AND state LIKE 'K%' -- only to limit the result size here --
GROUP BY CUBE(party, state, gender)
ORDER BY party NULLS FIRST, state NULLS FIRST, gender NULLS FIRST

Party	State	Gender	COUNT_
			10	Overall number of congressmen representing Kansas or Kentucky
		F	1	Overall number of female congressmen representing Kansas or Kentucky
		M	9	Overall number of male congressmen representing Kansas or Kentucky
	KS		4	Number of congressmen representing Kansas
	KS	F	1	Number of female congressmen representing Kansas
	KS	M	3	Number of male congressmen representing Kansas
	KS		6	Overall number of congressmen representing Kentucky
	KS	M	6	Number of male congressmen representing Kentucky
D			2	Number of democratic congressmen representing Kansas or Kentucky
D		M	2	Number of male democratic congressmen representing Kansas or Kentucky
D	KS		2	Number of democratic congressmen representing Kentucky
D	KS	M	2	Number of male democratic congressmen representing Kentucky
R			8	Number of republican congressmen representing Kansas or Kentucky
R		F	1	Number of female republican congressmen representing Kansas or Kentucky
...	...	...	...	...

With the CUBE and ROLLUP syntax it is also possible, to combine multiple columns into a group expression, that is treated as if it was a single column. For instance, the expression ROLLUP(a, (b,c), d) translates into the grouping sets (a, (b,c), d), (a, (b,c)), (a), (). I.e., c is not rolled-up from (b,c).

Note:

It is a best practice, to use ROLLUP and CUBE only in cases, when all calculated groupings are relevant. If only a subset of the produced groupings is of interest, it is more efficient to use the corresponding GROUPING SETS expression generating only those relevant groupings.

Consequently, it is not possible to tell the two different meanings of NULL apart only from looking at the result row itself. Therefore, a special SQL function GROUPING is provided, the purpose of which is to tell for a single group expression, whether or not this expression has been aggregated in the result (result = 1) or not (result = 0).

SELECT symbol, GROUPING(symbol) AS grp_symbol, extract_year(datetime) AS yr,
GROUPING(extract_year(datetime)) AS grp_year,
stddev(high) AS highStdDev
FROM stocks
GROUP BY ROLLUP (symbol, extract_year(datetime))
ORDER BY symbol NULLS FIRST, yr DESC NULLS FIRST

Symbol	Grp_symbol	Yr	Grp_year	HighStdDev
	1		1	95.99022287735269	NULL values here indicate that symbol and yr have been aggregated.
AAPL	0		1	66.38352453765187	NULL value in column yr here indicates that yr has been aggregated.
AAPL	0	2011	0	7.78515914570644
AAPL	0	2010	0	37.77321347891091
AAPL	0	2009	0	40.274807396460886
AAPL	0	2008	0	33.6190395688354
...	...	...	...	...	...

Additionally, this function is overloaded to also take a number of group expressions as parameters. In this case, the result will be a numerical id for each resulting grouping. In order to make this id unique among all groupings, all group expressions should be given as parameters to the function.

SELECT symbol, extract_year(datetime) AS yr,
GROUPING(symbol, extract_year(datetime)) AS grp,
stddev(high) AS highStdDev
FROM stocks
WHERE extract_year(datetime) BETWEEN 2009 AND 2011 -- only to limit the result set here --
GROUP BY ROLLUP (symbol, extract_year(datetime))
ORDER BY symbol NULLS FIRST, yr DESC NULLS FIRST

Symbol	Yr	Grp	HighStdDev
		3	151.89615745136808	There is only one row in this grouping representing the overall aggregate.
AAPL		1	82.34565115212739	This row belongs to the grouping 1 which relates to the grouping set (symbol).
AAPL	2011	0	7.785159145706435	This row belongs to the grouping 0 which relates to the grouping set (symbol, extract_year(datetime)).
AAPL	2010	0	37.773213478910954
AAPL	2009	0	40.274807396460936
AAPL		1	41.95096980230709	This row also belongs to the grouping 1 which relates to the grouping set (symbol).
...	...	...	...	...

The GROUPING(col1, … colN) result is internally computed by combining the GROUPING results for each single column into a bit set which is finally interpreted as a LONG value. Example: Assume the following GROUPING values for single group columns a, b and c:

To make things clear, the following query shows all the distinct groupings resulting from a group by CUBE over three columns. Note how the single GROUPING values for each column are concatenated into a bit set representing the final GROUPING value.

SELECT DISTINCT
GROUPING(symbol) AS grpSmbl, GROUPING(extract_year(datetime)) AS grpYr,
GROUPING(extract_month(datetime)) AS grpMnth,
CAST(GROUPING(symbol) AS STRING) || CAST(GROUPING(extract_year(datetime)) AS STRING) ||
CAST(GROUPING(extract_month(datetime)) AS STRING) AS bits,
GROUPING(symbol, extract_year(datetime), extract_month(datetime)) AS grp
FROM stocks
GROUP BY CUBE (symbol, extract_year(datetime), extract_month(datetime))
ORDER BY grp ASC

grpSymbol	grpYear	grpMonth	bits	grp
0	0	0	000	0
0	0	1	001	1
0	1	0	010	2
0	1	1	011	3
1	0	0	100	4
1	0	1	101	5
1	1	0	110	6
1	1	1	111	7

Note:

Please note that each unique GROUPING id only indicates the grouping set, which produced the groups in this grouping. It does not uniquely identify a group within this grouping!

The group by clause may not only take simple group expressions or grouping sets, but also any combination of them, e. g., GROUP BY a, ROLLUP(b,c), GROUPING SETS((d,e), (f)). The resulting grouping sets are then defined by the Cartesian product of the involved grouping sets, i.e. the example clause translates into the following grouping sets:

Likewise, the GROUPING SETS clause may take nested ROLLUP , CUBE and GROUPING SET clauses, e. g. GROUP BY GROUPING SETS (a, ROLLUP(b,c), GROUPING SETS((d,e), (f)) ). In this case, the nested grouping sets are "flattened" and merged into the surrounding grouping set, i.e., the example clause translates into the following grouping sets: