Analytic functions most commonly perform calculations using sets of rows within a dataset. This may be the entire dataset or specific sets of rows defined as groups, partitions or windows.
Aggregation analytic functions: use all rows in the dataset, group, partition or window to perform a calculation and return a single value. For example, sum adds the values of all rows in the current scope. Aggregate analytic functions can be be used in the SELECT or HAVING clauses of RAQL queries:
To return a single value for each group defined in a GROUP BY clause.
To return a single value for each partition or each window defined in an OVER clause. This single value is added to each row in the partition or window. You can also use aggregate analytic functions to return running calculations for partitions or windows defined in an OVER clause.
To return a single value for the entire dataset if no group definition or partition definition is specified.
Window analytic functions: use some of the rows in a partition or window to perform a calculation and return a value for each row. Typically, this uses rows that are relative to the current row, such as FIRST_VALUE or ROW_NUMBER.
You must include an OVER clause in queries that use window analytic functions. This defines the partitions or windows used by the function.
As with plain functions, RAQL provides analytic functions defined in the SQL standard as well as RAQL-specific extensions.
The RAQL-specific extensions are:
Aggregate |
Window |
Function |
Description |
|
correlation(Number col-or-expr, Number col-or-expr) |
Deprecated. See Update analytic functions for 3.8 for alternatives. |
|
|
covar_samp(Number col-or-expr, Number col-or-expr) |
Returns the sample covariance (an estimate) of the two specified columns. See covariance for more information. |
|
|
analytics.discretize(Number col-or-expr, int binCount) |
This function segments a continuous range of values for a column or expression into discrete bins based on the number of bins specified. If the value:
So 0 <= return value < 1, indicates the first bin, 1 <= return value < 2, indicates the second bin, and so on. Note that this function does not support windows. It can only be applied to the entire partition. |
|
|
gmean(Number col-or-expr) |
Returns the geometric mean of the values for this column or expression within the partition or the current window in this partition. The geometric mean is typically used to define a mean when more than one property is involved, especially if the scale for the properties is different. |
|
|
analytics.kmeans_clusters(Number col-or-expr1[,Number col-or-expr2,...Number col-or-exprN]; Integer k, Integer iterations, String measure) |
Returns the optimum center point for the number of clusters specified as k that group rows within a minimum distance for each cluster. Inclusion in a cluster for each row is defined by the features specified in the list of column parameters (before the semi-colon). The distance from the cluster center point is measured by the formula specified in measure.
|
|
|
analytics.kmeans_observations(Number col-or-expr1[,Number col-or-expr2,...Number col-or-exprN]; Integer k, Integer iterations, String measure) |
Returns the ID of the cluster that each row belongs to for the number of clusters specified as k. Inclusion in a cluster for each row is defined by the features specified in the list of column parameters (before the semi-colon). The distance from the cluster center point is measured by the formula specified in measure.
|
|
|
analytics.ordinal(String col-or-expr) |
Returns an index number for each unique string value in the specified column or expression. Typically, assigning an ordinal number allows computations to work with string columns. |
|
|
skew(Number col-or-expr) |
Returns a measure of the asymmetry from a normal probability distribution of the values for this column or expression within the current partition or the current window in this partition. This uses the adjusted Fisher-Pearson standardized moment coefficient. Positive values tend to indicate that more values are found left of the mean, while negative values imply the reverse. This returns 0 if there are not at least three rows in the dataset. This function can also return NaN if the caluclation is undefined or not a number. |