9.4. Linear Regression Aggregate Functions
Linear regression functions are useful for trend line continuation. The trend or regression line is usually a pattern followed by a set of values. Linear regression is useful to predict future values. To continue the regression line, you need to know the slope and the point of intersection with the y-axis. As set of linear functions can be used for calculating these values.
In the function syntax, y is interpreted as an x-dependent variable.
The linear regression aggregate functions take a pair of arguments, the dependent variable expression (y) and the independent variable expression (x), which are both numeric value expressions.
Any row in which either argument evaluates to NULL is removed from the rows that qualify.
If there are no rows that qualify, then the result of REGR_COUNT is 0 (zero), and the other linear regression aggregate functions result in NULL.
9.4.1. REGR_AVGX()
Average of the independent variable of the regression line
Result typeDOUBLE PRECISION
Syntax
REGR_AVGX ( <y>, <x> )
REGR_AVGX Function Parameters| Parameter | Description |
|---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_AVGX calculates the average of the independent variable (x) of the regression line.
The function REGR_AVGX(<y>, <x>) is equivalent to
SUM(<exprX>) / REGR_COUNT(<y>, <x>)<exprX> :==CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END
See alsoSection 9.4.2, “REGR_AVGY()”, Section 9.4.3, “REGR_COUNT()”, Section 9.2.6, “SUM()”
9.4.2. REGR_AVGY()
Average of the dependent variable of the regression line
Result typeDOUBLE PRECISION
Syntax
REGR_AVGY ( <y>, <x> )
REGR_AVGY Function Parameters| Parameter | Description |
|---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_AVGY calculates the average of the dependent variable (y) of the regression line.
The function REGR_AVGY(<y>, <x>) is equivalent to
SUM(<exprY>) / REGR_COUNT(<y>, <x>)<exprY> :==CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <y> END
See alsoSection 9.4.1, “REGR_AVGX()”, Section 9.4.3, “REGR_COUNT()”, Section 9.2.6, “SUM()”
9.4.3. REGR_COUNT()
Number of non-empty pairs of the regression line
Result typeDOUBLE PRECISION
Syntax
REGR_COUNT ( <y>, <x> )
REGR_COUNT Function Parameters| Parameter | Description |
|---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_COUNT counts the number of non-empty pairs of the regression line.
The function REGR_COUNT(<y>, <x>) is equivalent to
COUNT(*) FILTER (WHERE <x> IS NOT NULL AND <y> IS NOT NULL)See alsoSection 9.2.2, “COUNT()”
9.4.4. REGR_INTERCEPT()
Point of intersection of the regression line with the y-axis
Result typeDOUBLE PRECISION
Syntax
REGR_INTERCEPT ( <y>, <x> )
REGR_INTERCEPT Function Parameters| Parameter | Description |
|---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_INTERCEPT calculates the point of intersection of the regression line with the y-axis.
The function REGR_INTERCEPT(<y>, <x>) is equivalent to
REGR_AVGY(<y>, <x>) - REGR_SLOPE(<y>, <x>) * REGR_AVGX(<y>, <x>)9.4.4.1. REGR_INTERCEPT Examples
Forecasting sales volume
with recursive years (byyear) as (select 1991from rdb$databaseunion allselect byyear + 1from yearswhere byyear < 2020),s as (selectextract(year from order_date) as byyear,sum(total_value) as total_valuefrom salesgroup by 1),regr as (selectregr_intercept(total_value, byyear) as intercept,regr_slope(total_value, byyear) as slopefrom s)selectyears.byyear as byyear,intercept + (slope * years.byyear) as total_valuefrom yearscross join regr
BYYEAR TOTAL_VALUE------ ------------1991 118377.351992 414557.621993 710737.891994 1006918.161995 1303098.431996 1599278.691997 1895458.961998 2191639.231999 2487819.502000 2783999.77...
See alsoSection 9.4.1, “REGR_AVGX()”, Section 9.4.2, “REGR_AVGY()”, Section 9.4.6, “REGR_SLOPE()”
9.4.5. REGR_R2()
Coefficient of determination of the regression line
Result typeDOUBLE PRECISION
Syntax
REGR_R2 ( <y>, <x> )
REGR_R2 Function Parameters| Parameter | Description |
|---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The REGR_R2 function calculates the coefficient of determination, or R-squared, of the regression line.
The function REGR_R2(<y>, <x>) is equivalent to
POWER(CORR(<y>, <x>), 2)See alsoSection 9.3.1, “CORR()”, POWER
9.4.6. REGR_SLOPE()
Slope of the regression line
Result typeDOUBLE PRECISION
Syntax
REGR_SLOPE ( <y>, <x> )
REGR_SLOPE Function Parameters| Parameter | Description |
|---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_SLOPE calculates the slope of the regression line.
The function REGR_SLOPE(<y>, <x>) is equivalent to
COVAR_POP(<y>, <x>) / VAR_POP(<exprX>)<exprX> :==CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END
See alsoSection 9.3.2, “COVAR_POP()”, Section 9.3.6, “VAR_POP()”
9.4.7. REGR_SXX()
Sum of squares of the independent variable
Result typeDOUBLE PRECISION
Syntax
REGR_SXX ( <y>, <x> )
REGR_SXX Function Parameters| Parameter | Description |
|---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_SXX calculates the sum of squares of the independent expression variable (x).
The function REGR_SXX(<y>, <x>) is equivalent to
REGR_COUNT(<y>, <x>) * VAR_POP(<exprX>)<exprX> :==CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END
See alsoSection 9.4.3, “REGR_COUNT()”, Section 9.3.6, “VAR_POP()”
9.4.8. REGR_SXY()
Sum of products of the independent variable and the dependent variable
Result typeDOUBLE PRECISION
Syntax
REGR_SXY ( <y>, <x> )
REGR_SXY Function Parameters| Parameter | Description |
|---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_SXY calculates the sum of products of independent variable expression (x) times dependent variable expression (y).
The function REGR_SXY(<y>, <x>) is equivalent to
REGR_COUNT(<y>, <x>) * COVAR_POP(<y>, <x>)See alsoSection 9.3.2, “COVAR_POP()”, Section 9.4.3, “REGR_COUNT()”
9.4.9. REGR_SYY()
Sum of squares of the dependent variable
Result typeDOUBLE PRECISION
Syntax
REGR_SYY ( <y>, <x> )
REGR_SYY Function Parameters| Parameter | Description |
|---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_SYY calculates the sum of squares of the dependent variable (y).
The function REGR_SYY(<y>, <x>) is equivalent to
REGR_COUNT(<y>, <x>) * VAR_POP(<exprY>)<exprY> :==CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <y> END
See alsoSection 9.4.3, “REGR_COUNT()”, Section 9.3.6, “VAR_POP()”