VARIANCE Function

variance([DISTINCT] expr)

Description

The VARIANCE function is an alias of VAR_SAMP and computes the sample variance of a set of numeric data. Variance is a statistical measure of data dispersion, reflecting the degree of variability in the data.

Parameters

  • expr: A numeric type, which can be TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, or DECIMAL.
  • DISTINCT (optional): specifies that the variance should be computed on the deduplicated set. If this parameter is not set, the variance is computed on the set including duplicates.

Return Type

  • Returns a DOUBLE value representing the computed sample variance.

Notes

  • If the input contains NULL values, they are excluded from computation.
  • The VARIANCE function is fully identical to VAR_SAMP and the two can be used interchangeably.

Examples

  1. Compute the sample variance including duplicates
SELECT variance(col) FROM VALUES (1), (2), (3), (3), (NULL) AS tab(col);
+--------------------+
| `variance`(col)    |
+--------------------+
| 0.9166666666666666 |
+--------------------+
  1. Compute the sample variance on deduplicated values
SELECT variance(DISTINCT col) FROM VALUES (1), (2), (3), (3), (NULL) AS tab(col);
+--------------------------+
| `variance`(DISTINCT col) |
+--------------------------+
| 1.0                      |
+--------------------------+
  1. Use FILTER clause to conditionally compute variance
SELECT variance(col) FILTER (WHERE col > 1) FROM VALUES (1), (2), (3), (4) AS tab(col);
+--------------------------------------+
| variance(col) FILTER (WHERE col > 1) |
+--------------------------------------+
| 1.0                                  |
+--------------------------------------+
  1. Compute variance by group
SELECT c, variance(col)
FROM VALUES ('a', 1), ('a', 2), ('a', 3), ('b', 10), ('b', 20) AS tab(c, col)
GROUP BY c;
+---+--------------------+
| c | variance(col)      |
+---+--------------------+
| a | 1.0                |
| b | 50.0               |
+---+--------------------+