COVAR_POP Function

covar_pop(y, x)

Description

The COVAR_POP function computes the population covariance between two numeric columns. Covariance measures the degree to which two variables change together, and is a statistical measure of variable correlation.

Parameters

  • y: A numeric expression serving as the dependent variable. Must be a numeric type that can be cast to DOUBLE.
  • x: A numeric expression serving as the independent variable. Must be a numeric type that can be cast to DOUBLE.

Return Type

  • Returns a DOUBLE value representing the population covariance.
    • Positive value: x and y tend to change in the same direction (positive correlation)
    • Negative value: x and y tend to change in opposite directions (negative correlation)
    • Close to 0: weak correlation between variables

Notes

  • During computation, NULL values are ignored.
  • If there are fewer than 1 valid data point, returns NULL.
  • Population covariance uses the formula with denominator n, whereas sample covariance uses (n-1), so population covariance is typically smaller than sample covariance.

Examples

  1. Basic usage: compute population covariance
SELECT covar_pop(c1, c2)
FROM VALUES (1, 1), (2, 2), (2, 2), (3, 3) AS tab(c1, c2);
+-------------------+
| covar_pop(c1, c2) |
+-------------------+
| 0.5               |
+-------------------+
  1. Comparison with sample covariance
SELECT covar_pop(c1, c2), covar_samp(c1, c2)
FROM VALUES (1, 1), (2, 2), (2, 2), (3, 3) AS tab(c1, c2);
+-------------------+--------------------+
| covar_pop(c1, c2) | covar_samp(c1, c2) |
+-------------------+--------------------+
| 0.5               | 0.6666666666666666 |
+-------------------+--------------------+
  1. Perfect positive correlation data
SELECT covar_pop(x, y)
FROM VALUES (1, 2), (2, 4), (3, 6), (4, 8) AS t(x, y);
+------------------+
| covar_pop(x, y)  |
+------------------+
| 2.5              |
+------------------+
  1. Perfect negative correlation data
SELECT covar_pop(x, y)
FROM VALUES (1, 8), (2, 6), (3, 4), (4, 2) AS t(x, y);
+------------------+
| covar_pop(x, y)  |
+------------------+
| -2.5             |
+------------------+
  1. Uncorrelated data
SELECT covar_pop(x, y)
FROM VALUES (1, 5), (2, 5), (3, 5), (4, 5) AS t(x, y);
+------------------+
| covar_pop(x, y)  |
+------------------+
| 0.0              |
+------------------+