COVAR_SAMP Function

covar_samp(y, x)

Description

The COVAR_SAMP function computes the sample covariance between two numeric columns. Covariance measures the degree to which two variables change together. Sample covariance uses (n-1) as the denominator and is an unbiased estimator of the population covariance.

Parameters

  • y: A numeric expression serving as the dependent variable. Must be a numeric type that can be cast to DOUBLE.
  • x: A numeric expression serving as the independent variable. Must be a numeric type that can be cast to DOUBLE.

Return Type

  • Returns a DOUBLE value representing the sample covariance.
    • Positive value: x and y tend to change in the same direction (positive correlation)
    • Negative value: x and y tend to change in opposite directions (negative correlation)
    • Close to 0: weak correlation between variables

Notes

  • During computation, NULL values are ignored.
  • If there are fewer than 1 valid data point, returns NULL.
  • Sample covariance uses the formula with denominator (n-1) and is an unbiased estimator of the population covariance. With small sample sizes, the variability (variance) of sample covariance tends to be larger.
  • When n is large, the difference between sample covariance and population covariance is negligible.

Examples

  1. Basic usage: compute sample covariance
SELECT covar_samp(c1, c2)
FROM VALUES (1, 1), (2, 2), (2, 2), (3, 3) AS tab(c1, c2);
+--------------------+
| covar_samp(c1, c2) |
+--------------------+
| 0.6666666666666666 |
+--------------------+
  1. Comparison with population covariance
SELECT covar_pop(c1, c2), covar_samp(c1, c2)
FROM VALUES (1, 1), (2, 2), (2, 2), (3, 3) AS tab(c1, c2);
+-------------------+--------------------+
| covar_pop(c1, c2) | covar_samp(c1, c2) |
+-------------------+--------------------+
| 0.5               | 0.6666666666666666 |
+-------------------+--------------------+
  1. Perfect positive correlation data
SELECT covar_samp(x, y)
FROM VALUES (1, 2), (2, 4), (3, 6), (4, 8) AS t(x, y);
+--------------------+
| covar_samp(x, y)   |
+--------------------+
| 3.3333333333333335 |
+--------------------+
  1. Perfect negative correlation data
SELECT covar_samp(x, y)
FROM VALUES (1, 8), (2, 6), (3, 4), (4, 2) AS t(x, y);
+---------------------+
| covar_samp(x, y)    |
+---------------------+
| -3.3333333333333335 |
+---------------------+
  1. NULL values are ignored
SELECT covar_samp(x, y)
FROM VALUES (1, 1), (2, NULL), (3, 3), (NULL, 4), (5, 5) AS t(x, y);
+-------------------+
| covar_samp(x, y)  |
+-------------------+
| 4.0               |
+-------------------+
  1. Compute sample covariance by group
SELECT
  product,
  covar_samp(price, sales_volume) as price_sales_covar
FROM VALUES
  ('A', 10, 100),
  ('A', 20, 80),
  ('A', 30, 60),
  ('B', 15, 50),
  ('B', 25, 40),
  ('B', 35, 30)
AS sales(product, price, sales_volume)
GROUP BY product;
+---------+--------------------+
| product | price_sales_covar  |
+---------+--------------------+
| A       | -200.0             |
| B       | -100.0             |
+---------+--------------------+