VAR_SAMP

var_samp([distinct] expr)

Description

The VAR_SAMP function is used to calculate the sample variance of a set of numerical data. Variance is a statistic that measures the degree of dispersion in a data distribution, reflecting the magnitude of data volatility. By calculating the variance, we can understand the trend and range of data fluctuations.

Parameter Description

  • expr: Numeric type, can be tinyint, smallint, int, bigint, float, double, or decimal.
  • distinct (optional): Indicates the calculation of the variance of the set after deduplication. If this parameter is not set, the variance of the set including duplicate values is calculated.

Return Result

  • Returns a double type value, representing the calculated sample variance.
  • If the input parameters contain null values, the null values are not included in the calculation.

Usage Example

  1. Calculate the sample variance including duplicate values:
SELECT var_samp(col) FROM VALUES (1), (2), (3), (3), (null) AS tab(col);
+--------------------+
|  `var_samp`(col)   |
+--------------------+
| 0.9166666666666666 |
+--------------------+
  1. Calculate the sample variance after deduplication:
SELECT var_samp(DISTINCT col) FROM VALUES (1), (2), (3), (3), (null) AS tab(col);
+--------------------------+
| `var_samp`(DISTINCT col) |
+--------------------------+
| 1.0                      |
+--------------------------+