APPROX_PERCENTILE
Description
The APPROX_PERCENTILE function calculates and returns the approximate percentile of the values in a specified column. It accepts two parameters: the first is the name of the column for which the percentile is calculated, and the second is a floating-point number giving the desired percentile as a fraction. With the data sorted in ascending order, the function returns the value at the position corresponding to the specified percentage.
Parameter Description
col: The name of the column for which the percentile is to be calculated. This column should contain a numeric data type such as tinyint, smallint, int, bigint, float, double, or decimal.
percentage: A constant of type double representing the desired percentile. This value must be in the range [0.0, 1.0].
Return Result
The function returns a value of type double. If the DISTINCT keyword is specified, the calculation is based on the deduplicated dataset. Null values are not included in the calculation.
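As a small illustration of the null handling described above, the sketch below builds a throwaway dataset with one NULL row. The CTE name samples and the column v are hypothetical, and it is an assumption that the engine accepts this inline UNION ALL form. Only the three non-null rows should contribute to the result.

```sql
-- Hypothetical inline data: three non-null values and one NULL.
WITH samples AS (
    SELECT CAST(10 AS INT) AS v
    UNION ALL SELECT 20
    UNION ALL SELECT 30
    UNION ALL SELECT CAST(NULL AS INT)
)
-- The NULL row is skipped; the result is returned as a double.
SELECT APPROX_PERCENTILE(v, 0.5) AS median_v
FROM samples;
```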
Usage Example
- Calculate the median (50th percentile) in a dataset:
- Calculate the median (50th percentile) of the deduplicated dataset:
- Calculate the 25th percentile of the dataset:
- Calculate the 75th percentile of the dataset:
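The four cases listed above could be written roughly as follows. This is a sketch, not the original examples: the table orders and its numeric column amount are hypothetical names chosen for illustration, and placing DISTINCT directly before the column argument is an assumption based on the Return Result description.

```sql
-- Median (50th percentile) of the hypothetical column "amount".
SELECT APPROX_PERCENTILE(amount, 0.5) AS median_amount
FROM orders;

-- Median of the deduplicated values (assumed DISTINCT placement).
SELECT APPROX_PERCENTILE(DISTINCT amount, 0.5) AS median_distinct_amount
FROM orders;

-- 25th percentile.
SELECT APPROX_PERCENTILE(amount, 0.25) AS p25_amount
FROM orders;

-- 75th percentile.
SELECT APPROX_PERCENTILE(amount, 0.75) AS p75_amount
FROM orders;
```

Because the function is approximate, the returned values may differ slightly from an exact percentile computed over the same data.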
Summary
The APPROX_PERCENTILE function provides a quick way to calculate approximate percentiles of a dataset. By adjusting the percentage parameter, users can obtain values for different percentiles and so better analyze and understand the data. When using this function, ensure that the column has one of the supported numeric data types and that the percentage parameter is within the valid range [0.0, 1.0].