Singdata - Documents

BUCKET

bucket(numBuckets, col)

Description

Computes the hash bucket number for a column value, given a number of buckets. The function hashes the input value and maps it to an integer in the range [0, numBuckets). It is commonly used for data partitioning and in distributed computing scenarios.

Parameters

numBuckets: int type, the number of buckets, must be greater than 0
col: any type, the column value for which to compute the hash bucket

Return Result

int type
The return value range is [0, numBuckets)
Returns NULL if the input value is NULL
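
The return rules above can be checked with a short query. This is a minimal sketch; it assumes an untyped NULL literal is accepted for col, which the Parameters section suggests ("any type") but does not state explicitly.

```sql
-- A NULL input is expected to return NULL, per the rules above.
SELECT bucket(10, NULL);

-- With numBuckets = 4, any non-NULL input should return 0, 1, 2, or 3.
SELECT bucket(4, 'test');
```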

Examples

SELECT bucket(10, 'test'); -- result: 4

SELECT bucket(10, 123); -- result: 4

Notes

The BUCKET function is deterministic: the same input value with the same numBuckets is always mapped to the same bucket
Commonly used in scenarios such as data partitioning, sampling, and load balancing
The returned bucket number starts from 0, with a maximum value of numBuckets - 1
The same value of different types (e.g., the integer 1 and the string '1') may be mapped to different buckets
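
The notes above can be illustrated with a few queries. This is a sketch under stated assumptions: the table orders and its column user_id are hypothetical names chosen for illustration, and the sample size is only approximate because it depends on how evenly the hash spreads the values.

```sql
-- Hypothetical table and column: orders, user_id.
-- Sampling: keep the rows whose user_id falls into one bucket out of 10.
-- Because the function is deterministic, the same users are selected on every run.
SELECT *
FROM orders
WHERE bucket(10, user_id) = 0;

-- Load balancing check: count how many rows land in each bucket.
SELECT bucket(10, user_id) AS bucket_id, COUNT(*) AS row_count
FROM orders
GROUP BY bucket(10, user_id)
ORDER BY bucket_id;

-- Type sensitivity: the integer 1 and the string '1' may map to different buckets.
SELECT bucket(10, 1) AS int_bucket, bucket(10, '1') AS string_bucket;
```

When joining or comparing bucket numbers across tables, casting both sides to the same type first avoids mismatches caused by the type sensitivity noted above.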