Lakehouse Ranking and Percentile Analysis Guide
Overview
Ranking and percentile analysis are advanced scenarios in data analysis, used to compute the relative position of data within groups, percentile scores, and more. Singdata Lakehouse provides comprehensive window function support, including ranking functions, offset functions, and aggregate window functions. This guide categorizes usage by business scenario to help you quickly master efficient ranking and percentile analysis methods.
Quick Navigation
- Basic Ranking -- Use ROW_NUMBER / RANK / DENSE_RANK to compute rankings
- Percentile Ranking -- Use PERCENT_RANK to compute relative position
- Before/After Row Comparison -- Use LAG / LEAD for period-over-period comparison
- Moving Average -- Use window frames for sliding statistics
- Quantile Calculation -- Use PERCENTILE to compute median and quartiles
SQL Commands Covered
| Command/Function | Purpose | Applicable Scenario |
|---|---|---|
ROW_NUMBER() | Generate unique row numbers | Deduplication, pagination, Top-N |
RANK() | Ranking (ties skip numbers) | Score ranking, competition ranking |
DENSE_RANK() | Ranking (ties don't skip numbers) | Continuous ranking |
PERCENT_RANK() | Percentile ranking | Relative position analysis |
LAG() / LEAD() | Access previous/next row data | Period-over-period, year-over-year |
AVG() OVER (... ROWS BETWEEN) | Moving average | Trend analysis |
PERCENTILE() | Quantile calculation | Median, quartiles |
Prerequisites
The following examples use a simulated employee performance table performance:
Basic Ranking
Use ranking functions to compute employee performance rankings within departments. Lakehouse provides three ranking functions with slightly different behaviors.
Result:
| emp_name | dept | score | row_num | rank | dense_rank |
|---|---|---|---|---|---|
| Alice | Engineering | 95 | 1 | 1 | 1 |
| Carol | Engineering | 95 | 2 | 1 | 1 |
| Bob | Engineering | 85 | 3 | 3 | 2 |
| Eve | Marketing | 92 | 1 | 1 | 1 |
| David | Marketing | 88 | 2 | 2 | 2 |
| Frank | Marketing | 78 | 3 | 3 | 3 |
| Grace | HR | 90 | 1 | 1 | 1 |
| Henry | HR | 85 | 2 | 2 | 2 |
Differences Between Ranking Functions
| Function | Tie Handling | Number Continuity |
|---|---|---|
ROW_NUMBER() | Randomly assigns different numbers | Continuous |
RANK() | Same rank for ties, subsequent numbers skip | Discontinuous (1,1,3) |
DENSE_RANK() | Same rank for ties, subsequent numbers are continuous | Continuous (1,1,2) |
Percentile Ranking
Use PERCENT_RANK() to compute an employee's relative position within the department (between 0 and 1).
Result:
| emp_name | dept | score | pct_rank |
|---|---|---|---|
| Alice | Engineering | 95 | 0 |
| Carol | Engineering | 95 | 0 |
| Bob | Engineering | 85 | 0.5 |
| Eve | Marketing | 92 | 0 |
| David | Marketing | 88 | 0.5 |
| Frank | Marketing | 78 | 1 |
| Grace | HR | 90 | 0 |
| Henry | HR | 85 | 1 |
Before/After Row Comparison
Use LAG() and LEAD() to access the previous or next row's data relative to the current row, commonly used for period-over-period change calculations.
Result:
| emp_name | dept | score | prev_score | diff |
|---|---|---|---|---|
| Alice | Engineering | 95 | NULL | NULL |
| Carol | Engineering | 95 | 95 | 0 |
| Bob | Engineering | 85 | 95 | -10 |
| Eve | Marketing | 92 | NULL | NULL |
| David | Marketing | 88 | 92 | -4 |
| Frank | Marketing | 78 | 88 | -10 |
| Grace | HR | 90 | NULL | NULL |
| Henry | HR | 85 | 90 | -5 |
Moving Average
Use window frames (ROWS BETWEEN) to compute sliding window statistics, suitable for trend analysis.
Result:
| emp_name | dept | score | moving_avg_3 |
|---|---|---|---|
| Alice | Engineering | 95 | 95 |
| Carol | Engineering | 95 | 95 |
| Bob | Engineering | 85 | 91.7 |
| Eve | Marketing | 92 | 92 |
| David | Marketing | 88 | 90 |
| Frank | Marketing | 78 | 86 |
| Grace | HR | 90 | 90 |
| Henry | HR | 85 | 87.5 |
Quantile Calculation
Use the PERCENTILE() function to compute data quantiles such as the median (0.5) and quartiles (0.25, 0.75).
Result:
| dept | median | q1 | q3 | min_score | max_score |
|---|---|---|---|---|---|
| Engineering | 95 | 90 | 95 | 85 | 95 |
| HR | 87.5 | 86.25 | 88.75 | 85 | 90 |
| Marketing | 88 | 83 | 90 | 78 | 92 |
Clean Up Test Data
After completing ranking analysis verification, it is recommended to clean up test tables:
Important Notes
- Window Function Execution Order: Window functions execute after
WHEREandGROUP BY; window function results cannot be used directly inWHERE. UseQUALIFYor subqueries to filter. - NULL Value Sorting:
ORDER BYplacesNULLvalues last (DESC) or first (ASC) by default; useNULLS FIRST/LASTto control. - Performance Optimization: Choose
PARTITION BYcolumns with moderate cardinality to avoid individual partitions becoming too large. - PERCENT_RANK Syntax:
PERCENT_RANK()does not accept arguments; writePERCENT_RANK() OVER (...)directly.
