Lakehouse Ranking and Percentile Analysis Guide

Overview

Ranking and percentile analysis are advanced scenarios in data analysis, used to compute the relative position of data within groups, percentile scores, and more. Singdata Lakehouse provides comprehensive window function support, including ranking functions, offset functions, and aggregate window functions. This guide categorizes usage by business scenario to help you quickly master efficient ranking and percentile analysis methods.


SQL Commands Covered

Command/FunctionPurposeApplicable Scenario
ROW_NUMBER()Generate unique row numbersDeduplication, pagination, Top-N
RANK()Ranking (ties skip numbers)Score ranking, competition ranking
DENSE_RANK()Ranking (ties don't skip numbers)Continuous ranking
PERCENT_RANK()Percentile rankingRelative position analysis
LAG() / LEAD()Access previous/next row dataPeriod-over-period, year-over-year
AVG() OVER (... ROWS BETWEEN)Moving averageTrend analysis
PERCENTILE()Quantile calculationMedian, quartiles

Prerequisites

The following examples use a simulated employee performance table performance:

-- Create test table CREATE TABLE IF NOT EXISTS performance ( emp_id INT, emp_name STRING, dept STRING, score DOUBLE, review_date DATE ); -- Insert test data INSERT INTO performance VALUES (1, 'Alice', 'Engineering', 95, '2024-06-01'), (2, 'Bob', 'Engineering', 85, '2024-06-01'), (3, 'Carol', 'Engineering', 95, '2024-06-01'), (4, 'David', 'Marketing', 88, '2024-06-01'), (5, 'Eve', 'Marketing', 92, '2024-06-01'), (6, 'Frank', 'Marketing', 78, '2024-06-01'), (7, 'Grace', 'HR', 90, '2024-06-01'), (8, 'Henry', 'HR', 85, '2024-06-01');


Basic Ranking

Use ranking functions to compute employee performance rankings within departments. Lakehouse provides three ranking functions with slightly different behaviors.

-- Comparison of three ranking functions SELECT emp_name, dept, score, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY score DESC) as row_num, RANK() OVER (PARTITION BY dept ORDER BY score DESC) as rank, DENSE_RANK() OVER (PARTITION BY dept ORDER BY score DESC) as dense_rank FROM performance ORDER BY dept, row_num;

Result:

emp_namedeptscorerow_numrankdense_rank
AliceEngineering95111
CarolEngineering95211
BobEngineering85332
EveMarketing92111
DavidMarketing88222
FrankMarketing78333
GraceHR90111
HenryHR85222

Differences Between Ranking Functions

FunctionTie HandlingNumber Continuity
ROW_NUMBER()Randomly assigns different numbersContinuous
RANK()Same rank for ties, subsequent numbers skipDiscontinuous (1,1,3)
DENSE_RANK()Same rank for ties, subsequent numbers are continuousContinuous (1,1,2)

Percentile Ranking

Use PERCENT_RANK() to compute an employee's relative position within the department (between 0 and 1).

-- Compute percentile ranking SELECT emp_name, dept, score, ROUND(PERCENT_RANK() OVER (PARTITION BY dept ORDER BY score DESC), 2) as pct_rank FROM performance ORDER BY dept, pct_rank DESC;

Result:

emp_namedeptscorepct_rank
AliceEngineering950
CarolEngineering950
BobEngineering850.5
EveMarketing920
DavidMarketing880.5
FrankMarketing781
GraceHR900
HenryHR851

Before/After Row Comparison

Use LAG() and LEAD() to access the previous or next row's data relative to the current row, commonly used for period-over-period change calculations.

-- Compute score differences between employees within departments SELECT emp_name, dept, score, LAG(score, 1) OVER (PARTITION BY dept ORDER BY score DESC) as prev_score, score - LAG(score, 1) OVER (PARTITION BY dept ORDER BY score DESC) as diff FROM performance ORDER BY dept, score DESC;

Result:

emp_namedeptscoreprev_scorediff
AliceEngineering95NULLNULL
CarolEngineering95950
BobEngineering8595-10
EveMarketing92NULLNULL
DavidMarketing8892-4
FrankMarketing7888-10
GraceHR90NULLNULL
HenryHR8590-5

Moving Average

Use window frames (ROWS BETWEEN) to compute sliding window statistics, suitable for trend analysis.

-- Compute moving average within department ordered by score (current row and two preceding rows) SELECT emp_name, dept, score, ROUND(AVG(score) OVER ( PARTITION BY dept ORDER BY score DESC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ), 1) as moving_avg_3 FROM performance ORDER BY dept, score DESC;

Result:

emp_namedeptscoremoving_avg_3
AliceEngineering9595
CarolEngineering9595
BobEngineering8591.7
EveMarketing9292
DavidMarketing8890
FrankMarketing7886
GraceHR9090
HenryHR8587.5

Quantile Calculation

Use the PERCENTILE() function to compute data quantiles such as the median (0.5) and quartiles (0.25, 0.75).

-- Compute median and quartiles of scores by department SELECT dept, PERCENTILE(score, 0.5) as median, PERCENTILE(score, 0.25) as q1, PERCENTILE(score, 0.75) as q3, MIN(score) as min_score, MAX(score) as max_score FROM performance GROUP BY dept ORDER BY dept;

Result:

deptmedianq1q3min_scoremax_score
Engineering9590958595
HR87.586.2588.758590
Marketing8883907892

Clean Up Test Data

After completing ranking analysis verification, it is recommended to clean up test tables:

-- Drop test table DROP TABLE IF EXISTS performance;


Important Notes

  1. Window Function Execution Order: Window functions execute after WHERE and GROUP BY; window function results cannot be used directly in WHERE. Use QUALIFY or subqueries to filter.
  2. NULL Value Sorting: ORDER BY places NULL values last (DESC) or first (ASC) by default; use NULLS FIRST/LAST to control.
  3. Performance Optimization: Choose PARTITION BY columns with moderate cardinality to avoid individual partitions becoming too large.
  4. PERCENT_RANK Syntax: PERCENT_RANK() does not accept arguments; write PERCENT_RANK() OVER (...) directly.