Window Function
Overview
Window Functions are a powerful analytical feature in Singdata Lakehouse SQL that allow you to perform calculations across a set of related rows, rather than just on individual rows. By using the WINDOW clause, you can define one or more named windows and then reference those windows in window functions, avoiding repetitive window specifications and making queries more concise and maintainable.
Syntax
Parameter Description
- window_name: The name of the window, which must be a valid identifier and must not conflict with table or column names
- window_specification: The window specification, defining the scope and ordering of the window, consisting of:
PARTITION BY: Specifies how data is grouped; each group is a partitionORDER BY: Specifies how data within each partition is orderedframe_clause: Specifies the window frame, limiting the computation range of the window function
Window Frame Types
- ROWS Frame: Window frame based on row count
- RANGE Frame: Window frame based on value range
Environment Preparation
Create Test Table
Insert Test Data
Usage Examples
Example 1: Moving Window Aggregation - Moving Sum and Moving Average
Business Scenario: Calculate the sum and average of sales for the current month and the two preceding months
Execution Result:
Explanation: Window w defines a sliding window containing the current row and the 2 preceding rows, so each row computes the sum and average of the most recent 3 months of sales.
Example 2: Multi-Window Ranking - Quarterly and Annual Ranking
Business Scenario: Simultaneously compute ranking within each quarter and across the full year
Execution Result:
Explanation:
- The w1 window partitions by quarter (every 3 months), computing sales ranking within each quarter
- The w2 window has no partition, computing sales ranking across the full year
- CEIL(month/3) divides months into Q1 (months 1-3), Q2 (months 4-6), etc.
Example 3: Cumulative Aggregation - Year-to-Date Statistics
Business Scenario: Calculate cumulative sales and average profit margin from the beginning of the year to the current month
Execution Result:
Explanation: UNBOUNDED PRECEDING starts from the first row of the partition, and CURRENT ROW ends at the current row, achieving cumulative computation.
Example 4: Time Series Analysis - Period-over-Period Comparison
Business Scenario: View each month's sales compared to the previous and next month, and the period-over-period change
Execution Result:
Explanation:
LAG(sales, 1): Gets the sales from the previous rowLEAD(sales, 1): Gets the sales from the next row- prev_sales and next_sales are NULL for the first and last rows, respectively
Example 5: Comparison of Different Ranking Functions
Business Scenario: Understand the differences between various ranking functions
Execution Result:
Ranking Function Descriptions:
ROW_NUMBER(): Unique sequential row numbers; rows with the same value do not tieRANK(): Same value gets the same rank; the next rank skipsDENSE_RANK(): Same value gets the same rank; the next rank is consecutivePERCENT_RANK(): Returns a percentile rank between 0 and 1
Example 6: Position Functions - Accessing Values at Specific Positions
Business Scenario: Get the first, last, and nth values within the window
Execution Result:
Explanation:
FIRST_VALUE(): Returns the value of the first row in the windowLAST_VALUE(): Returns the value of the last row in the window (requires UNBOUNDED FOLLOWING to see the actual last row)NTH_VALUE(expr, n): Returns the value of the nth row in the window
Example 7: RANGE Window Frame - Value-Range-Based Windows
Business Scenario: Find all records within a sales value range of +/-10 and compute statistics
Execution Result:
Explanation:
- RANGE windows are based on value ranges rather than row counts
- For the row with sales=100, the window includes all rows with sales values in the range [90, 110]
- For the row with sales=150, only itself falls within the range [140, 160]
Example 8: Comprehensive Analysis - Multi-Dimensional Business Reports (Advanced)
Business Scenario: Generate a comprehensive analytical report including ranking, cumulative values, period-over-period comparison, and proportions
Execution Result:
Explanation:
- This example uses 3 different window definitions simultaneously
- w1: Ordered by month, used for ranking and period-over-period comparison
- w2: Cumulative window, used for YTD (Year-To-Date) calculations
- w3: Full window, used for computing total sum and calculating proportions
- Demonstrates how to achieve complex multi-dimensional analysis in a single query
Example 9: Data Bucketing - NTILE Function Application
Business Scenario: Divide sales data into high, medium, and low tiers based on performance
Execution Result:
Explanation: NTILE(n) divides ordered data into n buckets as evenly as possible, commonly used for tiered analysis, ABC classification, etc.
Example 10: Cumulative Distribution - CUME_DIST Function Application
Business Scenario: Calculate the cumulative distribution percentile of each sales value within the dataset
Execution Result:
Explanation: CUME_DIST() returns the ratio of rows with values less than or equal to the current row's value to the total number of rows, indicating the cumulative distribution position of the current value within the dataset.
Window Frame Details
ROWS vs RANGE
| Frame Type | Description | Example |
|---|---|---|
| ROWS | Based on physical row count | ROWS BETWEEN 2 PRECEDING AND CURRENT ROW |
| RANGE | Based on logical value range | RANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING |
Boundary Keywords
UNBOUNDED PRECEDING: The first row of the partitionUNBOUNDED FOLLOWING: The last row of the partitionCURRENT ROW: The current rown PRECEDING: The nth row before the current row (for ROWS) or rows with value - n (for RANGE)n FOLLOWING: The nth row after the current row (for ROWS) or rows with value + n (for RANGE)
Common Window Function Categories
Aggregate Functions
SUM(),AVG(),COUNT(),MAX(),MIN()
Ranking Functions
ROW_NUMBER(): Unique sequential row numbersRANK(): Same value gets same rank, with gapsDENSE_RANK(): Same value gets same rank, without gapsPERCENT_RANK(): Percentile rankingNTILE(n): Divides data into n bucketsCUME_DIST(): Cumulative distribution value
Value Access Functions
LAG(): Access preceding rowsLEAD(): Access following rowsFIRST_VALUE(): First value in the windowLAST_VALUE(): Last value in the windowNTH_VALUE(): The nth value in the window
Best Practices
1. Use the WINDOW Clause to Improve Readability
Not recommended (repeated window definitions):
Recommended (using the WINDOW clause):
2. Use PARTITION BY Appropriately
For grouped analysis, use PARTITION BY to perform intra-group calculations without breaking the data structure:
Execution Result:
Explanation: Rankings are computed independently within each category without affecting each other.
3. Pay Attention to LAST_VALUE Window Scope
To get the actual last value, you must use UNBOUNDED FOLLOWING:
4. Specific Considerations
In Lakehouse, it is recommended to add a LIMIT clause at the end of the query to improve performance:
Performance Optimization Suggestions
- Index Optimization: Create appropriate indexes on columns used in ORDER BY and PARTITION BY
- Window Scope: Use smaller window scopes whenever possible to reduce computation
- Reuse Window Definitions: Use the WINDOW clause to avoid redefining identical windows
- Partitioning Strategy: Use PARTITION BY appropriately to reduce the data volume within each window
Frequently Asked Questions
Q1: Where must the WINDOW clause be placed?
A: The WINDOW clause must be placed after the FROM and WHERE clauses, and before the ORDER BY clause.
Q2: Can multiple windows be defined in the same query?
A: Yes, use commas to separate multiple window definitions.
Q3: What is the main difference between ROWS and RANGE?
A: ROWS is based on physical row count, while RANGE is based on logical value range. When there are duplicate values, RANGE will include all rows with the same value.
Q4: Why doesn't LAST_VALUE return the expected last value?
A: You need to explicitly specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to include the entire partition.
