Window Functions
Window Functions are powerful SQL functions that allow users to perform grouped calculations on a dataset within a query. Unlike traditional aggregate functions, window functions can calculate each row of data while retaining the detailed information of the original data. Window functions have a wide range of applications in data analysis, report generation, and other scenarios, such as calculating rankings, cumulative sums, moving averages, etc.
Basic Syntax of Window Functions
The basic syntax of window functions is as follows:
<window_function>
: Specify the window function to use, such asROW_NUMBER()
,RANK()
,SUM()
, etc.PARTITION BY
: Divide the dataset into different partitions, each partition being an independent calculation scope.ORDER BY
: Sort the data within each partition, affecting the calculation results of some window functions.
Classification of Window Functions
Based on functionality and return value type, window functions can be divided into the following categories:
- Ranking Functions: Rank the data and return an integer value representing the relative position. Common ranking functions include
ROW_NUMBER()
,RANK()
,DENSE_RANK()
,PERCENT_RANK()
,CUME_DIST()
,NTILE()
, etc. - Aggregate Functions: Perform aggregate calculations on the data and return a value representing the overall characteristics. Lakehouse supports all standard aggregate functions.
- Analytic Functions: Perform analytical calculations on the data and return a value representing the data characteristics. Common analytic functions include
FIRST_VALUE()
,LAST_VALUE()
,LAG()
,LEAD()
,NTH_VALUE()
,CUME_DIST()
, etc.
Examples of Using Window Functions
Below are some examples of SQL queries using window functions and their output results. Suppose we have a table named sales that contains data on monthly sales and profit margins, as shown below:
Ranking Functions
- Query: Use the
ROW_NUMBER()
function to rank sales in descending order.
- Query: Use the
RANK()
function to rank the profit margin in ascending order and group by quarter.
Aggregate Functions
- Query: Use the
AVG()
function to calculate the difference between the monthly sales and the annual average sales.
Analytic Functions
- Query: Use the
LAG()
function andLEAD()
function to calculate the difference in sales between each month and the previous month and the next month.
- Query: Use the
CUME_DIST()
function to calculate the cumulative distribution of monthly sales in the annual sales.