SQL Window Function

Description

Window functions are a powerful analytical feature in SQL statements that allow you to perform calculations over a set of related rows, rather than just individual rows. By using the WINDOW clause, you can define one or more named windows, and then reference these windows in window functions to avoid rewriting the same window specification. This makes handling complex data sets in queries simpler and more efficient.

Syntax

WINDOW <window_name> AS (<window_specification>, ...)

其中,<window_name> 是为窗口指定的名称,它必须是一个有效的标识符,不能与表名或列名冲突。<window_specification> 是定义窗口范围和顺序的子句,它可以包含以下几个部分:

  • PARTITION BY 子句:指定如何将数据分成不同的组,每个组是一个分区(partition)。
  • ORDER BY 子句:指定在每个分区中如何对数据进行排序,排序顺序会影响一些窗口函数的结果,例如 ROW_NUMBER()RANK() 等。
  • frame_clause 子句:指定窗口的一个子集,即窗口框架(window frame),它可以用来限制窗口函数的计算范围,只对窗口框架中的数据进行计算。frame_clause 子句有两种形式:ROWS frameRANGE frame,它们的语法和含义请参考 Window Frame 的文档。

Example of Using Window Clause

以下是一些使用window clause的SQL语句的示例,以及它们的输出结果。假设我们有一个名为sales的表,它包含了每个月的销售额和利润率的数据,如下所示:

create table sales(month int,sales int,profit double);
INSERT INTO sales (month, sales, profit) VALUES
(1, 100, 0.1),
(2, 120, 0.15),
(3, 80, 0.05),
(4, 150, 0.2),
(5, 90, 0.1),
(6, 110, 0.12);
  • Query: Use the window clause to define a window named w, which is ordered by month and includes the current row and the previous two rows of data. Then use the SUM() function and the AVG() function to calculate the total and average sales for each month.
SELECT month, sales, SUM(sales) OVER w AS sum_sales, AVG(sales) OVER w AS avg_sales
FROM sales
WINDOW w AS (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW);



+-------+-------+-----------+--------------------+
| month | sales | sum_sales |     avg_sales      |
+-------+-------+-----------+--------------------+
| 1     | 100   | 100       | 100.0              |
| 2     | 120   | 220       | 110.0              |
| 3     | 80    | 300       | 100.0              |
| 4     | 150   | 350       | 116.66666666666667 |
| 5     | 90    | 320       | 106.66666666666667 |
| 6     | 110   | 350       | 116.66666666666667 |
+-------+-------+-----------+--------------------+
  • Query: Define two windows named w1 and w2 using the window clause, which group and sort by quarter and month respectively, then use the RANK() function and ROW_NUMBER() function to calculate the sales ranking for each month within the quarter and the entire year.
SELECT month, sales, RANK() OVER w1 AS rank_quarter, ROW_NUMBER() OVER w2 AS rank_year
FROM sales
WINDOW w1 AS (PARTITION BY CEIL(month / 3) ORDER BY sales DESC),
       w2 AS (ORDER BY sales DESC);
       
       
+-------+-------+--------------+-----------+
| month | sales | rank_quarter | rank_year |
+-------+-------+--------------+-----------+
| 4     | 150   | 1            | 1         |
| 2     | 120   | 1            | 2         |
| 6     | 110   | 2            | 3         |
| 1     | 100   | 2            | 4         |
| 5     | 90    | 3            | 5         |
| 3     | 80    | 3            | 6         |
+-------+-------+--------------+-----------+