Description

The GROUP BY clause is a key component in SQL queries used to group rows of the result set based on a set of specified grouping expressions. It allows the application of aggregate functions such as SUM(), COUNT(), MAX(), etc., to calculate aggregate values for each group. Lakehouse SQL supports advanced aggregation features, including GROUPING SETS, CUBE, and ROLLUP clauses, which make it possible to perform multiple aggregations on the same input record set, meeting complex data analysis needs.

Syntax

sql
SELECT ...
FROM ...  
 GROUP BY 
[ ALL |
  GROUPING SETS (group_expression) |
  CUBE(group_expression) |
  ROLLUP(group_expression)
];

Parameter Description

  • ALL: Abbreviation for Lakehouse, adds all SELECT list expressions that do not contain aggregate functions as group_expression. If no such expressions exist, GROUP BY ALL is equivalent to omitting the GROUP BY clause that results in a global aggregation.
  • group_expression: Specifies the conditions for row grouping. Rows are grouped based on the result values of the grouping expression. The grouping expression can be a column name, column position, or expression.
  • grouping_set: A grouping set is specified by zero or more comma-separated expressions within parentheses. If the grouping set has only one element, the parentheses can be omitted.
  • GROUPING SETS: Groups rows for each grouping set specified after GROUPING SETS. This clause is a shorthand for UNION ALL, where each segment of the UNION ALL operator performs the aggregation for each grouping set specified in the GROUPING SETS clause.
  • ROLLUP: Specifies multiple levels of aggregation in one statement. This clause is used to compute aggregations based on multiple grouping sets and is a shorthand for GROUPING SETS.
  • CUBE: Performs aggregation based on the combinations of grouping columns specified in the GROUP BY clause. CUBE is a shorthand for GROUPING SETS.

Example

Case 1 ALL is an option of the GROUP BY clause that adds all non-aggregate SELECT list expressions as grouping expressions. If there are no non-aggregate expressions in the SELECT list, GROUP BY ALL is equivalent to omitting the GROUP BY clause, resulting in a global aggregation.

Example:

CREATE  VIEW dealer (id, city, car_model, quantity) AS
VALUES (100, 'Fremont', 'Honda Civic', 10),
       (100, 'Fremont', 'Honda Accord', 15),
       (100, 'Fremont', 'Honda CRV', 7),
       (200, 'Dublin', 'Honda Civic', 20),
       (200, 'Dublin', 'Honda Accord', 10),
       (200, 'Dublin', 'Honda CRV', 3),
       (300, 'San Jose', 'Honda Civic', 5),
       (300, 'San Jose', 'Honda Accord', 8);
SELECT  sum(quantity) FROM dealer GROUP BY ALL;
+-----------------+
| `sum`(quantity) |
+-----------------+
| 78              |
+-----------------+
SELECT id, sum(quantity) FROM dealer GROUP BY ALL;
+-----+-----------------+
| id  | `sum`(quantity) |
+-----+-----------------+
| 200 | 33              |
| 100 | 32              |
| 300 | 13              |
+-----+-----------------+
SELECT id, city,sum(quantity) FROM dealer GROUP BY ALL;
+-----+----------+-----------------+
| id  |   city   | `sum`(quantity) |
+-----+----------+-----------------+
| 100 | Fremont  | 32              |
| 200 | Dublin   | 33              |
| 300 | San Jose | 13              |
+-----+----------+-----------------+

Case 2 grouping_set is part of the GROUP BY clause, used to specify grouping sets. It can contain a single expression or multiple expressions, separated by commas.

Example:

-- Group by city and car model, and calculate the total quantity for each group
SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
    ORDER BY city;
+----------+--------------+-----+
|   city   |  car_model   | sum |
+----------+--------------+-----+
| null     | Honda Accord | 33  |
| null     | null         | 78  |
| null     | Honda CRV    | 10  |
| null     | Honda Civic  | 35  |
| Dublin   | Honda Accord | 10  |
| Dublin   | Honda CRV    | 3   |
| Dublin   | null         | 33  |
| Dublin   | Honda Civic  | 20  |
| Fremont  | Honda CRV    | 7   |
| Fremont  | Honda Civic  | 10  |
| Fremont  | Honda Accord | 15  |
| Fremont  | null         | 32  |
| San Jose | Honda Accord | 8   |
| San Jose | null         | 13  |
| San Jose | Honda Civic  | 5   |
+----------+--------------+-----+

Case 3 The ROLLUP clause is used to specify multiple levels of aggregation in a single query, and it is a shorthand form of GROUPING SETS. For example, ROLLUP(city, car_model) is equivalent to GROUPING SETS((city, car_model), (city), ()).

Example:

```sql
-- Calculate the total quantity of each car model in each city, as well as the total quantity in each city and the overall total quantity
SELECT city, car_model, sum(quantity) AS sum FROM dealer
GROUP BY ROLLUP(city, car_model)  ;
+----------+--------------+-----+
|   city   |  car_model   | sum |
+----------+--------------+-----+
| Fremont  | Honda Accord | 15  |
| San Jose | Honda Civic  | 5   |
| Fremont  | null         | 32  |
| Dublin   | null         | 33  |
| San Jose | Honda Accord | 8   |
| San Jose | null         | 13  |
| Fremont  | Honda Civic  | 10  |
| null     | null         | 78  |
| Fremont  | Honda CRV    | 7   |
| Dublin   | Honda Civic  | 20  |
| Dublin   | Honda Accord | 10  |
| Dublin   | Honda CRV    | 3   |
+----------+--------------+-----+

Case 4 The CUBE clause is used to perform aggregation based on all possible combinations of the grouping columns specified in the GROUP BY clause. It is a shorthand form of GROUPING SETS. For example, CUBE(city, car_model) is equivalent to GROUPING SETS((city, car_model), (city), (car_model), ()).

Example:

```sql
-- Calculate the total quantity of each car model in each city, the total quantity in each city, the total quantity of each car model, and the overall total quantity
SELECT city, car_model, sum(quantity) AS sum FROM dealer
GROUP BY city, car_model WITH CUBE;
+----------+--------------+-----+
|   city   |  car_model   | sum |
+----------+--------------+-----+
| Fremont  | null         | 32  |
| null     | Honda CRV    | 10  |
| Dublin   | null         | 33  |
| San Jose | null         | 13  |
| Dublin   | Honda Civic  | 20  |
| San Jose | Honda Civic  | 5   |
| null     | Honda Civic  | 35  |
| Dublin   | Honda Accord | 10  |
| San Jose | Honda Accord | 8   |
| null     | Honda Accord | 33  |
| null     | null         | 78  |
| Fremont  | Honda Accord | 15  |
| Fremont  | Honda CRV    | 7   |
| Fremont  | Honda Civic  | 10  |
| Dublin   | Honda CRV    | 3   |
+----------+--------------+-----+