Description
The GROUP BY
clause is a key component in SQL queries used to group rows of the result set based on a set of specified grouping expressions. It allows the application of aggregate functions such as SUM()
, COUNT()
, MAX()
, etc., to calculate aggregate values for each group. Lakehouse SQL supports advanced aggregation features, including GROUPING SETS
, CUBE
, and ROLLUP
clauses, which make it possible to perform multiple aggregations on the same input record set, meeting complex data analysis needs.
Syntax
sql
SELECT ...
FROM ...
GROUP BY
[ ALL |
GROUPING SETS (group_expression) |
CUBE(group_expression) |
ROLLUP(group_expression)
];
Parameter Description
ALL
: Abbreviation for Lakehouse, adds all SELECT
list expressions that do not contain aggregate functions as group_expression
. If no such expressions exist, GROUP BY ALL
is equivalent to omitting the GROUP BY
clause that results in a global aggregation.
group_expression
: Specifies the conditions for row grouping. Rows are grouped based on the result values of the grouping expression. The grouping expression can be a column name, column position, or expression.
grouping_set
: A grouping set is specified by zero or more comma-separated expressions within parentheses. If the grouping set has only one element, the parentheses can be omitted.
GROUPING SETS
: Groups rows for each grouping set specified after GROUPING SETS
. This clause is a shorthand for UNION ALL
, where each segment of the UNION ALL
operator performs the aggregation for each grouping set specified in the GROUPING SETS
clause.
ROLLUP
: Specifies multiple levels of aggregation in one statement. This clause is used to compute aggregations based on multiple grouping sets and is a shorthand for GROUPING SETS
.
CUBE
: Performs aggregation based on the combinations of grouping columns specified in the GROUP BY
clause. CUBE
is a shorthand for GROUPING SETS
.
Example
Case 1
ALL
is an option of the GROUP BY
clause that adds all non-aggregate SELECT
list expressions as grouping expressions. If there are no non-aggregate expressions in the SELECT
list, GROUP BY ALL
is equivalent to omitting the GROUP BY
clause, resulting in a global aggregation.
Example:
CREATE VIEW dealer (id, city, car_model, quantity) AS
VALUES (100, 'Fremont', 'Honda Civic', 10),
(100, 'Fremont', 'Honda Accord', 15),
(100, 'Fremont', 'Honda CRV', 7),
(200, 'Dublin', 'Honda Civic', 20),
(200, 'Dublin', 'Honda Accord', 10),
(200, 'Dublin', 'Honda CRV', 3),
(300, 'San Jose', 'Honda Civic', 5),
(300, 'San Jose', 'Honda Accord', 8);
SELECT sum(quantity) FROM dealer GROUP BY ALL;
+-----------------+
| `sum`(quantity) |
+-----------------+
| 78 |
+-----------------+
SELECT id, sum(quantity) FROM dealer GROUP BY ALL;
+-----+-----------------+
| id | `sum`(quantity) |
+-----+-----------------+
| 200 | 33 |
| 100 | 32 |
| 300 | 13 |
+-----+-----------------+
SELECT id, city,sum(quantity) FROM dealer GROUP BY ALL;
+-----+----------+-----------------+
| id | city | `sum`(quantity) |
+-----+----------+-----------------+
| 100 | Fremont | 32 |
| 200 | Dublin | 33 |
| 300 | San Jose | 13 |
+-----+----------+-----------------+
Case 2
grouping_set
is part of the GROUP BY
clause, used to specify grouping sets. It can contain a single expression or multiple expressions, separated by commas.
Example:
-- Group by city and car model, and calculate the total quantity for each group
SELECT city, car_model, sum(quantity) AS sum
FROM dealer
GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
ORDER BY city;
+----------+--------------+-----+
| city | car_model | sum |
+----------+--------------+-----+
| null | Honda Accord | 33 |
| null | null | 78 |
| null | Honda CRV | 10 |
| null | Honda Civic | 35 |
| Dublin | Honda Accord | 10 |
| Dublin | Honda CRV | 3 |
| Dublin | null | 33 |
| Dublin | Honda Civic | 20 |
| Fremont | Honda CRV | 7 |
| Fremont | Honda Civic | 10 |
| Fremont | Honda Accord | 15 |
| Fremont | null | 32 |
| San Jose | Honda Accord | 8 |
| San Jose | null | 13 |
| San Jose | Honda Civic | 5 |
+----------+--------------+-----+
Case 3
The ROLLUP
clause is used to specify multiple levels of aggregation in a single query, and it is a shorthand form of GROUPING SETS
. For example, ROLLUP(city, car_model)
is equivalent to GROUPING SETS((city, car_model), (city), ())
.
Example:
```sql
-- Calculate the total quantity of each car model in each city, as well as the total quantity in each city and the overall total quantity
SELECT city, car_model, sum(quantity) AS sum FROM dealer
GROUP BY ROLLUP(city, car_model) ;
+----------+--------------+-----+
| city | car_model | sum |
+----------+--------------+-----+
| Fremont | Honda Accord | 15 |
| San Jose | Honda Civic | 5 |
| Fremont | null | 32 |
| Dublin | null | 33 |
| San Jose | Honda Accord | 8 |
| San Jose | null | 13 |
| Fremont | Honda Civic | 10 |
| null | null | 78 |
| Fremont | Honda CRV | 7 |
| Dublin | Honda Civic | 20 |
| Dublin | Honda Accord | 10 |
| Dublin | Honda CRV | 3 |
+----------+--------------+-----+
Case 4
The CUBE
clause is used to perform aggregation based on all possible combinations of the grouping columns specified in the GROUP BY
clause. It is a shorthand form of GROUPING SETS
. For example, CUBE(city, car_model)
is equivalent to GROUPING SETS((city, car_model), (city), (car_model), ())
.
Example:
```sql
-- Calculate the total quantity of each car model in each city, the total quantity in each city, the total quantity of each car model, and the overall total quantity
SELECT city, car_model, sum(quantity) AS sum FROM dealer
GROUP BY city, car_model WITH CUBE;
+----------+--------------+-----+
| city | car_model | sum |
+----------+--------------+-----+
| Fremont | null | 32 |
| null | Honda CRV | 10 |
| Dublin | null | 33 |
| San Jose | null | 13 |
| Dublin | Honda Civic | 20 |
| San Jose | Honda Civic | 5 |
| null | Honda Civic | 35 |
| Dublin | Honda Accord | 10 |
| San Jose | Honda Accord | 8 |
| null | Honda Accord | 33 |
| null | null | 78 |
| Fremont | Honda Accord | 15 |
| Fremont | Honda CRV | 7 |
| Fremont | Honda Civic | 10 |
| Dublin | Honda CRV | 3 |
+----------+--------------+-----+