Set Operations (UNION / INTERSECT / EXCEPT)
Description
Set operations are used to combine the results of multiple SELECT queries into a single result set. Lakehouse supports the following set operations:
| Operation | Description |
|---|---|
UNION | Returns the union of two query results, with duplicates automatically removed |
UNION ALL | Returns the union of two query results, retaining duplicate rows |
INTERSECT | Returns the intersection of two query results, with duplicates automatically removed |
INTERSECT ALL | Returns the intersection of two query results, retaining duplicate rows |
EXCEPT | Returns rows present in the first result but not in the second, with duplicates automatically removed |
EXCEPT ALL | Returns rows present in the first result but not in the second, retaining duplicate rows |
MINUS | Synonym for EXCEPT |
Syntax
Key Rules:
- When
ALLorDISTINCTis not specified, the default isDISTINCT(duplicates removed). - Multiple set operations can be chained and are evaluated from left to right.
INTERSECThas higher precedence thanUNIONandEXCEPT. That is,A UNION B INTERSECT Cis equivalent toA UNION (B INTERSECT C).ORDER BYandLIMITclauses apply to the entire result set and must be written after the lastSELECT.
Requirements
- All
SELECTstatements participating in set operations must return the same number of columns. - Columns in corresponding positions must have compatible data types.
- The column names of the result set are taken from the first
SELECTstatement.
Usage Examples
UNION — Merge with Deduplication
Result (the duplicate row (1, 'a') is kept only once):
UNION ALL — Retain Duplicate Rows
Result:
INTERSECT — Get Intersection
Result:
EXCEPT / MINUS — Get Difference
Result:
MINUS is completely equivalent to EXCEPT:
The result is the same (returns 1).
Chained Set Operations with ORDER BY
Result (UNION ALL retains all rows, including duplicates):
Business Table Example — Merging Active and Inactive Employees
INTERSECT — Find Employees in Two Departments
Result (no employee with the same name exists in both departments, returns an empty set):
Notes
UNION(withoutALL) deduplicates the result set, which has a higher performance overhead thanUNION ALL. If deduplication is not needed,UNION ALLis recommended.- When using
ORDER BYorLIMITin a subquery, wrap that subquery in parentheses. MINUSis an alias forEXCEPTwith identical behavior; they are interchangeable.- When
INTERSECT ALLandEXCEPT ALLretain duplicate rows, the number of duplicates is the smaller of the two result sets.
