JOIN
JOIN is used to merge data from two or more tables based on specified conditions. Lakehouse supports the following JOIN types:
left_table_reference and right_table_reference are the two tables (or subqueries) participating in the JOIN, join_type specifies the join type, and join_criteria specifies the join condition.
JOIN Types
| Type | Description |
|---|---|
| INNER JOIN | Returns rows from both tables that satisfy the join condition (intersection); this is the default JOIN type |
| LEFT [OUTER] JOIN | Returns all rows from the left table; fills with NULL when there is no match in the right table |
| RIGHT [OUTER] JOIN | Returns all rows from the right table; fills with NULL when there is no match in the left table |
| FULL [OUTER] JOIN | Returns all rows from both tables; fills with NULL on the side with no match |
| [LEFT] SEMI JOIN | Returns rows from the left table that have a match in the right table; does not return right table columns |
| [LEFT] ANTI JOIN | Returns rows from the left table that have no match in the right table; does not return right table columns |
| CROSS JOIN | Returns the Cartesian product of both tables (all combinations of rows) |
| NATURAL JOIN | Performs an implicit equi-join on all columns with the same name in both tables; no condition needs to be specified |
JOIN Conditions
ON boolean_expression: Specifies any boolean expression as the join condition. Subqueries are not supported in JOIN conditions.USING (column_name [, ...]): Specifies one or more column names for an equi-join; these columns must exist in both tables.
Test Data Description
The following examples use two tables in the doc_test schema:
employees(id, name, dept, salary, hire_date, is_active)— 5 employee recordsdepartments(dept_id, dept_name, manager)— 3 departments
employees.dept is joined with departments.dept_name. The Dave employee belongs to the Marketing department which exists in departments, but the HR department in departments has only 1 employee, and departments also has an extra Finance department (no employees belong to it in employees) — this clearly demonstrates how each JOIN type handles unmatched rows.
Usage Examples
INNER JOIN
Returns rows where dept = dept_name matches in both employees and departments.
Result (only employees with matches in both tables):
LEFT [OUTER] JOIN
Returns all rows from employees; fills with NULL when there is no match in departments.
Result (all employees are retained; dept_id/manager are NULL if dept has no corresponding record in departments):
RIGHT [OUTER] JOIN
Returns all rows from departments; fills with NULL when there is no match in employees.
Result (all departments are retained; id/name are NULL if a department has no employees):
FULL [OUTER] JOIN
Returns all rows from both tables; fills with NULL on the side with no match.
Result (both the employee side and department side are fully retained; corresponding columns are NULL when there is no match):
[LEFT] SEMI JOIN
Returns rows from the left table that have a match in the right table; does not return right table columns.
Result (only employee information is returned, without department table columns):
[LEFT] ANTI JOIN
Returns rows from the left table that have no match in the right table; does not return right table columns.
Result (empty result when no employees have a dept that does not exist in departments):
If there are employees in employees whose department is not in departments, those employees will appear in the result.
CROSS JOIN
Returns the Cartesian product of both tables; every row is combined with every other row. Result rows = left table rows × right table rows.
Result (5 employees × 3 departments = 15 rows; first 9 rows shown here):
NATURAL JOIN
Performs an implicit equi-join based on all columns with the same name in both tables; no ON or USING condition needs to be specified.
USING Syntax
When the join columns have the same name in both tables, you can use USING instead of ON; the column appears only once in the result set:
Notes
- JOIN conditions (
ONclause) do not support subqueries. - SEMI JOIN and ANTI JOIN only return left table columns; referencing right table columns in SELECT will cause an error.
- FULL OUTER JOIN can have significant performance overhead in some distributed engines; pay attention to the execution plan when dealing with large data volumes.
- CROSS JOIN does not require an ON condition; if you accidentally write
JOINwithout anON, the optimizer will treat it as a CROSS JOIN.
