Lakehouse Upsert Operation Guide (MERGE INTO)
Overview
In data warehouses, it is often necessary to update or insert data into a target table based on source data (i.e., Upsert operations). Singdata Lakehouse provides the MERGE INTO statement, which supports handling both matching row updates and non-matching row inserts in a single SQL statement. This guide categorizes usage by business scenario to help you quickly master efficient data merging methods.
Quick Navigation
- Basic Upsert -- Update on match, insert on non-match
- Conditional Update -- Update only when specific conditions are met
- Insert New Data Only -- Ignore existing records
- Delete Orphaned Data -- Delete on match, insert on non-match
- Multi-Condition Matching -- Use complex ON conditions
SQL Commands Covered
| Command | Purpose | Applicable Scenario |
|---|---|---|
MERGE INTO ... WHEN MATCHED THEN UPDATE | Update on match | Data synchronization, status updates |
MERGE INTO ... WHEN NOT MATCHED THEN INSERT | Insert on non-match | New data writing |
MERGE INTO ... WHEN MATCHED THEN DELETE | Delete on match | Clean up invalid data |
Prerequisites
The following examples use a simulated inventory table inventory (target) and daily_stock (source):
Basic Upsert
The most common scenario: match by primary key, update if exists, insert if not.
Verify Result:
| product_id | product_name | stock | last_updated |
|---|---|---|---|
| 1 | iPhone 15 | 120 | 2024-06-02 |
| 2 | MacBook Pro | 50 | 2024-06-01 |
| 3 | New Product | 200 | 2024-06-02 |
Conditional Update
Perform update operations only when specific conditions are met, avoiding unnecessary writes.
Result:
product_id = 1: New stock 120 > old stock 120 (already updated last time), condition not met, no update.product_id = 3: No match, insert new record.
Insert New Data Only
Ignore existing records and only insert new data from the source table (similar to INSERT IGNORE).
Delete Orphaned Data
When source data is marked for deletion, remove the corresponding records from the target table.
Multi-Condition Matching
Use complex ON conditions for matching, suitable for composite primary keys or business logic matching.
Clean Up Test Data
After completing Upsert verification, it is recommended to clean up test tables:
Important Notes
- Clause Order:
WHEN MATCHEDmust come beforeWHEN NOT MATCHED, otherwise a syntax error occurs. - Uniqueness Requirement: The
ONcondition should ensure each source row matches at most one target row; otherwise non-deterministic results may occur. - Dynamic Table Limitation: Dynamic Tables do not support direct
MERGE INTO; useDELETE + INSERTas an alternative. - Performance Optimization: Ensure columns in the
ONcondition have indexes or partitions to significantly improve matching efficiency.
