Lakehouse Upsert Operation Guide (MERGE INTO)

Overview

In data warehouses, it is often necessary to update or insert data into a target table based on source data (i.e., Upsert operations). Singdata Lakehouse provides the MERGE INTO statement, which supports handling both matching row updates and non-matching row inserts in a single SQL statement. This guide categorizes usage by business scenario to help you quickly master efficient data merging methods.


SQL Commands Covered

CommandPurposeApplicable Scenario
MERGE INTO ... WHEN MATCHED THEN UPDATEUpdate on matchData synchronization, status updates
MERGE INTO ... WHEN NOT MATCHED THEN INSERTInsert on non-matchNew data writing
MERGE INTO ... WHEN MATCHED THEN DELETEDelete on matchClean up invalid data

Prerequisites

The following examples use a simulated inventory table inventory (target) and daily_stock (source):

-- Create target table
CREATE TABLE IF NOT EXISTS inventory (
    product_id INT,
    product_name STRING,
    stock INT,
    last_updated DATE
);

-- Create source table
CREATE TABLE IF NOT EXISTS daily_stock (
    product_id INT,
    new_stock INT,
    update_date DATE
);

-- Insert initial data
INSERT INTO inventory VALUES
(1, 'iPhone 15', 100, '2024-06-01'),
(2, 'MacBook Pro', 50, '2024-06-01');

-- Insert daily stock data
INSERT INTO daily_stock VALUES
(1, 120, '2024-06-02'),  -- Update existing product
(3, 200, '2024-06-02');  -- New product

Basic Upsert

The most common scenario: match by primary key, update if exists, insert if not.

-- Merge daily stock data into the inventory table
MERGE INTO inventory AS target
USING daily_stock AS source
ON target.product_id = source.product_id
WHEN MATCHED THEN 
    UPDATE SET 
        target.stock = source.new_stock,
        target.last_updated = source.update_date
WHEN NOT MATCHED THEN 
    INSERT (product_id, product_name, stock, last_updated)
    VALUES (source.product_id, 'New Product', source.new_stock, source.update_date);

Verify Result:

SELECT * FROM inventory ORDER BY product_id;
product_idproduct_namestocklast_updated
1iPhone 151202024-06-02
2MacBook Pro502024-06-01
3New Product2002024-06-02

Conditional Update

Perform update operations only when specific conditions are met, avoiding unnecessary writes.

-- Update only when new stock is greater than old stock
MERGE INTO inventory AS target
USING daily_stock AS source
ON target.product_id = source.product_id
WHEN MATCHED AND source.new_stock > target.stock THEN 
    UPDATE SET 
        target.stock = source.new_stock,
        target.last_updated = source.update_date
WHEN NOT MATCHED THEN 
    INSERT (product_id, product_name, stock, last_updated)
    VALUES (source.product_id, 'New Product', source.new_stock, source.update_date);

Result:

  • product_id = 1: New stock 120 > old stock 120 (already updated last time), condition not met, no update.
  • product_id = 3: No match, insert new record.

Insert New Data Only

Ignore existing records and only insert new data from the source table (similar to INSERT IGNORE).

-- Insert only non-matching records
MERGE INTO inventory AS target
USING daily_stock AS source
ON target.product_id = source.product_id
WHEN NOT MATCHED THEN 
    INSERT (product_id, product_name, stock, last_updated)
    VALUES (source.product_id, 'New Product', source.new_stock, source.update_date);

Delete Orphaned Data

When source data is marked for deletion, remove the corresponding records from the target table.

-- Assume stock = 0 in daily_stock means delisted
MERGE INTO inventory AS target
USING daily_stock AS source
ON target.product_id = source.product_id
WHEN MATCHED AND source.new_stock = 0 THEN DELETE
WHEN NOT MATCHED THEN 
    INSERT (product_id, product_name, stock, last_updated)
    VALUES (source.product_id, 'New Product', source.new_stock, source.update_date);

Multi-Condition Matching

Use complex ON conditions for matching, suitable for composite primary keys or business logic matching.

-- Match by product ID and date
MERGE INTO inventory AS target
USING daily_stock AS source
ON target.product_id = source.product_id 
   AND target.last_updated < source.update_date
WHEN MATCHED THEN 
    UPDATE SET 
        target.stock = source.new_stock,
        target.last_updated = source.update_date
WHEN NOT MATCHED THEN 
    INSERT (product_id, product_name, stock, last_updated)
    VALUES (source.product_id, 'New Product', source.new_stock, source.update_date);

Clean Up Test Data

After completing Upsert verification, it is recommended to clean up test tables:

-- Drop test tables
DROP TABLE IF EXISTS inventory;
DROP TABLE IF EXISTS daily_stock;

Important Notes

  1. Clause Order: WHEN MATCHED must come before WHEN NOT MATCHED, otherwise a syntax error occurs.
  2. Uniqueness Requirement: The ON condition should ensure each source row matches at most one target row; otherwise non-deterministic results may occur.
  3. Dynamic Table Limitation: Dynamic Tables do not support direct MERGE INTO; use DELETE + INSERT as an alternative.
  4. Performance Optimization: Ensure columns in the ON condition have indexes or partitions to significantly improve matching efficiency.