Lakehouse Batch Insert Guide

Overview

Inserting data into tables is a fundamental operation in data warehouse construction. Singdata Lakehouse supports multiple data insertion methods, including single-row inserts, multi-row inserts, query result inserts, and overwrite inserts. This guide categorizes insertion methods by business scenario to help you quickly master efficient data loading techniques.


SQL Commands Covered

CommandPurposeApplicable Scenario
INSERT INTO ... VALUESInsert literal dataTest data, configuration tables, small record counts
INSERT INTO ... SELECTInsert query resultsETL data transfer, batch import
INSERT OVERWRITEOverwrite writeRefresh partition data, rebuild table data

Prerequisites

The following examples use a simulated product table products:

-- Create test table
CREATE TABLE IF NOT EXISTS products (
    product_id INT,
    product_name STRING,
    category STRING,
    price DOUBLE,
    stock INT
);

Single-Row and Multi-Row Insert

Use the VALUES clause to insert one or more rows. Suitable for test data or small batches of configuration data.

-- Insert a single row
INSERT INTO products VALUES (1, 'iPhone 15', 'Phone', 8000, 100);

-- Insert multiple rows
INSERT INTO products VALUES
(2, 'MacBook Pro', 'Laptop', 15000, 50),
(3, 'AirPods', 'Audio', 1200, 200);

Verify Results:

SELECT * FROM products ORDER BY product_id;
product_idproduct_namecategorypricestock
1iPhone 15Phone8000100
2MacBook ProLaptop1500050
3AirPodsAudio1200200

Query Result Insert

Use INSERT INTO ... SELECT to batch insert query results into the target table, the most common data writing method in ETL pipelines.

-- Create target table
CREATE TABLE IF NOT EXISTS phone_products (
    product_id INT,
    product_name STRING,
    price DOUBLE
);

-- Filter and insert from source table
INSERT INTO phone_products
SELECT product_id, product_name, price
FROM products
WHERE category = 'Phone';

Verify Results:

SELECT * FROM phone_products;
product_idproduct_nameprice
1iPhone 158000

Overwrite Insert

Use INSERT OVERWRITE to overwrite existing data in the target table (or partition). Commonly used for daily data refreshes or rebuilding a specific partition.

-- Overwrite entire table data
INSERT OVERWRITE TABLE products
SELECT * FROM products WHERE stock > 0;

Specified Column Insert

When only certain columns need to be inserted, explicitly specify column names. Unspecified columns will use default values or NULL.

-- Insert only some columns
INSERT INTO products (product_id, product_name, category)
VALUES (4, 'iPad Air', 'Tablet');

Verify Results:

SELECT product_id, product_name, category, price, stock 
FROM products 
WHERE product_id = 4;
product_idproduct_namecategorypricestock
4iPad AirTabletNULLNULL

Clean Up Test Data

After completing insert verification, it is recommended to clean up test tables:

-- Drop test tables
DROP TABLE IF EXISTS products;
DROP TABLE IF EXISTS phone_products;

Important Notes

  1. Large Batch Import: For imports exceeding ten thousand rows, use COPY INTO from a Volume or use data sync tasks.
  2. Type Matching: Data types in VALUES must match the table definition. Types like DATE, TIME, JSON require literal syntax (e.g., DATE '2024-06-01').
  3. Dynamic Table Limitation: Dynamic Tables do not support direct INSERT INTO; data is refreshed automatically by upstream table changes.
  4. Transactionality: A single INSERT operation is atomic -- either all rows succeed or all fail.