Lakehouse File Batch Import/Export Guide (COPY INTO)

Overview

COPY INTO is a high-performance batch data import/export command provided by Singdata Lakehouse. It supports batch loading of CSV, Parquet, JSON, and other format files from Volumes or external storage into tables, as well as exporting table data to files. This guide categorizes usage by business scenario to help you quickly master efficient data transfer methods.


SQL Commands Covered

CommandPurposeApplicable Scenario
COPY INTO table FROM VOLUME ...Import files from VolumeBatch load CSV/Parquet/JSON
COPY INTO VOLUME ... FROM tableExport table data to VolumeData backup, external system exchange
load_history('table_name')View load historyAudit import records and file status

Prerequisites

The following examples use a simulated target table copy_target and hypothetical Volume files:

-- Create target table
CREATE TABLE IF NOT EXISTS copy_target (
    id INT,
    name STRING,
    value DOUBLE
);

Import CSV from Volume

Use COPY INTO to batch import CSV files from a User Volume.

-- Import a CSV file (with header)
COPY INTO copy_target
FROM VOLUME 'volume:user//~/data.csv'
USING CSV OPTIONS ('header' = 'true');

Parameter Descriptions:

  • header = 'true': Skip the first header row.
  • delimiter = ',': Specify the delimiter (default is comma).

Import Parquet from Volume

Parquet is a columnar storage format whose import performance typically exceeds CSV.

-- Import a Parquet file
COPY INTO copy_target
FROM VOLUME 'volume:user//~/data.parquet'
USING PARQUET;

Advantages:

  • Automatic schema inference, no manual column definition required.
  • High compression ratio, faster network transfer and parsing.

Export Table Data to Volume

Use COPY INTO VOLUME to export table data as files in various formats.

-- Export as a CSV file
COPY INTO VOLUME 'volume:user//~/export_data.csv'
FROM copy_target
USING CSV OPTIONS ('header' = 'true');

Applicable Scenarios:

  • Data backup to object storage
  • Provide data files for downstream systems
  • Machine learning feature export

Handle Import Errors

Use the ON_ERROR parameter to control the handling strategy when format errors are encountered.

-- Skip erroneous rows and continue import
COPY INTO copy_target
FROM VOLUME 'volume:user//~/data_with_errors.csv'
USING CSV OPTIONS ('header' = 'true')
ON_ERROR = 'SKIP';

Available Strategies:

  • SKIP (default): Skip erroneous rows and continue import.
  • ABORT: Immediately terminate import on error.
  • CONTINUE: Ignore errors, import as much as possible.

View Load History

Use the load_history function to view file import records for a table.

-- View load history
SELECT * FROM load_history('copy_target');

Returned Information:

  • file_name: Imported file name
  • file_size: File size
  • rows_loaded: Number of successfully loaded rows
  • status: Import status (SUCCESS / PARTIAL / FAILED)

Clean Up Test Data

After completing import/export verification, it is recommended to clean up test tables:

-- Drop test table
DROP TABLE IF EXISTS copy_target;

Important Notes

  1. Idempotency: COPY INTO records imported files; repeated execution will not re-import the same files.
  2. Schema Matching: The column order in imported files must match the table definition, or use MATCH_BY_COLUMN_NAME = 'CASE_SENSITIVE' to match by name.
  3. Large File Splitting: Split very large files into multiple smaller files (e.g., 100MB-1GB) to improve parallel import performance.
  4. Permission Requirements: Executing COPY INTO requires INSERT permission on the target table and READ permission on the Volume.
  5. Export Overwrite: By default, exporting to an existing path appends files. Use COPY OVERWRITE INTO to overwrite.