Lakehouse File Batch Import/Export Guide (COPY INTO)
Overview
COPY INTO is a high-performance batch data import/export command provided by Singdata Lakehouse. It supports batch loading of CSV, Parquet, JSON, and other format files from Volumes or external storage into tables, as well as exporting table data to files. This guide categorizes usage by business scenario to help you quickly master efficient data transfer methods.
Quick Navigation
- Import CSV from Volume -- Batch load structured files
- Import Parquet from Volume -- Efficiently load columnar files
- Export Table Data to Volume -- Save query results as files
- Handle Import Errors -- Use ON_ERROR to control fault tolerance strategy
- View Load History -- Audit file import records
SQL Commands Covered
| Command | Purpose | Applicable Scenario |
|---|---|---|
COPY INTO table FROM VOLUME ... | Import files from Volume | Batch load CSV/Parquet/JSON |
COPY INTO VOLUME ... FROM table | Export table data to Volume | Data backup, external system exchange |
load_history('table_name') | View load history | Audit import records and file status |
Prerequisites
The following examples use a simulated target table copy_target and hypothetical Volume files:
Import CSV from Volume
Use COPY INTO to batch import CSV files from a User Volume.
Parameter Descriptions:
header = 'true': Skip the first header row.delimiter = ',': Specify the delimiter (default is comma).
Import Parquet from Volume
Parquet is a columnar storage format whose import performance typically exceeds CSV.
Advantages:
- Automatic schema inference, no manual column definition required.
- High compression ratio, faster network transfer and parsing.
Export Table Data to Volume
Use COPY INTO VOLUME to export table data as files in various formats.
Applicable Scenarios:
- Data backup to object storage
- Provide data files for downstream systems
- Machine learning feature export
Handle Import Errors
Use the ON_ERROR parameter to control the handling strategy when format errors are encountered.
Available Strategies:
SKIP(default): Skip erroneous rows and continue import.ABORT: Immediately terminate import on error.CONTINUE: Ignore errors, import as much as possible.
View Load History
Use the load_history function to view file import records for a table.
Returned Information:
file_name: Imported file namefile_size: File sizerows_loaded: Number of successfully loaded rowsstatus: Import status (SUCCESS / PARTIAL / FAILED)
Clean Up Test Data
After completing import/export verification, it is recommended to clean up test tables:
Important Notes
- Idempotency:
COPY INTOrecords imported files; repeated execution will not re-import the same files. - Schema Matching: The column order in imported files must match the table definition, or use
MATCH_BY_COLUMN_NAME = 'CASE_SENSITIVE'to match by name. - Large File Splitting: Split very large files into multiple smaller files (e.g., 100MB-1GB) to improve parallel import performance.
- Permission Requirements: Executing
COPY INTOrequiresINSERTpermission on the target table andREADpermission on the Volume. - Export Overwrite: By default, exporting to an existing path appends files. Use
COPY OVERWRITE INTOto overwrite.
