SQL Insert Guide

Description

The INSERT INTO statement is used to insert data into a table. You can explicitly specify values for each column in the table, or use the result of a SELECT query as the data source for the insertion.

Syntax


INSERT INTO|OVERWRITE [TABLE] table_name 
[ PARTITION partition_spec] 
[ (column1, column2, ...)]
{VALUES(value1 [,...],(value2 [,...]),...) | subquery}

partition_spec ::=
    partition_column_name = partition_column_val [ , ... ]

Parameter Details

INSERT INTO: Insert data in append mode.
INSERT OVERWRITE: Delete the data in the target table first, then insert new data.
- For partitioned tables, it will overwrite the data of the specified partition.
- For non-partitioned tables, it will overwrite the entire table's data.
TABLE: (Optional) Keyword used to specify the target table.
partition_spec: (Optional) Partition specification used to specify the partition for inserting data.
- Static partition: Directly specify the value of the partition column, e.g., PARTITION (dt='shanghai').
- Dynamic partition: The system automatically maps the values of the VALUES or SELECT statement to the corresponding partition.
- If no partition specification is provided, the system will automatically select the partition based on the values of the partition columns.
column_list: Specify the columns to insert data into, ensuring that the order of the columns in the input query matches the order of the columns in the table.

User Guide

Data Type Matching: Ensure that the data types being inserted match the column types defined in the table.
Query Result Matching: When using a SELECT statement to insert data, the number and order of columns returned by the query should match the columns in the target table.
Partition Specification: When inserting data into a partitioned table, if no partition specification is provided, the system will automatically select the partition based on the values of the partition columns. Ensure that the data being inserted contains valid partition values.
INSERT OVERWRITE: When using this statement to insert data, ensure that the target table or partition exists, otherwise the operation will fail.
Data Check: Before performing the insert operation, check whether the data types and the number of columns match the target table to avoid data insertion errors. Note that Apache Hive requires partition columns to be in the last position. There is no such mandatory requirement, so when adding columns, be particularly careful to specify the position, otherwise it may cause data errors.
Automatic Partition Handling: When using a table with partition fields as functions, there is no need to specify the PARTITION clause, the system will automatically handle the partition based on the function's return value.
Bulk Data Import: In a lakehouse environment, it is not recommended to use the INSERT INTO...VALUES method for bulk data import, as this method is more suitable for testing scenarios. For bulk data import, please refer to the Data Import Guide.

Usage Example

Insert into a Regular Table Assume the test table contains two columns c1, c2.

Import a row of data into the test table


INSERT INTO test VALUES (1, 2);
INSERT INTO test (c1, c2) VALUES (1, 2);
INSERT INTO test (c1) VALUES (1);

The first and second statements have the same effect. When the target column is not specified, the column order in the table is used as the default target column. The third statement does not specify c2, so null is used to fill it. If the c2 column has a default value, the default value is used to fill it.

Import multiple rows of data into the test table at once


INSERT INTO test VALUES (3, 5), (5, 5 + 2);
INSERT INTO test (c1, c2) VALUES (6, 7), (6, 7* 2);
INSERT INTO test (c1) VALUES (1), (3);

其中第一条、第二条语句效果一样，向test表中一次性导入两条数据第三条语句未指定c2则使用null来填充，默认值向test表中导入两条数据

Import a query result into the test table


INSERT INTO test SELECT * FROM test2;
INSERT INTO test (c1, c2) SELECT * from test2;

Insert Data into Partition Table


-- Create partitioned table
CREATE TABLE sales (
  id BIGINT,
  name VARCHAR(64),
  dt STRING
)
PARTITIONED BY (dt);

-- Insert a single row of data
INSERT INTO sales VALUES (1001, 'Alice', '2021-01-01');

-- Insert multiple rows of data
INSERT INTO sales VALUES (1002, 'Bob', '2021-01-02'), (1003, 'Charlie', '2021-01-03');

-- Insert data into a specified partition
INSERT INTO sales PARTITION (dt='2021-01-04') (id, name) VALUES (1004, 'David');

-- Insert data using SELECT statement
INSERT INTO sales SELECT * FROM sales_temp;

Specify Column Order When Inserting Data


-- Assume the table structure is (id, name, dt)
CREATE TABLE sales (id BIGINT, name VARCHAR(64), dt STRING);

-- Only specify the id and name columns when inserting data. The unspecified columns will display null
INSERT INTO sales (id, name) VALUES (1005, 'Eve');
+------+------+----+
|  id  | name | dt |
+------+------+----+
| 1005 | Eve  |    |
+------+------+----+

Using INSERT OVERWRITE to Insert Data


-- Create partitioned table
CREATE TABLE sales (
  id BIGINT,
  name VARCHAR(64),
  dt STRING
)
PARTITIONED BY (dt);

-- Insert a single row of data
INSERT OVERWRITE sales VALUES (1001, 'Alice', '2021-08-01');

-- Insert multiple rows of data
INSERT OVERWRITE sales VALUES (1002, 'Bob', '2021-08-02'), (1003, 'Charlie', '2021-08-03');

-- Insert data into a specified partition
INSERT OVERWRITE sales PARTITION (dt='2021-08-04') VALUES (1004, 'David', '2021-08-04');

-- Insert data using query results
INSERT OVERWRITE sales SELECT * FROM sales_temp;