Description
The INSERT INTO
statement is used to insert data into a table. You can explicitly specify values for each column in the table, or use the result of a SELECT
query as the data source for the insertion.
Syntax
Parameter Details
- INSERT INTO: Insert data in append mode.
- INSERT OVERWRITE: Delete the data in the target table first, then insert new data.
- For partitioned tables, it will overwrite the data of the specified partition.
- For non-partitioned tables, it will overwrite the entire table's data.
- TABLE: (Optional) Keyword used to specify the target table.
- partition_spec: (Optional) Partition specification used to specify the partition for inserting data.
- Static partition: Directly specify the value of the partition column, e.g.,
PARTITION (dt='shanghai')
. - Dynamic partition: The system automatically maps the values of the
VALUES
orSELECT
statement to the corresponding partition. - If no partition specification is provided, the system will automatically select the partition based on the values of the partition columns.
- Static partition: Directly specify the value of the partition column, e.g.,
- column_list: Specify the columns to insert data into, ensuring that the order of the columns in the input query matches the order of the columns in the table.
User Guide
-
Data Type Matching: Ensure that the data types being inserted match the column types defined in the table.
-
Query Result Matching: When using a
SELECT
statement to insert data, the number and order of columns returned by the query should match the columns in the target table. -
Partition Specification: When inserting data into a partitioned table, if no partition specification is provided, the system will automatically select the partition based on the values of the partition columns. Ensure that the data being inserted contains valid partition values.
-
INSERT OVERWRITE: When using this statement to insert data, ensure that the target table or partition exists, otherwise the operation will fail.
-
Data Check: Before performing the insert operation, check whether the data types and the number of columns match the target table to avoid data insertion errors. Note that Apache Hive requires partition columns to be in the last position. There is no such mandatory requirement, so when adding columns, be particularly careful to specify the position, otherwise it may cause data errors.
-
Automatic Partition Handling: When using a table with partition fields as functions, there is no need to specify the
PARTITION
clause, the system will automatically handle the partition based on the function's return value. -
Bulk Data Import: In a lakehouse environment, it is not recommended to use the
INSERT INTO...VALUES
method for bulk data import, as this method is more suitable for testing scenarios. For bulk data import, please refer to the Data Import Guide.
Usage Example
Insert into a Regular Table
Assume the test
table contains two columns c1
, c2
.
- Import a row of data into the
test
table
The first and second statements have the same effect. When the target column is not specified, the column order in the table is used as the default target column. The third statement does not specify c2
, so null is used to fill it. If the c2
column has a default value, the default value is used to fill it.
- Import multiple rows of data into the
test
table at once
其中第一条、第二条语句效果一样,向test
表中一次性导入两条数据 第三条语句未指定c2
则使用null来填充,默认值向test
表中导入两条数据
- Import a query result into the
test
table
Insert Data into Partition Table
Specify Column Order When Inserting Data
Using INSERT OVERWRITE to Insert Data