Batch Upload Data Using Python SDK
This document describes how to use the BulkloadStream interface in the Python SDK to batch load data into Lakehouse. This method is suited to importing large amounts of data at once, supports custom data sources, and gives you full control over the import logic. The example below uses a local CSV file as the data source. If the data already resides in object storage, or in a source covered by Lakehouse Studio data integration, the COPY command or data integration is recommended instead.
Reference Documentation
Uploading Data with Python SDK
Application Scenarios
- Suitable for business scenarios that require batch data uploads.
- Suitable for developers who are familiar with Python and need to customize data import logic.
Usage Restrictions
- BulkloadStream does not support writing to primary key (pk) tables.
- Not suitable for frequent data upload scenarios with intervals of less than five minutes.
Use Case
This example uses the olist_order_payments_dataset from the Brazilian E-commerce public dataset.
Prerequisites
- Create the target table bulk_order_payments (a sketch of the DDL is shown after this list).
- Have INSERT permission on the target table.
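Below is a minimal sketch of the DDL for the target table referenced in the first prerequisite. The column names follow the public olist_order_payments_dataset, while the column types are assumptions, so adjust them to your data. The statement can be run in Lakehouse Studio or, assuming the connector exposes a DB-API-style cursor, via cursor.execute() on a connection built as described below.

```python
# Sketch only: column names follow the public olist_order_payments_dataset,
# and the column types are assumptions -- adjust them to match your data.
# Execute this statement before starting the upload (for example through
# a cursor on a connection built with the parameters described below).
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS bulk_order_payments (
    order_id             STRING,
    payment_sequential   INT,
    payment_type         STRING,
    payment_installments INT,
    payment_value        DOUBLE
)
"""
```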
When creating a connection with the Python SDK, the following parameters are required:

| Parameter | Required | Description |
|---|---|---|
| username | Y | Username |
| password | Y | Password |
| service | Y | Address used to connect to Lakehouse, in the form region.api.clickzetta.com. It can be found in the JDBC connection string under Lakehouse Studio Management -> Workspace. |
| instance | Y | Instance to connect to; it can also be found in the JDBC connection string under Lakehouse Studio Management -> Workspace. |
| workspace | Y | Workspace in use |
| vcluster | Y | Virtual cluster (VC) in use |
| schema | Y | Name of the schema to access |
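The sketch below shows how these parameters map onto a connection. It assumes the connector is imported from the clickzetta package and exposes a connect() function that accepts these parameters as keywords; verify the exact entry point and signature against the reference documentation.

```python
from clickzetta import connect  # assumed import path for the Lakehouse Python SDK

# Placeholder values -- replace them with your own account and workspace details.
conn = connect(
    username="your_username",
    password="your_password",
    service="your-region.api.clickzetta.com",
    instance="your_instance",
    workspace="your_workspace",
    vcluster="default",
    schema="public",
)
```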
Develop with Python Code
Use pip to install the Python package dependencies for Lakehouse. Python version 3.6 or above is required:
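The dependency is typically installed with pip install clickzetta-connector (the package name is an assumption here; confirm it against the reference documentation). Once installed, the overall upload flow looks roughly like the sketch below. The bulkload-related calls (create_bulkload_stream, open_writer, create_row, and so on) are assumptions modeled on the reference documentation, so check the exact names and signatures there before use.

```python
import csv

from clickzetta import connect  # assumed import path

# Connect using the parameters described above (placeholder values).
conn = connect(
    username="your_username", password="your_password",
    service="your-region.api.clickzetta.com", instance="your_instance",
    workspace="your_workspace", vcluster="default", schema="public",
)

# Open a bulkload stream on the target table; method and argument names
# are assumptions -- verify them against the reference documentation.
stream = conn.create_bulkload_stream(schema="public", table="bulk_order_payments")
writer = stream.open_writer(0)  # a single writer partition for this example

# Read the local CSV and write it row by row through the bulkload writer.
with open("olist_order_payments_dataset.csv", newline="") as f:
    for record in csv.DictReader(f):
        row = writer.create_row()
        row.set_value("order_id", record["order_id"])
        row.set_value("payment_sequential", int(record["payment_sequential"]))
        row.set_value("payment_type", record["payment_type"])
        row.set_value("payment_installments", int(record["payment_installments"]))
        row.set_value("payment_value", float(record["payment_value"]))
        writer.write(row)

writer.close()
stream.commit()  # data becomes visible in the target table after commit
conn.close()
```

As noted in the usage restrictions, this pattern is intended for occasional bulk uploads rather than frequent small writes.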