DataX ClickZettaWriter Plugin

DataX Introduction

DataX is an open-source data synchronization tool from Alibaba that supports many data sources, including relational databases, HDFS, Hive, MaxCompute, HBase, FTP, and local files. This document describes how to use the DataX ClickZettaWriter plugin to synchronize data from these sources into ClickZetta LakeHouse.

Preparations

  1. Make sure DataX is installed. For installation instructions, refer to the DataX User Guide.
  2. Download the DataX ClickZettaWriter plugin from the following address: DataX ClickzettaWriter Plugin. Unzip the plugin into the plugin/writer directory under the DataX installation directory.
  3. Before using the DataX ClickZettaWriter plugin, make sure the target table already exists in ClickZetta LakeHouse.
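After unzipping, you can check that the plugin landed where DataX expects it. The sketch below is only an illustration and makes two assumptions: the unzipped folder is named after the writer's "name" value (clickzettawriter), and /opt/datax is a hypothetical installation directory. The helper names are also hypothetical.

```python
from pathlib import Path


def plugin_dir(datax_home, plugin_name="clickzettawriter"):
    """Expected location of the unzipped writer plugin.

    Assumption: the folder is named after the writer's "name" in job.json.
    """
    return Path(datax_home) / "plugin" / "writer" / plugin_name


def is_plugin_installed(datax_home, plugin_name="clickzettawriter"):
    """Return True if <datax_home>/plugin/writer/<plugin_name> is a directory."""
    return plugin_dir(datax_home, plugin_name).is_dir()


if __name__ == "__main__":
    home = "/opt/datax"  # hypothetical DataX installation directory
    print(plugin_dir(home), is_plugin_installed(home))
```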

Using the DataX ClickZettaWriter Plugin

1. Create Configuration File

The following example demonstrates how to use the DataX ClickZettaWriter plugin to synchronize MySQL data to ClickZetta LakeHouse.

{
  "job": {
    "content": [
      {
        "reader": {
            "name": "mysqlreader",
            "parameter": {
                "column": ["*"],
                "connection": [
                    {
                        "jdbcUrl": ["jdbc:mysql://mysql_host:mysql_port/database?useSSL=false"],
                        "table": ["test_table"]
                    }
                ],
                "password": "example",
                "username": "example",
                "where": ""
            }
        },
        "writer": {
          "name": "clickzettawriter",
          "parameter": {
              "column": ["*"],
              "connection": [
                  {
                      "jdbcUrl": "jdbc:clickzetta://instance.service/workspace?schema=example&username=example&password=example&vcluster=example",
                      "table": ["test_table"]
                  }
              ],
              "password": "example",
              "username": "example",
              "preSql": [],
              "postSql": [],
              "writeMode": "overwrite",
              "tableNumber": "1",
              "partitionColumns": {
                  "region" : "example"
              }
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
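If you generate job files from scripts instead of writing them by hand, the same structure can be assembled as a Python dictionary and serialized with the standard json module. This is a minimal sketch that mirrors the sample above. The function and argument names are hypothetical, and omitting partitionColumns for non-partitioned tables is an assumption.

```python
import json


def build_job(reader_jdbc_url, reader_table, reader_user, reader_password,
              writer_jdbc_url, writer_table, writer_user, writer_password,
              write_mode="append", partition_columns=None, channel=1):
    """Assemble a MySQL -> ClickZetta LakeHouse DataX job as a dict (hypothetical helper)."""
    writer_parameter = {
        "column": ["*"],
        # In the sample, the writer's jdbcUrl is a plain string...
        "connection": [{"jdbcUrl": writer_jdbc_url, "table": [writer_table]}],
        "username": writer_user,
        "password": writer_password,
        "preSql": [],
        "postSql": [],
        "writeMode": write_mode,
        "tableNumber": "1",
    }
    # Assumption: partitionColumns is only needed for partitioned tables.
    if partition_columns:
        writer_parameter["partitionColumns"] = dict(partition_columns)

    reader_parameter = {
        "column": ["*"],
        # ...while mysqlreader expects jdbcUrl as a list.
        "connection": [{"jdbcUrl": [reader_jdbc_url], "table": [reader_table]}],
        "username": reader_user,
        "password": reader_password,
        "where": "",
    }
    return {
        "job": {
            "content": [{
                "reader": {"name": "mysqlreader", "parameter": reader_parameter},
                "writer": {"name": "clickzettawriter", "parameter": writer_parameter},
            }],
            "setting": {"speed": {"channel": channel}},
        }
    }


if __name__ == "__main__":
    job = build_job(
        "jdbc:mysql://mysql_host:mysql_port/database?useSSL=false",
        "test_table", "example", "example",
        "jdbc:clickzetta://instance.service/workspace?schema=example&username=example&password=example&vcluster=example",
        "test_table", "example", "example",
        write_mode="overwrite", partition_columns={"region": "example"},
    )
    print(json.dumps(job, indent=2))  # save this output as job.json
```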

Configuration notes:

  • mysqlreader: DataX's built-in plugin for reading MySQL data. For details, refer to the mysqlreader plugin documentation.
  • clickzettawriter parameters:
    • jdbcUrl: The LakeHouse JDBC connection URL, in the form jdbc:clickzetta://<instance>.<service>/<workspace>?schema=...&username=...&password=...&vcluster=...
    • table: The name of the target table. Only one table is supported.
    • column: The columns to write; "*" means all columns.
    • partitionColumns: The partition columns, used when writing to a partitioned table, given as an object of the form {"partition_column": "value"}. The columns listed in column plus the partition columns must together make up all columns of the table.
    • writeMode: The write mode: append, overwrite, or upsert. Defaults to append.
    • username: LakeHouse username.
    • password: LakeHouse password.
    • preSql: SQL statements to be executed before writing.
    • postSql: SQL statements to be executed after writing.
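The rules above can be checked before a job is submitted. The following is a sketch, not part of the plugin: it only checks the constraints stated in this list (valid writeMode values, a single target table, a non-empty column list, and column plus partitionColumns covering the whole table). The jdbc:clickzetta:// prefix check is a heuristic based on the examples, and table_columns is a hypothetical input the caller supplies.

```python
VALID_WRITE_MODES = {"append", "overwrite", "upsert"}


def validate_writer_parameter(param, table_columns=None):
    """Return a list of problems found in a clickzettawriter "parameter" dict.

    table_columns (optional) is the full column list of the target table,
    supplied by the caller; it is used for the column/partition coverage rule.
    """
    errors = []
    mode = param.get("writeMode", "append")  # documented default
    if mode not in VALID_WRITE_MODES:
        errors.append(f"unsupported writeMode: {mode!r}")

    connections = param.get("connection", [])
    tables = [t for conn in connections for t in conn.get("table", [])]
    if len(tables) != 1:
        errors.append(f"exactly one target table is supported, got {len(tables)}")

    for conn in connections:
        # Heuristic based on the URLs in the examples.
        if not str(conn.get("jdbcUrl", "")).startswith("jdbc:clickzetta://"):
            errors.append("jdbcUrl should start with jdbc:clickzetta://")

    columns = param.get("column", [])
    if not columns:
        errors.append("column must not be empty")
    partitions = list(param.get("partitionColumns", {}))  # partition column names
    # column + partition columns must equal all table columns; skipped for "*".
    if table_columns is not None and columns != ["*"]:
        if set(columns) | set(partitions) != set(table_columns):
            errors.append("column plus partitionColumns must equal all table columns")
    return errors


if __name__ == "__main__":
    param = {
        "column": ["id", "name"],
        "connection": [{"jdbcUrl": "jdbc:clickzetta://instance.service/workspace?schema=example",
                        "table": ["test_table"]}],
        "writeMode": "overwrite",
        "partitionColumns": {"region": "example"},
    }
    print(validate_writer_parameter(param, table_columns=["id", "name", "region"]))  # []
```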

2. Execute the Synchronization Task

Save the configuration as job.json, then run the following command from the DataX installation directory:

python bin/datax.py job.json
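To launch the same command from a scheduler or another Python program, a thin wrapper around subprocess.run is enough. This is a sketch under stated assumptions: the interpreter name defaults to python as in the command above, /opt/datax is a hypothetical installation directory, a relative job file path is resolved against that directory (because the wrapper sets cwd to it), and datax.py is assumed to report failure through a non-zero exit code.

```python
import subprocess
from pathlib import Path


def datax_command(datax_home, job_file, python="python"):
    """Build the argv equivalent to `python bin/datax.py job.json`."""
    return [python, str(Path(datax_home) / "bin" / "datax.py"), str(job_file)]


def run_datax_job(datax_home, job_file, python="python"):
    """Run a DataX job from datax_home and return its exit code.

    Assumption: datax.py signals failure with a non-zero exit status.
    """
    completed = subprocess.run(datax_command(datax_home, job_file, python), cwd=datax_home)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical install location; only prints the command, does not run it.
    print(datax_command("/opt/datax", "job.json"))
```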

Usage Example

Example 1: Sync MySQL Data to ClickZetta LakeHouse

The following configuration synchronizes the MySQL table test_table into the ClickZetta LakeHouse table example_table in append mode.

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "column": ["*"],
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://mysql_host:mysql_port/database?useSSL=false"],
                "table": ["test_table"]
              }
            ],
            "password": "example",
            "username": "example",
            "where": ""
          }
        },
        "writer": {
          "name": "clickzettawriter",
          "parameter": {
            "column": ["*"],
            "connection": [
              {
                "jdbcUrl": "jdbc:clickzetta://your_instance_name.api.singdata.com/your_workspace_name?schema=sample&username=your_user_name&password=your_password&vcluster=your_vcluster_name",
                "table": ["example_table"]
              }
            ],
            "partitionColumns": {
              "region": "example"
            },
            "password": "your_password",
            "preSql": [],
            "session": [],
            "username": "your_user_name",
            "writeMode": "append",
            "tableNumber": "1"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
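The jdbcUrl in this example packs the instance, service domain, workspace, schema, credentials, and virtual cluster into a single string. A small helper can assemble it from its parts. This is a sketch: the function name is hypothetical, and percent-encoding special characters in the query values (for example in passwords) assumes the ClickZetta JDBC driver decodes them, which you should verify.

```python
from urllib.parse import quote, urlencode


def build_clickzetta_jdbc_url(instance, service, workspace, schema,
                              username, password, vcluster):
    """Assemble a jdbcUrl in the shape used in the examples:

    jdbc:clickzetta://<instance>.<service>/<workspace>?schema=...&username=...&password=...&vcluster=...
    """
    query = urlencode(
        {"schema": schema, "username": username, "password": password, "vcluster": vcluster},
        quote_via=quote,  # %-encode special characters (spaces become %20, not +)
    )
    return f"jdbc:clickzetta://{instance}.{service}/{workspace}?{query}"


if __name__ == "__main__":
    print(build_clickzetta_jdbc_url(
        "your_instance_name", "api.singdata.com", "your_workspace_name",
        "sample", "your_user_name", "your_password", "your_vcluster_name",
    ))
```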