Using SQLAlchemy to Connect and Use Clickzetta Lakehouse

Introduction

SQLAlchemy is an SQL toolkit and Object-Relational Mapping (ORM) system for the Python programming language. It provides comprehensive and flexible SQL functionality for Python application development, making database operations more convenient. Clickzetta Lakehouse, as a high-performance data warehouse service, now supports SQLAlchemy, making it easier for users to perform data operations and analysis.

Installation

To use SQLAlchemy to connect to Clickzetta Lakehouse, you first need to install the clickzetta-sqlalchemy package in your Python environment. Use the following command to install: Installation command (ensure the current environment does not need to use clickzetta-sqlalchemy and clickzetta-connector, uninstall them to avoid dependency conflicts):

pip uninstall -y clickzetta-sqlalchemy clickzetta-connector && pip install clickzetta-connector-python

Configure Connection Parameters

When using SQLAlchemy to connect to Clickzetta Lakehouse, you need to provide the correct connection parameters. The format of the connection parameters is as follows:

clickzetta://<user_login_name>:<password>@<lakehouse_instance_name>.api.singdata.com/<workspace_name>?schema=<target_schema>&virtualcluster=<your_vcluster_name>

The meanings of each parameter are as follows:

  • <user_login_name>: Your Clickzetta Lakehouse login username.
  • <password>: Your Clickzetta Lakehouse login password.
  • <lakehouse_instance_name>: Your Clickzetta Lakehouse instance name.
  • <workspace_name>: Your Clickzetta Lakehouse workspace name.
  • <target_schema>: The name of the target schema you wish to access.
  • <your_vcluster_name>: The name of your virtual cluster.

Connection Example:

clickzetta://Alice:xxxx@1a2b3c4d.api.singdata.com/myworkspace?schema=public&virtualcluster=default_vc

Using Apache Superset to Connect to Clickzetta Lakehouse

In this section, we will introduce how to use Apache Superset to connect to Clickzetta Lakehouse for data queries and BI analysis.

Prerequisites

  • Ensure that the clickzetta-sqlalchemy package has been successfully installed.
  • Ensure that Apache Superset has been successfully installed and started.

Configure the Connection

  1. Open Apache Superset and go to the database list page.
  2. Click the "Add Database" button in the upper right corner and select "Other" database type.
  3. In the "SQLALCHEMY URI" field, fill in the Clickzetta Lakehouse connection parameters configured above.
  4. Click "Test Connection" to ensure the connection is successful.

Configure Superset Connection

Data Query and BI Analysis

Once the connection is successful, you can use Apache Superset for data queries and BI analysis. For example:

  1. Create a new dashboard and add chart components.
  2. In the chart configuration page, select the Clickzetta Lakehouse database connection that was just configured.
  3. Write SQL query statements, for example:
-- Example SQL query
SELECT * FROM your_table_name;
```sql
SELECT
  orders.order_id,
  orders.customer_id,
  orders.order_date,
  orders.total
FROM
  orders
WHERE
  orders.order_date BETWEEN '2022-01-01' AND '2022-12-31';
  1. Click "Execute Query" to view the query results.
  2. Adjust the chart style and configuration as needed to complete the BI analysis.

Superset Data Query and BI Analysis

By following the above steps, you can easily use Apache Superset to connect to Clickzetta Lakehouse for data querying and BI analysis.