March 03, 2025 — 1.0 Lakehouse Platform Product Update Release Notes

This release introduces a series of new features, enhancements, and bug fixes. The updates are being rolled out gradually to the following regions and are expected to be complete within one to two weeks of the release date; exact timing varies by region.

Alibaba Cloud Shanghai
Tencent Cloud Shanghai
Tencent Cloud Beijing
Tencent Cloud Guangzhou
AWS Beijing
International - Alibaba Cloud Singapore
International - AWS Singapore

New Features

Federated Query Updates [Preview Release]

Architecture extension: Singdata Lakehouse can now use an External Schema to map and mirror an external data source at the Catalog level, so Hive data can be queried directly. Previously only Hive on object storage was supported; this release adds support for Hive on HDFS. See EXTERNAL SCHEMA.
Enhanced Delta Lake reading: when you create an external table in Delta Lake format, the schema is now inferred automatically from the table's metadata, so you no longer need to declare the columns manually. See Delta Lake External Table.

Import/Export Updates

COPY Command Enhancement

CSV files can now use a two-character delimiter (for example, ||); previously only single-character delimiters were supported. This helps when field values themselves contain common single-character separators such as , or |. See COPY INTO Import.
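Python's standard csv module accepts only a single-character delimiter, so the following plain-Python sketch spells out what splitting a record on a two-character delimiter such as || involves, including a quoted field that itself contains ||. It illustrates the parsing rule only and is not Lakehouse's CSV reader; the quoting convention assumed here (double quotes, with a doubled "" standing for a literal quote) is a common CSV convention, not something these notes specify.

```python
def split_record(line: str, delimiter: str = "||", quote: str = '"') -> list:
    """Split one CSV record on a multi-character delimiter, honoring quoted fields."""
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote:
                # Assumed convention: a doubled quote inside a quoted field is a literal quote.
                if line.startswith(quote * 2, i):
                    current.append(quote)
                    i += 2
                    continue
                in_quotes = False  # closing quote
            else:
                current.append(ch)  # delimiters inside quotes are ordinary text
            i += 1
        elif ch == quote:
            in_quotes = True
            i += 1
        elif line.startswith(delimiter, i):
            # The whole two-character delimiter ends the current field.
            fields.append("".join(current))
            current = []
            i += len(delimiter)
        else:
            current.append(ch)  # a single | on its own is just data
            i += 1
    fields.append("".join(current))
    return fields


print(split_record('1||alice||"a||b"||x|y'))  # → ['1', 'alice', 'a||b', 'x|y']
```

Note that a lone | in the last field stays part of the value; only the full || sequence separates fields.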

Pipe Feature Enhancement

Supports direct ingestion from Kafka through a Kafka Table Stream. Ingesting through a Table Stream provides exactly-once semantics, and the Kafka connection information can be stored in a Connection.
Pipe SQL Command Optimization
- SHOW CREATE PIPE pipe_name returns the DDL statement of a Pipe.
- DESC PIPE pipe_name now displays the input and output object names of the Pipe task, along with Kafka offset consumption information.
- The ALTER command can now modify Pipe ingestion parameters, such as the compute cluster used.
Pipe ingestion from object storage: files and directories whose names start with . or _temporary can now be skipped, controlled by the parameter IGNORE_TMP_FILE=TRUE|FALSE. When it is TRUE, paths such as the following are skipped:
- s3://my_bucket/a/b/.SUCCESS
- oss://my_bucket/a/b/_temporary
- oss://my_bucket/a/b/_temporary_123/
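The filtering rule can be modeled as a simple path check. The plain-Python sketch below is an assumption-based model, not Lakehouse code: it skips an object if any component of its key (the part after the bucket) starts with . or _temporary, which covers both the file and the directory examples listed above. Whether Lakehouse inspects every path component or only some of them is not stated in these notes; the part-0000.csv path is hypothetical.

```python
from urllib.parse import urlparse


def is_ignored(uri: str, ignore_tmp_file: bool = True) -> bool:
    """Model of IGNORE_TMP_FILE: True if the object would be skipped."""
    if not ignore_tmp_file:
        return False
    # Skip the bucket (netloc) and check each component of the object key.
    parts = [p for p in urlparse(uri).path.split("/") if p]
    return any(p.startswith(".") or p.startswith("_temporary") for p in parts)


paths = [
    "s3://my_bucket/a/b/.SUCCESS",
    "oss://my_bucket/a/b/_temporary",
    "oss://my_bucket/a/b/_temporary_123/",
    "oss://my_bucket/a/b/part-0000.csv",  # hypothetical data file
]
print([p for p in paths if not is_ignored(p)])  # → ['oss://my_bucket/a/b/part-0000.csv']
```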

Compute Cluster

Fine-Grained Resource Control

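Added a per-job resource ratio setting for GP-type compute clusters. QUERY_RESOURCE_LIMIT_RATIO caps the share of the cluster's resources that a single job can use, which prevents resource contention caused by large queries and improves cluster stability. For example, the following statement limits each job on sample_vc to at most 10% of the cluster's resources: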

ALTER VCLUSTER sample_vc SET QUERY_RESOURCE_LIMIT_RATIO='0.1';

SQL Syntax

Supports creating SQL FUNCTION: you can now define custom SQL functions with SQL DDL (Data Definition Language) statements, making data processing and analysis more flexible.
Lakehouse officially launches Column-level Security, which provides fine-grained control over sensitive data through Dynamic Data Masking. Administrators can dynamically hide, partially display, or replace sensitive values in a column (such as ID numbers or credit card numbers) based on user roles or attributes, protecting data privacy. To apply dynamic masking to a column, use the following SQL statement:

ALTER TABLE <table_name> MODIFY COLUMN <column_name> SET MASKING POLICY <policy_name>;

Once a masking policy is defined and attached to a column, it is applied wherever that column appears during query execution, and the data is masked according to the policy's conditions, such as the role of the SQL execution context or the current user.
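Conceptually, a masking policy is a function from a column value plus the query context to the value the user actually sees. The plain-Python sketch below models one such policy for a credit card column; the role names, the last-four-digits rule, and the MaskingContext type are all hypothetical, and none of this is Lakehouse's policy syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskingContext:
    """Hypothetical query context seen by a policy: current user and role."""
    user: str
    role: str


def mask_card_number(value: str, ctx: MaskingContext) -> str:
    """Hypothetical masking policy for a credit card column."""
    if ctx.role == "auditor":
        return value  # show the value in full
    if ctx.role == "analyst" and len(value) > 4:
        return "*" * (len(value) - 4) + value[-4:]  # partially display
    return "*" * len(value)  # hide everything for all other roles


card = "4111111111111111"
for ctx in (MaskingContext("ann", "auditor"),
            MaskingContext("bob", "analyst"),
            MaskingContext("eve", "guest")):
    print(ctx.role, mask_card_number(card, ctx))
```

Because the policy is bound to the column rather than to a particular query, the same function would run at every place the column is referenced, which is the behavior described above.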

Storage optimization: the advanced parameters of the OPTIMIZE command can now be used to compact small files.
DML Enhancement: UPDATE now supports ORDER BY ... LIMIT syntax.
[Preview Release] Multi-dialect compatibility: through SQLGlot integration, Lakehouse supports a subset of the syntax of dialects such as PostgreSQL, MySQL, Hive, and Presto.
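The UPDATE ... ORDER BY ... LIMIT form listed above restricts an update to the first n matching rows in the given order, for example to claim only the oldest pending tasks. The sketch below models that semantics over a list of Python dicts; the tasks table, its columns, and the helper function are hypothetical, and this is not how Lakehouse executes the statement.

```python
from typing import Any, Callable


def update_order_by_limit(rows: list, where: Callable[[dict], bool],
                          order_key: Callable[[dict], Any], limit: int,
                          assign: Callable[[dict], None]) -> int:
    """Model of UPDATE ... WHERE ... ORDER BY ... LIMIT n; returns rows updated."""
    matching = sorted((r for r in rows if where(r)), key=order_key)
    for row in matching[:limit]:
        assign(row)  # rows are dicts, so the update is applied in place
    return len(matching[:limit])


# Roughly: UPDATE tasks SET status = 'claimed'
#          WHERE status = 'pending' ORDER BY created_at LIMIT 2
tasks = [
    {"id": 1, "status": "pending", "created_at": 30},
    {"id": 2, "status": "done", "created_at": 10},
    {"id": 3, "status": "pending", "created_at": 20},
    {"id": 4, "status": "pending", "created_at": 40},
]
n = update_order_by_limit(tasks, lambda r: r["status"] == "pending",
                          lambda r: r["created_at"], 2,
                          lambda r: r.update(status="claimed"))
print(n, [t["id"] for t in tasks if t["status"] == "claimed"])  # → 2 [1, 3]
```

Only the two oldest pending rows change; row 4 is also pending but falls outside the limit.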

UDF Feature

Added the cz.sql.remote.udf.lookup.policy configuration parameter, which controls whether UDFs or built-in functions take precedence during function name resolution and can be switched at runtime with SET.

-- Policy 1: Prioritize built-in functions (compatible with traditional OLAP system behavior)
SET cz.sql.remote.udf.lookup.policy = builtin_first;
-- Policy 2: Prioritize UDFs (compatible with MC/Spark job scenarios)
SET cz.sql.remote.udf.lookup.policy = udf_first;
-- Default policy: UDFs must be prefixed with a Schema name (maintains historical compatibility)
SET cz.sql.remote.udf.lookup.policy = schema_only;
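The three policies differ in how a function name is resolved when a built-in function and a UDF share it. The plain-Python model below encodes one reading of the settings, under stated assumptions: an unqualified UDF lookup searches the current schema, builtin_first and udf_first fall back to the other kind when the preferred one is missing, and a schema-qualified name always refers to a UDF. The registries and function names are hypothetical, and Lakehouse's actual error behavior may differ.

```python
from typing import Optional

# Hypothetical function registries for illustration only.
BUILTINS = {"upper", "lower"}
UDFS = {("my_schema", "upper"), ("my_schema", "clean_text")}


def resolve(name: str, policy: str, schema: Optional[str] = None,
            current_schema: str = "my_schema") -> str:
    """Model which function a call resolves to under a lookup policy."""
    if schema is not None:
        # Assumption: a schema-qualified name always refers to a UDF.
        if (schema, name) in UDFS:
            return f"udf:{schema}.{name}"
        raise LookupError(f"UDF {schema}.{name} not found")

    udf_hit = (current_schema, name) in UDFS
    builtin_hit = name in BUILTINS
    if policy == "builtin_first":
        candidates = [("builtin", builtin_hit), ("udf", udf_hit)]
    elif policy == "udf_first":
        candidates = [("udf", udf_hit), ("builtin", builtin_hit)]
    elif policy == "schema_only":
        # Unqualified names never match UDFs under this policy.
        candidates = [("builtin", builtin_hit)]
    else:
        raise ValueError(f"unknown policy {policy!r}")

    for kind, hit in candidates:
        if hit:
            return f"builtin:{name}" if kind == "builtin" else f"udf:{current_schema}.{name}"
    raise LookupError(f"function {name!r} not found")


for policy in ("builtin_first", "udf_first", "schema_only"):
    print(policy, resolve("upper", policy))
```

Under this model, only udf_first lets an unqualified upper() pick up the same-named UDF; schema_only keeps historical behavior by requiring my_schema.upper for the UDF.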

Permission Management [Preview Release]

Added Instance-level roles and cross-workspace authorization: roles can now be created at the Instance level and granted global permissions, so permissions can be managed in one place across Workspaces. This supports fine-grained access control when multiple teams collaborate.

Functions

New function list:

Function	Description
collect_list_on_array	Collects elements from input arrays into a new array and returns the new array.
collect_set_on_array	Extracts distinct elements from input array expressions and combines them into a new array.
str_to_date_mysql	Converts a string to a date, with implementation compatible with MySQL's `str_to_date` function.
make_date	Constructs a date type from year, month, and day.
to_start_of_interval	Truncates the timestamp `ts` down to the start of the specified `interval`. When `interval` is in minutes, it must divide evenly into one day (1,440 minutes).
json_remove	Removes elements matching `jsonPath` from `jsonObject` and returns the remaining elements.
element_at	Extracts the element at a specified position from an array or a specified key from a map.
map_from_arrays	Creates a map from two arrays, where keys and values correspond one-to-one in the order of the parameter arrays.
endswith	Returns `TRUE` if a string or binary expression ends with another specified string or binary expression, otherwise `FALSE`. Useful for string processing and pattern matching.
format_string	Returns a string formatted according to a `printf`-style format string.
is_ascii	Determines whether `str` contains only ASCII-encoded characters.
is_utf8	Determines whether `str` contains only UTF-8 encoded characters.
regexp_extract_all	Extracts all substrings from a string that match a regular expression.
sha1	Computes the SHA-1 hash value of a given string.
startswith	Returns `TRUE` if a string or binary expression starts with another specified string or binary expression, otherwise `FALSE`. Useful for string processing and pattern matching.
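To make a few of the less familiar semantics concrete, the following plain-Python sketches mirror the table's descriptions of to_start_of_interval (minute intervals only), collect_set_on_array, and map_from_arrays. They are reference models written from those descriptions, not Lakehouse's implementations; aligning buckets to the timestamp's own midnight, keeping first-seen order in collect_set_on_array, and last-wins handling of duplicate keys in map_from_arrays are assumptions.

```python
from collections.abc import Hashable, Iterable
from datetime import datetime, timedelta


def to_start_of_interval_minutes(ts: datetime, minutes: int) -> datetime:
    """Floor ts to a minute-based interval, with buckets aligned to ts's midnight."""
    if minutes <= 0 or (24 * 60) % minutes != 0:
        # An interval that does not divide a day would produce a bucket straddling midnight.
        raise ValueError("a minute interval must divide evenly into 1440 minutes")
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    step = timedelta(minutes=minutes)
    return midnight + ((ts - midnight) // step) * step


def collect_set_on_array(arrays: Iterable[Iterable[Hashable]]) -> list:
    """Distinct elements across all input arrays, in first-seen order (assumed)."""
    seen = {}
    for arr in arrays:
        for item in arr:
            seen.setdefault(item, None)  # dict keys keep insertion order
    return list(seen)


def map_from_arrays(keys: list, values: list) -> dict:
    """Pair keys[i] with values[i]; duplicate keys keep the last value here (assumed)."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    return dict(zip(keys, values))


print(to_start_of_interval_minutes(datetime(2025, 3, 3, 10, 47, 30), 15))  # → 2025-03-03 10:45:00
print(collect_set_on_array([[1, 2], [2, 3], [3, 1]]))                    # → [1, 2, 3]
print(map_from_arrays(["a", "b"], [1, 2]))                               # → {'a': 1, 'b': 2}
```

The divisibility check shows why the table's restriction exists: a 7-minute interval, for example, leaves 5 minutes over at the end of each day (1440 = 7 × 205 + 5).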

SDK

JDBC

Internal endpoint optimization: added the use_oss_internal_endpoint=true URL parameter. When you run on Alibaba Cloud, it forces queries to access OSS through its internal endpoint. See JDBC Driver.

Java SDK

The real-time ingestion interface now fully supports the Vector data type, which is mapped to an array type in the Java client. This supports vector retrieval in AI scenarios. Requires a clickzetta-java version later than 2.0.0. See Java SDK.

Python SDK

Supports real-time ingestion: the clickzetta-ingestion-python-v2 module (pip install clickzetta-ingestion-python-v2) ingests data into Lakehouse storage in real time.
Supports async submission: the execute_async() method of the clickzetta-connector-python module runs SQL queries asynchronously, which is especially useful for long-running queries.
Supports parameter binding: the execute() method of the clickzetta-connector-python module supports the qmark and pyformat parameter binding styles, allowing more flexible queries.

See Python SDK Reference.
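qmark and pyformat are placeholder styles defined by Python's DB-API 2.0 specification (PEP 249): qmark binds ? placeholders from a sequence, and pyformat binds %(name)s placeholders from a mapping. The sketch below uses the standard-library sqlite3 driver purely as a stand-in, because its native style is qmark; it is not clickzetta-connector-python, and the orders table and its data are hypothetical. Consult the Python SDK Reference for the connector itself.

```python
import sqlite3

# sqlite3 stands in for any DB-API 2.0 driver; its paramstyle is "qmark".
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, region TEXT)")  # hypothetical table
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, "sh"), (2, "bj"), (3, "sh")])


def ids_in_region(cursor: sqlite3.Cursor, region: str) -> list:
    """qmark style: each ? is bound positionally from the parameter tuple."""
    cursor.execute("SELECT id FROM orders WHERE region = ? ORDER BY id", (region,))
    return [row[0] for row in cursor.fetchall()]


# The pyformat equivalent, on a driver that supports that style, would be:
# cursor.execute("SELECT id FROM orders WHERE region = %(region)s", {"region": region})

print(ids_in_region(cur, "sh"))             # → [1, 3]
print(ids_in_region(cur, "sh' OR '1'='1"))  # → [] (bound as a plain string, not SQL)
```

Because bound values travel separately from the SQL text, input containing quote characters is treated as data rather than spliced into the statement.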

Bug Fixes

SQL command SHOW PARTITION EXTENDED: Fixed an issue where the filesize was displayed incorrectly in the results.
Generated Column compatibility optimization: Fixed a validation error when Bulkload writes to generated columns in historical versions.
Fixed an issue where the quote parameter did not take effect when exporting data using the COPY command with CSV file format.
Federated query: Fixed an issue where the Schema specified in External Schema Options did not take effect.
Fixed an issue where regular expression matching did not work in Volume queries.

Behavior Changes

Default data retention period: the default value of data_retention_days has changed from 7 days to 1 day.
SQL writes to primary key tables: tables with a defined primary key can now be modified directly with standard SQL statements (INSERT/UPDATE/DELETE), giving more flexibility in development and data management.