2025-03-03—Lakehouse Platform 1.0 Product Update Release Notes
In this release, we have introduced a series of new features, enhancements, and fixes. These updates will be rolled out in phases to the following regions and are expected to be completed within one to two weeks from the release date, depending on your specific region.
- Alibaba Cloud Shanghai Region
- Tencent Cloud Shanghai Region
- Tencent Cloud Beijing Region
- Tencent Cloud Guangzhou Region
- AWS Beijing Region
- Alibaba Cloud Singapore Region (International Site)
- AWS Singapore Region (International Site)
New Features and Enhancements
Federated Query Update [Preview Release]
- Architecture Extension: Cloud Lakehouse supports mapping and mirroring external data sources at the Catalog level using External Schema, enabling federated queries against Hive. (Previously, only the Hive-on-object-storage architecture was supported; this release adds support for the Hive-on-HDFS architecture.) For details, see EXTERNAL SCHEMA.
- Enhanced Delta Lake Format Reading: When creating external tables in Delta Lake format, automatic schema inference is enabled. This eliminates the need to manually declare field information during external table creation, as the system automatically parses the Delta metadata (see the sketch below). For details, see Delta Lake External Tables.
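For illustration, the sketches below show the general shape of both features. The object names, connection properties, and format clause are assumptions rather than exact Lakehouse grammar; refer to the linked documentation for the precise syntax.

```sql
-- 1) Map a Hive (HDFS) source at the Catalog level via an External Schema.
--    The connection object and option names here are illustrative.
CREATE EXTERNAL SCHEMA hive_ods
  CONNECTION hive_hdfs_conn        -- connection holding metastore / HDFS endpoints
  OPTIONS ('schema' = 'ods');

-- 2) Delta Lake external table without declaring any columns: the schema is
--    inferred automatically from the Delta metadata (location is illustrative).
CREATE EXTERNAL TABLE delta_orders
  LOCATION 'oss://my-bucket/warehouse/orders'
  FILE_FORMAT = (TYPE = 'delta');
```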
Import and Export Updates
COPY Command Enhancement
Support for two-character CSV delimiters (e.g., `||`), breaking the previous single-character limitation and improving compatibility with complex data. For details, see COPY INTO Table.
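For example, a COPY statement with a two-character delimiter might look like the sketch below; the table and volume names and the exact option spellings are illustrative (see COPY INTO Table for the real grammar).

```sql
-- Load CSV files whose fields are separated by "||" (names and options illustrative).
COPY INTO sales_orders
FROM VOLUME csv_volume
FILE_FORMAT = (TYPE = 'csv', FIELD_DELIMITER = '||');
```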
Pipe Function Enhancement
- Direct import via Kafka Table Stream is now supported. With Table Stream, you can achieve Exactly Once semantics and store connection information in a Connection.
- Obtain the DDL statement of a Pipe using `SHOW CREATE PIPE pipe_name`.
- Optimized `DESC PIPE` output: `DESC PIPE pipe_name` now displays result fields, including the input and output object names of the Pipe task, and adds Kafka consumption information.
- Modify Pipe import parameters using `ALTER` commands, such as changing the compute cluster.
- Pipe can import data from object storage. Files or directories whose names start with `.` or `_temporary` can be filtered using the parameter `IGNORE_TMP_FILE=TRUE|FALSE`. Examples:
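The sketch below illustrates the object storage filtering option together with the new management commands. Pipe, volume, and table names are illustrative, and the `CREATE PIPE`/`ALTER PIPE` grammar is an approximation; only `IGNORE_TMP_FILE`, `SHOW CREATE PIPE`, and `DESC PIPE` are taken from the notes above.

```sql
-- Create a pipe that continuously loads files from an object storage volume,
-- skipping files/directories whose names start with "." or "_temporary".
-- (Names and the exact CREATE PIPE grammar are illustrative.)
CREATE PIPE ods_orders_pipe
  VIRTUAL_CLUSTER = 'ingest_vc'
  AS COPY INTO ods_orders
     FROM VOLUME ods_volume
     FILE_FORMAT = (TYPE = 'csv')
     OPTIONS ('IGNORE_TMP_FILE' = 'TRUE');

SHOW CREATE PIPE ods_orders_pipe;   -- returns the pipe's DDL statement
DESC PIPE ods_orders_pipe;          -- shows input/output objects and Kafka consumption info
ALTER PIPE ods_orders_pipe SET VIRTUAL_CLUSTER = 'ingest_vc_large';  -- e.g., switch the compute cluster (assumed grammar)
```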
Compute Clusters
Fine-Grained Resource Control
Added a Single Job Resource Ratio configuration for GP (general purpose) compute clusters, allowing the maximum resource usage of a single job to be capped, for example at 10% of the cluster's resources. This prevents resource preemption by large queries and enhances cluster stability:
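A sketch of what the setting could look like; the cluster name and the property key below are hypothetical, so check the compute cluster documentation for the actual parameter.

```sql
-- Hypothetical property key: cap any single job at 10% of the cluster's resources.
ALTER VCLUSTER gp_cluster SET PROPERTIES ('single.job.resource.ratio' = '0.1');
```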
SQL Syntax
- Support for creating SQL FUNCTIONS. This feature enables users to define custom SQL functions using SQL DDL statements, enhancing flexibility in data processing and analysis.
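A minimal sketch of a SQL-defined function; the exact CREATE FUNCTION grammar may differ, the point being that the body is written in SQL rather than Java or Python.

```sql
-- Illustrative SQL function (grammar is a sketch).
CREATE FUNCTION fahrenheit_to_celsius(f DOUBLE)
RETURNS DOUBLE
AS 'SELECT (f - 32) * 5 / 9';

SELECT fahrenheit_to_celsius(212.0);  -- 100.0
```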
- Lakehouse officially launched the Column-level Security feature, supporting fine-grained control of sensitive data through Dynamic Data Masking. Administrators can dynamically mask, partially display, or replace sensitive information in columns (e.g., ID numbers, credit card numbers) based on user roles or attributes, effectively protecting data privacy. Users can achieve dynamic data masking using the following SQL statement:
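For example (the policy, table, column, and role names below are illustrative, and the grammar is a sketch of a typical masking-policy DDL rather than the exact Lakehouse syntax):

```sql
-- Sketch: show id_number in full only to the data_admin role; everyone else
-- sees the first 3 characters followed by a fixed mask. Names are illustrative.
CREATE MASKING POLICY mask_id_number AS (val STRING)
RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'data_admin' THEN val
    ELSE CONCAT(SUBSTR(val, 1, 3), '***************')
  END;

-- Attach the policy to a column (assumed grammar).
ALTER TABLE customers MODIFY COLUMN id_number SET MASKING POLICY mask_id_number;
```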
The masking policy is applied to every occurrence of the column during query execution, based on the policy conditions and the roles or users in the SQL execution context.
- Lakehouse Data Storage Optimization: Users can perform small file merging using advanced parameters of the OPTIMIZE command.
- DML Enhancement: `UPDATE` now supports `ORDER BY ... LIMIT` syntax (see the sketch after this list).
- [Preview Release] Multi-Dialect Compatibility: Partial syntax support for PostgreSQL/MySQL/Hive/Presto dialects is enabled via SQLGlot integration.
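A sketch of the new `UPDATE ... ORDER BY ... LIMIT` form, with illustrative table and column names:

```sql
-- Expire only the 100 oldest pending orders.
UPDATE orders
SET status = 'expired'
WHERE status = 'pending'
ORDER BY created_at
LIMIT 100;
```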
UDF Features
Added the `cz.sql.remote.udf.lookup.policy` configuration parameter, which supports dynamically switching the resolution priority between UDFs and built-in functions.
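For example (the `SET` form and the value shown are assumptions; only the parameter key comes from the notes above):

```sql
-- Illustrative: prefer built-in functions over same-named remote UDFs.
SET cz.sql.remote.udf.lookup.policy = 'builtin_first';
```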
Permission Management [Preview Release]
Added Instance-level Roles and Cross-Workspace Authorization Capabilities: Supports creating roles at the instance level and granting global permissions, enabling unified permission management across workspaces and meeting the needs of fine-grained access control in multi-team collaboration scenarios.
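A sketch of the workflow; the role, workspace, and user names are illustrative, and the grant grammar is assumed rather than quoted from the docs:

```sql
-- Create an instance-level role, grant it privileges that span workspaces,
-- then assign it to a user (names and exact grammar are illustrative).
CREATE ROLE instance_analyst;
GRANT SELECT ON WORKSPACE analytics TO ROLE instance_analyst;
GRANT SELECT ON WORKSPACE marketing TO ROLE instance_analyst;
GRANT ROLE instance_analyst TO USER alice;
```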
Functions
List of New Functions:
Function | Description |
---|---|
collect_list_on_array | Collects elements from an input array into a new array and returns the new array. |
collect_set_on_array | Extracts unique elements from an input array expression and forms a new array. |
str_to_date_mysql | Converts a string to a date, compatible with the str_to_date function in MySQL. |
make_date | Constructs a date type from year, month, and day. |
to_start_of_interval | Truncates timestamp ts according to interval. Note that when interval is in minutes, it must divide evenly into 1 day. |
json_remove | Removes elements from jsonObject that match the jsonPath and returns the remaining elements. |
element_at | Extracts elements at specified positions or keys from arrays or maps. |
map_from_arrays | Creates a map from two arrays, with keys and values in the map corresponding to the order of elements in the parameter arrays. |
endswith | Determines whether a string or binary expression ends with another specified string or binary expression. Returns TRUE if matched, otherwise FALSE. Supports string and binary data. |
format_string | Formats a string based on a printf-style format string. |
is_ascii | Checks if str contains only ASCII-encoded characters. |
is_utf8 | Checks if str contains only UTF-8 encoded characters. |
regexp_extract_all | Extracts all substrings from a string that match a regular expression. |
sha1 | Calculates the SHA1 hash value of a given string. |
startswith | Checks if a string starts with another specified string. Returns TRUE if matched, otherwise FALSE. Supports string and binary data. |
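A few quick examples of the new functions; the expected results follow the descriptions above, while the `array()` literals and 1-based array indexing are assumptions:

```sql
SELECT
  make_date(2025, 3, 3)                          AS release_date,  -- 2025-03-03
  startswith('lakehouse', 'lake')                AS starts_ok,     -- TRUE
  endswith('report.csv', '.csv')                 AS ends_ok,       -- TRUE
  element_at(array(10, 20, 30), 2)               AS second_elem,   -- 20 (assuming 1-based indexing)
  map_from_arrays(array('a', 'b'), array(1, 2))  AS ab_map,        -- {'a': 1, 'b': 2}
  regexp_extract_all('a1b22c333', '\\d+')        AS nums;          -- ['1', '22', '333']
```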
SDK
JDBC
Internal Endpoint Optimization: Added the `use_oss_internal_endpoint=true` URL parameter. If the service you use can reach Alibaba Cloud OSS via its internal endpoint, this parameter enforces the use of the OSS internal endpoint for queries. For details, see JDBC Driver.
Java SDK
The real-time write interface now fully supports vector data types, which are mapped to array types in the Java client, meeting the needs of vector retrieval in AI scenarios. Requires clickzetta-java version greater than 2.0.0. For details, see Java SDK.
Python SDK
- Real-time write support: Provides the `clickzetta-ingestion-python-v2` module (`pip install clickzetta-ingestion-python-v2`), enabling real-time data writes to Lakehouse storage.
- Asynchronous submission support: The `clickzetta-connector-python` module supports asynchronous SQL query execution via the `execute_async()` method, suitable for long-running queries.
- Parameter binding support: The `clickzetta-connector-python` module supports qmark and pyformat-style parameter binding via the `execute()` method for more flexible queries.
Bug Fixes
- Fixed incorrect file size display in the results of the SQL command `SHOW PARTITION EXTENDED`.
- Improved compatibility for generated columns, resolving validation errors that occurred when Bulkload wrote to generated columns created in historical versions.
- Fixed the issue of the quote parameter not working when exporting data using the COPY command with a specified CSV file format.
- Fixed the issue of the Schema specified in the Options of External Schema not taking effect in federated queries.
- Resolved the issue of the regular expression match not working in Volume queries.
Behavioral Changes
- Default data retention period adjustment: The default value of `data_retention_days` has been changed from 7 days to 1 day.
- To enhance development flexibility and data management efficiency, Lakehouse has introduced SQL write support for primary key tables in this version. You can now directly manipulate tables with primary keys using standard SQL statements (`INSERT`/`UPDATE`/`DELETE`), as sketched below.
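A minimal sketch of SQL writes against a primary key table; the table definition and names are illustrative:

```sql
-- Illustrative primary key table plus standard DML against it.
CREATE TABLE users (
  id   BIGINT,
  name STRING,
  PRIMARY KEY (id)
);

INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
UPDATE users SET name = 'alice_w' WHERE id = 1;
DELETE FROM users WHERE id = 2;
```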