March 03, 2025 — 1.0 Lakehouse Platform Product Update Release Notes
This release introduces a series of new features, enhancements, and bug fixes. The updates will roll out gradually to the following regions, and the rollout is expected to complete within one to two weeks of the release date. The exact timing depends on your region.
- Alibaba Cloud Shanghai
- Tencent Cloud Shanghai
- Tencent Cloud Beijing
- Tencent Cloud Guangzhou
- AWS Beijing
- International - Alibaba Cloud Singapore
- International - AWS Singapore
New Features
Federated Query Updates [Preview Release]
- Architecture Extension: Singdata Lakehouse now supports using External Schema to map and mirror external data sources at the Catalog level, so that Hive can be queried directly. Previously only the Hive object storage architecture was supported; this release adds support for the Hive HDFS architecture. See EXTERNAL SCHEMA
- Enhanced Delta Lake Format Reading: The schema is inferred automatically when you create an external table in Delta Lake format. You no longer need to declare field information manually; the system parses the metadata. See Delta Lake External Table
Import/Export Updates
COPY Command Enhancement
The COPY command now supports two-character CSV delimiters (for example, ||), lifting the previous single-character limit and improving compatibility with complex data. See COPY INTO Import
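To make the delimiter change concrete, the following Python sketch splits one row on a two-character delimiter. It is only an illustration of what such a file looks like, not Lakehouse syntax: the function name `split_row` is hypothetical, and the sketch ignores quoting and escaping, which the COPY command may handle differently. Python's built-in `csv` module accepts only one-character delimiters, which is the same kind of limit this release lifts.

```python
from typing import List


def split_row(line: str, delimiter: str = "||") -> List[str]:
    """Split one text row on a multi-character delimiter.

    Assumption: fields are not quoted and never contain the delimiter.
    """
    # Drop the trailing newline, then split on the full two-character token.
    return line.rstrip("\r\n").split(delimiter)


# A single '|' inside a field is preserved because only '||' separates fields.
fields = split_row("1||Alice||a|b\n")
```

A single-character delimiter could not represent the last field here without quoting, which is the compatibility gain the release note refers to.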
Pipe Feature Enhancement
- Supports direct ingestion via Kafka Table Stream. With Table Stream, you can achieve Exactly Once semantics, and connection information can be stored in a Connection.
- Supports retrieving a Pipe's DDL statement via `SHOW CREATE PIPE pipe_name`.
- Pipe DESC output optimization. `DESC PIPE pipe_name` now displays the input object name and output object name of the Pipe task, along with Kafka offset consumption information.
- Supports the ALTER command to modify Pipe ingestion parameters, such as changing the compute cluster.
- Pipe ingestion from object storage. Supports filtering out files or directories whose names start with `.` or `_temporary`. This is controlled by the parameter `IGNORE_TMP_FILE=FALSE|TRUE`.
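The temporary-file filter for object storage ingestion can be pictured with a short Python sketch. It is an approximation of the rule as worded in these notes, not the Pipe implementation: the function names are hypothetical, and it assumes that `IGNORE_TMP_FILE=TRUE` turns the filter on and that the rule applies to every path component.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Name prefixes treated as temporary, as listed in the release note.
TMP_PREFIXES = (".", "_temporary")


def is_tmp_path(key: str) -> bool:
    """Return True if any file or directory name in the key looks temporary."""
    return any(part.startswith(TMP_PREFIXES) for part in PurePosixPath(key).parts)


def select_files(keys: Iterable[str], ignore_tmp_file: bool = True) -> List[str]:
    """Keep the object keys that would be ingested under the assumed rule."""
    return [k for k in keys if not (ignore_tmp_file and is_tmp_path(k))]
```

With the filter on, keys such as `data/_temporary/0/part-0.csv` or `data/.staging/a.csv` are skipped, while `data/part-1.csv` is kept.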
Compute Cluster
Fine-Grained Resource Control
Added a single-job resource ratio configuration for GP-type compute clusters, limiting the maximum resource ratio of a single job to 10%. This prevents resource contention caused by large queries and improves cluster stability.
SQL Syntax
- Supports creating SQL FUNCTION. With this feature, users can use SQL DDL (Data Definition Language) statements to define custom SQL functions, enhancing the flexibility of data processing and analysis.
- Lakehouse officially launches the Column-level Security feature, which supports fine-grained control over sensitive data through Dynamic Data Masking. Administrators can dynamically hide, partially display, or replace sensitive information in columns (such as ID numbers or credit card numbers) based on user roles or attributes, protecting data privacy. Dynamic masking is implemented in SQL by defining a masking policy and applying it to table columns. The policy is then applied everywhere the column appears during query execution, and data is masked according to the policy conditions, the role of the SQL execution context, or the user.
- Data storage optimization: use the advanced parameters of the OPTIMIZE command to compact small files.
- DML Enhancement: `UPDATE` now supports `ORDER BY ... LIMIT` syntax.
- [Preview Release] Multi-Dialect Compatibility: Supports partial syntax of dialects such as PostgreSQL/MySQL/Hive/Presto through SQLGlot integration.
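The Dynamic Data Masking behavior described in this section can be thought of as a function of the column value and the role of the caller. The Python sketch below illustrates that idea only. It is not Lakehouse SQL, and the role names and masking rules are hypothetical.

```python
def mask_id_number(value: str, role: str) -> str:
    """Return the column value as a given role would see it.

    Hypothetical policy: 'admin' sees the raw value, 'analyst' sees only
    the last four characters, and every other role sees a full mask.
    """
    if role == "admin":
        return value
    if role == "analyst":
        visible = value[-4:]
        return "*" * (len(value) - len(visible)) + visible
    return "*" * len(value)


# The same stored value yields a different result per role at query time.
raw = "110101199003071234"
views = {role: mask_id_number(raw, role) for role in ("admin", "analyst", "guest")}
```

The stored data is never changed. Only the value returned to the caller differs, which is what makes the masking dynamic.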
UDF Feature
Added the cz.sql.remote.udf.lookup.policy configuration parameter, supporting dynamic switching of resolution priority between UDFs and built-in functions.
Permission Management [Preview Release]
Added Instance-level roles and cross-workspace authorization capability: Supports creating roles at the Instance granularity and granting global permissions, enabling unified permission management across Workspaces to meet fine-grained access control requirements in multi-team collaboration scenarios.
Functions
New function list:
| Function | Description |
|---|---|
| collect_list_on_array | Collects elements from input arrays into a new array and returns the new array. |
| collect_set_on_array | Extracts distinct elements from input array expressions and combines them into a new array. |
| str_to_date_mysql | Converts a string to a date, with implementation compatible with MySQL's str_to_date function. |
| make_date | Constructs a date type from year, month, and day. |
| to_start_of_interval | Truncates time ts by the specified interval. Note: when interval is in minutes, it must be evenly divisible into 1 day. |
| json_remove | Removes elements matching jsonPath from jsonObject and returns the remaining elements. |
| element_at | Extracts the element at a specified position from an array or a specified key from a map. |
| map_from_arrays | Creates a map from two arrays, where keys and values correspond one-to-one in the order of the parameter arrays. |
| endswith | Determines whether a string or binary expression ends with another specified string or binary expression. Returns TRUE if the condition is met, otherwise FALSE. Supports string and binary data, suitable for string processing and pattern matching scenarios. |
| format_string | Formats a string. Generates a formatted string based on printf-style format strings. |
| is_ascii | Determines whether str contains only ASCII-encoded characters. |
| is_utf8 | Determines whether str contains only UTF-8 encoded characters. |
| regexp_extract_all | Extracts all substrings from a string that match a regular expression. |
| sha1 | Computes the SHA1 hash value of a given string. |
| startswith | Determines whether a string starts with another specified string. Returns TRUE if the condition is met, otherwise FALSE. Supports string and binary data, suitable for string processing and pattern matching scenarios. |
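The note on `to_start_of_interval` (a minute interval must divide one day evenly) is easier to see in code. The Python sketch below is an assumption about the intended semantics, not the Lakehouse implementation: it truncates a timestamp to the start of its N-minute bucket, counted from midnight, and rejects intervals that would not line up with day boundaries. The function name is hypothetical.

```python
from datetime import datetime, timedelta

MINUTES_PER_DAY = 24 * 60


def to_start_of_interval_minutes(ts: datetime, minutes: int) -> datetime:
    """Truncate ts to the start of its N-minute bucket, counted from midnight."""
    # Buckets align with midnight every day only if N divides 1440 evenly.
    if minutes <= 0 or MINUTES_PER_DAY % minutes != 0:
        raise ValueError("interval in minutes must divide one day evenly")
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds() // 60)
    return midnight + timedelta(minutes=(elapsed // minutes) * minutes)


# 10:47:30 falls in the 15-minute bucket that starts at 10:45:00.
bucket = to_start_of_interval_minutes(datetime(2025, 3, 3, 10, 47, 30), 15)
```

An interval such as 7 minutes leaves a partial bucket at the end of each day, so bucket boundaries would drift from one day to the next. That is presumably why such values are rejected.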
SDK
JDBC
Internal Endpoint Optimization: Added use_oss_internal_endpoint=true URL parameter configuration. If you use Alibaba Cloud services, this forces the use of OSS internal endpoints during queries. See JDBC Driver
Java SDK
The real-time ingestion interface now fully supports the Vector data type, using array type mapping in the Java client. This meets vector retrieval requirements in AI scenarios. Requires clickzetta-java version greater than 2.0.0. See Java SDK
Python SDK
- Supports real-time ingestion: Provides the `clickzetta-ingestion-python-v2` module (`pip install clickzetta-ingestion-python-v2`), supporting real-time data ingestion into Lakehouse storage.
- Supports async submission: The `clickzetta-connector-python` module's `execute_async()` method supports asynchronous SQL query execution, especially suitable for long-running queries.
- Supports parameter binding: The `clickzetta-connector-python` module's `execute()` method supports qmark and pyformat style parameter binding, suitable for more flexible queries.
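For readers unfamiliar with the two binding styles, the sketch below shows what qmark (`?`) and pyformat (`%(name)s`) placeholders look like. It deliberately uses Python's built-in `sqlite3` module as a stand-in driver, because the exact connection API of `clickzetta-connector-python` is not shown in these notes. `sqlite3` natively supports qmark only, so the hypothetical helper `pyformat_to_qmark` rewrites a pyformat query purely to demonstrate that both styles bind the same values.

```python
import re
import sqlite3

PYFORMAT = re.compile(r"%\((\w+)\)s")


def pyformat_to_qmark(sql, params):
    """Rewrite %(name)s placeholders to ? and order the values to match."""
    names = PYFORMAT.findall(sql)
    return PYFORMAT.sub("?", sql), [params[name] for name in names]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "sh"), (2, "bj"), (3, "sh")])

# qmark style: positional placeholders, values passed as a sequence.
qmark_rows = conn.execute(
    "SELECT id FROM orders WHERE region = ? ORDER BY id", ("sh",)
).fetchall()

# pyformat style: named placeholders, values passed as a mapping.
sql, args = pyformat_to_qmark(
    "SELECT id FROM orders WHERE region = %(region)s ORDER BY id", {"region": "sh"}
)
pyformat_rows = conn.execute(sql, args).fetchall()
```

In both styles the values travel separately from the SQL text, which avoids manual string formatting and the injection risks that come with it.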
Bug Fixes
- SQL command `SHOW PARTITION EXTENDED`: Fixed an issue where the filesize was displayed incorrectly in the results.
- Generated Column compatibility optimization: Fixed a validation error when Bulkload writes to generated columns in historical versions.
- Fixed an issue where the `quote` parameter did not take effect when exporting data using the COPY command with CSV file format.
- Federated query: Fixed an issue where the Schema specified in External Schema Options did not take effect.
- Fixed an issue where Volume query regex matching did not work.
Behavior Changes
- Default data retention period adjustment: the default value of `data_retention_days` changed from 7 days to 1 day.
- To improve development flexibility and data management efficiency, this release adds SQL write support for primary key tables. You can now operate directly on tables with defined primary keys using standard SQL statements (`INSERT`/`UPDATE`/`DELETE`).
