February 5, 2024 Lakehouse Platform Release Notes
This version update (Release 2024.02.5) brings you a series of new features, enhancements, and fixes. The update will be gradually rolled out to the following regions:
- Alibaba Cloud Shanghai Region
- Tencent Cloud Shanghai Region
- Alibaba Cloud Singapore Region
- Tencent Cloud Beijing Region
Note: The update will be completed within one to two weeks from the release date, depending on your region.
New Features and Enhancements
Stream Processing Tasks
Table Stream Change Data Capture Enhancement
The STANDARD mode of Table Stream now supports complete capture of INSERT/UPDATE/DELETE change records for tables with streaming writes, enabling real-time change data processing with the Table Stream solution (sketched below).
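A minimal sketch of consuming STANDARD-mode changes; the CREATE TABLE STREAM syntax, the TABLE_STREAM_MODE property name, and the orders table are illustrative assumptions rather than confirmed platform syntax:

    -- Hypothetical: create a STANDARD-mode stream over a table with streaming writes.
    CREATE TABLE STREAM orders_stream ON TABLE orders
      WITH PROPERTIES ('TABLE_STREAM_MODE' = 'STANDARD');  -- property name assumed

    -- Querying the stream returns the INSERT/UPDATE/DELETE change records
    -- accumulated since the last consumption.
    SELECT * FROM orders_stream;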
Timeliness Improvement for Change Data Capture in Streaming Write Scenarios
The default version commit time for target tables written through the streaming API has been reduced from 10 minutes to 1 minute. This shortens the minimum window in which Table Stream and Dynamic Table can capture data changes to 1 minute, improving the end-to-end timeliness of the real-time processing pipeline.
Dynamic Table
We have introduced the Dynamic Table object, which will replace the existing incremental-computation materialized views (the logic and behavior of existing incremental materialized views remain unchanged). Materialized views will focus on data consistency and target query rewriting and performance optimization scenarios; Dynamic Table targets real-time data processing scenarios and provides richer functionality and operational management capabilities.
- ALTER DYNAMIC TABLE SUSPEND/RESUME: New syntax to manually suspend and resume the scheduled refresh task defined by the DYNAMIC TABLE DDL (see the combined sketch after this list).
- CREATE OR REPLACE DYNAMIC TABLE: New syntax to modify the computation logic while retaining historical data; the new logic is applied to subsequent change data.
- DESCRIBE HISTORY command: Supports viewing a table's historical version changes, returning the source information of each write, such as the data operation type, user, number of changed records, and source job. This can be used to trace data changes and, in data correction scenarios, identify the jobs that affected data correctness.
- TABLE_CHANGE table-valued function: Supports viewing change data between different data versions.
- RESTORE TABLE TO TIMESTAMP AS OF: Supports restoring data to the snapshot of a specified time version; suitable for data correction scenarios in real-time processing.
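Taken together, a hedged sketch of the Dynamic Table lifecycle using the commands above; the table definition, refresh scheduling, and the argument form of TABLE_CHANGE are illustrative assumptions:

    -- Hypothetical dynamic table; refresh scheduling clause omitted/assumed.
    CREATE DYNAMIC TABLE daily_sales AS
    SELECT order_date, SUM(amount) AS total
    FROM orders
    GROUP BY order_date;

    -- Manually suspend and resume the scheduled refresh defined by the DDL.
    ALTER DYNAMIC TABLE daily_sales SUSPEND;
    ALTER DYNAMIC TABLE daily_sales RESUME;

    -- Replace the computation logic while retaining historical data;
    -- the new logic applies to subsequent change data.
    CREATE OR REPLACE DYNAMIC TABLE daily_sales AS
    SELECT order_date, SUM(amount) AS total, COUNT(*) AS order_cnt
    FROM orders
    GROUP BY order_date;

    -- Trace writes, inspect changes between versions, and restore a snapshot.
    DESCRIBE HISTORY daily_sales;
    SELECT * FROM TABLE_CHANGE('daily_sales', 1, 2);  -- version arguments assumed
    RESTORE TABLE daily_sales TO TIMESTAMP AS OF '2024-02-01 00:00:00';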
Semi-Structured Data
JSON Data Type
We have introduced a native JSON data type to optimize JSON data storage and query analysis efficiency, along with supporting JSON functions such as json_array, json_object, parse_json, json_extract, and json_valid to facilitate field extraction and analysis (see the example below). Tests have shown that JSON-typed fields can significantly reduce data scanning and improve query performance severalfold.
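A small hedged example of the JSON type and functions; the table definition and the JSONPath-style paths are assumptions for illustration:

    -- Hypothetical table with a native JSON column.
    CREATE TABLE events (id BIGINT, payload JSON);

    -- Parse a JSON string into the native type on insert.
    INSERT INTO events VALUES (1, parse_json('{"user": "alice", "clicks": 3}'));

    -- Check whether a string is valid JSON.
    SELECT json_valid('{"user": "alice", "clicks": 3}');

    -- Extract fields from the native JSON column (path syntax assumed).
    SELECT json_extract(payload, '$.user') AS user_name,
           json_extract_int(payload, '$.clicks') AS clicks
    FROM events;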
Data Sharing
- Share Object: Added the DESC SHARE command to view the list of data objects included in a SHARE (see the example below).
- Consumer-Side Optimization: Optimized the error messages shown when creating tables under a read-only SHARED SCHEMA on the consumer side; resolved the issue where the consumer side was told an object did not exist after tables were re-authorized through SHARE; improved the authentication failure prompt for consumers that have not been re-authorized.
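The new command in one line; the share name is a placeholder:

    -- View the data objects included in a share (share name is illustrative).
    DESC SHARE sales_share;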
Data Import & Export
- Clients can now obtain query result addresses through JDBC and perform full result downloads. CSV-format downloads are now supported, including fields of complex, BINARY, and JSON types.
- The Java SDK batch and real-time import interfaces now support importing JSON strings into JSON-typed fields.
Security Management
- Access Control: The CONNECTION data source connection object now supports GRANT/REVOKE permission control, as sketched below.
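A hedged sketch of the new permission control; the USAGE privilege, connection name, and role name are assumptions for illustration:

    -- Grant and revoke access to a data source connection
    -- (privilege, connection, and role names assumed).
    GRANT USAGE ON CONNECTION mysql_conn TO ROLE etl_role;
    REVOKE USAGE ON CONNECTION mysql_conn FROM ROLE etl_role;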
SQL Capability Updates
Primary Key Constraint Behavior Changes
- Primary keys are defined using the PRIMARY KEY keyword.
- Primary keys must have unique values and cannot be NULL. The system performs unique value validation on primary keys (the default behavior is ENABLE VALIDATE), and records that conflict with existing primary keys cannot be written.
- Primary keys cannot exceed 16 columns.
- Supported data types for primary keys include: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, CHAR, VARCHAR, STRING, BOOLEAN, DATE, and TIMESTAMP.
- Modifying primary keys, including primary key names and data types, is not supported.
- Primary key columns cannot contain default values.
- The content length of a primary key is limited: the encoded length must not exceed 128 bytes, and as a rule of thumb the pre-encoding length should not exceed 20 bytes. A sketch of a conforming table definition follows this list.
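A minimal sketch of a table definition that satisfies the constraints above; the table and column names are illustrative:

    -- Single-column BIGINT primary key: a supported type, no default value,
    -- and well under the 16-column and 128-byte encoded-length limits.
    CREATE TABLE orders (
      order_id   BIGINT,
      order_date DATE,
      amount     DECIMAL(10, 2),
      PRIMARY KEY (order_id)  -- uniqueness validated by default (ENABLE VALIDATE)
    );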
New SQL Functions
json_array
json_object
parse_json
json_valid
json_extract
json_extract_boolean
json_extract_float
json_extract_double
json_extract_int
json_extract_bigint
json_extract_string
json_extract_date
json_extract_timestamp
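A hedged illustration of the typed extraction functions; the JSONPath-style path syntax and the inline input are assumptions:

    -- Typed extractors return strongly typed values directly (path syntax assumed).
    SELECT json_extract_string(j, '$.name')   AS name,
           json_extract_int(j, '$.clicks')    AS clicks,
           json_extract_timestamp(j, '$.ts')  AS ts
    FROM (SELECT parse_json('{"name":"bob","clicks":3,"ts":"2024-02-05 10:00:00"}') AS j) t;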
Platform Optimization
- Optimized the table statistics collection logic for data import and DML jobs, enriching table statistics and resolving missing statistics for some data objects in DESC EXTENDED and INFORMATION_SCHEMA.
- Optimized the query capability of the metadata service, improving the query efficiency of SHOW JOBS and SHOW GRANTS.