November 12, 2024 Lakehouse Platform Release Notes

This release (Release 2024.11.12) introduces new features, enhancements, and fixes. The updates will be rolled out gradually to the following regions; depending on your region, the rollout is expected to complete within one to two weeks of the release date.

* Alibaba Cloud Shanghai Region
* Tencent Cloud Shanghai Region
* Tencent Cloud Beijing Region
* Tencent Cloud Guangzhou Region
* Amazon Beijing Region
* International Site - Alibaba Cloud - Singapore Region
* International Site - AWS - Singapore Region

# **Data Lake Usability Enhancements**

* **Automatic Schema Detection**: Lakehouse can now automatically detect the schema of structured files (such as CSV, Parquet, and ORC) stored in a Volume, so you no longer need to know column names and data types in advance.

* **Federated Query Expansion**:

  * Added federated query support for Databricks Unity Catalog.
  * Hive federated queries can now read Hive tables stored in Iceberg format.
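As an illustration of what automatic schema detection involves for a CSV file, the following Python sketch reads the header row for column names and picks, for each column, the narrowest type that fits every sampled value. It is a simplified model, not Lakehouse's detector: the `infer_csv_schema` function, the sample size, and the BOOLEAN/BIGINT/DOUBLE/STRING candidate list are assumptions. Parquet and ORC files store their schema in file metadata, so they need no value-based inference of this kind.

```python
import csv
import io

# Candidate column types, tried from narrowest to widest (an assumption for this sketch).
CANDIDATES = ["BOOLEAN", "BIGINT", "DOUBLE", "STRING"]


def fits(value: str, type_name: str) -> bool:
    """Return True if the text value can be parsed as the given type."""
    if type_name == "BOOLEAN":
        return value.lower() in ("true", "false")
    if type_name in ("BIGINT", "DOUBLE"):
        parse = int if type_name == "BIGINT" else float
        try:
            parse(value)
            return True
        except ValueError:
            return False
    return True  # STRING accepts any value


def infer_csv_schema(csv_text: str, sample_rows: int = 100) -> list[tuple[str, str]]:
    """Infer (column name, type) pairs from a CSV header row plus sampled data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = [row for _, row in zip(range(sample_rows), reader)]
    schema = []
    for i, name in enumerate(header):
        # Empty cells are treated as NULL and ignored when choosing a type.
        values = [row[i] for row in samples if i < len(row) and row[i] != ""]
        if not values:
            schema.append((name, "STRING"))
            continue
        chosen = next(t for t in CANDIDATES if all(fits(v, t) for v in values))
        schema.append((name, chosen))
    return schema


data = "id,price,active,name\n1,9.5,true,apple\n2,12,false,pear\n"
print(infer_csv_schema(data))
# [('id', 'BIGINT'), ('price', 'DOUBLE'), ('active', 'BOOLEAN'), ('name', 'STRING')]
```

Trying types from narrowest to widest means a column falls back to STRING only when no numeric or boolean type fits all of its sampled values.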

# **Intelligent Features**

* **Auto Index**: Lakehouse automatically recommends cluster keys and sort keys for a table. Sort-key recommendations favor columns that frequently appear in filter conditions; setting the recommended columns as the table's sort key can speed up query execution.
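The release notes say only that sort-key recommendations favor columns that appear frequently in filter conditions. A minimal Python sketch of that frequency heuristic follows; the query-log format, the `recommend_sort_keys` function, and the `top_k` cutoff are hypothetical, and Auto Index may weigh other signals as well.

```python
from collections import Counter
from typing import Iterable


def recommend_sort_keys(filter_columns_per_query: Iterable[Iterable[str]], top_k: int = 2) -> list[str]:
    """Rank columns by the number of queries whose filters reference them.

    Each element of filter_columns_per_query holds the columns one query
    filters on (assumed to be extracted from WHERE clauses beforehand).
    """
    counts = Counter()
    for columns in filter_columns_per_query:
        counts.update(set(columns))  # count a column at most once per query
    # Most frequent first; ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [column for column, _ in ranked[:top_k]]


query_log = [
    ["event_date", "user_id"],
    ["event_date"],
    ["event_date", "region"],
    ["user_id"],
]
print(recommend_sort_keys(query_log))  # ['event_date', 'user_id']
```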

# **Incremental Computing**

* **Dynamic Tables Support DML Commands**: You can now correct data in a dynamic table (DT) directly with DML commands. After a DML modification, the next refresh is a full refresh. INSERT, DELETE, and TRUNCATE are supported; MERGE INTO and UPDATE are not. By default, DML on a dynamic table raises an error to prevent accidental changes; to enable it, run `set cz.sql.dt.allow.dml = true;`.
* **Partitioned Dynamic Tables (New)**: A partitioned dynamic table is defined by referencing parameters through `SESSION_CONFIGS()['dt.arg.xx']` and is refreshed incrementally. Each refresh must name the target partition explicitly with `REFRESH DYNAMIC TABLE dt PARTITION partition_spec;`. If these parameters are used in a regular (non-partitioned) dynamic table, Lakehouse does not reject the syntax, but the table performs a full refresh, using `REFRESH DYNAMIC TABLE dt;`.
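The dynamic table rules above (DML gated by `cz.sql.dt.allow.dml`, a full refresh after any DML, an explicit `PARTITION partition_spec` for partitioned tables, and a full refresh when `dt.arg` parameters are used in a non-partitioned table) can be summarized as a small Python model. This is a hypothetical sketch for reasoning about which refresh to expect: `DynamicTable`, `apply_dml`, `plan_refresh`, and the sample partition spec are illustrative names, and treating every other case as incremental is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# DML statements this release allows on a dynamic table; MERGE INTO and UPDATE are rejected.
SUPPORTED_DML = {"INSERT", "DELETE", "TRUNCATE"}


@dataclass
class DynamicTable:
    name: str
    partitioned: bool                # hypothetical flag: defined as a partitioned dynamic table
    uses_dt_args: bool = False       # definition references SESSION_CONFIGS()['dt.arg.xx']
    dml_since_refresh: bool = False  # set when DML has modified the table since the last refresh


def apply_dml(dt: DynamicTable, statement: str, allow_dml: bool) -> None:
    """Model a DML statement; allow_dml stands in for `set cz.sql.dt.allow.dml = true;`."""
    if not allow_dml:
        raise PermissionError("DML on a dynamic table is rejected by default")
    if statement.upper() not in SUPPORTED_DML:
        raise ValueError(f"{statement} is not supported on dynamic tables")
    dt.dml_since_refresh = True


def plan_refresh(dt: DynamicTable, partition_spec: Optional[str] = None) -> tuple[str, str]:
    """Return (REFRESH statement, 'full' or 'incremental') and clear the DML flag."""
    if dt.partitioned:
        if partition_spec is None:
            raise ValueError("partitioned dynamic tables need REFRESH ... PARTITION partition_spec")
        statement = f"REFRESH DYNAMIC TABLE {dt.name} PARTITION {partition_spec};"
    else:
        statement = f"REFRESH DYNAMIC TABLE {dt.name};"
    if dt.dml_since_refresh or (not dt.partitioned and dt.uses_dt_args):
        mode = "full"
    else:
        mode = "incremental"  # assumption: all other cases refresh incrementally
    dt.dml_since_refresh = False
    return statement, mode


dt = DynamicTable("dt", partitioned=True, uses_dt_args=True)
spec = "(pt = '2024-11-12')"  # hypothetical partition_spec value
print(plan_refresh(dt, spec)[1])  # incremental
apply_dml(dt, "DELETE", allow_dml=True)
print(plan_refresh(dt, spec)[1])  # full
```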

# **SQL Capability Updates**

## Syntax Support

* CTE syntax now supports a leading `INSERT INTO` clause, as shown in the following example:

```sql
-- New syntax: INSERT INTO placed before the WITH clause
insert into insert_dest with data as (select 2) select * from data;
```

## Function Support
| Function Name | Description                                                          |
| ------------- | -------------------------------------------------------------------- |
| unnest        | Expands the elements of an array into multiple rows, one row per element. |
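The row-expansion behavior of `unnest` can be modeled in Python as follows. The sketch only illustrates the semantics described in the table: the `unnest_rows` helper and its column names are hypothetical, and producing no rows for a NULL or empty array is an assumption, since these notes specify neither that case nor the SQL signature.

```python
from typing import Any, Iterable


def unnest_rows(rows: Iterable[dict[str, Any]], array_col: str, out_col: str = "element") -> list[dict[str, Any]]:
    """Emit one output row per array element, copying the row's other columns."""
    result = []
    for row in rows:
        # Assumption: a NULL (None) or empty array produces no output rows.
        for element in row.get(array_col) or []:
            new_row = {k: v for k, v in row.items() if k != array_col}
            new_row[out_col] = element
            result.append(new_row)
    return result


orders = [
    {"order_id": 1, "items": ["apple", "pear"]},
    {"order_id": 2, "items": []},
    {"order_id": 3, "items": ["plum"]},
]
for r in unnest_rows(orders, "items", "item"):
    print(r)
# {'order_id': 1, 'item': 'apple'}
# {'order_id': 1, 'item': 'pear'}
# {'order_id': 3, 'item': 'plum'}
```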

# **SDK Interface**

* Added support for the vector data type in the JDBC interface.

# **Behavior Changes**

* **Dynamic Table Incremental Refresh**: In this version, if you change a dynamic table's SELECT definition only to add or drop columns, the table can still be refreshed incrementally, provided each added column is passed through unchanged from the source table via SELECT and does not participate in any computation that affects other columns. If a newly added column participates in computation after CREATE OR REPLACE, the REFRESH task falls back to a full refresh.

* **Quota Constraints**:

  * **Trial Account Limitations**: The total number of data objects in a single instance is limited to 1,000.

* **Data Import**:

  * **Kafka Pipe Adjustment**: The minimum interval for Kafka pipes has been increased from 1 second to 10 seconds.
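The dynamic table refresh rule in the first Behavior Changes item can be expressed as a decision function over the old and new SELECT lists. This Python sketch is a hypothetical model, not Lakehouse's planner: the `Column` type and its fields are illustrative, and requiring retained columns to keep their expressions is an assumption the release notes do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    expr: str                          # SELECT-list expression, e.g. "region" or "amount * 0.1"
    passthrough: bool                  # True if expr selects a source column unchanged
    used_in_computation: bool = False  # referenced by filters, joins, aggregations, or other columns


def refresh_mode_after_replace(old: dict[str, Column], new: dict[str, Column]) -> str:
    """Return 'incremental' or 'full' for the first REFRESH after CREATE OR REPLACE."""
    # Assumption: columns kept in both definitions must keep the same expression.
    for name in old.keys() & new.keys():
        if old[name].expr != new[name].expr:
            return "full"
    # Dropped columns need no check; every added column must be a pure pass-through.
    for name in new.keys() - old.keys():
        column = new[name]
        if not column.passthrough or column.used_in_computation:
            return "full"
    return "incremental"


old = {"order_id": Column("order_id", True), "amount": Column("amount", True)}
print(refresh_mode_after_replace(old, {**old, "region": Column("region", True)}))        # incremental
print(refresh_mode_after_replace(old, {**old, "tax": Column("amount * 0.1", False)}))   # full
print(refresh_mode_after_replace(old, {"order_id": old["order_id"]}))                   # incremental
```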