April 22, 2025 — 1.1 Lakehouse Platform Product Update Release Notes

This release (Release 2025.04.22) introduces a series of new features, enhancements, and bug fixes. The updates will roll out gradually to the regions listed below, and the rollout is expected to complete within one to two weeks of the release date. The exact timing depends on your region.

Domestic Regions

  • Alibaba Cloud (Shanghai)
  • Tencent Cloud (Shanghai/Beijing/Guangzhou)
  • AWS (Beijing)

International Regions

  • Alibaba Cloud (Singapore)
  • AWS (Singapore)

New Features

Federated Query Enhancement

  • [Preview Release] ORC Format Support: External Catalog/External Schema now supports reading and writing ORC file format tables from HMS.

Data Import/Export Optimization

Import Command COPY INTO <table>

  • Intelligent newline recognition: Supports both \r\n and \n formats simultaneously.
  • on_error=abort|continue policy:
    • continue mode: Skips compressed format errors and continues execution.
    • abort mode: Terminates immediately upon encountering an error.
    • After execution, the list of imported files can be displayed.
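
As a rough illustration of the import behaviors above (not the platform's implementation), the following Python sketch splits records on both \r\n and \n and applies an abort or continue policy per file. The function names split_records and load_files, and the use of gzip as the compression format, are assumptions made for this example.

```python
import gzip
import zlib


def split_records(text):
    """Split text into records, accepting both CRLF and LF line endings."""
    # Normalize Windows line endings first, then split on LF.
    lines = text.replace("\r\n", "\n").split("\n")
    # Drop the empty trailing element produced by a final newline.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def load_files(files, on_error="abort"):
    """Decompress and parse each file; return (imported file names, records).

    on_error="abort" stops at the first file that cannot be decompressed.
    on_error="continue" skips such files and keeps going.
    """
    if on_error not in ("abort", "continue"):
        raise ValueError("on_error must be 'abort' or 'continue'")
    imported, records = [], []
    for name, payload in files.items():
        try:
            text = gzip.decompress(payload).decode("utf-8")
        except (OSError, EOFError, zlib.error) as exc:
            if on_error == "abort":
                raise RuntimeError(f"failed to read {name}") from exc
            continue  # skip the bad file and move on to the next one
        records.extend(split_records(text))
        imported.append(name)
    return imported, records


# Hypothetical input: two valid gzip files and one corrupt one.
sample_files = {
    "a.csv.gz": gzip.compress(b"1,x\r\n2,y\n"),
    "bad.gz": b"not gzip",
    "c.csv.gz": gzip.compress(b"3,z\n"),
}
imported, records = load_files(sample_files, on_error="continue")
```

In this sketch the returned imported list plays the role of the file list displayed after execution: with on_error="continue" it omits the corrupt file, while on_error="abort" raises before returning anything.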

Export Command COPY INTO <location>

  • writebom=true parameter: Adds a BOM header when exporting CSV files, resolving garbled Chinese characters when opening in Excel and improving cross-platform compatibility.
  • overwrite=true parameter: Clears the target folder (including subdirectories) before exporting.
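
To show what a BOM header means for a CSV file, here is a minimal Python sketch (not the platform's export code) that writes CSV bytes with and without a UTF-8 byte order mark using the standard utf-8-sig codec. The helper name to_csv_bytes and the sample rows are hypothetical.

```python
import csv
import io


def to_csv_bytes(rows, write_bom=False):
    """Serialize rows to CSV bytes, optionally prefixed with a UTF-8 BOM."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    # "utf-8-sig" prepends the bytes EF BB BF, which Excel uses to detect UTF-8.
    encoding = "utf-8-sig" if write_bom else "utf-8"
    return buf.getvalue().encode(encoding)


rows = [["id", "city"], [1, "上海"]]
with_bom = to_csv_bytes(rows, write_bom=True)
without_bom = to_csv_bytes(rows)
print(with_bom[:3].hex())  # the three BOM bytes: efbbbf
```

The two outputs differ only in the three leading bytes; the CSV content after the BOM is identical.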

Pipe Feature Enhancement

  • Pipe continuous file ingestion: Setting the on_error=continue parameter skips compressed format errors and continues execution instead of stopping.

SQL Features

  • INTERVAL Type Extension: Allows INTERVAL expr unit where expr can be an expression, e.g., interval 1+2 year.
  • Metadata Display: DESC/SHOW commands now display schema information for share types.
  • Data Sampling: Added TABLESAMPLE sampling syntax support for efficient data sampling analysis.
  • Vector Search: Supports the ef parameter. Set it before executing queries: set cz.vector.index.search.ef=64;
  • Inverted Index: When creating an inverted index, supports the 'mode' = 'max_word' parameter for finer-grained word segmentation: properties ('analyzer' = 'chinese', 'mode' = 'max_word');

Functions

Built-in Functions

  • GET_JSON_OBJECT Performance Improvement: Optimized the implementation of GET_JSON_OBJECT, improving JSON parsing efficiency.

Custom SQL Functions

  • SQL Function Enhancement: Supports creating user-defined SQL functions that use RETURNS TABLE.

UDF

External Functions now support referencing resource files via VOLUME addresses.

  • User Volume format address: volume:user://~/upper.jar
    • user indicates use of the User Volume protocol.
    • ~ represents the current user (a fixed value).
    • upper.jar is the target file name.
  • Table Volume format address: volume:table://table_name/upper.jar
    • table indicates use of the Table Volume protocol.
    • table_name is the table name; replace it with the name of your table.
    • upper.jar is the target file name.
  • Volume format address: volume://volume_name/upper.jar
    • volume_name is the name of the created Volume.
    • upper.jar is the target file name.
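
The three address formats above can be told apart mechanically. The following Python sketch is one way to do so; the function parse_volume_address, the returned dictionary keys, and the kind labels are hypothetical and are not part of any Lakehouse SDK.

```python
def parse_volume_address(address):
    """Classify a volume address and split it into kind, owner, and path.

    kind  is "user", "table", or "volume"
    owner is "~", the table name, or the Volume name
    path  is the file path inside the volume
    """
    prefix = "volume:"
    if not address.startswith(prefix):
        raise ValueError(f"not a volume address: {address!r}")
    rest = address[len(prefix):]

    if rest.startswith("user://"):
        kind, body = "user", rest[len("user://"):]
    elif rest.startswith("table://"):
        kind, body = "table", rest[len("table://"):]
    elif rest.startswith("//"):
        kind, body = "volume", rest[len("//"):]
    else:
        raise ValueError(f"unknown volume protocol: {address!r}")

    # The first path segment names the owner; the remainder is the file path.
    owner, sep, path = body.partition("/")
    if not sep or not owner or not path:
        raise ValueError(f"missing owner or file name: {address!r}")
    if kind == "user" and owner != "~":
        raise ValueError("User Volume addresses use ~ for the current user")
    return {"kind": kind, "owner": owner, "path": path}


user_addr = parse_volume_address("volume:user://~/upper.jar")
table_addr = parse_volume_address("volume:table://table_name/upper.jar")
named_addr = parse_volume_address("volume://volume_name/upper.jar")
```

A check like this could run on the client side before an address is used in a function definition, so that a malformed address is caught early.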

Volume

  • [Preview Release] Added internal named Volume object: Named Volume is a user-defined storage location, primarily used for staging data files before importing them into tables. Compared to automatically created user-level (User Volume) and table-level (Table Volume) volumes, Named Volumes must be explicitly created by users and offer more flexible configuration options, better meeting the needs of team collaboration and complex data loading scenarios. Additionally, internal Volumes are stored within Singdata Lakehouse-managed internal storage, requiring no additional cloud storage configuration, providing users with a more convenient and efficient storage solution.

Cache

  • Preload Cache Status Query: Previously, SHOW VCLUSTER vcname PRELOAD CACHED STATUS could only be used to view cache usage when the virtual cluster was in a running state. This release adds support for viewing cache status via this command when the virtual cluster is in a suspended state, along with an indication of the cluster's running state.

INFORMATION SCHEMA

  • INFORMATION_SCHEMA now includes the OBJECT_PRIVILEGES view, which allows querying all data object privilege grants in the system:
    • Query all privileges granted to a specified user, including those obtained indirectly through roles.
    • Query which users hold privileges on a specified object (such as a table or view), including users who received them indirectly through roles.
    • Querying function-level privilege grants is not yet supported.
    • Privilege grant data in this view may lag behind real-time data by up to 15 minutes.
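
The column layout of the view is not described here, so the following is only a conceptual Python sketch of what "indirectly through roles" means: a user's effective privileges are the union of direct grants and grants made to any role the user holds. All names and sample data below are hypothetical.

```python
def effective_privileges(user, direct_grants, user_roles, role_grants):
    """Return {(object, privilege): sorted list of sources} for one user.

    A source is "direct" or "role:<role_name>".
    """
    result = {}
    # Privileges granted to the user directly.
    for obj, priv in direct_grants.get(user, []):
        result.setdefault((obj, priv), set()).add("direct")
    # Privileges inherited from each role the user holds.
    for role in user_roles.get(user, []):
        for obj, priv in role_grants.get(role, []):
            result.setdefault((obj, priv), set()).add(f"role:{role}")
    return {key: sorted(sources) for key, sources in result.items()}


# Hypothetical grant data for illustration only.
direct_grants = {"alice": [("sales.orders", "SELECT")]}
user_roles = {"alice": ["analyst"]}
role_grants = {"analyst": [("sales.orders", "SELECT"), ("sales.items", "SELECT")]}

alice = effective_privileges("alice", direct_grants, user_roles, role_grants)
```

Here alice holds SELECT on sales.orders both directly and through the analyst role, and SELECT on sales.items only through the role.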

SDK

Python SDK

  • Python SDK Enhancement: Python SDK now supports the SQLAlchemy 2 interface.

Bug Fixes

  • Python SDK
    • Fixed an issue where the hints parameter was ineffective in the executemany method.
    • Fixed an error when Bulkload SDK writes to partitioned tables.
    • Fixed an error when the Python SDK executes the optimize syntax.

Behavior Changes

  • External Function Authorization Simplified: Changed from requiring both USE FUNCTION and VOLUME usage permissions to requiring only USE FUNCTION permission.
  • PIPE Feature: Previously, a pipe had only the RUNNING and PAUSED states. A FAILED state has been added so that monitoring systems can capture exceptions more easily, and alerts can be set up to notify on FAILED status. The state is viewable via DESC <pipe_name>.
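
As a sketch of how a monitoring job might react to the new state, the Python snippet below takes pipe states that have already been collected (for example from the DESC output) and returns the pipes that need an alert. How the state is read from the platform is not shown and is left as an assumption; the names PipeState and pipes_needing_alert are hypothetical.

```python
from enum import Enum


class PipeState(Enum):
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    FAILED = "FAILED"


def pipes_needing_alert(states):
    """Return the sorted names of pipes whose state is FAILED.

    states maps a pipe name to its state string, in any letter case.
    An unrecognized state string raises ValueError.
    """
    failed = []
    for name, raw in states.items():
        state = PipeState(raw.strip().upper())
        if state is PipeState.FAILED:
            failed.append(name)
    return sorted(failed)


# Hypothetical snapshot of pipe states.
snapshot = {"orders_pipe": "RUNNING", "events_pipe": "failed", "logs_pipe": "PAUSED"}
to_alert = pipes_needing_alert(snapshot)
```

Raising on an unrecognized state string is a deliberate choice in this sketch, so that a future state value is not silently ignored by the monitor.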