Dynamic Table
[Preview Release] This feature is currently in an invitation-only preview stage. If you would like to try it, please contact Singdata Technology through the contact information on the official website. Dynamic Table is an innovative feature in the Singdata Lakehouse solution, providing users with a flexible and efficient method for data processing and storage. Unlike traditional static tables, Dynamic Tables do not store data directly but dynamically generate data by defining query statements. This approach shows significant advantages in handling large-scale datasets and achieving real-time data updates.
Core Features of Dynamic Table
- Query Results as Table: Dynamic Tables dynamically generate datasets based on executed query statements, and each query may produce a completely new dataset, i.e., the content of the Dynamic Table.
- Incremental Refresh Mechanism: Dynamic Tables support incremental refresh, triggering query execution and content updates only when the underlying data source changes, greatly improving data processing efficiency, especially for large-scale data.
- Operation Restrictions: Since data is generated by queries, Dynamic Tables do not support traditional
UPDATE
,DELETE
,INSERT
, and other direct data operations, ensuring data consistency and accuracy.
Application Scenarios of Dynamic Table
- Data Warehouse: Dynamic Tables can be used to store processed and transformed data, facilitating subsequent data analysis and report generation.
- Real-time Data Applications: For scenarios requiring real-time data updates, Dynamic Tables can provide rapid data refresh capabilities, maintaining data timeliness.
- Data ETL Process: During data extraction, transformation, and loading processes, Dynamic Tables can serve as intermediate storage, holding transformed datasets.
Differences Between Materialized View and Dynamic Table
From the implementation mechanism of Lakehouse, Dynamic Tables evolve from traditional materialized views. Although they share some commonalities, there are significant differences in their positioning.
Materialized View: A Performance Optimization Tool Materialized views are special views that precompute and store query results, significantly improving query efficiency. Unlike traditional views, materialized views store the computation results in a table-like structure, allowing subsequent similar queries to be automatically recognized and directly use the pre-stored results, reducing redundant computations and speeding up query performance.
Dynamic Table: An Efficient Data Processing Tool Dynamic Tables focus on data processing. While they can also improve query performance, they do not rely on the query optimizer's automatic rewrite. Materialized views require data to be up-to-date for query rewriting, whereas Dynamic Tables may not always have the latest data in processing scenarios.
Traditional materialized views have some limitations that Lakehouse currently does not have:
-
Only a single base table can be used. Materialized views cannot be based on complex queries (i.e., queries with joins or nested views).
-
Materialized views do not necessarily support incremental refresh; each system implements it differently. Some systems implement incremental computation for certain operators to save computational resources and speed up processing.
Advanced Features of Dynamic Table:
- Incremental Identification and Optimization: Supports file-level incremental identification in big data environments, achieving real-time incremental reading and computation optimization, as well as write optimization.
- Intelligent Decision Making: Uses a cost-based optimizer (COST-BASED OPTIMIZER) to intelligently choose between full, incremental, or partial incremental computation to meet different query needs.
- Real-time Task Operations: Provides mechanisms for data version management, rollback, and rerun, offering comprehensive support for real-time task operations.
Data Timeliness Considerations:
- Materialized views require immediate data updates to ensure the accuracy of query rewriting. When base table data changes, Lakehouse uses a scheduled refresh mechanism to ensure the materialized view data is always up-to-date.
- Dynamic Tables focus more on the flexibility of data processing, adjusting data latency based on business definitions without enforcing real-time data updates.
Dynamic Table Management Operation Guide
- Create Dynamic Table: Users can define new Dynamic Tables through specific syntax and specify SQL query statements for data transformation. For detailed syntax and operation steps, refer to the Create Dynamic Table document.
- Delete Dynamic Table: If a Dynamic Table is no longer needed, use the
DROP TABLE
command to delete it. Before performing the delete operation, ensure the data is backed up or confirmed to be no longer needed, as this operation will remove the Dynamic Table definition and its data. For detailed syntax and operation steps, refer to the Delete Dynamic Table document. - View Dynamic Table Refresh History: Use the
SHOW DYNAMIC TABLE REFRESH HISTORY
command to view the refresh history of a specified Dynamic Table, monitoring the success status of refresh jobs, runtime, incremental/full refresh mode, the number of records processed, and other information.
Data Processing and Incremental Computation
Dynamic Tables play an important role in data processing. By defining query statements, users can achieve data extraction, transformation, and loading (ETL). At the same time, Dynamic Tables support incremental computation, triggering data updates only when the data source changes, effectively reducing data processing time and resource consumption. For detailed information on incremental computation and data processing with Dynamic Tables, refer to the Data Processing and Incremental Computation document.