Lakehouse CDC Change Data Capture Guide (Table Stream)
Overview
Change Data Capture (CDC) is the foundation of real-time data pipelines. Singdata Lakehouse provides the TABLE STREAM object, which automatically tracks INSERT, UPDATE, and DELETE operations on a table and presents them as an incremental data stream for downstream consumption. This guide is organized by business scenario to help you quickly master Table Stream creation and consumption methods.
Quick Navigation
- Create Table Stream -- Bind a source table to enable change tracking
- Consume Change Data -- Query the Stream to obtain incremental records
- Offset Advancement After Consumption -- Understand the automatic advancement mechanism
- View Stream Information -- Monitor Stream status
- Drop Table Stream -- Clean up Streams no longer needed
SQL Commands Covered
| Command | Purpose | Use Case |
|---|---|---|
CREATE TABLE STREAM | Create a change data stream | Bind to a source table and enable CDC tracking |
SELECT * FROM stream_name | Query incremental data | Consume change data into downstream tables |
SHOW TABLE STREAMS | View Stream list | Monitor Stream status and lag |
DROP TABLE STREAM | Drop a Stream | Clean up CDC objects no longer needed |
Prerequisites
The following examples use a simulated user table users_cdc:
Create Table Stream
Use CREATE TABLE STREAM to bind to a source table. The Stream records all changes since its creation or since the last consumption.
Consume Change Data
Query the Stream to retrieve incremental change records. The rows returned by the Stream include all columns from the original table as well as the metadata column __change_type (operation type).
Result Explanation:
| user_id | user_name | status | __change_type |
|---|---|---|---|
| 2 | Bob | active | UPDATE_BEFORE |
| 2 | Bob | inactive | UPDATE_AFTER |
| 3 | Carol | active | INSERT |
Offset Advancement After Consumption
The Table Stream offset automatically advances after downstream DML operations consume the Stream. This means that once you insert Stream data into a target table, the Stream's cursor advances, and the next query will only return new changes.
Result Explanation:
| COUNT(*) |
|---|
| 0 |
View Stream Information
Use SHOW TABLE STREAMS to view a Stream's status, bound table, and consumption lag.
Key Field Descriptions:
table_name: The bound source tablemode: Consumption mode (STANDARD / APPEND_ONLY)stale_after: Stream expiration time (the Stream becomes invalid if not consumed beyond this time)
Drop Table Stream
Use DROP TABLE STREAM to remove a Stream object that is no longer needed.
Clean Up Test Data
After completing CDC verification, it is recommended to clean up test tables:
Notes
- Offset Advancement Mechanism: The Stream offset only advances when consumed by DML statements (such as
INSERT INTO ... SELECT FROM stream). PureSELECTqueries do not advance the offset. - Expiration Time: Streams depend on the Time Travel retention period. If left unconsumed beyond
data_retention_days, the Stream becomesSTALEand cannot be read further. - APPEND_ONLY Mode: If the source table is append-only (no UPDATE/DELETE), you can create a Stream in
APPEND_ONLYmode for better performance:CREATE TABLE STREAM ... WITH PROPERTIES ('TABLE_STREAM_MODE' = 'APPEND_ONLY'). - Dynamic Table Consumption: Dynamic Tables can build incremental pipelines based on Table Streams, enabling end-to-end real-time data warehousing.
