DBT Practice Series
dbt (data build tool) is one of the most widely used data transformation tools. It brings software engineering best practices (version control, testing, documentation, and modularity) into data modeling. You write transformation logic in SQL, and dbt handles dependency management, incremental computation, data quality testing, and documentation generation.
Singdata Lakehouse natively supports dbt through the dbt-clickzetta adapter, and provides several Singdata-specific capabilities on top of standard dbt features:
| Capability | Description |
|---|---|
| Dynamic Table | Declarative incremental computation; the system automatically refreshes on a schedule with no external orchestration needed |
| Table Stream | Row-level change capture (CDC), tracking INSERT/UPDATE/DELETE |
| Automatic index creation | Automatically creates Bloomfilter / inverted / vector indexes when a table is built |
| Zero-copy clone | Clone a table in milliseconds with no extra storage, ideal for CI/CD environment isolation |
| VCluster per-model | Assign a compute cluster to individual models, isolating ETL write resources from aggregation query resources |
Companion Projects
All code in this series comes from the following four open-source projects, each of which can be run as-is; none of the examples are made up:
- jaffle-shop-clickzetta: A coffee shop order dataset; the Singdata Lakehouse version of the official dbt sample project. Includes 6 seed tables, 6 staging views, 7 mart tables, 27 data tests, and 3 unit tests. Great for getting started quickly: from zero to running in about 1 minute.
- snowflake-dbt2lakehouse-dbt: A TPC-H order data warehouse migrated from Snowflake to Singdata Lakehouse. Covers advanced scenarios including Dynamic Table, Table Stream CDC, incremental pipelines, and SCD dimension tables. Uses the built-in TPC-H shared dataset in Singdata Lakehouse (150 million order rows), so no data import is needed.
- dbt-clickzetta examples (examples/ directory): A feature demonstration project for the dbt-clickzetta adapter, covering all supported materialization types and advanced features.
- bigquery2lakehouse-retail: A UK e-commerce retail dataset migrated from BigQuery + Airflow + Cosmos + Soda to Singdata Lakehouse. Covers dbt-bigquery syntax differences, Dynamic Table materialization, and Studio Tasks replacing Airflow orchestration. Data is loaded via dbt seed, so no GCS or service account configuration is needed.
Choose your entry point based on your situation:
- New to dbt and want to get up and running quickly → Start with DBT Quickstart (Jaffle Shop): get a complete project running in 30 minutes
- Already have a dbt project and want to understand incremental processing → DBT Incremental Processing in Practice: strategy selection and real code for 4 strategies
- Want to use Dynamic Table or Table Stream for real-time pipelines → DBT Real-Time Data Pipeline in Practice
- Want to add tests to your data pipeline → DBT Data Quality in Practice: data test + unit test, verified 30/30 passing
- Want to use Singdata-specific features like indexes, clone, and VCluster → DBT Advanced Features in Practice
- Migrating from Snowflake → DBT Snowflake Migration in Practice: TPC-H Data Warehouse Pipeline
- Migrating from BigQuery → DBT BigQuery Migration in Practice: Retail Data Warehouse Pipeline
Series Articles
- DBT Incremental Processing in Practice — merge / append / delete+insert / insert_overwrite strategies, incremental filter patterns, schema change handling
- DBT Real-Time Data Pipeline in Practice — Dynamic Table auto-refresh, Table Stream CDC, Stream offset management
- DBT Data Quality in Practice — data test + unit test, verified 30/30 passing
- DBT Snowflake Migration in Practice: TPC-H Data Warehouse Pipeline — 6 platform differences, full coverage of Dynamic Table, Stream CDC, Surrogate Key, Python model
- DBT BigQuery Migration in Practice: Retail Data Warehouse Pipeline — eliminating GCS + Airflow + Cosmos, Dynamic Table materialization, Studio Tasks orchestration
Version Requirements
We recommend dbt-clickzetta >= 1.7.10, which includes every fix listed below:
| Version | Key Fixes |
|---|---|
| 1.6.2 | seed float8 type error, timestamp seed hang, unit test safe_cast syntax error |
| 1.6.3 | this.database returning None in macros |
| 1.6.5 | Stream system column injection generating invalid SQL; SELECT * EXCEPT(...) now works correctly |
| 1.7.0 | seed switched to COPY INTO, 3-5x speed improvement |
| 1.7.5 | dbt seed --full-refresh reporting "already exists"; unit test DROP TABLE on VIEW error |
| 1.7.10 | unit test cast(null as string not null) syntax error; Python model support improvements |
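To check an existing environment against the recommended minimum in the version table above, a small script along these lines may help. It is a minimal sketch, not part of the adapter: the helper names (`parse_version`, `meets_minimum`, `installed_adapter_version`) are hypothetical, and it assumes the adapter is installed under the distribution name `dbt-clickzetta`. Versions are compared as integer tuples because a plain string comparison would rank 1.7.9 above 1.7.10.

```python
from importlib import metadata
from typing import Optional, Tuple

# Recommended minimum taken from the version table above.
MINIMUM = "1.7.10"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.7.10' into (1, 7, 10), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first part with no leading digits
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MINIMUM) -> bool:
    """Compare versions numerically, part by part."""
    return parse_version(installed) >= parse_version(minimum)


def installed_adapter_version(package: str = "dbt-clickzetta") -> Optional[str]:
    """Return the installed version, or None if the package is absent.

    The distribution name is an assumption; adjust it if your install differs.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_adapter_version()
    if found is None:
        print("dbt-clickzetta is not installed in this environment")
    elif meets_minimum(found):
        print(f"dbt-clickzetta {found} meets the recommended minimum {MINIMUM}")
    else:
        print(f"dbt-clickzetta {found} is older than {MINIMUM}; consider upgrading")
```

Pre-release suffixes such as `rc1` are ignored by this parser, so treat the result as a quick sanity check rather than a full version-specifier comparison.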
