DBT Practice Series

dbt (Data Build Tool) is currently the most popular data transformation tool. It brings software engineering best practices — version control, testing, documentation, and modularity — into data modeling. You write transformation logic in SQL, and dbt handles dependency management, incremental computation, data quality testing, and documentation generation.

Singdata Lakehouse natively supports dbt through the dbt-clickzetta adapter, and provides several Singdata-specific capabilities on top of standard dbt features:

CapabilityDescription
Dynamic TableDeclarative incremental computation; the system automatically refreshes on a schedule with no external orchestration needed
Table StreamRow-level change capture (CDC), tracking INSERT/UPDATE/DELETE
Automatic index creationAutomatically creates Bloomfilter / inverted / vector indexes when a table is built
Zero-copy cloneClone a table in milliseconds with no extra storage, ideal for CI/CD environment isolation
VCluster per-modelAssign a compute cluster to individual models, isolating ETL write resources from aggregation query resources

Companion Projects

All code in this series comes from the following three directly runnable open-source projects — these are not made-up examples:

jaffle-shop-clickzetta A coffee shop order dataset — the Singdata Lakehouse version of the official dbt sample project. Includes 6 seed tables, 6 staging views, 7 mart tables, 27 data tests, and 3 unit tests. Great for getting started quickly; from zero to running in about 1 minute.

snowflake-dbt2lakehouse-dbt A TPC-H order data warehouse migrated from Snowflake to Singdata Lakehouse. Covers advanced scenarios including Dynamic Table, Table Stream CDC, incremental pipelines, and SCD dimension tables. Uses the built-in TPC-H shared dataset in Singdata Lakehouse (150 million order rows) — no data import needed.

dbt-clickzetta examples (examples/ directory) A feature demonstration project for the dbt-clickzetta adapter, covering all supported materialization types and advanced features.

bigquery2lakehouse-retail A UK e-commerce retail dataset migrated from BigQuery + Airflow + Cosmos + Soda to Singdata Lakehouse. Covers dbt-bigquery syntax differences, Dynamic Table materialization, and Studio Tasks replacing Airflow orchestration. Data is loaded via dbt seed — no GCS or service account configuration needed.


Choose your entry point based on your situation:

New to dbt and want to get up and running quickly → Start with DBT Quickstart (Jaffle Shop) — get a complete project running in 30 minutes

Already have a dbt project and want to understand incremental processingDBT Incremental Processing in Practice — strategy selection and real code for 4 strategies

Want to use Dynamic Table or Table Stream for real-time pipelinesDBT Real-Time Data Pipeline in Practice

Want to add tests to your data pipelineDBT Data Quality in Practice — data test + unit test, verified 30/30 passing

Want to use Singdata-specific features like indexes, clone, and VClusterDBT Advanced Features in Practice

Migrating from SnowflakeDBT Snowflake Migration in Practice: TPC-H Data Warehouse Pipeline

Migrating from BigQueryDBT BigQuery Migration in Practice: Retail Data Warehouse Pipeline


Series Articles


Version Requirements

It is recommended to use dbt-clickzetta >= 1.7.10, which fixes all known issues:

VersionKey Fixes
1.6.2seed float8 type error, timestamp seed hang, unit test safe_cast syntax error
1.6.3this.database returning None in macros
1.6.5Stream system column injection generating invalid SQL; SELECT * EXCEPT(...) working correctly
1.7.0seed switched to COPY INTO, 3-5x speed improvement
1.7.5dbt seed --full-refresh reporting "already exists"; unit test DROP TABLE on VIEW error
1.7.10unit test cast(null as string not null) syntax error; Python model support improvements