DBT + Singdata Lakehouse Quickstart

This guide uses the jaffle-shop-clickzetta sample project to walk you through the complete workflow for connecting dbt to Singdata Lakehouse: installing dependencies, configuring the connection, loading data, building models, and running tests.

jaffle-shop is the standard sample project maintained by dbt, simulating a coffee shop's order data. It includes 6 raw data tables, 6 staging views, 7 mart tables, and 27 data quality tests.

Prerequisites

  • Python 3.10 or higher (3.12 recommended)
  • A working Singdata Lakehouse instance (requires a workspace, vcluster, username, and password)

Step 1: Clone the Project

git clone https://github.com/clickzetta/jaffle-shop-clickzetta.git cd jaffle-shop-clickzetta

Step 2: Install dbt-clickzetta

Create a virtual environment and install the adapter:

python3 -m venv .venv source .venv/bin/activate # Windows: .venv\Scripts\activate pip install "dbt-clickzetta>=1.7.8"

Verify the installation:

dbt --version

Example output:

Core: - installed: 1.10.x Plugins: - clickzetta: 1.6.5 - Up to date!

Step 3: Configure the Connection

Copy the connection template and fill in your instance information:

cp profiles.yml.example profiles.yml

Edit profiles.yml and replace the placeholders in angle brackets:

jaffle_shop: target: dev outputs: dev: type: clickzetta service: <your-service-endpoint> # e.g. cn-shanghai-alicloud.api.clickzetta.com, without https:// instance: <your-instance-id> workspace: <your-workspace> username: <your-username> password: <your-password> schema: dbt_jaffle vcluster: default

Verify the connection:

dbt debug --profiles-dir .

All checks showing OK means the connection is successful.

Step 4: Install Dependency Packages

dbt deps --profiles-dir .

This project depends on dbt_utils (1.3.3) and dbt_date (0.17.1). After installation:

Installing dbt-labs/dbt_utils Installed from version 1.3.3 Installing godatadriven/dbt_date Installed from version 0.17.1

Step 5: Load Raw Data

dbt seed --profiles-dir .

Loads the 6 CSV files under seeds/jaffle-data/ into the dbt_jaffle.raw schema, approximately 150,000 rows total. This normally completes in about 1 minute:

1 of 6 OK loaded seed file raw.raw_customers ......... INSERT 935 2 of 6 OK loaded seed file raw.raw_items ............. INSERT 90900 3 of 6 OK loaded seed file raw.raw_orders ............ INSERT 61948 4 of 6 OK loaded seed file raw.raw_products .......... INSERT 10 5 of 6 OK loaded seed file raw.raw_stores ............ INSERT 6 6 of 6 OK loaded seed file raw.raw_supplies .......... INSERT 65 Done. PASS=6 WARN=0 ERROR=0 SKIP=0 TOTAL=6

Step 6: Build Models

dbt run --profiles-dir .

Builds 13 models in dependency order (6 staging views + 7 mart tables):

1 of 13 OK created sql table model dbt_jaffle.metricflow_time_spine 2 of 13 OK created sql view model dbt_jaffle.stg_customers 3 of 13 OK created sql view model dbt_jaffle.stg_locations 4 of 13 OK created sql view model dbt_jaffle.stg_order_items 5 of 13 OK created sql view model dbt_jaffle.stg_orders 6 of 13 OK created sql view model dbt_jaffle.stg_products 7 of 13 OK created sql view model dbt_jaffle.stg_supplies 8 of 13 OK created sql table model dbt_jaffle.locations 9 of 13 OK created sql table model dbt_jaffle.products 10 of 13 OK created sql table model dbt_jaffle.order_items 11 of 13 OK created sql table model dbt_jaffle.supplies 12 of 13 OK created sql table model dbt_jaffle.orders 13 of 13 OK created sql table model dbt_jaffle.customers Done. PASS=13 WARN=0 ERROR=0 SKIP=0 TOTAL=13

After the build completes, the following objects will appear in the dbt_jaffle schema of Singdata Lakehouse:

ObjectTypeDescription
stg_customersViewCleaned customer data
stg_locationsViewCleaned store data
stg_order_itemsViewCleaned order line item data
stg_ordersViewCleaned order data
stg_productsViewCleaned product data
stg_suppliesViewCleaned supply data
customersTableCustomer wide table (with spending summary)
ordersTableOrder wide table (with product and amount details)
order_itemsTableOrder line item wide table
locationsTableStore dimension table
productsTableProduct dimension table
suppliesTableSupply dimension table
metricflow_time_spineTableMetricFlow time axis helper table

Step 7: Run Data Quality Tests

dbt test --profiles-dir .

Runs 27 data quality tests covering uniqueness, non-null, referential integrity, and business logic validation:

Done. PASS=27 WARN=0 ERROR=0 SKIP=0 TOTAL=27

Step 8: View Project Documentation (Optional)

dbt docs generate --profiles-dir . dbt docs serve

Open http://localhost:8080 in your browser to view the complete data lineage graph and field descriptions.

Project Structure

jaffle-shop-clickzetta/ ├── models/ │ ├── staging/ │ └── marts/ ├── seeds/ │ └── jaffle-data/ ├── macros/ │ └── clickzetta_seed_batch_size.sql ├── profiles.yml.example └── dbt_project.yml

  • models/staging/: Raw data cleaning layer, materialized as views
  • models/marts/: Business wide table layer, materialized as tables
  • seeds/jaffle-data/: 6 CSV raw data files
  • macros/clickzetta_seed_batch_size.sql: Singdata Lakehouse adaptation macro, controls batch write size
  • profiles.yml.example: Connection configuration template

Frequently Asked Questions

Q: What should I do if dbt debug reports a connection failure?

Check that service, instance, and workspace in profiles.yml are filled in correctly. service is the API endpoint domain name, not an HTTP URL — do not add the https:// prefix.

Q: Why are unit tests disabled?

The original jaffle-shop project includes 3 dbt unit tests. The SQL they generate uses cast(null as string not null) syntax, which Singdata Lakehouse does not currently support in cast expressions with not null constraints. Therefore, unit tests are uniformly disabled in dbt_project.yml. All 27 data tests run normally.

Command Reference

CommandPurpose
dbt depsInstall dependency packages declared in packages.yml
dbt seedLoad CSV files as database tables
dbt runCompile and execute all models
dbt testRun data quality tests
dbt docs generateGenerate project documentation
dbt docs serveView documentation in a local browser