DBT Data Quality in Practice
Two Types of Tests
dbt provides two testing mechanisms to address data quality issues at different levels:
| Type | Test Target | How It Runs | Use Cases |
|---|---|---|---|
| data test | Actual data | Executes SQL assertions against real tables | Validates completeness, uniqueness, and referential integrity of production data |
| unit test | Model logic | Tests SQL transformation logic with mock data | Validates correctness of computation logic without depending on real data |
The fundamental difference: data tests ask "is the data correct?", unit tests ask "is the logic correct?"
Data tests depend on real data, so you must first run dbt run to build the tables before running them. They can detect problems that arise after the data pipeline runs, such as null values or duplicate keys caused by poor upstream data quality.
Unit tests are completely independent of real data — they test SQL logic using mock data you write by hand. They run during dbt build without needing real data first. This means you can validate logic during development rather than waiting until data has been produced to discover problems.
What to do when a test fails?
When dbt test fails, dbt prints the failing SQL query, which you can run directly in Studio or cz-cli to see exactly which rows violated the constraint. For example, if a unique test fails, running the failing SQL shows you which values are duplicated.
The two types are complementary: data tests find data problems, unit tests find logic problems.
Data Test
Built-in Test Types
dbt has 4 built-in general-purpose tests, declared in schema.yml:
This is the actual test configuration for the customers model in jaffle-shop-clickzetta.
Cross-Column Expression Tests
Built-in tests can only test a single column. Business rules spanning multiple columns require dbt_utils.expression_is_true:
These two tests come from the orders model in jaffle-shop-clickzetta, validating the order amount calculation logic:
- The sum of all order item subtotals equals the order subtotal
- The order total equals the subtotal plus tax
dbt_utils is an extension package officially maintained by dbt. Declare the dependency in packages.yml to use it:
Source Tests
Source tests validate data before it enters the dbt pipeline, catching problems earlier:
Once loaded_at_field specifies a timestamp field, you can use dbt source freshness to check data freshness — if the latest data has not been updated beyond a threshold time, it will issue a warning or error.
Running Tests
Verified results (from jaffle-shop-clickzetta):
All 27 data tests passed in approximately 5 seconds.
Unit Test
What Is a Unit Test
A unit test uses mock data to test the SQL transformation logic of a model, without depending on data in the real database.
Suitable scenarios:
- Validating complex CASE WHEN logic
- Validating aggregation calculations (SUM, COUNT, etc.)
- Validating timestamp handling (truncation, format conversion)
- Testing logic even when no real data is available
Basic Syntax
Unit tests are declared in schema.yml with a given (input mock data) + expect (expected output) structure:
This is the actual unit test for stg_locations in jaffle-shop-clickzetta. It validates that the timestamp "2079-10-27T23:59:59.9999" should truncate to "2079-10-27", not "2079-10-28". This kind of edge case is hard to cover with real data, but a unit test can construct it precisely.
Multiple Input Mocks
When a model references multiple upstream models, mock data must be provided for each input:
This test validates that product_id=2 has two supply records (3.50 + 5.00), which should sum to 8.50.
Running Unit Tests
Verified results (from jaffle-shop-clickzetta):
All 3 unit tests passed in approximately 2 seconds.
Complete Test Strategy
Test distribution in jaffle-shop-clickzetta:
| Layer | Test Content | Count |
|---|---|---|
| Source | not_null, unique (raw table primary keys) | 6 |
| Staging | not_null, unique (staging view primary keys) + 1 unit test (timestamp truncation) | 8 |
| Marts | not_null, unique, relationships, accepted_values, expression_is_true + 2 unit tests | 19 |
| Total | 30 |
Test coverage principles:
- Every table's primary key must have
not_null+unique - Foreign keys must have
relationshipstests - Enum fields use
accepted_values - Cross-column business rules use
expression_is_true - Complex transformation logic uses unit tests
