Multi-Channel Ad Attribution Data Warehouse Best Practices
Integrating exposure, click, and conversion events from Google, WeChat, Douyin, Weibo, and other channels to build a unified user touchpoint journey, outputting Last Touch, Linear, and Position-Based attribution model comparison reports and ad ROI analysis. Based on a dataset of 5 users and 20 multi-channel ad events, this guide provides an end-to-end demonstration of the complete OSS PIPE + Kafka PIPE → ODS → DWD → DWS → ADS build process. An additional 3 temporary events are used to validate Table Stream capture, covering three key platform capabilities: Dynamic Table incremental attribution computation, Inverted Index for activity name search, and Table Stream for conversion event capture.
Overview
The core challenge of multi-channel ad attribution is that event data formats differ across channels, user ID systems are inconsistent, and the same conversion may be "claimed" by multiple channels simultaneously.
Singdata Lakehouse addresses these core problems with the following combination:
Problem
Solution
Multi-channel raw event structures vary and need unified ingestion
OSS PIPE (GA file import) + Kafka PIPE (real-time click stream), both writing to the Bronze layer
Three layers auto-refresh incrementally along the dependency chain
REFRESH DYNAMIC TABLE
Manually trigger a single refresh
Used for initial build or debugging
Prerequisites
All examples in this guide run under the best_practice_ad_attribution schema.
CREATE SCHEMA IF NOT EXISTS best_practice_ad_attribution;
ODS Layer: Multi-Channel Raw Event Tables
Create Tables
CREATE TABLE IF NOT EXISTS best_practice_ad_attribution.bronze_ad_events (
event_id STRING,
user_id STRING,
channel STRING,
event_type STRING, -- impression / click / conversion
event_time TIMESTAMP,
campaign_id STRING,
creative_id STRING,
platform STRING, -- web / app
region STRING,
ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() -- automatically populated by PIPE
);
CREATE TABLE IF NOT EXISTS best_practice_ad_attribution.bronze_campaign_meta (
campaign_id STRING,
campaign_name STRING,
channel STRING,
budget DOUBLE,
start_date DATE,
end_date DATE,
creative_id STRING,
creative_type STRING -- text / image / video
);
Create BloomFilter Index
The DWD attribution SQL performs heavy JOIN filtering on user_id. Since user_id is a high-cardinality column, a BloomFilter Index is appropriate.
CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_user_id
ON TABLE bronze_ad_events (user_id);
⚠️ Note: CREATE BLOOMFILTER INDEX requires the same schema context as the target table. Before executing, run USE SCHEMA best_practice_ad_attribution or use the -s best_practice_ad_attribution parameter in cz-cli, otherwise you will get an "index and table must in the same schema" error.
💡 Tip: The examples below use cz-cli (the Singdata ClickZetta Lakehouse CLI tool). If cz-cli is not installed, see cz-cli Setup Guide. If you prefer not to use the command line, you can also run SQL in the Development → SQL Editor of Lakehouse Studio, and configure/trigger scheduled tasks in the Studio → Tasks page.
Create Inverted Index
Marketing teams need to search ad campaigns by keyword — for example, finding all activities containing "video". PROPERTIES('analyzer'='english') enables English word segmentation.
First, insert campaign metadata to populate campaign_name, creative_type, and budget for DWD layer enrichment, and for Inverted Index search:
INSERT INTO best_practice_ad_attribution.bronze_campaign_meta
(campaign_id, campaign_name, channel, budget, start_date, end_date, creative_id, creative_type)
VALUES
('camp_001','Google Search Spring Promo', 'google', 50000, CAST('2024-01-01' AS DATE), CAST('2024-01-31' AS DATE), 'cr_g1', 'text'),
('camp_001','Google Search Spring Promo', 'google', 50000, CAST('2024-01-01' AS DATE), CAST('2024-01-31' AS DATE), 'cr_g2', 'text'),
('camp_002','WeChat Moments Brand Awareness', 'wechat', 80000, CAST('2024-01-01' AS DATE), CAST('2024-01-31' AS DATE), 'cr_w1', 'video'),
('camp_002','WeChat Moments Brand Awareness', 'wechat', 80000, CAST('2024-01-01' AS DATE), CAST('2024-01-31' AS DATE), 'cr_w2', 'image'),
('camp_003','Douyin Short Video Retargeting', 'douyin', 120000, CAST('2024-01-01' AS DATE), CAST('2024-01-31' AS DATE), 'cr_d1', 'video'),
('camp_003','Douyin Short Video Retargeting', 'douyin', 120000, CAST('2024-01-01' AS DATE), CAST('2024-01-31' AS DATE), 'cr_d2', 'video'),
('camp_004','Weibo Topic Engagement', 'weibo', 30000, CAST('2024-01-01' AS DATE), CAST('2024-01-31' AS DATE), 'cr_wb1', 'image');
CREATE INVERTED INDEX IF NOT EXISTS idx_inv_campaign_name
ON TABLE bronze_campaign_meta (campaign_name)
PROPERTIES('analyzer'='english');
Build the index (to make existing data searchable):
BUILD INDEX idx_inv_campaign_name ON bronze_campaign_meta;
⚠️ Note: BUILD INDEX only accepts table names without schema prefix, so it must be executed in the best_practice_ad_attribution schema context or via the -s best_practice_ad_attribution parameter. CREATE INDEX only takes effect for new data; run BUILD INDEX to make existing data searchable.
Verify full-text search:
SELECT campaign_id, campaign_name, channel
FROM best_practice_ad_attribution.bronze_campaign_meta
WHERE MATCH_ALL(campaign_name, 'video');
campaign_id campaign_name channel
----------- ---------------------------------- -------
camp_003 Douyin Short Video Retargeting douyin
camp_003 Douyin Short Video Retargeting douyin
Two rows result from the same campaign having two creatives (cr_d1 and cr_d2), both matching the "video" keyword.
Configure OSS PIPE (GA File Import)
Google Analytics daily export log files land in an OSS bucket, and the OSS PIPE's LIST_PURGE mode automatically scans new files and writes them to the Bronze layer:
-- First create an OSS storage connection (replace with actual AK/SK and endpoint)
CREATE STORAGE CONNECTION IF NOT EXISTS conn_oss_ga
TYPE = 'OSS'
ACCESS_ID = '<your-access-id>'
ACCESS_KEY = '<your-access-key>'
ENDPOINT = 'oss-cn-hangzhou.aliyuncs.com';
-- Mount the bucket to a Volume
CREATE EXTERNAL VOLUME IF NOT EXISTS vol_ga_exports
TYPE = 'OSS'
BUCKET = '<your-bucket>'
PATH = 'ga-exports/'
CONNECTION = conn_oss_ga;
-- Create OSS PIPE with LIST_PURGE mode to scan new CSV files
CREATE PIPE IF NOT EXISTS best_practice_ad_attribution.pipe_ga_events
VIRTUAL_CLUSTER = 'DEFAULT'
INGEST_MODE = 'LIST_PURGE'
AS
COPY INTO best_practice_ad_attribution.bronze_ad_events
FROM (
SELECT
$1 AS event_id,
$2 AS user_id,
'google' AS channel,
$3 AS event_type,
CAST($4 AS TIMESTAMP) AS event_time,
$5 AS campaign_id,
$6 AS creative_id,
$7 AS platform,
$8 AS region,
CURRENT_TIMESTAMP() AS ingest_time
FROM @vol_ga_exports
)
USING csv
OPTIONS('header'='true', 'sep'=',');
💡 Tip: LIST_PURGE mode marks files as processed after they are successfully written to the Volume scan list, preventing duplicate imports. This is suitable for GA daily full-file scenarios. If precise deduplication or file re-runs are needed, switch to LIST_RETAIN mode and handle idempotency in the downstream Dynamic Table.
Configure Kafka PIPE (Real-Time Click Stream)
Real-time Web/App click streams are ingested via Kafka and written to the same Bronze table:
-- Create Kafka storage connection
CREATE STORAGE CONNECTION IF NOT EXISTS conn_kafka_clickstream
TYPE = 'KAFKA'
KAFKA_BROKERS = '<kafka-broker>:9092';
-- Create Kafka PIPE, batch-consuming every 60 seconds
CREATE PIPE IF NOT EXISTS best_practice_ad_attribution.pipe_kafka_clickstream
VIRTUAL_CLUSTER = 'DEFAULT'
BATCH_INTERVAL_IN_SECONDS = '60'
AS
COPY INTO best_practice_ad_attribution.bronze_ad_events
FROM (
SELECT
get_json_object(value, '$.event_id') AS event_id,
get_json_object(value, '$.user_id') AS user_id,
get_json_object(value, '$.channel') AS channel,
get_json_object(value, '$.event_type') AS event_type,
CAST(get_json_object(value, '$.event_time') AS TIMESTAMP) AS event_time,
get_json_object(value, '$.campaign_id') AS campaign_id,
get_json_object(value, '$.creative_id') AS creative_id,
get_json_object(value, '$.platform') AS platform,
get_json_object(value, '$.region') AS region,
CURRENT_TIMESTAMP() AS ingest_time
FROM READ_KAFKA(
'<kafka-broker>:9092',
'ad_clickstream', -- topic name
'', 'cz_ad_consumer',
'','','','',
'json', 'json', 0, map()
)
);
Insert Simulated Data
Import from local CSV (recommended)
-- Step 1: Upload local CSV to User Volume via SQL PUT
PUT '/path/to/your/bronze_ad_events.csv' TO USER VOLUME FILE 'bronze_ad_events.csv';
-- Step 2: COPY INTO the table from User Volume
COPY INTO best_practice_ad_attribution.bronze_ad_events
FROM USER VOLUME
USING csv
OPTIONS('header'='true', 'sep'=',', 'nullValue'='')
FILES ('bronze_ad_events.csv');
You can also insert small batches of test data inline (no CSV file needed):
INSERT INTO best_practice_ad_attribution.bronze_ad_events
(event_id, user_id, channel, event_type, event_time, campaign_id, creative_id, platform, region)
VALUES
('e001','u001','google', 'impression', CAST('2024-01-01 08:00:00' AS TIMESTAMP),'camp_001','cr_g1','web','cn-north'),
('e002','u001','google', 'click', CAST('2024-01-01 08:05:00' AS TIMESTAMP),'camp_001','cr_g1','web','cn-north'),
('e003','u001','wechat', 'impression', CAST('2024-01-01 09:00:00' AS TIMESTAMP),'camp_002','cr_w1','app','cn-north'),
('e004','u001','wechat', 'click', CAST('2024-01-01 09:10:00' AS TIMESTAMP),'camp_002','cr_w1','app','cn-north'),
('e005','u001','wechat', 'conversion', CAST('2024-01-01 10:30:00' AS TIMESTAMP),'camp_002','cr_w1','app','cn-north'),
('e006','u002','douyin', 'impression', CAST('2024-01-01 10:00:00' AS TIMESTAMP),'camp_003','cr_d1','app','cn-south'),
('e007','u002','douyin', 'click', CAST('2024-01-01 10:15:00' AS TIMESTAMP),'camp_003','cr_d1','app','cn-south'),
('e008','u002','google', 'click', CAST('2024-01-01 11:00:00' AS TIMESTAMP),'camp_001','cr_g2','web','cn-south'),
('e009','u002','google', 'conversion', CAST('2024-01-01 11:45:00' AS TIMESTAMP),'camp_001','cr_g2','web','cn-south'),
('e010','u003','weibo', 'impression', CAST('2024-01-01 12:00:00' AS TIMESTAMP),'camp_004','cr_wb1','web','cn-east'),
('e011','u003','weibo', 'click', CAST('2024-01-01 12:20:00' AS TIMESTAMP),'camp_004','cr_wb1','web','cn-east'),
('e012','u003','douyin', 'impression', CAST('2024-01-01 13:00:00' AS TIMESTAMP),'camp_003','cr_d2','app','cn-east'),
('e013','u003','douyin', 'click', CAST('2024-01-01 13:10:00' AS TIMESTAMP),'camp_003','cr_d2','app','cn-east'),
('e014','u003','douyin', 'conversion', CAST('2024-01-01 14:00:00' AS TIMESTAMP),'camp_003','cr_d2','app','cn-east'),
('e015','u004','google', 'impression', CAST('2024-01-01 14:00:00' AS TIMESTAMP),'camp_001','cr_g1','web','cn-north'),
('e016','u004','google', 'click', CAST('2024-01-01 14:05:00' AS TIMESTAMP),'camp_001','cr_g1','web','cn-north'),
('e017','u004','wechat', 'impression', CAST('2024-01-01 15:00:00' AS TIMESTAMP),'camp_002','cr_w2','app','cn-north'),
('e018','u004','wechat', 'conversion', CAST('2024-01-01 16:00:00' AS TIMESTAMP),'camp_002','cr_w2','app','cn-north'),
('e019','u005','douyin', 'click', CAST('2024-01-01 16:00:00' AS TIMESTAMP),'camp_003','cr_d1','app','cn-west'),
('e020','u005','douyin', 'conversion', CAST('2024-01-01 17:00:00' AS TIMESTAMP),'camp_003','cr_d1','app','cn-west');
Enable Change Tracking and Create Table Stream
Table Stream captures new conversion events in bronze_ad_events to trigger attribution recalculation in downstream Dynamic Tables. Enable change_tracking on the table first:
ALTER TABLE best_practice_ad_attribution.bronze_ad_events
SET TBLPROPERTIES ('change_tracking' = 'true');
CREATE TABLE STREAM IF NOT EXISTS best_practice_ad_attribution.stream_conversion_events
ON TABLE best_practice_ad_attribution.bronze_ad_events
WITH PROPERTIES ('TABLE_STREAM_MODE' = 'APPEND_ONLY');
💡 Tip: Choose APPEND_ONLY mode because the ad events table is append-only (no UPDATE/DELETE). This mode outperforms STANDARD mode. Stream only records rows written after it is created; historical data that existed before the stream was created will not appear in the Stream.
Verify Stream capture. The following 3 u006 events are only used to confirm that the Stream can capture INSERTs written after creation — they do not participate in subsequent DWD / DWS / ADS attribution statistics:
INSERT INTO best_practice_ad_attribution.bronze_ad_events
(event_id, user_id, channel, event_type, event_time, campaign_id, creative_id, platform, region)
VALUES
('e021','u006','google','impression', CAST('2024-01-02 09:00:00' AS TIMESTAMP),'camp_001','cr_g1','web','cn-north'),
('e022','u006','google','click', CAST('2024-01-02 09:05:00' AS TIMESTAMP),'camp_001','cr_g1','web','cn-north'),
('e023','u006','google','conversion', CAST('2024-01-02 09:30:00' AS TIMESTAMP),'camp_001','cr_g1','web','cn-north');
SELECT __change_type, event_id, user_id, channel, event_type
FROM best_practice_ad_attribution.stream_conversion_events;
__change_type event_id user_id channel event_type
------------- -------- ------- ------- ----------
INSERT e021 u006 google impression
INSERT e022 u006 google click
INSERT e023 u006 google conversion
The Stream captured 3 new records. To keep the main sample for subsequent attribution sections at 5 users and 20 events, clean up these 3 temporary validation records:
DELETE FROM best_practice_ad_attribution.bronze_ad_events
WHERE event_id IN ('e021', 'e022', 'e023');
Downstream Dynamic Tables will automatically detect new upstream data and compute incrementally in the next refresh cycle. Subsequent attribution results in this guide are based on the cleaned main dataset, and the google conversion count in Last Touch remains 1.
DWD Layer: User Touchpoint Journey Table
Create Table
The DWD layer JOINs the Bronze event table with campaign metadata to populate campaign_name and creative_type, and uses ROW_NUMBER to assign a touch_seq sequence number to each user's touchpoint series.
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_ad_attribution.dwd_user_journey
REFRESH INTERVAL 5 MINUTE VCLUSTER DEFAULT
AS
SELECT
e.user_id,
e.channel,
e.event_type,
e.event_time,
e.campaign_id,
e.creative_id,
e.platform,
e.region,
m.campaign_name,
m.creative_type,
m.budget,
ROW_NUMBER() OVER (PARTITION BY e.user_id ORDER BY e.event_time) AS touch_seq
FROM best_practice_ad_attribution.bronze_ad_events e
LEFT JOIN best_practice_ad_attribution.bronze_campaign_meta m
ON e.campaign_id = m.campaign_id AND e.creative_id = m.creative_id;
SELECT user_id, channel, event_type, event_time, campaign_name, touch_seq
FROM best_practice_ad_attribution.dwd_user_journey
ORDER BY user_id, touch_seq
LIMIT 10;
user_id channel event_type event_time campaign_name touch_seq
------- ------- ---------- ------------------- -------------------------------- ---------
u001 google impression 2024-01-01T08:00:00 Google Search Spring Promo 1
u001 google click 2024-01-01T08:05:00 Google Search Spring Promo 2
u001 wechat impression 2024-01-01T09:00:00 WeChat Moments Brand Awareness 3
u001 wechat click 2024-01-01T09:10:00 WeChat Moments Brand Awareness 4
u001 wechat conversion 2024-01-01T10:30:00 WeChat Moments Brand Awareness 5
u002 douyin impression 2024-01-01T10:00:00 Douyin Short Video Retargeting 1
u002 douyin click 2024-01-01T10:15:00 Douyin Short Video Retargeting 2
u002 google click 2024-01-01T11:00:00 Google Search Spring Promo 3
u002 google conversion 2024-01-01T11:45:00 Google Search Spring Promo 4
u003 weibo impression 2024-01-01T12:00:00 Weibo Topic Engagement 1
touch_seq is ordered chronologically. Before u001 converted on WeChat, they first experienced a Google impression and click (touch_seq=1,2), then a WeChat impression and click (touch_seq=3,4), and finally converted on WeChat (touch_seq=5).
DWS Layer: Three Attribution Models
Attribution models are maintained in parallel across three independent Dynamic Tables in the DWS layer, without interfering with each other, allowing marketing teams to compare results on demand.
Last Touch Attribution
Attributes 100% of the conversion value to the channel of the most recent touchpoint (click or impression) before the conversion.
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_ad_attribution.dws_attribution_last_touch
REFRESH INTERVAL 5 MINUTE VCLUSTER DEFAULT
AS
WITH conversions AS (
SELECT user_id, event_time AS conv_time, campaign_id, channel
FROM best_practice_ad_attribution.dwd_user_journey
WHERE event_type = 'conversion'
),
last_touch AS (
SELECT
c.user_id,
c.conv_time,
j.channel AS attributed_channel,
j.campaign_name AS attributed_campaign_name
FROM conversions c
JOIN best_practice_ad_attribution.dwd_user_journey j
ON c.user_id = j.user_id
AND j.event_type IN ('click', 'impression')
AND j.event_time <= c.conv_time
WHERE j.touch_seq = (
SELECT MAX(touch_seq)
FROM best_practice_ad_attribution.dwd_user_journey j2
WHERE j2.user_id = c.user_id
AND j2.event_type IN ('click', 'impression')
AND j2.event_time <= c.conv_time
)
)
SELECT
attributed_channel,
attributed_campaign_name,
COUNT(*) AS conversions,
1.0 * COUNT(*) AS attributed_value
FROM last_touch
GROUP BY attributed_channel, attributed_campaign_name;
REFRESH DYNAMIC TABLE best_practice_ad_attribution.dws_attribution_last_touch;
SELECT *
FROM best_practice_ad_attribution.dws_attribution_last_touch
ORDER BY conversions DESC, attributed_channel DESC;
attributed_channel attributed_campaign_name conversions attributed_value
------------------ -------------------------------- ----------- ----------------
wechat WeChat Moments Brand Awareness 2 2.0
douyin Douyin Short Video Retargeting 2 2.0
google Google Search Spring Promo 1 1.0
The Last Touch model fully attributes conversions to the last touchpoint, with WeChat and Douyin each receiving 2 conversions. However, this underestimates Google's upstream role in the user decision path (u001 saw a Google ad first before ultimately converting on WeChat).
Linear Attribution
Distributes conversion value equally across all touchpoints in the path.
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_ad_attribution.dws_attribution_linear
REFRESH INTERVAL 5 MINUTE VCLUSTER DEFAULT
AS
WITH conversions AS (
SELECT user_id, event_time AS conv_time
FROM best_practice_ad_attribution.dwd_user_journey
WHERE event_type = 'conversion'
),
touchpoints AS (
SELECT
c.user_id,
j.channel,
j.campaign_name,
j.event_time,
COUNT(*) OVER (PARTITION BY c.user_id) AS total_touches
FROM conversions c
JOIN best_practice_ad_attribution.dwd_user_journey j
ON c.user_id = j.user_id
AND j.event_type IN ('click', 'impression')
AND j.event_time <= c.conv_time
)
SELECT
channel AS attributed_channel,
campaign_name AS attributed_campaign_name,
COUNT(*) AS touch_count,
ROUND(SUM(1.0 / total_touches), 4) AS attributed_value
FROM touchpoints
GROUP BY channel, campaign_name
ORDER BY attributed_value DESC;
REFRESH DYNAMIC TABLE best_practice_ad_attribution.dws_attribution_linear;
SELECT * FROM best_practice_ad_attribution.dws_attribution_linear ORDER BY attributed_value DESC;
attributed_channel attributed_campaign_name touch_count attributed_value
------------------ -------------------------------- ----------- ----------------
douyin Douyin Short Video Retargeting 5 2.1667
google Google Search Spring Promo 5 1.5000
wechat WeChat Moments Brand Awareness 3 0.8333
weibo Weibo Topic Engagement 2 0.5000
The Linear model fully reflects cross-channel contributions: Douyin has the most touchpoints (5) and receives the highest attributed value. Google also has 5 touchpoints, but due to longer user paths, the averaged value is 1.5. Although Weibo never directly converted a user, it participated in the user decision path and still receives 0.5 in attributed value.
Position-Based Attribution (U-Shaped Attribution)
First and last touchpoints each receive 40% weight; the remaining 20% is split equally among middle touchpoints. This model emphasizes the dual value of "acquisition (first touch)" and "closing the conversion (last touch)".
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_ad_attribution.dws_attribution_position_based
REFRESH INTERVAL 5 MINUTE VCLUSTER DEFAULT
AS
WITH conversions AS (
SELECT user_id, event_time AS conv_time
FROM best_practice_ad_attribution.dwd_user_journey
WHERE event_type = 'conversion'
),
touchpoints AS (
SELECT
c.user_id,
j.channel,
j.campaign_name,
j.touch_seq,
j.event_time,
MIN(j.touch_seq) OVER (PARTITION BY c.user_id) AS first_touch,
MAX(j.touch_seq) OVER (PARTITION BY c.user_id) AS last_touch_seq,
COUNT(*) OVER (PARTITION BY c.user_id) AS total_touches
FROM conversions c
JOIN best_practice_ad_attribution.dwd_user_journey j
ON c.user_id = j.user_id
AND j.event_type IN ('click', 'impression')
AND j.event_time <= c.conv_time
),
with_weight AS (
SELECT
user_id,
channel,
campaign_name,
CASE
WHEN touch_seq = first_touch AND touch_seq = last_touch_seq THEN 1.0
WHEN touch_seq = first_touch OR touch_seq = last_touch_seq THEN 0.4
ELSE 0.2 / GREATEST(total_touches - 2, 1)
END AS weight
FROM touchpoints
)
SELECT
channel AS attributed_channel,
campaign_name AS attributed_campaign_name,
COUNT(*) AS touch_count,
ROUND(SUM(weight), 4) AS attributed_value
FROM with_weight
GROUP BY channel, campaign_name
ORDER BY attributed_value DESC;
REFRESH DYNAMIC TABLE best_practice_ad_attribution.dws_attribution_position_based;
SELECT * FROM best_practice_ad_attribution.dws_attribution_position_based ORDER BY attributed_value DESC;
attributed_channel attributed_campaign_name touch_count attributed_value
------------------ -------------------------------- ----------- ----------------
douyin Douyin Short Video Retargeting 5 2.1000
google Google Search Spring Promo 5 1.5000
wechat WeChat Moments Brand Awareness 3 0.9000
weibo Weibo Topic Engagement 2 0.5000
Compared to Linear, Position-Based gives WeChat a higher weight (0.9 vs 0.83), because WeChat is the last touchpoint for multiple conversions (40% weight), rather than being spread equally. This model is appropriate for performance evaluation in marketing strategies that emphasize both "acquisition" and "conversion".
ADS Layer: Ad Campaign ROI Report
The ADS layer consolidates output from the three attribution models, combines it with campaign budgets, and outputs complete ROI metrics for BI tool integration.
CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_ad_attribution.ads_campaign_roi
REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT
AS
WITH evt_stats AS (
SELECT
campaign_id,
campaign_name,
channel,
creative_type,
budget,
COUNT(CASE WHEN event_type = 'impression' THEN 1 END) AS impressions,
COUNT(CASE WHEN event_type = 'click' THEN 1 END) AS clicks,
COUNT(CASE WHEN event_type = 'conversion' THEN 1 END) AS conversions,
COUNT(DISTINCT user_id) AS unique_users
FROM best_practice_ad_attribution.dwd_user_journey
GROUP BY campaign_id, campaign_name, channel, creative_type, budget
)
SELECT
campaign_id,
campaign_name,
channel,
creative_type,
budget,
impressions,
clicks,
conversions,
unique_users,
ROUND(CASE WHEN impressions > 0 THEN 100.0 * clicks / impressions ELSE 0 END, 2) AS ctr_pct,
ROUND(CASE WHEN clicks > 0 THEN 100.0 * conversions / clicks ELSE 0 END, 2) AS cvr_pct,
ROUND(CASE WHEN conversions > 0 THEN budget / conversions ELSE NULL END, 2) AS cost_per_conversion
FROM evt_stats
ORDER BY conversions DESC;
REFRESH DYNAMIC TABLE best_practice_ad_attribution.ads_campaign_roi;
SELECT campaign_name, channel, impressions, clicks, conversions, ctr_pct, cvr_pct, cost_per_conversion
FROM best_practice_ad_attribution.ads_campaign_roi
ORDER BY conversions DESC, channel, clicks DESC;
💡 Tip: The WeChat campaign appears in two rows because two different creatives (cr_w1 video, cr_w2 image) each have their own impression/click/conversion data, and creative_type is included in the GROUP BY. For campaign-level aggregation, remove the creative_type dimension and re-aggregate.
Douyin's cost_per_conversion = 60000 (budget 120,000 / 2 conversions), Google's is 50,000. The two have similar cost per conversion, but Douyin delivers twice the conversions, making its overall ROI better. Weibo generated no direct conversions and would be considered non-contributing under the Last Touch model — but it still has path value in Linear / Position-Based models, illustrating why multi-model comparison is essential for evaluating supporting channels.
Notes
BloomFilter Index schema context: Both CREATE BLOOMFILTER INDEX and BUILD INDEX require the same schema context as the target table. Run USE SCHEMA best_practice_ad_attribution or use -s best_practice_ad_attribution before executing.
Inverted Index existing data: CREATE INVERTED INDEX only takes effect for data written after the index is created. Run BUILD INDEX to make existing data searchable via MATCH_ALL.
Table Stream historical data: Stream only captures writes after creation and does not include historical records that existed before the stream was enabled. To run full attribution on historical data, query bronze_ad_events directly rather than the Stream.
Dynamic Table refresh order: The DWD layer dwd_user_journey must finish refreshing before the three DWS Dynamic Tables consume new data, and the ADS layer depends on DWS results. The system automatically determines refresh order based on reference relationships; no manual orchestration is needed.
Attribution window: The three attribution models in this guide count all touchpoints for each user before conversion, without setting a lookback day limit. In practice, it is recommended to add a time window filter (e.g., "touchpoints within 30 days before conversion") by adding AND j.event_time >= c.conv_time - INTERVAL 30 DAY to the JOIN condition.
cost_per_conversion is NULL: When a campaign has no conversions in the statistics window, cost_per_conversion returns NULL by design. BI tools can display this as "—".