Build a multi-layer data warehouse from a full stream of game behavior events to output user LTV segmentation, payment conversion funnels, and retention matrices. Using the Steam game catalog (15 mainstream games as a dimension table) as a base, and simulating player login, level completion, and payment behavior events, this guide provides an end-to-end walkthrough of the complete Kafka PIPE → Bronze → Silver → Gold pipeline, covering four key platform capabilities: Dynamic Table, BITMAP functions, LAG/LEAD window functions, and BloomFilter Index.
Overview
The core challenge in a gaming operations data warehouse is transforming a high-frequency, multi-type behavior event stream into actionable business metrics.
Challenge
Singdata Solution
High-frequency real-time writes of game events (login/payment/level), tens of millions per day
Kafka PIPE continuous ingestion — no need to write custom consumers
Client logs uploaded to OSS in daily batches, need automatic ingestion
OSS PIPE (LIST_PURGE) — triggers import when files land
System auto-refreshes following the dependency chain
LAG / LEAD
Get the previous / next event in the same session
Reconstruct payment path (previous step, next step)
GROUP_BITMAP
Precise cardinality statistics (DAU calculation)
Returns set cardinality, not the bitmap object itself
GROUP_BITMAP_STATE
Generate daily bitmap state snapshots
Used for cross-day deduplication merges (MAU)
GROUP_BITMAP_MERGE
Merge multiple bitmap states
Used together with GROUP_BITMAP_STATE
REFRESH DYNAMIC TABLE
Manually trigger a single refresh
Used during initial table creation or debugging
Prerequisites
All examples in this guide run under the best_practice_gaming_dw schema.
CREATE SCHEMA IF NOT EXISTS best_practice_gaming_dw
COMMENT 'Gaming Operations DW Best Practices';
Dimension Table: Game Catalog and Genre
Create Table
The game dimension table comes from the Steam Games Dataset (Kaggle, ~1 GB), containing game name, tags, price, review count, and other metadata. This guide uses 15 popular games as a representative sample.
CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.doc_dim_game (
app_id BIGINT COMMENT 'Steam App ID',
name STRING COMMENT 'Game title',
release_date DATE COMMENT 'Release date',
developer STRING COMMENT 'Developer name',
publisher STRING COMMENT 'Publisher name',
genres STRING COMMENT 'Comma-separated genre tags',
tags STRING COMMENT 'User-defined tags (top 20)',
price_usd DOUBLE COMMENT 'Current price in USD',
is_free BOOLEAN COMMENT 'True if free-to-play',
positive_reviews BIGINT COMMENT 'Number of positive reviews',
negative_reviews BIGINT COMMENT 'Number of negative reviews',
estimated_owners STRING COMMENT 'Estimated owner range',
avg_playtime_forever INT COMMENT 'Average playtime in minutes (lifetime)',
ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
) COMMENT 'Dim: Steam game catalog (sourced from Steam Games Dataset on Kaggle)';
CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.doc_dim_genre (
genre_id INT COMMENT 'Genre surrogate key',
genre_name STRING COMMENT 'Genre name',
category STRING COMMENT 'Meta-category: Core / Casual / Strategy'
) COMMENT 'Dim: Game genre taxonomy';
After loading the game dimension data (15 popular Steam games), query the top 5:
SELECT app_id, name, genres, price_usd, is_free, positive_reviews
FROM best_practice_gaming_dw.doc_dim_game
ORDER BY positive_reviews DESC
LIMIT 5;
The Steam game catalog covers both free-to-play (CS2, Dota 2) and paid (PUBG, BG3, Stardew Valley, Cyberpunk 2077) models, with review counts ranging from 140,000 to 1.2 million.
Bronze Layer: Raw Game Event Table
Create Table
The Bronze layer receives the real-time event stream from the Kafka PIPE. The schema covers common attributes for all event types; event-specific attributes are stored as JSON in extra_props.
CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.doc_bronze_game_events (
event_id STRING COMMENT 'Unique event identifier',
user_id STRING COMMENT 'Player user ID',
app_id BIGINT COMMENT 'Steam App ID (FK to dim_game)',
event_type STRING COMMENT 'login / logout / level_complete / purchase / achievement',
event_time TIMESTAMP COMMENT 'Client-side event timestamp (UTC)',
session_id STRING COMMENT 'Session identifier (pre-assigned by client SDK)',
level_id INT COMMENT 'Level/map number (null if not applicable)',
level_name STRING COMMENT 'Level name or map name',
amount_usd DOUBLE COMMENT 'Payment amount in USD (null if not a purchase)',
item_id STRING COMMENT 'Item SKU for purchase events',
item_name STRING COMMENT 'Item display name',
country_code STRING COMMENT 'ISO 3166-1 alpha-2 country code',
device_type STRING COMMENT 'PC / Console / Mobile',
client_version STRING COMMENT 'Game client version string',
extra_props STRING COMMENT 'JSON blob for event-specific extra properties',
ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
) COMMENT 'Bronze: raw in-game event stream (Kafka PIPE target)';
Create BloomFilter Index
Both the Silver and Gold layers will filter by user_id and app_id. These two columns have high cardinality (user IDs in the millions, game IDs in the tens of thousands), making them good candidates for BloomFilter indexes.
CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_user_id
ON TABLE doc_bronze_game_events (user_id);
CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_app_id
ON TABLE doc_bronze_game_events (app_id);
⚠️ Note: CREATE BLOOMFILTER INDEX requires the same schema context as the target table. Run USE SCHEMA best_practice_gaming_dw or pass -s best_practice_gaming_dw before executing, otherwise you will get an "index and table must in the same schema" error.
Configure Kafka PIPE
Game behavior events are collected by the client SDK and sent to a Kafka topic. The PIPE continuously consumes and writes to the Bronze table.
-- First create the raw string receiving table (Kafka PIPE writes JSON strings)
CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.kafka_raw_game_events (value STRING);
-- Create the Kafka PIPE
CREATE PIPE IF NOT EXISTS best_practice_gaming_dw.pipe_game_events
VIRTUAL_CLUSTER = 'DEFAULT'
BATCH_INTERVAL_IN_SECONDS = '60'
AS
COPY INTO best_practice_gaming_dw.kafka_raw_game_events
FROM (
SELECT CAST(value AS STRING) AS value
FROM READ_KAFKA(
'<kafka-broker>:9092', -- replace with your actual broker address
'game_tracking_events', -- topic name
'',
'cz_gaming_consumer', -- consumer group ID
'','','','',
'raw', 'raw',
0,
map()
)
);
💡 Tip: Game event Kafka topics are often split by event type (login_events, purchase_events), or you can use a single topic and distinguish by the event_type field. BATCH_INTERVAL_IN_SECONDS = 60 is suitable for minute-level latency scenarios; for purchase events requiring lower latency, use 10.
Configure OSS PIPE (Daily Offline Logs)
Client-side cache logs from low-connectivity environments are uploaded to OSS in daily batches. First create a Storage Connection, then mount an External Volume, and finally configure a PIPE to automatically scan and import.
Step 1: Create OSS Storage Connection
-- Replace ACCESS_ID / ACCESS_KEY with RAM user credentials that have OSS read/write permissions
-- Replace ENDPOINT with the access address for the OSS bucket's region
CREATE STORAGE CONNECTION IF NOT EXISTS best_practice_gaming_dw.gaming_oss_conn
TYPE oss
ENDPOINT = 'oss-cn-hangzhou.aliyuncs.com'
ACCESS_ID = '<your_access_key_id>'
ACCESS_KEY = '<your_access_key_secret>';
Step 2: Create an External Volume mounting the OSS path
-- Replace LOCATION with the actual OSS bucket path
CREATE EXTERNAL VOLUME IF NOT EXISTS best_practice_gaming_dw.gaming_offline_logs
LOCATION 'oss://<your-bucket>/game_offline_logs/'
USING CONNECTION best_practice_gaming_dw.gaming_oss_conn
DIRECTORY = (enable = true, auto_refresh = true)
RECURSIVE = true;
Step 3: Create OSS PIPE
CREATE PIPE IF NOT EXISTS best_practice_gaming_dw.pipe_offline_logs
VIRTUAL_CLUSTER = 'DEFAULT'
INGEST_MODE = 'LIST_PURGE'
AS
COPY INTO best_practice_gaming_dw.doc_bronze_game_events
FROM VOLUME best_practice_gaming_dw.gaming_offline_logs
USING json;
💡 Tip: LIST_PURGE mode deletes source files from the Volume after a successful import, preventing duplicate imports. If you need to preserve the original files, use LIST (no deletion) mode.
Load Sample Behavior Data
This guide simulates 7 players and 45 events covering five event types: login, logout, level_complete, purchase, and achievement.
Import from local CSV (recommended)
Save the event data as a CSV file and import in bulk via User Volume:
-- Step 1: Upload the local CSV file to User Volume via SQL PUT
PUT '/path/to/game_events.csv' TO USER VOLUME FILE 'game_events.csv';
-- Step 2: COPY INTO the table from User Volume
COPY INTO best_practice_gaming_dw.doc_bronze_game_events
FROM USER VOLUME
USING csv
OPTIONS('header'='true', 'sep'=',', 'nullValue'='')
FILES ('game_events.csv');
You can also insert simulated test data inline (no CSV file required):
INSERT INTO best_practice_gaming_dw.doc_bronze_game_events
(event_id, user_id, app_id, event_type, event_time, session_id,
level_id, level_name, amount_usd, item_id, item_name,
country_code, device_type, client_version, extra_props)
VALUES
('E0001','U001',730,'login',CAST('2026-05-20 09:00:00' AS TIMESTAMP),'S001',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'),
('E0002','U001',730,'level_complete',CAST('2026-05-20 09:12:00' AS TIMESTAMP),'S001',1,'de_dust2',NULL,NULL,NULL,'CN','PC','v1.40.1','{"kills":18,"deaths":5,"mvp":true}'),
('E0003','U001',730,'purchase',CAST('2026-05-20 09:15:00' AS TIMESTAMP),'S001',NULL,NULL,1.99,'skin_001','AK-47 | Redline','CN','PC','v1.40.1','{}'),
('E0004','U001',730,'level_complete',CAST('2026-05-20 09:30:00' AS TIMESTAMP),'S001',2,'de_inferno',NULL,NULL,NULL,'CN','PC','v1.40.1','{"kills":22,"deaths":8,"mvp":false}'),
('E0005','U001',730,'logout',CAST('2026-05-20 10:00:00' AS TIMESTAMP),'S001',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'),
('E0006','U001',730,'login',CAST('2026-05-21 09:00:00' AS TIMESTAMP),'S005',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'),
('E0007','U001',730,'level_complete',CAST('2026-05-21 09:30:00' AS TIMESTAMP),'S005',3,'de_mirage',NULL,NULL,NULL,'CN','PC','v1.40.1','{"kills":15,"deaths":6}'),
('E0008','U001',730,'purchase',CAST('2026-05-21 09:36:00' AS TIMESTAMP),'S005',NULL,NULL,14.99,'skin_002','M4A4 | Howl','CN','PC','v1.40.1','{}'),
('E0009','U001',730,'logout',CAST('2026-05-21 10:00:00' AS TIMESTAMP),'S005',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'),
('E0010','U002',570,'login',CAST('2026-05-20 10:00:00' AS TIMESTAMP),'S002',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'),
('E0011','U002',570,'level_complete',CAST('2026-05-20 11:00:00' AS TIMESTAMP),'S002',1,'tutorial',NULL,NULL,NULL,'US','PC','v7.34','{}'),
('E0012','U002',570,'purchase',CAST('2026-05-20 11:10:00' AS TIMESTAMP),'S002',NULL,NULL,4.99,'hero_001','Crystal Maiden Set','US','PC','v7.34','{}'),
('E0013','U002',570,'logout',CAST('2026-05-20 12:30:00' AS TIMESTAMP),'S002',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'),
('E0014','U002',570,'login',CAST('2026-05-21 10:00:00' AS TIMESTAMP),'S010',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'),
('E0015','U002',570,'level_complete',CAST('2026-05-21 11:00:00' AS TIMESTAMP),'S010',2,'ranked',NULL,NULL,NULL,'US','PC','v7.34','{}'),
('E0016','U002',570,'logout',CAST('2026-05-21 12:00:00' AS TIMESTAMP),'S010',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'),
('E0017','U003',1172470,'login',CAST('2026-05-20 11:00:00' AS TIMESTAMP),'S003',NULL,NULL,NULL,NULL,NULL,'KR','PC','v3.24','{}'),
('E0018','U003',1172470,'level_complete',CAST('2026-05-20 11:20:00' AS TIMESTAMP),'S003',1,'Kings Canyon',NULL,NULL,NULL,'KR','PC','v3.24','{}'),
('E0019','U003',1172470,'level_complete',CAST('2026-05-20 11:40:00' AS TIMESTAMP),'S003',2,'World Edge',NULL,NULL,NULL,'KR','PC','v3.24','{}'),
('E0020','U003',1172470,'purchase',CAST('2026-05-20 12:10:00' AS TIMESTAMP),'S003',NULL,NULL,9.99,'legend_001','Revenant Legend Skin','KR','PC','v3.24','{}'),
('E0021','U003',1172470,'logout',CAST('2026-05-20 13:00:00' AS TIMESTAMP),'S003',NULL,NULL,NULL,NULL,NULL,'KR','PC','v3.24','{}'),
('E0022','U004',578080,'login',CAST('2026-05-10 10:00:00' AS TIMESTAMP),'S012',NULL,NULL,NULL,NULL,NULL,'US','PC','v30.5','{}'),
('E0023','U004',578080,'logout',CAST('2026-05-10 11:00:00' AS TIMESTAMP),'S012',NULL,NULL,NULL,NULL,NULL,'US','PC','v30.5','{}'),
('E0024','U004',578080,'login',CAST('2026-05-20 14:00:00' AS TIMESTAMP),'S004',NULL,NULL,NULL,NULL,NULL,'US','PC','v31.0','{}'),
('E0025','U004',578080,'level_complete',CAST('2026-05-20 14:30:00' AS TIMESTAMP),'S004',1,'Erangel',NULL,NULL,NULL,'US','PC','v31.0','{}'),
('E0026','U004',578080,'logout',CAST('2026-05-20 15:00:00' AS TIMESTAMP),'S004',NULL,NULL,NULL,NULL,NULL,'US','PC','v31.0','{}'),
('E0027','U005',1091500,'login',CAST('2026-05-18 09:00:00' AS TIMESTAMP),'S006',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'),
('E0028','U005',1091500,'level_complete',CAST('2026-05-18 10:00:00' AS TIMESTAMP),'S006',1,'Prologue',NULL,NULL,NULL,'DE','PC','v2.12','{}'),
('E0029','U005',1091500,'purchase',CAST('2026-05-18 10:05:00' AS TIMESTAMP),'S006',NULL,NULL,59.99,'dlc_phantom','Phantom Liberty DLC','DE','PC','v2.12','{}'),
('E0030','U005',1091500,'logout',CAST('2026-05-18 11:00:00' AS TIMESTAMP),'S006',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'),
('E0031','U005',1091500,'login',CAST('2026-05-19 09:00:00' AS TIMESTAMP),'S007',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'),
('E0032','U005',1091500,'level_complete',CAST('2026-05-19 11:00:00' AS TIMESTAMP),'S007',1,'Heist',NULL,NULL,NULL,'DE','PC','v2.12','{}'),
('E0033','U005',1091500,'logout',CAST('2026-05-19 12:00:00' AS TIMESTAMP),'S007',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'),
('E0034','U005',1091500,'login',CAST('2026-05-20 09:00:00' AS TIMESTAMP),'S008',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'),
('E0035','U005',1091500,'purchase',CAST('2026-05-20 09:05:00' AS TIMESTAMP),'S008',NULL,NULL,9.99,'stash_001','Stash Tab','DE','PC','v2.12','{}'),
('E0036','U005',1091500,'logout',CAST('2026-05-20 10:30:00' AS TIMESTAMP),'S008',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'),
('E0037','U005',1091500,'achievement',CAST('2026-05-20 09:10:00' AS TIMESTAMP),'S008',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{"trophy":"platinum"}'),
('E0038','U006',892970,'login',CAST('2026-05-20 13:00:00' AS TIMESTAMP),'S009',NULL,NULL,NULL,NULL,NULL,'JP','Console','v1.15','{}'),
('E0039','U006',892970,'level_complete',CAST('2026-05-20 14:00:00' AS TIMESTAMP),'S009',1,'Limgrave',NULL,NULL,NULL,'JP','Console','v1.15','{}'),
('E0040','U006',892970,'logout',CAST('2026-05-20 14:00:00' AS TIMESTAMP),'S009',NULL,NULL,NULL,NULL,NULL,'JP','Console','v1.15','{}'),
('E0041','U007',2512000,'login',CAST('2026-05-21 14:00:00' AS TIMESTAMP),'S011',NULL,NULL,NULL,NULL,NULL,'BR','PC','v2.1.0','{}'),
('E0042','U007',2512000,'purchase',CAST('2026-05-21 14:25:00' AS TIMESTAMP),'S011',NULL,NULL,29.99,'hero_tank','Reinhardt Skin','BR','PC','v2.1.0','{}'),
('E0043','U007',2512000,'purchase',CAST('2026-05-21 14:30:00' AS TIMESTAMP),'S011',NULL,NULL,29.99,'hero_healer','Mercy Skin','BR','PC','v2.1.0','{}'),
('E0044','U007',2512000,'level_complete',CAST('2026-05-21 14:20:00' AS TIMESTAMP),'S011',1,'Control Point',NULL,NULL,NULL,'BR','PC','v2.1.0','{}'),
('E0045','U007',2512000,'logout',CAST('2026-05-21 16:00:00' AS TIMESTAMP),'S011',NULL,NULL,NULL,NULL,NULL,'BR','PC','v2.1.0','{}');
Verify the Bronze layer data distribution:
SELECT event_type, COUNT(*) AS cnt
FROM best_practice_gaming_dw.doc_bronze_game_events
GROUP BY event_type
ORDER BY cnt DESC;
The Bronze layer has 45 events covering 5 types. There are 8 purchase events (payment rate: 8 purchases / 12 logins ≈ 67% — note this is a high payment rate in the simulated scenario; in production it is typically 2–5%).
Silver Layer: Sessionized User Behavior Sequences
The Silver layer has two Dynamic Tables: silver_user_sessions (session aggregation) and silver_event_sequence (event stream with preceding/following sequences).
Session Aggregation Table
Aggregate at session_id granularity: compute session duration, levels completed, and payment amount, and enrich with game dimension data (game name and genre).
💡 Tip: In production, "sessions" are typically generated by the client SDK using a session_id, with a 30-minute idle timeout. This guide reuses the SDK-generated session_id directly; the Silver layer only aggregates and does not re-segment session boundaries. If the upstream data has no session_id, use LAG to calculate inter-event intervals and re-partition sessions in the Silver layer.
CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_silver_user_sessions
REFRESH INTERVAL 5 MINUTE VCLUSTER DEFAULT
COMMENT 'Silver: sessionized user behavior with 30-min idle cutoff'
AS
SELECT
e.session_id,
e.user_id,
e.app_id,
g.name AS game_name,
g.genres AS game_genres,
g.is_free AS game_is_free,
e.country_code,
e.device_type,
MIN(e.event_time) AS session_start,
MAX(e.event_time) AS session_end,
TIMESTAMPDIFF(SECOND, MIN(e.event_time), MAX(e.event_time)) / 60.0
AS session_duration_min,
COUNT(*) AS total_events,
SUM(CASE WHEN e.event_type = 'level_complete' THEN 1 ELSE 0 END) AS levels_completed,
SUM(CASE WHEN e.event_type = 'purchase' THEN 1 ELSE 0 END) AS purchase_count,
SUM(COALESCE(e.amount_usd, 0.0)) AS session_revenue_usd,
MAX(CASE WHEN e.event_type = 'purchase' THEN 1 ELSE 0 END) AS has_purchase,
CAST(MIN(e.event_time) AS DATE) AS session_date
FROM best_practice_gaming_dw.doc_bronze_game_events e
LEFT JOIN best_practice_gaming_dw.doc_dim_game g ON e.app_id = g.app_id
GROUP BY
e.session_id, e.user_id, e.app_id, g.name, g.genres, g.is_free,
e.country_code, e.device_type;
SELECT session_id, user_id, game_name, session_duration_min,
levels_completed, purchase_count, session_revenue_usd
FROM best_practice_gaming_dw.doc_silver_user_sessions
ORDER BY session_start;
7 out of 12 sessions generated purchases, giving a session payment rate of 58.3%. Total session durations range from 60 minutes (single-match scenarios) to 180 minutes (deep RPG gameplay).
Use LAG and LEAD to reconstruct the preceding and following steps for each event, as well as the cumulative number of levels completed, providing a foundation for payment decision path analysis.
CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_silver_event_sequence
REFRESH INTERVAL 5 MINUTE VCLUSTER DEFAULT
COMMENT 'Silver: event sequence with LAG/LEAD for payment path analysis'
AS
SELECT
event_id,
user_id,
app_id,
event_type,
event_time,
session_id,
amount_usd,
item_id,
item_name,
country_code,
-- Previous event in the same session
LAG(event_type, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time)
AS prev_event_type,
LAG(event_time, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time)
AS prev_event_time,
-- Next event in the same session
LEAD(event_type, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time)
AS next_event_type,
LEAD(event_time, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time)
AS next_event_time,
-- Seconds since the previous event
TIMESTAMPDIFF(SECOND,
LAG(event_time, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time),
event_time
) AS seconds_since_prev,
-- Cumulative levels completed before this purchase (global per user across sessions)
SUM(CASE WHEN event_type = 'level_complete' THEN 1 ELSE 0 END)
OVER (PARTITION BY user_id ORDER BY event_time
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
AS levels_before,
CAST(event_time AS DATE) AS event_date
FROM best_practice_gaming_dw.doc_bronze_game_events;
⚠️ Note: The default window frame for LAST_VALUE is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. To get the last value in the entire partition, you must explicitly specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING; otherwise it returns the current row's value.
Query the preceding and following steps for all purchase events:
SELECT user_id, event_type,
prev_event_type AS prev_step,
next_event_type AS next_step,
levels_before AS levels_done_before_pay,
seconds_since_prev AS secs_to_decide
FROM best_practice_gaming_dw.doc_silver_event_sequence
WHERE event_type = 'purchase'
ORDER BY event_time;
Further aggregate — distribution of pre-purchase steps:
SELECT prev_event_type AS prev_step,
COUNT(*) AS pay_count,
ROUND(AVG(seconds_since_prev), 0) AS avg_decision_secs
FROM best_practice_gaming_dw.doc_silver_event_sequence
WHERE event_type = 'purchase'
GROUP BY prev_event_type
ORDER BY pay_count DESC;
Result interpretation: 75% of purchases occur after completing a level (level_complete → purchase) with an average decision time of 590 seconds (about 10 minutes). Another 12.5% of purchases happen immediately after login — these users are high-intent players who came back specifically to top up, with an average wait of 300 seconds. This distribution can be used to optimize trigger timing: push item cards 5–10 minutes after level_complete.
Gold Layer: LTV Segmentation, Funnel, and Retention Matrix
User LTV Segmentation
Aggregate at user_id granularity to compute cumulative payments, active days, and game variety, and output LTV segmentation (Whale/Dolphin/Minnow/Free) along with a composite engagement score.
CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_gold_user_ltv
REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT
COMMENT 'Gold: user LTV segmentation - total spend, sessions, active days'
AS
SELECT
s.user_id,
COUNT(DISTINCT s.session_id) AS total_sessions,
COUNT(DISTINCT s.session_date) AS active_days,
SUM(s.session_duration_min) AS total_playtime_min,
SUM(s.session_revenue_usd) AS total_revenue_usd,
COUNT(DISTINCT s.app_id) AS distinct_games_played,
MAX(s.session_date) AS last_active_date,
MIN(s.session_date) AS first_active_date,
-- LTV tier: based on cumulative spend amount
CASE
WHEN SUM(s.session_revenue_usd) >= 50 THEN 'Whale'
WHEN SUM(s.session_revenue_usd) >= 10 THEN 'Dolphin'
WHEN SUM(s.session_revenue_usd) > 0 THEN 'Minnow'
ELSE 'Free'
END AS ltv_tier,
-- Engagement score: 40% active days + 40% playtime + 20% spend
ROUND(
0.4 * LEAST(COUNT(DISTINCT s.session_date) / 7.0, 1.0) * 100
+ 0.4 * LEAST(SUM(s.session_duration_min) / 300.0, 1.0) * 100
+ 0.2 * LEAST(SUM(s.session_revenue_usd) / 20.0, 1.0) * 100,
1) AS engagement_score
FROM best_practice_gaming_dw.doc_silver_user_sessions s
GROUP BY s.user_id;
SELECT user_id, total_sessions, active_days, total_playtime_min,
total_revenue_usd, ltv_tier, engagement_score
FROM best_practice_gaming_dw.doc_gold_user_ltv
ORDER BY total_revenue_usd DESC;
Result interpretation: 2 of the 7 users are Whales (cumulative spend ≥ $50), contributing 71.3% of total revenue (129.96/161.94) — the classic "80/20 effect" in F2P games. U005 has the highest engagement score (77.1) due to 3 consecutive active days and the longest playtime, confirming a positive correlation between high engagement and high spend.
LTV tier summary:
SELECT ltv_tier,
COUNT(*) AS user_count,
ROUND(SUM(total_revenue_usd), 2) AS total_rev,
ROUND(AVG(total_revenue_usd), 2) AS avg_rev
FROM best_practice_gaming_dw.doc_gold_user_ltv
GROUP BY ltv_tier
ORDER BY avg_rev DESC;
BITMAP functions are suited for computing DAU per day and merging across days to compute MAU. Prerequisite: user IDs must be converted to integers (CAST(SUBSTR(user_id, 2) AS INT) converts "U001" to 1).
Approach 1: GROUP_BITMAP for direct daily unique user count
SELECT
CAST(event_time AS DATE) AS event_date,
GROUP_BITMAP(CAST(SUBSTR(user_id, 2) AS INT)) AS dau
FROM best_practice_gaming_dw.doc_bronze_game_events
GROUP BY CAST(event_time AS DATE)
ORDER BY event_date;
Approach 2: GROUP_BITMAP_STATE + GROUP_BITMAP_MERGE for cross-day deduplication (MAU scenario)
GROUP_BITMAP_STATE generates a daily bitmap state object; GROUP_BITMAP_MERGE merges multi-day bitmaps and counts cross-day deduplicated users:
WITH daily_bitmaps AS (
SELECT
CAST(event_time AS DATE) AS event_date,
GROUP_BITMAP_STATE(CAST(SUBSTR(user_id, 2) AS INT)) AS bm
FROM best_practice_gaming_dw.doc_bronze_game_events
GROUP BY CAST(event_time AS DATE)
)
SELECT
event_date,
GROUP_BITMAP_MERGE(bm) AS cumulative_unique_users
FROM daily_bitmaps
GROUP BY event_date
ORDER BY event_date;
⚠️ Note: The GROUP_BITMAP family of functions (GROUP_BITMAP, GROUP_BITMAP_AND, GROUP_BITMAP_OR, GROUP_BITMAP_MERGE) return cardinality (INT), not the bitmap object itself. If you need the bitmap object for subsequent operations, use GROUP_BITMAP_STATE.
Payment Conversion Funnel
Compute daily conversion rates for the three-step funnel: "login → level engagement → completed purchase".
CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_gold_payment_funnel
REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT
COMMENT 'Gold: payment conversion funnel - login → engage → purchase'
AS
WITH user_daily AS (
SELECT
user_id,
event_date,
MAX(CASE WHEN event_type = 'login' THEN 1 ELSE 0 END) AS has_login,
MAX(CASE WHEN event_type = 'level_complete' THEN 1 ELSE 0 END) AS has_level,
MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS has_purchase,
SUM(COALESCE(amount_usd, 0.0)) AS day_revenue
FROM best_practice_gaming_dw.doc_silver_event_sequence
GROUP BY user_id, event_date
),
funnel_agg AS (
SELECT
event_date,
SUM(has_login) AS step1_login,
SUM(has_level) AS step2_engage,
SUM(has_purchase) AS step3_purchase,
SUM(day_revenue) AS total_revenue
FROM user_daily
GROUP BY event_date
)
SELECT
event_date,
step1_login,
step2_engage,
step3_purchase,
ROUND(step2_engage * 100.0 / NULLIF(step1_login, 0), 1) AS engage_rate_pct,
ROUND(step3_purchase * 100.0 / NULLIF(step2_engage, 0), 1) AS purchase_rate_pct,
ROUND(step3_purchase * 100.0 / NULLIF(step1_login, 0), 1) AS overall_conversion_pct,
total_revenue
FROM funnel_agg
ORDER BY event_date;
💡 Tip: On May 10, only login events occurred (U004 logged in but completed no levels), so engage_rate = 0.0 and purchase_rate is null (denominator is 0, NULLIF returns null). On May 20, 6 users logged in, 5 engaged with levels, and 4 ultimately made purchases, giving cvr_overall = 66.7%.
User Retention Matrix
Use the user's first login date as the cohort to compute D1 and D7 retention rates.
CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_gold_retention_matrix
REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT
COMMENT 'Gold: N-day retention cohort matrix'
AS
WITH first_login AS (
SELECT
user_id,
MIN(CAST(event_time AS DATE)) AS cohort_date
FROM best_practice_gaming_dw.doc_bronze_game_events
WHERE event_type = 'login'
GROUP BY user_id
),
active_days AS (
SELECT DISTINCT
e.user_id,
CAST(e.event_time AS DATE) AS active_date,
f.cohort_date,
DATEDIFF(CAST(e.event_time AS DATE), f.cohort_date) AS day_offset
FROM best_practice_gaming_dw.doc_bronze_game_events e
JOIN first_login f ON e.user_id = f.user_id
WHERE DATEDIFF(CAST(e.event_time AS DATE), f.cohort_date) BETWEEN 0 AND 30
)
SELECT
cohort_date,
COUNT(DISTINCT CASE WHEN day_offset = 0 THEN user_id END) AS d0_users,
COUNT(DISTINCT CASE WHEN day_offset = 1 THEN user_id END) AS d1_retained,
COUNT(DISTINCT CASE WHEN day_offset = 7 THEN user_id END) AS d7_retained,
ROUND(COUNT(DISTINCT CASE WHEN day_offset = 1 THEN user_id END) * 100.0
/ NULLIF(COUNT(DISTINCT CASE WHEN day_offset = 0 THEN user_id END), 0), 1)
AS d1_retention_pct,
ROUND(COUNT(DISTINCT CASE WHEN day_offset = 7 THEN user_id END) * 100.0
/ NULLIF(COUNT(DISTINCT CASE WHEN day_offset = 0 THEN user_id END), 0), 1)
AS d7_retention_pct
FROM active_days
GROUP BY cohort_date
ORDER BY cohort_date;
Result interpretation: The user cohorted on May 18 (U005) had 100% D1 retention (3 consecutive active days). Of the 4 users cohorted on May 20, 2 returned the next day (D1 retention 50%). The simulated data covers a short time span (within 10 days), so D7 retention is 0 for all cohorts. In production, at least 7 days of user behavior data must accumulate before D7 retention is meaningful.
Notes
Kafka PIPE DDL attempts broker connection: When CREATE PIPE is executed, the system validates that the broker address and topic are reachable. If Kafka is not available in your development environment, create the target table first, then simulate the ingestion process with direct INSERT after the PIPE is created.
BLOOMFILTER INDEX does not apply to existing data after creation: CREATE BLOOMFILTER INDEX only applies to newly written data blocks. The BLOOMFILTER index type does not support BUILD INDEX for existing data; consider using INVERTED INDEX for range query scenarios.
GROUP_BITMAP requires integer input: If user IDs are strings (e.g., "U001"), they must be converted to integers before using BITMAP functions. A common approach is to maintain a user_id → int_id mapping table in the Bronze layer and use int_id for BITMAP operations in the Silver layer.
"Payment rate > 100%" is normal: When the same user makes multiple purchases on the same day, calculating "number of payments / number of logged-in users" can exceed 100%. Reports should provide both "number of paying users" (COUNT DISTINCT user_id) and "number of purchases" to avoid misinterpretation.
Dynamic Table first refresh requires manual trigger: After CREATE DYNAMIC TABLE, the system does not immediately execute the first computation. Run REFRESH DYNAMIC TABLE <table> to trigger it. Subsequent refreshes happen automatically at REFRESH INTERVAL.
LAST_VALUE window frame pitfall: LAST_VALUE's default frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, causing it to return the current row's value rather than the last row's. Explicitly specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to get the last value in the partition.