Gaming Operations Data Warehouse Best Practices

Build a multi-layer data warehouse from a full stream of game behavior events to output user LTV segmentation, payment conversion funnels, and retention matrices. Using the Steam game catalog (15 mainstream games as a dimension table) as a base, and simulating player login, level completion, and payment behavior events, this guide provides an end-to-end walkthrough of the complete Kafka PIPE → Bronze → Silver → Gold pipeline, covering four key platform capabilities: Dynamic Table, BITMAP functions, LAG/LEAD window functions, and BloomFilter Index.


Overview

The core challenge in a gaming operations data warehouse is transforming a high-frequency, multi-type behavior event stream into actionable business metrics.

ChallengeSingdata Solution
High-frequency real-time writes of game events (login/payment/level), tens of millions per dayKafka PIPE continuous ingestion — no need to write custom consumers
Client logs uploaded to OSS in daily batches, need automatic ingestionOSS PIPE (LIST_PURGE) — triggers import when files land
Automatic incremental updates across Bronze → Silver (sessionized) → Gold (LTV/funnel)Dynamic Table declarative SQL; system auto-detects upstream changes
user_id and app_id are high-cardinality columns; funnel queries filter by user frequentlyBloomFilter Index for fast determination of whether a user is in a data block
Payment path analysis requires reconstructing preceding and following behavior sequencesLAG/LEAD window functions to reconstruct the decision context around purchases
Daily active user deduplication and N-day retention require cross-day COUNT DISTINCTGROUP_BITMAP family functions for precise cardinality statistics

SQL Commands Used

Command / FunctionPurposeNotes
CREATE TABLECreate Bronze-layer event table and game dimension tableRegular tables, serving as upstream for Dynamic Tables
CREATE BLOOMFILTER INDEXCreate BloomFilter indexes on user_id and app_id columnsHigh-cardinality column point-lookup filtering to reduce scanned data blocks
CREATE PIPECreate a Kafka continuous ingestion pipelineBATCH_INTERVAL controls ingestion latency
CREATE DYNAMIC TABLECreate Silver / Gold layer incremental computation tablesSystem auto-refreshes following the dependency chain
LAG / LEADGet the previous / next event in the same sessionReconstruct payment path (previous step, next step)
GROUP_BITMAPPrecise cardinality statistics (DAU calculation)Returns set cardinality, not the bitmap object itself
GROUP_BITMAP_STATEGenerate daily bitmap state snapshotsUsed for cross-day deduplication merges (MAU)
GROUP_BITMAP_MERGEMerge multiple bitmap statesUsed together with GROUP_BITMAP_STATE
REFRESH DYNAMIC TABLEManually trigger a single refreshUsed during initial table creation or debugging

Prerequisites

All examples in this guide run under the best_practice_gaming_dw schema.

CREATE SCHEMA IF NOT EXISTS best_practice_gaming_dw COMMENT 'Gaming Operations DW Best Practices';


Dimension Table: Game Catalog and Genre

Create Table

The game dimension table comes from the Steam Games Dataset (Kaggle, ~1 GB), containing game name, tags, price, review count, and other metadata. This guide uses 15 popular games as a representative sample.

CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.doc_dim_game ( app_id BIGINT COMMENT 'Steam App ID', name STRING COMMENT 'Game title', release_date DATE COMMENT 'Release date', developer STRING COMMENT 'Developer name', publisher STRING COMMENT 'Publisher name', genres STRING COMMENT 'Comma-separated genre tags', tags STRING COMMENT 'User-defined tags (top 20)', price_usd DOUBLE COMMENT 'Current price in USD', is_free BOOLEAN COMMENT 'True if free-to-play', positive_reviews BIGINT COMMENT 'Number of positive reviews', negative_reviews BIGINT COMMENT 'Number of negative reviews', estimated_owners STRING COMMENT 'Estimated owner range', avg_playtime_forever INT COMMENT 'Average playtime in minutes (lifetime)', ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() ) COMMENT 'Dim: Steam game catalog (sourced from Steam Games Dataset on Kaggle)';

CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.doc_dim_genre ( genre_id INT COMMENT 'Genre surrogate key', genre_name STRING COMMENT 'Genre name', category STRING COMMENT 'Meta-category: Core / Casual / Strategy' ) COMMENT 'Dim: Game genre taxonomy';

After loading the game dimension data (15 popular Steam games), query the top 5:

SELECT app_id, name, genres, price_usd, is_free, positive_reviews FROM best_practice_gaming_dw.doc_dim_game ORDER BY positive_reviews DESC LIMIT 5;

app_id | name | genres | price_usd | is_free | positive_reviews ---------+-----------------------+-----------------------------+-----------+---------+----------------- 730 | Counter-Strike 2 | Action,Free to Play | 0 | true | 1200000 570 | Dota 2 | Action,Free to Play,Strategy| 0 | true | 950000 578080 | PUBG: BATTLEGROUNDS | Action | 29.99 | false | 680000 1151640 | Baldurs Gate 3 | RPG | 59.99 | false | 600000 413150 | Stardew Valley | RPG,Simulation | 14.99 | false | 520000

The Steam game catalog covers both free-to-play (CS2, Dota 2) and paid (PUBG, BG3, Stardew Valley, Cyberpunk 2077) models, with review counts ranging from 140,000 to 1.2 million.


Bronze Layer: Raw Game Event Table

Create Table

The Bronze layer receives the real-time event stream from the Kafka PIPE. The schema covers common attributes for all event types; event-specific attributes are stored as JSON in extra_props.

CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.doc_bronze_game_events ( event_id STRING COMMENT 'Unique event identifier', user_id STRING COMMENT 'Player user ID', app_id BIGINT COMMENT 'Steam App ID (FK to dim_game)', event_type STRING COMMENT 'login / logout / level_complete / purchase / achievement', event_time TIMESTAMP COMMENT 'Client-side event timestamp (UTC)', session_id STRING COMMENT 'Session identifier (pre-assigned by client SDK)', level_id INT COMMENT 'Level/map number (null if not applicable)', level_name STRING COMMENT 'Level name or map name', amount_usd DOUBLE COMMENT 'Payment amount in USD (null if not a purchase)', item_id STRING COMMENT 'Item SKU for purchase events', item_name STRING COMMENT 'Item display name', country_code STRING COMMENT 'ISO 3166-1 alpha-2 country code', device_type STRING COMMENT 'PC / Console / Mobile', client_version STRING COMMENT 'Game client version string', extra_props STRING COMMENT 'JSON blob for event-specific extra properties', ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() ) COMMENT 'Bronze: raw in-game event stream (Kafka PIPE target)';

Create BloomFilter Index

Both the Silver and Gold layers will filter by user_id and app_id. These two columns have high cardinality (user IDs in the millions, game IDs in the tens of thousands), making them good candidates for BloomFilter indexes.

CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_user_id ON TABLE doc_bronze_game_events (user_id); CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_app_id ON TABLE doc_bronze_game_events (app_id);

Configure Kafka PIPE

Game behavior events are collected by the client SDK and sent to a Kafka topic. The PIPE continuously consumes and writes to the Bronze table.

-- First create the raw string receiving table (Kafka PIPE writes JSON strings) CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.kafka_raw_game_events (value STRING); -- Create the Kafka PIPE CREATE PIPE IF NOT EXISTS best_practice_gaming_dw.pipe_game_events VIRTUAL_CLUSTER = 'DEFAULT' BATCH_INTERVAL_IN_SECONDS = '60' AS COPY INTO best_practice_gaming_dw.kafka_raw_game_events FROM ( SELECT CAST(value AS STRING) AS value FROM READ_KAFKA( '<kafka-broker>:9092', -- replace with your actual broker address 'game_tracking_events', -- topic name '', 'cz_gaming_consumer', -- consumer group ID '','','','', 'raw', 'raw', 0, map() ) );

Configure OSS PIPE (Daily Offline Logs)

Client-side cache logs from low-connectivity environments are uploaded to OSS in daily batches. First create a Storage Connection, then mount an External Volume, and finally configure a PIPE to automatically scan and import.

Step 1: Create OSS Storage Connection

-- Replace ACCESS_ID / ACCESS_KEY with RAM user credentials that have OSS read/write permissions -- Replace ENDPOINT with the access address for the OSS bucket's region CREATE STORAGE CONNECTION IF NOT EXISTS best_practice_gaming_dw.gaming_oss_conn TYPE oss ENDPOINT = 'oss-cn-hangzhou.aliyuncs.com' ACCESS_ID = '<your_access_key_id>' ACCESS_KEY = '<your_access_key_secret>';

Step 2: Create an External Volume mounting the OSS path

-- Replace LOCATION with the actual OSS bucket path CREATE EXTERNAL VOLUME IF NOT EXISTS best_practice_gaming_dw.gaming_offline_logs LOCATION 'oss://<your-bucket>/game_offline_logs/' USING CONNECTION best_practice_gaming_dw.gaming_oss_conn DIRECTORY = (enable = true, auto_refresh = true) RECURSIVE = true;

Step 3: Create OSS PIPE

CREATE PIPE IF NOT EXISTS best_practice_gaming_dw.pipe_offline_logs VIRTUAL_CLUSTER = 'DEFAULT' INGEST_MODE = 'LIST_PURGE' AS COPY INTO best_practice_gaming_dw.doc_bronze_game_events FROM VOLUME best_practice_gaming_dw.gaming_offline_logs USING json;

Load Sample Behavior Data

This guide simulates 7 players and 45 events covering five event types: login, logout, level_complete, purchase, and achievement.

Import from local CSV (recommended)

Save the event data as a CSV file and import in bulk via User Volume:

-- Step 1: Upload the local CSV file to User Volume via SQL PUT PUT '/path/to/game_events.csv' TO USER VOLUME FILE 'game_events.csv';

-- Step 2: COPY INTO the table from User Volume COPY INTO best_practice_gaming_dw.doc_bronze_game_events FROM USER VOLUME USING csv OPTIONS('header'='true', 'sep'=',', 'nullValue'='') FILES ('game_events.csv');

You can also insert simulated test data inline (no CSV file required):

INSERT INTO best_practice_gaming_dw.doc_bronze_game_events (event_id, user_id, app_id, event_type, event_time, session_id, level_id, level_name, amount_usd, item_id, item_name, country_code, device_type, client_version, extra_props) VALUES ('E0001','U001',730,'login',CAST('2026-05-20 09:00:00' AS TIMESTAMP),'S001',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'), ('E0002','U001',730,'level_complete',CAST('2026-05-20 09:12:00' AS TIMESTAMP),'S001',1,'de_dust2',NULL,NULL,NULL,'CN','PC','v1.40.1','{"kills":18,"deaths":5,"mvp":true}'), ('E0003','U001',730,'purchase',CAST('2026-05-20 09:15:00' AS TIMESTAMP),'S001',NULL,NULL,1.99,'skin_001','AK-47 | Redline','CN','PC','v1.40.1','{}'), ('E0004','U001',730,'level_complete',CAST('2026-05-20 09:30:00' AS TIMESTAMP),'S001',2,'de_inferno',NULL,NULL,NULL,'CN','PC','v1.40.1','{"kills":22,"deaths":8,"mvp":false}'), ('E0005','U001',730,'logout',CAST('2026-05-20 10:00:00' AS TIMESTAMP),'S001',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'), ('E0006','U001',730,'login',CAST('2026-05-21 09:00:00' AS TIMESTAMP),'S005',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'), ('E0007','U001',730,'level_complete',CAST('2026-05-21 09:30:00' AS TIMESTAMP),'S005',3,'de_mirage',NULL,NULL,NULL,'CN','PC','v1.40.1','{"kills":15,"deaths":6}'), ('E0008','U001',730,'purchase',CAST('2026-05-21 09:36:00' AS TIMESTAMP),'S005',NULL,NULL,14.99,'skin_002','M4A4 | Howl','CN','PC','v1.40.1','{}'), ('E0009','U001',730,'logout',CAST('2026-05-21 10:00:00' AS TIMESTAMP),'S005',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'), ('E0010','U002',570,'login',CAST('2026-05-20 10:00:00' AS TIMESTAMP),'S002',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0011','U002',570,'level_complete',CAST('2026-05-20 11:00:00' AS TIMESTAMP),'S002',1,'tutorial',NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0012','U002',570,'purchase',CAST('2026-05-20 11:10:00' AS TIMESTAMP),'S002',NULL,NULL,4.99,'hero_001','Crystal Maiden Set','US','PC','v7.34','{}'), ('E0013','U002',570,'logout',CAST('2026-05-20 12:30:00' AS TIMESTAMP),'S002',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0014','U002',570,'login',CAST('2026-05-21 10:00:00' AS TIMESTAMP),'S010',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0015','U002',570,'level_complete',CAST('2026-05-21 11:00:00' AS TIMESTAMP),'S010',2,'ranked',NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0016','U002',570,'logout',CAST('2026-05-21 12:00:00' AS TIMESTAMP),'S010',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0017','U003',1172470,'login',CAST('2026-05-20 11:00:00' AS TIMESTAMP),'S003',NULL,NULL,NULL,NULL,NULL,'KR','PC','v3.24','{}'), ('E0018','U003',1172470,'level_complete',CAST('2026-05-20 11:20:00' AS TIMESTAMP),'S003',1,'Kings Canyon',NULL,NULL,NULL,'KR','PC','v3.24','{}'), ('E0019','U003',1172470,'level_complete',CAST('2026-05-20 11:40:00' AS TIMESTAMP),'S003',2,'World Edge',NULL,NULL,NULL,'KR','PC','v3.24','{}'), ('E0020','U003',1172470,'purchase',CAST('2026-05-20 12:10:00' AS TIMESTAMP),'S003',NULL,NULL,9.99,'legend_001','Revenant Legend Skin','KR','PC','v3.24','{}'), ('E0021','U003',1172470,'logout',CAST('2026-05-20 13:00:00' AS TIMESTAMP),'S003',NULL,NULL,NULL,NULL,NULL,'KR','PC','v3.24','{}'), ('E0022','U004',578080,'login',CAST('2026-05-10 10:00:00' AS TIMESTAMP),'S012',NULL,NULL,NULL,NULL,NULL,'US','PC','v30.5','{}'), ('E0023','U004',578080,'logout',CAST('2026-05-10 11:00:00' AS TIMESTAMP),'S012',NULL,NULL,NULL,NULL,NULL,'US','PC','v30.5','{}'), ('E0024','U004',578080,'login',CAST('2026-05-20 14:00:00' AS TIMESTAMP),'S004',NULL,NULL,NULL,NULL,NULL,'US','PC','v31.0','{}'), ('E0025','U004',578080,'level_complete',CAST('2026-05-20 14:30:00' AS TIMESTAMP),'S004',1,'Erangel',NULL,NULL,NULL,'US','PC','v31.0','{}'), ('E0026','U004',578080,'logout',CAST('2026-05-20 15:00:00' AS TIMESTAMP),'S004',NULL,NULL,NULL,NULL,NULL,'US','PC','v31.0','{}'), ('E0027','U005',1091500,'login',CAST('2026-05-18 09:00:00' AS TIMESTAMP),'S006',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0028','U005',1091500,'level_complete',CAST('2026-05-18 10:00:00' AS TIMESTAMP),'S006',1,'Prologue',NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0029','U005',1091500,'purchase',CAST('2026-05-18 10:05:00' AS TIMESTAMP),'S006',NULL,NULL,59.99,'dlc_phantom','Phantom Liberty DLC','DE','PC','v2.12','{}'), ('E0030','U005',1091500,'logout',CAST('2026-05-18 11:00:00' AS TIMESTAMP),'S006',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0031','U005',1091500,'login',CAST('2026-05-19 09:00:00' AS TIMESTAMP),'S007',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0032','U005',1091500,'level_complete',CAST('2026-05-19 11:00:00' AS TIMESTAMP),'S007',1,'Heist',NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0033','U005',1091500,'logout',CAST('2026-05-19 12:00:00' AS TIMESTAMP),'S007',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0034','U005',1091500,'login',CAST('2026-05-20 09:00:00' AS TIMESTAMP),'S008',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0035','U005',1091500,'purchase',CAST('2026-05-20 09:05:00' AS TIMESTAMP),'S008',NULL,NULL,9.99,'stash_001','Stash Tab','DE','PC','v2.12','{}'), ('E0036','U005',1091500,'logout',CAST('2026-05-20 10:30:00' AS TIMESTAMP),'S008',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0037','U005',1091500,'achievement',CAST('2026-05-20 09:10:00' AS TIMESTAMP),'S008',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{"trophy":"platinum"}'), ('E0038','U006',892970,'login',CAST('2026-05-20 13:00:00' AS TIMESTAMP),'S009',NULL,NULL,NULL,NULL,NULL,'JP','Console','v1.15','{}'), ('E0039','U006',892970,'level_complete',CAST('2026-05-20 14:00:00' AS TIMESTAMP),'S009',1,'Limgrave',NULL,NULL,NULL,'JP','Console','v1.15','{}'), ('E0040','U006',892970,'logout',CAST('2026-05-20 14:00:00' AS TIMESTAMP),'S009',NULL,NULL,NULL,NULL,NULL,'JP','Console','v1.15','{}'), ('E0041','U007',2512000,'login',CAST('2026-05-21 14:00:00' AS TIMESTAMP),'S011',NULL,NULL,NULL,NULL,NULL,'BR','PC','v2.1.0','{}'), ('E0042','U007',2512000,'purchase',CAST('2026-05-21 14:25:00' AS TIMESTAMP),'S011',NULL,NULL,29.99,'hero_tank','Reinhardt Skin','BR','PC','v2.1.0','{}'), ('E0043','U007',2512000,'purchase',CAST('2026-05-21 14:30:00' AS TIMESTAMP),'S011',NULL,NULL,29.99,'hero_healer','Mercy Skin','BR','PC','v2.1.0','{}'), ('E0044','U007',2512000,'level_complete',CAST('2026-05-21 14:20:00' AS TIMESTAMP),'S011',1,'Control Point',NULL,NULL,NULL,'BR','PC','v2.1.0','{}'), ('E0045','U007',2512000,'logout',CAST('2026-05-21 16:00:00' AS TIMESTAMP),'S011',NULL,NULL,NULL,NULL,NULL,'BR','PC','v2.1.0','{}');

Verify the Bronze layer data distribution:

SELECT event_type, COUNT(*) AS cnt FROM best_practice_gaming_dw.doc_bronze_game_events GROUP BY event_type ORDER BY cnt DESC;

event_type | cnt ----------------+---- login | 12 level_complete | 12 logout | 12 purchase | 8 achievement | 1

The Bronze layer has 45 events covering 5 types. There are 8 purchase events (payment rate: 8 purchases / 12 logins ≈ 67% — note this is a high payment rate in the simulated scenario; in production it is typically 2–5%).


Silver Layer: Sessionized User Behavior Sequences

The Silver layer has two Dynamic Tables: silver_user_sessions (session aggregation) and silver_event_sequence (event stream with preceding/following sequences).

Session Aggregation Table

Aggregate at session_id granularity: compute session duration, levels completed, and payment amount, and enrich with game dimension data (game name and genre).

CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_silver_user_sessions REFRESH INTERVAL 5 MINUTE VCLUSTER DEFAULT COMMENT 'Silver: sessionized user behavior with 30-min idle cutoff' AS SELECT e.session_id, e.user_id, e.app_id, g.name AS game_name, g.genres AS game_genres, g.is_free AS game_is_free, e.country_code, e.device_type, MIN(e.event_time) AS session_start, MAX(e.event_time) AS session_end, TIMESTAMPDIFF(SECOND, MIN(e.event_time), MAX(e.event_time)) / 60.0 AS session_duration_min, COUNT(*) AS total_events, SUM(CASE WHEN e.event_type = 'level_complete' THEN 1 ELSE 0 END) AS levels_completed, SUM(CASE WHEN e.event_type = 'purchase' THEN 1 ELSE 0 END) AS purchase_count, SUM(COALESCE(e.amount_usd, 0.0)) AS session_revenue_usd, MAX(CASE WHEN e.event_type = 'purchase' THEN 1 ELSE 0 END) AS has_purchase, CAST(MIN(e.event_time) AS DATE) AS session_date FROM best_practice_gaming_dw.doc_bronze_game_events e LEFT JOIN best_practice_gaming_dw.doc_dim_game g ON e.app_id = g.app_id GROUP BY e.session_id, e.user_id, e.app_id, g.name, g.genres, g.is_free, e.country_code, e.device_type;

REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_silver_user_sessions;

Query session summary:

SELECT session_id, user_id, game_name, session_duration_min, levels_completed, purchase_count, session_revenue_usd FROM best_practice_gaming_dw.doc_silver_user_sessions ORDER BY session_start;

session_id | user_id | game_name | duration_min | levels | purchases | revenue -----------+---------+--------------------+--------------+--------+-----------+-------- S012 | U004 | PUBG: BATTLEGROUNDS| 60.00 | 0 | 0 | 0 S006 | U005 | Cyberpunk 2077 | 120.00 | 1 | 1 | 59.99 S007 | U005 | Cyberpunk 2077 | 180.00 | 1 | 0 | 0 S008 | U005 | Cyberpunk 2077 | 90.00 | 0 | 1 | 9.99 S001 | U001 | Counter-Strike 2 | 60.00 | 2 | 1 | 1.99 S002 | U002 | Dota 2 | 150.00 | 1 | 1 | 4.99 S003 | U003 | Apex Legends | 120.00 | 2 | 1 | 9.99 S009 | U006 | ELDEN RING | 60.00 | 1 | 0 | 0 S004 | U004 | PUBG: BATTLEGROUNDS| 60.00 | 1 | 0 | 0 S005 | U001 | Counter-Strike 2 | 60.00 | 1 | 1 | 14.99 S010 | U002 | Dota 2 | 120.00 | 1 | 0 | 0 S011 | U007 | Overwatch 2 | 120.00 | 1 | 2 | 59.98

7 out of 12 sessions generated purchases, giving a session payment rate of 58.3%. Total session durations range from 60 minutes (single-match scenarios) to 180 minutes (deep RPG gameplay).

Event Sequence Table (LAG/LEAD Payment Path Analysis)

Use LAG and LEAD to reconstruct the preceding and following steps for each event, as well as the cumulative number of levels completed, providing a foundation for payment decision path analysis.

CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_silver_event_sequence REFRESH INTERVAL 5 MINUTE VCLUSTER DEFAULT COMMENT 'Silver: event sequence with LAG/LEAD for payment path analysis' AS SELECT event_id, user_id, app_id, event_type, event_time, session_id, amount_usd, item_id, item_name, country_code, -- Previous event in the same session LAG(event_type, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS prev_event_type, LAG(event_time, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS prev_event_time, -- Next event in the same session LEAD(event_type, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS next_event_type, LEAD(event_time, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS next_event_time, -- Seconds since the previous event TIMESTAMPDIFF(SECOND, LAG(event_time, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time), event_time ) AS seconds_since_prev, -- Cumulative levels completed before this purchase (global per user across sessions) SUM(CASE WHEN event_type = 'level_complete' THEN 1 ELSE 0 END) OVER (PARTITION BY user_id ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS levels_before, CAST(event_time AS DATE) AS event_date FROM best_practice_gaming_dw.doc_bronze_game_events;

REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_silver_event_sequence;

Query the preceding and following steps for all purchase events:

SELECT user_id, event_type, prev_event_type AS prev_step, next_event_type AS next_step, levels_before AS levels_done_before_pay, seconds_since_prev AS secs_to_decide FROM best_practice_gaming_dw.doc_silver_event_sequence WHERE event_type = 'purchase' ORDER BY event_time;

user_id | event_type | prev_step | next_step | levels_before | secs_to_decide --------+------------+----------------+----------------+---------------+--------------- U005 | purchase | level_complete | logout | 1 | 300 U005 | purchase | login | achievement | 2 | 300 U001 | purchase | level_complete | level_complete | 1 | 180 U002 | purchase | level_complete | logout | 1 | 600 U003 | purchase | level_complete | logout | 2 | 1800 U001 | purchase | level_complete | logout | 3 | 360 U007 | purchase | level_complete | purchase | 1 | 300 U007 | purchase | purchase | logout | 1 | 300

Further aggregate — distribution of pre-purchase steps:

SELECT prev_event_type AS prev_step, COUNT(*) AS pay_count, ROUND(AVG(seconds_since_prev), 0) AS avg_decision_secs FROM best_practice_gaming_dw.doc_silver_event_sequence WHERE event_type = 'purchase' GROUP BY prev_event_type ORDER BY pay_count DESC;

prev_step | pay_count | avg_decision_secs ----------------+-----------+------------------ level_complete | 6 | 590 purchase | 1 | 300 login | 1 | 300

Result interpretation: 75% of purchases occur after completing a level (level_complete → purchase) with an average decision time of 590 seconds (about 10 minutes). Another 12.5% of purchases happen immediately after login — these users are high-intent players who came back specifically to top up, with an average wait of 300 seconds. This distribution can be used to optimize trigger timing: push item cards 5–10 minutes after level_complete.


Gold Layer: LTV Segmentation, Funnel, and Retention Matrix

User LTV Segmentation

Aggregate at user_id granularity to compute cumulative payments, active days, and game variety, and output LTV segmentation (Whale/Dolphin/Minnow/Free) along with a composite engagement score.

CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_gold_user_ltv REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT COMMENT 'Gold: user LTV segmentation - total spend, sessions, active days' AS SELECT s.user_id, COUNT(DISTINCT s.session_id) AS total_sessions, COUNT(DISTINCT s.session_date) AS active_days, SUM(s.session_duration_min) AS total_playtime_min, SUM(s.session_revenue_usd) AS total_revenue_usd, COUNT(DISTINCT s.app_id) AS distinct_games_played, MAX(s.session_date) AS last_active_date, MIN(s.session_date) AS first_active_date, -- LTV tier: based on cumulative spend amount CASE WHEN SUM(s.session_revenue_usd) >= 50 THEN 'Whale' WHEN SUM(s.session_revenue_usd) >= 10 THEN 'Dolphin' WHEN SUM(s.session_revenue_usd) > 0 THEN 'Minnow' ELSE 'Free' END AS ltv_tier, -- Engagement score: 40% active days + 40% playtime + 20% spend ROUND( 0.4 * LEAST(COUNT(DISTINCT s.session_date) / 7.0, 1.0) * 100 + 0.4 * LEAST(SUM(s.session_duration_min) / 300.0, 1.0) * 100 + 0.2 * LEAST(SUM(s.session_revenue_usd) / 20.0, 1.0) * 100, 1) AS engagement_score FROM best_practice_gaming_dw.doc_silver_user_sessions s GROUP BY s.user_id;

REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_gold_user_ltv;

Query LTV segmentation for each user:

SELECT user_id, total_sessions, active_days, total_playtime_min, total_revenue_usd, ltv_tier, engagement_score FROM best_practice_gaming_dw.doc_gold_user_ltv ORDER BY total_revenue_usd DESC;

user_id | sessions | active_days | playtime_min | revenue_usd | ltv_tier | engagement_score --------+----------+-------------+--------------+-------------+----------+----------------- U005 | 3 | 3 | 390.00 | 69.98 | Whale | 77.1 U007 | 1 | 1 | 120.00 | 59.98 | Whale | 41.7 U001 | 2 | 2 | 120.00 | 16.98 | Dolphin | 44.4 U003 | 1 | 1 | 120.00 | 9.99 | Minnow | 31.7 U002 | 2 | 2 | 270.00 | 4.99 | Minnow | 52.4 U004 | 2 | 2 | 120.00 | 0 | Free | 27.4 U006 | 1 | 1 | 60.00 | 0 | Free | 13.7

Result interpretation: 2 of the 7 users are Whales (cumulative spend ≥ $50), contributing 71.3% of total revenue (129.96/161.94) — the classic "80/20 effect" in F2P games. U005 has the highest engagement score (77.1) due to 3 consecutive active days and the longest playtime, confirming a positive correlation between high engagement and high spend.

LTV tier summary:

SELECT ltv_tier, COUNT(*) AS user_count, ROUND(SUM(total_revenue_usd), 2) AS total_rev, ROUND(AVG(total_revenue_usd), 2) AS avg_rev FROM best_practice_gaming_dw.doc_gold_user_ltv GROUP BY ltv_tier ORDER BY avg_rev DESC;

ltv_tier | user_count | total_rev | avg_rev ---------+------------+-----------+-------- Whale | 2 | 129.96 | 64.98 Dolphin | 1 | 16.98 | 16.98 Minnow | 2 | 14.98 | 7.49 Free | 2 | 0 | 0

Calculate DAU Using BITMAP Functions

BITMAP functions are suited for computing DAU per day and merging across days to compute MAU. Prerequisite: user IDs must be converted to integers (CAST(SUBSTR(user_id, 2) AS INT) converts "U001" to 1).

Approach 1: GROUP_BITMAP for direct daily unique user count

SELECT CAST(event_time AS DATE) AS event_date, GROUP_BITMAP(CAST(SUBSTR(user_id, 2) AS INT)) AS dau FROM best_practice_gaming_dw.doc_bronze_game_events GROUP BY CAST(event_time AS DATE) ORDER BY event_date;

event_date | dau ------------+---- 2026-05-10 | 1 2026-05-18 | 1 2026-05-19 | 1 2026-05-20 | 6 2026-05-21 | 3

Approach 2: GROUP_BITMAP_STATE + GROUP_BITMAP_MERGE for cross-day deduplication (MAU scenario)

GROUP_BITMAP_STATE generates a daily bitmap state object; GROUP_BITMAP_MERGE merges multi-day bitmaps and counts cross-day deduplicated users:

WITH daily_bitmaps AS ( SELECT CAST(event_time AS DATE) AS event_date, GROUP_BITMAP_STATE(CAST(SUBSTR(user_id, 2) AS INT)) AS bm FROM best_practice_gaming_dw.doc_bronze_game_events GROUP BY CAST(event_time AS DATE) ) SELECT event_date, GROUP_BITMAP_MERGE(bm) AS cumulative_unique_users FROM daily_bitmaps GROUP BY event_date ORDER BY event_date;

event_date | cumulative_unique_users ------------+------------------------ 2026-05-10 | 1 2026-05-18 | 1 2026-05-19 | 1 2026-05-20 | 6 2026-05-21 | 3

Payment Conversion Funnel

Compute daily conversion rates for the three-step funnel: "login → level engagement → completed purchase".

CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_gold_payment_funnel REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT COMMENT 'Gold: payment conversion funnel - login → engage → purchase' AS WITH user_daily AS ( SELECT user_id, event_date, MAX(CASE WHEN event_type = 'login' THEN 1 ELSE 0 END) AS has_login, MAX(CASE WHEN event_type = 'level_complete' THEN 1 ELSE 0 END) AS has_level, MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS has_purchase, SUM(COALESCE(amount_usd, 0.0)) AS day_revenue FROM best_practice_gaming_dw.doc_silver_event_sequence GROUP BY user_id, event_date ), funnel_agg AS ( SELECT event_date, SUM(has_login) AS step1_login, SUM(has_level) AS step2_engage, SUM(has_purchase) AS step3_purchase, SUM(day_revenue) AS total_revenue FROM user_daily GROUP BY event_date ) SELECT event_date, step1_login, step2_engage, step3_purchase, ROUND(step2_engage * 100.0 / NULLIF(step1_login, 0), 1) AS engage_rate_pct, ROUND(step3_purchase * 100.0 / NULLIF(step2_engage, 0), 1) AS purchase_rate_pct, ROUND(step3_purchase * 100.0 / NULLIF(step1_login, 0), 1) AS overall_conversion_pct, total_revenue FROM funnel_agg ORDER BY event_date;

REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_gold_payment_funnel;

SELECT * FROM best_practice_gaming_dw.doc_gold_payment_funnel ORDER BY event_date;

event_date | login | engage | purchase | engage_rate | pay_rate | cvr_overall | revenue ------------+-------+--------+----------+-------------+----------+-------------+-------- 2026-05-10 | 1 | 0 | 0 | 0.0 | null | 0.0 | 0 2026-05-18 | 1 | 1 | 1 | 100.0 | 100.0 | 100.0 | 59.99 2026-05-19 | 1 | 1 | 0 | 100.0 | 0.0 | 0.0 | 0 2026-05-20 | 6 | 5 | 4 | 83.3 | 80.0 | 66.7 | 26.96 2026-05-21 | 3 | 3 | 2 | 100.0 | 66.7 | 66.7 | 74.97

User Retention Matrix

Use the user's first login date as the cohort to compute D1 and D7 retention rates.

CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_gold_retention_matrix REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT COMMENT 'Gold: N-day retention cohort matrix' AS WITH first_login AS ( SELECT user_id, MIN(CAST(event_time AS DATE)) AS cohort_date FROM best_practice_gaming_dw.doc_bronze_game_events WHERE event_type = 'login' GROUP BY user_id ), active_days AS ( SELECT DISTINCT e.user_id, CAST(e.event_time AS DATE) AS active_date, f.cohort_date, DATEDIFF(CAST(e.event_time AS DATE), f.cohort_date) AS day_offset FROM best_practice_gaming_dw.doc_bronze_game_events e JOIN first_login f ON e.user_id = f.user_id WHERE DATEDIFF(CAST(e.event_time AS DATE), f.cohort_date) BETWEEN 0 AND 30 ) SELECT cohort_date, COUNT(DISTINCT CASE WHEN day_offset = 0 THEN user_id END) AS d0_users, COUNT(DISTINCT CASE WHEN day_offset = 1 THEN user_id END) AS d1_retained, COUNT(DISTINCT CASE WHEN day_offset = 7 THEN user_id END) AS d7_retained, ROUND(COUNT(DISTINCT CASE WHEN day_offset = 1 THEN user_id END) * 100.0 / NULLIF(COUNT(DISTINCT CASE WHEN day_offset = 0 THEN user_id END), 0), 1) AS d1_retention_pct, ROUND(COUNT(DISTINCT CASE WHEN day_offset = 7 THEN user_id END) * 100.0 / NULLIF(COUNT(DISTINCT CASE WHEN day_offset = 0 THEN user_id END), 0), 1) AS d7_retention_pct FROM active_days GROUP BY cohort_date ORDER BY cohort_date;

REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_gold_retention_matrix;

SELECT * FROM best_practice_gaming_dw.doc_gold_retention_matrix ORDER BY cohort_date;

cohort_date | d0_users | d1_retained | d7_retained | d1_pct | d7_pct ------------+----------+-------------+-------------+--------+------- 2026-05-10 | 1 | 0 | 0 | 0.0 | 0.0 2026-05-18 | 1 | 1 | 0 | 100.0 | 0.0 2026-05-20 | 4 | 2 | 0 | 50.0 | 0.0 2026-05-21 | 1 | 0 | 0 | 0.0 | 0.0

Result interpretation: The user cohorted on May 18 (U005) had 100% D1 retention (3 consecutive active days). Of the 4 users cohorted on May 20, 2 returned the next day (D1 retention 50%). The simulated data covers a short time span (within 10 days), so D7 retention is 0 for all cohorts. In production, at least 7 days of user behavior data must accumulate before D7 retention is meaningful.


Notes

  • Kafka PIPE DDL attempts broker connection: When CREATE PIPE is executed, the system validates that the broker address and topic are reachable. If Kafka is not available in your development environment, create the target table first, then simulate the ingestion process with direct INSERT after the PIPE is created.

  • BLOOMFILTER INDEX does not apply to existing data after creation: CREATE BLOOMFILTER INDEX only applies to newly written data blocks. The BLOOMFILTER index type does not support BUILD INDEX for existing data; consider using INVERTED INDEX for range query scenarios.

  • GROUP_BITMAP requires integer input: If user IDs are strings (e.g., "U001"), they must be converted to integers before using BITMAP functions. A common approach is to maintain a user_id → int_id mapping table in the Bronze layer and use int_id for BITMAP operations in the Silver layer.

  • "Payment rate > 100%" is normal: When the same user makes multiple purchases on the same day, calculating "number of payments / number of logged-in users" can exceed 100%. Reports should provide both "number of paying users" (COUNT DISTINCT user_id) and "number of purchases" to avoid misinterpretation.

  • Dynamic Table first refresh requires manual trigger: After CREATE DYNAMIC TABLE, the system does not immediately execute the first computation. Run REFRESH DYNAMIC TABLE <table> to trigger it. Subsequent refreshes happen automatically at REFRESH INTERVAL.

  • LAST_VALUE window frame pitfall: LAST_VALUE's default frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, causing it to return the current row's value rather than the last row's. Explicitly specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to get the last value in the partition.