Building a Retail Chain POS Operations Data Warehouse

Integrate POS transaction data and inventory data from 100 chain stores nationwide to build a store operations analysis and SKU turnover data warehouse. Using the Retail Data Warehouse 12-Table Dataset, this guide provides a complete end-to-end walkthrough from ODS raw POS transactions to ADS fast/slow-moving SKU analysis and store rankings. It covers six core platform capabilities: MySQL CDC multi-table full database mirroring, Dynamic Table incremental aggregation, static partitioning by store_id, BloomFilter Index, External Schema (Hive), and Time Travel month-end reconciliation.


Overview

Core challenges in a retail chain data warehouse:

ChallengeSingdata Solution
POS systems across regional stores use separate databases with sharded tables — need unified consolidationMySQL CDC multi-table full database mirroring, merging sharded databases into a single ODS layer
Multi-layer aggregation across ODS→DWD→DWS→ADS, error-prone with manual schedulingDynamic Table cascading refresh with declarative SQL; system automatically maintains the dependency chain
Historical order queries for a single store are slow; cross-partition full scans are expensivePARTITIONED BY (store_id) + SESSION_CONFIGS parameterized partitioning, refreshing per store
product_id in order line items is a high-cardinality column with frequent point lookupsBloomFilter Index for millisecond-level skipping of non-matching data blocks
Years of historical data still in a Hive cluster — don't want to migrate itExternal Schema (Hive) for direct SQL queries against external historical tables
Month-end reconciliation requires comparing historical sales snapshots with current system dataTime Travel with TIMESTAMP AS OF to roll back to any historical version

SQL Commands Used

Command / FeaturePurposeNotes
CREATE TABLECreate ODS raw POS transaction tables and dimension tablesRegular tables, serving as upstream for Dynamic Tables
CREATE BLOOMFILTER INDEXCreate a BloomFilter index on the product_id columnSuitable for point-lookup filtering on high-cardinality columns
CREATE DYNAMIC TABLECreate DWD / DWS / ADS incremental computation tablesDeclarative SQL, system auto incremental refresh
PARTITIONED BY + SESSION_CONFIGSRefresh DWS layer with static partitioning by store IDParameterized partitioning for precise historical data backfill per store
REFRESH DYNAMIC TABLE ... PARTITIONRefresh a specific store partition in DWSStudio Task scheduled by store + date granularity
CREATE EXTERNAL SCHEMAMount Hive historical archive dataTwo-level name query without data migration
DESC HISTORYView table version historyReturns timestamp and row count changes for each version
SELECT ... TIMESTAMP AS OFMonth-end reconciliation: roll back to historical snapshotLocate order totals and revenue at month-end cutoff

Prerequisites

All examples run under the best_practice_retail_pos schema.

CREATE SCHEMA IF NOT EXISTS best_practice_retail_pos;

Download and unzip the dataset locally:

kaggle datasets download -d datarspectrum/retail-data-warehouse-12-table-1m-rows-dataset \ --unzip -p /tmp/retail_pos/

After unzipping, you get 12 CSV files: stores.csv (100 rows), orders.csv (300k rows), order_items.csv (600k rows), products.csv (10k rows), etc. This guide uses 100 stores, 100 orders, and 200 order line items as the demo dataset.


ODS Layer: Multi-Store POS Raw Data

The ODS layer directly receives raw data from regional POS systems. Tables are created with natural distribution by business primary key — no aggregation or transformation.

MySQL CDC Ingestion

In production, each regional POS system is typically an independent MySQL instance. A Studio multi-table real-time sync task (task_type=281) uses CDC to sync Binlog changes in real time to the ODS layer.

Source MySQL preparation

Confirm the following parameters are correctly configured on the source MySQL:

ParameterRequired valueHow to check
log_binONSHOW GLOBAL VARIABLES LIKE 'log_bin'
binlog_formatROWSHOW GLOBAL VARIABLES LIKE 'binlog_format'
binlog_row_imageFULLSHOW GLOBAL VARIABLES LIKE 'binlog_row_image'

The sync account needs the following permissions:

GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'cdc_user'@'%';

Studio multi-table real-time sync configuration steps

  1. Add the MySQL data source in Studio datasource management (one data source per regional database, e.g., ds_mysql_pos_north, ds_mysql_pos_south)
  2. Create a new task in Studio Development → Multi-table real-time sync, selecting the sync mode:
    • Full database mirror: database-level granularity, auto-adapts to new tables, suitable for initial full database ingestion
    • Multi-table mirror: sync selected tables, suitable when only some tables are needed
  3. Configure the source data source, select the target workspace and schema (best_practice_retail_pos)
  4. Set sync_mode to Full sync + incremental, which pulls historical data first then switches to CDC
  5. Submit the task and manually start it in Studio

After the task starts, it goes through three phases: initialize → full sync → incremental CDC. End-to-end latency in the incremental phase is seconds. Multiple regional databases can have separate tasks writing to their respective tables under the same schema.

Create Tables

Store master table (partitioned)

CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_stores ( store_id INT, city STRING );

Product dimension tables

CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_categories ( category_id INT, category_name STRING ); CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_products ( product_id INT, category_id INT, supplier_id INT, price DOUBLE ); CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_promotions ( promotion_id INT, discount INT -- discount percentage, e.g. 24 means 24% off (76% of original price) );

Customer and order tables

CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_customers ( customer_id INT, city STRING, signup_date DATE ); CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_orders ( order_id INT, customer_id INT, store_id INT, order_date DATE, promotion_id INT ) PARTITIONED BY (store_id); CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_order_items ( order_item_id INT, order_id INT, product_id INT, qty INT, price DOUBLE );

Payment and return tables

CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_payments ( payment_id INT, order_id INT, amount DOUBLE ); CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_returns ( return_id INT, order_item_id INT, refund DOUBLE );

Load Data

Import Kaggle CSV files via cz-cli (replace with PIPE auto-ingestion in production).

Import from local CSV (recommended)

Save each table's data as a CSV file, then import in bulk via User Volume:

-- Step 1: Upload the local CSV file to User Volume via SQL PUT PUT '/path/to/stores.csv' TO USER VOLUME FILE 'stores.csv';

-- Step 2: COPY INTO the table from User Volume COPY INTO best_practice_retail_pos.doc_ods_stores FROM USER VOLUME USING csv OPTIONS('header'='true', 'sep'=',', 'nullValue'='') FILES ('stores.csv');

Repeat the above steps for all other tables (doc_ods_orders, doc_ods_order_items, doc_ods_products, doc_ods_customers, doc_ods_payments, doc_ods_returns, etc.), uploading the corresponding CSV files and running COPY INTO for each.

You can also insert a small batch of test data inline (no CSV file required):

-- Example: load store master data (100 stores) INSERT INTO best_practice_retail_pos.doc_ods_stores VALUES (1,'Pune'),(2,'Pune'),(3,'Delhi'),(4,'Mumbai'),(5,'Mumbai'), -- ... 100 rows total (100,'Delhi'); -- Load orders (100) and order items (200) INSERT INTO best_practice_retail_pos.doc_ods_orders VALUES (1,45,33,CAST('2021-08-26' AS DATE),24), (2,10,81,CAST('2022-03-19' AS DATE),3), -- ... (100,31,63,CAST('2022-03-30' AS DATE),33);

Verify ODS table row counts:

SELECT (SELECT COUNT(*) FROM best_practice_retail_pos.doc_ods_stores) AS stores, (SELECT COUNT(*) FROM best_practice_retail_pos.doc_ods_orders) AS orders, (SELECT COUNT(*) FROM best_practice_retail_pos.doc_ods_order_items) AS items, (SELECT COUNT(*) FROM best_practice_retail_pos.doc_ods_products) AS products, (SELECT COUNT(*) FROM best_practice_retail_pos.doc_ods_payments) AS payments, (SELECT COUNT(*) FROM best_practice_retail_pos.doc_ods_returns) AS returns;

stores | orders | items | products | payments | returns -------+--------+-------+----------+----------+-------- 100 | 100 | 200 | 100 | 100 | 20

Create BloomFilter Index

Filtering order line items by product_id is a frequent operation (SKU sales ranking, fast/slow-moving SKU analysis). The product_id column has cardinality on the order of the product count, making it a good candidate for a BloomFilter Index:

CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_product_id ON TABLE doc_ods_order_items (product_id);

External Schema: Accessing Hive Historical Archive Data

When historical archive data is still in a Hive cluster, query it directly via External Schema without migrating:

-- Step 1: Create a Catalog Connection pointing to Hive Metastore CREATE CATALOG CONNECTION IF NOT EXISTS conn_hive_pos TYPE HMS HIVE_METASTORE_URIS = 'thrift://hive-metastore:9083' STORAGE_CONNECTION = 'conn_oss_archive'; -- Step 2: Mount the Hive database as an External Schema CREATE EXTERNAL SCHEMA IF NOT EXISTS pos_hive_archive CONNECTION conn_hive_pos OPTIONS (SCHEMA = 'pos_archive_db'); -- Step 3: Query historical archived orders directly (two-level naming, like a local table) SELECT order_id, store_id, order_date, total_amount FROM pos_hive_archive.historical_orders WHERE store_id = 33 AND order_date >= CAST('2019-01-01' AS DATE) ORDER BY order_date DESC LIMIT 5;


DWD Layer: Standardized Sales Detail

The DWD layer JOINs multiple ODS fact tables with dimension tables to produce a wide table with derived discounted amounts and return flags, forming a complete view of each transaction.

Create Table

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_retail_pos.doc_dwd_sales_detail AS SELECT oi.order_item_id, o.order_id, o.store_id, s.city AS store_city, o.order_date, o.customer_id, c.city AS customer_city, oi.product_id, p.category_id, cat.category_name, p.price AS list_price, oi.qty, oi.price AS unit_price, CAST(oi.qty * oi.price AS DOUBLE) AS gross_amount, COALESCE(pr.discount, 0) AS discount_pct, ROUND(oi.qty * oi.price * (1.0 - COALESCE(pr.discount, 0) / 100.0), 2) AS net_amount, CASE WHEN r.return_id IS NOT NULL THEN 1 ELSE 0 END AS is_returned, COALESCE(r.refund, 0) AS refund_amount FROM best_practice_retail_pos.doc_ods_order_items oi JOIN best_practice_retail_pos.doc_ods_orders o ON oi.order_id = o.order_id JOIN best_practice_retail_pos.doc_ods_stores s ON o.store_id = s.store_id JOIN best_practice_retail_pos.doc_ods_customers c ON o.customer_id = c.customer_id JOIN best_practice_retail_pos.doc_ods_products p ON oi.product_id = p.product_id JOIN best_practice_retail_pos.doc_ods_categories cat ON p.category_id = cat.category_id LEFT JOIN best_practice_retail_pos.doc_ods_promotions pr ON o.promotion_id = pr.promotion_id LEFT JOIN best_practice_retail_pos.doc_ods_returns r ON oi.order_item_id = r.order_item_id;

Manually trigger the initial refresh:

REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_dwd_sales_detail;

Verify the result:

SELECT COUNT(*) AS dwd_rows FROM best_practice_retail_pos.doc_dwd_sales_detail;

dwd_rows -------- 200

Preview a few rows to confirm net_amount discount calculation is correct:

SELECT order_item_id, store_id, store_city, order_date, product_id, category_name, qty, unit_price, gross_amount, discount_pct, net_amount, is_returned FROM best_practice_retail_pos.doc_dwd_sales_detail ORDER BY order_item_id LIMIT 5;

order_item_id | store_id | store_city | order_date | product_id | category_name | qty | unit_price | gross_amount | discount_pct | net_amount | is_returned --------------+----------+------------+------------+------------+---------------+-----+------------+--------------+--------------+------------+------------- 1 | 33 | Delhi | 2021-08-26 | 72 | Cat_12 | 3 | 176 | 528 | 5 | 501.6 | 0 2 | 33 | Delhi | 2021-08-26 | 10 | Cat_29 | 2 | 316 | 632 | 5 | 600.4 | 0 3 | 81 | Mumbai | 2022-03-19 | 45 | Cat_16 | 1 | 1345 | 1345 | 27 | 981.85 | 0 4 | 17 | Delhi | 2021-01-21 | 23 | Cat_15 | 4 | 2116 | 8464 | 34 | 5586.24 | 0 5 | 85 | Hyderabad | 2021-01-16 | 87 | Cat_24 | 2 | 4567 | 9134 | 22 | 7124.52 | 0

Result interpretation: Row 4 (product 23, promotion discount 34%) has net_amount (5586.24) = 8464 × (1 - 0.34) — the discount calculation is correct. Row 3 has is_returned = 0, indicating this order item was not returned; the LEFT JOIN to doc_ods_returns returned NULL, which was converted to 0 by CASE WHEN.


DWS Layer: Store Daily Sales Summary

The DWS layer aggregates DWD details at store + transaction date granularity to support daily reports, weekly trends, and period-over-period analysis.

Store Daily Summary Dynamic Table

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_retail_pos.doc_dws_store_daily_sales AS SELECT store_id, store_city, order_date, COUNT(DISTINCT order_id) AS order_count, COUNT(order_item_id) AS item_count, SUM(qty) AS total_qty, ROUND(SUM(gross_amount), 2) AS gross_revenue, ROUND(SUM(net_amount), 2) AS net_revenue, ROUND(AVG(discount_pct), 2) AS avg_discount_pct, SUM(is_returned) AS return_count, ROUND(SUM(refund_amount), 2) AS total_refund, COUNT(DISTINCT product_id) AS sku_count FROM best_practice_retail_pos.doc_dwd_sales_detail GROUP BY store_id, store_city, order_date;

REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_dws_store_daily_sales; SELECT store_id, store_city, order_date, order_count, total_qty, gross_revenue, net_revenue, avg_discount_pct, return_count, sku_count FROM best_practice_retail_pos.doc_dws_store_daily_sales ORDER BY net_revenue DESC LIMIT 8;

store_id | store_city | order_date | order_count | total_qty | gross_revenue | net_revenue | avg_discount_pct | return_count | sku_count ---------+------------+------------+-------------+-----------+---------------+-------------+------------------+--------------+---------- 17 | Delhi | 2021-12-09 | 1 | 6 | 25089 | 22830.99 | 9 | 0 | 2 24 | Pune | 2021-08-05 | 1 | 6 | 22694 | 21105.42 | 7 | 0 | 2 68 | Pune | 2022-01-14 | 1 | 5 | 21725 | 20638.75 | 5 | 0 | 2 54 | Bangalore | 2021-04-26 | 1 | 7 | 21983 | 19564.87 | 11 | 0 | 2 43 | Bangalore | 2022-08-17 | 1 | 5 | 20948 | 18853.2 | 10 | 0 | 2 1 | Pune | 2023-11-27 | 1 | 6 | 22167 | 18398.61 | 17 | 0 | 2 99 | Bangalore | 2022-02-04 | 1 | 5 | 18178 | 17087.32 | 6 | 0 | 2 55 | Hyderabad | 2022-11-06 | 1 | 9 | 31919 | 16934.26 | 25.5 | 0 | 2

Result interpretation: Delhi store 17 (2021-12-09) tops the single-day chart with ¥22,831 net revenue and a discount rate of only 9%, indicating low promotional intensity but high average transaction value around the holiday period. Hyderabad store 55 (discount rate 25.5%) had higher transaction volume (qty=9) but net revenue was compressed by the discount.

Static Partition Refresh by Store

When you need to precisely backfill a specific store's data for a specific month, use PARTITIONED BY (store_id) + SESSION_CONFIGS parameterized partitioning:

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_retail_pos.doc_dws_store_date_partition ( store_id, order_date, order_count, item_count, total_qty, net_revenue ) PARTITIONED BY (store_id) AS SELECT store_id, order_date, COUNT(DISTINCT order_id) AS order_count, COUNT(order_item_id) AS item_count, SUM(qty) AS total_qty, ROUND(SUM(net_amount), 2) AS net_revenue FROM best_practice_retail_pos.doc_dwd_sales_detail WHERE store_id = CAST(SESSION_CONFIGS()['dt.args.store_id'] AS INT) GROUP BY store_id, order_date;

Refresh partition data for store 33:

SET dt.args.store_id = 33; REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_dws_store_date_partition PARTITION (store_id = 33);

SELECT * FROM best_practice_retail_pos.doc_dws_store_date_partition WHERE store_id = 33;

store_id | order_date | order_count | item_count | total_qty | net_revenue ---------+------------+-------------+------------+-----------+------------ 33 | 2021-08-26 | 1 | 3 | 6 | 5335.2

Each refresh only affects the specified store partition without impacting other stores' data — ideal for parallel backfill across multiple stores.

Configure Studio Refresh Tasks

DWD and DWS layer refreshes are scheduled via Studio Tasks rather than writing REFRESH INTERVAL in the DDL:

# Create DWD refresh task cz-cli task create refresh_dwd_sales_detail --type SQL -p skill_test # Returns example: {"data":{"id":10353698,...}} # Save SQL content cz-cli task save-content 10353698 \ --content "REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_dwd_sales_detail;" \ -p skill_test # Set schedule: run at 1:30 AM every day cz-cli task save-cron 10353698 --cron "0 30 1 * * ?" -p skill_test # Create DWS refresh task (same steps) cz-cli task create refresh_dws_store_daily --type SQL -p skill_test # task id: 10354652 cz-cli task save-content 10354652 \ --content "REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_dws_store_daily_sales;" \ -p skill_test cz-cli task save-cron 10354652 --cron "0 30 1 * * ?" -p skill_test


ADS Layer: Fast/Slow-Moving SKU Analysis and Store Rankings

SKU Sales Velocity Analysis

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_retail_pos.doc_ads_sku_velocity AS SELECT product_id, category_id, category_name, SUM(qty) AS total_sold_qty, COUNT(DISTINCT order_id) AS order_count, ROUND(SUM(net_amount), 2) AS total_net_revenue, COUNT(DISTINCT store_id) AS store_coverage, SUM(is_returned) AS return_count, ROUND(SUM(is_returned) * 100.0 / NULLIF(COUNT(*), 0), 2) AS return_rate_pct, CASE WHEN SUM(qty) >= 10 THEN 'fast_moving' WHEN SUM(qty) >= 5 THEN 'normal' ELSE 'slow_moving' END AS velocity_label, ROUND(SUM(net_amount) / NULLIF(COUNT(DISTINCT store_id), 0), 2) AS revenue_per_store FROM best_practice_retail_pos.doc_dwd_sales_detail GROUP BY product_id, category_id, category_name;

REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_ads_sku_velocity; -- View SKU count and revenue distribution by velocity tier SELECT velocity_label, COUNT(*) AS sku_count, ROUND(SUM(total_net_revenue), 2) AS label_revenue FROM best_practice_retail_pos.doc_ads_sku_velocity GROUP BY velocity_label ORDER BY label_revenue DESC;

velocity_label | sku_count | label_revenue ---------------+-----------+-------------- normal | 37 | 520537.33 slow_moving | 61 | 392062.88

Result interpretation: Of 100 SKUs, 37 are normal movers (qty 5–9) and 61 are slow-moving (qty < 5). Slow-moving SKUs contributed ¥392,062 in net revenue, about 43% of the total. This shows that high-priced slow movers still contribute significant revenue — procurement decisions should consider both total_net_revenue and total_sold_qty, not just sales volume alone.

View top 10 best-selling SKUs:

SELECT product_id, category_name, total_sold_qty, order_count, total_net_revenue, store_coverage, velocity_label FROM best_practice_retail_pos.doc_ads_sku_velocity ORDER BY total_sold_qty DESC LIMIT 10;

product_id | category_name | total_sold_qty | order_count | total_net_revenue | store_coverage | velocity_label -----------+---------------+----------------+-------------+-------------------+----------------+--------------- 60 | Cat_30 | 9 | 3 | 7922.28 | 3 | normal 12 | Cat_14 | 9 | 3 | 27547.1 | 3 | normal 37 | Cat_24 | 8 | 3 | 12089 | 3 | normal 87 | Cat_24 | 8 | 4 | 31740.65 | 4 | normal 23 | Cat_15 | 8 | 3 | 10833.92 | 3 | normal 65 | Cat_11 | 7 | 2 | 16566.66 | 2 | normal 56 | Cat_2 | 7 | 3 | 23981.55 | 3 | normal 30 | Cat_4 | 7 | 2 | 11288.1 | 2 | normal 55 | Cat_13 | 7 | 3 | 20519.61 | 3 | normal 14 | Cat_10 | 7 | 3 | 15307.11 | 3 | normal

View the slowest-moving SKUs (clearance candidates):

SELECT product_id, category_name, total_sold_qty, total_net_revenue, store_coverage, return_rate_pct, velocity_label FROM best_practice_retail_pos.doc_ads_sku_velocity WHERE velocity_label = 'slow_moving' ORDER BY total_sold_qty ASC, total_net_revenue ASC LIMIT 8;

product_id | category_name | total_sold_qty | total_net_revenue | store_coverage | return_rate_pct | velocity_label -----------+---------------+----------------+-------------------+----------------+-----------------+--------------- 54 | Cat_20 | 1 | 2360.82 | 1 | 0.00 | slow_moving 66 | Cat_22 | 1 | 2599.1 | 1 | 0.00 | slow_moving 2 | Cat_18 | 1 | 2835.88 | 1 | 0.00 | slow_moving 20 | Cat_27 | 2 | 954.8 | 2 | 0.00 | slow_moving 89 | Cat_29 | 2 | 2218.36 | 1 | 0.00 | slow_moving 49 | Cat_18 | 2 | 2468.82 | 1 | 0.00 | slow_moving 70 | Cat_3 | 2 | 2620.8 | 2 | 0.00 | slow_moving 98 | Cat_3 | 2 | 2649.92 | 2 | 0.00 | slow_moving

Result interpretation: Product 20 (Cat_27) sold only 2 units at ~¥477 each — a typical low-price, low-volume double slow mover, prioritize for promotional clearance. Products 54, 66, and 2 each completed just 1 sale at 1 store; consider delisting or concentrating inventory in high-traffic stores for trial sales.

Store Revenue Rankings

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_retail_pos.doc_ads_store_ranking AS SELECT store_id, store_city, SUM(order_count) AS total_orders, SUM(item_count) AS total_items, SUM(total_qty) AS total_qty, ROUND(SUM(gross_revenue), 2) AS gross_revenue, ROUND(SUM(net_revenue), 2) AS net_revenue, ROUND(AVG(avg_discount_pct), 2) AS avg_discount_pct, SUM(return_count) AS total_returns, ROUND(SUM(return_count) * 100.0 / NULLIF(SUM(item_count), 0), 2) AS return_rate_pct, RANK() OVER (ORDER BY SUM(net_revenue) DESC) AS revenue_rank FROM best_practice_retail_pos.doc_dws_store_daily_sales GROUP BY store_id, store_city;

REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_ads_store_ranking; SELECT store_id, store_city, total_orders, total_qty, gross_revenue, net_revenue, avg_discount_pct, total_returns, return_rate_pct, revenue_rank FROM best_practice_retail_pos.doc_ads_store_ranking ORDER BY revenue_rank LIMIT 10;

store_id | store_city | total_orders | total_qty | gross_revenue | net_revenue | avg_discount_pct | total_returns | return_rate_pct | revenue_rank ---------+------------+--------------+-----------+---------------+-------------+------------------+---------------+-----------------+------------- 85 | Hyderabad | 3 | 11 | 41227 | 31331.48 | 24.33 | 1 | 16.67 | 1 17 | Delhi | 2 | 11 | 37009 | 30698.19 | 21.5 | 0 | 0.00 | 2 54 | Bangalore | 2 | 12 | 30081 | 25719.35 | 17.5 | 0 | 0.00 | 3 55 | Hyderabad | 2 | 9 | 31919 | 24637.9 | 25.5 | 0 | 0.00 | 4 87 | Delhi | 2 | 12 | 36889 | 24199.49 | 34 | 1 | 25.00 | 5 81 | Mumbai | 2 | 9 | 29513 | 21114.41 | 28 | 0 | 0.00 | 6 24 | Pune | 1 | 6 | 22694 | 21105.42 | 7 | 0 | 0.00 | 7 68 | Pune | 1 | 5 | 21725 | 20638.75 | 5 | 0 | 0.00 | 8 11 | Delhi | 2 | 10 | 22606 | 19327.32 | 18 | 0 | 0.00 | 9 43 | Bangalore | 1 | 5 | 20948 | 18853.2 | 10 | 0 | 0.00 | 10

Result interpretation:

  • Hyderabad store 85 (rank 1) has ¥31,331 net revenue, but avg_discount_pct = 24.33 (relatively high discount rate) and a return rate of 16.67% — there are quality concerns behind the high revenue. Investigate the high-return SKU mix.
  • Delhi store 17 (rank 2) has a discount rate of only 21.5% and a return rate of 0% — a truly high-quality store that can be used as an operational benchmark to promote best practices.
  • Delhi store 87 (rank 5) has a discount rate as high as 34% and a return rate of 25%. High-promotion-driven revenue is unsustainable; gradually adjust the promotion strategy.

Configure ADS layer refresh tasks:

cz-cli task create refresh_ads_sku_velocity --type SQL -p skill_test # task id: 10353699 cz-cli task save-content 10353699 \ --content "REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_ads_sku_velocity;" \ -p skill_test cz-cli task save-cron 10353699 --cron "0 0 2 * * ?" -p skill_test cz-cli task create refresh_ads_store_ranking --type SQL -p skill_test # task id: 10354653 cz-cli task save-content 10354653 \ --content "REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_ads_store_ranking;" \ -p skill_test cz-cli task save-cron 10354653 --cron "0 0 2 * * ?" -p skill_test

Complete scheduling chain: refresh_dwd_sales_detail (01:30) → refresh_dws_store_daily (01:30, depends on the previous) → refresh_ads_* (02:00).


Time Travel: Month-End Reconciliation

During month-end financial reconciliation, you need to roll back to the order snapshot at the month-end cutoff time and compare it with the current data to detect any subsequently backloaded orders.

-- View the ODS order table version history DESC HISTORY best_practice_retail_pos.doc_ods_orders;

version | time | total_rows | operation | stats --------+-----------------------------+------------+-------------+---------------------------- 4 | 2026-06-06T14:41:30.348 | 100 | INSERT_INTO | rows_inserted: 40 3 | 2026-06-06T14:41:15.488 | 60 | INSERT_INTO | rows_inserted: 30 2 | 2026-06-06T14:41:02.329 | 30 | INSERT_INTO | rows_inserted: 30 1 | 2026-06-06T14:38:18.807 | 0 | CREATE | —

-- Roll back to the store revenue snapshot at the month-end cutoff SELECT snap.order_date, snap.store_id, COUNT(*) AS orders_in_snapshot, SUM(p.amount) AS snapshot_revenue FROM best_practice_retail_pos.doc_ods_orders TIMESTAMP AS OF '2026-06-06 14:41:15.488' snap JOIN best_practice_retail_pos.doc_ods_payments p ON snap.order_id = p.order_id GROUP BY snap.order_date, snap.store_id ORDER BY snapshot_revenue DESC LIMIT 5;

order_date | store_id | orders_in_snapshot | snapshot_revenue -----------+----------+--------------------+----------------- 2021-04-26 | 54 | 1 | 13688 2022-03-08 | 87 | 1 | 12036 2023-11-15 | 100 | 1 | 11156 2020-11-14 | 57 | 1 | 9702 2023-11-27 | 1 | 1 | 9465

-- Find orders backloaded after the month-end cutoff (present in current system, absent in historical snapshot) SELECT o_cur.order_id, o_cur.store_id, o_cur.order_date FROM best_practice_retail_pos.doc_ods_orders o_cur LEFT JOIN ( SELECT order_id FROM best_practice_retail_pos.doc_ods_orders TIMESTAMP AS OF '2026-06-06 14:41:15.488' ) snap ON o_cur.order_id = snap.order_id WHERE snap.order_id IS NULL ORDER BY o_cur.order_id LIMIT 10;

order_id | store_id | order_date ---------+----------+----------- 61 | 46 | 2022-04-07 62 | 89 | 2023-09-22 63 | 13 | 2021-05-16 64 | 71 | 2022-12-30 65 | 38 | 2023-04-14 66 | 92 | 2021-06-28 67 | 27 | 2022-01-12 68 | 50 | 2023-05-27 69 | 65 | 2021-07-11 70 | 3 | 2022-02-25

Result interpretation: order_id 61–100 (40 orders total) were backloaded after the month-end cutoff. Submit these order IDs to finance for review to determine whether to include them in the current month's accounting or carry them forward to the next month — this is the core value of Time Travel in month-end reconciliation scenarios.


Data Warehouse Object Summary

All objects in the best_practice_retail_pos schema:

SHOW TABLES IN best_practice_retail_pos;

schema_name | table_name | is_dynamic -------------------------+---------------------------------+----------- best_practice_retail_pos | doc_ods_stores | false best_practice_retail_pos | doc_ods_categories | false best_practice_retail_pos | doc_ods_products | false best_practice_retail_pos | doc_ods_promotions | false best_practice_retail_pos | doc_ods_customers | false best_practice_retail_pos | doc_ods_orders | false best_practice_retail_pos | doc_ods_order_items | false best_practice_retail_pos | doc_ods_payments | false best_practice_retail_pos | doc_ods_returns | false best_practice_retail_pos | doc_dwd_sales_detail | true best_practice_retail_pos | doc_dws_store_daily_sales | true best_practice_retail_pos | doc_dws_store_date_partition | true best_practice_retail_pos | doc_ads_sku_velocity | true best_practice_retail_pos | doc_ads_store_ranking | true


Notes

  • BloomFilter Index does not automatically apply to existing data: CREATE BLOOMFILTER INDEX only accelerates new data written after creation. Existing data must be written again after the index is created, or you accept that existing data will not benefit from BloomFilter acceleration.

  • Partitioned Dynamic Tables must explicitly declare PARTITIONED BY: You cannot rely on the system to infer partition columns automatically. SESSION_CONFIGS()['dt.args.xxx'] returns STRING and must be CAST to the target type before it can match the partition column; otherwise refresh will fail with a type incompatibility error.

  • Do not write REFRESH INTERVAL for REFRESH DYNAMIC TABLE: All periodic refreshes for Dynamic Tables are managed through Studio Tasks. Studio Tasks support attaching data quality rules and alerts to the same task. Writing REFRESH INTERVAL in the DDL bypasses this management mechanism.

  • Time Travel TIMESTAMP AS OF only accepts constants: Runtime expressions like NOW() - INTERVAL N DAY are not supported. Pre-calculate the target timestamp before calling. DESC HISTORY returns UTC time; add 8 hours when converting to local time (UTC+8).

  • External Schema is read-only: External tables under pos_hive_archive do not support INSERT / UPDATE / DELETE. To import historical data into local Lakehouse tables, use INSERT INTO ... SELECT ... FROM pos_hive_archive.xxx for explicit migration.

  • ODS layer doc_ods_orders uses PARTITIONED BY (store_id, order_date): If the upstream CDC uses INSERT OVERWRITE (rather than INSERT INTO), Dynamic Tables will fall back to full refresh. Use append-write mode (INSERT only) to preserve Dynamic Table incremental refresh capability.