Weather × Retail Cross-Analysis Best Practices

Correlating historical meteorological data with store sales data to analyze the boosting or suppressing effects of weather on different product category sales, supporting dynamic replenishment decisions. This article uses a retail dataset of 100 stores across four Indian cities with 300,000 orders, combined with manually constructed weather observation data, to demonstrate an end-to-end OSS PIPE → ODS → DWD → DWS → ADS weather-enhanced data warehouse pipeline, covering three key capabilities: External Function integration with a weather API, Dynamic Table automatic incremental calculation, and window function sliding averages.


Overview

The challenge in retail replenishment decisions is: historical sales data lives inside the system, but weather data — which influences sales — is external. Joining the two requires matching on a city × date dimension and periodic refresh to reflect the latest weather forecasts.

Singdata Lakehouse solves the core problems with the following combination:

ProblemSolution
Store POS files auto-imported daily, no manual consumer code neededOSS PIPE (LIST_PURGE mode), scans new files and auto-ingests
Weather API data pulled and written to LakehouseExternal Function wraps HTTP calls; callable directly at the SQL layer
Multi-dimensional sales × weather correlation, automatically incrementedDynamic Table, declarative SQL, system automatically schedules dependency chain
7/30 day sliding averages to identify weather-driven effectsWindow functions (ROWS BETWEEN N PRECEDING AND CURRENT ROW)
Multi-step pipeline scheduling (weather pull → correlation calculation → replenishment signals)Lakehouse Studio task DAG, managed uniformly under best_practices/weather_retail/

SQL Commands Used

Command / FunctionPurposeNotes
CREATE TABLEBuild ODS base tables (stores, orders, products, weather)Used as upstream regular tables for Dynamic Tables
CREATE PIPECreate OSS PIPE to auto-ingest store POS CSV filesLIST_PURGE option prevents re-ingestion
CREATE DYNAMIC TABLEBuild incremental calculation tables for ODS/DWD/DWS/ADS layersNo REFRESH INTERVAL; scheduled by Lakehouse Studio tasks
AVG() OVER (ROWS BETWEEN ... AND ...)Compute 7/30 day sliding averagesSliding window for identifying weather effects
CASE WHEN ... THEN ...Apply temp_band labels by temperature rangeextreme_heat / hot / warm / mild / cold
REFRESH DYNAMIC TABLEManually trigger a single full refreshUsed during initial build or within Lakehouse Studio task scheduling

Prerequisites

All examples in this article run under the best_practice_weather_retail schema.

CREATE SCHEMA IF NOT EXISTS best_practice_weather_retail;

Actual execution result:

{"data":{},"time_ms":94}


Data Source Layer: Base Tables and Weather Data

Retail Base Tables

Data source: Retail Data Warehouse - 12 Table 1M+ Rows Dataset, containing 100 stores, 300,000 orders, and 600,000 order lines.

-- Store master data (100 stores, distributed across Pune/Delhi/Mumbai/Bangalore) CREATE TABLE IF NOT EXISTS best_practice_weather_retail.doc_stores ( store_id INT, city STRING ); -- Product categories (30 categories) CREATE TABLE IF NOT EXISTS best_practice_weather_retail.doc_categories ( category_id INT, category_name STRING ); -- Product information CREATE TABLE IF NOT EXISTS best_practice_weather_retail.doc_products ( product_id INT, category_id INT, supplier_id INT, price DOUBLE ); -- Order master table (300K rows) CREATE TABLE IF NOT EXISTS best_practice_weather_retail.doc_orders ( order_id INT, customer_id INT, store_id INT, order_date DATE, promotion_id INT ); -- Order line details (600K rows) CREATE TABLE IF NOT EXISTS best_practice_weather_retail.doc_order_items ( order_item_id INT, order_id INT, product_id INT, qty INT, price DOUBLE );

Verify row counts:

SELECT (SELECT COUNT(*) FROM best_practice_weather_retail.doc_stores) AS stores, (SELECT COUNT(*) FROM best_practice_weather_retail.doc_orders) AS orders, (SELECT COUNT(*) FROM best_practice_weather_retail.doc_order_items) AS order_items, (SELECT COUNT(*) FROM best_practice_weather_retail.doc_products) AS products, (SELECT COUNT(*) FROM best_practice_weather_retail.doc_categories) AS categories;

stores | orders | order_items | products | categories -------+--------+-------------+----------+----------- 100 | 50 | 60 | 50 | 30

(This article's demo uses the first 50 orders as a subset; in production, import the full 300K+ rows.)

OSS PIPE: Auto-Ingest Store POS Files

In a production environment, store POS systems export daily sales records as CSV and upload to OSS. PIPE enables unattended auto-ingestion:

-- First create an OSS Storage Connection (replace with actual AK/SK) CREATE STORAGE CONNECTION IF NOT EXISTS best_practice_weather_retail.conn_pos_oss TYPE = OSS ACCESS_ID = '<your-access-key-id>' ACCESS_KEY = '<your-access-key-secret>' ENDPOINT = 'oss-cn-hangzhou.aliyuncs.com'; -- Create a Volume mapping to the OSS bucket path CREATE EXTERNAL VOLUME IF NOT EXISTS best_practice_weather_retail.vol_pos_daily TYPE = OSS BUCKET = '<your-oss-bucket>' PATH = '/retail/pos-daily/' CONNECTION = best_practice_weather_retail.conn_pos_oss; -- Create PIPE to scan new CSV files and auto-write to doc_orders CREATE PIPE IF NOT EXISTS best_practice_weather_retail.pipe_pos_orders VIRTUAL_CLUSTER = 'DEFAULT' AUTO_PURGE = TRUE AS COPY INTO best_practice_weather_retail.doc_orders FROM ( SELECT $1::INT, $2::INT, $3::INT, TO_DATE($4, 'yyyy-MM-dd'), $5::INT FROM best_practice_weather_retail.vol_pos_daily ) USING csv OPTIONS('header'='true', 'sep'=',');

Weather Data: External Function Pull + Manual Construction

Method 1: Calling OpenWeatherMap History API via External Function (recommended)

The External Function wraps the weather API call, allowing direct use in SQL:

-- Assumes cloud function has already been deployed (see External Function development guide) CREATE EXTERNAL FUNCTION IF NOT EXISTS best_practice_weather_retail.fetch_weather_history( city STRING, date_str STRING ) RETURNS STRING CONNECTION = '<your-api-connection>' AS '<your-lambda-or-fc-arn>';

Example call:

-- Pull Delhi weather for 2023-06-14 SELECT best_practice_weather_retail.fetch_weather_history('Delhi', '2023-06-14') AS weather_json;

After receiving the returned JSON, parse and write to doc_weather_daily:

INSERT INTO best_practice_weather_retail.doc_weather_daily SELECT TO_DATE(date_str, 'yyyy-MM-dd') AS weather_date, city, GET_JSON_OBJECT(weather_json, '$.avg_temp_c')::DOUBLE AS avg_temp_c, GET_JSON_OBJECT(weather_json, '$.min_temp_c')::DOUBLE AS min_temp_c, GET_JSON_OBJECT(weather_json, '$.max_temp_c')::DOUBLE AS max_temp_c, GET_JSON_OBJECT(weather_json, '$.precipitation_mm')::DOUBLE AS precipitation_mm, GET_JSON_OBJECT(weather_json, '$.condition')::STRING AS weather_condition, GET_JSON_OBJECT(weather_json, '$.humidity_pct')::INT AS humidity_pct FROM ( SELECT city, date_str, best_practice_weather_retail.fetch_weather_history(city, date_str) AS weather_json FROM city_date_pairs -- pre-built city × date dimension table );

Method 2: Manual INSERT construction (when no API environment is available)

If the External Function has not been configured, insert simulated weather observation data directly to verify the downstream Dynamic Table and analysis logic:

CREATE TABLE IF NOT EXISTS best_practice_weather_retail.doc_weather_daily ( weather_date DATE, city STRING, avg_temp_c DOUBLE, min_temp_c DOUBLE, max_temp_c DOUBLE, precipitation_mm DOUBLE, weather_condition STRING, -- sunny / rainy / cloudy / heatwave / cold humidity_pct INT );

Import data from local CSV (recommended):

-- Step 1: Upload local CSV file to User Volume via SQL PUT PUT '/path/to/your/doc_weather_daily.csv' TO USER VOLUME FILE 'doc_weather_daily.csv';

-- Step 2: COPY INTO table from User Volume COPY INTO best_practice_weather_retail.doc_weather_daily FROM USER VOLUME USING csv OPTIONS('header'='true', 'sep'=',', 'nullValue'='') FILES ('doc_weather_daily.csv');

You can also insert a small batch of test data inline (no CSV file required):

INSERT INTO best_practice_weather_retail.doc_weather_daily VALUES (CAST('2021-08-26' AS DATE), 'Pune', 28.5, 22.0, 34.0, 12.3, 'rainy', 82), (CAST('2022-03-19' AS DATE), 'Delhi', 24.0, 16.5, 31.5, 0.0, 'sunny', 45), (CAST('2021-01-21' AS DATE), 'Delhi', 14.0, 8.0, 20.0, 0.0, 'sunny', 40), (CAST('2021-01-16' AS DATE), 'Mumbai', 26.5, 20.0, 33.0, 0.0, 'sunny', 62), (CAST('2022-09-14' AS DATE), 'Delhi', 29.5, 24.0, 35.0, 8.5, 'cloudy', 75), (CAST('2023-02-03' AS DATE), 'Mumbai', 28.0, 22.5, 33.5, 0.0, 'sunny', 58), (CAST('2022-10-29' AS DATE), 'Delhi', 22.5, 15.0, 30.0, 2.1, 'cloudy', 52), (CAST('2022-10-10' AS DATE), 'Bangalore', 21.0, 16.0, 26.0, 18.7, 'rainy', 88), (CAST('2021-07-09' AS DATE), 'Bangalore', 22.0, 18.0, 26.0, 45.2, 'rainy', 92), (CAST('2022-06-03' AS DATE), 'Delhi', 38.5, 30.0, 45.0, 0.0, 'heatwave', 28), (CAST('2021-04-26' AS DATE), 'Mumbai', 32.5, 26.0, 39.0, 0.0, 'sunny', 65), (CAST('2023-01-05' AS DATE), 'Mumbai', 27.0, 20.0, 34.0, 0.0, 'sunny', 60), (CAST('2023-06-14' AS DATE), 'Delhi', 40.0, 33.5, 46.5, 0.0, 'heatwave', 22), (CAST('2022-03-08' AS DATE), 'Mumbai', 29.5, 23.0, 36.0, 0.0, 'sunny', 62), (CAST('2022-08-06' AS DATE), 'Mumbai', 29.0, 25.0, 33.0, 22.1, 'rainy', 86), (CAST('2023-05-31' AS DATE), 'Bangalore', 23.5, 18.0, 29.0, 0.0, 'sunny', 55), (CAST('2022-02-03' AS DATE), 'Pune', 22.0, 14.5, 29.5, 0.0, 'sunny', 48), (CAST('2020-04-27' AS DATE), 'Delhi', 36.0, 28.0, 44.0, 0.0, 'heatwave', 25), (CAST('2023-05-02' AS DATE), 'Bangalore', 24.0, 18.5, 29.5, 0.0, 'sunny', 52), (CAST('2021-12-21' AS DATE), 'Delhi', 15.5, 9.0, 22.0, 0.0, 'cold', 35);

Verify weather data row count:

SELECT COUNT(*) AS weather_rows FROM best_practice_weather_retail.doc_weather_daily;

weather_rows ------------ 20


ODS Layer: Raw Sales Wide Table

The ODS layer joins three fact tables (orders, stores, order lines) into a single wide table for direct use in DWD layer weather data JOINs.

Create Table

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_weather_retail.ods_sales_raw AS SELECT o.order_id, o.customer_id, o.order_date, o.promotion_id, s.store_id, s.city AS store_city, oi.order_item_id, oi.product_id, oi.qty, oi.price AS unit_price, oi.qty * oi.price AS line_revenue FROM best_practice_weather_retail.doc_orders o JOIN best_practice_weather_retail.doc_stores s ON o.store_id = s.store_id JOIN best_practice_weather_retail.doc_order_items oi ON o.order_id = oi.order_id;

Lakehouse Studio Task Scheduling

Create a refresh task under the Studio best_practices/weather_retail/ path:

# Create task cz-cli task create "refresh_ods_sales_raw" -p skill_test --type SQL --folder best_practices/weather_retail # Write task content cz-cli task save-content refresh_ods_sales_raw -p skill_test \ --content "REFRESH DYNAMIC TABLE best_practice_weather_retail.ods_sales_raw;" # Set daily 01:00 schedule cz-cli task save-cron refresh_ods_sales_raw -p skill_test --cron "0 1 * * *"

Manually Trigger the First Refresh

REFRESH DYNAMIC TABLE best_practice_weather_retail.ods_sales_raw; SELECT COUNT(*) AS ods_rows FROM best_practice_weather_retail.ods_sales_raw;

ods_rows -------- 60

60 rows: 50 orders × average 1.2 order lines, confirming the JOIN logic is correct.


DWD Layer: Sales × Weather Fact Wide Table

The DWD layer is the core of this article. Based on the ODS wide table, it LEFT JOINs weather data by store city × order date dimension and adds product category and temperature band labels. The LEFT JOIN ensures order lines without weather data are not filtered out.

Create Table

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_weather_retail.dwd_sales_weather_fact AS SELECT f.order_id, f.order_date, f.store_city, f.product_id, p.category_id, c.category_name, f.qty, f.unit_price, f.line_revenue, -- Weather dimension (from doc_weather_daily, joined by city × date) w.avg_temp_c, w.min_temp_c, w.max_temp_c, w.precipitation_mm, w.weather_condition, w.humidity_pct, -- Temperature band label for downstream aggregation analysis CASE WHEN w.avg_temp_c >= 35 THEN 'extreme_heat' WHEN w.avg_temp_c >= 28 THEN 'hot' WHEN w.avg_temp_c >= 22 THEN 'warm' WHEN w.avg_temp_c >= 15 THEN 'mild' ELSE 'cold' END AS temp_band FROM best_practice_weather_retail.ods_sales_raw f JOIN best_practice_weather_retail.doc_products p ON f.product_id = p.product_id JOIN best_practice_weather_retail.doc_categories c ON p.category_id = c.category_id LEFT JOIN best_practice_weather_retail.doc_weather_daily w ON f.order_date = w.weather_date AND f.store_city = w.city;

Temperature band definitions:

BandDefinitionTypical Scenario
extreme_heat≥ 35°CSummer peak heat in cities like Beijing/Delhi; cold beverage demand spikes
hot28–35°CTypical summer day; sunscreen and air conditioning-related categories sell well
warm22–28°CComfortable temperature range; normal sales
mild15–22°CAutumn; seasonal demand for some categories starts to rise
cold< 15°CWinter; warm clothing and hot beverage categories benefit

Lakehouse Studio Task Scheduling

cz-cli task save-content refresh_dwd_sales_weather_fact -p skill_test \ --content "REFRESH DYNAMIC TABLE best_practice_weather_retail.dwd_sales_weather_fact;" cz-cli task save-cron refresh_dwd_sales_weather_fact -p skill_test --cron "0 2 * * *"

Refresh and Verify

REFRESH DYNAMIC TABLE best_practice_weather_retail.dwd_sales_weather_fact; SELECT COUNT(*) AS dwd_rows FROM best_practice_weather_retail.dwd_sales_weather_fact;

dwd_rows -------- 60

View sales distribution by temperature band:

SELECT temp_band, COUNT(*) AS order_cnt, ROUND(SUM(line_revenue), 0) AS total_revenue FROM best_practice_weather_retail.dwd_sales_weather_fact WHERE weather_condition IS NOT NULL GROUP BY temp_band ORDER BY total_revenue DESC;

temp_band | order_cnt | total_revenue -------------+-----------+-------------- hot | 11 | 63395 warm | 10 | 54881 cold | 3 | 23101 mild | 1 | 6312 extreme_heat | 3 | 5500

Results show: the hot and warm bands contributed the vast majority of revenue (~¥118K), while the extreme_heat (heatwave) band had low revenue (¥5,500), indicating that extreme heat significantly suppresses overall consumption.


DWS Layer: Category Climate Sensitivity Metrics

The DWS layer aggregates by category × weather condition dimension, outputting sales volume, revenue, and average temperature for each category under different weather conditions, providing data support for replenishment decisions.

Create Table

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_weather_retail.dws_category_climate_sensitivity AS SELECT category_name, weather_condition, temp_band, COUNT(DISTINCT order_id) AS order_count, SUM(qty) AS total_qty, ROUND(SUM(line_revenue), 0) AS total_revenue, ROUND(AVG(avg_temp_c), 1) AS avg_temp, ROUND(AVG(precipitation_mm), 1) AS avg_precip FROM best_practice_weather_retail.dwd_sales_weather_fact WHERE weather_condition IS NOT NULL GROUP BY category_name, weather_condition, temp_band;

Lakehouse Studio Task Scheduling

cz-cli task save-content refresh_dws_category_climate -p skill_test \ --content "REFRESH DYNAMIC TABLE best_practice_weather_retail.dws_category_climate_sensitivity;" cz-cli task save-cron refresh_dws_category_climate -p skill_test --cron "0 3 * * *"

Query Top 10 Climate Sensitivity

SELECT category_name, weather_condition, total_revenue, order_count FROM best_practice_weather_retail.dws_category_climate_sensitivity ORDER BY total_revenue DESC LIMIT 10;

category_name | weather_condition | total_revenue | order_count --------------+-------------------+---------------+------------ Cat_10 | sunny | 18986 | 1 Cat_18 | sunny | 17648 | 1 Cat_14 | sunny | 10266 | 1 Cat_8 | sunny | 10256 | 1 Cat_9 | rainy | 9134 | 1 Cat_1 | sunny | 8414 | 1 Cat_10 | cloudy | 8259 | 1 Cat_19 | sunny | 7931 | 2 Cat_6 | rainy | 7902 | 1 Cat_8 | rainy | 6991 | 1

Cat_10 has high sales in both sunny and cloudy conditions, showing strong weather adaptability. Cat_9 and Cat_6 perform well in rainy weather — they may be rain gear or indoor entertainment products.

Effect of Weather Conditions on Average Order Value

SELECT weather_condition, ROUND(AVG(unit_price * qty), 0) AS avg_order_value, COUNT(DISTINCT order_id) AS orders, SUM(qty) AS total_qty FROM best_practice_weather_retail.dwd_sales_weather_fact WHERE weather_condition IS NOT NULL GROUP BY weather_condition ORDER BY avg_order_value DESC;

weather_condition | avg_order_value | orders | total_qty ------------------+-----------------+--------+----------- sunny | 6880 | 9 | 33 cold | 6312 | 1 | 4 rainy | 5031 | 3 | 9 cloudy | 3979 | 2 | 8 heatwave | 1833 | 3 | 4

Sunny weather has the highest average order value (¥6,880), while heatwave has the lowest (¥1,833) — a difference of approximately 3.75x. This metric can directly drive proactive stocking when sunny weather forecasts arrive.


Window Functions: 7-Day Sliding Average to Identify Weather Effects

Sliding averages smooth out the noise from occasional orders and highlight trends in sustained weather impact.

SELECT category_name, order_date, SUM(line_revenue) AS daily_revenue, ROUND(AVG(SUM(line_revenue)) OVER ( PARTITION BY category_name ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ), 0) AS revenue_7d_avg, ROUND(AVG(avg_temp_c), 1) AS avg_temp FROM best_practice_weather_retail.dwd_sales_weather_fact WHERE weather_condition IS NOT NULL GROUP BY category_name, order_date, avg_temp_c ORDER BY category_name, order_date LIMIT 15;

category_name | order_date | daily_revenue | revenue_7d_avg | avg_temp --------------+------------+---------------+----------------+--------- Cat_1 | 2021-01-16 | 8414 | 8414 | 26.5 Cat_10 | 2021-01-16 | 5506 | 5506 | 26.5 Cat_10 | 2021-01-21 | 18986 | 12246 | 14.0 Cat_10 | 2022-09-14 | 8259 | 10917 | 29.5 Cat_14 | 2022-03-19 | 10266 | 10266 | 24.0 Cat_15 | 2021-12-21 | 6312 | 6312 | 15.5 Cat_17 | 2022-06-03 | 2634 | 2634 | 38.5 Cat_18 | 2022-03-08 | 17648 | 17648 | 29.5 Cat_19 | 2022-02-03 | 5913 | 5913 | 22.0 Cat_19 | 2022-10-29 | 3155 | 4534 | 22.5 Cat_19 | 2023-05-02 | 2018 | 3695 | 24.0 Cat_26 | 2023-06-14 | 2234 | 2234 | 40.0 Cat_29 | 2020-04-27 | 632 | 632 | 36.0 Cat_29 | 2022-09-14 | 1844 | 1238 | 29.5 Cat_29 | 2023-02-03 | 3688 | 2055 | 28.0

Notable observations: Cat_10 on 2021-01-21 (temperature 14°C, relatively cold) had revenue (¥18,986) significantly higher than its 7-day average (¥12,246), suggesting this category may be warm clothing items. Cat_26 on 2023-06-14 with extreme heat of 40°C had revenue of only ¥2,234, below the prior average, confirming the suppressive effect of heatwaves on certain categories.


ADS Layer: Replenishment Signal Output

The ADS layer aggregates DWD layer detail data to the city × category × weather condition granularity for replenishment recommendations, outputting four signal levels: INCREASE_STOCK, REDUCE_STOCK, MONITOR, and NORMAL.

Create Table

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_weather_retail.ads_replenishment_signal AS SELECT f.store_city, f.category_name, f.weather_condition, f.temp_band, SUM(f.qty) AS total_qty, ROUND(SUM(f.line_revenue), 0) AS total_revenue, COUNT(DISTINCT f.order_id) AS order_count, ROUND(AVG(f.avg_temp_c), 1) AS avg_temp, ROUND(SUM(f.qty) / NULLIF(COUNT(DISTINCT f.order_id), 0), 1) AS avg_qty_per_order, CASE WHEN SUM(f.qty) >= 4 AND f.weather_condition IN ('sunny', 'hot') THEN 'INCREASE_STOCK' WHEN f.weather_condition = 'heatwave' AND SUM(f.line_revenue) < 3000 THEN 'REDUCE_STOCK' WHEN f.weather_condition IN ('rainy', 'cloudy') THEN 'MONITOR' ELSE 'NORMAL' END AS replenishment_action FROM best_practice_weather_retail.dwd_sales_weather_fact f WHERE f.weather_condition IS NOT NULL GROUP BY f.store_city, f.category_name, f.weather_condition, f.temp_band;

Replenishment rule explanation:

RuleConditionBusiness Meaning
INCREASE_STOCKSunny/hot weather with cumulative sales ≥ 4 unitsGood weather continuously drives demand; proactively increase stock
REDUCE_STOCKHeatwave weather with revenue < ¥3,000Extreme heat suppresses consumption; reduce stock of perishables or time-sensitive goods
MONITORRainy or cloudy weatherUnstable weather; monitor real-time sales rate, do not proactively adjust
NORMALAll other conditionsNormal replenishment rhythm

Lakehouse Studio Task Scheduling

cz-cli task save-content refresh_ads_replenishment -p skill_test \ --content "REFRESH DYNAMIC TABLE best_practice_weather_retail.ads_replenishment_signal;" cz-cli task save-cron refresh_ads_replenishment -p skill_test --cron "30 3 * * *"

Data quality rules can also be attached to the Studio task: alert when the number of INCREASE_STOCK signal rows is 0, prompting investigation into whether the upstream weather data connection has been lost.

Query Replenishment Signals

SELECT store_city, category_name, weather_condition, total_qty, total_revenue, replenishment_action FROM best_practice_weather_retail.ads_replenishment_signal ORDER BY total_revenue DESC LIMIT 10;

store_city | category_name | weather_condition | total_qty | total_revenue | replenishment_action -----------+---------------+-------------------+-----------+---------------+--------------------- Delhi | Cat_10 | sunny | 5 | 18986 | INCREASE_STOCK Mumbai | Cat_18 | sunny | 4 | 17648 | INCREASE_STOCK Delhi | Cat_14 | sunny | 3 | 10266 | NORMAL Mumbai | Cat_8 | sunny | 4 | 10256 | INCREASE_STOCK Bangalore | Cat_9 | rainy | 2 | 9134 | MONITOR Mumbai | Cat_1 | sunny | 2 | 8414 | NORMAL Delhi | Cat_10 | cloudy | 3 | 8259 | MONITOR Mumbai | Cat_6 | rainy | 2 | 7902 | MONITOR Pune | Cat_8 | rainy | 3 | 6991 | MONITOR Delhi | Cat_15 | cold | 4 | 6312 | NORMAL

Replenishment signal summary:

SELECT replenishment_action, COUNT(*) AS signal_count, SUM(total_qty) AS total_qty, SUM(total_revenue) AS total_revenue FROM best_practice_weather_retail.ads_replenishment_signal GROUP BY replenishment_action ORDER BY total_revenue DESC;

replenishment_action | signal_count | total_qty | total_revenue ---------------------+--------------+-----------+-------------- NORMAL | 11 | 24 | 55747 INCREASE_STOCK | 3 | 13 | 46890 MONITOR | 9 | 17 | 45052 REDUCE_STOCK | 3 | 4 | 5500

The 3 INCREASE_STOCK signals (Delhi Cat_10, Mumbai Cat_18, Mumbai Cat_8) drove combined revenue of ¥46,890 and are the top priority targets for proactive stocking. The 3 REDUCE_STOCK signals all came from extreme heat weather with total revenue of only ¥5,500; reducing inventory for these categories can lower dead stock losses.


Complete Lakehouse Studio Task Scheduling Chain

The following is the complete scheduling chain for the 4 Lakehouse Studio tasks under best_practices/weather_retail/:

Task NameSQLCronDependencies
refresh_ods_sales_rawREFRESH DYNAMIC TABLE ... .ods_sales_raw0 1 * * * (daily at 01:00)
refresh_dwd_sales_weather_factREFRESH DYNAMIC TABLE ... .dwd_sales_weather_fact0 2 * * * (daily at 02:00)ods_sales_raw refresh completed
refresh_dws_category_climateREFRESH DYNAMIC TABLE ... .dws_category_climate_sensitivity0 3 * * * (daily at 03:00)dwd_sales_weather_fact refresh completed
refresh_ads_replenishmentREFRESH DYNAMIC TABLE ... .ads_replenishment_signal30 3 * * * (daily at 03:30)dws refresh completed

# View task list cz-cli task list -p skill_test

You can also append to refresh_ads_replenishment:

  • Data quality rules: alert when INCREASE_STOCK row count is 0
  • Alert notifications: integrate with Feishu/DingTalk webhook to push daily replenishment signal summaries

Notes

  • Dynamic Table DDL does not include REFRESH INTERVAL; all scheduling is managed via Lakehouse Studio tasks. This allows data quality checks, alerts, and dependency configuration to be attached to the same task without needing to modify DDL.
  • The DWD layer uses LEFT JOIN with the weather table to ensure order lines without weather coverage are retained in the fact table, without affecting the completeness of sales aggregations. The WHERE weather_condition IS NOT NULL filter is only used in analyses that require weather data.
  • The temp_band classification is based on threshold rules, adjustable based on a city's climate characteristics. For example, 28°C is a normal temperature in Indian cities; if analyzing European markets, the thresholds should be lowered.
  • OSS PIPE with AUTO_PURGE = TRUE deletes source files. Confirm before production use whether original files need to be retained for audit tracing.
  • External Function calls to weather APIs incur API fees and cloud function compute fees. It is recommended to pull data in batches by city × date rather than calling per order line individually.
  • NULLIF(COUNT(DISTINCT order_id), 0) prevents division by zero errors. Preserve this pattern when referencing it in the ADS layer.
  • BloomFilter Index is suitable for high-cardinality filter columns like store_city and category_name. In production, it is recommended to create one on the DWD layer main table.