Cumulative Calculation and Running Totals

Overview

Cumulative calculation (Running Total / Cumulative Aggregation) is a high-frequency requirement in data analysis: calculating cumulative sales up to a point in time, computing rolling N-day averages, tracking how much a metric has changed from its starting point. Singdata Lakehouse implements these calculations through window function ORDER BY clauses and frame specifications (ROWS BETWEEN) without requiring self-joins or subqueries.

SQL Syntax Covered

SyntaxPurpose
SUM(...) OVER (ORDER BY ...)Cumulative sum (default frame: from first row to current row)
ROWS BETWEEN N PRECEDING AND CURRENT ROWRolling N-row window
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWFrom partition start to current row
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGEntire partition
FIRST_VALUE / LAST_VALUEGet first/last row value in window
LAG(col, n)Get value of n rows before, used for period-over-period calculation

Prerequisite Data

The following examples use two test tables:

-- Daily sales details table CREATE TABLE IF NOT EXISTS doc_running_sales ( sale_date DATE, region VARCHAR(20), amount DECIMAL(10,2) ); INSERT INTO doc_running_sales VALUES (CAST('2024-01-01' AS DATE), 'East', 1200.00), (CAST('2024-01-02' AS DATE), 'East', 850.00), (CAST('2024-01-03' AS DATE), 'East', 1500.00), (CAST('2024-01-04' AS DATE), 'East', 600.00), (CAST('2024-01-05' AS DATE), 'East', 2100.00), (CAST('2024-01-01' AS DATE), 'West', 900.00), (CAST('2024-01-02' AS DATE), 'West', 1100.00), (CAST('2024-01-03' AS DATE), 'West', 750.00), (CAST('2024-01-04' AS DATE), 'West', 1300.00), (CAST('2024-01-05' AS DATE), 'West', 980.00); -- Monthly revenue table CREATE TABLE IF NOT EXISTS doc_running_monthly ( month_date DATE, category VARCHAR(20), revenue DECIMAL(12,2) ); INSERT INTO doc_running_monthly VALUES (CAST('2024-01-01' AS DATE), 'Apparel', 50000.00), (CAST('2024-02-01' AS DATE), 'Apparel', 45000.00), (CAST('2024-03-01' AS DATE), 'Apparel', 62000.00), (CAST('2024-04-01' AS DATE), 'Apparel', 58000.00), (CAST('2024-01-01' AS DATE), 'Electronics', 120000.00), (CAST('2024-02-01' AS DATE), 'Electronics', 135000.00), (CAST('2024-03-01' AS DATE), 'Electronics', 98000.00), (CAST('2024-04-01' AS DATE), 'Electronics', 145000.00);


Scenario 1: Cumulative Sum by Partition

Calculate the cumulative daily sales for each region.

SELECT sale_date, region, amount, SUM(amount) OVER ( PARTITION BY region ORDER BY sale_date ) AS running_total FROM doc_running_sales ORDER BY region, sale_date;

Result:

sale_dateregionamountrunning_total
2024-01-01East1200.001200.00
2024-01-02East850.002050.00
2024-01-03East1500.003550.00
2024-01-04East600.004150.00
2024-01-05East2100.006250.00
2024-01-01West900.00900.00
2024-01-02West1100.002000.00
2024-01-03West750.002750.00
2024-01-04West1300.004050.00
2024-01-05West980.005030.00

Scenario 2: Cumulative Proportion of Total (YTD Progress)

Calculate each category's monthly YTD revenue and its proportion of total annual revenue.

SELECT month_date, category, revenue, SUM(revenue) OVER ( PARTITION BY category ORDER BY month_date ) AS ytd_revenue, SUM(revenue) OVER (PARTITION BY category) AS annual_total, ROUND( SUM(revenue) OVER (PARTITION BY category ORDER BY month_date) / SUM(revenue) OVER (PARTITION BY category) * 100, 2 ) AS ytd_pct FROM doc_running_monthly ORDER BY category, month_date;

Result:

month_datecategoryrevenueytd_revenueannual_totalytd_pct
2024-01-01Apparel50000.0050000.00215000.0023.26
2024-02-01Apparel45000.0095000.00215000.0044.19
2024-03-01Apparel62000.00157000.00215000.0073.02
2024-04-01Apparel58000.00215000.00215000.00100.00
2024-01-01Electronics120000.00120000.00498000.0024.10
2024-02-01Electronics135000.00255000.00498000.0051.20
2024-03-01Electronics98000.00353000.00498000.0070.88
2024-04-01Electronics145000.00498000.00498000.00100.00

Scenario 3: Rolling N-Row Window (Moving Average)

Calculate a 3-day moving average of sales for each region.

SELECT sale_date, region, amount, AVG(amount) OVER ( PARTITION BY region ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS moving_avg_3day FROM doc_running_sales ORDER BY region, sale_date;

Result:

sale_dateregionamountmoving_avg_3day
2024-01-01East1200.001200.000000
2024-01-02East850.001025.000000
2024-01-03East1500.001183.333333
2024-01-04East600.00983.333333
2024-01-05East2100.001400.000000

Scenario 4: Aggregate First, Then Roll (Two-Level Window)

First aggregate total daily sales across all regions, then calculate a 7-day rolling sum.

SELECT sale_date, SUM(amount) AS daily_total, SUM(SUM(amount)) OVER ( ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS rolling_7day_sum, AVG(SUM(amount)) OVER ( ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS rolling_7day_avg FROM doc_running_sales GROUP BY sale_date ORDER BY sale_date;

Result:

sale_datedaily_totalrolling_7day_sumrolling_7day_avg
2024-01-012100.002100.002100.000000
2024-01-021950.004050.002025.000000
2024-01-032250.006300.002100.000000
2024-01-041900.008200.002050.000000
2024-01-053080.0011280.002256.000000

Scenario 5: Cumulative Maximum / Minimum

Track the historical highest and lowest single-day sales for each region.

SELECT sale_date, region, amount, MAX(amount) OVER ( PARTITION BY region ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_max, MIN(amount) OVER ( PARTITION BY region ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_min FROM doc_running_sales ORDER BY region, sale_date;

Result (East region only):

sale_dateregionamountrunning_maxrunning_min
2024-01-01East1200.001200.001200.00
2024-01-02East850.001200.00850.00
2024-01-03East1500.001500.00850.00
2024-01-04East600.001500.00600.00
2024-01-05East2100.002100.00600.00

Scenario 6: Period-over-Period Change (MoM / DoD)

Calculate the month-over-month change amount and change rate for each category.

SELECT month_date, category, revenue, LAG(revenue, 1) OVER ( PARTITION BY category ORDER BY month_date ) AS prev_month, revenue - LAG(revenue, 1) OVER ( PARTITION BY category ORDER BY month_date ) AS mom_change, ROUND( (revenue - LAG(revenue, 1) OVER (PARTITION BY category ORDER BY month_date)) / LAG(revenue, 1) OVER (PARTITION BY category ORDER BY month_date) * 100, 2 ) AS mom_pct FROM doc_running_monthly ORDER BY category, month_date;

Result (Apparel category):

month_datecategoryrevenueprev_monthmom_changemom_pct
2024-01-01Apparel50000.00NULLNULLNULL
2024-02-01Apparel45000.0050000.00-5000.00-10.00
2024-03-01Apparel62000.0045000.0017000.0037.78
2024-04-01Apparel58000.0062000.00-4000.00-6.45

Scenario 7: Change from Starting Point (FIRST_VALUE)

Calculate each region's daily sales change relative to the first day, and the final day's sales.

SELECT sale_date, region, amount, FIRST_VALUE(amount) OVER ( PARTITION BY region ORDER BY sale_date ) AS first_day_amount, LAST_VALUE(amount) OVER ( PARTITION BY region ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS last_day_amount, amount - FIRST_VALUE(amount) OVER ( PARTITION BY region ORDER BY sale_date ) AS change_from_start FROM doc_running_sales ORDER BY region, sale_date;

Result (East region only):

sale_dateregionamountfirst_day_amountlast_day_amountchange_from_start
2024-01-01East1200.001200.002100.000.00
2024-01-02East850.001200.002100.00-350.00
2024-01-03East1500.001200.002100.00300.00
2024-01-04East600.001200.002100.00-600.00
2024-01-05East2100.001200.002100.00900.00

Important Notes

  • Frame Specification Defaults: When ORDER BY is present, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (value-range based). For date columns with duplicate dates, RANGE and ROWS may produce different results. Use explicit ROWS BETWEEN when precise control is needed.
  • Two-Level Windows: When applying window functions over GROUP BY aggregated results, the window function argument must wrap the aggregate function (e.g., SUM(SUM(col))), otherwise an error will occur.
  • LAST_VALUE Trap: Without ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, LAST_VALUE only returns the current row's value (because the default frame stops at the current row).
  • NULL Handling: LAG / LEAD return NULL at boundaries. Use COALESCE(LAG(col, 1), 0) to substitute default values.