Lakehouse Data Pivot and Row-Column Transposition Guide

Overview

Data pivoting and row-column transposition are common requirements in data analysis, used to reshape detail data into wide-table or long-table formats suitable for reporting or further analysis. Singdata Lakehouse supports complete row-column transposition capabilities through standard SQL syntax such as CASE WHEN, UNION ALL, and LATERAL VIEW EXPLODE. This guide categorizes usage by business scenario to help you quickly master data pivot and row-column transposition methods.


SQL Commands Covered

Command/FunctionPurposeApplicable Scenario
CASE WHEN ... ENDConditional expressionRow to column, cross-tabulation
GROUP BYGroup aggregationSummary statistics
UNION ALLMerge multiple query resultsColumn to row
LATERAL VIEW EXPLODE()Expand arrays into multiple rowsMulti-value column expansion
SPLIT()Split string into array by delimiterComma-separated value handling
WITH ... AS (CTE)Common Table ExpressionStep-by-step aggregation then transposition
SUM() / COUNT()Aggregate functionsPivot calculations

Prerequisites

The following examples use a simulated sales detail table doc_pivot_sales:

-- Create test table
CREATE TABLE IF NOT EXISTS doc_pivot_sales (
    sale_id INT,
    region  STRING,
    month   STRING,
    product STRING,
    amount  DOUBLE
);

-- Insert test data (covering East/South/North regions, Jan-Mar 2024, two products, 18 rows total)
INSERT INTO doc_pivot_sales VALUES
(1,  'East', '2024-01', 'ProductA', 1200.00),
(2,  'East', '2024-01', 'ProductB',  800.00),
(3,  'East', '2024-02', 'ProductA', 1500.00),
(4,  'East', '2024-02', 'ProductB',  950.00),
(5,  'East', '2024-03', 'ProductA', 1800.00),
(6,  'East', '2024-03', 'ProductB', 1100.00),
(7,  'South', '2024-01', 'ProductA',  900.00),
(8,  'South', '2024-01', 'ProductB',  600.00),
(9,  'South', '2024-02', 'ProductA', 1100.00),
(10, 'South', '2024-02', 'ProductB',  750.00),
(11, 'South', '2024-03', 'ProductA', 1300.00),
(12, 'South', '2024-03', 'ProductB',  880.00),
(13, 'North', '2024-01', 'ProductA',  700.00),
(14, 'North', '2024-01', 'ProductB',  500.00),
(15, 'North', '2024-02', 'ProductA',  850.00),
(16, 'North', '2024-02', 'ProductB',  620.00),
(17, 'North', '2024-03', 'ProductA', 1000.00),
(18, 'North', '2024-03', 'ProductB',  750.00);

Scenario 1: Row to Column (PIVOT)

Convert detail data with months as row dimensions into a wide-table format with one column per month, making it easy to compare sales across months.

-- Summarize monthly sales by region, turning month rows into columns
SELECT
    region,
    SUM(CASE WHEN month = '2024-01' THEN amount ELSE 0 END) AS jan_amount,
    SUM(CASE WHEN month = '2024-02' THEN amount ELSE 0 END) AS feb_amount,
    SUM(CASE WHEN month = '2024-03' THEN amount ELSE 0 END) AS mar_amount
FROM doc_pivot_sales
GROUP BY region
ORDER BY region;

Execution Result:

regionjan_amountfeb_amountmar_amount
East200024502900
North120014701750
South150018502180

Scenario 2: Column to Row (UNPIVOT)

Convert multiple quarter columns (q1_amount, q2_amount, q3_amount, q4_amount) in a wide table into a long-table format with quarter + amount columns, easier for unified processing or line chart plotting.

First, create wide-table sample data:

-- Create wide table (one row per region, four quarter columns)
CREATE TABLE IF NOT EXISTS doc_pivot_wide (
    region    STRING,
    q1_amount DOUBLE,
    q2_amount DOUBLE,
    q3_amount DOUBLE,
    q4_amount DOUBLE
);

INSERT INTO doc_pivot_wide VALUES
('East',  2000, 2450, 2900, 3100),
('South', 1500, 1850, 2180, 2400),
('North', 1200, 1470, 1750, 1900);

Use UNION ALL to expand four columns into multiple rows:

-- Column to row: merge q1~q4 columns into quarter + amount columns
SELECT region, 'Q1' AS quarter, q1_amount AS amount FROM doc_pivot_wide
UNION ALL
SELECT region, 'Q2',            q2_amount             FROM doc_pivot_wide
UNION ALL
SELECT region, 'Q3',            q3_amount             FROM doc_pivot_wide
UNION ALL
SELECT region, 'Q4',            q4_amount             FROM doc_pivot_wide
ORDER BY region, quarter;

Execution Result:

regionquarteramount
EastQ12000
EastQ22450
EastQ32900
EastQ43100
NorthQ11200
NorthQ21470
NorthQ31750
NorthQ41900
SouthQ11500
SouthQ21850
SouthQ32180
SouthQ42400

Scenario 3: Cross-Tabulation Statistics

Summarize sales for each month by region, forming a month x region two-dimensional matrix suitable for cross-tab reports.

-- Month x Region cross-tab: one row per month, one column per region
SELECT
    month,
    SUM(CASE WHEN region = 'East'  THEN amount ELSE 0 END) AS east_amount,
    SUM(CASE WHEN region = 'South' THEN amount ELSE 0 END) AS south_amount,
    SUM(CASE WHEN region = 'North' THEN amount ELSE 0 END) AS north_amount,
    SUM(amount)                                             AS total_amount
FROM doc_pivot_sales
GROUP BY month
ORDER BY month;

Execution Result:

montheast_amountsouth_amountnorth_amounttotal_amount
2024-012000150012004700
2024-022450185014705770
2024-032900218017506830

You can also create a region x product cross-tab showing sales for both products per region with totals:

-- Region x Product cross-tab: one row per region, one column per product
SELECT
    region,
    SUM(CASE WHEN product = 'ProductA' THEN amount ELSE 0 END) AS product_a_total,
    SUM(CASE WHEN product = 'ProductB' THEN amount ELSE 0 END) AS product_b_total,
    SUM(amount)                                                AS grand_total
FROM doc_pivot_sales
GROUP BY region
ORDER BY region;

Execution Result:

regionproduct_a_totalproduct_b_totalgrand_total
East450028507350
North255018704420
South330022305530

Scenario 4: Multi-Value Column Expansion

When a column stores multiple comma-separated values (e.g., tags, product lists), expand each value into an independent row for statistical analysis.

First, create a sample table with multi-value columns:

-- Create order tags table (tags column stores comma-separated product tags)
CREATE TABLE IF NOT EXISTS doc_pivot_tags (
    order_id INT,
    region   STRING,
    tags     STRING
);

INSERT INTO doc_pivot_tags VALUES
(1, 'East',  'ProductA,ProductB'),
(2, 'South', 'ProductA'),
(3, 'North', 'ProductB,ProductC'),
(4, 'East',  'ProductA,ProductC'),
(5, 'South', 'ProductB,ProductC');

Use LATERAL VIEW EXPLODE with SPLIT to expand the multi-value column:

-- Split the tags column by comma, expanding each tag into an independent row
SELECT
    order_id,
    region,
    tag
FROM doc_pivot_tags
LATERAL VIEW EXPLODE(SPLIT(tags, ',')) t AS tag
ORDER BY order_id, tag;

Execution Result:

order_idregiontag
1EastProductA
1EastProductB
2SouthProductA
3NorthProductB
3NorthProductC
4EastProductA
4EastProductC
5SouthProductB
5SouthProductC

After expansion, you can directly perform aggregate statistics, e.g., counting how many orders each product appears in:

-- After expansion, count orders per product
SELECT
    tag        AS product,
    COUNT(*)   AS order_count
FROM doc_pivot_tags
LATERAL VIEW EXPLODE(SPLIT(tags, ',')) t AS tag
GROUP BY tag
ORDER BY tag;

Execution Result:

productorder_count
ProductA3
ProductB3
ProductC3

Scenario 5: Post-Aggregation Transposition

First group and aggregate by region and month, then transpose the aggregated results into a wide-table format with a total column. This two-step approach keeps SQL clear when aggregation logic is complex.

-- Step 1: Aggregate by region and month; Step 2: Transpose month columns to wide table
WITH monthly_summary AS (
    SELECT
        region,
        month,
        SUM(amount) AS total_amount
    FROM doc_pivot_sales
    GROUP BY region, month
)
SELECT
    region,
    ROUND(SUM(CASE WHEN month = '2024-01' THEN total_amount ELSE 0 END), 2) AS m1,
    ROUND(SUM(CASE WHEN month = '2024-02' THEN total_amount ELSE 0 END), 2) AS m2,
    ROUND(SUM(CASE WHEN month = '2024-03' THEN total_amount ELSE 0 END), 2) AS m3,
    ROUND(SUM(total_amount), 2)                                              AS total
FROM monthly_summary
GROUP BY region
ORDER BY region;

Execution Result:

regionm1m2m3total
East2000245029007350
North1200147017504420
South1500185021805530

Clean Up Test Data

After completing data pivot verification, it is recommended to clean up test tables:

DROP TABLE IF EXISTS doc_pivot_sales;
DROP TABLE IF EXISTS doc_pivot_wide;
DROP TABLE IF EXISTS doc_pivot_tags;

Important Notes

  1. Column Values Must Be Enumerated in Advance: CASE WHEN row-to-column transposition requires knowing all column values when writing the SQL (e.g., months, regions). If column values change dynamically, construct SQL dynamically at the application layer, or first use GROUP_CONCAT to discover all values and then build the query.
  2. ELSE 0 vs ELSE NULL: With CASE WHEN ... ELSE 0 END paired with SUM, non-matching rows contribute 0 and the result contains no NULLs. With ELSE NULL, SUM ignores NULLs (same result), but AVG would differ due to different denominators -- choose based on business semantics.
  3. UNION ALL Column Count and Types Must Match: When doing column-to-row, each SELECT clause must have the same number of columns and corresponding data types, otherwise an error occurs.
  4. LATERAL VIEW and WHERE Execution Order: LATERAL VIEW executes at the FROM stage, WHERE filters after it. To filter original rows before expansion, put conditions in a subquery. To filter expanded results, put conditions in the outer WHERE.
  5. SPLIT Results May Contain Empty Strings: If the original data has trailing commas (e.g., "ProductA,"), SPLIT produces empty string elements. Filter them with WHERE tag != '' after expansion.
  6. Performance Considerations: More CASE WHEN expressions in row-to-column means more column scans, impacting wide-table performance. When columns exceed 50, consider using MAP_AGG or application-layer processing.