Lakehouse Data Pivot and Row-Column Transposition Guide
Overview
Data pivoting and row-column transposition are common requirements in data analysis, used to reshape detail data into wide-table or long-table formats suitable for reporting or further analysis. Singdata Lakehouse supports complete row-column transposition capabilities through standard SQL syntax such as CASE WHEN, UNION ALL, and LATERAL VIEW EXPLODE. This guide categorizes usage by business scenario to help you quickly master data pivot and row-column transposition methods.
Quick Navigation
- Row to Column (PIVOT) -- Use CASE WHEN + GROUP BY to expand row data into columns
- Column to Row (UNPIVOT) -- Use UNION ALL to merge multiple columns into multiple rows
- Cross-Tabulation Statistics -- Two-dimensional cross-summary matrix
- Multi-Value Column Expansion -- Use LATERAL VIEW EXPLODE + SPLIT to expand comma-separated values
- Post-Aggregation Transposition -- First GROUP BY aggregate, then transpose to wide table with CASE WHEN
SQL Commands Covered
| Command/Function | Purpose | Applicable Scenario |
|---|---|---|
CASE WHEN ... END | Conditional expression | Row to column, cross-tabulation |
GROUP BY | Group aggregation | Summary statistics |
UNION ALL | Merge multiple query results | Column to row |
LATERAL VIEW EXPLODE() | Expand arrays into multiple rows | Multi-value column expansion |
SPLIT() | Split string into array by delimiter | Comma-separated value handling |
WITH ... AS (CTE) | Common Table Expression | Step-by-step aggregation then transposition |
SUM() / COUNT() | Aggregate functions | Pivot calculations |
Prerequisites
The following examples use a simulated sales detail table doc_pivot_sales:
Scenario 1: Row to Column (PIVOT)
Convert detail data with months as row dimensions into a wide-table format with one column per month, making it easy to compare sales across months.
Execution Result:
| region | jan_amount | feb_amount | mar_amount |
|---|---|---|---|
| East | 2000 | 2450 | 2900 |
| North | 1200 | 1470 | 1750 |
| South | 1500 | 1850 | 2180 |
Scenario 2: Column to Row (UNPIVOT)
Convert multiple quarter columns (q1_amount, q2_amount, q3_amount, q4_amount) in a wide table into a long-table format with quarter + amount columns, easier for unified processing or line chart plotting.
First, create wide-table sample data:
Use UNION ALL to expand four columns into multiple rows:
Execution Result:
| region | quarter | amount |
|---|---|---|
| East | Q1 | 2000 |
| East | Q2 | 2450 |
| East | Q3 | 2900 |
| East | Q4 | 3100 |
| North | Q1 | 1200 |
| North | Q2 | 1470 |
| North | Q3 | 1750 |
| North | Q4 | 1900 |
| South | Q1 | 1500 |
| South | Q2 | 1850 |
| South | Q3 | 2180 |
| South | Q4 | 2400 |
Scenario 3: Cross-Tabulation Statistics
Summarize sales for each month by region, forming a month x region two-dimensional matrix suitable for cross-tab reports.
Execution Result:
| month | east_amount | south_amount | north_amount | total_amount |
|---|---|---|---|---|
| 2024-01 | 2000 | 1500 | 1200 | 4700 |
| 2024-02 | 2450 | 1850 | 1470 | 5770 |
| 2024-03 | 2900 | 2180 | 1750 | 6830 |
You can also create a region x product cross-tab showing sales for both products per region with totals:
Execution Result:
| region | product_a_total | product_b_total | grand_total |
|---|---|---|---|
| East | 4500 | 2850 | 7350 |
| North | 2550 | 1870 | 4420 |
| South | 3300 | 2230 | 5530 |
Scenario 4: Multi-Value Column Expansion
When a column stores multiple comma-separated values (e.g., tags, product lists), expand each value into an independent row for statistical analysis.
First, create a sample table with multi-value columns:
Use LATERAL VIEW EXPLODE with SPLIT to expand the multi-value column:
Execution Result:
| order_id | region | tag |
|---|---|---|
| 1 | East | ProductA |
| 1 | East | ProductB |
| 2 | South | ProductA |
| 3 | North | ProductB |
| 3 | North | ProductC |
| 4 | East | ProductA |
| 4 | East | ProductC |
| 5 | South | ProductB |
| 5 | South | ProductC |
After expansion, you can directly perform aggregate statistics, e.g., counting how many orders each product appears in:
Execution Result:
| product | order_count |
|---|---|
| ProductA | 3 |
| ProductB | 3 |
| ProductC | 3 |
Scenario 5: Post-Aggregation Transposition
First group and aggregate by region and month, then transpose the aggregated results into a wide-table format with a total column. This two-step approach keeps SQL clear when aggregation logic is complex.
Execution Result:
| region | m1 | m2 | m3 | total |
|---|---|---|---|---|
| East | 2000 | 2450 | 2900 | 7350 |
| North | 1200 | 1470 | 1750 | 4420 |
| South | 1500 | 1850 | 2180 | 5530 |
Clean Up Test Data
After completing data pivot verification, it is recommended to clean up test tables:
Important Notes
- Column Values Must Be Enumerated in Advance:
CASE WHENrow-to-column transposition requires knowing all column values when writing the SQL (e.g., months, regions). If column values change dynamically, construct SQL dynamically at the application layer, or first useGROUP_CONCATto discover all values and then build the query. - ELSE 0 vs ELSE NULL: With
CASE WHEN ... ELSE 0 ENDpaired withSUM, non-matching rows contribute 0 and the result contains no NULLs. WithELSE NULL,SUMignores NULLs (same result), butAVGwould differ due to different denominators -- choose based on business semantics. - UNION ALL Column Count and Types Must Match: When doing column-to-row, each
SELECTclause must have the same number of columns and corresponding data types, otherwise an error occurs. - LATERAL VIEW and WHERE Execution Order:
LATERAL VIEWexecutes at theFROMstage,WHEREfilters after it. To filter original rows before expansion, put conditions in a subquery. To filter expanded results, put conditions in the outerWHERE. - SPLIT Results May Contain Empty Strings: If the original data has trailing commas (e.g.,
"ProductA,"),SPLITproduces empty string elements. Filter them withWHERE tag != ''after expansion. - Performance Considerations: More
CASE WHENexpressions in row-to-column means more column scans, impacting wide-table performance. When columns exceed 50, consider usingMAP_AGGor application-layer processing.
