You need to retain users with no tags for comparative analysis.
SQL Implementation
SELECT user_id, tag FROM user_tags
LATERAL VIEW explode_outer(tags) t AS tag;
Output:
user_id
tag
1
vip
1
active
1
buyer
2
new_user
3
NULL
Key difference: User 3 is preserved, with tag set to NULL.
Scenario 3: Expansion with Index (posexplode)
Problem
You need to know the position of each tag in the original array (e.g., for priority ordering).
SQL Implementation
SELECT user_id, pos, tag FROM user_tags
LATERAL VIEW posexplode(tags) t AS pos, tag;
Output:
user_id
pos
tag
1
0
vip
1
1
active
1
2
buyer
2
0
new_user
Index starts at 0, reflecting each element's position in the original array.
Joining Multiple Arrays
-- Tags and scores correspond by position
SELECT
u.user_id,
t.tag,
s.score
FROM user_tags u
LATERAL VIEW posexplode(u.tags) t AS pos, tag
LATERAL VIEW posexplode(u.scores) s AS spos, score
WHERE t.pos = s.spos;
Expand key-value pair arrays in sync while preserving their correspondence.
SQL Implementation
SELECT user_id, k, v FROM user_tags, unnest(tags, scores) AS t(k, v);
Output:
user_id
k
v
1
vip
100
1
active
80
1
buyer
90
2
new_user
50
Advantage: More concise than posexplode + WHERE; automatically pairs elements by position.
Scenario 5: Flattening Nested Arrays (flatten)
Problem
Flatten a 2D array (e.g., tags from multiple users merged together) into a 1D array.
SQL Implementation
SELECT flatten(ARRAY[ARRAY['vip', 'active'], ARRAY['buyer'], ARRAY['new_user']]) AS all_tags;
Output:
all_tags
vip:active:buyer:new_user
Using flatten with explode
-- Flatten first, then explode
SELECT tag FROM (
SELECT flatten(ARRAY[ARRAY['vip', 'active'], ARRAY['buyer']]) AS all_tags
)
LATERAL VIEW explode(all_tags) t AS tag;
Output:
tag
vip
active
buyer
Scenario 6: Expanding JSON Arrays
Problem
Extract and expand array fields from a JSON string.
SQL Implementation
-- You need to PARSE_JSON first to convert to a JSON type
WITH parsed AS (
SELECT user_id, PARSE_JSON(metadata) AS meta FROM user_tags
)
SELECT user_id, tag FROM parsed
LATERAL VIEW explode_json_array_string(meta.tags) t AS tag;
Output:
user_id
tag
1
a
1
b
2
c
Expanding a JSON Integer Array
WITH parsed AS (
SELECT user_id, PARSE_JSON(metadata) AS meta FROM user_tags
)
SELECT user_id, id FROM parsed
LATERAL VIEW explode_json_array_int(meta.ids) t AS id;
Output:
user_id
id
1
1
1
2
2
3
Common Issues
1. Choosing between explode and explode_outer
-- explode: rows with empty arrays are dropped
SELECT user_id, tag FROM user_tags
LATERAL VIEW explode(tags) t AS tag;
-- User 3 is not in the result
-- explode_outer: rows with empty arrays are kept as NULL
SELECT user_id, tag FROM user_tags
LATERAL VIEW explode_outer(tags) t AS tag;
-- User 3 is in the result, with tag = NULL
2. explode_json_array_* requires a JSON type
-- Wrong: passing a string directly
LATERAL VIEW explode_json_array_string('["a", "b"]') t AS tag
-- Correct: use PARSE_JSON first
LATERAL VIEW explode_json_array_string(PARSE_JSON('["a", "b"]')) t AS tag
3. unnest requires arrays of equal length
-- If arrays have different lengths, the shorter one determines the row count; extra elements are ignored
SELECT unnest(ARRAY['a', 'b', 'c'], ARRAY[1, 2]) AS t(k, v);
-- Only 2 rows are expanded: (a,1), (b,2)
4. posexplode index starts at 0
-- Index is 0-based, not 1-based
SELECT pos, tag FROM user_tags
LATERAL VIEW posexplode(tags) t AS pos, tag;
-- pos = 0, 1, 2 ...
5. Cartesian product from multiple LATERAL VIEWs
-- Wrong: two explodes produce a Cartesian product
SELECT user_id, tag, score FROM user_tags
LATERAL VIEW explode(tags) t AS tag
LATERAL VIEW explode(scores) s AS score;
-- User 1 produces 3 x 3 = 9 rows
-- Correct: use unnest or posexplode + WHERE
SELECT user_id, k, v FROM user_tags, unnest(tags, scores) AS t(k, v);
Performance Optimization Tips
Scenario
Optimization Strategy
Expanding large arrays
Use filter to shrink the array before expanding
Aggregating after expansion
Apply WHERE filtering immediately after LATERAL VIEW
Expanding JSON arrays
Avoid complex JSON parsing before expansion
Multi-column expansion
Prefer unnest over multiple posexplode + WHERE
-- Recommended: filter before expanding
SELECT user_id, tag FROM user_tags
LATERAL VIEW explode(filter(tags, t -> t != 'inactive')) t AS tag;
-- Not recommended: expand then filter (produces more intermediate rows)
SELECT user_id, tag FROM user_tags
LATERAL VIEW explode(tags) t AS tag
WHERE tag != 'inactive';