Lakehouse JSON Data Parsing Guide
Overview
Semi-structured data (such as JSON) is increasingly common in modern data platforms. Singdata Lakehouse provides a native JSON data type and rich JSON functions, supporting efficient JSON parsing, extraction, and conversion. This guide categorizes usage by business scenario to help you quickly master JSON data handling methods.
Quick Navigation
- JSON String Parsing -- Use PARSE_JSON to convert strings to JSON type
- Extract Scalar Values -- Use json_extract_string/int to extract fields
- Extract Nested Objects -- Use json_extract to extract sub-objects or arrays
- Expand JSON to Rows -- Use LATERAL VIEW + EXPLODE to expand arrays
- Build JSON Objects -- Use TO_JSON to convert structured data to JSON
SQL Commands Covered
| Command/Function | Purpose | Applicable Scenario |
|---|---|---|
PARSE_JSON(str) | String to JSON | Parse JSON strings into JSON type |
json_extract_string(json, path) | Extract string value | Extract JSON field as STRING |
json_extract_int(json, path) | Extract integer value | Extract JSON field as INT |
json_extract(json, path) | Extract JSON object/array | Extract nested structure, returns JSON type |
LATERAL VIEW EXPLODE(...) | Expand JSON array | Convert array elements to multiple rows |
TO_JSON(struct) | Struct to JSON | Convert query results to JSON string |
Prerequisites
The following examples use a simulated user behavior log table user_events:
JSON String Parsing
Parse JSON data stored as STRING into Lakehouse's native JSON type for use with JSON functions.
Result:
| event_id | user_id | json_payload |
|---|---|---|
| 1 | 101 | {"page":"home","duration":30,"tags":["tech","news"]} |
Extract Scalar Values
Use json_extract_* functions to extract specific field values from JSON and convert them to the corresponding type.
Result:
| event_id | page_name | duration_sec |
|---|---|---|
| 1 | home | 30 |
Path Syntax
$.field: Extract a root-level field$.parent.child: Extract a nested field$.array[0]: Extract an element at a specific array index
Extract Nested Objects
When extracting an entire sub-object or array, use json_extract, which returns the JSON type.
Result:
| event_id | item_details |
|---|---|
| 2 | {"color":"silver","warranty":2} |
After extraction, you can continue parsing with json_extract_string:
Result:
| event_id | color |
|---|---|
| 2 | silver |
Expand JSON to Rows
Expand JSON arrays into multiple rows, commonly used for tags, attribute lists, and similar scenarios. In Lakehouse, use EXPLODE with type casting.
Result:
| event_id | user_id | tag |
|---|---|---|
| 1 | 101 | tech |
| 1 | 101 | news |
Build JSON Objects
Convert query results or structs to JSON strings, commonly used for API responses or data export.
Result:
| event_id | event_json |
|---|---|
| 1 | {"user_id":101,"event_type":"page_view","timestamp":"2024-06-01T10:00:00Z"} |
Clean Up Test Data
After completing JSON parsing verification, it is recommended to clean up test tables:
Important Notes
- PARSE_JSON Performance: If a column is already of
JSONtype, callingPARSE_JSONis unnecessary. - Non-Existent Path:
json_extract_*functions returnNULLwhen the path does not exist and do not throw errors. - Type Matching: Use the correct extraction function (e.g.,
json_extract_intfor numbers); otherwise additionalCASTmay be needed. - FROM_JSON Case Pitfall:
FROM_JSONforce-lowercases field names when parsing schemas. UsePARSE_JSONto preserve case. - Array Expansion: Use
EXPLODE(CAST(json_extract(...) AS ARRAY<TYPE>))to expand JSON arrays into multiple rows. - Array Indexing: JSON array indices are 0-based, e.g.,
$.tags[0]gets the first element.
