TRANS_ARRAY Function

Function Description

TRANS_ARRAY is a user-defined table-valued function (UDTF) used to transform one row of data into multiple rows. It converts arrays stored in columns that are formatted with fixed delimiters into multiple rows, implementing "row-to-column" data transformation. This function is particularly suitable for handling data structures where multiple values are stored in a single string field.

Core Characteristics

  • Row-to-Column Transformation: Splits multiple delimiter-separated values in a single row into multiple rows
  • Flexible Key Columns: Supports specifying one or more columns as Keys that are not involved in the transformation
  • Automatic Padding: When array lengths are unequal, automatically fills shorter arrays with NULL
  • Configurable Separator: Supports any string as a separator

Syntax

trans_array(<num_keys>, <separator>, <key1>, <key2>, ..., <col1>, <col2>, <col3>)
AS (<key1>, <key2>, ..., <col1>, <col2>)

Parameter Description

ParameterTypeRequiredDescription
num_keysBIGINTYesThe number of columns used as Keys during transformation. Must be a constant with value ≥ 0. Key columns must come before all columns to be transposed
separatorSTRINGYesThe separator used to split the string into multiple elements. Must be a constant and cannot be an empty string, otherwise an error is returned
key1, key2, ...Any TypeYesColumns used as Keys during transformation, with the number specified by num_keys. Key columns maintain their original data type in the result
col1, col2, ...STRINGYesArray columns to be converted into rows. These columns must be of STRING type, and their content must be string-formatted arrays, for example item1;item2;item3

Special Notes

  • Role of Key Columns: Key columns are used for grouping and identification. The values of Key columns remain unchanged during the splitting process

  • Requirements for Columns to Be Transposed: All columns after Key columns are treated as array columns to be transposed and must be of STRING type

  • Output Row Count: Based on the length of the longest array, other shorter arrays are filled with NULL where insufficient

  • Output Type: Key columns maintain their original type, and all other columns are of STRING type

Return Value

Returns transposed multiple rows of data, where:

  • The data type of Key columns remains unchanged
  • All transposed columns are of STRING type
  • New column names are specified by the AS clause

Usage Examples

Example 1: Basic Transposition with Single Key

Scenario Description: In the user login table, each user has multiple login IP and time information stored as delimiter-separated strings. It needs to be split into multiple rows for analysis.

Original Data Table: test_user_login

CREATE TABLE test_user_login (
    login_id STRING,
    login_ip STRING,
    login_time STRING
);

INSERT INTO test_user_login VALUES
('wangwangA', '192.168.0.1,192.168.0.2', '20120101010000,20120102010000'),
('wangwangB', '192.168.45.10,192.168.67.22,192.168.6.3', '20120111010000,20120112010000,20120223080000');

SQL Statement:

SELECT  trans_array(1, ',', login_id, login_ip, login_time)  AS (login_id, login_ip, login_time)
FROM test_user_login;

Execution Result:

login_idlogin_iplogin_time
wangwangA192.168.0.120120101010000
wangwangA192.168.0.220120102010000
wangwangB192.168.45.1020120111010000
wangwangB192.168.67.2220120112010000
wangwangB192.168.6.320120223080000

Result Explanation:

  • num_keys=1 means only the login_id column serves as the Key
  • separator=',' specifies comma as the separator
  • Each user's login records are split into separate rows
  • login_id remains unchanged, serving as the grouping identifier

Example 2: Transposition with Multiple Keys

Scenario Description: The user information table has user ID, name, and multiple login IP and time records. Use two Key columns for transposition.

Original Data Table: test_user_info

CREATE TABLE test_user_info (
    id INT,
    name STRING,
    login_ip STRING,
    login_time STRING
);

INSERT INTO test_user_info VALUES
(1, 'Tom', '192.168.100.1,192.168.100.2', '20211101010101,20211101010102'),
(2, 'Jerry', '192.168.100.3,192.168.100.4', '20211101010103,20211101010104');

SQL Statement:

SELECT  trans_array(2, ',', id, name, login_ip, login_time)  AS (id, name, login_ip, login_time)
FROM test_user_info;

Execution Result:

idnamelogin_iplogin_time
1Tom192.168.100.120211101010101
1Tom192.168.100.220211101010102
2Jerry192.168.100.320211101010103
2Jerry192.168.100.420211101010104

Result Explanation:

  • num_keys=2 means columns id and name serve as Keys
  • The two Key columns remain unchanged in the result
  • Login records maintain grouping relationships based on the values of Key columns

Example 3: Handling Unequal Length Arrays (Automatic NULL Padding)

Scenario Description: In the user interest table, different users have different numbers of hobbies and sports. Shows how to handle cases where array lengths are unequal.

Original Data Table: test_unequal_array

CREATE TABLE test_unequal_array (
    user_id STRING,
    hobbies STRING,
    sports STRING
);

INSERT INTO test_unequal_array VALUES
('user1', 'reading,coding', 'basketball,tennis,swimming');

SQL Statement:

SELECT  trans_array(1, ',', user_id, hobbies, sports)  AS (user_id, hobbies, sports)
FROM test_unequal_array;

Execution Result:

user_idhobbiessports
user1readingbasketball
user1codingtennis
user1NULLswimming

Result Explanation:

  • The hobbies column has 2 elements, and the sports column has 3 elements
  • The output row count is based on the longer array (3 rows)
  • The third row of the hobbies column is filled with NULL
  • This automatic padding prevents data loss