Parameterized Definition Dynamic Table Creation Syntax
When the dynamic table is a partitioned table, parameters can be passed. Parameters are defined through SESSION_CONFIGS()['dt.arg.xx']. SESSION_CONFIGS() is a built-in system function. For non-partitioned tables, if parameterized definitions are used and the passed parameter values remain unchanged, the system will perform incremental refresh. However, once the parameter values change, the system will perform a full refresh, as this is equivalent to changing the table definition. For partitioned tables, the partition key must be a parameter field, but these parameter fields are not required to be partition columns of the table.
Creation Statement
The parameterized definition of a Dynamic Table allows users to define parameters through SESSION_CONFIGS()['dt.arg.xx']
. These parameters are referenced when creating the table, thereby dynamically affecting the content of the table.
SESSION_CONFIGS()
:A built-in system function used to read parameters from the configuration.'dt.arg.xx'
:The name of the DT parameter, which must start withdt.arg.
to avoid conflicts with internal system fields.
Example Code
In this example, dt.args.event_day
is a DT parameter, which is read from the configuration using the SESSION_CONFIGS()
function. If non-string type parameters are needed, the CAST
function can be used for type conversion.
Refresh Statement
The refresh behavior of a parameterized Dynamic Table depends on whether the table is partitioned.
Non-Partitioned Table Refresh Syntax
- The parameter values of non-partitioned tables remain unchanged and will be incrementally refreshed.
- The parameter values of non-partitioned tables change, and a full refresh will occur.
Partitioned Table Refresh Syntax
When refreshing a partitioned table, the partition_spec
must be specified in the order of the table's partition hierarchy. This means that if the table is partitioned by multiple fields, these fields need to be specified from the highest level to the lowest level.
Example Description
Assuming a table is partitioned by day
, hour
, and min
in three levels, the correct way to specify the partition_spec
is as follows:
Valid Specification: You can specify higher-level and some lower-level partitions, but you cannot skip any intermediate level partitions.
In this example, day
and hour
are specified, while the min
partition can be ignored.
Invalid Specification: Skipping the specification of any intermediate level partitions is not allowed.
Precautions
-
Parameter and Partition Consistency When performing a refresh operation on a Dynamic Table, you must ensure that the parameter values passed in are consistent with the specified partition values to be refreshed. If there is any inconsistency, the system will report an error during execution.
-
Error Example:
In the above situation, although the parameter event_day is set to 11, the refresh command attempts to write to partition event_day=19. This will cause the actual calculated partition result to be inconsistent with the specified partition, resulting in errors.
-
-
Correct Example:
-
Concurrent Refresh Tasks In these commands, the parameter values match the partition values, so they can be executed concurrently without conflicts. As long as there are no conflicts between partitions, the system allows multiple partition refresh tasks to be executed simultaneously.
Full Refresh and Incremental Refresh
- Full Refresh
Full refresh occurs in the following situations:
- The parameter values of a non-partitioned table change.
- The parameter values of a partitioned table change, causing the partition conditions to change. Example code
- Incremental Refresh
Incremental refresh occurs in the following situations:
- The parameter values of the non-partitioned table remain unchanged.
- The parameter values of the partitioned table remain unchanged, and the partition conditions have not changed. Example code
Scenario Case
Case 1: Converting Offline Tasks to Incremental Tasks This section will guide users on how to convert existing offline tasks into incremental tasks for more efficient data processing. The following are specific operational steps based on a "traditional database," suitable for scenarios where business logic is aligned and scheduled for daily refresh.
- Step 1: Parameterize the original SQL. The original SQL is as follows
First, all parameters ${bizdate}
passed in by the scheduling engine in the original SQL need to be replaced with SESSION_CONFIGS()['dt.args.bizdate']
. This step will allow the parameter values to be dynamically passed in through configuration, rather than being hard-coded in the SQL.
Original SQL Parameter Replacement:
Replace all ${bizdate}
with SESSION_CONFIGS()['dt.args.bizdate']
:
-
Step 2: Schedule Refresh Command During each scheduling, the parameter
dt.args.bizdate
needs to be set to a specific date value, and the refresh command should be executed.Example of scheduling refresh command
Case 2: Incremental Task Data Supplementation, in some cases, users may need to supplement data to existing partitions.
-
Method 1: Supplement data to the source table, users can directly supplement data to the source table. These supplemented data will be automatically reflected in the Dynamic Table (DT) through the corresponding REFRESH task.
Operation Steps
- Directly insert or update data in the source table.
- Execute the REFRESH task to synchronize the changes to the DT.
-
Method 2: Use DML statements to directly supplement data to the DT, users can also use DML statements to insert data into specific partitions of the DT. Operation Steps
- Use DML statements to insert data into specific partitions of the DT.
- Note that directly modifying the DT will result in a full refresh of that partition the next time. If users do not want a full refresh result, they should avoid scheduling the REFRESH task for that partition. Example Code
Notes
- Data directly inserted into DT will participate in downstream calculations of DT. If the downstream old partitions do not require this data, please do not schedule REFRESH tasks involving these partitions.
- Other unaffected partitions can still be incrementally refreshed.
Case Three: Executing incremental tasks in different VCs. For parameterized partitioned DT, refresh tasks for different partitions can be executed simultaneously. Users can allocate different REFRESH tasks to different Virtual Clusters (VC) as needed. Steps
- Allocate different REFRESH tasks to different VCs based on timeliness requirements and resource needs.
- For example, for new partitions with high timeliness requirements, their REFRESH tasks can be executed in large VCs with more resources.
- For supplementary tasks of other old partitions, their REFRESH tasks can be executed in smaller VCs with fewer resources.