Task Scheduling Dependencies

Overview

In Singdata Lakehouse, a task scheduling dependency is a configuration that establishes an upstream and downstream relationship between task nodes.

Singdata Lakehouse supports creating dependencies across workspaces. Configure the upstream and downstream dependencies of each task to match the dependency scenarios that your business logic requires.

Special Notes on Dependencies

Current Dependency Limitations

  • Unsupported task types: real-time synchronization, multi-table real-time synchronization, dynamic tables, and streaming SQL.
  • Circular dependencies are not allowed.
  • A downstream node cannot go online while the upstream node it depends on is not online.
  • If other tasks depend on the current node, and a downstream periodic task of this node is being scheduled normally, the current node cannot be taken offline directly.
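
The rule against circular dependencies can be illustrated with a small check on a dependency graph. This is a minimal sketch using Kahn's topological sort, not the Lakehouse's own implementation; the function name, the task names, and the mapping structure are assumptions made for illustration.

```python
from collections import deque


def find_cycle_free_order(dependencies):
    """Return a topological order of tasks, or None if a cycle exists.

    `dependencies` maps each task name to the list of upstream tasks it
    depends on (hypothetical structure, for illustration only).
    """
    # Collect every task mentioned anywhere in the mapping.
    tasks = set(dependencies)
    for upstreams in dependencies.values():
        tasks.update(upstreams)

    # Count unresolved upstreams per task and index downstreams per upstream.
    pending = {task: 0 for task in tasks}
    downstreams = {task: [] for task in tasks}
    for task, upstreams in dependencies.items():
        for upstream in set(upstreams):
            pending[task] += 1
            downstreams[upstream].append(task)

    # Kahn's algorithm: repeatedly release tasks with no unresolved upstreams.
    ready = deque(sorted(task for task, count in pending.items() if count == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for downstream in sorted(downstreams[task]):
            pending[downstream] -= 1
            if pending[downstream] == 0:
                ready.append(downstream)

    # Any task never released is part of, or downstream of, a cycle.
    return order if len(order) == len(tasks) else None


# Hypothetical task graphs: the first is valid, the second contains a cycle.
acyclic = {"task_b": ["task_a"], "task_c": ["task_b"]}
cyclic = {"task_a": ["task_c"], "task_b": ["task_a"], "task_c": ["task_b"]}
```

Calling `find_cycle_free_order(acyclic)` yields an order in which every upstream precedes its downstream, while `find_cycle_free_order(cyclic)` returns `None`, which corresponds to a configuration that would be rejected.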

Explanation of Instance Dependencies

Singdata Lakehouse supports three types of scheduling tasks: minute, hour, and day. Tasks of different types can depend on each other, and each type has its own execution cycle. When periodic tasks run, each downstream periodic instance depends on upstream periodic instances. By default, the system mounts dependencies as interval dependencies: a downstream instance depends on the upstream instances whose start and end time range overlaps its own. The specific upstream instances are determined by the following rules:

  • Same cycle: The downstream instance depends on the upstream instance in the same cycle.
  • Large cycle depends on small cycle: For example, a daily task depends on an hourly task. The downstream instance depends on every upstream instance that its time range covers.
  • Small cycle depends on large cycle: For example, an hourly task depends on a daily task. The downstream instance depends on the upstream instance whose time range covers it.
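
The overlap rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scheduler's actual code: each instance is represented only by a half-open time range, and the ranges used here (a whole day for a daily instance, one clock hour for an hourly instance) are assumed for the example. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def overlaps(a_start, a_end, b_start, b_end):
    """True when half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def upstream_dependencies(downstream_range, upstream_ranges):
    """Return the upstream ranges that overlap the downstream range."""
    d_start, d_end = downstream_range
    return [r for r in upstream_ranges if overlaps(d_start, d_end, r[0], r[1])]


day = datetime(2024, 1, 1)
# Assumed ranges: one daily instance covering the whole day,
# and 24 hourly instances covering one clock hour each.
daily = (day, day + timedelta(days=1))
hourly = [(day + timedelta(hours=h), day + timedelta(hours=h + 1)) for h in range(24)]

# Large cycle depends on small cycle: the daily range covers all hourly ranges.
daily_on_hourly = upstream_dependencies(daily, hourly)

# Small cycle depends on large cycle: an hourly range is covered by the daily range.
hourly_on_daily = upstream_dependencies(hourly[5], [daily])

# Same cycle: an hourly range matches only the upstream range of the same hour.
hourly_on_hourly = upstream_dependencies(hourly[3], hourly)
```

With these assumed ranges, the daily instance matches all 24 hourly instances, each hourly instance matches the single daily instance, and two tasks with the same cycle pair up one to one.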

Note: Once dependencies are mounted on a downstream task, an instance of that task runs only when two conditions are both met: the upstream task instance has completed successfully, and the instance's own scheduling time has been reached.
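
The two conditions in the note can be expressed as a single predicate. The sketch below is illustrative only; the function name and the `"SUCCESS"` and `"RUNNING"` status strings are assumptions, not values taken from the product.

```python
from datetime import datetime


def can_trigger(now, scheduled_time, upstream_statuses):
    """Both conditions must hold: scheduled time reached and all upstreams succeeded."""
    time_reached = now >= scheduled_time
    upstreams_ok = all(status == "SUCCESS" for status in upstream_statuses)
    return time_reached and upstreams_ok


scheduled = datetime(2024, 1, 1, 9, 0)

# Scheduled time reached, but the upstream instance is still running.
blocked_by_upstream = can_trigger(datetime(2024, 1, 1, 9, 5), scheduled, ["RUNNING"])

# Upstream finished early, but the scheduled time has not been reached yet.
blocked_by_time = can_trigger(datetime(2024, 1, 1, 8, 0), scheduled, ["SUCCESS"])

# Both conditions are met.
ready = can_trigger(datetime(2024, 1, 1, 9, 5), scheduled, ["SUCCESS"])
```

Only the last call returns `True`; in the other two cases the instance keeps waiting.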

Explanation of Complex Dependency Scenarios

Same Cycle Dependency

Scenario: Daily Task B depends on Daily Task A

  • Task A: Generates 1 instance every day at 19:00.
  • Task B: Generates 1 instance every day at 09:00.
  • Upstream daily task does not set self-dependency: By default, the periodic instance of the downstream daily task mounts its dependency on the periodic instance of the upstream daily task in the same cycle.
  • Upstream daily task sets self-dependency: When the downstream daily task depends on the upstream daily task, a cross-cycle dependency exists.

Note: Any type of self-dependency creates a dependency on the previous cycle. If the previous cycle has not completed, the next cycle's task is not scheduled. The diagram only demonstrates the daily cycle; the other cycle types are not explained again.

Scenario: Hourly Task B depends on Hourly Task A

Case 1:
  • Task A: 00:31-23:59 interval, generates an instance every 1 hour. A total of 24 instances, the first instance at 00:31.
  • Task B: 00:10-23:59 interval, generates an instance every 1 hour. A total of 24 instances, the first instance at 00:10.

Case 2:
  • Task A: 12:30-15:59 interval, generates an instance every 1 hour. A total of 4 instances, the first instance at 12:31.
  • Task B: 15:10-18:59 interval, generates an instance every 1 hour. A total of 4 instances, the first instance at 15:10.

Scenario: Minute Task B depends on Minute Task A

Case 1:
  • Task A: 00:00-01:59 interval, generates an instance every 5 minutes. A total of 12 instances, the first instance at 00:00.
  • Task B: 00:10-00:59 interval, generates an instance every 5 minutes. A total of 10 instances, the first instance at 00:10.

Case 2:
  • Task A: 00:22-00:59 interval, generates an instance every 5 minutes. A total of 8 instances, the first instance at 00:12.
  • Task B: 00:36-00:59 interval, generates an instance every 5 minutes. A total of 5 instances, the first instance at 00:15.
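
The instance counts in the task descriptions follow from the interval and the period. As a rough sketch, assuming the first instance fires at the start of the interval and the end of the interval is inclusive (both are assumptions, and the helper names are hypothetical), the scheduled times can be enumerated like this:

```python
from datetime import datetime, timedelta


def hhmm(text):
    """Parse an 'HH:MM' string onto a fixed placeholder date."""
    return datetime.strptime(text, "%H:%M")


def instance_times(start, end, every):
    """List scheduled times from `start` up to and including `end`, stepping by `every`."""
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += every
    return times


# Hourly task: 00:31-23:59 interval, one instance every 1 hour.
hourly_a = instance_times(hhmm("00:31"), hhmm("23:59"), timedelta(hours=1))

# Minute task: 00:10-00:59 interval, one instance every 5 minutes.
minute_b = instance_times(hhmm("00:10"), hhmm("00:59"), timedelta(minutes=5))

print(len(hourly_a), hourly_a[0].strftime("%H:%M"))  # 24 00:31
print(len(minute_b), minute_b[0].strftime("%H:%M"))  # 10 00:10
```

Under these assumptions the hourly task yields 24 instances starting at 00:31 and the 5-minute task yields 10 instances starting at 00:10, matching the corresponding descriptions above.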

Large Cycle Depends on Small Cycle

Scenario: Daily Task B depends on Hourly Task A

  • Task A: 00:31-23:59 interval, generates an instance every 1 hour. A total of 24 instances, the first instance at 00:31.
  • Task B: Generates 1 instance daily at 19:00.

Scenario: Hourly Task B depends on Hourly Task A

Case 1:
  • Task A: 00:31-23:59 interval, generates an instance every 1 hour. A total of 24 instances, the first instance at 00:31.
  • Task B: 00:10-23:59 interval, generates an instance every 2 hours. A total of 12 instances, the first instance at 00:10.

Case 2:
  • Task A: 05:31-11:59 interval, generates an instance every 3 hours. A total of 3 instances, the first instance at 05:31.
  • Task B: 07:10-17:59 interval, generates an instance every 5 hours. A total of 3 instances, the first instance at 07:10.

Scenario: Hourly Task B depends on Minute Task A

  • Task A: 00:08-01:59 interval, generates an instance every 10 minutes. A total of 12 instances, the first instance at 00:08.
  • Task B: 00:15-00:59 interval, generates an instance every 1 hour. A total of 1 instance, the first instance at 00:15.

Scenario: Minute Task B depends on Minute Task A

  • Task A: 00:00-01:59 interval, generates an instance every 10 minutes. A total of 12 instances, the first instance at 00:00.
  • Task B: 00:38-01:59 interval, generates an instance every 20 minutes. A total of 4 instances, the first instance at 00:38.

Note: If the start and end range of a minute-granularity instance spans across hours, it is automatically truncated. For example, the range of the second instance of Task B, counting from left to right in the illustration, is [58, 60).
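
The truncation in the note can be illustrated with a short calculation. The sketch assumes that a minute-granularity instance's range starts at its scheduled minute and spans one scheduling period, which is consistent with the [58, 60) example but is not stated explicitly; the function name is hypothetical.

```python
def minute_range(start_minute, period_minutes):
    """Half-open range [start, end) in minutes since midnight, cut at the next hour boundary."""
    next_hour = (start_minute // 60 + 1) * 60
    end = min(start_minute + period_minutes, next_hour)
    return start_minute, end


# Task B runs every 20 minutes starting at 00:38.
first = minute_range(38, 20)   # stays within the hour
second = minute_range(58, 20)  # would reach minute 78, so it is cut at minute 60

print(first)   # (38, 58)
print(second)  # (58, 60)
```

The second instance, scheduled at 00:58, would otherwise extend to 01:18; cutting it at the hour boundary gives the [58, 60) range mentioned in the note.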

Small Cycle Depends on Large Cycle

Scenario: Hourly Task B depends on Daily Task A

  • Task A: Generates 1 instance every day at 19:00.
  • Task B: 00:31-23:59 interval, generates an instance every 1 hour. A total of 24 instances, the first instance at 00:31.

Scenario: Hourly Task B depends on Hourly Task A

  • Task A: 00:31-23:59 interval, generates an instance every 2 hours. A total of 12 instances, the first instance at 00:31.
  • Task B: 01:10-10:59 interval, generates an instance every 1 hour. A total of 10 instances, the first instance at 01:10.

Scenario: Minute Task B depends on Hourly Task A

  • Task A: 00:15-00:59 interval, generates an instance every 1 hour. A total of 1 instance, the first instance at 00:15.
  • Task B: 00:08-01:59 interval, generates an instance every 10 minutes. A total of 12 instances, the first instance at 00:08.

Scenario: Minute Task B depends on Minute Task A

  • Task A: 00:38-01:59 interval, generates an instance every 20 minutes. A total of 4 instances, the first instance at 00:38.
  • Task B: 00:00-01:59 interval, generates an instance every 10 minutes. A total of 12 instances, the first instance at 00:00.

Frequently Asked Questions

Q1: In a periodic task flow submitted to the production environment, an instance of one node fails. After I take the task offline and resubmit a replacement task, why does the failed instance still exist?

A: Task instances that have already completed are not taken offline when the task itself is taken offline.