Task Development and Orchestration

Overview

"Development" refers to the big data development IDE, integrating both stream and batch development, providing developers with an efficient and intelligent development environment. It supports users in task development, debugging, configuration scheduling, and submission and release operations, completing the key processes of big data aggregation, processing, and analysis.

Click "Development" in the left navigation menu to enter the main interface.

Interface Overview

  1. Function Switch
  2. Switch Workspace: Displays the name of the current data development project space and the region you are in. Click the dropdown icon to switch to another project space.
  3. Development Directory Tree: Manages task code in an orderly manner. You can create directory trees according to business needs to organize tasks by category and hierarchy.
  4. SQL Compilation Area: The main operation interface for development; different task types present different operation interfaces.
  5. Execution Results: Presents the results of operations run in the SQL compilation area.

Directory Tree

The data development module displays all tasks under the current project in the form of a directory tree, making it easy to add, delete, modify, and query tasks.

The supported operations on the directory are as follows:

  • Switch Tab: Switch the type of the directory tree below by clicking the tab, i.e., the task directory tree or the data directory tree.

    • Task Directory Tree: The list below shows all tasks in the current space.

    • Data Directory Tree: The list below shows all data directory trees in the user's region, presented in the workspace-schema-table/view hierarchy.

  • New: Create folders and various development tasks by clicking the + button on the directory tree.

    • Supported development task types: real-time synchronization, offline synchronization, SQL script, Python script, and shell script.

  • Filter Tasks: Supports filtering tasks by development task type, submission status, and responsible person.

  • Refresh: Refresh the current directory tree information.

  • Search Tasks: Search for and locate specific tasks by task name or path.

    • Task Name: Fuzzy-matches the keyword against task names and returns matching file directories or tasks.

    • Path: Fuzzy-matches the keyword against the path where the task is located and returns matching file directories or tasks.

  • Create Subdirectory & Create Subtask: Hovering over an entry reveals more icons, allowing the following operations:

    • If the entry is a folder: create any task type or a new folder at the current level, delete, or rename.

    • If the entry is a task: open the task, create a copy, copy the name, rename, move, or delete.

Operation Instructions:

  • Create Copy: Copy the current task, including its scheduling configuration.
  • Copy Name: Copy the name of the current entry.
  • Rename: Rename the directory or task.
  • Move: Move the directory to another directory within the project. Any subdirectories or tasks it contains are moved with it.
  • Delete: Delete the directory.

In the task directory tree, icons distinguish task types and colors distinguish online status.
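The name- and path-based fuzzy search described above can be sketched as a simple case-insensitive substring match. This is a minimal illustration with made-up task data; the platform's actual matching logic may differ:

```python
# Minimal sketch of fuzzy task search over a directory tree.
# Task names and paths below are made up for illustration.
tasks = [
    {"name": "ods_user_sync", "path": "/ods/sync/ods_user_sync"},
    {"name": "dwd_order_clean", "path": "/dwd/dwd_order_clean"},
    {"name": "report_daily", "path": "/report/report_daily"},
]

def search_tasks(keyword, by="name"):
    """Return tasks whose name or path contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [t for t in tasks if kw in t[by].lower()]

print([t["name"] for t in search_tasks("order")])            # ['dwd_order_clean']
print([t["name"] for t in search_tasks("/ods", by="path")])  # ['ods_user_sync']
```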

Status legend (colors):

  • The saved version differs from the submitted version (modified after submission).
  • The version saved on the server is consistent with the submitted version.
  • The task was taken offline after submission.
  • Other status types.

Icon legend (task types):

  • Real-time synchronization
  • Offline synchronization
  • SQL script
  • Shell script
  • Python script

Task Development Process

The data development module is based on the underlying engine capabilities of Lakehouse, providing various development types. Users can choose the task type for interactive data development work.

Create a New Task

Currently, two types of task nodes are supported: data integration and data development.

After clicking "New", select the specific development type, a pop-up window will appear. Enter the task name, select the specific directory level where the current task needs to be saved, and then enter the development interface.

Operation Bar Description

Task Tab

After the user clicks a task in the directory tree on the left or creates a new task node, the task tab opens (or is focused if already open). In the task tab bar, users can perform the following operations:

New: Users can click the "+" sign in the tab status bar to quickly create a new task, which is saved by default under the root node of the current space.

Close Tab: When the mouse hovers over a tab, a dropdown button appears, allowing batch close operations on task tabs.

Tab color status correspondence:

  • Green: Run succeeded
  • Red: Run failed
  • Blue: Running
  • Gray: Not run

SQL Functional Area

  1. Save: Saves the task, including the current node's code and related configurations.
  2. Formatting: Formats the written code so that its syntax structure is concise and clear.
  3. Parameters: Not yet online.
  4. Versions: Click Versions to view the committed and saved versions of the current task. Supports viewing code and rolling back between versions.
  5. Scheduling: Click to open the sidebar for scheduling settings. For detailed configuration, see Scheduling Settings.
  6. Submit: Needed only for tasks that require scheduled dispatch. After "Submit", the task is published to the Operations Center and runs according to the configured schedule. Scheduling must be configured before submission; for details, see Scheduling Settings.
  7. Operations: Click to enter the Operations Center.
  8. Task Flow Description: Hover over the task flow tips to display the flowchart of the development task's release process.
  9. Cluster Filtering: The "default" shown in the figure is the Virtual Cluster used for task execution. Click to switch to another Virtual Cluster. New Virtual Clusters can be added in "Compute".
  10. Run/Stop: Run or terminate the current node's code. When running SQL code, you can execute only the selected part.

SQL Editing Area

  • Schema Switching: Click the dropdown box to switch the schema in the current workspace; the default is public.
  • Shortcuts: Various shortcuts are supported in the Studio development editor. For details, see Others: Common Shortcut Operations.
  • SQL Editing Area: The SQL editor in Studio provides the following features to improve the efficiency of data development and data analysis:
  • Code Folding: Collapse code blocks to reduce reading interference.
  • Real-time Syntax Error Prompts: Flag syntax errors found during code writing to help avoid mistakes.
  • Syntax Highlighting: Highlight keywords and syntax structures with different colors or fonts in the editor to enhance readability.
  • Intelligent Completion: Automatically complete keywords, function names, variable names, etc., based on context and known information to improve coding efficiency and accuracy.
  • Partial Code Execution: Run only part of the code instead of the entire program to quickly test small segments or debug errors.

Orchestration

After the task has been tested and works as expected, submit it to the operations system for orchestration. This requires configuring the task's scheduling properties: click "Scheduling" in the SQL function area.

Basic Information

  • Person in Charge: Required. Only one member is allowed, defaulting to the task creator. It can be changed to another member of the workspace as needed.
  • Description: Optional. A detailed description of the task for future reference and management.
  • Run Attributes: Required.
    • Normal Scheduling: The task runs according to the user's configured scheduling rules.
    • Dry Run: When a task's logic temporarily does not need to run but you do not want to change the data link relationships, set it to dry run; the task instance is marked as successful without executing.
    • Pause: After the task is set to "Pause Scheduling" and published to the operations center, its status is "Paused", no task instances are generated, and data backfill operations can still be performed.
  • Cluster: Required. Defines the scheduling resource group used when the task is published to the production environment for scheduling and running.
  • Schema: Required. Defines the prefix schema used when the task runs in the production environment.
  • Task Priority: Optional. Specifies the scheduling priority of Lakehouse SQL tasks; 10 levels (0-9) are supported, and a higher value means higher execution precedence.
  • Parameter Configuration: Click "Add Parameter" to add a new parameter, or "Load Parameters in Code" to automatically load parameters already used in the code. In code, a parameter is referenced as '${bizdate}'; note that the quotes must be half-width (English) quotes. Note: more built-in system parameters will be supported in future versions.

Scheduling Time

  • Scheduling Cycle:
    • Daily Scheduling: The task is scheduled every day.
    • Monthly Specific Day Scheduling: Choose specific dates each month for scheduling.
    • Weekly Specific Day Scheduling: Choose specific days each week for scheduling.
  • Scheduling Frequency:
    • Execute Once: Only the start scheduling time needs to be configured.
    • Execute Multiple Times: The scheduling interval, start scheduling time, and end scheduling time need to be configured.
  • Effective Time: The date from which the task takes effect.
  • Expiry Time: The date from which the task no longer takes effect.
  • Preview Scheduling Time: Click to preview the specific run times produced by the configuration.
  • Parameter Configuration: Click "Add Parameter" to add a new parameter, or "Load Parameters in Code" to automatically load parameters already used in the code. In code, a parameter is referenced as '${bizdate}'; note that the quotes must be half-width (English) quotes. Note: more built-in system parameters will be supported in future versions.

Instance Information

Studio supports two instance generation methods: Effective the next day and Effective after release.

  • Instance Generation Method:
    • Effective after release: Instant instance generation. After release, the day's task instance is generated immediately and runs at the configured schedule time.
    • Effective the next day: The task instance is first scheduled to run on the next day.
  • Instance Retry on Error: Whether a task instance can be rerun, mainly a question of data idempotency. Set as needed:
    • Retryable after success or failure
    • Not retryable after success, retryable after failure
    • Not retryable after success or failure
  • Automatic Retry Count: A custom number of retries.
  • Instance Timeout Duration: Not enabled, or a custom duration.
  • Preview Schedule Time: Click to view the specific run times produced by the configuration.
  • Parameter Configuration: Click "Add Parameter" to add a new parameter, or "Load Parameters in Code" to automatically load parameters already used in the code. In code, a parameter is referenced as '${bizdate}'; note that the quotes must be half-width (English) quotes. Note: more built-in system parameters will be supported in future versions.
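The "retryable after failure" policy with a bounded automatic retry count can be sketched as follows. This is illustrative only; as noted above, automatic retries are only safe when the task is idempotent (rerunning produces the same result):

```python
import time

# Sketch of retry-after-failure with a bounded retry count (illustrative).
def run_with_retry(task, max_retries=3, delay_seconds=0):
    """Run task(); on failure, retry up to max_retries times, then re-raise."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return task()
        except Exception:
            if attempts > max_retries:
                raise
            time.sleep(delay_seconds)

calls = {"n": 0}
def flaky():
    # Fails twice, then succeeds; stands in for a transiently failing task.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retry(flaky, max_retries=3))  # ok (succeeds on the 3rd attempt)
```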

Scheduling Dependencies

Complex production tasks usually have upstream and downstream dependencies, such as DWD processing tasks that need to depend on ODS layer tasks. In the "Scheduling Dependencies" configuration, you can add upstream tasks that the current task depends on. The supported methods are:

  • By the parent node's file name

  • By the table name produced by the parent node
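The dependency rule above (a DWD task runs only after its upstream ODS tasks) can be sketched as a topological ordering of the task graph. Task names here are illustrative:

```python
from graphlib import TopologicalSorter

# Sketch of scheduling dependencies: each task maps to the set of
# upstream tasks it depends on. Task names are made up for illustration.
deps = {
    "dwd_orders": {"ods_orders", "ods_users"},  # downstream -> upstream set
    "ods_orders": set(),
    "ods_users": set(),
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # upstream ODS tasks appear before the DWD task
```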

Task Output

There are two ways to configure the task output information:

Method 1: Automatically generate the output table name from the task's code.

Method 2: Manually add by searching the table name.
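Method 1 can be sketched as parsing the INSERT target out of the task's SQL. The regex here is a simplified illustration; the platform presumably uses a real SQL parser:

```python
import re

# Sketch of deriving a task's output table by parsing the INSERT target.
# A simplified regex illustration, not the platform's actual parser.
sql = "INSERT OVERWRITE TABLE dwd.orders_clean SELECT * FROM ods.orders;"

match = re.search(
    r"INSERT\s+(?:OVERWRITE\s+|INTO\s+)?(?:TABLE\s+)?([\w.]+)",
    sql,
    re.IGNORECASE,
)
output_table = match.group(1) if match else None
print(output_table)  # dwd.orders_clean
```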

Submit for Release

Click the Submit for Release button in the function bar above the editor to submit the task.

Operations and Maintenance View

Click to jump to the Operations Center to view and maintain tasks.

Running Results

Running History

The running history provides the running result data of the current task tab for the past 7 days, up to 20 entries.

Running Results

Logs

SQL jobs can be diagnosed based on logs. For details, please see JOB PROFILE. After clicking to run the task, you can view the detailed log information of the task in the log area at the bottom of the page.

You can adjust the log display area by using the expand and collapse buttons.

You can force refresh the logs using the refresh button.

Data

After the task is completed, for those with results returned (such as SELECT statements), the results will be displayed through the "Data" tab:

Others: Common Shortcut Operations

In the code editing box, the following shortcuts are currently supported to help improve editing efficiency:

Save

  • macOS: Cmd + S
  • Windows: Ctrl + S

Comment or uncomment the line or code block where the cursor is located

  • macOS: Cmd + /
  • Windows: Ctrl + /

Cut the line or code block where the cursor is located

  • macOS: Cmd + X
  • Windows: Ctrl + X

Copy the line or code block where the cursor is located

  • macOS: Cmd + C
  • Windows: Ctrl + C

Paste

  • macOS: Cmd + V
  • Windows: Ctrl + V