Task Development and Orchestration
Overview
"Development" refers to the big data development IDE, integrating both stream and batch development, providing developers with an efficient and intelligent development environment. It supports users in task development, debugging, configuration scheduling, and submission and release operations, completing the key processes of big data aggregation, processing, and analysis.
Click "Development" in the left navigation menu to enter the main interface.
Interface Overview
No. | Function |
---|---|
1 | Function Switch |
2 | Switch Workspace: This section displays the name of the current data development project space and the region you are in. You can click the dropdown icon to switch to other project spaces. |
3 | Development Directory Tree: The directory tree is used to manage task codes in an orderly manner. You can create directory trees according to business needs to manage tasks by category and hierarchy. |
4 | SQL Editing Area: The main working area for development; different task types present different editor interfaces. |
5 | Execution Results: Displays the results of operations run in the editing area. |
Directory Tree
The data development module displays all tasks under the current project in the form of a directory tree, making it easy to add, delete, modify, and query tasks.
The supported operations on the directory tree are as follows:
- Switch tab: Switch the type of directory tree shown below by clicking a tab, i.e., the task directory tree or the data directory tree.
  - Task Directory Tree: Lists all tasks in the current space.
  - Data Directory Tree: Lists all data directories in the user's region, presented in the hierarchy of workspace - schema - table/view.
- New: Create folders and the various development task types by clicking the + button on the directory tree.
  - Development task types: real-time synchronization, offline synchronization, SQL script, Python script, shell script.
- Filter Tasks: Filter tasks by development task type, submission status, and responsible person.
- Refresh: Refresh the current directory tree information.
- Search Tasks: Search for and locate tasks by task name or path.
  - Task Name: Fuzzy-matches the keyword against task names and returns matching folders or tasks.
  - Path: Fuzzy-matches the keyword against task paths and returns matching folders or tasks.
- Create Subdirectory & Create Subtask: Hovering over an entry in the tree reveals additional icons that allow the following operations:
  - If the entry is a folder: create any task type or a new folder at the current level, delete, or rename.
  - If the entry is a task: open the task, create a copy, copy the name, rename, move, or delete.
Operation instructions:
- Create Copy: Copies the current task, including its scheduling configuration.
- Copy Name: Copies the name of the current entry.
- Rename: Renames the directory or task.
- Move: Moves the directory to another directory within the project. If the directory contains subdirectories or tasks, they are moved together.
- Delete: Deletes the directory.
- Task Directory Tree: Icons distinguish task types, and colors distinguish online status.
Status Legend | Description |
---|---|
![]() | The task has been submitted, but the saved version differs from the submitted version. |
![]() | The task has been submitted, and the saved version matches the submitted version. |
![]() | The task has been taken offline after submission. |
![]() | Other statuses. |
Icon Legend | Description |
---|---|
![]() | Real-time synchronization |
![]() | Offline synchronization |
![]() | SQL script |
![]() | Shell script |
![]() | Python script |
Task Development Process
The data development module is based on the underlying engine capabilities of Lakehouse, providing various development types. Users can choose the task type for interactive data development work.
Create a New Task
Currently, two types of task nodes are supported: data integration and data development.
After clicking "New", select the specific development type, a pop-up window will appear. Enter the task name, select the specific directory level where the current task needs to be saved, and then enter the development interface.
Operation Bar Description
Task Tab
When the user clicks a task in the directory tree on the left or creates a new task node, the corresponding task tab opens (or is brought into focus). In the task tab bar, users can perform the following operations:
New: Click the "+" sign in the tab bar to quickly create a new task, which is saved by default under the root node of the current space.
Close Tab: When the mouse hovers over a tab, a dropdown button appears, allowing batch close operations on task tabs.
Tab color status correspondence:
Status Icon | Description |
---|---|
Green | Run successfully |
Red | Run failed |
Blue | Running |
Gray | Not yet run |
SQL Functional Area
No. | Function Name | Description |
---|---|---|
1 | Save | Save the task, including the current node code and related configurations. |
2 | Formatting | Formats the written code so that its syntax structure is concise and clear. |
3 | Parameters | Not yet available. |
4 | Versions | Click on Versions to view the committed and saved versions of the current task. Supports code viewing and rollback between versions. |
5 | Scheduling | Click to pop up the sidebar window for scheduling settings. For detailed configuration, see Scheduling Settings. |
6 | Submit | The "Submit" operation is only needed for tasks that require scheduled execution. After submission, the task is published to the Operations Center and runs according to the configured schedule. Scheduling must be configured before submission; for details, see Scheduling Settings. |
7 | Operations | Click to enter the "Operations" center. |
8 | Task Flow Description | Hover the mouse over the task flow tip to display a flowchart of the process for bringing a development task online. |
9 | Cluster Filtering | The "default" shown in the figure indicates the Virtual Cluster required for task execution. You can click to switch and select other Virtual Clusters. To add a new Virtual Cluster, you can operate in "Compute". |
10 | Run/Stop | Run or terminate the current node's code. When running SQL code, you can run only a selected portion of the code. |
SQL Editing Area
- Schema Switching: Click the dropdown box to switch the schema in the current workspace; the default is public.
- Shortcuts: Various shortcuts are supported in the Studio editing area. For details, see Others: Common Shortcut Operations.
- SQL Editing Area: The SQL editor in Studio provides the following features to improve the efficiency of data development and data analysis.
Function | Description |
---|---|
Code Folding | Collapse code blocks to reduce reading interference. |
Real-time Syntax Error Prompt | Prompt users of syntax errors found during code writing to help avoid mistakes. |
Syntax Highlighting | Use different colors or fonts to highlight keywords and syntax structures in the code in the editor or IDE to enhance readability. |
Intelligent Completion | Automatically complete keywords, function names, variable names, etc., in the code based on context and known information to improve coding efficiency and accuracy. |
Partial Code Execution | Run only a part of the code instead of the entire program to quickly test small segments of code or debug errors. |
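For example, partial code execution is handy in a script that contains several statements: you can highlight a single statement and run only that selection. A minimal sketch, where the ods_orders table and its columns are hypothetical:

```sql
-- Step 1: inspect a sample of the source data (hypothetical table);
-- highlight and run only this statement first.
SELECT * FROM ods_orders LIMIT 10;

-- Step 2: the full aggregation, run once step 1 looks correct.
SELECT order_date, COUNT(*) AS order_cnt
FROM ods_orders
GROUP BY order_date;
```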
Orchestration
After the task has been tested and works as expected, submit it to the Operations Center for orchestration. You first need to configure the task's scheduling properties, which you can do by clicking "Scheduling" in the SQL functional area.
Basic Information
Parameter | Description |
---|---|
Person in Charge | Required. Only one member is allowed, defaulting to the task creator. It can be modified to other members within the workspace as needed. |
Description | Optional. You can provide a detailed description of the task for future reference and management. |
Run Attributes | Required. Normal Scheduling: the task is scheduled according to the user's configured scheduling rules. Dry Run Scheduling: when a task's logic temporarily does not need to run but you do not want to change the overall data link relationships, set it to dry run; the task instance will be marked as successful without actually running. Pause: after the task is set to "Pause Scheduling" and published to the Operations Center, the task status will be "Paused", no task instances will be generated, and data backfill operations can still be performed. |
Cluster | Required. Used to define the scheduling resource group used when the task is published to the production environment for scheduling and running. |
Schema | Required. Used to define the prefix schema used when the task is published to the production environment for running. |
Task Priority | Optional. Specifies the scheduling priority of Lakehouse SQL tasks, with 10 configurable levels (0-9); higher values indicate higher execution priority. |
Parameter Configuration | Click "Add Parameter" to add a new parameter, or click "Load Parameters in Code" to automatically load the parameters already used in the code. In code, a parameter is referenced as '${bizdate}'; note that the quotes must be straight English quotes. Note: more system built-in parameters will be supported in future versions. |
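As an illustration of parameter referencing, the hedged sketch below assumes a parameter named bizdate has been added in the scheduling configuration; the ods_orders table and its ds partition column are hypothetical:

```sql
-- At run time the scheduler substitutes ${bizdate} with the
-- configured value; the straight single quotes are required.
SELECT order_id, amount
FROM ods_orders          -- hypothetical table
WHERE ds = '${bizdate}';
```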
Scheduling Time
Parameter | Description |
---|---|
Scheduling Cycle | Daily Scheduling: the task is scheduled every day. Monthly Specific Day Scheduling: choose specific dates of each month for scheduling. Weekly Specific Day Scheduling: choose specific days of each week for scheduling. |
Scheduling Frequency | Execute Once: When the user selects to execute once, only the start scheduling time needs to be configured. Execute Multiple Times: When the user selects to execute multiple times, the scheduling interval, start scheduling time, and end scheduling time need to be configured. |
Effective Time | The date from which the task takes effect. |
Expiry Time | The date after which the task no longer takes effect. |
Preview Scheduling Time | Click to preview the scheduling time to see the specific run times after configuration. |
Instance Information
Studio supports two instance generation methods: Effective the next day and Effective after release.
Parameter | Description |
---|---|
Instance Generation Method | Effective after release: instances are generated immediately. After release, the task instances for the current day are generated at once and run at the configured schedule times. Effective the next day: task instances are generated and scheduled to run starting the next day. |
Instance Retry on Error | Sets whether a task instance can be rerun, mainly with data idempotency in mind. Set as needed. Options: can be rerun after success or failure; cannot be rerun after success but can be rerun after failure; cannot be rerun after success or failure. |
Automatic Retry Count | The number of automatic retries; customizable. |
Instance Timeout Duration | Not enabled, or a custom timeout duration. |
Scheduling Dependencies
Complex production tasks usually have upstream and downstream dependencies; for example, DWD processing tasks may need to depend on ODS-layer tasks. In the "Scheduling Dependencies" configuration, you can add the upstream tasks that the current task depends on, using either of the following methods:
- Parent node's file name
- Table name produced by the parent node
Task Output
There are two ways to configure the task output information:
Method 1: The output table name is automatically derived from the task's code.
Method 2: Manually add the output by searching for the table name.
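To make both ideas concrete, the sketch below (all table names hypothetical, and the INSERT OVERWRITE syntax follows common lakehouse SQL dialects and may differ in your engine) shows a DWD task whose code reads an ODS table, which can be declared as an upstream dependency via the table name produced by the parent node, and writes an output table that Method 1 can derive automatically:

```sql
-- Reading ods_orders corresponds to a dependency on the upstream
-- task that produces it; writing dwd_order_summary makes that
-- table this task's output (Method 1). All names are hypothetical.
INSERT OVERWRITE TABLE dwd_order_summary
SELECT order_date,
       COUNT(*)    AS order_cnt,
       SUM(amount) AS total_amount
FROM ods_orders
WHERE ds = '${bizdate}'
GROUP BY order_date;
```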
Submit for Release
Click the Submit for Release button in the function bar above the editor to submit the task.
Operations and Maintenance View
Click to jump to the Operations Center to view the task's operations and maintenance information.
Running Results
Running History
The running history shows the run results of the current task tab for the past 7 days, up to 20 entries.
Running Results
Logs
After you click to run a task, detailed log information appears in the log area at the bottom of the page. SQL jobs can be diagnosed from these logs; for details, see JOB PROFILE.
You can adjust the log display area by using the expand and collapse buttons.
You can force refresh the logs using the refresh button.
Data
After a task completes, if it returns results (for example, a SELECT statement), the results are displayed in the "Data" tab:
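For example, a query like the following (the dwd_order_summary table is hypothetical) returns a result set that Studio renders in the "Data" tab:

```sql
-- The result set of this SELECT appears in the "Data" tab.
SELECT order_date, order_cnt
FROM dwd_order_summary   -- hypothetical table
ORDER BY order_date DESC
LIMIT 100;
```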
Others: Common Shortcut Operations
In the code editing box, the following shortcuts are currently supported to help improve editing efficiency:
Save
- macOS: Cmd + S
- Windows: Ctrl + S
Comment or uncomment the line or code block where the cursor is located
- macOS: Cmd + /
- Windows: Ctrl + /
Cut the line or code block where the cursor is located
- macOS: Cmd + X
- Windows: Ctrl + X
Copy the line or code block where the cursor is located
- macOS: Cmd + C
- Windows: Ctrl + C
Paste
- macOS: Cmd + V
- Windows: Ctrl + V