Monitoring Item Specification

This document describes the specific definitions and scope of "Events" and "Metrics" monitoring items available when configuring monitoring rules.

Event-Based Monitoring

| Event Name | Target Objects | Scope Description |
| --- | --- | --- |
| Scheduled Task Instance Run Failure | Included: Periodically scheduled task instances, backfill task instances, and manual reruns of scheduled instances. Excluded: Manually run temporary instances, real-time sync task instances | An alert event is generated when a scheduled task instance fails due to various reasons (code logic errors, timeout, system exceptions, etc.). If the scheduled task is configured with automatic retry on error, no alert events are generated during the automatic retry process; only one alert is generated at the end. |
| Scheduled Task Instance Completion Time | Included: Periodically scheduled task instances, manually rerun scheduled task instances, etc. Excluded: Manually run temporary instances, backfill task instances | An alert event is generated when the completion time of a scheduled task exceeds the expected time point (does not finish running by the specified time). |
| Scheduled Task Instance Scheduling Duration (Including Wait Time) | Included: Periodically scheduled task instances. Excluded: Manually run temporary instances, backfill task instances, manually rerun scheduled task instances, etc. | An alert event is generated when the total duration from the planned time to the end time of a scheduled task instance exceeds the threshold. This includes the time the task instance waits for upstream tasks to complete. |
| Scheduled Task Run Time Delay | Included: Periodically scheduled task instances. Excluded: Manually run temporary instances, backfill task instances, manually rerun scheduled task instances, etc. | An alert event is generated when the actual start time of a scheduled task instance (when it enters running state) is delayed beyond the threshold compared to the planned time. |
| Task Instance Execution Duration (Excluding Wait Time) | All task instances.<br>Includes periodically scheduled task instances, manually run temporary instances, backfill task instances, manually rerun scheduled task instances, etc. | An alert event is generated when the pure execution time of the instance (excluding the time waiting for upstream completion) exceeds the threshold. |
| Dynamic Table Refresh Timeout | Dynamic Tables | An alert event is generated when a dynamic table refresh operation exceeds the set timeout period. |
| Streaming SQL Task Execution Failure | Streaming SQL (Continuous Job) tasks | An alert event is generated when the running status of a streaming SQL task changes from "Running" to "Failed". |
| Quality Rule Validation Failure | Quality Rules + Target Table: Select tables configured with quality rules. When a rule validation fails on that table, an alert history is generated. | An alert event is generated when the execution result of a data quality validation rule is "Failed". |
| Quality Rule Validation Timeout | Quality Rules | An alert event is generated when the execution result of a data quality validation rule is "Timeout". |
| Multi-Table Real-Time Sync Task Run Failure | Multi-table real-time sync tasks | An alert event is generated when the status of a multi-table real-time sync task changes from "Running" to "Failed". |
| Multi-Table Real-Time Sync Task Table Enters Blacklist | Multi-table real-time sync tasks | An alert event is generated when a table in the task is added to the blacklist due to consecutive sync failures. |
| Multi-Table Real-Time Sync Task Single Table Full Data Sync Exception | Multi-table real-time sync tasks | An alert event is generated when an exception occurs during the full data synchronization of a single table in a multi-table real-time sync task. |
| Multi-Table Real-Time Sync Task Single Table Incremental Data Sync Exception | Multi-table real-time sync tasks | An alert event is generated when an exception occurs during the incremental data synchronization of a single table in a multi-table real-time sync task. |
| Full & Incremental Integrated Sync Single Table Full Data Sync Exception | Full & incremental integrated sync tasks | An alert event is generated when an exception occurs during the full data synchronization of a single table in a full & incremental integrated sync task. |
| Full & Incremental Integrated Sync Single Table Incremental Data Sync Exception | Full & incremental integrated sync tasks | An alert event is generated when an exception occurs during the incremental data synchronization of a single table in a full & incremental integrated sync task. |
| Full & Incremental Integrated Sync Task Run Failure | Full & incremental integrated sync tasks | An alert event is generated when the status of a full & incremental integrated sync task changes from "Running" to "Failed". |
| Full & Incremental Integrated Sync Task Target Table Schema Change Failure | Full & incremental integrated sync tasks | An alert event is generated when automatic schema evolution of the target table encounters an error. |
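
The four time-based events for scheduled task instances differ only in which timestamps they compare. The following is a minimal sketch of that distinction, not the platform's implementation. The `InstanceTimes` class, the function names, and the event labels are hypothetical, and the sketch assumes that all upstream waiting happens before the instance enters the running state, so that execution duration can be taken as end time minus start time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class InstanceTimes:
    """Hypothetical timestamps for one scheduled task instance."""
    planned: datetime  # planned (scheduled) time
    started: datetime  # time the instance entered the running state
    ended: datetime    # time the instance finished


def scheduling_duration(t: InstanceTimes) -> timedelta:
    # Planned time to end time, so upstream wait time is included.
    return t.ended - t.planned


def run_time_delay(t: InstanceTimes) -> timedelta:
    # How late the instance started compared to the planned time.
    return t.started - t.planned


def execution_duration(t: InstanceTimes) -> timedelta:
    # Assumption: all waiting happens before `started`, so this excludes it.
    return t.ended - t.started


def triggered_events(t: InstanceTimes, deadline: datetime,
                     max_scheduling: timedelta, max_delay: timedelta,
                     max_execution: timedelta) -> list[str]:
    """Return the (hypothetical) labels of the events this instance would raise."""
    events = []
    if t.ended > deadline:
        events.append("completion_time")
    if scheduling_duration(t) > max_scheduling:
        events.append("scheduling_duration")
    if run_time_delay(t) > max_delay:
        events.append("run_time_delay")
    if execution_duration(t) > max_execution:
        events.append("execution_duration")
    return events


# Planned 02:00, started 02:40 after waiting on upstream, ended 03:10.
example = InstanceTimes(
    planned=datetime(2024, 1, 1, 2, 0),
    started=datetime(2024, 1, 1, 2, 40),
    ended=datetime(2024, 1, 1, 3, 10),
)
example_events = triggered_events(
    example,
    deadline=datetime(2024, 1, 1, 3, 0),
    max_scheduling=timedelta(minutes=60),
    max_delay=timedelta(minutes=30),
    max_execution=timedelta(minutes=45),
)
```

In this example the instance runs for only 30 minutes, so the execution duration check stays quiet, while the 40-minute start delay pushes the other three measurements past their thresholds.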

Metric-Based Monitoring

| Metric Name | Target Objects | Scope Description |
| --- | --- | --- |
| Dynamic Table Refresh Failure | Dynamic Tables | Number of dynamic table refresh operation failures. |
| Full & Incremental Integrated Sync Task Delay | Full & incremental integrated real-time sync tasks | Time delay of data synchronization. |
| Full & Incremental Integrated Sync Task Single Table Sync Failure | Full & incremental integrated real-time sync tasks | Number of single-table synchronization failures. |
| Multi-Table Real-Time Sync Task Delay | Multi-table real-time sync tasks | Overall delay of multi-table real-time synchronization. |
| Multi-Table Real-Time Sync Task Job Failover | Multi-table real-time sync tasks | Number of failover occurrences in multi-table sync tasks. |
| Multi-Table Real-Time Sync Task Read Position Delay | Multi-table real-time sync tasks | Delay of the data read position relative to the source. |
| Multi-Table Real-Time Sync Task Source Database Read Delay | Multi-table real-time sync tasks | Difference between the source database data generation rate and the synchronization rate. |
| Multi-Table Real-Time Sync Task Data Sync Status Exception | Multi-table real-time sync tasks | Monitoring of abnormal conditions in data synchronization status. |
| Queued Jobs Count | Single SQL Job | Sampled every 5 seconds. Counts all SQL jobs whose status is “queued” at each sampling point. |
| Average Job Queue Time | Single SQL Job | Sampled every 5 seconds. Calculates the TP90 (90th percentile) job queue time within each 5-second window. |
| General-Purpose / Integration Vcluster Load | Virtual Cluster | At each sampling point, computes the ratio of the compute resources currently processing tasks to the maximum compute resources supported by that vcluster size. |
| Analytics Vcluster Load | Virtual Cluster | At each sampling point, computes the ratio of the current job concurrency to the vcluster’s maximum supported concurrency (calculated as max replicas × max concurrency per replica). |
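
The sampled metrics at the end of the table reduce to simple arithmetic. The following is a rough sketch of those calculations under stated assumptions, not the platform's actual code. All function and parameter names are hypothetical, and the TP90 value uses the nearest-rank percentile definition; the platform may use a different percentile method.

```python
from typing import Sequence


def queued_jobs_count(statuses: Sequence[str]) -> int:
    # Number of jobs whose status is "queued" at one sampling point.
    return sum(1 for status in statuses if status == "queued")


def tp90(queue_times: Sequence[float]) -> float:
    # Nearest-rank 90th percentile (assumed definition): the smallest sample
    # such that at least 90% of the samples are less than or equal to it.
    if not queue_times:
        raise ValueError("tp90 needs at least one sample")
    ordered = sorted(queue_times)
    rank = (9 * len(ordered) + 9) // 10  # integer ceil(0.9 * n)
    return ordered[rank - 1]


def general_purpose_load(busy_resources: float, max_resources: float) -> float:
    # Compute resources currently processing tasks, divided by the maximum
    # compute resources supported by the vcluster size.
    return busy_resources / max_resources


def analytics_load(current_concurrency: int, max_replicas: int,
                   max_concurrency_per_replica: int) -> float:
    # Current job concurrency divided by the maximum supported concurrency,
    # where the maximum is max replicas x max concurrency per replica.
    return current_concurrency / (max_replicas * max_concurrency_per_replica)


# One sampling point with three jobs, two of them queued.
sample_queued = queued_jobs_count(["queued", "running", "queued"])

# Queue times (in seconds) observed within one 5-second window.
sample_tp90 = tp90([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# 12 concurrent jobs on a vcluster with 2 replicas of 8 slots each.
sample_analytics_load = analytics_load(12, max_replicas=2,
                                       max_concurrency_per_replica=8)
```

With these inputs the analytics vcluster runs 12 jobs against a maximum of 16, giving a load ratio of 0.75.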