Analyzing Usage with the information_schema Job History View

Overview

This guide helps you use the sys.information_schema.job_history table to analyze Singdata system usage, understand resource consumption patterns, identify performance bottlenecks, and discover optimization opportunities. All analysis is based on SQL queries, requiring no additional tools.

Data Source Introduction

Primary Analysis Table

  • Table Name: sys.information_schema.job_history
  • Purpose: Records the execution history of all jobs in the system
  • Permissions: Requires query permissions on sys.information_schema

Key Field Descriptions

Field NameData TypeDescription
workspace_nameStringWorkspace name
virtual_clusterStringVirtual cluster name
job_idStringUnique job identifier
execution_timeFloatJob execution time (seconds)
start_timeTimestampJob start time
input_tablesStringInput table information (JSON format)
input_bytesStringNumber of bytes read
cache_hitStringNumber of cache hit bytes
statusStringJob execution status

Analysis Goals and Methods

Analysis Goals

  1. Resource Usage Analysis: Identify the busiest workspaces and virtual clusters
  2. Data Access Analysis: Find the most frequently accessed tables and data read patterns
  3. Performance Optimization Analysis: Evaluate cache hit rates and query efficiency
  4. Capacity Planning Analysis: Provide data support for resource scaling
  • Daily Monitoring: Last 7 days of data
  • Periodic Analysis: Last 30 days of data
  • Long-term Trends: Last 90 days of data

1. Workspace and Virtual Cluster Activity Analysis

Analysis Purpose

Identify the busiest workspaces and virtual clusters in the system, providing a basis for resource allocation and capacity planning.

1.1 Workspace Activity Analysis

Query Goal: Sort by total execution time to find the busiest workspaces

-- Workspace activity statistics (last 30 days)
SELECT 
    workspace_name,
    COUNT(*) as job_count,                    -- Job count
    SUM(execution_time) as total_execution_time,  -- Total execution time
    AVG(execution_time) as avg_execution_time,    -- Average execution time
    SUM(CASE WHEN status = 'SUCCESS' THEN 1 ELSE 0 END) as success_jobs,  -- Successful jobs
    SUM(CASE WHEN status = 'FAILED' THEN 1 ELSE 0 END) as failed_jobs,    -- Failed jobs
    ROUND(SUM(CASE WHEN status = 'SUCCESS' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) as success_rate -- Success rate
FROM sys.information_schema.job_history 
WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
GROUP BY workspace_name
ORDER BY total_execution_time DESC;

1.2 Virtual Cluster Activity Analysis

Query Goal: Analyze the workload distribution across virtual clusters

-- Virtual Cluster activity statistics (last 30 days)
SELECT 
    virtual_cluster,
    COUNT(*) as job_count,
    SUM(execution_time) as total_execution_time,
    AVG(execution_time) as avg_execution_time,
    MIN(execution_time) as min_execution_time,
    MAX(execution_time) as max_execution_time
FROM sys.information_schema.job_history 
WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
  AND virtual_cluster IS NOT NULL
GROUP BY virtual_cluster
ORDER BY total_execution_time DESC;

Result Example:

Virtual Cluster NameJob CountTotal Execution Time (s)Avg Execution Time (s)Min Execution Time (s)Max Execution Time (s)
MET***_ETL_GP36,695996,551.8927.160.005745.531
DEFAULT338,797558,213.831.650.0063,825.289
CUS***_BILLING531,01445,493.620.090.003165.597
BI_ANALYSE49,1281,725.920.040.003104.061
VC_***_CAL80373.294.670.00760.184
MY_FIRST_VC140.650.050.0110.097
MY_SECOND_VC40.120.030.0150.072

1.3 Analyzing Workload by Time Period

Query Goal: Understand system load across different time periods

-- Job distribution statistics by hour
SELECT 
    HOUR(start_time) as hour_of_day,
    COUNT(*) as job_count,
    SUM(execution_time) as total_execution_time,
    AVG(execution_time) as avg_execution_time
FROM sys.information_schema.job_history 
WHERE start_time >= CURRENT_DATE() - INTERVAL 7 DAY
GROUP BY HOUR(start_time)
ORDER BY hour_of_day;

Result Example:

HourJob CountTotal Execution Time (s)Avg Execution Time (s)
024,18918,479.990.76
123,82311,243.610.47
217,72112,227.460.69
319,74628,425.321.44
424,53512,300.860.50
828,22418,066.540.64
920,44327,761.991.36
1525,00429,525.281.18
1820,34329,472.921.45
2317,46111,217.910.64
-- Job distribution statistics by day of week
SELECT 
    DAYOFWEEK(start_time) as day_of_week,
    CASE DAYOFWEEK(start_time)
        WHEN 1 THEN 'Sunday'
        WHEN 2 THEN 'Monday'
        WHEN 3 THEN 'Tuesday'
        WHEN 4 THEN 'Wednesday'
        WHEN 5 THEN 'Thursday'
        WHEN 6 THEN 'Friday'
        WHEN 7 THEN 'Saturday'
    END as day_name,
    COUNT(*) as job_count,
    SUM(execution_time) as total_execution_time
FROM sys.information_schema.job_history 
WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
GROUP BY DAYOFWEEK(start_time)
ORDER BY day_of_week;

Result Example:

Day NumberDay NameJob CountTotal Execution Time (s)
1Sunday86,383162,597.21
2Monday103,041172,924.33
3Tuesday158,431276,514.79
4Wednesday208,982322,615.64
5Thursday174,951278,444.13
6Friday143,648238,794.32
7Saturday80,380150,478.52

2. Table Usage Statistics Analysis

Analysis Purpose

Identify the most frequently accessed tables, analyze data read patterns, and provide guidance for table optimization and indexing strategies.

2.1 Most Frequently Accessed Tables

Query Goal: Find the tables with the highest access frequency

-- Parse input_tables JSON and calculate table access statistics
SELECT 
    GET_JSON_OBJECT(input_tables, '$.table[0].tableName') as table_name,
    CONCAT(
        GET_JSON_OBJECT(input_tables, '$.table[0].namespace[0]'), 
        '.', 
        GET_JSON_OBJECT(input_tables, '$.table[0].namespace[1]')
    ) as schema_name,
    COUNT(*) as access_count,
    SUM(CAST(input_bytes AS BIGINT)) as total_bytes_read,
    AVG(CAST(input_bytes AS BIGINT)) as avg_bytes_per_access,
    SUM(CAST(GET_JSON_OBJECT(input_tables, '$.table[0].record') AS BIGINT)) as total_records_read
FROM sys.information_schema.job_history 
WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
  AND input_tables IS NOT NULL 
  AND input_tables != ''
  AND input_tables != '{"table":[]}'
  AND input_bytes > 0
GROUP BY 
    GET_JSON_OBJECT(input_tables, '$.table[0].tableName'),
    CONCAT(
        GET_JSON_OBJECT(input_tables, '$.table[0].namespace[0]'), 
        '.', 
        GET_JSON_OBJECT(input_tables, '$.table[0].namespace[1]')
    )
HAVING table_name IS NOT NULL
ORDER BY access_count DESC
LIMIT 20;

Result Example:

Table NameSchema NameAccess CountTotal Bytes ReadAvg Bytes Per AccessTotal Records Read
bil***_summary_mvmet_bill.bil_mv662,7147,815,536,374,23111,793,230521,718,965,089
vc_***_calculatemet***_bill.public65,837164,257,938,0612,494,9186,127,770,647
met***_events_allmet***_bill.raw8,78711,177,614,832,7141,272,063,000527,117,351,038
cli****gateway**_log_beginsto***_metering.public8,779110,104,760,84212,541,830198,025,739
sku_categorymet***_bill.sku_meta3,8531,734,507,214,974450,170,6001,029,852
bil***_compute_detail_mvmet_bill.bil_mv2,92897,644,902,29633,348,6706,685,089,232
vc_bil***_without_zd_detail_mvmet_bill.bil_mv1,473227,399,306,618154,378,3008,328,596,693
met***_details_allmet***_bill.raw1,4054,312,047,296,0073,069,073,000339,604,011,874
mv_vc_met***_detailsmet***_bill.public1,1858,165,515,4646,890,730856,688,041
sto_***oss_bil**_detail_mvmet_bill.bil_mv7484,350,578,5515,816,281945,398,941

2.2 Top Tables by Data Read Volume

Query Goal: Find the tables with the largest data read volumes

-- Table statistics sorted by data read volume
SELECT 
    GET_JSON_OBJECT(input_tables, '$.table[0].tableName') as table_name,
    CONCAT(
        GET_JSON_OBJECT(input_tables, '$.table[0].namespace[0]'), 
        '.', 
        GET_JSON_OBJECT(input_tables, '$.table[0].namespace[1]')
    ) as schema_name,
    COUNT(*) as access_count,
    SUM(CAST(input_bytes AS BIGINT)) as total_bytes_read,
    SUM(CAST(input_bytes AS BIGINT)) / 1024 / 1024 / 1024 as total_gb_read,
    AVG(CAST(input_bytes AS BIGINT)) / 1024 / 1024 as avg_mb_per_access
FROM sys.information_schema.job_history 
WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
  AND input_tables IS NOT NULL 
  AND input_tables != ''
  AND input_tables != '{"table":[]}'
  AND input_bytes > 0
GROUP BY 1, 2
HAVING table_name IS NOT NULL
ORDER BY total_bytes_read DESC
LIMIT 20;

Result Example:

Table NameSchema NameAccess CountTotal Bytes ReadTotal Read (GB)Avg Per Access (MB)
met***_events_allmet***_bill.raw8,78711,177,614,832,71410,409.971,213.13
bil***_summary_mvmet_bill.bil_mv662,7147,815,536,374,2317,278.7911.25
met***_details_allmet***_bill.raw1,4054,312,047,296,0074,015.912,926.90
sku_categorymet***_bill.sku_meta3,8531,734,507,214,9741,615.39429.32
dwd_cz_jobssys_meta_warehouse.inf_schema35387,223,640,942360.6310,551.01
vc_met***_detailsmet***_bill.public743371,186,266,727345.69476.43
vc_bil***_without_zd_detail_mvmet_bill.bil_mv1,473227,399,306,618211.78147.23
vc_***_calculatemet***_bill.public65,837164,257,938,061152.982.38
dim_stu***_instance_dmin_fmet_bill.stu_dw_tenant405130,022,636,178121.09306.17
ins***_account_mappingmet***_bill.public730118,682,911,967110.53155.05

2.3 Table Access Time Distribution Analysis

Query Goal: Analyze the time patterns of table access

-- Analyze access time distribution for major tables
WITH top_tables AS (
    SELECT GET_JSON_OBJECT(input_tables, '$.table[0].tableName') as table_name
    FROM sys.information_schema.job_history 
    WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
      AND input_tables IS NOT NULL 
      AND input_tables != '{"table":[]}'
    GROUP BY 1
    ORDER BY COUNT(*) DESC
    LIMIT 5
)
SELECT 
    GET_JSON_OBJECT(h.input_tables, '$.table[0].tableName') as table_name,
    HOUR(h.start_time) as hour_of_day,
    COUNT(*) as access_count,
    SUM(CAST(h.input_bytes AS BIGINT)) / 1024 / 1024 as total_mb_read
FROM sys.information_schema.job_history h
JOIN top_tables t ON GET_JSON_OBJECT(h.input_tables, '$.table[0].tableName') = t.table_name
WHERE h.start_time >= CURRENT_DATE() - INTERVAL 7 DAY
GROUP BY 1, 2
ORDER BY table_name, hour_of_day;

Result Example:

Table NameHourAccess CountTotal Read (MB)
bil***_summary_mv021,826238,651.21
bil***_summary_mv122,557251,626.92
bil***_summary_mv215,100173,747.51
bil***_summary_mv318,436216,057.70
bil***_summary_mv422,117249,271.86
bil***_summary_mv824,900286,801.29
bil***_summary_mv917,682207,026.35
bil***_summary_mv1519,234225,847.45
bil***_summary_mv1816,891198,234.12
bil***_summary_mv2314,567167,432.89

3. Cache Hit Rate Analysis

Analysis Purpose

Evaluate system cache efficiency, identify cache optimization opportunities, and improve query performance.

3.1 Overall Cache Hit Rate

Query Goal: Calculate the system-wide cache hit rate

-- System-wide cache hit rate statistics
SELECT 
    CASE 
        WHEN cache_hit = '0' OR cache_hit IS NULL THEN 'Cache Miss'
        ELSE 'Cache Hit'
    END as cache_status,
    COUNT(*) as job_count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage,
    SUM(execution_time) as total_execution_time,
    AVG(execution_time) as avg_execution_time
FROM sys.information_schema.job_history 
WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
GROUP BY 1
ORDER BY job_count DESC;

Result Example:

Cache StatusJob CountPercentage (%)Total Execution Time (s)Avg Execution Time (s)
Cache Hit738,78477.29883,488.521.20
Cache Miss217,03222.71718,880.423.31

3.2 Cache Hit Rate by Workspace

Query Goal: Compare cache usage effectiveness across different workspaces

-- Cache hit rate analysis by workspace
SELECT 
    workspace_name,
    SUM(CASE WHEN cache_hit != '0' AND cache_hit IS NOT NULL THEN 1 ELSE 0 END) as cache_hit_jobs,
    SUM(CASE WHEN cache_hit = '0' OR cache_hit IS NULL THEN 1 ELSE 0 END) as cache_miss_jobs,
    COUNT(*) as total_jobs,
    ROUND(SUM(CASE WHEN cache_hit != '0' AND cache_hit IS NOT NULL THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) as cache_hit_rate,
    SUM(CAST(cache_hit AS BIGINT)) / 1024 / 1024 / 1024 as total_cache_gb
FROM sys.information_schema.job_history 
WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
GROUP BY workspace_name
ORDER BY cache_hit_rate DESC;

Result Example:

Workspace NameCache Hit JobsCache Miss JobsTotal JobsCache Hit Rate (%)Total Cache (GB)
met***_n_bill732,157136,263868,42084.3112,336.17
sto***_metering6,29029,08235,37217.7836.20
cos***_analyse33751,66452,0010.6598.62
qui***_ws018180.000.00
cli***_sample_data0110.000.00
dev_envirment0440.000.00

3.3 Cache Hit Rate Trend Analysis

Query Goal: Observe cache hit rate trends over time

-- Daily cache hit rate trend statistics
SELECT 
    DATE(start_time) as date,
    SUM(CASE WHEN cache_hit != '0' AND cache_hit IS NOT NULL THEN 1 ELSE 0 END) as cache_hit_jobs,
    COUNT(*) as total_jobs,
    ROUND(SUM(CASE WHEN cache_hit != '0' AND cache_hit IS NOT NULL THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) as cache_hit_rate
FROM sys.information_schema.job_history 
WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
GROUP BY DATE(start_time)
ORDER BY date;

Result Example:

DateCache Hit JobsTotal JobsCache Hit Rate (%)
2025-04-2320,14526,83475.08
2025-04-2422,56728,90178.09
2025-04-2524,12331,24577.21
2025-04-2625,89033,12778.15
2025-04-2723,45630,23477.57
2025-04-2821,78928,56776.27
2025-04-2926,23434,12376.88
2025-04-3024,56731,89077.04
2025-05-0122,89029,56777.42
2025-05-0225,12332,45677.40

4. Performance Issue Diagnostic Queries

4.1 Long-Running Jobs

Query Goal: Identify jobs with abnormally long execution times

-- Find long-running jobs
SELECT 
    job_id,
    workspace_name,
    virtual_cluster,
    job_type,
    execution_time,
    start_time,
    end_time,
    status,
    LEFT(job_text, 100) as job_text_preview
FROM sys.information_schema.job_history 
WHERE start_time >= CURRENT_DATE() - INTERVAL 7 DAY
  AND execution_time > 300  -- Jobs running longer than 5 minutes
ORDER BY execution_time DESC
LIMIT 50;

Result Example:

Job IDWorkspace NameVirtual ClusterJob TypeExecution Time (s)Start TimeStatusJob Preview
202505***96423met***_n_billMET***_ETL_GPSELECT3,825.292025-05-22 03:15:23SUCCESSSELECT SUM(CAST(record_count AS BIGINT)) as total_records, SUM(CAST(data_size AS BIGINT))...
202505***84521met***_n_billMET***_ETL_GPSELECT2,456.782025-05-21 15:42:11SUCCESSWITH billing_data AS (SELECT workspace_id, SUM(compute_time) FROM billing_summary...
202505***73941met***_n_billDEFAULTINSERT1,923.452025-05-20 09:33:47SUCCESSINSERT INTO meter SELECT event_id, workspace_id, timestamp, event_type...
202505***62847sto***_meteringDEFAULTSELECT1,567.232025-05-19 14:28:36FAILEDSELECT storage_type, bucket_name, SUM(storage_size) FROM sto***_usage WHERE date...
202505***51238met***_n_billBI_ANALYSESELECT1,234.562025-05-18 11:17:29SUCCESSSELECT DATE_TRUNC('hour', start_time) as hour, COUNT(*) as job_count FROM job_his...

4.2 Failed Job Analysis

Query Goal: Analyze patterns and causes of job failures

-- Failed job statistics and analysis
SELECT 
    workspace_name,
    virtual_cluster,
    job_type,
    COUNT(*) as failed_count,
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER() as failure_percentage,
    LEFT(error_message, 100) as common_error
FROM sys.information_schema.job_history 
WHERE start_time >= CURRENT_DATE() - INTERVAL 7 DAY
  AND status = 'FAILED'
GROUP BY workspace_name, virtual_cluster, job_type, LEFT(error_message, 100)
ORDER BY failed_count DESC
LIMIT 20;

Result Example:

Workspace NameVirtual ClusterJob TypeFailed CountFailure Percentage (%)Common Error
cos***_analyseBI_ANALYSESELECT1,24545.67CZLH-40000 Table 'cost_data.billing_temp' doesn't exist
met***_n_billDEFAULTINSERT56720.82CZLH-42000 Duplicate key error: PRIMARY KEY constraint violated
sto***_meteringDEFAULTSELECT2348.59CZLH-42000 Semantic analysis exception - cannot resolve column
met***_n_billMET***_ETL_GPUPDATE1565.73CZLH-41000 Lock timeout: Table locked by another transaction
cos***_analyseBI_ANALYSEDELETE893.27CZLH-43000 Syntax error: Invalid column reference 'unknown_column'

4.3 Top Resource-Consuming Jobs

Query Goal: Find the job types with the highest resource consumption

-- High resource consumption job analysis
SELECT 
    job_type,
    workspace_name,
    COUNT(*) as job_count,
    SUM(execution_time) as total_execution_time,
    AVG(execution_time) as avg_execution_time,
    SUM(CAST(input_bytes AS BIGINT)) / 1024 / 1024 / 1024 as total_input_gb,
    AVG(CAST(input_bytes AS BIGINT)) / 1024 / 1024 as avg_input_mb
FROM sys.information_schema.job_history 
WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
  AND input_bytes > 0
GROUP BY job_type, workspace_name
ORDER BY total_execution_time DESC
LIMIT 20;

Result Example:

Job TypeWorkspace NameJob CountTotal Execution Time (s)Avg Execution Time (s)Total Input (GB)Avg Input (MB)
SELECTmet***_n_bill345,6781,234,567.893.5715,234.5645.67
INSERTmet***_n_bill67,890456,789.126.738,901.23135.45
UPDATEmet***_n_bill12,345234,567.8919.013,456.78289.34
DELETEcos***_analyse8,901123,456.7813.871,234.56142.78
CREATEsto***_metering2,34556,789.1224.21567.89249.12

5. Practical Analysis Templates

5.1 Daily Monitoring Report

-- Daily system health report
SELECT 
    'Overall Overview' as metric_category,
    'Jobs Total' as metric_name,
    CAST(COUNT(*) AS STRING) as metric_value
FROM sys.information_schema.job_history 
WHERE DATE(start_time) = CURRENT_DATE() - INTERVAL 1 DAY

UNION ALL

SELECT 
    'Overall Overview',
    'Execution Time (Hours)',
    CAST(ROUND(SUM(execution_time) / 3600, 2) AS STRING)
FROM sys.information_schema.job_history 
WHERE DATE(start_time) = CURRENT_DATE() - INTERVAL 1 DAY

UNION ALL

SELECT 
    'Overall Overview',
    'Success Rate (%)',
    CAST(ROUND(SUM(CASE WHEN status = 'SUCCESS' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS STRING)
FROM sys.information_schema.job_history 
WHERE DATE(start_time) = CURRENT_DATE() - INTERVAL 1 DAY

ORDER BY metric_category, metric_name;

Result Example:

Metric CategoryMetric NameMetric Value
Overall OverviewExecution Time (Hours)427.35
Overall OverviewJobs Total34,567
Overall OverviewSuccess Rate (%)97.85

5.2 Resource Usage Assessment

-- Resource usage assessment query
WITH resource_summary AS (
    SELECT 
        workspace_name,
        COUNT(*) as jobs,
        SUM(execution_time) as total_time,
        SUM(CAST(input_bytes AS BIGINT)) as total_bytes
    FROM sys.information_schema.job_history 
    WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
    GROUP BY workspace_name
)
SELECT 
    workspace_name,
    jobs,
    ROUND(total_time / 3600, 2) as total_hours,
    ROUND(total_bytes / 1024 / 1024 / 1024, 2) as total_gb,
    ROUND(jobs * 100.0 / SUM(jobs) OVER(), 2) as job_percentage,
    ROUND(total_time * 100.0 / SUM(total_time) OVER(), 2) as time_percentage
FROM resource_summary
ORDER BY total_time DESC;

Result Example:

Workspace NameJob CountTotal HoursTotal Data (GB)Job Percentage (%)Time Percentage (%)
met***_n_bill868,420430.5424,567.8990.8596.74
sto***_metering35,37213.011,234.563.702.93
cos***_analyse52,0011.55567.235.440.35
qui***_ws180.000.010.000.00
dev_envirment40.000.020.000.00
cli***_sample_data10.000.000.000.00

Analysis Frequency Recommendations

  • Daily Monitoring: Execute the overall overview and failed job analysis
  • Weekly Analysis: Run the full activity and table usage analysis
  • Monthly Assessment: Conduct cache efficiency and resource planning analysis

It is recommended to save frequently used queries as views for easy reuse.

Optimization Action Guide

  • High Execution Time Jobs: Check for SQL optimization opportunities; try using Dynamic Table incremental computation pipelines to reduce computation volume and execution time
  • Low Cache Hit Rate: Adjust the auto-suspend time for analytics compute clusters; avoid shutting down during peak query periods to prevent cache loss
  • High-Frequency Access Tables: Consider partitioning and index optimization
  • Resource Imbalance: Redistribute compute cluster resource specifications within workspaces; for frequently used compute clusters, consider appropriate scaling if you want to reduce job execution time