Singdata Lakehouse MCP Server Use Case: Cost Anomaly Analysis and Troubleshooting

Core Q&A Process Demo

User Question Sequence

👤 User: "Analyze the data in sys.information_schema.instance_usage table, check the costs for September 4th and September 5th"

🤖 MCP Server: [Auto analyze table structure] → [Generate intelligent query] → [Output cost report]
   Found: September 4 cost ¥85.77, September 5 cost ¥66.65, mainly from GP compute cluster

👤 User: "Why are the costs so high on these two days?"

🤖 MCP Server: [Historical data comparison] → [Anomaly pattern recognition] → [Trend analysis]
   Found: Normal daily cost ¥1-6, surged 15x during the anomaly period, GP cluster usage duration abnormal

👤 User: "Check which instance and which workspace's cluster"

🤖 MCP Server: [Multi-dimensional analysis] → [Responsible party identification] → [Job history tracking]
   Found: Instance 270397's quick_start workspace, caused by frequent refresh of the mcp_test_daily_summary dynamic table

Key Insights

  • Issue location time: <= '2025-09-08' GROUP BY date(measurement_start) ORDER BY usage_date
  1. Anomaly Pattern Recognition

Historical Cost Trend Analysis:

Time PeriodAverage Daily CostCost RangeStatus
Aug 25-Sep 2¥1.8¥0.99-¥6.06✅ Normal baseline
Sep 3¥25.13-⚠️ First anomaly
Sep 4-5¥76.21¥66.65-¥85.77🚨 Severe anomaly
After Sep 6¥0.73¥0.53-¥1.05✅ Returned to normal
  1. Usage Pattern Change Analysis

    SELECT 
        date(measurement_start) as usage_date,
        sku_name,
        count(*) as record_count,
        sum(measurements_consumption) as total_consumption_hours,
        avg(measurements_consumption) as avg_consumption_per_record,
        sum(total_after_discount) as total_cost
    FROM sys.information_schema.instance_usage 
    WHERE sku_category = 'compute'
    GROUP BY date(measurement_start), sku_name

GP Cluster Usage Pattern Comparison:

DateAverage Hourly ConsumptionRecord CountTotal Consumption DurationPer-Instance Cost
Sep 1-20.03-0.04 hours14-20 records0.37-0.76 hours¥1-3
Sep 40.57 hours42 records24.05 hours¥3.5
Sep 50.58 hours32 records18.71 hours¥3.5
After Sep 60.02 hours12 records0.20 hours¥0.7

Key Findings:

  • GP cluster changed from short query mode to long-running mode
  • Hourly consumption duration increased 15x (0.04 → 0.57 hours)
  • Nearly full 1-hour billing, indicating the cluster was running continuously

Step 3: Precise Responsible Party Identification

User continues to ask:

Check which instance and which workspace's cluster

MCP Server Multi-Dimensional Analysis:

  1. Instance and Workspace Cost Distribution

    SELECT 
        instance_id,
        workspace_name,
        sku_name,
        sum(total_after_discount) as total_cost
    FROM sys.information_schema.instance_usage 
    WHERE date(measurement_start) IN ('2025-09-04', '2025-09-05')
      AND sku_category = 'compute'
    GROUP BY instance_id, workspace_name, sku_name
    ORDER BY total_cost DESC

Responsible Party Analysis Results:

Instance IDWorkspaceSep 4 CostSep 5 CostAnomaly Severity
270397quick_start¥83.48¥64.77🚨 Primary responsible party
270397lakehouse_quick_experience¥0.71¥0.71✅ Normal
270397ns227206¥0.17¥0.17✅ Normal
  1. Specific Cluster Usage Details

    SELECT 
        date(measurement_start) as usage_date,
        hour(measurement_start) as usage_hour,
        measurements_consumption,
        amount
    FROM sys.information_schema.instance_usage 
    WHERE workspace_name = 'quick_start' 
      AND sku_name = 'GP Compute Cluster'
      AND amount >

GP Cluster Runtime Analysis:

Sep 4 Anomaly Pattern:

  • 00:00-23:59: 24 hours continuous runtime
  • Hourly consumption: ~1.00 hours (nearly full hourly billing)
  • Per-hour cost: Around ¥3.5

Sep 5 Anomaly Pattern:

  • 00:00-18:59: 19 hours continuous runtime
  • Cost pattern: Similarly close to full hourly billing

Key Findings:

  • Issue identified in the GP cluster of quick_start workspace
  • Cluster shows nearly 24-hour continuous runtime
  • Not a normal short query pattern

Step 4: Root Cause Tracking

MCP Server Auto Deep Investigation:

  1. Job History Analysis

    <= '2025-09-06 23:59:59'

Normal Period Job Characteristics:

  • Query execution time: seconds to minutes
  • Cluster auto-suspend: quick release after query completion
  • Cost pattern: billed by actual usage time

Root Cause Deep Analysis

Technical Analysis

1. Dynamic Table Configuration Error

-- Problematic configuration (speculated)
CREATE DYNAMIC TABLE mcp_test_daily_summary
REFRESH_INTERVAL = '5 MINUTES'  -- ❌ Too frequent
AS SELECT ...

2. Cluster Resource Management Failure

  • Normal mode: Query completes → Cluster idle → Auto-suspend (within minutes)
  • Anomaly mode: Query completes → New query after 5 minutes → Cluster stays active → Cannot suspend

3. Billing Calculation Logic

  • Hourly billing: GP cluster billed by usage duration, ¥3.5/hour
  • Minimum billing unit: Even if used for minutes, the active state accumulates billing time
  • Cumulative effect: 5-minute refresh interval → Nearly full 24-hour billing

Business Impact Assessment

1. Direct Cost Loss

  • Anomaly period: 2-day total cost ¥152.42
  • Normal period: 2-day expected cost ¥2-12
  • Direct loss: ¥140-150

2. Potential Risks

  • If not discovered in time: Daily additional cost ¥80+
  • Monthly impact: Could generate ¥2400+ additional costs
  • Resource waste: Cluster resources not effectively utilized

3. Systemic Issues

  • Configuration management: Dynamic table refresh policy lacks reasonableness checks
  • Monitoring alerts: Cost anomalies not discovered in time
  • Resource optimization: Cluster auto-management policy needs optimization

Solutions and Best Practices

Immediate Remediation Measures

1. Dynamic Table Configuration Correction

-- Check current configuration
DESC DYNAMIC TABLE mcp_test_daily_summary;

-- Option 1: Adjust refresh interval (recommended)
ALTER DYNAMIC TABLE mcp_test_daily_summary 
SET REFRESH_INTERVAL = '1 HOUR';

-- Option 2: Adjust based on business requirements
ALTER DYNAMIC TABLE mcp_test_daily_summary 
SET REFRESH_INTERVAL = '4 HOURS';  -- or other reasonable intervals

-- Option 3: Suspend dynamic table (emergency)
ALTER DYNAMIC TABLE mcp_test_daily_summary SUSPEND;

2. Cluster Configuration Optimization

-- Check cluster auto-suspend settings
SHOW VIRTUAL_CLUSTERS;

-- Ensure reasonable auto-suspend time
ALTER VIRTUAL_CLUSTER your_gp_cluster 
SET AUTO_SUSPEND_IN_SECOND = 300;  -- 5-minute auto-suspend

Preventive Measures

1. Monitoring and Alerting

-- Create cost monitoring view
CREATE VIEW daily_cost_monitor AS
SELECT 
    date(measurement_start) as cost_date,
    sum(total_after_discount) as daily_cost,
    CASE 
        WHEN sum(total_after_discount) >

**2. Dynamic Table Best Practices**

* **Refresh interval recommendation**: Set based on data update frequency, minimum no less than 30 minutes
* **Resource consideration**: Evaluate refresh impact on compute resources
* **Business alignment**: Refresh frequency should match actual business requirements

**3. Resource Management Strategy**

* **Cluster separation**: Separate scheduled tasks and interactive queries into different clusters
* **Time window**: Execute resource-intensive tasks during off-peak hours
* **Cost budget**: Set workspace-level cost budget and alerts

## Case Value and Technical Highlights

### 1. MCP Server Intelligent Analysis Capability

**Natural Language Understanding**

* No need for users to master complex SQL syntax
* Intelligently understand analysis intent and convert to precise queries
* Support multi-turn conversations, progressively dig deeper into issues

**Automated Data Analysis**

User intent: "Analyze costs" → Auto table structure identification → Intelligent query generation → Multi-dimensional data correlation → Anomaly pattern recognition → Visualized result presentation


**Context Memory Capability**

* Remember previous analysis results
* Conduct deeper investigation based on existing findings
* Maintain analytical logic continuity

### 2. Systematic Approach to Troubleshooting

**Hierarchical Analysis Framework**

1. **Overview layer**: Overall cost trend identification
2. **Decomposition layer**: Break down cost composition by dimensions
3. **Localization layer**: Precisely identify responsible parties
4. **Root cause layer**: Trace to specific technical causes

**Multi-Dimensional Correlation Analysis**

* Time dimension: Historical comparison, trend analysis
* Spatial dimension: Instance, workspace, cluster
* Business dimension: Job type, execution frequency
* Technical dimension: Resource usage, configuration parameters

### 3. Practical Value

**Rapid Response Capability**

* **Issue discovery to localization**: < 5 minutes
* **Solution output**: Immediately executable SQL commands
* **Effect verification**: Real-time review of fix results

**Economic Value**

* **Direct savings**: ¥80+/day cost savings
* **Avoid losses**: Prevent ongoing losses from long-term configuration errors
* **Efficiency improvement**: Significantly reduce manual investigation time

**Knowledge Accumulation**

* **Reusable process**: Analysis method applicable to other similar issues
* **Best practices**: Form standardized troubleshooting processes
* **Prevention guide**: Provide experience reference for other users

## Result Verification and Effectiveness Evaluation

### Resolution Effectiveness Confirmation

**Cost Recovery Verification**

```sql
-- Create cost monitoring view
CREATE VIEW daily_cost_monitor AS
SELECT 
    date(measurement_start) as cost_date,
    sum(total_after_discount) as daily_cost,
    CASE 
        WHEN sum(total_after_discount) >

**Results**:

* Sep 6: ¥1.05 ✅ Returned to normal
* Sep 7: ¥0.96 ✅ Continuously normal
* Sep 8: ¥0.53 ✅ Further optimized

**Cluster Usage Pattern Recovery**

* Average usage duration per instance: 0.02 hours (returned to normal short query mode)
* Per-instance cost: ¥0.02-0.7 (returned to reasonable range)
* Cluster auto-suspend: Working normally

### Lessons Learned

**Technical Lessons**

1. **Dynamic table design**: Refresh interval should balance business needs and cost considerations
2. **Resource monitoring**: Establish multi-level cost and resource usage monitoring
3. **Configuration management**: Important configuration changes should be evaluated and tested

**Operations Lessons**

1. **Timely discovery**: Establish automated anomaly detection mechanisms
2. **Rapid localization**: Master systematic troubleshooting methods
3. **Prevention first**: Avoid similar issues through best practices

**Business Lessons**

1. **Cost awareness**: Technical decisions should consider cost implications
2. **Requirement assessment**: Balance between functional requirements and resource consumption
3. **Continuous optimization**: Regularly review and optimize resource usage

## Summary

This real-world case perfectly demonstrates the core value of Singdata Lakehouse MCP Server in enterprise-level data platform operations:

### 🚀 Technological Innovation

* **AI-driven natural language analysis**: Making complex data analysis simple and intuitive
* **Intelligent issue diagnosis**: Automatically identify anomaly patterns and root causes
* **Multi-dimensional correlation analysis**: Quickly extract key information from massive data

### 💰 Business Value

* **Immediate cost savings**: Single case save ¥80+/day
* **Significant operations efficiency improvement**: Troubleshooting time reduced from hours to minutes
* **Risk prevention capability**: Avoid long-term losses from configuration errors

### 🛠️ Practicality

* **Zero barrier usage**: No SQL knowledge needed, natural language completes complex analysis
* **Immediately executable solutions**: Not just diagnosing issues, but providing specific solutions
* **Knowledge transfer**: Analysis processes and methods reusable for similar scenarios

Through this case, we see the enormous potential of AI technology in data platform operations. Singdata Lakehouse MCP Server is not just a tool, but an intelligent operations partner, helping users quickly locate issues, optimize costs, and improve efficiency in complex data environments.

***

*This case is based on real user scenarios with anonymized data, demonstrating the practical value of Singdata Lakehouse MCP Server in cost management, troubleshooting, and operations optimization.*



## Reference

[Instance-level Information Schema](instance-information_schema.md)

[Dynamic Table](dynamic-table.md)