Singdata Lakehouse MCP Server User Case

End-to-End Natural Language Conversational Approach from Data Acquisition to Visual Analytics

Can you help me create a volume? Name it NYC Green Taxi Data
The bucket is mcp-data-hangzhou, path is /nyc-green-taxi/
Download each month's data from 2025 from the NYC website into this volume
Please analyze it and generate a dashboard

Case Overview

This case demonstrates how to use Claude Desktop with Singdata Lakehouse MCP Server to complete an end-to-end data science workflow from data acquisition, storage, and analysis to visualization through natural language conversation.

Case Scenario: Analyze New York City Green Taxi 2025 operational data and build a business insights dashboard

Technologies Involved:

  • Claude Desktop (AI assistant interface)
  • Singdata Lakehouse (cloud-native data lakehouse)
  • MCP Server (Model Context Protocol service)
  • NYC TLC public dataset

Solution Highlights

Zero-Code Data Analysis

  • Describe requirements in natural language, and AI automatically executes all complex data operations
  • No need to write SQL, Python, or any other code
  • Automatically handles data download, cleaning, storage, and analysis

Enterprise-Grade Data Infrastructure

  • Cloud-native Lakehouse architecture supporting petabyte-scale data processing
  • Automatic backup and version management
  • High availability and security guarantees

Intelligent Data Insights

  • AI-driven data analysis and pattern discovery
  • Automatically generates business insights and recommendations
  • Interactive visualization dashboard

Implementation Steps

Step 1: Environment Preparation

Prerequisites:
āœ… Claude Desktop client
āœ… Singdata Lakehouse account
āœ… MCP Server configuration

Step 2: Data Acquisition and Storage

User Input:

AI Execution Process:

  1. Create Storage Space

    • Automatically create external Volume: nyc_green_taxi_data
    • Connect OSS storage: oss://mcp-data-hangzhou/nyc-green-taxi/
    • Configure auto-refresh and recursive scanning
  2. Batch Data Download

    • Intelligently identify NYC TLC official data source
    • Download Jan-Jul 2025 Parquet files
    • Automatically handle file format and compression
  3. Data Ingestion

    • Create unified data table: nyc_green_taxi_2025
    • Standardize data schema and field types
    • Add data month identifier field

Results Summary:

  • Total records: 351,612 trips
  • Time span: Jan-Jul 2025
  • Data size: approx. 8.5 MB (Parquet compressed)
  • Processing time: <5 minutes

Step 3: Intelligent Data Analysis

User Input:

AI Analysis Dimensions:

  1. Time Trend Analysis

    • Monthly trip volume and revenue changes
    • 24-hour usage patterns
    • Weekday vs. weekend comparison
  2. Spatial Distribution Analysis

    • Identify popular pickup locations
    • Discover high-value areas
    • Trip distance statistics
  3. Business Metrics Analysis

    • Fare distribution and structure
    • Payment method preferences
    • Congestion fee impact assessment
  4. User Behavior Analysis

    • Passenger count patterns
    • Trip duration distribution
    • Tipping habits !

Key Findings

Revenue Trends

  • Peak month: May revenue of $1.33 million, up 31.6% month-over-month
  • Average fare: $24.45, showing an upward trend
  • Total revenue: $8.6 million accumulated over 7 months

Time Patterns

  • Peak hours: 4-6 PM, over 25,000 trips per hour
  • Nighttime premium: Average fare of $30+ from 2-4 AM
  • Weekend effect: Weekend fares are 3.9% higher than weekdays

Geographic Distribution

  • Popular areas: Locations 74 and 75 contribute 38.7% of total trip volume
  • High-value points: Location 244 averages $35.47 per fare
  • Coverage: Primarily serves outer boroughs and Upper Manhattan

Payment Trends

  • Digital adoption rate: 74% use credit card payments
  • Cash share: 26%, showing a declining trend
  • Congestion fee: 9.6% of trips include it, averaging $0.75

Visualization Dashboard

Interface Features

  • Responsive design: Adapts to desktop, tablet, and mobile
  • Interactive charts: Supports zoom, filter, and drill-down
  • Real-time data: Configurable auto-refresh
  • Modern UI: Gradient backgrounds, animated effects

Chart Types

  • šŸ“ˆ Trend charts: Dual-axis display of monthly revenue and trip volume
  • šŸ“Š Bar charts: 24-hour usage patterns, popular location rankings
  • šŸ© Donut charts: Fare distribution, payment method breakdown
  • šŸ“‹ Metric cards: Key KPIs at a glance

Business Value

šŸš€ Efficiency Gains

  • Development time: Reduced from weeks to minutes
  • Technical barrier: Zero code, business users can directly use it
  • Iteration speed: Rapid response to changing requirements

Insight Quality

  • Multi-dimensional analysis: Full coverage of time, space, and user behavior
  • Intelligent discovery: AI identifies patterns easily overlooked by humans
  • Decision support: Direct business recommendations and predictions

Cost Optimization

  • Infrastructure: Cloud-native architecture, pay-as-you-go
  • Human resources: Reduced dependency on specialized data teams
  • Maintenance cost: Automated operations, lower management overhead

Applicable Scenarios

šŸ“Š Business Analytics

  • Sales data analysis and forecasting
  • Customer behavior insights
  • Market trend research
  • Competitive analysis

Operations Optimization

  • Supply chain efficiency analysis
  • Equipment maintenance prediction
  • Quality control monitoring
  • Cost structure optimization

Product Analytics

  • User retention analysis
  • Feature usage statistics
  • A/B test evaluation
  • Product roadmap planning

Marketing Analytics

  • Advertising effectiveness evaluation
  • Channel ROI analysis
  • User profiling
  • Precision marketing strategies

Core Tools Deep Dive

1. Claude Desktop

Role: Intelligent conversational interface and AI assistant

Core Capabilities:

  • šŸ¤– Natural language understanding: Converts user requirements into specific operational commands
  • šŸ”„ Multi-modal interaction: Supports text, charts, code, and other output formats
  • 🧠 Intelligent reasoning: Performs logical reasoning and decision-making based on context
  • šŸ“‹ Task orchestration: Automatically plans and executes complex data workflows

Application in this case:

  • Understands natural language commands such as "create a volume" and "download data"
  • Automatically plans the complete workflow from data acquisition to visualization
  • Intelligently analyzes data patterns and generates business insights
  • Creates interactive HTML dashboards

2. MCP Server (Model-Context Protocol)

Role: Bridge between AI and external systems

Core Capabilities:

  • šŸ”Œ Protocol translation: Converts AI commands into database operations
  • šŸ›”ļø Security proxy: Provides secure data access control
  • ⚔ Efficient communication: Optimizes interaction performance between AI and data systems
  • šŸ”§ Tool integration: Supports 70+ specialized data operation tools

Key MCP Tools:

šŸ“Š Data Management Tools

āœ“ create_volume - Create data storage volumes
āœ“ import_data_src - Import data from URLs
āœ“ preview_volume_data - Data preview and validation
āœ“ list_files_on_volume - File management

šŸ—ļø Table Structure and Volume Tools

āœ“ create_table - Create data tables
āœ“ desc_object - View object details
āœ“ show_object_list - List database objects
āœ“ alter_table - Modify table structure
āœ“ create_volume - Create data lake storage volumes

šŸ” Query and Analysis Tools

āœ“ read_query - Execute SELECT queries
āœ“ write_query - Execute write operations
āœ“ vector_search - Vector similarity search
āœ“ match_all - Full-text search

🌐 Data Acquisition Tools

āœ“ smart_crawl_to_volume - Intelligent web data crawling
āœ“ crawl_single_page - Single page content extraction
āœ“ import_data_from_db - Database data import

šŸ”— Connection Management Tools

āœ“ create_storage_connection - Storage connection configuration
āœ“ create_api_connection - API service connection
āœ“ switch_context - Context environment switching

3. Singdata Lakehouse

Role: Cloud-native data lakehouse integrated platform

Core Architecture:

  • šŸ—ļø Storage-compute separation: Independent scaling, cost optimization
  • šŸ“Š Multi-mode support: Batch processing, stream processing, interactive queries
  • ⚔ Query acceleration: Intelligent caching and pre-computation

Data Storage Layer:

  • Volume management: Unified data storage abstraction
  • Multi-cloud support: OSS, S3, COS, and other object storage
  • Format compatibility: Parquet, ORC, JSON, CSV, and more
  • Version control: Data lineage and time travel

Compute Engine:

  • SQL engine: Standard SQL, compatible with multiple dialects
  • Vector computing: Supports AI/ML workloads
  • Stream processing: Real-time data pipelines
  • Federated queries: Unified queries across data sources

Application in this case:

  • Created nyc_green_taxi_data volume for storing raw data
  • Established nyc_green_taxi_2025 table for structured storage
  • Executed complex multi-dimensional analysis queries
  • Supported sub-second query response for 350,000+ records

4. External Data Sources

NYC TLC (Taxi & Limousine Commission)

Data Characteristics:

  • šŸ“… Regular updates: Monthly releases, 2-month latency
  • šŸ“Š Data quality: Officially authoritative, standardized format
  • šŸ’¾ Storage format: Parquet compressed format
  • 🌐 Access method: CDN accelerated download

Data Fields (21 core fields):

• vendorid - Service provider ID
• lpep_pickup_datetime - Pickup time
• lpep_dropoff_datetime - Dropoff time
• pulocationid/dolocationid - Pickup/Dropoff location
• trip_distance - Trip distance
• fare_amount - Base fare
• total_amount - Total fare
• payment_type - Payment method
• cbd_congestion_fee - Congestion fee (new in 2025)

5. Visualization Component (Claude Desktop Artifacts)

Chart.js + Custom HTML

Chart Types:

  • šŸ“ˆ Dual-axis line charts: Monthly trends (trip volume + revenue)
  • šŸ“Š Bar charts: 24-hour patterns, popular locations
  • šŸ© Donut charts: Fare distribution, payment methods
  • šŸ“‹ Metric cards: Key KPI display

Interactive Features:

  • ✨ Hover effects: Display detailed values
  • šŸ”„ Responsive: Adapts to different screen sizes
  • šŸŽØ Animations: Smooth transition effects
  • šŸ“± Mobile-friendly: Touch optimized

Technical Architecture

Data Flow Architecture

Tool Call Examples

Actual tool call sequence in this case:

1. get_current_context() - Retrieve current environment information
2. create_volume() - Create nyc_green_taxi_data volume
3. import_data_src() Ɨ 7 - Batch download 7 months of data files
4. create_table() - Create nyc_green_taxi_2025 unified table
5. write_query() Ɨ 7 - Insert each month's data into the main table
6. read_query() Ɨ 8 - Execute multi-dimensional analysis queries
7. add_data_insight() - Save analysis insights
8. artifacts.create() - Generate visualization dashboard

Performance Optimization Features

  • Query cache: Millisecond-level response for repeated queries
  • Intelligent indexing: Automatic index creation for frequently queried fields
  • Partitioning strategy: Monthly partitioning to improve query efficiency
  • Compressed storage: Parquet format saves 90% storage space
  • Parallel processing: Concurrent multi-file download and processing

Quick Start

1. Apply for Trial

  • Visit the Singdata official website to apply for an account
  • Download and install the Claude Desktop client
  • Configure the MCP Server connection

2. Prepare Data

  • Define analysis goals and data sources
  • Prepare data access permissions
  • Plan storage structure

3. Start the Conversation

Example dialogue:
User: "I want to analyze our company's sales data"
AI: "Sure, let me help you with that analysis..."

4. Obtain Insights

  • Review automatically generated analysis reports
  • Engage in conversational deep dives with AI
  • Export results and visualizations

Summary

Singdata Lakehouse MCP Server, combined with Claude Desktop, provides enterprises with an unprecedented data analysis experience:

Core Advantages

  • Natural interaction: Complete complex data analysis through conversation
  • Zero technical barrier: Business users directly obtain insights
  • Enterprise-grade capability: Supports large-scale, high-concurrency scenarios
  • Rapid iteration: From idea to insight in minutes

Innovative Breakthroughs

  • Redefining human-machine collaboration models
  • Dramatically lowering the barrier to data analysis
  • Accelerating enterprise digital transformation
  • Unlocking the true value of data

Future Outlook

As AI technology continues to advance, this "conversational data analysis" approach will become an enterprise standard, enabling every business user to become a data analyst and letting data truly drive every business decision.


Start your intelligent data analysis journey today!