Datus and Singdata Lakehouse Integration Overview

What is Datus

Datus is an open-source data engineering agent designed to build evolvable contextual environments for data systems. Datus represents a paradigm shift in data engineering: from the traditional approach of "building tables and data pipelines" to "providing domain-aware intelligent agents for analysts and business users."

Core Components

Datus-CLI: An AI-driven command-line interface for data engineers; think of it as "Claude Code for data engineers." Key features include:

  • Interactive SQL Writing: Generate and optimize SQL queries through natural language
  • Subagent Building: Create domain-specific intelligent agents (subagents)
  • Context Building: Interactively build and evolve contextual knowledge for data systems
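Interactive SQL writing and context building both depend on giving the model the right schema context alongside the user's question. The sketch below shows one generic way to assemble such a text-to-SQL prompt. The `TableSchema` class, the `build_sql_prompt` function, and the prompt wording are hypothetical illustrations, not Datus's actual internals or prompt template.

```python
from dataclasses import dataclass


@dataclass
class TableSchema:
    """Hypothetical container for one table's metadata (not a Datus class)."""
    name: str
    columns: dict[str, str]  # column name -> SQL type
    description: str = ""


def build_sql_prompt(question: str, tables: list[TableSchema]) -> str:
    """Assemble a text-to-SQL prompt: instructions, schema context, then the question."""
    lines = ["Write one SQL query that answers the question. Use only these tables:"]
    for table in tables:
        cols = ", ".join(f"{col} {sql_type}" for col, sql_type in table.columns.items())
        note = f"  -- {table.description}" if table.description else ""
        lines.append(f"TABLE {table.name}({cols}){note}")
    lines.append(f"Question: {question}")
    lines.append("SQL:")
    return "\n".join(lines)
```

Keeping the schema block compact matters because every table listed consumes model context; deciding which tables to include is the job of the context-management layer.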

Datus-Chat: A web chatbot for data analysts that provides:

  • Multi-turn Conversations: Continuous data exploration and analysis dialogue
  • Feedback Mechanisms: Built-in feedback options such as likes, issue reports, and success cases
  • User-friendly: An interface designed for non-technical users

Datus-API: A stable, accurate data service API for other agents or applications.

Technical Features

  • Multi-AI Model Support: Integrates Qwen, DeepSeek, OpenAI, Claude, and other AI models
  • Extensible Architecture: Supports MCP (Model Context Protocol) tool integration
  • Multi-data Source Connectivity: Connects to various database and data warehouse platforms
  • Chinese Language Optimization: Optimized for Chinese-language contexts and usage habits
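Supporting several models means something has to decide which one handles a given request. Here is a minimal sketch, assuming a static preference list per task type with fallback to whichever providers are configured. The task types, the provider-to-task assignments, and `pick_model` are all hypothetical and do not describe how Datus actually selects models.

```python
from enum import Enum


class TaskType(Enum):
    SQL_GENERATION = "sql_generation"
    RESULT_EXPLANATION = "result_explanation"
    CHINESE_QA = "chinese_qa"


# Hypothetical preference order per task type; the assignments are illustrative only.
ROUTES: dict[TaskType, list[str]] = {
    TaskType.SQL_GENERATION: ["deepseek", "claude", "openai"],
    TaskType.RESULT_EXPLANATION: ["claude", "openai", "qwen"],
    TaskType.CHINESE_QA: ["qwen", "deepseek"],
}


def pick_model(task: TaskType, configured: set[str]) -> str:
    """Return the first preferred provider for `task` that has been configured."""
    for provider in ROUTES[task]:
        if provider in configured:
            return provider
    raise LookupError(f"no configured model for task {task.value!r}")
```

A fallback list keeps the agent usable when one provider's API key is missing or its service is unavailable.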

Integration Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      User Interface Layer                       │
├──────────────────────────────┬──────────────────────────────────┤
│         Datus-CLI            │         Datus-Chat               │
│      (Command Line)          │       (Web Interface)            │
│  ┌─────────────────────────┐ │  ┌─────────────────────────────┐ │
│  │ • Natural Lang Query    │ │  │ • Multi-turn Conversations  │ │
│  │ • SQL Generation        │ │  │ • Subagent Selection        │ │
│  │ • MCP Tool Invocation   │ │  │ • Feedback Mechanisms       │ │
│  └─────────────────────────┘ │  └─────────────────────────────┘ │
└──────────────────────────────┴──────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Datus Agent Core                            │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────────┐ │
│ │ AI Models   │ │ Subagents   │ │     Context Management      │ │
│ │             │ │             │ │                             │ │
│ │ • Qwen      │ │ • lakehouse │ │ • Database Schema           │ │
│ │ • DeepSeek  │ │ • mcp_agent │ │ • Query History             │ │
│ │ • OpenAI    │ │             │ │ • Embedding Vectors         │ │
│ │ • Claude    │ │             │ │ • Knowledge Base            │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                                │
               ┌────────────────┴────────────────┐
               ▼                                 ▼
    (Datus-Singdata)                 (MCP Protocol)
┌─────────────────────────┐      ┌─────────────────────────┐
│      Data Layer         │      │    Tool Extension       │
├─────────────────────────┤      ├─────────────────────────┤
│  Singdata Lakehouse     │◄─────┤ Singdata MCP Server     │
│                         │      │                         │
│ ┌─────────────────────┐ │      │ ┌─────────────────────┐ │
│ │ • Data Storage      │ │      │ │ • Instance Mgmt     │ │
│ │ • Compute Engine    │ │      │ │ • Job Monitoring    │ │
│ │ • SQL Execution     │ │      │ │ • System Ops        │ │
│ │ • Metadata Mgmt     │ │      │ │ • Analytics         │ │
│ └─────────────────────┘ │      │ └─────────────────────┘ │
│                         │      │                         │
│ Connection:             │      │ Connection:             │
│ • Service Endpoint      │      │ • HTTP Transport        │
│ • Username/Password     │      │ • SSE Transport         │
│ • Instance/Workspace    │      │ • Tool Filtering        │
└─────────────────────────┘      └─────────────────────────┘
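The two "Connection" boxes in the diagram list the parameters each path needs. Below is a sketch of how those parameters might be modeled and validated before connecting. The class and field names are hypothetical; they are not the real configuration keys of Datus or the Datus-Singdata connector, which should be taken from the official documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LakehouseConnection:
    """Data Layer connection parameters from the diagram (hypothetical field names)."""
    service_endpoint: str
    username: str
    password: str
    instance: str
    workspace: str

    def validate(self) -> None:
        # Report every empty field at once rather than failing on the first.
        missing = [name for name, value in vars(self).items() if not value]
        if missing:
            raise ValueError(f"missing connection fields: {', '.join(missing)}")


@dataclass(frozen=True)
class McpServerConfig:
    """Tool Extension connection parameters from the diagram (hypothetical field names)."""
    url: str
    transport: str = "http"  # the diagram lists HTTP and SSE transports

    def validate(self) -> None:
        if self.transport not in ("http", "sse"):
            raise ValueError(f"unsupported transport: {self.transport!r}")
        if not self.url.startswith(("http://", "https://")):
            raise ValueError(f"MCP server URL must use http(s): {self.url!r}")
```

In practice the password would come from an environment variable or a secret store rather than from source code.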

Architecture Description

User Interface Layer:

  • Datus-CLI: Provides a command-line interface for data engineers
  • Datus-Chat: Provides a web interface for data analysts and business users

Datus Agent Core:

  • AI Model Layer: Supports multiple large language models, allowing selection of the most suitable model based on task type
  • Subagent Management: Dedicated subagents (such as lakehouse and mcp_agent) handle different business scenarios
  • Context Management: Maintains the data system's knowledge base and query context, including database schemas, query history, and embedding vectors
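Context management includes embedding vectors. A common use of embeddings is to pick the few tables most relevant to a question before SQL is generated. This is a minimal sketch of that retrieval step using cosine similarity over toy three-dimensional vectors; real embeddings would come from an embedding model and have hundreds of dimensions, and nothing here is taken from Datus's implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_tables(question_vec: list[float], table_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Rank tables by similarity to the question embedding and keep the best k."""
    ranked = sorted(table_vecs, key=lambda name: cosine(question_vec, table_vecs[name]), reverse=True)
    return ranked[:k]
```

With `{"orders": [0.9, 0.1, 0.1], "customers": [0.1, 1.0, 0.0], "payments": [0.7, 0.0, 0.6]}` and a question vector of `[1.0, 0.0, 0.2]`, the two closest tables are `orders` and `payments`. Once there are many tables, a vector index would replace this linear scan.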

Data Layer:

  • Singdata Lakehouse: Provides data storage, computing, and SQL execution capabilities

Tool Extension Layer:

  • Singdata Lakehouse MCP Server: The official MCP server for Singdata Lakehouse. It extends the system through the standardized Model Context Protocol and provides tools for instance management, job monitoring, system operations, and analytics
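The diagram lists Tool Filtering among the MCP connection options: a client can expose only a subset of a server's tools to the agent, for example to hide operations it should not run. This is a sketch of glob-based include/exclude filtering with Python's `fnmatch` module. The tool names are hypothetical (not the Singdata Lakehouse MCP Server's actual tool list), and the include-then-exclude semantics are an assumption about how such a filter could work.

```python
from fnmatch import fnmatchcase
from typing import Optional


def filter_tools(tools: list[str], include: Optional[list[str]] = None,
                 exclude: Optional[list[str]] = None) -> list[str]:
    """Keep tools matching any include pattern (all tools if none given), then drop excludes."""
    kept = []
    for name in tools:
        if include and not any(fnmatchcase(name, pattern) for pattern in include):
            continue
        if exclude and any(fnmatchcase(name, pattern) for pattern in exclude):
            continue
        kept.append(name)
    return kept
```

`fnmatchcase` is used instead of `fnmatch` so matching is case-sensitive on every operating system.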

Connection Relationship Description

  1. Datus <-> Singdata Lakehouse: Connected through the Datus-Singdata connector, which supports SQL query execution and metadata retrieval.
  2. Datus <-> Singdata Lakehouse MCP Server: Connected via MCP, through which Datus invokes the server's management and analysis tools.
  3. Singdata Lakehouse MCP Server <-> Singdata Lakehouse: The MCP Server serves as an extension service for Singdata Lakehouse, able to access and manage the underlying data platform.
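The Datus-to-MCP-Server connection relies on MCP, which frames each tool invocation as a JSON-RPC 2.0 `tools/call` request carrying the tool `name` and its `arguments`; the result contains a list of `content` items and an optional `isError` flag. The sketch below builds such a request and unpacks a text result using only the standard library. The tool name `list_sql_jobs` is hypothetical, and transport details (HTTP or SSE), session initialization, and `tools/list` discovery are left out.

```python
import json
from itertools import count

_request_ids = count(1)  # request ids must not be reused within a session


def make_tools_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request, ensure_ascii=False)


def extract_text(response_json: str) -> str:
    """Join the text items of a tools/call result; raise on protocol or tool errors."""
    response = json.loads(response_json)
    if "error" in response:  # JSON-RPC level failure (unknown tool, bad params, ...)
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err['code']}: {err['message']}")
    result = response["result"]
    texts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("tool error: " + " ".join(texts))
    return "\n".join(texts)
```

Distinguishing the two error paths matters for an agent: a JSON-RPC error usually means the call itself was malformed, while `isError` means the tool ran and its failure message can be shown to the model so it can adjust.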

Integration Value

Datus + Singdata Lakehouse

Singdata Lakehouse is a modern data lakehouse platform with strong data processing and storage capabilities. Integrating it with Datus offers the following benefits:

  1. Lower the Barrier to Entry: Business users can directly query and analyze massive datasets without learning SQL
  2. Improve Analysis Efficiency: Natural language queries significantly reduce the time cost of data exploration
  3. Intelligent Insights: AI-driven query optimization and result interpretation help users better understand data
  4. Chinese-friendly: Optimized for Chinese-language contexts and better suited to the habits of Chinese-speaking users

Datus + Singdata Lakehouse MCP Server

Integrating the official Singdata Lakehouse MCP Server extends the system's capabilities further:

  1. Instance Management: Intelligently switch between different Singdata Lakehouse instances and environments
  2. Job Monitoring: Query and analyze SQL job execution history and performance metrics
  3. System Operations: Query system status and manage configuration through natural language
  4. Advanced Analytics: Use specialized analysis tools for in-depth data analysis
  5. Workflow Automation: Wrap complex data processing workflows in simple natural language instructions
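Job monitoring comes down to turning raw execution history into a few numbers an analyst can act on. Assuming job records arrive with an id, a status, and a duration (the `JobRecord` shape and the `"SUCCEED"` status value are assumptions, not the MCP server's documented output), a summary could be computed like this:

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class JobRecord:
    """Hypothetical shape of one SQL job history entry."""
    job_id: str
    status: str       # assumed: "SUCCEED" for success; anything else counts as failed
    duration_ms: int


def percentile(values: list[int], pct: float) -> int:
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_jobs(jobs: list[JobRecord]) -> dict:
    """Summarize job count, failure rate, latency, and the slowest and failed jobs."""
    if not jobs:
        raise ValueError("no jobs to summarize")
    durations = [job.duration_ms for job in jobs]
    failed = [job.job_id for job in jobs if job.status != "SUCCEED"]
    return {
        "count": len(jobs),
        "failure_rate": len(failed) / len(jobs),
        "avg_ms": mean(durations),
        "p95_ms": percentile(durations, 95),
        "slowest": max(jobs, key=lambda job: job.duration_ms).job_id,
        "failed": failed,
    }
```

Nearest-rank percentiles always return an observed duration, which keeps the summary easy to trace back to a specific job.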

Use Cases

  • Data Analysts: Quickly explore and analyze business data, generate reports and insights
  • Business Users: Users without technical backgrounds can easily query the data they need
  • Data Engineers: Perform system management and job monitoring through MCP tools
  • Decision Makers: Quickly access key business metrics and trend analysis

Last updated: November 2025