Installation and Usage Guide
Overview
This guide walks you through setting up Datus Agent from scratch and connecting it to Singdata Lakehouse so that you can run natural language queries and intelligent data analysis. After completing the steps, you will be able to:
- Establish a connection between Datus and Singdata Lakehouse
- Configure support for multiple AI models
- Enable MCP tool integration (optional)
- Start querying and analyzing data using natural language
Requirements
- Python Version: 3.12 or higher
- Datus: 0.2.23 or higher
- Operating System: macOS, Linux, or Windows
- Singdata Lakehouse Access: Including service endpoint, user credentials, etc.
- Network Requirements: Ability to access Singdata Lakehouse API endpoints
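The version requirements above can be checked with a short script. The following is a minimal sketch and not part of Datus: parse_version, meets_minimum, and check_requirements are hypothetical helper names, the check assumes plain dotted numeric versions such as 0.2.23 (pre-release suffixes are reported rather than compared), and it assumes the distribution is named datus-agent, as used with pip in Step 3.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn a dotted numeric version such as '0.2.23' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum.

    Tuples are compared numerically, so 0.2.9 is correctly older than 0.2.23
    (a plain string comparison would get this wrong).
    """
    return parse_version(installed) >= parse_version(minimum)


def check_requirements():
    """Print whether Python and datus-agent satisfy the stated minimums."""
    print(f"Python >= 3.12: {sys.version_info >= (3, 12)}")
    try:
        installed = metadata.version("datus-agent")  # assumed distribution name
    except metadata.PackageNotFoundError:
        print("datus-agent is not installed yet")
        return
    try:
        print(f"datus-agent {installed} >= 0.2.23: {meets_minimum(installed, '0.2.23')}")
    except ValueError:
        # Versions with non-numeric parts (e.g. dev builds) are outside this sketch.
        print(f"datus-agent {installed}: version format not handled, check manually")


if __name__ == "__main__":
    check_requirements()
```

Run it inside the virtual environment created in Step 2 so that the reported versions are the ones Datus will actually use.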
Step 1: Create Project Directory
# Create project directory
mkdir my-lakehouse-datus
cd my-lakehouse-datus
Step 2: Create Python Virtual Environment
Choose one of the following three methods to create a virtual environment:
Method 1: Using conda (Recommended)
conda create -n lakehouse-env python=3.12
conda activate lakehouse-env
Method 2: Using venv
python3.12 -m venv lakehouse-env
source lakehouse-env/bin/activate # Linux/macOS
# or
lakehouse-env\Scripts\activate # Windows
Method 3: Using uv
uv venv --python 3.12 lakehouse-env
source lakehouse-env/bin/activate # Linux/macOS
Step 3: Install Datus Agent Package
# Install Datus Agent
pip install datus-agent
# Datus plugin for Singdata Lakehouse
pip install datus-clickzetta
If you need the latest development version of Datus Agent:
pip install git+https://github.com/Datus-ai/Datus-agent.git
Step 4: Configure Environment Variables
Create a .env file to store sensitive information:
# Create environment variable configuration file
touch .env
Add the following configuration to the .env file (modify according to your actual situation):
# Singdata Lakehouse Connection Configuration
CLICKZETTA_SERVICE=cn-shanghai-alicloud.api.singdata.com
CLICKZETTA_USERNAME=your_username
CLICKZETTA_PASSWORD=your_password
CLICKZETTA_INSTANCE=your_instance_id
CLICKZETTA_WORKSPACE=quick_start
CLICKZETTA_SCHEMA=mcp_demo
CLICKZETTA_VCLUSTER=default_ap
# AI Model Configuration (choose one)
# Alibaba Cloud Tongyi Qianwen (Recommended)
DASHSCOPE_API_KEY=your_dashscope_api_key
# Or DeepSeek
DEEPSEEK_API_KEY=your_deepseek_api_key
# Or OpenAI
OPENAI_API_KEY=your_openai_api_key
# Or Claude
ANTHROPIC_API_KEY=your_claude_api_key
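Before moving on, it can help to confirm that the .env file is complete. The following is a minimal sketch under stated assumptions: parse_env, missing_keys, and placeholder_keys are hypothetical helpers (not part of Datus or python-dotenv), the parser only understands plain KEY=VALUE lines and # comment lines, and a value starting with your_ is treated as an unedited template value because that is the pattern used in the template above.

```python
from pathlib import Path

# Connection settings listed in the .env template of this guide.
REQUIRED_KEYS = [
    "CLICKZETTA_SERVICE",
    "CLICKZETTA_USERNAME",
    "CLICKZETTA_PASSWORD",
    "CLICKZETTA_INSTANCE",
    "CLICKZETTA_WORKSPACE",
    "CLICKZETTA_SCHEMA",
    "CLICKZETTA_VCLUSTER",
]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comment lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in template order."""
    return [key for key in required if not values.get(key)]


def placeholder_keys(values):
    """Return keys whose value still looks like a template value (your_...)."""
    return [key for key, value in values.items() if value.startswith("your_")]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        env_values = parse_env(env_path.read_text(encoding="utf-8"))
        print("Missing:", missing_keys(env_values))
        print("Still placeholders:", placeholder_keys(env_values))
    else:
        print(".env not found in the current directory")
```

Only the API key for the model provider you actually use needs a real value, so unused provider keys showing up as placeholders are expected.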
Step 5: Create the Configuration File
Create the configuration directory and agent.yml configuration file:
mkdir -p conf
touch conf/agent.yml
Copy the following content into the conf/agent.yml file:
agent:
  target: qwen_main # Use Tongyi Qianwen as the primary model
  home: .datus
  # Model configuration
  models:
    qwen_main:
      type: qwen
      vendor: aliyun
      base_url: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: ${DASHSCOPE_API_KEY}
      model: qwen-plus
      enable_thinking: false
    qwen_reasoning:
      type: qwen
      vendor: aliyun
      base_url: https://dashscope.aliyuncs.com/compatible-mode/v1
      api_key: ${DASHSCOPE_API_KEY}
      model: qwen3-max
      enable_thinking: true
    # Alternative model configuration
    deepseek_chat:
      type: deepseek
      vendor: deepseek
      base_url: https://api.deepseek.com
      api_key: ${DEEPSEEK_API_KEY}
      model: deepseek-chat
  # Intelligent node configuration
  agentic_nodes:
    lakehouse_assistant:
      node_type: gensql
      model: qwen_main
      system_prompt: gen_sql
      prompt_version: '1.0'
      prompt_language: zh # Supports Chinese
      max_turns: 15
      tools: db_tools.*, context_search_tools.*
      agent_description: Singdata Lakehouse intelligent assistant, supports natural language queries and data analysis
      rules:
        - Prioritize responding to users in Chinese
        - Explain SQL query logic in detail
        - Provide executable SQL statements
        - Focus on data objects within the Singdata Lakehouse environment
  # Database connection configuration
  namespace:
    lakehouse:
      type: clickzetta
      service: ${CLICKZETTA_SERVICE}
      username: ${CLICKZETTA_USERNAME}
      password: ${CLICKZETTA_PASSWORD}
      instance: ${CLICKZETTA_INSTANCE}
      workspace: ${CLICKZETTA_WORKSPACE}
      schema: ${CLICKZETTA_SCHEMA}
      vcluster: ${CLICKZETTA_VCLUSTER}
      secure: false
  # Storage configuration
  storage:
    embedding_device_type: cpu
    document:
      registry_name: sentence-transformers
      model_name: all-MiniLM-L6-v2 # Lightweight embedding model
      dim_size: 384
      batch_size: 64
  # Workflow configuration
  workflow:
    plan: reflection
    chat_default_node: lakehouse_assistant
  # Schema linking rate (affects query performance)
  schema_linking_rate: medium
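The configuration above refers to secrets through ${NAME} placeholders instead of embedding them. To see which placeholders would have no value at startup, a check along these lines can be used. This is a minimal sketch, not how Datus resolves the file internally: unresolved_placeholders is a hypothetical helper, and it only compares the placeholder names found in the text against a set of environment variables.

```python
import os
import re
from pathlib import Path

# Matches ${NAME} where NAME is a typical environment variable identifier.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def unresolved_placeholders(config_text, env=None):
    """Return the sorted names of ${NAME} placeholders with no non-empty value.

    env defaults to the current process environment; pass a dict to check
    against values loaded from elsewhere (for example a parsed .env file).
    """
    env = os.environ if env is None else env
    names = set(PLACEHOLDER.findall(config_text))
    return sorted(name for name in names if not env.get(name))


if __name__ == "__main__":
    config_path = Path("conf/agent.yml")
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        print("Unresolved placeholders:", unresolved_placeholders(text))
    else:
        print("conf/agent.yml not found")
```

The process environment does not include values from .env unless they have been exported or loaded first, for example with load_dotenv from python-dotenv as in the connection test in Step 6. The sample configuration also references DEEPSEEK_API_KEY for the alternative model, so that name is reported when only the DashScope key is set; this guide does not state whether Datus tolerates an unset placeholder for an unused model.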
Step 6: Test Connection
Before starting the full system, test the database connection:
python -c "
from datus.tools.db_tools.db_manager import DBManager
from datus.configuration.agent_config import DbConfig
import os
from dotenv import load_dotenv
# Load environment variables from .env
load_dotenv()
# Create database configuration
db_config = DbConfig(
    type='clickzetta',
    service=os.getenv('CLICKZETTA_SERVICE'),
    username=os.getenv('CLICKZETTA_USERNAME'),
    password=os.getenv('CLICKZETTA_PASSWORD'),
    instance=os.getenv('CLICKZETTA_INSTANCE'),
    workspace=os.getenv('CLICKZETTA_WORKSPACE'),
    schema=os.getenv('CLICKZETTA_SCHEMA'),
    vcluster=os.getenv('CLICKZETTA_VCLUSTER')
)
# Test connection
namespaces = {'lakehouse': {'lakehouse': db_config}}
db_manager = DBManager(namespaces)
try:
    connector = db_manager.get_conn('lakehouse', 'lakehouse')
    result = connector.test_connection()
    print('Singdata Lakehouse connection test successful!')
    print(f'Connection result: {result}')
except Exception as e:
    print(f'Connection test failed: {e}')
"
Step 7: Start Datus
Method 1: Command Line Mode
# Start interactive CLI
datus-cli --namespace lakehouse --config conf/agent.yml
Method 2: Web Mode (Recommended, supports subagent selection)
# Start Web interface, supports selecting different subagents
datus-cli --namespace lakehouse --config conf/agent.yml --web --host 0.0.0.0
# Or local access only
datus-cli --namespace lakehouse --config conf/agent.yml --web --host 127.0.0.1
After Web Mode Starts:
Interface After Successful Startup:
CLI Mode:
Initializing AI capabilities in background...
Datus - AI-powered SQL command-line interface
Type '.help' for a list of commands or '.exit' to quit.
Namespace lakehouse selected
Connected to lakehouse using database quick_start
Context: Current: database: quick_start
Type SQL statements or use ! @ . commands to interact.
Datus>
Web Mode:
- The terminal displays server startup information and access address
- Open the corresponding address in a browser to see the Web interface
- The left side of the Web interface displays a list of selectable subagents
- Click to select a subagent and start a conversation
Step 8: Start Using (Command Line Mode)

View Available Tables
Query Using Natural Language
Datus> / Show statistics for all user tables
Execute SQL Queries
Datus> SELECT * FROM your_table LIMIT 10;
Get Help
Datus> .help
Web Mode

The Web mode startup page is shown above. If you added a SubAgent in command line mode, it will be displayed on the home page.
Typing a message directly runs the conversation in Agent mode, which does not call MCP Tools. Selecting a specific SubAgent first switches the conversation to SubAgent mode, which does call MCP Tools.

Multi-Model Configuration
Use different models for different tasks:
agentic_nodes:
  quick_query:
    model: qwen_main # Use basic model for quick queries
    # ... Other configuration
  complex_analysis:
    model: qwen_reasoning # Use reasoning model for complex analysis
    enable_thinking: true
    # ... Other configuration
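When several nodes point at different models, a misspelled model name is easy to miss. The following is a minimal sketch of a consistency check, assuming the models and agentic_nodes sections have already been loaded into plain dictionaries (for example with a YAML parser); undefined_model_refs is a hypothetical helper and not a Datus API.

```python
def undefined_model_refs(models, agentic_nodes):
    """Return {node_name: model_name} for nodes whose model is not defined.

    models: mapping of model name -> settings (the 'models' section)
    agentic_nodes: mapping of node name -> settings (the 'agentic_nodes' section)
    """
    return {
        node: settings.get("model")
        for node, settings in agentic_nodes.items()
        if settings.get("model") not in models
    }


if __name__ == "__main__":
    # Dictionaries mirroring the names used in this guide.
    models = {"qwen_main": {"model": "qwen-plus"}, "qwen_reasoning": {"model": "qwen3-max"}}
    nodes = {
        "quick_query": {"model": "qwen_main"},
        "complex_analysis": {"model": "qwen_reasoning"},
    }
    print("Undefined model references:", undefined_model_refs(models, nodes))
```

An empty result means every node refers to a model that exists in the models section. A node with no model key is reported with the value None.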
FAQ
Q: Failed to connect to Singdata Lakehouse
A: Please check:
- Whether the network connection is normal
- Whether the credentials in the .env file are correct
- Whether the Singdata Lakehouse service is accessible
- Whether parameters such as instance ID and workspace are correct
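The network and service checks in the list above can be approximated with a plain TCP connection attempt. This is a minimal sketch under loud assumptions: can_reach is a hypothetical helper, and port 443 (HTTPS) is assumed because this guide does not state which port the Singdata Lakehouse endpoint uses. A successful connection only shows that the host is reachable; it does not validate credentials, instance ID, or workspace.

```python
import os
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 443 is an assumption of this sketch, not a documented value.
    DNS failures, refused connections, and timeouts all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    service = os.environ.get("CLICKZETTA_SERVICE")
    if service:
        print(f"{service} reachable: {can_reach(service)}")
    else:
        print("CLICKZETTA_SERVICE is not set in the environment")
```

If the host is reachable but the connection test in Step 6 still fails, the remaining items in the list (credentials, instance ID, workspace) are the more likely cause.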
Q: AI model response is slow
A: You can try:
- Switching to a faster model (e.g., qwen-plus -> qwen-turbo)
- Reducing parameters such as max_context_length
- Enabling GPU acceleration (if available)
Q: Query results are inaccurate
A: Suggestions:
- Set schema_linking_rate to slow for more precise schema matching
- Provide more context information in queries
- Use .schema tablename to view the table structure before querying
Q: How to switch to a different database instance
A:
- Modify the CLICKZETTA_* variables in the .env file
- Restart datus-cli
- Or add multiple namespace configurations in conf/agent.yml and select one with --namespace
This guide was last updated: November 2025