Lakehouse MCP Server Introduction and Quick Deployment
Introduction
MCP (Model Context Protocol) is a standardized interface protocol that allows AI assistants (such as Claude) to interact securely and controllably with external systems and tools. Through an MCP server, AI assistants can directly access and operate on data sources and run complex data analysis tasks.
Lakehouse MCP Server is an MCP server built specifically for the Singdata Lakehouse platform. It exposes the data lakehouse capabilities of Singdata Lakehouse to AI assistants, so users can interact with the data lakehouse in natural language.
Core Features
- Protocol Support: Supports three transport protocols: HTTP (Streamable), SSE, and Stdio
- Standards Compliance: Fully adheres to the official MCP specification, providing a standard `/mcp` endpoint
- Broad Compatibility: Supports major platforms such as Claude Desktop, Dify, n8n, and Cursor
Deployment Environment Requirements
System Requirements
- Operating System: macOS, Windows, Linux
- Docker: Version 20.10+
- Memory: Minimum 2GB, recommended 8GB
- CPU: Minimum 2 cores, recommended 4 cores
- Storage: Minimum 10GB available space
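The Docker version requirement above can be checked with a short script. This is a minimal sketch, assuming the `docker` CLI is on the PATH and prints its usual `Docker version X.Y.Z, build ...` line; the function names are illustrative and not part of the Lakehouse MCP Server.

```python
import re
import subprocess

MIN_DOCKER = (20, 10)  # minimum version taken from the requirements list


def parse_docker_version(text):
    """Extract (major, minor) from `docker --version` style output."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return int(match.group(1)), int(match.group(2))


def docker_is_recent_enough(text, minimum=MIN_DOCKER):
    """Return True when the reported version is at least `minimum`."""
    return parse_docker_version(text) >= minimum


def installed_docker_version_text():
    """Run `docker --version`; raises FileNotFoundError if Docker is absent."""
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

On a machine with Docker installed, `docker_is_recent_enough(installed_docker_version_text())` reports whether the requirement is met.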
Quick Start
This example deploys with the HTTP (Streamable) protocol (recommended), with Claude Desktop (the MCP client) and the MCP Server running on the same local machine (localhost). The architecture also supports distributed deployment, where the client, server, and Lakehouse platform are located on different remote hosts.

Step 0: MCP Server Configuration Preparation
Docker environment preparation: Visit https://www.docker.com/products/docker-desktop/ to download Docker Desktop (this example uses Docker Desktop for Mac).
- Verify the Docker Desktop installation (macOS environment) and make sure the Docker version is 20.10 or later
- Configure Docker Desktop:
  - Allocate at least 4GB of memory to Docker
  - Enable file sharing
Step 1: MCP Server Side: Pull the Latest Singdata MCP Server Image
Step 2: MCP Server Side: Create Working Directory (if it does not exist)
- macOS:
- Windows PowerShell:
In the working directory created above, create a configuration file named `connections.json` and add the connection information for the Lakehouse instance. The configuration template is as follows (to connect to two Lakehouse instances, add an entry for each and separate the entries with commas):
Parameter Descriptions:
| Parameter Name | Description | Example Value |
|---|---|---|
| is_default | Whether it is the default connection configuration | true |
| service | Service endpoint address, refer to the documentation | Shanghai Alibaba Cloud: cn-shanghai-alicloud.api.clickzetta.com; Beijing Tencent Cloud: ap-beijing-tencentcloud.api.clickzetta.com; Beijing AWS: cn-north-1-aws.api.clickzetta.com; Guangzhou Tencent Cloud: ap-guangzhou-tencentcloud.api.clickzetta.com; Singapore Alibaba Cloud: ap-southeast-1-alicloud.api.singdata.com; Singapore AWS: ap-southeast-1-aws.api.singdata.com |
| username | Username for authentication | "your_name" |
| password | Password for authentication | "your_password" |
| instance | Instance ID, identifies a specific Lakehouse instance | "your_instanceid" |
| workspace | Workspace name, used for data isolation and organization | "your_workspacename" |
| schema | Database schema name | "public" |
| vcluster | Virtual cluster name, used for compute resource management | "default_ap" |
| description | Description of the connection configuration | "UAT environment for testing" |
| hints | Performance optimization and identification configuration object | {...} |
| hints.sdk.job.timeout | SDK job timeout (seconds) | 300 |
| hints.query_tag | Query tag, used for query tracking and identification | "mcp_uat" |
| name | Name identifier for the connection configuration | "Shanghai production env" |
| is_active | Whether the connection is in an active state | false |
| last_test_time | Timestamp of the last connection test (ISO format) | "2025-06-30T19:55:51.839166" |
| last_test_result | Result status of the last connection test | "success" |
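Before starting the server, it can help to sanity-check each connection entry against the parameter table. The sketch below is illustrative only: the field names come from the table, but treating this subset as required, and reading `hints.sdk.job.timeout` as the key `sdk.job.timeout` inside the `hints` object, are assumptions of this example rather than documented behavior.

```python
# Field names come from the parameter table; treating this subset as
# required is an assumption made for this sketch.
REQUIRED_FIELDS = ("service", "username", "password", "instance",
                   "workspace", "schema", "vcluster")

# Placeholder example values from the table that should be replaced.
PLACEHOLDERS = {"your_name", "your_password", "your_instanceid",
                "your_workspacename"}


def check_connection(entry):
    """Return a list of human-readable problems found in one connection entry."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = entry.get(field)
        if value is None or value == "":
            problems.append(f"missing field: {field}")
        elif isinstance(value, str) and value in PLACEHOLDERS:
            problems.append(f"placeholder left in field: {field}")
    # Assumed layout: the timeout lives under hints["sdk.job.timeout"].
    timeout = entry.get("hints", {}).get("sdk.job.timeout")
    if timeout is not None and (not isinstance(timeout, int) or timeout <= 0):
        problems.append("hints.sdk.job.timeout must be a positive number of seconds")
    return problems
```

An empty list means the entry passed these basic checks; it does not prove that the credentials or endpoint are valid.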
Step 3: MCP Server Side: Start the MCP Server Image
Create a docker-compose.yml file and copy the content into it (see the appendix for the file content).
Open a terminal or command line in the directory containing this file and execute the following command.
Expected output:
Verify the status using the `docker compose ps --format "table {{.Name}}\t{{.Service}}\t{{.Status}}"` command. The expected output is as follows (ignore WARNING messages):
Step 4: Configure Claude Desktop
The MCP client tool chosen for this example is Claude Desktop, with the host located on the same machine as the MCP Server.
Locate and open the Claude Desktop configuration file:
macOS Steps:
- Open Finder
- Press `Cmd+Shift+G`
- Paste the path: `~/Library/Application Support/Claude`
- Double-click to open `claude_desktop_config.json` (with a text editor)
Windows Steps:
- Press `Win+R` to open the Run dialog
- Type `%APPDATA%\Claude` and press Enter
- Right-click `claude_desktop_config.json`
- Select "Edit" or "Open with Notepad"
Copy the following content into the configuration file (replace the existing content or add it to `mcpServers`):
Enter the MCP Server address: if the server and client are running on the same machine, use localhost; otherwise, enter the server's IP address.
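If `claude_desktop_config.json` already defines other servers, adding the new entry under `mcpServers` avoids overwriting them. The helper below is a minimal sketch of that merge; the function name, the server name, and the sample entry are hypothetical, and the real `command` and `args` values must come from the configuration snippet in this guide.

```python
import json


def merge_mcp_server(config_text, name, entry):
    """Return config JSON text with `entry` stored under mcpServers[name].

    Other top-level keys and other configured servers are preserved.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = entry
    return json.dumps(config, indent=2)


existing = '{"mcpServers": {"other-tool": {"command": "other"}}}'
# Hypothetical placeholder entry, not the actual Lakehouse configuration.
placeholder_entry = {"command": "<launcher>", "args": ["http://localhost:<PORT>/mcp"]}
updated = merge_mcp_server(existing, "lakehouse-http", placeholder_entry)
```

Writing `updated` back to the file and restarting Claude Desktop would then make both servers available.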
Configuration Complete!
Additionally, Claude Desktop can connect to the backend MCP Server in several ways to suit different deployment environments and performance requirements. The example above used the HTTP (Streamable) protocol. To use the SSE or STDIO protocol instead, configure the connection as described in the following sections.
SSE Connection Method (Remote Service)
SSE (Server-Sent Events) is an HTTP-based technology that keeps a long-lived connection open so that the server can push messages to the client in one direction. Compared to traditional polling, SSE delivers updates in real time with lower latency.
- Applicable Scenario: Scenarios that require receiving real-time data streams or update notifications from the server.
- Docker Server Configuration Reference: This method corresponds to starting the `clickzetta-sse` service in the container, which provides services on port `8003`.
- Configuration Example: In Claude Desktop's `claude_desktop_config.json` configuration file, update the following information to connect to the remote SSE endpoint.
Notes:
- Replace `<YOUR_SERVER_IP>` with the actual IP address or domain name of the MCP Server.
- The target port is `8003`, and the endpoint path is `/sse`.
- The `--transport sse` parameter specifies the use of the SSE communication protocol.
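To make the one-way push model of SSE more concrete, the sketch below parses the text format that an SSE stream uses on the wire: `field: value` lines, with a blank line ending each event. It is a simplified illustration of the general SSE format, not the Lakehouse MCP Server's own code, and the sample stream is made up for the example.

```python
def parse_sse(stream_text):
    """Split raw SSE text into a list of {"event": ..., "data": ...} dicts."""
    events = []
    event_type, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                events.append({"event": event_type, "data": "\n".join(data_lines)})
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
    return events


# Made-up sample stream: one named event, a keep-alive, one multi-line event.
sample = "event: update\ndata: hello\n\n: keep-alive\n\ndata: line one\ndata: line two\n\n"
parsed = parse_sse(sample)
```

Each parsed event carries only server-to-client data; requests from the client travel over separate HTTP calls.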
STDIO Connection Method (Local Process)
This method is mainly used for local development and debugging. Claude Desktop will launch the MCP Server as a subprocess directly on the local machine and communicate through standard input/output (STDIO). This method has the lowest latency but is not suitable for remote connections.
- Applicable Scenario: Local development, single-machine deployment.
- Docker Server Configuration Reference: This method corresponds to starting the `clickzetta-stdio` service in the container. The container starts and stops automatically as Claude Desktop opens and closes.
- Configuration Example: In Claude Desktop's `claude_desktop_config.json` configuration file, update the following information to directly specify the command to start the local Server.
Note:
- In the configuration file, replace `USERNAME` in the path after `-v` so that it matches the actual path on your system.
- Use `docker compose down` to shut down the containers created earlier, because in this mode Claude Desktop automatically starts and stops the container as it opens and closes.
Notes:
- `command` and `args` directly define how to start the MCP Server locally.
- No need to specify an IP address and port.
- The `--transport stdio` parameter specifies the use of the STDIO communication protocol.
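The subprocess model described above can be illustrated with a small self-contained sketch. A parent process starts a child and exchanges newline-delimited JSON-RPC messages over the child's standard input and output, which is how the MCP stdio transport frames messages. The child here is a stand-in echo script written for this example, not the Lakehouse MCP Server, and `call_over_stdio` is a hypothetical helper name.

```python
import json
import subprocess
import sys

# Stand-in child process (NOT the Lakehouse MCP Server): it reads one JSON
# message per line from stdin and writes one JSON reply per line to stdout.
CHILD_CODE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"jsonrpc": "2.0", "id": request["id"],
             "result": {"echo": request["method"]}}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


def call_over_stdio(method, request_id=1):
    """Start the child, send one request on stdin, return the decoded reply."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    ) as proc:
        request = {"jsonrpc": "2.0", "id": request_id, "method": method}
        proc.stdin.write(json.dumps(request) + "\n")
        proc.stdin.flush()
        reply = json.loads(proc.stdout.readline())
    # Leaving the `with` block closes the pipes, so the child sees EOF and exits.
    return reply
```

Because both ends share pipes on one machine, no IP address or port is involved, which matches the notes above.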
Getting Started
Deployment Verification
- Open Claude Desktop and send the following command in the input box:
If the connection is successful, you will see a list containing 50+ tools (note: as versions update, the specific number of tools may vary).
- Verify the WebUI Interface
  - Open `http://localhost:8503` in your browser; the MCP Server WebUI page should load.

If both steps above complete successfully, congratulations, your application has been successfully installed!
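When the WebUI page does not load, a quick first check is whether anything is listening on the port at all. The following is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, and an open port only shows that some process accepts connections there, not that the WebUI itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For the setup in this guide, `port_is_open("localhost", 8503)` checks the WebUI port and `port_is_open("localhost", 8003)` checks the SSE port.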
Step 2: Configure Your First Data Source (Lakehouse)
Next, let's configure a Lakehouse connection so that Claude can access your data.
- Open Connection Manager: Access the WebUI at `http://localhost:8503`, then select Connection Management from the left menu.
- Add and Fill in Connection Information: Click the Add New Connection button and accurately fill in your Lakehouse connection information (such as host, port, credentials, etc.) according to the prompts.
- Test and Save
  - After filling in the information, click the Test Connection button to ensure all configuration is correct and the network is accessible.
  - After the test passes, click Save to complete the configuration.
Step 3: Start Your First Query
Now everything is ready! You can start interacting with your data. Try asking in Claude Desktop:
Advanced Configuration: Configure the Singdata Product Documentation Knowledge Base
This step connects the Singdata Lakehouse product documentation knowledge base table to the MCP Server to build a Q&A knowledge base. After configuration, you can ask natural language questions in the MCP client (such as Claude Desktop) and quickly get official guidance and answers about Lakehouse operations.
This functionality relies on an embedding service and vector search, which together turn unstructured documents into a knowledge base that machines can understand and retrieve from.
Step 1: Configure the Embedding Service
The purpose of this step is to tell the MCP system how to convert user "questions" into vectors for matching within the knowledge base.
- In the MCP Server management interface, navigate to System Configuration from the left navigation bar.
- In the main configuration area, select the Embedding Service tab.
- Locate and fill in the DashScope Configuration (default) section:
  - API Key: Paste your Alibaba Cloud Bailian platform API key. This is the credential for calling the model; please keep it secure.
  - Vector Dimension: Enter the vector dimension output by the embedding model you selected. This value must be exactly the same as the dimension used when vectorizing the knowledge base documents. For example, this guide uses a dimension of `1024` with the `text-embedding-v4` model.
  - Embedding Model: Select or enter the name of the model used to convert text into vectors, such as `text-embedding-v4`.
  - Max Text Length: Set the maximum number of tokens the model can process at one time. If the question is too long, the excess will be ignored.
- Click the Save Embedding Service Configuration button.
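Two of the settings above describe simple rules: text beyond the maximum length is ignored, and the embedding dimension must equal the configured value. The sketch below illustrates both rules under stated assumptions; the function names are hypothetical, tokens are represented as a plain list, and real tokenization is done by the embedding model rather than by this code.

```python
def truncate_tokens(tokens, max_length):
    """Keep only the first `max_length` tokens; anything beyond is ignored."""
    return tokens[:max_length]


def check_dimension(vector, expected_dim):
    """Raise ValueError if the embedding length differs from the configured dimension."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, expected {expected_dim}"
        )
    return vector
```

A dimension mismatch between the question vector and the stored document vectors would make distance calculations meaningless, which is why the two values must agree.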
Step 2: Configure Vector Search
The purpose of this step is to tell the MCP system where and how to search the already stored document knowledge base.
- On the System Configuration page, switch to the Vector Search tab.
- Fill in the Vector Table Configuration section:
  - Vector Table Name: Accurately enter the full table name where the document vectors are stored. The format is typically `database_name.schema_name.table_name`, for example `clickzetta_sample_data.clickzetta_doc.kb_dashscope_clickzetta_elements`.
  - Embedding Column: Enter the column name in that table used to store the text vectors, such as `embeddings`.
  - Content Column: Enter the column name in that table used to store the original text content, such as `text`. When the system finds relevant answers, the content here will serve as the primary reference.
  - Other Columns: Optional. Fill in the metadata columns you wish to retrieve together, such as `file_directory,filename`, which helps users trace the original source of information.
- Configure Search Parameters: Keep the defaults. If you want to make changes, refer to the instructions below.
  - Distance Threshold: Set a strictness level for similarity matching. The system calculates the "distance" between the question vector and the document vectors; only documents with a distance less than this value are considered relevant. Smaller values mean stricter matching requirements. It is generally recommended to start with `0.80`.
  - Number of Results to Return: Defines the number of the most relevant documents retrieved from the database in a single query. For example, setting it to `5` means retrieving the 5 most relevant document fragments each time.
  - Enable Reranking: When checked, the system will perform a secondary intelligent ranking of the initially retrieved results to increase the probability that the most accurate answer appears at the top.
- Click the Save Vector Search Configuration button.
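The interaction between the distance threshold and the number of results can be sketched in a few lines. This example assumes cosine distance, which is an assumption of the sketch since the guide does not name the metric the server uses; the function names and the in-memory `rows` list are likewise illustrative stand-ins for the vector table.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare against a zero vector")
    return 1.0 - dot / norm


def search(query_vec, rows, threshold=0.80, top_k=5):
    """Return up to top_k (distance, text) pairs with distance < threshold, closest first."""
    scored = [(cosine_distance(query_vec, vec), text) for vec, text in rows]
    kept = [pair for pair in scored if pair[0] < threshold]
    kept.sort(key=lambda pair: pair[0])
    return kept[:top_k]


# Tiny made-up table of (embedding, content) rows.
rows = [([0.0, 1.0], "c"), ([1.0, 1.0], "b"), ([1.0, 0.0], "a"), ([-1.0, 0.0], "d")]
hits = search([1.0, 0.0], rows)
```

Lowering `threshold` drops weaker matches before `top_k` is applied, so a strict threshold can return fewer results than the configured number.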
Other Typical Use Cases
Please refer to the public account article: How MCP Server Empowers Lakehouse to Achieve 6 AI-Driven Data Application Scenarios
We look forward to exploring the new era of AI-driven data analysis with you!
Appendix
Contents of the docker-compose.yml file:
