Overview

Singdata Lakehouse is a next-generation cloud lakehouse independently developed by Singdata Technology. Built on an incremental computation engine, it delivers up to 10× better performance than traditional open-source architectures such as Spark, enabling low-cost, real-time processing of massive data across the entire data pipeline. The platform integrates, stores, and computes on all types of data, providing a solid data foundation for AI innovation and helping enterprises move from traditional Spark architectures into the AI era.

For enterprises with existing data lakes (OSS / S3 / COS), Singdata Lakehouse can mount the existing object storage directly and use External Catalog to federate queries over Hive, Iceberg, Delta Lake, and other table formats. No data migration is required, and high-performance SQL analytics are available immediately. This is the lowest-cost path from data lake to lakehouse.

Singdata Lakehouse runs on seven major global cloud platforms, is already live in multiple Asia-Pacific regions, and also supports private deployment. Infrastructure costs drop to 1/5–1/3 of those of traditional solutions, with near-zero operations overhead.

Migrating from Spark / Databricks
Migration Guide · Migration Best Practices · SQL Syntax Comparison · Spark Connector · Performance Testing
Lake Acceleration (for existing data lakes)
In-Place Lake Acceleration Guide · External Catalog Federation · External Tables · Object Storage Mount · Performance Testing
AI Data Infrastructure
Lakehouse AI Overview · AI Data Readiness · Vector Search · AI Gateway · Data Analytics Agent
Cloud Platforms and Deployment
Supported Cloud Platforms and Regions · Pricing and Billing

First Time Here?

① Set Up Your Account
5 minutes

Register an account, activate a service instance, and complete initialization

② Quick Start Experience
30 minutes

Walk through data ingestion, SQL queries, and Dynamic Table incremental computation

③ Go Deeper by Role
As needed

Dedicated paths for data engineers, analysts, AI engineers, and administrators

Who Are You, and What Do You Want to Do?

Data Integration / Data Sync
Data ingestion, CDC sync, file import, streaming writes

Studio Data Integration (visual configuration for 40+ data sources) · Real-time Sync Tasks (full-database CDC for MySQL / PG / Oracle) · Batch Sync Tasks (scheduled batch sync) · Pipe Continuous Ingestion (auto-write from object storage / Kafka) · COPY INTO (one-time file import) · Complete Data Ingestion Guide

Data Engineer
Build data pipelines, run ETL processing, manage data warehouse layers

Dynamic Table Incremental Computation · Dynamic Table Overview · Real-time Data Pipeline · Studio Task Development and Scheduling · Task Parameters · CREATE TABLE Syntax Reference · SQL Syntax Reference · COPY INTO · cz-cli Command-Line Tool · TPC-DS Performance Testing

Data Analyst
SQL queries, BI connections, ad-hoc analysis

Run Your First SQL Query · Connect BI Tools · Data Analytics Agent (natural language queries) · Semantic View · Pricing and Billing · SSB Performance Testing · TPC-H Performance Testing

AI / ML Engineer
Vector search, RAG, AI functions, model invocation

AI Data Readiness · Vector Search · AI Functions (AI_COMPLETE / AI_EMBEDDING) · AI Gateway · Python SDK (SQL interface) · ZettaPark (DataFrame API)

Platform Administrator
User management, permission configuration, compute clusters, cost control

Account and Service Instance Setup · User and Permission Management · Compute Cluster Management · Pricing and Billing

AI Agent / Automation
Deterministic interface calls, semantic layer queries, automated data pipelines

cz-cli Command-Line Tool (deterministic interface, ideal for Agent calls) · Semantic View (business semantic layer, natural-language friendly) · Python SDK (SQL interface) · ZettaPark (DataFrame API) · Data Analytics Agent


Core Capabilities

Data Ingestion

Connect to 40+ data sources out of the box: real-time full-database CDC sync from MySQL / PG / Oracle, streaming writes from Kafka, continuous import of files from OSS / S3 / COS, and one-time batch import via COPY INTO.

Data Ingestion Guide · Studio Data Integration · Pipe · COPY INTO

Lakehouse Unification

Existing data lakes (OSS / S3 / COS) require no migration: mount the existing object storage directly and use External Catalog to federate queries over Hive, Iceberg, and Delta Lake tables with high-performance SQL analytics.

External Catalog · External Volume · Lake Acceleration Guide

Incremental Computation

Define transformation logic in standard SQL. Dynamic Table automatically detects changes in upstream tables and refreshes incrementally, processing only the changed data instead of recomputing everything. This replaces hand-written scheduling scripts and supports low-latency data pipelines.
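
To illustrate what incremental refresh means in general (this is a conceptual sketch, not Singdata's implementation or Dynamic Table API), the Python example below maintains a toy SUM(value) GROUP BY key result by applying only upstream insert and delete records, then checks that the result matches a full recomputation. The change-record format and the names `IncrementalSumView`, `apply_changes`, and `full_recompute` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical change record: (op, key, value), op is "+" (insert) or "-" (delete).
Change = Tuple[str, str, int]


def full_recompute(rows: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Baseline: rescan every upstream row to compute SUM(value) GROUP BY key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


class IncrementalSumView:
    """Toy SUM(value) GROUP BY key result, kept up to date from change records."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {}
        self.counts: Dict[str, int] = {}  # number of rows contributing to each group

    def apply_changes(self, changes: Iterable[Change]) -> None:
        # Only groups named in the changes are touched; nothing is rescanned.
        for op, key, value in changes:
            sign = 1 if op == "+" else -1
            self.totals[key] = self.totals.get(key, 0) + sign * value
            self.counts[key] = self.counts.get(key, 0) + sign
            if self.counts[key] == 0:  # the group's last row was deleted
                del self.totals[key]
                del self.counts[key]


# Initial load, then one incremental refresh containing an insert and a delete.
upstream: List[Tuple[str, int]] = [("east", 10), ("west", 5)]
view = IncrementalSumView()
view.apply_changes([("+", key, value) for key, value in upstream])

delta: List[Change] = [("+", "east", 7), ("-", "west", 5)]
upstream.append(("east", 7))
upstream.remove(("west", 5))
view.apply_changes(delta)

print(view.totals)  # {'east': 17}
assert view.totals == full_recompute(upstream)
```

Because each refresh touches only the changed rows, its cost grows with the size of the change rather than the size of the upstream table. A real engine must also handle updates (modeled here only as a delete plus an insert), joins, and aggregates that cannot be maintained by simple addition, all of which this sketch omits.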

Incremental Computation Mechanism · Dynamic Table Overview · Real-time Data Pipeline

High-Performance SQL Analytics

A vectorized execution engine delivers industry-leading performance on the TPC-DS, TPC-H, and SSB benchmarks. It supports OLAP multidimensional analysis and ad-hoc queries, running up to 10× faster than traditional Spark architectures.

Performance Testing · SQL Usage Guide · TPC-H Sample Experience

AI-Native

Vector indexes, full-text search, AI Functions (AI_COMPLETE / AI_EMBEDDING), and Semantic Views are built into the data platform, so you can build RAG knowledge bases and AI-enhanced analytics without external services. Data Analytics Agent lets users query data conversationally in natural language.
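
As a rough illustration of the lookup a vector index serves in a RAG flow, the Python sketch below ranks stored embeddings by cosine similarity to a query embedding using exact brute-force search. It does not show Singdata's vector search syntax. The document IDs and 3-dimensional vectors are made-up toy values (real embeddings come from an embedding model and have many more dimensions), and the helper names `cosine_similarity` and `top_k` are hypothetical. Production vector indexes typically use approximate structures to avoid scanning every vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], docs: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Exact (brute-force) nearest-neighbor search over every stored embedding."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only).
docs = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.1],
    "return_window": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # stands in for an embedded question about refunds

results = top_k(query, docs, k=2)
print([doc_id for doc_id, _ in results])  # ['refund_policy', 'return_window']
```

In a RAG pipeline, the text behind the top-ranked IDs would then be passed to a language model as context for answering the question.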

Lakehouse AI Overview · Vector Search · AI Functions · Semantic View · Data Analytics Agent

Studio and AI Agent Integration

Studio provides a built-in IDE, task scheduling, data integration, data quality, and operations monitoring for one-stop data development. cz-cli provides a deterministic command interface and Semantic Views provide a business semantic layer; both let AI Agents call data capabilities directly.

Studio User Guide · cz-cli Installation and Usage · Semantic View


What's New

Release Notes


This Section

Page | Description
Before You Begin | Ways to access Lakehouse: Studio, CLI, drivers and connectors
Account Signup and Setup | Register an account, activate a service instance, complete initialization
Supported Cloud Platforms | Supported cloud providers and available regions
Pricing and Billing | Billing model and cost breakdown
Trial Account Quotas and Limits | Resource quota limits during the trial period