Getting Started

Choose your onboarding path by role. Most scenarios can be completed in 30 minutes.


Data Engineer

Goal: Ingest data and run it through an ODS → DWD → ADS processing pipeline

Step 1 — Run through core features (30 minutes)

Lakehouse Quick Start Experience

Step 2 — Ingest your data

Data source | Recommended approach
MySQL / PG / Oracle, real-time CDC | Studio Real-time Sync Tasks
Full-database migration, multi-table sync | Multi-table Real-time Sync
Object storage files (OSS / S3 / COS) | Pipe Continuous Ingestion · COPY INTO
Kafka message streams | Kafka Pipe
Local CSV / Excel files | Upload Local Data

Step 3 — Build your data processing pipeline

Dynamic Table Incremental Computation · Studio Task Development and Scheduling · End-to-End CDC Complete Example
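
The ODS → DWD → ADS layering named in the goal above can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module purely as a local stand-in engine, not the Lakehouse itself, and the table names, column names, and sample rows are all hypothetical. In the Lakehouse, the DWD and ADS steps would more likely be defined with the features linked above, such as Dynamic Tables or scheduled Studio tasks.

```python
import sqlite3

# sqlite3 is only a local stand-in engine; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# ODS: raw records landed as-is from the source system (amounts still text).
conn.execute(
    "CREATE TABLE ods_orders (order_id INTEGER, user_id TEXT, amount TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO ods_orders VALUES (?, ?, ?, ?)",
    [
        (1, "u1", "10.50", "paid"),
        (2, "u1", "4.50", "paid"),
        (3, "u2", "7.00", "cancelled"),
        (4, "u2", "20.00", "paid"),
    ],
)

# DWD: cleaned detail layer (amounts cast to numbers, non-paid orders filtered out).
conn.execute(
    """
    CREATE TABLE dwd_orders AS
    SELECT order_id, user_id, CAST(amount AS REAL) AS amount
    FROM ods_orders
    WHERE status = 'paid'
    """
)

# ADS: aggregated, application-facing result (revenue per user).
conn.execute(
    """
    CREATE TABLE ads_user_revenue AS
    SELECT user_id, SUM(amount) AS revenue, COUNT(*) AS order_count
    FROM dwd_orders
    GROUP BY user_id
    """
)

rows = conn.execute(
    "SELECT user_id, revenue, order_count FROM ads_user_revenue ORDER BY user_id"
).fetchall()
print(rows)
```

Each layer reads only from the layer before it, so a cleaning rule changes in one place (DWD) and every downstream aggregate picks it up.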

Step 4 — Connect external tools

JDBC Driver · cz-cli Command Line · SQLAlchemy · Python SDK

Data Analyst

Goal: Connect to data, run SQL, use AI-assisted analysis

Step 1 — Run your first SQL query (5 minutes)

Run Your First SQL Query

Step 2 — Connect your tools

Tool type | Connection method
BI tools (FineBI / PowerBI / Tableau, etc.) | JDBC Driver
Database clients (DataGrip / DBeaver / Navicat, etc.) | MySQL Protocol
Python scripts | SQLAlchemy
Terminal command line | Command-Line Client

Step 3 — Advanced analysis

Data Analytics Agent (DataGPT) · SQL Usage Guide · Experience Performance with TPC-H Sample Data

AI / ML Engineer

Goal: Build vector search, RAG knowledge bases, AI-enhanced analytics

Step 1 — Understand Lakehouse AI capabilities

Lakehouse AI Overview

Step 2 — Choose your scenario

Scenario | Entry point
Semantic search / RAG knowledge base | AI Data Readiness · Vector Search
Call LLMs in SQL | AI Functions (AI_COMPLETE / AI_EMBEDDING)
Manage and switch between multiple LLM models | AI Gateway
Natural language conversational data analysis | Data Analytics Agent
Python data processing + AI inference | Zettapark Quick Start
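
For readers new to the semantic search scenario, the idea behind vector search can be sketched in a few lines of plain Python: documents and the query are represented as embedding vectors, and the closest documents are found by cosine similarity. This is a conceptual illustration only. The function names, document IDs, and three-dimensional vectors are hypothetical, and it does not use the Lakehouse Vector Search feature or the AI_EMBEDDING function, whose actual usage is covered in the linked pages.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embeddings come from an embedding model
# and typically have hundreds of dimensions.
corpus = {
    "doc_billing": [1.0, 0.0, 0.0],
    "doc_refunds": [0.8, 0.6, 0.0],
    "doc_shipping": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, corpus, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

This brute-force scan compares the query against every document. A vector index serves the same purpose at scale by avoiding the full scan.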

Platform Administrator

Goal: Set up accounts, assign permissions, configure environments

  1. Manage Users — Create users, assign roles
  2. Create and Use Workspaces — Workspace isolation and configuration
  3. Manage Workspace Users — Workspace-level permission management
  4. Build a Data Development Environment Using Workspaces — Set up a complete data development environment for your team
  5. Configure Monitoring and Alerting Rules — Alerts for task failures and performance anomalies

AI Agent / Automation

Goal: Call data capabilities through deterministic interfaces, build automated data pipelines

Scenario | Recommended approach
SQL execution and result retrieval | cz-cli sql · Python connector
Task scheduling and triggering | cz-cli task / runs refill
Studio task development and data source management | cz-cli task create/save · Studio Task Development · Studio Data Integration
Python data read/write | Zettapark · clickzetta-connector
Business semantic layer queries | Semantic View
Collaborate with a specialized data sub-agent | cz-cli agent run

Quick Start by Feature

What I want to do | Entry point
Experience core product features quickly | Lakehouse Quick Start Experience
Understand the Studio interface layout | Lakehouse Studio Tour
Upload a local CSV file | Upload Local Data
Real-time CDC sync from MySQL / PG | Studio Real-time Sync Tasks
Create a scheduled sync task | Create Sync Task to Import Data
Mount OSS / S3 / COS object storage | External Volume
Configure ETL scheduling workflow | Configure ETL Orchestration and Scheduling
Federate queries over data lake (Hive / Iceberg) | External Catalog Federation
Configure data quality rules | Configure Data Quality Rules
Configure monitoring and alerting | Configure Monitoring and Alerting Rules
Experience engine performance (TPC-H) | Experience Performance with TPC-H Sample Data
Write complex business analytics SQL | SQL Usage Guide
Use AI to analyze data conversationally | Data Analytics Agent (DataGPT)
Build vector search / RAG knowledge base | Vector Search
Process data with Python (Zettapark) | Zettapark Quick Start
Migrate from Spark to Lakehouse | Migration Guide