Getting Started

Choose an onboarding path based on your role. Most starter scenarios can be completed within 30 minutes.

Data Engineer

Goal: Ingest data and complete an ODS → DWD → ADS processing pipeline

Step 1 — Try the core features (30 minutes)

Step 2 — Ingest your data

Data Source	Recommended Method
MySQL / PostgreSQL / Oracle (real-time CDC)	Studio Real-time Sync Tasks
Full-database migration, multiple tables at once	Multi-table Real-time Sync
Object storage (S3 / OSS / COS)	Pipe Continuous Ingestion · COPY INTO
Kafka message streams	Kafka Pipe
Local CSV / Excel files	Upload Local Data

Step 3 — Build data processing pipelines
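
To illustrate the ODS → DWD → ADS layering named in the goal above, here is a minimal sketch of one batch moving through the three layers. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in, not Lakehouse itself. The table names (ods_orders, dwd_orders, ads_city_revenue), the columns, and the sample rows are all hypothetical, and Lakehouse SQL syntax may differ from SQLite's.

```python
import sqlite3


def build_layers(conn):
    """Run one small batch through ODS, DWD and ADS tables; return the ADS rows."""
    cur = conn.cursor()

    # ODS: raw records stored as ingested (all text, untrimmed, with duplicates).
    cur.execute("CREATE TABLE ods_orders (order_id TEXT, amount TEXT, city TEXT)")
    cur.executemany(
        "INSERT INTO ods_orders VALUES (?, ?, ?)",
        [
            ("1", "10.5", " Paris "),
            ("2", None, "Berlin"),   # missing amount, dropped in DWD
            ("3", "4.5", "paris"),
            ("3", "4.5", "paris"),   # duplicate row, collapsed in DWD
            ("4", "20", "Berlin"),
        ],
    )

    # DWD: typed, normalized, de-duplicated detail rows.
    cur.execute(
        """
        CREATE TABLE dwd_orders AS
        SELECT DISTINCT
            CAST(order_id AS INTEGER) AS order_id,
            CAST(amount AS REAL)      AS amount,
            LOWER(TRIM(city))         AS city
        FROM ods_orders
        WHERE amount IS NOT NULL
        """
    )

    # ADS: aggregate shaped for reporting.
    cur.execute(
        """
        CREATE TABLE ads_city_revenue AS
        SELECT city, COUNT(*) AS order_count, SUM(amount) AS revenue
        FROM dwd_orders
        GROUP BY city
        """
    )
    conn.commit()
    return cur.execute(
        "SELECT city, order_count, revenue FROM ads_city_revenue ORDER BY city"
    ).fetchall()


# Stand-in connection; a real pipeline would target Lakehouse instead.
conn = sqlite3.connect(":memory:")
for row in build_layers(conn):
    print(row)
```

Each layer reads only from the one before it, so a cleaning rule changed in DWD flows into ADS on the next run without touching the raw ODS data.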

Step 4 — Connect external tools

JDBC Driver · cz-cli Command Line · SQLAlchemy · Python SDK

Goal: Connect to data, run SQL, use AI-assisted analysis

Step 1 — Run your first SQL (5 minutes)

Step 2 — Connect your tools

Tool Type	Connection Method
FineBI / Power BI / Tableau and other BI tools	JDBC Driver
DataGrip / DBeaver / Navicat and other clients	MySQL Protocol
Python scripts	SQLAlchemy
Terminal command line	Command-Line Client

Step 3 — Advanced analysis

Goal: Build vector search, RAG knowledge bases, AI-enhanced analytics

Step 1 — Learn about Lakehouse AI capabilities

Lakehouse AI Overview

Step 2 — Choose your scenario

Scenario	Entry Point
Semantic search / RAG knowledge base	AI Data Preparation · Vector Search
Call LLMs from SQL	AI Functions (AI_COMPLETE / AI_EMBEDDING)
Manage and switch between multiple LLMs	AI Gateway
Conversational data analysis in natural language	Data Analytics Agent
ETL development, task management, and operations diagnostics in natural language	Data Engineering Agent
Python data processing + AI inference	ZettaPark Quick Start

Goal: Set up accounts, grant permissions, and configure environments

Quickly Add and Manage Users — Create users and assign roles
Quickly Create and Use Workspaces — Workspace isolation and configuration
Quickly Manage Workspace Users — Workspace-level permission management
Build a Data Development Environment with Workspaces — Set up a complete data development environment for your team
Quickly Configure and Use Monitoring and Alerting Rules — Task failure and performance anomaly alerts

Goal: Use deterministic interfaces to call data capabilities and build automated data pipelines

Scenario	Recommended Integration
SQL execution and result retrieval	cz-cli sql · Python connector
Task scheduling and triggering	cz-cli task / runs refill
Studio task development and data source management	cz-cli task create/save · Studio Task Development · Studio Data Integration
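
For the "SQL execution and result retrieval" scenario, an automated caller usually wants rows in a predictable structure. The following is a rough sketch of such a helper, written against the generic Python DB-API 2.0 (PEP 249) interface. It assumes, without confirming, that the Python connector exposes a PEP 249-style connection and cursor; the helper name run_query is hypothetical, and sqlite3 is used here only as a stand-in so the example runs on its own.

```python
import sqlite3


def run_query(conn, sql, params=()):
    """Execute one statement on a DB-API 2.0 connection; return rows as dicts."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # Statements that produce no result set leave description as None.
        if cur.description is None:
            return []
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


# Stand-in connection (assumption: swap in the real connector's connection).
conn = sqlite3.connect(":memory:")
run_query(conn, "CREATE TABLE events (id INTEGER, name TEXT)")
# sqlite3 uses '?' placeholders; other drivers may use a different paramstyle.
run_query(conn, "INSERT INTO events VALUES (?, ?)", (1, "signup"))
print(run_query(conn, "SELECT id, name FROM events"))
```

Returning a list of dicts keyed by column name keeps downstream automation independent of column order in the SELECT.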

Scenario	Recommended Integration
Python data read/write	ZettaPark · clickzetta-connector
Business semantic layer queries	Semantic Views
Collaborate with specialized data sub-agents	cz-cli agent run
Browser automation with a web agent	Singclaw

What I Want to Do	Entry Point
Quickly experience core product features	Lakehouse Quick Start Experience
Learn the Studio interface layout	Lakehouse Studio Quick Tour
Upload a local CSV file	Quickly Upload and Import Local Data
Real-time CDC sync from MySQL / PostgreSQL	Studio Real-time Sync Tasks
Create a scheduled sync task	Quickly Create Sync Tasks to Import Data
Mount S3 / OSS / COS object storage	External Volume
Configure ETL scheduling workflows	Quickly Configure and Schedule ETL Workflows
Run federated queries on a data lake (Hive / Iceberg)	External Catalog Federated Query
Configure data quality rules	Quickly Configure and Use Data Quality Rules
Configure monitoring and alerting	Quickly Configure and Use Monitoring and Alerting Rules
Experience engine performance (TPC-H)	Experience Engine Performance with TPC-H Sample Data
Write complex business analysis SQL	SQL Usage Guide
Use AI to analyze data conversationally	Data Analytics Agent
Use AI to develop ETL / manage tasks	Data Engineering Agent
Build vector search / RAG knowledge base	Vector Search
Process data with Python (ZettaPark)	ZettaPark Quick Start
Migrate from Spark to Lakehouse	Migration Guide