ZettaPark Python SDK

ZettaPark is the Python DataFrame API for Singdata Lakehouse — you write data processing logic in a pandas-like syntax, and ZettaPark automatically translates it into SQL for distributed execution on Lakehouse, with no need to write SQL by hand.

When to use ZettaPark: it fits best when you have existing Python/PySpark data processing code to migrate to Lakehouse, or when you prefer to build queries dynamically with Python control flow (loops, conditionals).

| Need | Recommended Tool |
| --- | --- |
| DataFrame operations, pandas/PySpark style | ZettaPark (this section) |
| Execute fixed SQL, script automation | Python Connector |
| High-speed bulk writes (millions of rows) | BulkLoad |
| ML feature engineering + model training | ZettaPark + Python ML libraries |

Core Mechanism

ZettaPark uses a lazy execution model: calling methods like filter(), select(), and groupBy() only builds an execution plan — nothing runs immediately. Only when you call collect(), show(), to_pandas(), or save_as_table() does the entire plan get translated into a single SQL statement and sent to Lakehouse for execution.

The following three steps only build the plan and produce no network requests:

# Assumes `session` is an open ZettaPark session and `F` is the ZettaPark functions module.
df = session.table("orders")
df_filtered = df.filter(F.col("amount") > 100)
df_grouped = df_filtered.groupBy("region").agg(F.sum("amount").alias("total"))

Calling collect() triggers execution — the entire chain is translated into a single SQL statement sent to Lakehouse:

result = df_grouped.collect()
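To make the translation concrete, the sketch below runs a hand-written SQL query that is semantically equivalent to the chain above against an in-memory SQLite table. The sample rows are invented for illustration, and the exact SQL text ZettaPark generates may differ; only the logic (filter, then group and sum) is meant to match.

```python
import sqlite3

# Invented sample rows standing in for the Lakehouse "orders" table.
rows = [
    ("east", 50),
    ("east", 150),
    ("west", 200),
    ("west", 300),
    ("north", 80),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# One statement covering filter -> groupBy -> agg(sum("amount").alias("total")).
# ORDER BY is added only to make the printed output deterministic.
sql = """
SELECT region, SUM(amount) AS total
FROM orders
WHERE amount > 100
GROUP BY region
ORDER BY region
"""
result = conn.execute(sql).fetchall()
conn.close()

print(result)  # → [('east', 150), ('west', 500)]
```

Rows with amount <= 100 are dropped before grouping, which is why "north" does not appear at all and "east" totals only 150.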

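Because the whole chain becomes one SQL statement, a complex multi-step transformation costs only one network round-trip, and the computation runs distributed on the Lakehouse cluster rather than in local memory. Only the final result that collect() or to_pandas() brings back has to fit on the client.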
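The lazy model can be illustrated with a small toy in plain Python. `ToySession`, `ToyFrame`, and their methods are hypothetical stand-ins invented for this sketch, not ZettaPark classes or its real SQL compiler; the point is only the pattern: each transformation returns a new, immutable plan (which Python control flow can extend freely), and the one call that needs results compiles the whole plan and sends it once.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


class ToySession:
    """Hypothetical stand-in for the remote engine: records every SQL statement it receives."""

    def __init__(self):
        self.sent = []  # one entry == one simulated network round-trip

    def execute(self, sql):
        self.sent.append(sql)
        return sql


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical immutable plan: each method returns a new plan and runs nothing."""

    session: ToySession
    table: str
    predicates: Tuple[str, ...] = ()
    group_col: Optional[str] = None
    agg_expr: Optional[str] = None

    def filter(self, predicate):
        return replace(self, predicates=self.predicates + (predicate,))

    def group_agg(self, col, agg_expr):
        return replace(self, group_col=col, agg_expr=agg_expr)

    def to_sql(self):
        cols = f"{self.group_col}, {self.agg_expr}" if self.group_col else "*"
        sql = f"SELECT {cols} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        if self.group_col:
            sql += f" GROUP BY {self.group_col}"
        return sql

    def collect(self):
        # Only here is the whole plan compiled and sent, as a single statement.
        return self.session.execute(self.to_sql())


session = ToySession()
min_amount = 100  # hypothetical parameter, e.g. read from a config file

plan = ToyFrame(session, "orders")
if min_amount is not None:  # ordinary Python control flow shapes the plan
    plan = plan.filter(f"amount > {min_amount}")
plan = plan.group_agg("region", "SUM(amount) AS total")

print(len(session.sent))  # → 0 (building the plan sent nothing)
rows = plan.collect()
print(len(session.sent))  # → 1 (collect() sent exactly one statement)
print(session.sent[0])
# → SELECT region, SUM(amount) AS total FROM orders WHERE amount > 100 GROUP BY region
```

Making the plan immutable mirrors the DataFrame style above, where each call such as df.filter(...) yields a new object (df_filtered, df_grouped) instead of modifying the previous one.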

This Section

| Document | Content |
| --- | --- |
| Quick Start | Installation, establishing a session, your first DataFrame |
| DataFrame API Guide | filter / select / join / groupBy / window functions / reading and writing tables |
| Functions Reference | Quick reference for the functions module |
| Data Engineering in Practice | Complete ETL workflow example |
| Volume and File Operations | PUT / GET files, object storage integration |
| Consuming Table Streams | Incremental data processing |
| Creating Dynamic Tables | Define auto-refreshing computed tables with Python |
| Feature Engineering | Machine learning feature processing |
| Credit Scoring in Practice | End-to-end case: ZettaPark + Python ML libraries |