ZettaPark Python SDK
ZettaPark is the Python DataFrame API for Singdata Lakehouse — you write data processing logic in a pandas-like syntax, and ZettaPark automatically translates it into SQL for distributed execution on Lakehouse, with no need to write SQL by hand.
When to use ZettaPark: it is a good fit when you have existing Python or PySpark data processing code to migrate to Lakehouse, or when you prefer to build queries dynamically with Python control flow (loops, conditionals).
| Need | Recommended Tool |
|---|---|
| DataFrame operations, pandas/PySpark-style | ZettaPark (this section) |
| Execute fixed SQL, script automation | Python Connector |
| High-speed bulk writes (millions of rows) | BulkLoad |
| ML feature engineering + model training | ZettaPark + Python ML libraries |
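To make "using Python control flow to dynamically build queries" concrete, here is a minimal sketch in plain Python. It does not use ZettaPark itself: the `Query` class, the `build_query` helper, and the `orders` table with its columns are all hypothetical stand-ins chosen for illustration. The point is only the pattern, where a loop and a conditional decide which filters end up in the final query.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical stand-in for a DataFrame: filter() returns a new, extended query."""
    table: str
    predicates: Tuple[str, ...] = ()

    def filter(self, predicate: str) -> "Query":
        # Never mutate in place; return a new object carrying one more predicate.
        return Query(self.table, self.predicates + (predicate,))

    def to_sql(self) -> str:
        where = f" WHERE {' AND '.join(self.predicates)}" if self.predicates else ""
        return f"SELECT * FROM {self.table}{where}"


def build_query(table: str, minimums: Dict[str, Optional[float]]) -> Query:
    """Add one filter per configured threshold, skipping thresholds set to None."""
    query = Query(table)
    for column, minimum in minimums.items():  # the loop decides how many filters exist
        if minimum is not None:               # the conditional decides whether to add one
            query = query.filter(f"{column} >= {minimum}")
    return query


# Table and column names are made up for this example.
sql = build_query("orders", {"amount": 100, "quantity": None, "discount": 0.1}).to_sql()
print(sql)
```

With a real DataFrame API the same loop would presumably chain `filter()` calls on a DataFrame object rather than on this toy class; the control flow stays ordinary Python either way.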
Core Mechanism
ZettaPark uses a lazy execution model: calling methods like filter(), select(), and groupBy() only builds an execution plan — nothing runs immediately. Only when you call collect(), show(), to_pandas(), or save_as_table() does the entire plan get translated into a single SQL statement and sent to Lakehouse for execution.
Chained transformation steps only build the plan and send no network requests. Calling collect() triggers execution: the entire chain is translated into a single SQL statement and sent to Lakehouse.
This means complex multi-step transformations produce only one network round-trip, with computation running distributed on the Lakehouse cluster — not limited by local memory.
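The lazy execution model described above can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not ZettaPark's implementation: `LazyFrame`, `FakeLakehouse`, `count_by`, and the `orders` table are hypothetical names invented here, and the fake engine only records the SQL it receives instead of running it. Three chained transformations produce no requests, and a single `collect()` sends exactly one statement.

```python
from dataclasses import dataclass, field, replace
from typing import List, Tuple


@dataclass
class FakeLakehouse:
    """Hypothetical stand-in for a remote engine; it only records the SQL it receives."""
    received: List[str] = field(default_factory=list)

    def execute(self, sql: str) -> List[tuple]:
        self.received.append(sql)  # one entry per simulated network round-trip
        return []                  # a real engine would return result rows


@dataclass(frozen=True)
class LazyFrame:
    """Toy lazy DataFrame: transformations extend a plan, collect() compiles and runs it."""
    engine: FakeLakehouse
    table: str
    columns: Tuple[str, ...] = ("*",)
    predicates: Tuple[str, ...] = ()
    group_keys: Tuple[str, ...] = ()

    def filter(self, predicate: str) -> "LazyFrame":
        # Transformation: returns a new plan, contacts nothing.
        return replace(self, predicates=self.predicates + (predicate,))

    def count_by(self, key: str) -> "LazyFrame":
        # Transformation: group by one key and count rows per group.
        return replace(self, columns=(key, "COUNT(*) AS n"), group_keys=(key,))

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        if self.group_keys:
            sql += " GROUP BY " + ", ".join(self.group_keys)
        return sql

    def collect(self) -> List[tuple]:
        # Action: the whole plan becomes one SQL statement and one request.
        return self.engine.execute(self.to_sql())


engine = FakeLakehouse()
plan = (
    LazyFrame(engine, "orders")   # table and column names are made up
    .filter("amount > 100")
    .filter("status = 'paid'")
    .count_by("region")
)
requests_before_collect = len(engine.received)  # plan built, nothing sent yet
plan.collect()
requests_after_collect = len(engine.received)
sql_sent = engine.received[0]
print(requests_before_collect, requests_after_collect)
print(sql_sent)
```

The toy compiler only handles filters and one grouped count; a real translator covers joins, window functions, and much more, but the deferral pattern is the same.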
This Section
| Document | Content |
|---|---|
| Quick Start | Installation, establishing a session, your first DataFrame |
| DataFrame API Guide | filter / select / join / groupBy / window functions / reading and writing tables |
| Functions Reference | Quick reference for the functions module |
| Data Engineering in Practice | Complete ETL workflow example |
| Volume and File Operations | PUT / GET files, object storage integration |
| Consuming Table Streams | Incremental data processing |
| Creating Dynamic Tables | Define auto-refreshing computed tables with Python |
| Feature Engineering | Machine learning feature processing |
| Credit Scoring in Practice | End-to-end case: ZettaPark + Python ML libraries |
