Zettapark is the Python DataFrame API for Singdata Lakehouse, providing a pandas/PySpark-like interface. The Python code you write is translated into SQL and executed distributedly in Lakehouse — no manual SQL required.
This guide covers: Creating DataFrames → Basic transformations → Aggregations → Joins → Set operations → Null handling → Window functions → Reading and writing tables → Views and Dynamic Tables.
💡 When to use what:
Need DataFrame operations (pandas/PySpark-like) → Use Zettapark (this guide)
Need standard SQL execution or script automation → Use Python Connector
Need high-speed bulk writes (millions of rows) → Use BulkLoad
from clickzetta.zettapark.window import Window
data = [(1,"A",100),(2,"A",200),(3,"B",300),(4,"B",150),(5,"A",50)]
df = session.create_dataframe(data, schema=["id","category","amount"])
# Rank within group
w_rank = Window.partition_by("category").order_by(F.col("amount").desc())
# Running sum within group
w_sum = Window.partition_by("category").order_by("amount")
result = df \
.with_column("rank", F.rank().over(w_rank)) \
.with_column("running_total", F.sum("amount").over(w_sum))
result.show()
# +---+--------+------+----+-------------+
# | id|category|amount|rank|running_total|
# +---+--------+------+----+-------------+
# | 5| A| 50| 3| 50|
# | 1| A| 100| 2| 150|
# | 2| A| 200| 1| 350|
# | 4| B| 150| 2| 150|
# | 3| B| 300| 1| 450|
# +---+--------+------+----+-------------+
Reading and Writing Tables
Write to a Table
df = session.create_dataframe([(1,"Alice",100.0),(2,"Bob",200.0)], schema=["id","name","amount"])
# Overwrite (creates the table if it doesn't exist)
df.write.save_as_table("my_table", mode="overwrite")
# Append
df.write.save_as_table("my_table", mode="append")
Read from a Table
df = session.table("my_table")
df.show()
Convert to pandas DataFrame
pdf = df.to_pandas()
print(type(pdf)) # <class 'pandas.core.frame.DataFrame'>
print(pdf.head())
Views and Dynamic Tables
Temporary View (valid within the session)
df.filter(F.col("amount") > 100).create_or_replace_temp_view("high_value_orders")
# Query the temporary view with SQL
session.sql("SELECT * FROM high_value_orders").show()