Managing Files on Lakehouse Volume with Zettapark
Overview
Singdata Lakehouse provides unified management of data lake files and data warehouse tables through its abstract storage layer (Volume, Schema, and Table) and Python API. This guide demonstrates how to perform file management operations in the data lake, including uploading (PUT), downloading (GET), and listing (LIST) files.
Key Concepts:
- Volume Storage Abstraction: All data lake storage is mapped to Volume objects.
- External Volume: Managed by the customer, supporting integration with cloud storage such as Alibaba Cloud OSS, Tencent Cloud COS, AWS S3, and more.
- Internal Volume: Managed by Singdata, divided into USER VOLUME and TABLE VOLUME.
- Zettapark Python API: Provides a unified interface for file and table integration.
The source code for this guide is available as a Jupyter Notebook (.ipynb file) in the GitHub repository.
Environment Setup
Install Dependencies
Import Libraries and Create a Session
Load the connection parameters from a configuration file:
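A minimal sketch of this step, assuming the parameters are stored as a JSON object. The helper name `load_connection_config` and the list of required keys are illustrative assumptions, not taken from the original notebook; adjust them to match your own configuration file.

```python
import json
from pathlib import Path

# Assumed set of connection keys; adjust to what your Lakehouse instance expects.
REQUIRED_KEYS = (
    "username", "password", "service", "instance",
    "workspace", "schema", "vcluster",
)


def load_connection_config(path):
    """Read connection parameters from a JSON file and check that none are missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing connection parameters: {', '.join(missing)}")
    return config
```

For example, `config = load_connection_config("config.json")`, where `config.json` is a hypothetical file name. Keeping credentials in a separate file avoids hard-coding them in the notebook.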
Create session:
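A hedged sketch of session creation. It assumes Zettapark exposes a Snowpark-style builder at `clickzetta.zettapark.session.Session`; confirm the import path against your installed version. The wrapper `create_session` is a hypothetical helper.

```python
def create_session(config):
    """Create a Zettapark session from a dict of connection parameters."""
    if not config:
        raise ValueError("connection parameters must not be empty")
    # Imported lazily so this helper can be defined where Zettapark is not installed.
    # Assumption: the package provides a Snowpark-style Session builder.
    from clickzetta.zettapark.session import Session

    return Session.builder.configs(config).create()
```

The returned session object is what the file operations in the following sections are called on.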
File Operations
Clean Up USER VOLUME
Before starting, clean up the USER VOLUME to ensure a clean environment:
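One way to sketch the cleanup is to send a SQL statement through the session. The statement text `REMOVE USER VOLUME SUBDIRECTORY '/'` is an assumption about the Lakehouse SQL syntax and should be checked against the Volume documentation; `clear_user_volume` is a hypothetical helper.

```python
def clear_user_volume(session, subdirectory="/"):
    """Remove everything under the given USER VOLUME subdirectory."""
    # Assumed Lakehouse SQL syntax; verify before running against real data.
    statement = f"REMOVE USER VOLUME SUBDIRECTORY '{subdirectory}'"
    return session.sql(statement).collect()
```

This deletes files permanently, so it is best suited to a scratch or demo user.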
List Files in USER VOLUME
Confirm that the USER VOLUME is empty:
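A possible check, assuming the listing is available through a `LIST USER VOLUME` statement (an assumption about the SQL syntax) and that the result supports `collect()`. The helper name is hypothetical.

```python
def user_volume_is_empty(session):
    """Return True when the listing of the USER VOLUME contains no rows."""
    # Assumed Lakehouse SQL syntax for listing files in the user volume.
    rows = session.sql("LIST USER VOLUME").collect()
    return len(rows) == 0
```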
Upload Files to USER VOLUME
Upload local files to different directories in the user volume based on file type:
Iterate through the local directory and upload files:
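The routing by file type can be sketched as below. The extension-to-directory mapping, the `volume:user://~/` URI prefix, and the `session.file.put(local_path, target)` call are assumptions modeled on a Snowpark-style file API; the helper names are hypothetical.

```python
from pathlib import Path

# Assumed URI prefix for the current user's volume.
USER_VOLUME_ROOT = "volume:user://~/"

# Illustrative mapping from file extension to target directory.
TARGET_DIRS = {
    ".png": "image/", ".jpg": "image/", ".jpeg": "image/",
    ".csv": "csv/", ".json": "json/", ".pdf": "pdf/",
}
DEFAULT_DIR = "other/"


def plan_uploads(local_dir):
    """Return (local_path, volume_target) pairs for each file in local_dir."""
    plan = []
    for path in sorted(Path(local_dir).iterdir()):
        if path.is_file():  # skip subdirectories
            subdir = TARGET_DIRS.get(path.suffix.lower(), DEFAULT_DIR)
            plan.append((str(path), USER_VOLUME_ROOT + subdir))
    return plan


def upload_all(session, local_dir):
    """Upload every planned file; assumes a Snowpark-style session.file.put."""
    for local_path, target in plan_uploads(local_dir):
        session.file.put(local_path, target)
```

Separating the plan from the upload makes it possible to print and review the targets before any file is transferred.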
Verify Upload Results
Confirm that the files were uploaded successfully:
View and Download Files
Download Image Files
Download images from the user volume and display them:
Download the image to a local directory:
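A minimal sketch of the download step, assuming a Snowpark-style `session.file.get(source, local_dir)` call and the `volume:user://~/` URI prefix. Both are assumptions to verify against the Zettapark documentation; the helper name is hypothetical.

```python
from pathlib import Path, PurePosixPath

# Assumed URI prefix for the current user's volume.
USER_VOLUME_ROOT = "volume:user://~/"


def download_from_user_volume(session, volume_path, local_dir="tmp"):
    """Download one file from the user volume and return its expected local path."""
    target_dir = Path(local_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: session.file.get takes a volume URI and a local directory.
    session.file.get(USER_VOLUME_ROOT + volume_path.lstrip("/"), str(target_dir))
    return target_dir / PurePosixPath(volume_path).name
```

For example, `download_from_user_volume(session, "image/example.png")`, where the file name is a placeholder.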
Open and display the image:
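One way to open the downloaded file is with Pillow. This sketch assumes Pillow is installed; `open_image` is a hypothetical helper that fails early if the download did not produce a file.

```python
from pathlib import Path


def open_image(path):
    """Open a downloaded image with Pillow after checking that the file exists."""
    image_path = Path(path)
    if not image_path.is_file():
        raise FileNotFoundError(f"image not found: {image_path}")
    # Imported lazily so the existence check works even without Pillow installed.
    from PIL import Image

    return Image.open(image_path)
```

In a Jupyter notebook, leaving the returned image as the last expression of a cell renders it inline; in a plain script, calling `.show()` on it opens the system image viewer.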
Close the Session
After completing operations, close the session to release resources:
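To make sure the session is closed even when an operation fails, the work can be wrapped with `contextlib.closing`. This sketch assumes only that the session object has a `close()` method; `run_with_session` is a hypothetical helper.

```python
from contextlib import closing


def run_with_session(session, work):
    """Run work(session) and always call session.close() afterwards."""
    with closing(session) as active:
        return work(active)
```

For a single notebook run, calling `session.close()` in the final cell is equivalent; the wrapper is mainly useful in scripts where an exception could skip that call.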
Summary
Through this guide, you have learned how to:
- Manage files in the USER VOLUME using the Zettapark Python API.
- Upload, download, and list files in the data lake.
- View image files and verify operation results.
Next Steps:
- Explore Volume usage to achieve seamless integration of files and tables.
- Try running federated queries on files and tables to experience the advantages of a unified Lakehouse.
