Managing Files on Datalake Volume with Zettapark
A Guide to File Operations in Singdata Lakehouse
1. Overview
Singdata Lakehouse provides unified management of data lake files and data warehouse tables through its abstract storage layer (Volume, Schema and Table) and Python API. This guide demonstrates how to perform file management operations in the data lake, including uploading (PUT), downloading (GET), and listing (LIST) files.
Key Concepts:
- Volume Storage Abstraction: All data lake storage is mapped to Volume objects.
- External Volume: Managed by customers, supporting integration with cloud storage like AWS S3 and Alibaba Cloud OSS.
- Internal Volume: Managed by Singdata, divided into USER VOLUME and TABLE VOLUME.
- Zettapark Python API: Provides a unified interface for seamless integration of files and tables.
You can get the source code (a Jupyter Notebook .ipynb file) from the GitHub repository.
2. Environment Setup
1. Install Dependencies
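Before going further, it can help to confirm that the Zettapark package is visible to the current Python environment. The sketch below only checks importability; the module name `clickzetta.zettapark` is an assumption and should be adjusted to match your installation.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located in the current environment."""
    try:
        # For dotted names, parent packages are imported during the lookup.
        return importlib.util.find_spec(module_name) is not None
    except (ModuleNotFoundError, ValueError):
        # Raised when a parent package of a dotted name is missing.
        return False


# Assumed module name for the Zettapark API; adjust to your installation.
ZETTAPARK_MODULE = "clickzetta.zettapark"

if not is_importable(ZETTAPARK_MODULE):
    print(f"{ZETTAPARK_MODULE} not found; install the Zettapark package with pip first.")
```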
2. Import Libraries and Create a Session
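A minimal sketch of session creation follows. It assumes that connection settings are read from environment variables, that the parameter names are `username`, `password`, `service`, `instance`, `workspace`, `schema`, and `vcluster`, and that Zettapark exposes a Snowpark-style `Session.builder`. All of these are assumptions to verify against your account and the Zettapark documentation; the `LAKEHOUSE_` prefix and both helper functions are hypothetical.

```python
import os

# Assumed parameter names; confirm them against your Singdata Lakehouse account.
REQUIRED_KEYS = ("username", "password", "service", "instance",
                 "workspace", "schema", "vcluster")


def load_connection_parameters(env=None, prefix="LAKEHOUSE_"):
    """Build a connection dict from variables such as LAKEHOUSE_USERNAME."""
    env = os.environ if env is None else env
    params = {key: env.get(prefix + key.upper(), "") for key in REQUIRED_KEYS}
    missing = [key for key, value in params.items() if not value]
    if missing:
        raise ValueError(f"missing connection parameters: {', '.join(missing)}")
    return params


def create_session(params):
    """Create a Zettapark session (assumed Snowpark-style builder API)."""
    from clickzetta.zettapark.session import Session  # imported lazily
    return Session.builder.configs(params).create()
```

Keeping credentials in the environment rather than in the notebook avoids committing secrets alongside the code.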
3. File Operations
1. Clean Up USER VOLUME
Before starting, clean up the USER VOLUME to ensure a clean environment:
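One way to structure the cleanup is sketched below. The helper `clean_volume` is hypothetical: it takes two caller-supplied callables, one that lists the paths currently in USER VOLUME and one that removes a single path, so the sketch makes no assumption about the exact Zettapark call or SQL statement used for either. An in-memory set stands in for the volume here.

```python
from typing import Callable, Iterable, List


def clean_volume(list_files: Callable[[], Iterable[str]],
                 remove_file: Callable[[str], None]) -> List[str]:
    """Remove every listed file and return the paths that were removed."""
    removed = []
    # Take a snapshot first so removals cannot disturb the iteration.
    for path in list(list_files()):
        remove_file(path)
        removed.append(path)
    leftover = list(list_files())
    if leftover:
        raise RuntimeError(f"volume not empty after cleanup: {leftover}")
    return removed


# In-memory stand-in for USER VOLUME, used only to exercise the helper.
fake_volume = {"image/a.png", "data/b.csv"}
removed = clean_volume(lambda: sorted(fake_volume), fake_volume.discard)
```

In a real notebook the two callables would wrap whatever list and remove operations your session provides.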
2. List Files in USER VOLUME
Confirm that the USER VOLUME is empty:
3. Upload Files to USER VOLUME
Upload local files to different directories in USER VOLUME based on their types:
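The routing of files to directories by type can be sketched as follows. The extension-to-directory mapping, the directory names, and both helper functions are hypothetical, and local paths are assumed to be POSIX-style. The actual upload is delegated to a caller-supplied `put(local_path, volume_path)` callable, since the exact Zettapark upload call is not shown here.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a directory in USER VOLUME.
DIRECTORY_BY_EXTENSION = {
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".csv": "data", ".json": "data", ".parquet": "data",
    ".pdf": "document", ".txt": "document",
}


def plan_uploads(local_paths, default_dir="other"):
    """Return (local_path, volume_path) pairs, routed by file extension."""
    plan = []
    for local in local_paths:
        path = PurePosixPath(local)
        # Extensions are matched case-insensitively; unknown ones use default_dir.
        target_dir = DIRECTORY_BY_EXTENSION.get(path.suffix.lower(), default_dir)
        plan.append((local, f"{target_dir}/{path.name}"))
    return plan


def upload_all(plan, put):
    """Call put(local_path, volume_path) for each planned upload."""
    for local, remote in plan:
        put(local, remote)
```

Building the plan separately from the upload makes it easy to print and review the destinations before any file is transferred.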
4. Verify Upload Results
Confirm that the files have been successfully uploaded:
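The check itself amounts to comparing the paths you expected to upload with the paths a listing returns. A minimal sketch, assuming both are available as plain strings relative to the volume root (the helper name `verify_upload` is hypothetical):

```python
def verify_upload(expected, listed):
    """Compare expected volume paths with the paths returned by a listing."""
    expected_set, listed_set = set(expected), set(listed)
    return {
        # Paths that should be in the volume but were not listed.
        "missing": sorted(expected_set - listed_set),
        # Paths that were listed but were not part of this upload.
        "unexpected": sorted(listed_set - expected_set),
    }


report = verify_upload(
    expected=["image/cat.png", "data/sales.csv"],
    listed=["image/cat.png", "data/old.csv"],
)
```

An upload is complete when `report["missing"]` is empty; entries under `"unexpected"` usually point to files left over from an earlier run.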
4. Viewing and Downloading Files
1. Download an Image File
Download an image from USER VOLUME and display it:
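Once the file has been downloaded to a local path, it is worth confirming that it really is an image before displaying it. The sketch below checks the file's leading bytes against a few well-known image signatures and then displays it with `IPython.display`, which is assumed to be available because the guide runs in a Jupyter notebook. Both helper functions are hypothetical, and the download step itself is not shown.

```python
from pathlib import Path

# Leading-byte signatures for a few common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def detect_image_type(path):
    """Return 'png', 'jpeg', or 'gif' based on the file header, else None."""
    header = Path(path).read_bytes()[:8]
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None


def show_image(path):
    """Display a downloaded image inline in a Jupyter notebook."""
    from IPython.display import Image, display  # imported lazily
    if detect_image_type(path) is None:
        raise ValueError(f"{path} does not look like a PNG, JPEG, or GIF file")
    display(Image(filename=str(path)))
```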
5. Close the Session
After completing the operations, close the session to release resources:
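To make sure the session is closed even when an earlier step raises an exception, the work can be wrapped in a context manager. This sketch assumes only that the session object exposes a `close()` method; `run_with_session` and its two callable arguments are hypothetical.

```python
from contextlib import closing


def run_with_session(create_session, work):
    """Create a session, run work(session), and always close the session."""
    # closing() calls session.close() on exit, whether or not work() raises.
    with closing(create_session()) as session:
        return work(session)
```

For example, `run_with_session(make_session, upload_and_verify)` would release the session's resources whether or not `upload_and_verify` succeeds, where both names stand for functions you define.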
6. Summary
Through this guide, you have learned how to:
- Use the Python API to manage files in USER VOLUME.
- Upload, download, and list files in the data lake.
- View image files and verify operation results.
Next Steps:
- Explore the use of VOLUME for seamless integration of files and tables.
- Try running joint queries on files and tables to experience the benefits of a unified Lakehouse.
Appendix: