Data Lake Storage Management: Volume
Overview
Lakehouse Volume is an object in Singdata Lakehouse that represents a cloud object storage location. It provides access to, management of, and organization for files in cloud object storage, and can be used to store and access files of various formats, including structured, semi-structured, and unstructured data. Like tables, views, and other objects, a Volume is organized and managed under a Lakehouse Schema. Using the Volume feature brings the following benefits:
- Unified Data Analysis: Supports calling AI workloads in Singdata Lakehouse to process images, PDFs, and specially formatted unstructured data in object storage, and perform unified processing and analysis with structured data in the platform.
- Unified Permission Management: Supports using the permission system of Singdata Lakehouse platform to manage permissions for files in object storage as well as for databases and tables.
- Unified Data Governance: Data in object storage will be uniformly managed and governed by the Singdata Lakehouse platform.
Lakehouse Volumes fall into two types according to where the data is stored, Internal Volume and External Volume:
Feature | External Volume | Internal Volume |
---|---|---|
Storage Location | An external storage location specified by the customer; Singdata retains only the path metadata. Supported storage products: Alibaba Cloud OSS, Tencent Cloud COS, Amazon S3, Google Cloud GCS. | Internal storage within the Singdata account, stored together with internal table objects under the specified Schema path. |
Usage Scenarios | When you already use an object storage service to store and manage data, you can mount that storage as an External Volume in Singdata Lakehouse, treating the object storage as a data lake and sharing the existing data. Use an External Volume as a staging area for loading or exporting: Lakehouse can import data from, or export data to, an External Volume via the COPY command. Process and analyze an External Volume directly as a raw layer: use the Singdata Lakehouse engine to run SQL queries and AI analysis on the files (e.g., using External Function). By creating different External Volumes and granting permissions to different users or roles, you can give users uniform access to the data lake through Lakehouse, achieving flexible data lake management and access control (see the SQL sketch after this table). | No additional creation is required; Internal Volume provides the following two predefined Volume objects by default. Table Volume: the file storage area associated with a data table, with permissions consistent with the table's permissions. Table Volume is often used to simplify batch import and export: as long as a user has read and write permissions on the target table, they can exchange files in the table's Table Volume directory directly via PUT/GET, which simplifies the Volume permission requirements in batch import/export scenarios. Table Volume is often used as temporary storage for structured file data, so clean up files that are no longer needed after use. User Volume: the file storage area associated with the user account; the Workspace user has management permissions on this area by default. Each Workspace has a default User Volume to which users can upload files of all kinds, including structured data, unstructured data, and resource packages that UDFs depend on. The current user can access and use the data in the User Volume through the Lakehouse engine by default. Table Volume and User Volume have default permissions and do not support additional authorization. |
Operation Management | Create an External Volume; upload local files via the PUT command; download files to local via the GET command; view the file list under a specified Volume; delete files under a specified Volume path; import and export data using the COPY INTO command; query file data using SQL; use the get_presigned_url function to obtain a file access URL; drop an External Volume. | Upload local files via the PUT command; delete files under a specified Volume path. |
Storage Fees | Not included in Lakehouse storage billing. | Billed like internal table storage; included in Lakehouse storage billing. |
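The following is a minimal SQL sketch of the workflows described in the table. The Volume name my_external_vol, the connection my_oss_conn, the OSS path, and the table sales are hypothetical placeholders, and the exact clauses of CREATE EXTERNAL VOLUME, COPY INTO, PUT, and GET may differ by cloud provider and Lakehouse version (see the provider-specific creation guides under Related Links).

```sql
-- Illustrative sketch; names, paths, and option clauses are placeholders.

-- Mount an existing object storage path as an External Volume.
CREATE EXTERNAL VOLUME my_external_vol
  LOCATION 'oss://my-bucket/raw/'
  USING CONNECTION my_oss_conn;

-- Staging: import files from the External Volume into a table,
-- or export a query result back to the External Volume.
COPY INTO sales FROM VOLUME my_external_vol;
COPY INTO VOLUME my_external_vol FROM (SELECT * FROM sales WHERE dt = '2024-01-01');

-- Internal Volumes: exchange local files via PUT/GET. Table Volume needs only the
-- table's read/write permissions; User Volume is accessible to the current user.
PUT '/tmp/sales.csv' TO TABLE VOLUME sales;
GET USER VOLUME FILE 'udf_package.zip' TO '/tmp/';
```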
Related Links
- Alibaba Cloud OSS VOLUME Creation
- Tencent Cloud COS VOLUME Creation
- Amazon Cloud S3 VOLUME Creation
- Import Data from VOLUME to Table
DDL Operations
Command | Description | Applicable to USER Volume | Applicable to Table Volume | Applicable to External Volume |
---|---|---|---|---|
CREATE VOLUME | Create an internal or external Volume. | No | No | Yes |
DROP VOLUME | Remove an internal or external Volume. | No | No | Yes |
DESC VOLUME | Show properties of an internal or external Volume. | No | No | Yes |
SHOW VOLUME DIRECTORY | Return a list of files saved in the Volume. | Yes | Yes | Yes |
REMOVE | Remove saved files from the Volume. | Yes | Yes | Yes |
SHOW VOLUMES | Return a list of created internal and external Volumes. | No | No | Yes |
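A brief sketch of the DDL and file-management commands listed above. The Volume and file names are placeholders, and the exact argument form of SHOW VOLUME DIRECTORY and REMOVE is an assumption to be checked against the command reference.

```sql
-- Illustrative only; object and file names are placeholders.
SHOW VOLUMES;                               -- list created Volumes
DESC VOLUME my_external_vol;                -- show the Volume's properties
SHOW VOLUME DIRECTORY my_external_vol;      -- list files saved in the Volume
REMOVE VOLUME my_external_vol FILE 'export/old_run.csv';  -- delete a saved file (assumed syntax)
DROP VOLUME my_external_vol;                -- remove the Volume object
```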
Permissions
Permission | Description |
---|---|
READ METADATA | Permission to view Volume object metadata. |
READ VOLUME | Permission to read files and directories under the Volume object. Required when viewing the file list under the Volume, reading Volume files via SQL, and downloading files via the GET command. |
WRITE VOLUME | Permission to write data to the Volume. Required when uploading files via the PUT command and deleting files via the REMOVE command. |
ALTER VOLUME | Permission required for the ALTER VOLUME command. For example: ALTER VOLUME <name> REFRESH to refresh the file metadata under the Volume. |
ALL | All permissions for the Volume object. |
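A hedged sketch of granting the permissions above on a Volume to a role. The GRANT ... ON VOLUME ... TO ROLE pattern, the role analyst_role, and the Volume name are assumptions; consult the platform's permission reference for the exact syntax.

```sql
-- Assumed GRANT syntax; role and Volume names are placeholders.
GRANT READ VOLUME  ON VOLUME my_external_vol TO ROLE analyst_role;  -- list, read, and GET files
GRANT WRITE VOLUME ON VOLUME my_external_vol TO ROLE analyst_role;  -- PUT and REMOVE files
GRANT ALTER VOLUME ON VOLUME my_external_vol TO ROLE analyst_role;  -- e.g., ALTER VOLUME ... REFRESH
-- Or grant all Volume permissions at once:
GRANT ALL ON VOLUME my_external_vol TO ROLE analyst_role;
```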
Cost
- External Volume: No additional storage fees on the Lakehouse side.
- Internal Volume: Storage fees are charged based on the actual storage size.
Constraints and Limitations
- The size of a single uploaded file must not exceed 5 GB.
- The JDBC driver must be version 1.4.4 or later to use the local PUT/GET interfaces.