Introduction to External Tables
Function Overview: External tables are a feature of Lakehouse that allows you to query data stored in external systems as if the data were stored inside the Lakehouse. This external data is not stored or managed within the Lakehouse.
Performance Tip: Since the data in external tables is stored outside the Lakehouse, querying external tables may be slightly slower than querying tables stored locally. To improve query efficiency, it is recommended to use the INSERT INTO internal_table_name SELECT * FROM external_table_name
statement or the COPY INTO
command to import data into the Lakehouse.
Supported External Table Data Sources
- kafka external table
- delta lake external table
- hudi external table
Differences Between External Tables and External Schema
- External Schema: Directly interacts with the Hive Metastore Service (HMS) and directly obtains Hive metadata information through the HMS interface.
- External Tables: Allow users to create tables by specifying column contents and table names.
- External Schema Limitations: Since it directly maps to HMS, it is not possible to directly create, delete, or rename tables under an External Schema.
- Advantages of External Tables: Support operations such as renaming and modifying comments because they are created under an internal schema.
Specific Cases
Connecting to Alibaba Cloud Oss
Connect to Google Gcs
When Lakehouse connects to Google Cloud Storage (GCS), it uses a service account key for authentication. Please follow the steps below:
-
Obtain the service account key:
- Log in to the Google Cloud Console.
- Follow the instructions in the Google Cloud documentation to create and download the JSON key file for the service account.
-
Configure the `` parameter:
- Open the downloaded JSON key file and copy the entire content of the private key.
-
Note:
- When configuring the
private_key
, add an 'r' to private_key to indicate that escape characters will not be escaped.
- When configuring the
External Table Billing
- Storage Fees: External tables do not incur storage fees because the data is not stored in the Lakehouse.
- Compute Fees: Using external tables for computation will consume compute resources, thus incurring compute fees.
External Table Permissions
External tables have the same permission points as internal tables. Operations such as insert\update\truncate\delete\undrop cannot be performed externally, so there are no corresponding operations.
- Create Permission: Requires permission to create tables.
- Delete Permission: Requires permission to drop tables.
- Read Permission: Requires permission to select.
Usage Notes
- Connection Configuration: When creating a connection, ensure the endpoint is configured correctly so that the Lakehouse can connect successfully. If the Lakehouse and the object storage are in the same cloud service and in the same region, usually using the internal network address can achieve network connectivity. If they are not in the same network environment, it is recommended to use the public address of the object storage.