Lakehouse External Table Query Guide
Overview
External Tables allow Lakehouse to directly query data stored in external systems (such as Delta Lake, Hudi, Kafka) without importing the data into Lakehouse. External tables manage only metadata; the actual data files remain in external storage. This guide is organized by business scenario to help you quickly master external table creation and query methods.
Quick Navigation
- Create Delta External Table -- Directly query Delta Lake data
- Create Hudi External Table -- Directly query Hudi data
- Create Kafka External Table -- Real-time query of Kafka topics
- Query External Tables -- Use standard SQL to query external data
- Drop External Tables -- Clean up external table metadata
SQL Commands Covered
| Command | Purpose | Use Case |
|---|---|---|
CREATE EXTERNAL TABLE ... USING DELTA | Create Delta external table | Query Delta Lake format data |
CREATE EXTERNAL TABLE ... USING HUDI | Create Hudi external table | Query Apache Hudi format data |
CREATE EXTERNAL TABLE ... USING KAFKA | Create Kafka external table | Real-time query of Kafka message streams |
SELECT * FROM external_table | Query external table | Cross-system federated query |
DROP EXTERNAL TABLE | Drop external table | Clean up metadata (does not affect external data) |
Prerequisites
The following examples assume external data is already stored in object storage or a Kafka cluster:
Create Delta External Table
Use CREATE EXTERNAL TABLE with USING DELTA to mount a Delta Lake table.
Parameter Descriptions:
USING DELTA: Specifies the external data format as Delta Lake.LOCATION: Points to the Volume path where the Delta files are stored.
Create Hudi External Table
Use USING HUDI to mount an Apache Hudi table.
Use Cases:
- Real-time data lake architectures where data is continuously written to Hudi by Flink/Spark.
- External data sources that require Upsert and incremental query support.
Create Kafka External Table
Use USING KAFKA to map a Kafka topic as an external table, supporting real-time SQL queries.
Query External Tables
Once created, external tables can be queried using standard SQL with exactly the same syntax as querying local tables.
Performance Notes:
- When querying external tables, Lakehouse reads data files directly from external storage; network latency may be higher than with local tables.
- For frequently queried data on external tables, consider using
COPY INTOto import it into local tables.
Drop External Tables
Use DROP EXTERNAL TABLE to remove external table metadata.
Clean Up Test Data
After completing external table verification, it is recommended to clean up metadata:
Notes
- Read-only Access: External tables only support queries, not write operations such as
INSERT,UPDATE,DELETE. - Schema Synchronization: External table schemas are managed by external systems. If the external table structure changes, you need to execute
REFRESHor recreate the external table. - Network Connectivity: Querying external tables requires network connectivity between Lakehouse and the external storage (OSS/S3/Kafka).
- Format Support: Currently supports mainstream external formats such as Delta Lake, Hudi, and Kafka. For Parquet/ORC files, consider querying directly via Volume or importing with
COPY INTO. - Permission Requirements: Creating external tables requires read permission on the underlying Volume or Kafka.
