Lakehouse Federated Query Guide (External Catalog)
Overview
Federated Queries allow Lakehouse to directly query data in external data catalogs (such as Hive Metastore, Databricks Unity Catalog) without migrating or copying data. Using CREATE EXTERNAL CATALOG, you can map external data sources as Catalogs within Lakehouse and use standard SQL for cross-system queries. This guide is organized by business scenario to help you quickly master federated query configuration methods.
Quick Navigation
- Create Hive External Catalog -- Mount a Hive Metastore
- Query External Catalog Data -- Use three-level namespace queries
- View External Schemas and Tables -- Explore external catalog structure
- Drop External Catalog -- Clean up the federation connection
SQL Commands Covered
| Command | Purpose | Use Case |
|---|---|---|
CREATE EXTERNAL CATALOG | Create an external catalog | Mount external catalogs such as Hive/Databricks |
SHOW CATALOGS | View all Catalogs | Confirm external Catalog registration |
SHOW SCHEMAS IN ext_catalog | View external Schemas | Explore external database structure |
SHOW TABLES IN ext_catalog.schema | View external tables | Explore external table lists |
SELECT * FROM ext_catalog.schema.table | Query external tables | Cross-system federated query |
Prerequisites
The following examples assume that the corresponding Storage Connection and Catalog Connection have been created:
Create Hive External Catalog
Use CREATE EXTERNAL CATALOG with a Catalog Connection to mount a Hive data catalog.
Parameter Descriptions:
hive_prod: The name of the external Catalog in Lakehouse.hive_catalog_conn: A pre-created Catalog Connection containing HMS address and authentication information.
Query External Catalog Data
Use the three-level namespace (catalog.schema.table) to directly query external data.
Execution Notes:
- Lakehouse requests metadata from the external Catalog and reads data files directly from external storage (such as OSS/S3).
- Query syntax is identical to local tables, supporting JOIN, aggregation, window functions, etc.
View External Schemas and Tables
Use SHOW commands to explore the structure of external catalogs.
Returned Information:
| workspace_name | created_time | category |
|---|---|---|
| databricks_main_catalog | 2025-11-20 12:00:49 | EXTERNAL |
| quick_start | 2025-01-15 10:27:21 | MANAGED |
| ... | ... | ... |
Drop External Catalog
Use DROP CATALOG to remove an external catalog mapping.
Clean Up Test Data
After completing federated query verification, it is recommended to clean up the external Catalog mapping:
Notes
- Read-only Access: External Catalogs only support
SELECTqueries, not DML operations (INSERT/UPDATE/DELETE). - Network Latency: Querying external data requires cross-network file reads; performance is typically lower than local tables. Consider syncing frequently queried data to Lakehouse.
- Authentication Expiry: External Catalogs depend on the Catalog Connection's authentication information (such as OAuth Tokens). If authentication expires,
Unauthenticated: invalid_clienterrors will occur; recreate the Connection. - Schema Synchronization: External table structure changes are not automatically synced to Lakehouse. To obtain the latest schema, recreate the External Catalog or refresh metadata.
- Data Type Mapping: External system data types are mapped to Lakehouse equivalent types; some complex types may not be supported.
- Permission Requirements: Creating an External Catalog requires the
instance_adminrole; querying requiresUSAGEpermission on the Catalog.
