Lakehouse Federated Query Guide (External Catalog)

Overview

Federated Queries allow Lakehouse to directly query data in external data catalogs (such as Hive Metastore, Databricks Unity Catalog) without migrating or copying data. Using CREATE EXTERNAL CATALOG, you can map external data sources as Catalogs within Lakehouse and use standard SQL for cross-system queries. This guide is organized by business scenario to help you quickly master federated query configuration methods.


SQL Commands Covered

CommandPurposeUse Case
CREATE EXTERNAL CATALOGCreate an external catalogMount external catalogs such as Hive/Databricks
SHOW CATALOGSView all CatalogsConfirm external Catalog registration
SHOW SCHEMAS IN ext_catalogView external SchemasExplore external database structure
SHOW TABLES IN ext_catalog.schemaView external tablesExplore external table lists
SELECT * FROM ext_catalog.schema.tableQuery external tablesCross-system federated query

Prerequisites

The following examples assume that the corresponding Storage Connection and Catalog Connection have been created:

-- Ensure a Storage Connection has been created (pointing to external storage)
-- Ensure a Catalog Connection has been created (pointing to Hive Metastore or Databricks)

Create Hive External Catalog

Use CREATE EXTERNAL CATALOG with a Catalog Connection to mount a Hive data catalog.

-- Create a Hive External Catalog
CREATE EXTERNAL CATALOG hive_prod
CONNECTION hive_catalog_conn;

Parameter Descriptions:

  • hive_prod: The name of the external Catalog in Lakehouse.
  • hive_catalog_conn: A pre-created Catalog Connection containing HMS address and authentication information.

Query External Catalog Data

Use the three-level namespace (catalog.schema.table) to directly query external data.

-- Query an external table
SELECT customer_id, SUM(amount) as total_spent
FROM databricks_main_catalog.sales_db.customer_orders
GROUP BY customer_id
LIMIT 10;

Execution Notes:

  • Lakehouse requests metadata from the external Catalog and reads data files directly from external storage (such as OSS/S3).
  • Query syntax is identical to local tables, supporting JOIN, aggregation, window functions, etc.

View External Schemas and Tables

Use SHOW commands to explore the structure of external catalogs.

-- View all Catalogs
SHOW CATALOGS;

Returned Information:

workspace_namecreated_timecategory
databricks_main_catalog2025-11-20 12:00:49EXTERNAL
quick_start2025-01-15 10:27:21MANAGED
.........
-- View Schemas under an external Catalog
SHOW SCHEMAS IN databricks_main_catalog;

-- View tables under an external Schema
SHOW TABLES IN databricks_main_catalog.default;

Drop External Catalog

Use DROP CATALOG to remove an external catalog mapping.

-- Drop an external Catalog
DROP CATALOG IF EXISTS hive_prod;

Clean Up Test Data

After completing federated query verification, it is recommended to clean up the external Catalog mapping:

-- Drop external Catalog
DROP CATALOG IF EXISTS hive_prod;

Notes

  1. Read-only Access: External Catalogs only support SELECT queries, not DML operations (INSERT/UPDATE/DELETE).
  2. Network Latency: Querying external data requires cross-network file reads; performance is typically lower than local tables. Consider syncing frequently queried data to Lakehouse.
  3. Authentication Expiry: External Catalogs depend on the Catalog Connection's authentication information (such as OAuth Tokens). If authentication expires, Unauthenticated: invalid_client errors will occur; recreate the Connection.
  4. Schema Synchronization: External table structure changes are not automatically synced to Lakehouse. To obtain the latest schema, recreate the External Catalog or refresh metadata.
  5. Data Type Mapping: External system data types are mapped to Lakehouse equivalent types; some complex types may not be supported.
  6. Permission Requirements: Creating an External Catalog requires the instance_admin role; querying requires USAGE permission on the Catalog.