Federation Query

Federation Query lets you query data in external data systems (Hive, Databricks, Iceberg, Snowflake, etc.) directly using standard SQL — no data migration or copying required. By creating an EXTERNAL CATALOG, you map an external data catalog into Lakehouse for unified cross-system queries.


Supported External Data Sources

External SystemConnection MethodTypical Use Case
Apache HiveHive Metastore URIsIn-place acceleration of existing Hive warehouses, replacing Presto/Trino
Databricks Unity CatalogDatabricks APICross-platform federated analytics without moving Databricks data
Iceberg REST CatalogIceberg REST APIQuery any data lake compatible with the Iceberg REST protocol
Snowflake Open CatalogIceberg REST API + OAuthAccess Iceberg tables managed by Snowflake

Core Concepts

There are three independent approaches to access external data, each suited to different scenarios:

External Catalog (Recommended)

Maps an external data system (Hive/Databricks/Snowflake/Iceberg REST) as a top-level catalog; the Schemas and Tables underneath automatically correspond to the external system's structure:

External Catalog ← Top-level catalog, maps the external system └── External Schema ← Corresponds to a Schema/Database in the external system └── External Table ← Corresponds to an actual table in the external system

Queries use three-level naming: external_catalog.schema.table

SELECT * FROM databricks_catalog.table_types_demo.orders_external;

Supports: Hive, Databricks Unity Catalog, Iceberg REST (including Snowflake Open Catalog)

External Schema (Standalone)

Without going through an External Catalog, directly mounts an external Hive Database into the current Workspace's internal Catalog, using two-level naming schema.table (equivalent to <current internal catalog>.schema.table). Direct HMS mapping — all tables under the entire Database are immediately queryable, and newly added tables are automatically visible without per-table definitions.

SELECT * FROM hive_orders.order_detail LIMIT 100;

Supports: Hive (OSS/COS/GCS/HDFS). Read-only; DML is not supported.

External Table (Standalone)

Creates a single table pointing to external storage under an ordinary Schema in the current Workspace's internal Catalog, using two-level naming schema.table. Unlike External Schema: column names and types can be customized, and renaming and modifying comments are supported.

CREATE EXTERNAL TABLE my_schema.delta_orders LOCATION 's3://bucket/orders/' USING DELTA;

Supports: Kafka, Delta Lake, Hudi. Read-only; DML is not supported.

Comparison of the Three Approaches

External CatalogExternal Schema (Standalone)External Table (Standalone)
Catalog locationIndependent external CatalogCurrent Workspace's internal CatalogCurrent Workspace's internal Catalog
NamingThree-level catalog.schema.tableTwo-level schema.tableTwo-level schema.table
Use caseCross-platform federated analyticsMount an entire Hive database into the workspaceCustom mapping for a single external table
Supported sourcesHive, Databricks, Iceberg REST, SnowflakeHiveKafka, Delta Lake, Hudi
Schema definitionAuto-mapped from external systemAuto-mapped from HMSManually define column names and types
New external tables visibleRequires re-mappingAutomatically visibleMust be created one by one

Selection guide: External Catalog vs External Schema


Quick Start

External Catalog depends on a pre-created Catalog Connection (Storage Connection → Catalog Connection → External Catalog). For the complete configuration steps, see the External Object User Guide.

Once configured, use standard SQL to query:

-- 1. Create an External Catalog (requires a Catalog Connection first) CREATE EXTERNAL CATALOG hive_prod CONNECTION hive_catalog_conn; -- 2. View all Catalogs (including external ones) SHOW CATALOGS; -- category = 'EXTERNAL' indicates a federated catalog -- 3. View Schemas and tables in the external Catalog SHOW SCHEMAS IN hive_prod; SHOW TABLES IN hive_prod.default; -- 4. Query directly without migrating data SELECT * FROM hive_prod.default.orders LIMIT 100; -- 5. Cross-system JOIN SELECT o.order_id, u.name FROM hive_prod.default.orders o JOIN my_lakehouse_table u ON o.user_id = u.id;


This Section

PageDescription
External Object User GuideComplete operations for creating, querying, and managing External Catalog / Schema / Table
Query Snowflake OpenCatalog Iceberg TablesFederated queries on Snowflake-managed Iceberg data via Iceberg REST API
Databricks Unity Catalog Federation Query PracticeFull step-by-step setup guide with verified results and common error troubleshooting