External Catalog Federation Query
External Catalog is Singdata Lakehouse's federation query feature, allowing you to directly query data in external data sources (Hive, Databricks, Snowflake Iceberg, etc.) within the Lakehouse without copying data.
What Is an External Catalog
An External Catalog is a securable object in the Lakehouse that maps the database structure of an external data system. Through an External Catalog, users can:
- Execute read-only queries in the Lakehouse to access external data
- Import external data into the Lakehouse to build a unified data lake
- Centrally manage metadata from multiple data sources
Key Features:
- Read-Only Access: External Catalog only supports queries; external data cannot be modified
- No Copying Required: Data stays in the external system; the Lakehouse only reads metadata and data files
- Unified Entry Point: Query external data through standard SQL as if querying local tables
Architecture
Comparison with Data Synchronization
| Dimension | External Catalog | Data Synchronization |
|---|---|---|
| Data Location | Remains in the external system | Copied to the Lakehouse |
| Query Latency | Depends on network and data source performance | Local query, better performance |
| Storage Cost | Does not consume Lakehouse storage | Consumes Lakehouse storage |
| Data Consistency | Real-time (directly reads external data) | Has latency (depends on sync frequency) |
| Applicable Scenarios | Ad-hoc query, data exploration, federation analysis | Frequent queries, data processing, warehouse construction |
Supported Data Sources
| Data Source | Connection Method | Description |
|---|---|---|
| Apache Hive | Hive Metastore URIs | Access Hive tables via Hive Metastore |
| Databricks Unity Catalog | Databricks API | Access tables in Databricks |
| Iceberg REST Catalog | Iceberg REST Catalog API | Connect to any external catalog that conforms to the Iceberg REST Catalog standard |
| Snowflake Open Catalog | Iceberg REST Catalog + OAuth | Access Iceberg tables in Snowflake |
General Iceberg REST Catalog Description
The Lakehouse supports connecting any external catalog service conforming to this standard via the Iceberg REST Catalog protocol, not limited to Snowflake. Common application scenarios include:
- Connecting to self-built Iceberg REST Catalog services
- Connecting to third-party data platforms' Iceberg Catalogs
- Connecting to other services supporting the Iceberg REST API, such as Dremio, Polaris, Unity Catalog, etc.
Connection Architecture:
Connection Steps:
- Create a Storage Connection (specify the underlying storage type: OSS/S3/HDFS, etc.)
- Create a Catalog Connection (
TYPE ICEBERG_REST, configure URI and authentication information) - Create an External Catalog
Detailed Configuration Guide: Access Snowflake OpenCatalog Iceberg Tables via External Catalog
Snowflake Open Catalog Description
Snowflake Open Catalog is a concrete implementation based on the Iceberg REST Catalog protocol. The Lakehouse connects to Snowflake Open Catalog via the Iceberg REST Catalog protocol to enable cross-platform data federation queries.
Connection Architecture:
Usage Restrictions:
- Only read-only queries are supported; write and update operations are not supported
- External table names must exactly match the source table names in Snowflake (lowercase)
- Credential Vending must be enabled on the Snowflake side
Detailed Configuration Guide: Access Snowflake OpenCatalog Iceberg Tables via External Catalog
Key Configuration Parameters:
| Parameter | Description | Example |
|---|---|---|
| TYPE | Connection type, fixed as ICEBERG_REST | ICEBERG_REST |
| URI | Polaris API endpoint or custom REST Catalog address | https://account.snowflakecomputing.com/polaris/api/catalog |
| OAUTH_CLIENT_ID | OAuth client ID | Obtained when creating the Service Connection |
| OAUTH_CLIENT_SECRET | OAuth client secret | Obtained when creating the Service Connection |
| NAMESPACE | Catalog namespace | my_catalog.my_schema |
| WAREHOUSE | Catalog name | my_warehouse |
Usage Restrictions:
- Only read-only queries are supported; write and update operations are not supported
- External table names must exactly match the source table names in Snowflake (lowercase)
- Credential Vending must be enabled on the Snowflake side
Detailed Configuration Guide: Access Snowflake OpenCatalog Iceberg Tables via External Catalog
Core Concepts
Catalog Connection
A Catalog Connection is a securable object for connecting to external data sources, containing:
- The access address of the external data source
- Authentication information (Access Key, Secret Key, OAuth Token, etc.)
- Network configuration (whether PrivateLink is required)
External Catalog
An External Catalog is a catalog object created based on a Catalog Connection, mirroring the database structure of the external data source:
- External Catalog -> External database/namespace
- External Schema -> External Schema
- External Table -> External Table
External Schema
An External Schema is a namespace within an External Catalog, corresponding to a Schema/Database in the external data source.
External Table
An External Table is a table object within an External Schema, corresponding to a table in the external data source. When querying an External Table, the Lakehouse:
- Retrieves the table's metadata (column definitions, storage location, etc.) from the external Metastore
- Reads data files directly from external storage (OSS/S3/HDFS)
- Performs query computation in the Lakehouse engine
Typical Application Scenarios
Scenario 1: Historical Data Query
An enterprise retains historical data in Hive and queries it through an External Catalog:
- No need to import large amounts of historical data into the Lakehouse
- Query on demand, saving storage costs
- Unified SQL interface, no need to learn new tools
Scenario 2: Cross-Platform Data Federation
Simultaneously query local Lakehouse data and data in Databricks:
Scenario 3: Pre-Migration Verification
Before migrating data from an external system to the Lakehouse:
- Verify data completeness through the External Catalog
- Compare data consistency before and after migration
- Confirm data quality before executing the migration
Operation Flow
Permission Management
- Currently, only the
instance_adminrole can query a created External Catalog - Authentication information through the Catalog Connection is required to access the external data source
- The external data source itself also needs corresponding access permissions configured
Notes
- Read-Only Access: External Catalog does not support INSERT/UPDATE/DELETE operations
- Performance Considerations: Querying external data across networks may be slower than local queries
- Authentication Security: Catalog Connection contains sensitive authentication information and must be properly safeguarded
- Network Connectivity: Ensure network connectivity between the Lakehouse and the external data source
Typical Operation Examples
The following examples are based on a real environment execution, using clickzetta_sample_data (SHARED type) and databricks_main_catalog (EXTERNAL type) to demonstrate common operations.
1. View All Catalogs
category field description: MANAGED is a locally managed Catalog, SHARED is a shared dataset, EXTERNAL is an External Catalog.
2. View External Catalog Details
connection_name is the name of the Catalog Connection used by this Catalog, and origin_catalog is the corresponding Catalog name in the external system.
3. Browse Catalog Structure
4. Query Tables in an External Catalog
Use the three-level naming structure catalog_name.schema_name.table_name to directly query external data:
5. Cross-Catalog Federation Join
Join tables from an external Catalog with local Lakehouse tables without copying data in advance:
Cross-platform federation query example (local table JOIN external Catalog table):
