CREATE EXTERNAL SCHEMA
Overview
The CREATE EXTERNAL SCHEMA statement maps an external data source (Hive Metastore or Databricks Unity Catalog) as a schema in Lakehouse, allowing users to query external data directly in Lakehouse without migrating the data.
Usage Restrictions
The following external data sources are currently supported:
- Hive on OSS (Alibaba Cloud Object Storage, read and write supported)
- Hive on COS (Tencent Cloud Object Storage, read and write supported)
- Hive on S3 (AWS Object Storage, read and write supported)
- Hive on HDFS (Preview, read-only, contact Lakehouse support)
- Databricks Unity Catalog
Write formats supported: Parquet, ORC, Text.
Syntax
Parameters
| Parameter | Required | Description |
|---|---|---|
schema_name | Yes | Name of the external schema, must be unique within the current workspace |
IF NOT EXISTS | No | If the schema already exists, skip without error |
CONNECTION connection_name | Yes | Pre-created Catalog Connection name; see Create Catalog Connection |
OPTIONS | No | Additional configuration options for specifying the database or catalog name in the external data source |
Common OPTIONS
| Option | Applicable Scenario | Description |
|---|---|---|
SCHEMA | Hive | The Hive database name to map. Defaults to schema_name if not specified |
catalog | Databricks | The Databricks Unity Catalog name |
schema | Databricks | The Databricks Unity Catalog schema name |
Creation Process
Hive External Schema (HMS)
Creating a Hive External Schema requires three steps: Create Storage Connection -> Create Catalog Connection -> Create External Schema.
Step 1: Create Storage Connection
Step 2: Create Catalog Connection
Ensure that the HMS server network is connected with the Lakehouse network. See Creating Alibaba Cloud PrivateLink Endpoint Service.
Step 3: Create External Schema
Verify Connectivity
Databricks Unity Catalog External Schema
Step 1: Create Catalog Connection
See Create Databricks Catalog.
Step 2: Create External Schema
Verify Connectivity
Cloud Platform Storage Connection Parameters
Hive on COS (Tencent Cloud)
REGION: COS data center region. See Regions and Access Endpoints.- For network connectivity, see Creating Tencent Cloud PrivateLink Endpoint Service.
Hive on S3 (AWS)
ENDPOINT: Beijing regions3.cn-north-1.amazonaws.com.cn, Ningxia regions3.cn-northwest-1.amazonaws.com.cn.REGION: Beijing regioncn-north-1, Ningxia regioncn-northwest-1.
Hive on HDFS (Preview)
NAME_NODE: Corresponds todfs.nameservicesin HDFS configuration, i.e., the logical cluster name.NAME_NODE_RPC_ADDRESSES: Corresponds todfs.namenode.rpc-address, format:[<host>:<port>].
Notes
- A corresponding Catalog Connection must be created before creating an External Schema.
- When the
SCHEMAparameter inOPTIONSis not specified, Lakehouse defaults to mappingschema_nameto the Hive database. - Hive on HDFS is currently in Preview and only supports reads; contact Lakehouse support if needed.
- Data in an External Schema is stored in the external system; deleting an External Schema does not delete the underlying data.
