CREATE EXTERNAL SCHEMA

Overview

The CREATE EXTERNAL SCHEMA statement maps an external data source (Hive Metastore or Databricks Unity Catalog) as a schema in Lakehouse, allowing users to query external data directly in Lakehouse without migrating the data.

Usage Restrictions

The following external data sources are currently supported:

  • Hive on OSS (Alibaba Cloud Object Storage, read and write supported)
  • Hive on COS (Tencent Cloud Object Storage, read and write supported)
  • Hive on S3 (AWS Object Storage, read and write supported)
  • Hive on HDFS (Preview, read-only, contact Lakehouse support)
  • Databricks Unity Catalog

Write formats supported: Parquet, ORC, Text.

Syntax

CREATE EXTERNAL SCHEMA [ IF NOT EXISTS ] schema_name
    CONNECTION connection_name
    [ OPTIONS ( option_key = 'option_value' [, ...] ) ];

Parameters

ParameterRequiredDescription
schema_nameYesName of the external schema, must be unique within the current workspace
IF NOT EXISTSNoIf the schema already exists, skip without error
CONNECTION connection_nameYesPre-created Catalog Connection name; see Create Catalog Connection
OPTIONSNoAdditional configuration options for specifying the database or catalog name in the external data source

Common OPTIONS

OptionApplicable ScenarioDescription
SCHEMAHiveThe Hive database name to map. Defaults to schema_name if not specified
catalogDatabricksThe Databricks Unity Catalog name
schemaDatabricksThe Databricks Unity Catalog schema name

Creation Process

Hive External Schema (HMS)

Creating a Hive External Schema requires three steps: Create Storage Connection -> Create Catalog Connection -> Create External Schema.

Step 1: Create Storage Connection

CREATE STORAGE CONNECTION IF NOT EXISTS catalog_storage_oss
    TYPE OSS
    ACCESS_ID = 'LTAIxxxxxxxxxxxx'
    ACCESS_KEY = 'T8Gexxxxxxmtxxxxxx'
    ENDPOINT = 'oss-cn-hangzhou-internal.aliyuncs.com';

Step 2: Create Catalog Connection

Ensure that the HMS server network is connected with the Lakehouse network. See Creating Alibaba Cloud PrivateLink Endpoint Service.

CREATE CATALOG CONNECTION IF NOT EXISTS catalog_api_connection
    TYPE hms
    hive_metastore_uris = 'xxxx:9083'
    storage_connection = 'catalog_storage_oss';

Step 3: Create External Schema

CREATE EXTERNAL SCHEMA IF NOT EXISTS my_external_schema
    CONNECTION catalog_api_connection
    OPTIONS (SCHEMA = 'default');

Verify Connectivity

-- Verify metadata reading
SHOW TABLES IN my_external_schema;

-- Verify data reading
SELECT * FROM my_external_schema.my_table LIMIT 10;

Databricks Unity Catalog External Schema

Step 1: Create Catalog Connection

See Create Databricks Catalog.

CREATE CATALOG CONNECTION IF NOT EXISTS conn_databricks
    TYPE databricks
    HOST = 'https://dbc-12345678-9abc.cloud.databricks.com'
    CLIENT_ID = 'client_id_value'
    CLIENT_SECRET = 'client_secret_value'
    ACCESS_REGION = 'us-west-2';

Step 2: Create External Schema

CREATE EXTERNAL SCHEMA IF NOT EXISTS external_db_sch
    CONNECTION conn_databricks
    OPTIONS ('catalog' = 'quick_start', 'schema' = 'default');

Verify Connectivity

SHOW TABLES IN external_db_sch;

Cloud Platform Storage Connection Parameters

Hive on COS (Tencent Cloud)

CREATE STORAGE CONNECTION catalog_storage_cos
    TYPE COS
    ACCESS_KEY = '<access_key>'
    SECRET_KEY = '<secret_key>'
    REGION = 'ap-shanghai'
    APP_ID = '1310000503';

Hive on S3 (AWS)

CREATE STORAGE CONNECTION catalog_storage_s3
    TYPE S3
    ACCESS_KEY = 'AKIAQNBSBP6EIJE33***'
    SECRET_KEY = '7kfheDrmq***'
    ENDPOINT = 's3.cn-north-1.amazonaws.com.cn'
    REGION = 'cn-north-1';
  • ENDPOINT: Beijing region s3.cn-north-1.amazonaws.com.cn, Ningxia region s3.cn-northwest-1.amazonaws.com.cn.
  • REGION: Beijing region cn-north-1, Ningxia region cn-northwest-1.

Hive on HDFS (Preview)

CREATE STORAGE CONNECTION hdfs_conn
    TYPE HDFS
    NAME_NODE = 'zetta-cluster'
    NAME_NODE_RPC_ADDRESSES = ['11.110.239.148:8020'];
  • NAME_NODE: Corresponds to dfs.nameservices in HDFS configuration, i.e., the logical cluster name.
  • NAME_NODE_RPC_ADDRESSES: Corresponds to dfs.namenode.rpc-address, format: [<host>:<port>].

Notes

  • A corresponding Catalog Connection must be created before creating an External Schema.
  • When the SCHEMA parameter in OPTIONS is not specified, Lakehouse defaults to mapping schema_name to the Hive database.
  • Hive on HDFS is currently in Preview and only supports reads; contact Lakehouse support if needed.
  • Data in an External Schema is stored in the external system; deleting an External Schema does not delete the underlying data.