Lakehouse introduces a powerful feature that maps external databases to Lakehouse through EXTERNAL SCHEMA, allowing users to query external data directly in Lakehouse without migrating the data into Lakehouse storage. This feature greatly simplifies cross-data-source operations and queries, giving users a more flexible data integration solution.
Usage Restrictions
- Currently, Singdata Lakehouse's external schema mapping feature supports the following external data sources:
- Hive on OSS (Alibaba Cloud Object Storage Service)
- Hive on COS (Tencent Cloud Object Storage Service)
- Hive on S3 (AWS Object Storage Service)
- Hive on HDFS (Preview, please contact Lakehouse support)
- Databricks Unity Catalog
- Supports both reading and writing. Supported write formats are Parquet, ORC, and text files.
Create Hive External Schema
Syntax
Create Storage Connection
First, you need to create a storage connection to connect to the external object storage service.
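A minimal sketch of what this statement might look like is shown below. The connection name and all values are placeholders, and the exact Lakehouse syntax may differ; the per-source examples further down show the parameters each storage type expects.

```sql
-- Illustrative sketch only: create a connection to an external object storage service.
-- TYPE, ENDPOINT, ACCESS_KEY and SECRET_KEY are the parameters described in the
-- examples below; every value here is a placeholder.
CREATE STORAGE CONNECTION my_storage_conn
    TYPE OSS                               -- or COS / S3 / HDFS
    ENDPOINT = '<object-storage-endpoint>'
    ACCESS_KEY = '<access-key>'
    SECRET_KEY = '<secret-key>';
```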
Create Hive Catalog Connection
Next, create a Catalog connection pointing to the Hive metadata storage service (Hive Metastore).
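A sketch of this step is shown below, assuming the Catalog connection takes the Hive Metastore's thrift address and the storage connection created in the previous step. The parameter names TYPE HMS, HIVE_METASTORE_URIS and STORAGE_CONNECTION are illustrative placeholders, not confirmed Lakehouse syntax.

```sql
-- Illustrative sketch only: a Catalog connection pointing at the Hive Metastore (HMS).
-- 'HIVE_METASTORE_URIS' and 'STORAGE_CONNECTION' are hypothetical parameter names
-- used for illustration; check the Lakehouse SQL reference for the actual ones.
CREATE CATALOG CONNECTION my_hive_catalog_conn
    TYPE HMS
    HIVE_METASTORE_URIS = 'thrift://<hms-host>:9083'
    STORAGE_CONNECTION = my_storage_conn;
```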
Create External Schema
Finally, create an External Schema to map the external data source to the Lakehouse.
- connection: Required parameter; specifies the name of the Catalog Connection.
- SCHEMA: Optional parameter; used to map the database name in Hive. If not specified, Lakehouse will automatically map the created schema_name to the database of the same name in Hive.
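Putting the documented parameters together, the mapping statement might look like the following sketch. Whether the optional Hive database is passed through an OPTIONS clause (as in the Databricks section below) or a standalone SCHEMA clause should be checked against the Lakehouse SQL reference; all names are placeholders.

```sql
-- Illustrative sketch only: map a Hive database to a Lakehouse external schema.
-- CONNECTION names the Catalog Connection created above (required).
-- The 'schema' option maps a specific Hive database (optional); if omitted,
-- Lakehouse maps the created schema_name to the Hive database of the same name.
CREATE EXTERNAL SCHEMA my_external_schema
    CONNECTION my_hive_catalog_conn
    OPTIONS ('schema' = 'hive_db_name');
```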
Examples
Example 1: Hive ON OSS
- Create storage connection
- Create Catalog Connection. Ensure network connectivity between the server hosting the Hive Metastore (HMS) and Lakehouse. For specific connection methods, refer to Creating Alibaba Cloud Endpoint Service.
- Create External Schema
- Verify that the Hive Catalog is connected (see the sketch after this list)
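A hedged end-to-end sketch of this example follows. All credentials, endpoints, addresses and object names are placeholders; the Catalog connection parameter names are illustrative assumptions, and the exact statement syntax may differ. The SHOW TABLES check is one possible way to verify connectivity.

```sql
-- 1. Storage connection to Alibaba Cloud OSS (placeholder values).
CREATE STORAGE CONNECTION oss_conn
    TYPE OSS
    ENDPOINT = 'oss-cn-hangzhou-internal.aliyuncs.com'  -- placeholder region endpoint
    ACCESS_KEY = '<access-key>'
    SECRET_KEY = '<secret-key>';

-- 2. Catalog connection to the Hive Metastore (hypothetical parameter names).
CREATE CATALOG CONNECTION hive_oss_catalog
    TYPE HMS
    HIVE_METASTORE_URIS = 'thrift://<hms-host>:9083'
    STORAGE_CONNECTION = oss_conn;

-- 3. External schema mapped to a Hive database.
CREATE EXTERNAL SCHEMA hive_oss_schema
    CONNECTION hive_oss_catalog
    OPTIONS ('schema' = 'default');

-- 4. Verify connectivity by listing the tables visible through the schema.
SHOW TABLES IN hive_oss_schema;
```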
Example 2: Hive ON COS
- Create storage connection
Parameters:
- TYPE: The object storage type; fill in COS for Tencent Cloud (case insensitive)
- ACCESS_KEY / SECRET_KEY: The access keys for Tencent Cloud. For how to obtain them, refer to: Access Keys
- REGION: The region where the Tencent Cloud Object Storage (COS) data center is located. When Singdata Lakehouse accesses Tencent Cloud COS within the same region, the COS service automatically routes to internal network access. For specific values, refer to the Tencent Cloud documentation: Regions and Access Domain Names.
- Create Catalog Connection. Ensure network connectivity between the server hosting the Hive Metastore (HMS) and Lakehouse. For specific connection methods, refer to Create Tencent Cloud Endpoint Service.
- Create External Schema
- Verify that the Hive Catalog is connected (see the sketch after this list)
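A sketch of this example using the COS parameters described above. Keys, region and the HMS address are placeholders, the Catalog connection parameter names are illustrative assumptions, and the exact statement syntax may differ.

```sql
-- 1. Storage connection to Tencent Cloud COS, using the parameters described above.
CREATE STORAGE CONNECTION cos_conn
    TYPE COS                           -- case insensitive
    ACCESS_KEY = '<tencent-access-key>'
    SECRET_KEY = '<tencent-secret-key>'
    REGION = 'ap-shanghai';            -- placeholder COS region

-- 2. Catalog connection to the Hive Metastore (hypothetical parameter names).
CREATE CATALOG CONNECTION hive_cos_catalog
    TYPE HMS
    HIVE_METASTORE_URIS = 'thrift://<hms-host>:9083'
    STORAGE_CONNECTION = cos_conn;

-- 3. External schema mapped to a Hive database.
CREATE EXTERNAL SCHEMA hive_cos_schema
    CONNECTION hive_cos_catalog
    OPTIONS ('schema' = 'default');

-- 4. Verify connectivity.
SHOW TABLES IN hive_cos_schema;
```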
Example 3: Hive ON S3
- Create storage connection
Parameters:
- TYPE: The object storage type; for AWS, fill in S3 (case insensitive)
- ACCESS_KEY / SECRET_KEY: The access keys for AWS. For how to obtain them, refer to: Access Keys
- ENDPOINT: The S3 service address. AWS China is divided into the Beijing and Ningxia regions; the S3 service address for the Beijing region is s3.cn-north-1.amazonaws.com.cn, and for the Ningxia region it is s3.cn-northwest-1.amazonaws.com.cn. Refer to: China Region Endpoints to find the Amazon S3 endpoints for the Beijing and Ningxia regions.
- REGION: AWS China is divided into the Beijing and Ningxia regions; the region values are cn-north-1 for the Beijing region and cn-northwest-1 for the Ningxia region. Refer to: China Region Endpoints.
- Create Catalog Connection
- Create External Schema
- Verify that the Hive Catalog is connected (see the sketch after this list)
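A sketch of this example using the S3 parameters described above. Keys and the HMS address are placeholders, the Catalog connection parameter names are illustrative assumptions, and the exact statement syntax may differ.

```sql
-- 1. Storage connection to AWS S3 (China regions), using the parameters described above.
CREATE STORAGE CONNECTION s3_conn
    TYPE S3                                        -- case insensitive
    ACCESS_KEY = '<aws-access-key>'
    SECRET_KEY = '<aws-secret-key>'
    ENDPOINT = 's3.cn-north-1.amazonaws.com.cn'    -- Beijing region endpoint
    REGION = 'cn-north-1';                         -- or 'cn-northwest-1' for Ningxia

-- 2. Catalog connection to the Hive Metastore (hypothetical parameter names).
CREATE CATALOG CONNECTION hive_s3_catalog
    TYPE HMS
    HIVE_METASTORE_URIS = 'thrift://<hms-host>:9083'
    STORAGE_CONNECTION = s3_conn;

-- 3. External schema mapped to a Hive database.
CREATE EXTERNAL SCHEMA hive_s3_schema
    CONNECTION hive_s3_catalog
    OPTIONS ('schema' = 'default');

-- 4. Verify connectivity.
SHOW TABLES IN hive_s3_schema;
```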
Example 4: Hive ON HDFS (Read Support)
- Create Storage Connection
  - TYPE HDFS: Specifies the connection type as HDFS.
  - NAME_NODE: Corresponds to dfs.nameservices in the HDFS configuration, which is the logical name of the HDFS cluster, such as zetta-cluster.
  - NAME_NODE_RPC_ADDRESSES: Corresponds to dfs.namenode.rpc-address in the HDFS configuration, which is the RPC address of the NameNode, formatted as [<host>:<port>], such as ['11.110.239.148:8020'].
- Create Catalog Connection
- Create External Schema
- Verify Connectivity to Hive Catalog (see the sketch after this list)
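A sketch of this example using the HDFS parameters described above. The NameNode values repeat the documentation's examples, the HMS address is a placeholder, the Catalog connection parameter names are illustrative assumptions, and the exact statement syntax may differ.

```sql
-- 1. Storage connection to HDFS, using the NAME_NODE parameters described above.
CREATE STORAGE CONNECTION hdfs_conn
    TYPE HDFS
    NAME_NODE = 'zetta-cluster'                          -- dfs.nameservices
    NAME_NODE_RPC_ADDRESSES = ['11.110.239.148:8020'];   -- dfs.namenode.rpc-address

-- 2. Catalog connection to the Hive Metastore (hypothetical parameter names).
CREATE CATALOG CONNECTION hive_hdfs_catalog
    TYPE HMS
    HIVE_METASTORE_URIS = 'thrift://<hms-host>:9083'
    STORAGE_CONNECTION = hdfs_conn;

-- 3. External schema mapped to a Hive database.
CREATE EXTERNAL SCHEMA hive_hdfs_schema
    CONNECTION hive_hdfs_catalog
    OPTIONS ('schema' = 'default');

-- 4. Verify connectivity to the Hive Catalog.
SHOW TABLES IN hive_hdfs_schema;
```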
Create Databricks External Schema
Syntax
Create Databricks Catalog Connection
Create a Catalog connection pointing to the Databricks metadata storage service. For specific usage, refer to Create Databricks Catalog.
Parameter Description
- connection_name: The name of the connection, used to identify the Databricks Unity Catalog connection. The name must be unique and follow naming conventions.
- TYPE databricks: Specifies the connection type as Databricks Unity Catalog.
- HOST: The URL of the Databricks workspace, usually in the format https://<workspace-url>. Example: https://dbc-12345678-9abc.cloud.databricks.com
- CLIENT_ID: The client ID used for OAuth 2.0 machine-to-machine (M2M) authentication. Refer to the Databricks OAuth M2M Authentication Documentation to create an OAuth 2.0 application and obtain the CLIENT_ID.
- CLIENT_SECRET: The client secret used for OAuth 2.0 machine-to-machine (M2M) authentication. Refer to the Databricks OAuth M2M Authentication Documentation to create an OAuth 2.0 application and obtain the CLIENT_SECRET.
- ACCESS_REGION: The region where the Databricks workspace is located, such as us-west-2 or east-us.
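Combining the parameters above, the connection statement might look like the following sketch. The host and region values reuse the examples given in the parameter list, the client credentials are placeholders, and the exact statement syntax may differ.

```sql
-- Illustrative sketch only: a Catalog connection to Databricks Unity Catalog,
-- built from the parameters described above. Credential values are placeholders.
CREATE CATALOG CONNECTION conn_db
    TYPE databricks
    HOST = 'https://dbc-12345678-9abc.cloud.databricks.com'
    CLIENT_ID = '<oauth-m2m-client-id>'
    CLIENT_SECRET = '<oauth-m2m-client-secret>'
    ACCESS_REGION = 'us-west-2';
```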
Create External Schema
Parameter Description
- schema_name: The name of the external schema. This name is used to identify the external schema; it must be unique and comply with naming conventions.
- CONNECTION connection_name: Specifies the connection for the external schema. connection_name is the name of a pre-created connection used to access the external schema.
- OPTIONS: Specifies the configuration options for the external schema.
  - 'catalog'='catalog_value': Specifies the Catalog name of the external schema.
  - 'schema'='schema_value': Specifies the Schema name of the external schema.
Example
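A sketch of the statement that the analysis below walks through; the names are taken from that analysis, and the exact statement syntax may differ.

```sql
-- Map the Databricks Unity Catalog 'quick_start' catalog and 'default' schema
-- to a Lakehouse external schema named external_db_sch, via connection conn_db.
CREATE EXTERNAL SCHEMA external_db_sch
    CONNECTION conn_db
    OPTIONS ('catalog' = 'quick_start', 'schema' = 'default');
```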
Example Analysis:
- external_db_sch: The name of the created external schema.
- conn_db: The name of the connection used to connect to the external schema.
- OPTIONS:
  - catalog='quick_start': Specifies the Catalog name of the external schema.
  - schema='default': Specifies the Schema name of the external schema.
- Verify connectivity to Databricks Unity Catalog
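One hypothetical way to verify the mapping is sketched below; the table name is a placeholder, and the actual verification statements may differ.

```sql
-- List the tables visible through the external schema, then read a few rows.
SHOW TABLES IN external_db_sch;
SELECT * FROM external_db_sch.some_table LIMIT 10;  -- 'some_table' is a placeholder
```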