Create Hive External Catalog
Steps to create Hive External Catalog
- Create Storage Connection: First, you need to create a storage connection to access the object storage service.
- Create Catalog Connection: Use the storage connection information and Hive Metastore address to create a Catalog Connection.
- Create External Catalog: Use the Catalog Connection to create an external Catalog to access external data in the data lake.
Syntax
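A minimal sketch of the statement shape, assembled from the parameter description below; keyword spelling and optional clauses may differ slightly by Lakehouse version:

```sql
-- Sketch only: creates an external catalog on top of a pre-created catalog connection.
CREATE EXTERNAL CATALOG <catalog_name>
    CONNECTION <catalog_api_connection>;
```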
Parameter Description
catalog_api_connection: The name of the catalog connection. Currently, only HIVE is supported. Refer to Creating Catalog Connection.
Example
Example 1: Hive ON OSS
- Create storage connection
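A minimal sketch of the OSS storage connection, assuming illustrative names and placeholder credentials; the option keywords (ENDPOINT, ACCESS_KEY, SECRET_KEY) follow the pattern of the other examples on this page and may differ in your environment:

```sql
-- Sketch only: connection name, endpoint, and credentials are placeholders.
CREATE STORAGE CONNECTION hive_oss_conn
    TYPE OSS
    ENDPOINT = '<oss_endpoint>'                    -- e.g. an internal OSS endpoint in the same region
    ACCESS_KEY = '<alibaba_cloud_access_key_id>'
    SECRET_KEY = '<alibaba_cloud_access_key_secret>';
```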
- Create Catalog Connection. Please ensure that the network between the server hosting HMS and the Lakehouse is connected; for specific connection methods, refer to Creating Alibaba Cloud Endpoint Service.
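A minimal sketch of the catalog connection, assuming a HIVE-type connection that stores the Hive Metastore address and reuses the storage connection above; the option names for the metastore URI and the storage connection are assumptions:

```sql
-- Sketch only: points the catalog connection at HMS (default thrift port 9083)
-- and at the storage connection created in the previous step.
CREATE CATALOG CONNECTION hive_oss_catalog_conn
    TYPE HIVE
    HIVE_METASTORE_URIS = 'thrift://<hms_host>:9083'   -- assumed option name
    STORAGE_CONNECTION = 'hive_oss_conn';               -- assumed option name
```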
- Create Catalog
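Creating the catalog then only needs the catalog connection; a sketch using the names above:

```sql
-- Sketch only: binds the external catalog to the catalog connection.
CREATE EXTERNAL CATALOG hive_oss_catalog
    CONNECTION hive_oss_catalog_conn;
```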
- Verify connectivity to Hive Catalog
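A simple way to verify connectivity is to list the Hive databases or query a table through the new catalog; a sketch (object names are placeholders, and the exact SHOW syntax may vary):

```sql
-- Sketch only: replace the database/table names with ones that exist in Hive.
SHOW DATABASES IN hive_oss_catalog;
SELECT * FROM hive_oss_catalog.<hive_database>.<hive_table> LIMIT 10;
```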
Example 2: Hive ON COS
- Create storage connection
Parameters:
* TYPE: The object storage type; for Tencent Cloud, fill in COS (case insensitive)
* ACCESS_KEY / SECRET_KEY: The access keys for Tencent Cloud, refer to: Access Keys
* REGION: Refers to the region where the Tencent Cloud Object Storage COS data center is located. When Singdata Lakehouse accesses Tencent Cloud COS within the same region, the COS service will automatically route to internal network access. For specific values, please refer to the Tencent Cloud documentation: Regions and Access Domains.
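A minimal sketch of the COS storage connection using the parameters above; the connection name and credentials are placeholders:

```sql
-- Sketch only: TYPE, ACCESS_KEY / SECRET_KEY, and REGION follow the parameter
-- description above; values are placeholders.
CREATE STORAGE CONNECTION hive_cos_conn
    TYPE COS
    ACCESS_KEY = '<tencent_cloud_secret_id>'
    SECRET_KEY = '<tencent_cloud_secret_key>'
    REGION = '<cos_region>';   -- the region where the COS bucket lives, e.g. ap-shanghai
```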
- Create Catalog Connection. Please ensure that the network between the HMS server and the Lakehouse is connected; for specific connection methods, refer to Create Tencent Cloud Endpoint Service.
- Create Catalog
- Verify connectivity to Hive Catalog
Example 3: Hive ON S3
- Create storage connection (see the sketch after this example's step list)
Parameters:
- TYPE: The object storage type; for AWS, fill in S3 (case insensitive)
- ACCESS_KEY / SECRET_KEY: The access keys for AWS; refer to: Access Keys for how to obtain them
- ENDPOINT: The S3 service address. AWS China is divided into the Beijing and Ningxia regions; the S3 service address is s3.cn-north-1.amazonaws.com.cn for the Beijing region and s3.cn-northwest-1.amazonaws.com.cn for the Ningxia region. Refer to: China Region Endpoints to find the Amazon S3 endpoints for the Beijing and Ningxia regions
- REGION: AWS China is divided into the Beijing and Ningxia regions; the region values are cn-north-1 for Beijing and cn-northwest-1 for Ningxia. Refer to: China Region Endpoints
- Create Catalog Connection
- Create Catalog
- Verify connectivity to Hive Catalog
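For reference, a minimal sketch of the S3 storage connection from the first step of this example (the catalog connection, catalog, and verification steps mirror Example 1); the connection name and credentials are placeholders, and the endpoint/region shown are the Beijing-region values listed above:

```sql
-- Sketch only: Beijing-region ENDPOINT and REGION values from the parameter list above.
CREATE STORAGE CONNECTION hive_s3_conn
    TYPE S3
    ACCESS_KEY = '<aws_access_key_id>'
    SECRET_KEY = '<aws_secret_access_key>'
    ENDPOINT = 's3.cn-north-1.amazonaws.com.cn'
    REGION = 'cn-north-1';
```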
Example 4: Hive ON HDFS (Read Support)
- Create Storage Connection
Parameters:
* TYPE HDFS: Specifies the connection type as HDFS.
* NAME_NODE: Corresponds to dfs.nameservices in the HDFS configuration, which is the logical name of the HDFS cluster, such as zetta-cluster.
* NAME_NODE_RPC_ADDRESSES: Corresponds to dfs.namenode.rpc-address in the HDFS configuration, which is the RPC address of the NameNode, formatted as [<host>:<port>], such as ['11.110.239.148:8020'].
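A minimal sketch of the HDFS storage connection using the parameters above; the cluster name and NameNode address reuse the illustrative values from the parameter description, and the exact keyword spelling may differ:

```sql
-- Sketch only: NAME_NODE and NAME_NODE_RPC_ADDRESSES follow the parameter
-- description above; replace the address with your NameNode host:port.
CREATE STORAGE CONNECTION hive_hdfs_conn
    TYPE HDFS
    NAME_NODE = 'zetta-cluster'
    NAME_NODE_RPC_ADDRESSES = ['11.110.239.148:8020'];
```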
- Create Catalog Connection
- Create Catalog
- Verify connectivity to Hive Catalog
Create Databricks External Catalog
Steps to create Databricks External Catalog
- Create Catalog Connection: Store the connection and authentication information for Databricks' Unity Catalog.
- Create External Catalog: Use the Catalog Connection to create an external Catalog to access external data in the data lake.
Syntax
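A minimal sketch of the statement shape, assembled from the parameter description below; keyword spelling may differ slightly by version:

```sql
-- Sketch only: binds an external catalog to a pre-created Unity Catalog connection
-- and names the target catalog inside Databricks.
CREATE EXTERNAL CATALOG <catalog_name>
    CONNECTION <catalog_api_connection>
    OPTIONS ('catalog' = '<databricks_catalog_name>');
```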
Parameter Description
catalog_name: The name of the external Catalog. This name is used to identify the Catalog, and it must be unique and comply with naming conventions.
CONNECTION catalog_api_connection: Specifies the connection to the external Catalog. catalog_api_connection is the name of a pre-created connection used to access the external Catalog.
OPTIONS ('catalog'='catalog_name'): Specifies the configuration options for the external Catalog. 'catalog'='catalog_name' indicates the name of the external Catalog, where catalog_name is the name of the target Catalog.
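A concrete, illustrative instance of the syntax above; the connection name and the Unity Catalog name are placeholders and assume a Databricks catalog connection has already been created:

```sql
-- Sketch only: dbx_uc_conn is a hypothetical, pre-created catalog connection
-- holding the Unity Catalog endpoint and authentication information.
CREATE EXTERNAL CATALOG databricks_catalog
    CONNECTION dbx_uc_conn
    OPTIONS ('catalog' = 'main');

SHOW DATABASES IN databricks_catalog;   -- quick connectivity check (syntax may vary)
```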