Create Hive External Catalog

Steps to create Hive External Catalog

  1. Create Storage Connection: First, you need to create a storage connection to access the object storage service.
  2. Create Catalog Connection: Use the storage connection information and Hive Metastore address to create a Catalog Connection.
  3. Create External Catalog: Use the Catalog Connection to create an external Catalog to access external data in the data lake.

Syntax

CREATE EXTERNAL CATALOG catalog_name
    CONNECTION catalog_api_connection;

Parameter Description

catalog_api_connection: The name of the catalog connection. Currently, only HIVE is supported. Refer to creating catalog connection

Example

Example 1: Hive ON OSS

  • Create storage connection
CREATE STORAGE CONNECTION if not exists catalog_storage_oss   
 type OSS    
 ACCESS_ID='LTAIxxxxxxxxxxxx'    
 ACCESS_KEY='T8Gexxxxxxmtxxxxxx'   
 ENDPOINT='oss-cn-hangzhou-internal.aliyuncs.com';
  • Create Catalog Connection Please ensure that the network between the server where HMS is located and the Lakehouse is connected. For specific connection methods, please refer to Creating Alibaba Cloud Endpoint Service
CREATE CATALOG CONNECTION if not exists catalog_api_connection    
type hms    
hive_metastore_uris='xxxx:9083'    
storage_connection='catalog_storage_oss';
  • Create Catalog
CREATE EXTERNAL CATALOG my_external_catalog
    CONNECTION catalog_api_connection;
  • Verify connectivity to Hive Catalog
-- Verify reading metadata
SHOW SCHEMAS IN my_external_catalog;
+---------------------------------------------------------------------------+
|                                schema_name                                |
+---------------------------------------------------------------------------+
| air_travel                                                                |
| all_data                                                                  |
| automobile                                                                |
| automv_schema                                                             |
| bigquant                                                                  |
+---------------------------------------------------------------------------+

-- Verify reading data, STORAGE CONNECTION permission will be used when reading data
SELECT * FROM my_external_catalog.my_schema.my_table;

Case 2: Hive ON COS

  • Create storage connection
```sql
CREATE STORAGE CONNECTION catalog_storage_cos 
  TYPE COS
  ACCESS_KEY = '<access_key>'
  SECRET_KEY = '<secret_key>'
  REGION = 'ap-shanghai'
  APP_ID = '1310000503';

Parameters: * TYPE: The object storage type, for Tencent Cloud, fill in COS (case insensitive) * ACCESS_KEY / SECRET_KEY: The access keys for Tencent Cloud, refer to: Access Keys * REGION: Refers to the region where the Tencent Cloud Object Storage COS data center is located. When Singdata Lakehouse accesses Tencent Cloud COS within the same region, the COS service will automatically route to internal network access. For specific values, please refer to the Tencent Cloud documentation: Regions and Access Domains.

  • Create Catalog Connection Please ensure that the network between the HMS server and Lakehouse is connected. For specific connection methods, refer to Create Tencent Cloud Endpoint Service
CREATE CATALOG CONNECTION if not exists catalog_api_connection    
type hms    
hive_metastore_uris='xxxx:9083'    
storage_connection='catalog_storage_cos';
CREATE CATALOG CONNECTION if not exists catalog_api_connection    
type hms    
hive_metastore_uris='xxxx:9083'    
storage_connection='catalog_storage_oss';
  • Create Catalog
CREATE EXTERNAL CATALOG my_external_catalog
    CONNECTION catalog_api_connection;
  • Verify connectivity to Hive Catalog
-- Verify reading metadata
SHOW SCHEMAS IN my_external_catalog;
+---------------------------------------------------------------------------+
|                                schema_name                                |
+---------------------------------------------------------------------------+
| air_travel                                                                |
| all_data                                                                  |
| automobile                                                                |
| automv_schema                                                             |
| bigquant                                                                  |
+---------------------------------------------------------------------------+

-- Verify reading data, STORAGE CONNECTION permission will be used when reading data
SELECT * FROM my_external_catalog.my_schema.my_table;

Case 3: Hive ON S3

  • Create storage connection
CREATE STORAGE CONNECTION catalog_storage_s3
    TYPE S3
    ACCESS_KEY = 'AKIAQNBSBP6EIJE33***'
    SECRET_KEY = '7kfheDrmq***************************'
    ENDPOINT = 's3.cn-north-1.amazonaws.com.cn'
    REGION = 'cn-north-1';

Parameters:

  • TYPE: The object storage type, AWS should be filled in as S3 (case insensitive)
  • ACCESS_KEY / SECRET_KEY: The access key for AWS, refer to: Access Keys for how to obtain it
  • ENDPOINT: The service address for S3, AWS China is divided into Beijing and Ningxia regions. The service address for S3 in the Beijing region is s3.cn-north-1.amazonaws.com.cn, and for the Ningxia region, it is s3.cn-northwest-1.amazonaws.com.cn. Refer to: China Region Endpoints to find the endpoints for the Beijing and Ningxia regions -> Amazon S3 corresponding endpoints
  • REGION: AWS China is divided into Beijing and Ningxia regions, the region values are: Beijing region cn-north-1, Ningxia region cn-northwest-1. Refer to: China Region Endpoints
  • Create Catalog Connection
CREATE CATALOG CONNECTION if not exists catalog_api_connection    
type hms    
hive_metastore_uris='xxxx:9083'    
storage_connection='catalog_storage_s3';
  • Create Catalog
CREATE EXTERNAL CATALOG my_external_catalog
    CONNECTION catalog_api_connection;
  • Verify connectivity to Hive Catalog
-- Verify reading metadata
SHOW SCHEMAS IN my_external_catalog;
+---------------------------------------------------------------------------+
|                                schema_name                                |
+---------------------------------------------------------------------------+
| air_travel                                                                |
| all_data                                                                  |
| automobile                                                                |
| automv_schema                                                             |
| bigquant                                                                  |
+---------------------------------------------------------------------------+

-- Verify reading data, STORAGE CONNECTION permission will be used when reading data
SELECT * FROM my_external_catalog.my_schema.my_table;

Case 4: Hive ON HDFS (Read Support)

  • Create Storage Connection

    CREATE STORAGE CONNECTION hdfs_conn
    TYPE HDFS
    NAME_NODE='zetta-cluster'
    NAME_NODE_RPC_ADDRESSES=['11.110.239.148:8020'];
    • TYPE HDFS: Specifies the connection type as HDFS.
    • NAME_NODE: Corresponds to dfs.nameservices in the HDFS configuration, which is the logical name of the HDFS cluster, such as zetta-cluster.
    • NAME_NODE_RPC_ADDRESSES: Corresponds to dfs.namenode.rpc-address in the HDFS configuration, which is the RPC address of the NameNode, formatted as [<host>:<port>], such as ['11.110.239.148:8020'].
  • Create Catalog Connection

    CREATE CATALOG CONNECTION if not exists catalog_api_connection    
    type hms    
    hive_metastore_uris='xxxx:9083'    
    storage_connection='hdfs_conn';
  • Create Catalog

    CREATE EXTERNAL CATALOG my_external_catalog
        CONNECTION catalog_api_connection;
  • Verify Connectivity to Hive Catalog

    -- Verify metadata reading
    SHOW SCHEMAS IN my_external_catalog;
    
    -- Verify data reading; data is read using the permissions of the STORAGE CONNECTION
    SELECT * FROM my_external_catalog.my_schema.my_table;

Create Databricks External Catalog

Steps to create Databricks External Catalog

  1. Create Catalog Connection: Store Databricks' Unity Catalog, connection authentication information.
  2. Create External Catalog: Use Catalog Connection to create an external Catalog to access external data in the data lake.

Syntax

CREATE EXTERNAL CATALOG catalog_name
    CONNECTION catalog_api_connection;
    OPTIONS ('catalog'='catalog_name');

Parameter Description

  • catalog_name :The name of the external Catalog. This name is used to identify the Catalog, and it must be unique and comply with naming conventions.
  • CONNECTION catalog_api_connection :Specifies the connection to the external Catalog. catalog_api_connection is the name of a pre-created connection used to access the external Catalog.
  • OPTIONS ('catalog'='catalog_name') :Specifies the configuration options for the external Catalog. 'catalog'='catalog_name':Indicates the name of the external Catalog, catalog_name is the name of the target Catalog.