Synchronizing RDS Data via PrivateLink VPC (Alibaba Cloud)

1. Applicable Scenarios:

Lakehouse Studio Data Integration synchronizes RDS data through the VPC network, addressing the following issues caused by public network transmission:

  • High data latency
  • High public network traffic costs
  • Security and compliance risks

2. Advantages and Disadvantages of Earlier Solutions:

  • PrivateLink alone: cannot connect to RDS instances; it only supports self-built MySQL on ECS
  • VPC peering: verified to work, but it exposes both internal network environments to each other, which poses security risks; not recommended
  • Pushing data to Lakehouse over PrivateLink with the IGS SDK: the customer is responsible for the import scheduling strategy, which cannot be integrated into the Studio workflow
  • SSH tunnel alone: one segment still goes over the public network, so it avoids exposing RDS to the public network but does not solve the public network traffic cost problem

3. Solution Overview:

In short, the solution combines PrivateLink with an SSH tunnel.

[Figure: Network architecture diagram]

(The customer environment can also be in another availability zone in the same region.)

By default, the endpoint side (the Lakehouse side) bears the PrivateLink cost, which consists of two billing items; refer to the PrivateLink Billing Rules for details.
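The request path through this architecture can be thought of as a chain of port hops, using the ports configured in Section 4. The sketch below is a hypothetical model (the `Hop` type and `validate_path` function are illustrative and not part of any Alibaba Cloud or Singdata API): each hop must forward to the port the next hop listens on.

```python
from typing import List, NamedTuple


class Hop(NamedTuple):
    component: str  # where the connection arrives
    in_port: int    # port this component accepts connections on
    out_port: int   # port it forwards the connection to on the next hop


# Hypothetical model of the path configured in Section 4 (ports from this document).
PATH: List[Hop] = [
    Hop("Lakehouse endpoint (PrivateLink)", 22, 22),
    Hop("Customer CLB listener", 22, 12345),
    Hop("Customer ECS (ssh -L tunnel)", 12345, 3306),
    Hop("RDS MySQL", 3306, 3306),  # terminal hop; out_port is not used
]


def validate_path(path: List[Hop]) -> int:
    """Raise ValueError on a port mismatch; return the port of the final hop."""
    for cur, nxt in zip(path, path[1:]):
        if cur.out_port != nxt.in_port:
            raise ValueError(
                f"{cur.component} forwards to port {cur.out_port}, "
                f"but {nxt.component} listens on port {nxt.in_port}"
            )
    return path[-1].in_port


if __name__ == "__main__":
    for hop in PATH:
        print(f"{hop.component}: {hop.in_port} -> {hop.out_port}")
    print("final port:", validate_path(PATH))
```

One misconfiguration this kind of check catches is pointing the CLB back-end port at 3306 instead of the tunnel port 12345.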

4. Configuration Method:

Step 1: Environment

Customer-side environment:

  • VPC (Hangzhou Zone H): customer VPC vpc_lakehouse / vpc-bp1qmyayneio4mlyoyeb7, CIDR block 172.16.0.0/12
  • RDS (Hangzhou Zone H): private network address `rm-bp15gq963ic327h8f.mysql.rds.aliyuncs.com`
  • ECS (Hangzhou Zone H): private IP address 172.16.12.182
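Before configuring forwarding, it can help to confirm that the ECS private IP lies inside the customer VPC's CIDR block and is allowed by the RDS IP whitelist (RDS rejects connections from addresses that are not whitelisted). A minimal sketch with the standard `ipaddress` module follows; the whitelist entries are hypothetical placeholders to be replaced with the entries actually configured on the instance.

```python
import ipaddress
from typing import Iterable


def in_network(ip: str, cidr: str) -> bool:
    """True if the IP address lies inside the given CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


def allowed_by_whitelist(ip: str, entries: Iterable[str]) -> bool:
    """True if any whitelist entry (single IP or CIDR block) covers the IP."""
    addr = ipaddress.ip_address(ip)
    # strict=False lets single addresses such as "172.16.12.182" become /32 networks
    return any(addr in ipaddress.ip_network(e, strict=False) for e in entries)


VPC_CIDR = "172.16.0.0/12"   # customer VPC CIDR from Step 1
ECS_IP = "172.16.12.182"     # ECS private IP from Step 1
RDS_WHITELIST = ["172.16.12.182", "10.0.0.0/8"]  # hypothetical example entries

if __name__ == "__main__":
    print("ECS in VPC:", in_network(ECS_IP, VPC_CIDR))
    print("ECS allowed by RDS whitelist:", allowed_by_whitelist(ECS_IP, RDS_WHITELIST))
```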

Lakehouse side:

CZ Studio UAT environment, data integration EMR cluster VPC and vSwitch (Hangzhou Zone H):

VPC: vpc-bp1jvn***********u, vSwitch: vsw-bp1rp************cii

CZ UAT environment Alibaba Cloud main account: 138************83

Step 2: Customer Side: Create SSH Port Forwarding on ECS

In the customer's network environment, create an ECS instance that can reach RDS, and set up port forwarding on it so that connections to port 12345 on this ECS are forwarded to port 3306 on RDS:

ssh -CfNg -L 12345:rm-bp15gq963ic327h8f.mysql.rds.aliyuncs.com:3306 root@127.0.0.1 -p22

Verification (confirm that the tunnel process is running):
ps aux | grep '[s]sh -CfNg'
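`ps` only shows that the ssh process exists. A stronger check is to connect to the forwarded port and read the initial handshake packet that a MySQL server sends as soon as a client connects: a 4-byte header (3-byte little-endian payload length plus a sequence ID), then protocol version 10 and a NUL-terminated server version string. The sketch below uses only the Python standard library; the function names are illustrative, and it assumes it is run on the ECS against `127.0.0.1:12345`.

```python
import socket
from typing import Tuple


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before the MySQL greeting was complete")
        buf += chunk
    return buf


def read_mysql_greeting(host: str, port: int, timeout: float = 5.0) -> Tuple[int, str]:
    """Connect and return (protocol_version, server_version) from the MySQL greeting."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        header = _recv_exact(sock, 4)                  # 3-byte length + 1-byte sequence ID
        length = int.from_bytes(header[:3], "little")
        payload = _recv_exact(sock, length)
    if payload[0] == 0xFF:
        # Error packet: 0xFF, 2-byte error code, then a human-readable message
        code = int.from_bytes(payload[1:3], "little")
        raise ConnectionError(f"MySQL error {code}: {payload[3:].decode(errors='replace')}")
    end = payload.index(b"\x00", 1)                    # server version is NUL-terminated
    return payload[0], payload[1:end].decode(errors="replace")
```

On the ECS, `read_mysql_greeting("127.0.0.1", 12345)` returning protocol version 10 indicates the tunnel reaches a MySQL server. The same call can later be pointed at the endpoint domain and port 22 from the Lakehouse side.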


Step 3: Customer Side: Create a Classic Load Balancer (CLB)

  • Go to the Server Load Balancer (SLB) console, choose Classic Load Balancer (CLB) (formerly SLB) in the left navigation pane, and create a CLB instance
  • Create a TCP listener with the front-end port set to 22 (a custom choice)
  • On the Default Server Group tab, add the ECS instance from Step 2, with the back-end port set to the port forwarded to RDS, which in this case is 12345
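The CLB listener configured here acts as a layer-4 (TCP) relay: connections accepted on front-end port 22 are passed through to port 12345 on the back-end ECS. For dry-running the rest of the chain locally, a minimal asyncio stand-in for that relay might look like the sketch below. It only illustrates port relaying (the function names are hypothetical) and is not how CLB is implemented.

```python
import asyncio


async def _pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # peer reset; treat it like EOF
    finally:
        writer.close()


async def start_relay(listen_host: str, listen_port: int,
                      backend_host: str, backend_port: int) -> asyncio.AbstractServer:
    """Accept TCP connections on listen_port and relay them to backend_host:backend_port."""

    async def handle(client_reader, client_writer):
        try:
            backend_reader, backend_writer = await asyncio.open_connection(backend_host, backend_port)
        except OSError:
            client_writer.close()  # back end unreachable: drop the client connection
            return
        # Relay both directions concurrently, as a layer-4 listener does
        await asyncio.gather(
            _pipe(client_reader, backend_writer),
            _pipe(backend_reader, client_writer),
        )

    return await asyncio.start_server(handle, listen_host, listen_port)
```

For a local test, an unprivileged front-end port is preferable to 22, which normally belongs to sshd: for example, `start_relay("0.0.0.0", 2222, "127.0.0.1", 12345)` followed by `serve_forever()` on the returned server.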

Step 4: Customer Side: Create an Endpoint Service

  • Go to the VPC console, select the same region as the CZ data integration cluster, and in the left navigation pane choose Endpoint Service -> Create Endpoint Service
  • Set Service Resource Type to Classic Load Balancer (CLB), select availability zone Hangzhou Zone H, choose the CLB instance created in Step 3 from the drop-down list, set Automatically Accept Endpoint Connections to Yes, and leave the other settings at their defaults
  • Open the endpoint service and add the Singdata environment main account 1384322691904283 to the Service Whitelist (or contact Singdata support: service@singdata.com)

Step 5: Lakehouse Side: Create an Endpoint

  • Go to the VPC console, select the same region as the CZ data integration cluster, and choose Endpoint in the left navigation pane
  • When creating the endpoint, the endpoint service from Step 4 is discovered automatically; select it, select the corresponding VPC, and confirm the creation
  • The endpoint domain name can then be obtained from Endpoint Connections on the endpoint service (customer side) or from the endpoint details (CZ side)
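Once the endpoint domain name is known, a quick check from a host in the Lakehouse VPC is that the name resolves to private addresses (the endpoint's network interfaces in that VPC) rather than public ones. The sketch below uses only the standard library; treating private-only resolution as the expected result is an assumption to confirm against the endpoint details in the console.

```python
import ipaddress
import socket
from typing import Iterable, List


def resolve_ipv4(host: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses a host name resolves to."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def all_private(addresses: Iterable[str]) -> bool:
    """True if there is at least one address and every address is private."""
    addrs = [ipaddress.ip_address(a) for a in addresses]
    return bool(addrs) and all(a.is_private for a in addrs)
```

Inside the Lakehouse VPC, `all_private(resolve_ipv4(endpoint_domain))` would be expected to return `True`, where `endpoint_domain` is the domain name obtained above.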

Verification:

Lakehouse Studio Data Integration:

Add a MySQL data source using the following JDBC URL:

JDBC URL: `jdbc:mysql://ep-bp1iabb21a27719ca8a2-cn-hangzhou-h.epsrv-bp1n7rvc8qbpudxk69fr.cn-hangzhou.privatelink.aliyuncs.com:22/mysql`

Port 22 in the URL is the CLB listener port published through the endpoint service; the CLB forwards traffic to port 12345 on the back-end ECS, and the SSH tunnel on the ECS forwards it to port 3306 on RDS.
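A likely mistake at this point is putting the RDS port 3306 or the tunnel port 12345 into the JDBC URL; only the CLB listener port (22) is reachable through the endpoint. The hypothetical helpers below build the URL and parse it back so the port can be checked, using only `urllib.parse` from the standard library.

```python
from typing import Tuple
from urllib.parse import urlparse


def build_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a MySQL JDBC URL of the form jdbc:mysql://host:port/database."""
    return f"jdbc:mysql://{host}:{port}/{database}"


def parse_jdbc_url(url: str) -> Tuple[str, int, str]:
    """Split a jdbc:mysql:// URL into (host, port, database)."""
    if not url.startswith("jdbc:"):
        raise ValueError("not a JDBC URL")
    parsed = urlparse(url[len("jdbc:"):])  # parse the remaining mysql://... part
    if parsed.scheme != "mysql" or parsed.hostname is None or parsed.port is None:
        raise ValueError("expected jdbc:mysql://host:port/database")
    return parsed.hostname, parsed.port, parsed.path.lstrip("/")


LISTENER_PORT = 22  # CLB listener port exposed through the endpoint service
ENDPOINT = ("ep-bp1iabb21a27719ca8a2-cn-hangzhou-h.epsrv-bp1n7rvc8qbpudxk69fr"
            ".cn-hangzhou.privatelink.aliyuncs.com")

url = build_jdbc_url(ENDPOINT, LISTENER_PORT, "mysql")
_, url_port, _ = parse_jdbc_url(url)
if url_port != LISTENER_PORT:
    raise ValueError(f"JDBC port {url_port} should be the listener port {LISTENER_PORT}")
```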

Data integration verification: the import succeeded.