Create Compute Cluster

Description

The CREATE VCLUSTER command creates a compute cluster in the workspace with the specified name and configuration. A compute cluster (Virtual Cluster, abbreviated VCluster) is a compute resource service provided by Singdata Lakehouse. It supplies the CPU, memory, and temporary storage needed to execute queries and analysis. You can use compute clusters to run ETL, streaming analysis, ad-hoc queries, and data integration tasks. Any operation in Lakehouse that requires computation, such as a SELECT query or a DML statement (DELETE, INSERT, UPDATE, and so on), runs on a compute cluster.

There are three types of compute cluster: General Purpose Virtual Cluster (GP type), Analytics Purpose Virtual Cluster (AP type), and Integration Virtual Cluster.

In a General Purpose cluster, all jobs submitted to the cluster share its computing resources, which makes it suitable for offline jobs. An Analytics Purpose cluster supports multiple compute instances and automatic scaling, which makes it suitable for online, high-concurrency jobs. An Integration cluster is dedicated to data integration tasks.

Syntax


-- Create a compute cluster
CREATE VCLUSTER [IF NOT EXISTS] <name>
objectProperties
[COMMENT '<comment>']

-- Parameter descriptions
-- Properties for an analytics compute cluster (ANALYTICS PURPOSE VIRTUAL CLUSTER)
objectProperties ::=
            VCLUSTER_SIZE = num --integer from 1 to 256
            VCLUSTER_TYPE = ANALYTICS
            MIN_REPLICAS = num
            MAX_REPLICAS = num
            AUTO_SUSPEND_IN_SECOND = num
            AUTO_RESUME = TRUE | FALSE
            MAX_CONCURRENCY = num
            QUERY_RUNTIME_LIMIT_IN_SECOND = num
            PRELOAD_TABLES = '<schema_name>.<table_name>[,<schema_name>.<table_name>,...]'
            AUTO_PRELOAD_IN_SECOND = num
            
-- Properties for a general-purpose compute cluster (GENERAL PURPOSE VIRTUAL CLUSTER)
objectProperties ::=
            [VCLUSTER_SIZE = num | MIN_VCLUSTER_SIZE = num MAX_VCLUSTER_SIZE = num] --integer from 1 to 256
            VCLUSTER_TYPE = GENERAL
            AUTO_SUSPEND_IN_SECOND = num
            AUTO_RESUME = TRUE | FALSE
            QUERY_RUNTIME_LIMIT_IN_SECOND = num
            QUERY_RESOURCE_LIMIT_RATIO = num

-- Properties for an integration compute cluster (INTEGRATION VIRTUAL CLUSTER)
objectProperties ::=
            [VCLUSTER_SIZE = num | MIN_VCLUSTER_SIZE = num MAX_VCLUSTER_SIZE = num] --integer from 1 to 256 (integration clusters also support 0.25 and 0.5)
            VCLUSTER_TYPE = INTEGRATION
            AUTO_SUSPEND_IN_SECOND = num
            AUTO_RESUME = TRUE | FALSE
            QUERY_RUNTIME_LIMIT_IN_SECOND = num

1. name: The name of the compute cluster. It must be unique within the workspace and cannot be changed after creation. Naming rules: 3 to 28 characters; only letters, digits (0-9), and underscores are allowed; spaces are not allowed.

2. objectProperties: The properties that can be specified when creating a compute cluster, with their meanings and accepted values, are listed below:

Property	Description	Value Range	Default
VCLUSTER_SIZE	Compute cluster size. Sizes range from 1 CRU to 256 CRU; larger sizes provide more computing power. Integration clusters additionally support two smaller sizes: 0.25 CRU and 0.5 CRU.	Number from 1 to 256, in CRU (Compute Resource Unit).	1
MIN_VCLUSTER_SIZE	Applies only to GENERAL clusters. Minimum size of the cluster when scaling, from 1 CRU to 256 CRU; must be less than or equal to MAX_VCLUSTER_SIZE. Cannot be used together with VCLUSTER_SIZE.	Number from 1 to 256, in CRU (Compute Resource Unit).	None
MAX_VCLUSTER_SIZE	Applies only to GENERAL clusters. Maximum size of the cluster when scaling, from 1 CRU to 256 CRU; must be greater than or equal to MIN_VCLUSTER_SIZE. Cannot be used together with VCLUSTER_SIZE.	Number from 1 to 256, in CRU (Compute Resource Unit).	None
VCLUSTER_TYPE	Compute cluster type. GENERAL: suited to data ingestion and ELT workloads. ANALYTICS: suited to scenarios with strict query-latency and concurrency requirements. INTEGRATION: used for data integration tasks.	GENERAL | ANALYTICS | INTEGRATION	GENERAL
MIN_REPLICAS	Minimum number of compute instances in the cluster. Applies only to ANALYTICS clusters.	1-10	1
MAX_REPLICAS	Maximum number of compute instances in the cluster. Applies only to ANALYTICS clusters.	1-10	1
AUTO_SUSPEND_IN_SECOND	Idle time, in seconds, after which the cluster is automatically suspended.	-1, or any integer greater than or equal to 0	600
AUTO_RESUME	Whether a suspended cluster resumes automatically.	TRUE | FALSE	TRUE
MAX_CONCURRENCY	Maximum concurrent load per compute instance in the cluster. Applies only to ANALYTICS clusters.	1-32	8
QUERY_RUNTIME_LIMIT_IN_SECOND	Maximum execution time, in seconds, for a job submitted to this cluster.	Integer greater than 0	86400
PRELOAD_TABLES	Tables whose data the cluster caches on its local SSD, either on a schedule or on trigger. Cache policies can also be set on the tables themselves. Applies only to ANALYTICS clusters.	schema_name.table_name; separate multiple tables with commas. Wildcards are supported, e.g. sample_schema.*	null
QUERY_RESOURCE_LIMIT_RATIO	Per-job resource ratio threshold: the maximum share of the cluster's total CPU/memory resources that a single query can use.	0.0 to 1.0 (e.g. 0.1 means 10%)	1.0
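As an illustration of two of the less obvious properties, the sketch below caps a single query at half of a general-purpose cluster's resources and preloads a whole schema with the wildcard form of PRELOAD_TABLES. It uses only the syntax documented above; the cluster names demo_ratio_vc and demo_preload_vc and the sizes are hypothetical values chosen for the example, and sample_schema is the placeholder schema from the table.

```sql
-- Hypothetical GENERAL cluster: any single query may use at most 50%
-- of the cluster's CPU/memory (QUERY_RESOURCE_LIMIT_RATIO = 0.5).
CREATE VCLUSTER IF NOT EXISTS demo_ratio_vc
VCLUSTER_TYPE = GENERAL
VCLUSTER_SIZE = 4
QUERY_RESOURCE_LIMIT_RATIO = 0.5;

-- Hypothetical ANALYTICS cluster: cache every table in sample_schema
-- on local SSD using the wildcard form of PRELOAD_TABLES.
CREATE VCLUSTER IF NOT EXISTS demo_preload_vc
VCLUSTER_TYPE = ANALYTICS
VCLUSTER_SIZE = 2
PRELOAD_TABLES = 'sample_schema.*';
```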

To specify the minimum and maximum sizes of a GP cluster at creation time, use the following syntax:


CREATE VCLUSTER [IF NOT EXISTS] <name> 
VCLUSTER_TYPE=GENERAL 
MIN_VCLUSTER_SIZE=num 
MAX_VCLUSTER_SIZE=num;

VCLUSTER_SIZE cannot be combined with MIN_VCLUSTER_SIZE or MAX_VCLUSTER_SIZE: specify either a fixed size or a minimum/maximum range, not both.
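For a concrete instance of the range syntax, the sketch below creates a general-purpose cluster that scales between 1 CRU and 8 CRU. The cluster names and the 1/8 bounds are hypothetical; the commented-out statement shows the combination the rule above forbids.

```sql
-- Hypothetical GENERAL cluster that scales between 1 CRU and 8 CRU.
-- MIN_VCLUSTER_SIZE must be <= MAX_VCLUSTER_SIZE.
CREATE VCLUSTER IF NOT EXISTS demo_elastic_gp_vc
VCLUSTER_TYPE = GENERAL
MIN_VCLUSTER_SIZE = 1
MAX_VCLUSTER_SIZE = 8;

-- Not allowed: a fixed VCLUSTER_SIZE together with a MIN/MAX range.
-- CREATE VCLUSTER demo_bad_vc
-- VCLUSTER_TYPE = GENERAL
-- VCLUSTER_SIZE = 4
-- MIN_VCLUSTER_SIZE = 1
-- MAX_VCLUSTER_SIZE = 8;
```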

3. COMMENT: A description of the compute cluster, up to 1024 characters.
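The statement below is a sketch that combines a name satisfying the naming rules, the IF NOT EXISTS clause (intended to skip creation rather than fail when the name is already taken), and a COMMENT. The name reporting_vc_01, the size, and the comment text are hypothetical; the invalid names in the trailing comments are also illustrative.

```sql
-- reporting_vc_01: 15 characters, only letters, digits and underscores.
CREATE VCLUSTER IF NOT EXISTS reporting_vc_01
VCLUSTER_TYPE = GENERAL
VCLUSTER_SIZE = 2
COMMENT 'Shared cluster for nightly reporting jobs';

-- Names that break the naming rules:
--   vc         (fewer than 3 characters)
--   report-vc  (hyphen is not allowed)
--   report vc  (space is not allowed)
```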

Usage Examples

Create a compute cluster with default properties:


CREATE VCLUSTER sample_vc;
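Reading the Default column of the property table, the short statement above should be roughly equivalent to the spelled-out form below. This is a sketch assuming the documented defaults apply unchanged; it is meant to show which values the cluster receives, not to be run in addition to the previous statement.

```sql
-- Spelled-out equivalent of CREATE VCLUSTER sample_vc;
-- assuming the defaults listed in the property table.
CREATE VCLUSTER sample_vc
VCLUSTER_TYPE = GENERAL
VCLUSTER_SIZE = 1
AUTO_SUSPEND_IN_SECOND = 600
AUTO_RESUME = TRUE
QUERY_RUNTIME_LIMIT_IN_SECOND = 86400
QUERY_RESOURCE_LIMIT_RATIO = 1.0;
```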

Create a general-purpose compute cluster of size XSMALL (VCLUSTER_SIZE = 1) with auto-resume enabled, an auto-suspend time of 60 seconds, and a maximum job execution time of 600 seconds:


CREATE VCLUSTER demo_gp_vcluster 
VCLUSTER_SIZE = 1 
VCLUSTER_TYPE = GENERAL 
AUTO_SUSPEND_IN_SECOND = 60 
AUTO_RESUME = TRUE 
QUERY_RUNTIME_LIMIT_IN_SECOND = 600;

Create an analytics compute cluster of size XSMALL (VCLUSTER_SIZE = 1) with auto-resume enabled, an auto-suspend time of 60 seconds (1 minute), a minimum of 1 instance, a maximum of 2 instances, a maximum concurrency of 16 per instance, and a maximum job execution time of 600 seconds. The cluster preloads data from the public.demo and billing.payment tables and refreshes the cache every 600 seconds:


CREATE VCLUSTER demo_ap_vcluster 
VCLUSTER_SIZE = 1
VCLUSTER_TYPE = ANALYTICS
MIN_REPLICAS = 1
MAX_REPLICAS = 2
MAX_CONCURRENCY = 16
AUTO_SUSPEND_IN_SECOND = 60 
AUTO_RESUME = TRUE 
QUERY_RUNTIME_LIMIT_IN_SECOND = 600
PRELOAD_TABLES = 'public.demo,billing.payment'
AUTO_PRELOAD_IN_SECOND = 600;
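The examples above do not cover the third cluster type. The sketch below creates an integration cluster using only the properties listed in the integration syntax; the name demo_int_vcluster and the suspend and runtime values are hypothetical.

```sql
-- Hypothetical INTEGRATION cluster for data integration tasks:
-- size 1 CRU, auto-resume enabled, suspended after 300 idle seconds,
-- jobs limited to 3600 seconds.
-- Per the property table, integration clusters also support the
-- smaller sizes 0.25 CRU and 0.5 CRU.
CREATE VCLUSTER demo_int_vcluster
VCLUSTER_SIZE = 1
VCLUSTER_TYPE = INTEGRATION
AUTO_SUSPEND_IN_SECOND = 300
AUTO_RESUME = TRUE
QUERY_RUNTIME_LIMIT_IN_SECOND = 3600;
```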