Create Compute Cluster
Description
Creates a compute cluster in the workspace with the specified name and configuration. A compute cluster (Virtual Cluster, abbreviated VCluster) is a compute resource service provided by Singdata Lakehouse. It supplies the CPU, memory, and temporary storage needed to execute queries and analysis. Compute clusters run ETL, streaming analysis, ad-hoc query, and data integration workloads. Lakehouse uses a compute cluster whenever it executes a SQL SELECT query or a DML operation (such as DELETE, INSERT, or UPDATE) that requires computation.
Compute clusters come in three types: General Purpose Virtual Cluster (GP type), Analytics Purpose Virtual Cluster (AP type), and Integration Virtual Cluster.
In a General Purpose cluster, all submitted jobs share the cluster's computing resources, which suits offline jobs. An Analytics Purpose cluster provides multiple compute instances and automatic scaling, which suits online, high-concurrency jobs. An Integration cluster is used specifically for data integration tasks.
Syntax
1. name: The name of the compute cluster. It must be unique within the workspace and cannot be changed after creation. Naming rules: 3 to 28 characters; only letters, digits (0-9), and underscores are allowed; spaces are not allowed.
2. objectProperties: The properties that can be specified when creating a compute cluster, along with their specific meanings and values, are as follows:
Field Name | Field Meaning | Value Range | Default |
---|---|---|---|
VCLUSTER_SIZE | Compute cluster size. Supports sizes from 1 CRU to 256 CRU; larger sizes provide more computing power. (Integration clusters additionally support two small sizes: 0.25 CRU and 0.5 CRU.) | Number: 1-256, unit is CRU (Compute Resource Unit). | 1 |
MIN_VCLUSTER_SIZE | Applicable only to GENERAL clusters. The minimum size of the compute cluster when scaling, supporting sizes from 1 CRU to 256 CRU, and must be less than or equal to the MAX_VCLUSTER_SIZE parameter. Cannot be used simultaneously with VCLUSTER_SIZE. | Number: 1-256, unit is CRU (Compute Resource Unit). | None |
MAX_VCLUSTER_SIZE | Applicable only to GENERAL clusters. The maximum size of the compute cluster when scaling, supporting sizes from 1 CRU to 256 CRU, and must be greater than or equal to the MIN_VCLUSTER_SIZE parameter. Cannot be used simultaneously with VCLUSTER_SIZE. | Number: 1-256, unit is CRU (Compute Resource Unit). | None |
VCLUSTER_TYPE | Compute cluster type. GENERAL: Suitable for data ingestion and ELT operations; ANALYTICS: Suitable for scenarios with strong requirements for query latency and concurrency capabilities; INTEGRATION: Used for data integration task scenarios. | GENERAL, ANALYTICS, or INTEGRATION | GENERAL |
MIN_REPLICAS | Minimum number of instances in the compute cluster. Applicable only to ANALYTICS clusters. | 1-10 | 1 |
MAX_REPLICAS | Maximum number of instances in the compute cluster. Applicable only to ANALYTICS clusters. | 1-10 | 1 |
AUTO_SUSPEND_IN_SECOND | Idle time before the cluster automatically shuts down. Unit: seconds. | -1, or an integer greater than or equal to 0. | 600 |
AUTO_RESUME | Whether the cluster resumes automatically. | TRUE or FALSE | TRUE |
MAX_CONCURRENCY | Maximum concurrency load per compute instance in the compute cluster. Applicable only to ANALYTICS clusters. | 1-32 | 8 |
QUERY_RUNTIME_LIMIT_IN_SECOND | Maximum execution time for jobs submitted to this compute cluster. Unit: seconds. | Integer greater than 0. | 86400 |
PRELOAD_TABLES | Tables whose data the compute cluster caches on its local SSD disk, either on a schedule or when triggered. Cache policies can also be set on the table itself. Applicable only to ANALYTICS clusters. | schema_name.table_name; separate multiple table names with commas. Wildcards are supported, e.g., sample_schema.* | null |
QUERY_RESOURCE_LIMIT_RATIO | Per-job resource ratio threshold: the maximum proportion of the cluster's total CPU and memory resources that a single query job can use. | 0.0 to 1.0 (e.g., 0.1 means 10%) | 1.0 |
- For a GP type cluster, you can specify the minimum and maximum sizes at creation with MIN_VCLUSTER_SIZE and MAX_VCLUSTER_SIZE.
- VCLUSTER_SIZE cannot be set together with MIN_VCLUSTER_SIZE and MAX_VCLUSTER_SIZE.
- comment: The description of the compute cluster, up to 1024 characters.
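The naming rule and the property constraints listed above can be checked on the client side before a statement is submitted. The following Python sketch is illustrative only: `validate_vcluster_spec` is a hypothetical helper and not part of Lakehouse, it assumes "letters" means ASCII letters, and it covers only the rules stated in this section. Lakehouse performs its own validation, which may differ.

```python
import re

# Assumption: "letters" means ASCII letters; the source does not say otherwise.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,28}")
ANALYTICS_ONLY = ("MIN_REPLICAS", "MAX_REPLICAS", "MAX_CONCURRENCY", "PRELOAD_TABLES")
SCALING_KEYS = ("MIN_VCLUSTER_SIZE", "MAX_VCLUSTER_SIZE")


def validate_vcluster_spec(name, props, comment=""):
    """Hypothetical helper: return a list of violated rules (empty if none)."""
    errors = []
    if not NAME_PATTERN.fullmatch(name):
        errors.append("name must be 3-28 letters, digits, or underscores")
    if len(comment) > 1024:
        errors.append("comment must not exceed 1024 characters")

    vtype = props.get("VCLUSTER_TYPE", "GENERAL")  # documented default
    if vtype not in ("GENERAL", "ANALYTICS", "INTEGRATION"):
        errors.append("unknown VCLUSTER_TYPE")

    # A fixed size and scaling bounds are mutually exclusive.
    scaling = [key for key in SCALING_KEYS if key in props]
    if scaling and "VCLUSTER_SIZE" in props:
        errors.append("VCLUSTER_SIZE cannot be combined with MIN/MAX_VCLUSTER_SIZE")
    if scaling and vtype != "GENERAL":
        errors.append("MIN/MAX_VCLUSTER_SIZE apply only to GENERAL clusters")
    if len(scaling) == 2 and props["MIN_VCLUSTER_SIZE"] > props["MAX_VCLUSTER_SIZE"]:
        errors.append("MIN_VCLUSTER_SIZE must not exceed MAX_VCLUSTER_SIZE")

    # Replica, concurrency, and preload settings are ANALYTICS-only.
    for key in ANALYTICS_ONLY:
        if key in props and vtype != "ANALYTICS":
            errors.append(key + " applies only to ANALYTICS clusters")

    ratio = props.get("QUERY_RESOURCE_LIMIT_RATIO", 1.0)  # documented default
    if not 0.0 <= ratio <= 1.0:
        errors.append("QUERY_RESOURCE_LIMIT_RATIO must be between 0.0 and 1.0")
    return errors


if __name__ == "__main__":
    # Mixing a fixed size with scaling bounds is reported as a violation.
    spec = {"VCLUSTER_SIZE": 4, "MIN_VCLUSTER_SIZE": 1, "MAX_VCLUSTER_SIZE": 8}
    for problem in validate_vcluster_spec("etl_vc", spec):
        print(problem)
```

Returning a list rather than raising on the first problem lets a caller report every violated rule at once.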
Usage Example
- Create a compute cluster using default properties:
- Create a General Purpose compute cluster with XSMALL size, automatic resume, an auto-suspend time of 60 seconds, and a maximum job execution time of 600 seconds:
- Create an Analytics Purpose compute cluster with XSMALL size, automatic resume, an auto-suspend time of 60 seconds, a minimum of 1 instance, a maximum of 2 instances, a maximum concurrency of 16 per instance, and a maximum job execution time of 600 seconds, preloading data from the public.demo and billing.payment tables and refreshing the table data cache every 600 seconds: