Singdata Lakehouse Billing Instructions
1. Overview of Billing Methods
Singdata Lakehouse is an integrated data platform built on cloud-native technology. The platform meters the resources you consume in scenarios such as data integration, data analysis, storage, and network transmission, and bills them under three categories: computing, storage, and network. Unit prices depend on the cloud platform and region where the service runs.
The billing of Singdata Lakehouse is mainly based on the following aspects:
- Computing Resources: The billing unit for computing resources is CRU*hour. 1 CRU*hour represents the same amount of computing power running for 1 hour, regardless of cloud platform or service region. Computing resources are consumed when you use synchronous, analytical, or general-purpose computing clusters for data integration or data analysis, when you run tasks with Python or Shell scripts, and when the system automatically performs operations such as automatic materialized views (Auto_MV), data compression, and job scheduling. Usage is metered and billed based on the actual computing power consumed.
- Storage Resources: The billing unit for storage resources is GiB, and billing is based on the actual storage capacity you use on Singdata Lakehouse. The following scenarios occupy storage capacity:
  1. Data stored in Lakehouse in the form of tables, materialized views, etc.;
  2. Data that has been deleted but not yet cleaned up within the lifecycle of the data table;
  3. Cached query results.
  Items 2 and 3 are currently metered only and are temporarily free of charge.
- Data Transmission: The billing unit for data transmission is GB, and billing is based on the amount of data transmitted. The following scenarios will incur data transmission fees: data queries through the public network, full download of query results, and data transmission from Singdata Lakehouse to other data sources. Data transmission traffic may be generated through the Internet, cross-VPC network connections, dedicated lines, or other network connectivity methods. For Internet network traffic, only the data transmission volume flowing out of Singdata Lakehouse is measured, and uploading data to Singdata Lakehouse is free of charge.
2. Billing Methods
Pay-as-you-go
In Singdata Lakehouse, all types of resources are flexibly scalable and used on-demand, and you only need to pay for the amount of resources actually used. Among them:
Computing resources are billed by the second, from the time each computing cluster instance is powered on to the time it is powered off, metered in CRU*hour and converted into a bill in legal currency. If the time from powering on to powering off is less than 1 minute, it is billed as 1 minute. Configure the "auto-shutdown" time of the computing cluster with this in mind: when the job execution time plus the auto-shutdown time is less than 1 minute, a full minute is still charged;
Storage resources are billed based on the average value of the storage capacity of all data (including current data of data tables, historical version data, query result cache, etc.) throughout the day;
Network transmission fees are billed based on the actual network traffic generated, with Internet network traffic fees billed based on the traffic flowing out of Lakehouse.
Due to different prices of resources on different cloud platforms and in different regions, the unit prices of computing, storage, and network may vary. Please refer to the "Singdata Lakehouse Price List" for specific prices, and the actual bill within the system shall prevail.
You can view resource usage and cost details on the "Management Center" - "Billing Statement" page. Computing resources and data transmission fees are measured and deducted every hour, and storage resources are measured and deducted once a day based on the average capacity sampled over 24 hours.
Annual Prepaid
Singdata Lakehouse also offers enterprise customers annual prepaid billing with specified resource specifications. With annual prepaid billing, discounted unit prices are available for computing and storage resources. For details, please contact Singdata sales personnel.
3. Billing Principles
3.1 Computing Resource Billing
The billing items for computing resources include: general-purpose computing clusters, analytical computing clusters, synchronous computing clusters, scheduling tasks, and serverless jobs. The billing principles for each item are as follows:
- General-purpose computing clusters
When a general-purpose computing cluster starts and reaches the "running" state, it begins to generate corresponding CRU consumption based on the size and number of instances of the cluster. When the computing cluster enters the "stopping" state, it stops generating CRU consumption.
The specifications of general-purpose computing clusters and the corresponding hourly CRU consumption are as follows:
Computing Cluster Specifications | Hourly CRU Consumption (CRU*hour) |
---|---|
1 | 1 |
2 | 2 |
3 | 3 |
4 | 4 |
5 | 5 |
... | ... |
256 | 256 |
- Analytical computing clusters
The billing principle of analytical computing clusters is the same as that of general-purpose computing clusters, starting from the beginning time of the "running" state until entering the "stopping" state.
Analytical computing clusters add "instance" scaling on top of the specification. Each instance that is automatically scaled out adds the consumption of one more computing cluster of the same specification, so hourly consumption equals the specification multiplied by the number of running instances.
The table below shows the hourly CRU consumption for 1 to 5 instances of analytical computing clusters, with more instances following the same pattern:
Compute Cluster Specifications | 1 Instance Hourly Consumption | 2 Instances Hourly Consumption | 3 Instances Hourly Consumption | 4 Instances Hourly Consumption | 5 Instances Hourly Consumption |
---|---|---|---|---|---|
1 | 1 CRU*hour | 2 CRU*hour | 3 CRU*hour | 4 CRU*hour | 5 CRU*hour |
2 | 2 CRU*hour | 4 CRU*hour | 6 CRU*hour | 8 CRU*hour | 10 CRU*hour |
4 | 4 CRU*hour | 8 CRU*hour | 12 CRU*hour | 16 CRU*hour | 20 CRU*hour |
8 | 8 CRU*hour | 16 CRU*hour | 24 CRU*hour | 32 CRU*hour | 40 CRU*hour |
16 | 16 CRU*hour | 32 CRU*hour | 48 CRU*hour | 64 CRU*hour | 80 CRU*hour |
32 | 32 CRU*hour | 64 CRU*hour | 96 CRU*hour | 128 CRU*hour | 160 CRU*hour |
64 | 64 CRU*hour | 128 CRU*hour | 192 CRU*hour | 256 CRU*hour | 320 CRU*hour |
128 | 128 CRU*hour | 256 CRU*hour | 384 CRU*hour | 512 CRU*hour | 640 CRU*hour |
256 | 256 CRU*hour | 512 CRU*hour | 768 CRU*hour | 1024 CRU*hour | 1280 CRU*hour |
- Synchronous Compute Cluster
Synchronous compute clusters are mainly used for submitting offline integration and real-time integration tasks. Multiple integration tasks can be submitted to the same synchronous compute cluster to reuse resources. The billing principle of synchronous compute clusters is the same as that of general-purpose computing clusters: consumption is measured from the start of the "running" state until the cluster enters the "stopping" state. The "synchronous compute cluster" is currently in trial operation, and data integration jobs executed in it are temporarily metered separately under the "offline integration" and "real-time integration" billing items. Once the "synchronous compute cluster" is officially operational, data integration fees will be merged into the synchronous compute cluster billing item.
The specifications of the synchronous compute cluster and the corresponding hourly CRU consumption are as follows:
Compute Cluster Specifications | Hourly CRU Consumption |
---|---|
0.25 | 0.25 CRU*hour |
0.5 | 0.5 CRU*hour |
1 | 1 CRU*hour |
2 | 2 CRU*hour |
3 | 3 CRU*hour |
4 | 4 CRU*hour |
5 | 5 CRU*hour |
... | ... |
256 | 256 CRU*hour |
- Task Scheduling
Task scheduling includes two scenarios: running scripts such as Python, shell, etc., and pulling data in real-time data integration tasks.
When running scripts such as Python, shell, etc., the system will start the corresponding computing resources based on the script configuration. The billing is calculated in CRU*hour from the start of the computing resources to the end of the script execution.
During the operation of real-time data integration tasks, Singdata Lakehouse will provide resource scheduling for real-time integration tasks. The resources used for this part of task scheduling will be included in the "Task Scheduling" billing item.
- Serverless Jobs
Serverless jobs refer to jobs that do not require users to actively create computing cluster instances, but are handled by the public computing resources provided by Singdata Lakehouse. This includes query job scheduling, data compression, automatic materialized views, etc.
3.2 Storage Resource Billing
Storage fees are calculated based on the actual storage capacity you use on the Lakehouse platform. When you write data into the Lakehouse data warehouse, the written data and some of its metadata information will occupy storage capacity in Lakehouse. Lakehouse will measure your actual data storage usage, sample multiple times within a day, and use the average value of the sampled storage capacity as the storage capacity measurement value for that day to calculate the billing.
When you use the Time Travel feature of Lakehouse, to ensure data multi-version and recoverability, Lakehouse will automatically back up your data in multiple versions. The multi-version backup data generated at this time will incur corresponding storage fees, charged at the storage capacity unit price.
When you perform SQL queries, to reduce the consumption of computing resources for repeated queries, the query results will be cached, exchanging storage costs for computing resource savings. This part of the storage usage will be included in the "Result Cache" and charged at the storage capacity unit price.
3.3 Data Transfer Billing
When you use functions such as data integration to download or export data from Lakehouse in bulk through the public network, Internet network transfer fees will be incurred. Internet network transfer is measured based on the actual amount of data transferred and the fees are calculated accordingly.
Uploading data to Lakehouse from other data sources through the Internet network will not incur network transfer fees.
If you use dedicated lines, Private Link, or other network products to achieve cross-cloud vendor, cross-region, or cross-VPC network connectivity, the network connectivity itself will incur fees. These fees will be charged by Singdata for the part generated on the Singdata Lakehouse side, and the fees generated in your cloud platform account will be directly charged by the cloud platform.
3.4 Other Cloud Resource Billing
When Singdata Lakehouse performs metadata management, parses SQL statements, generates query plans, schedules and allocates query tasks, and merges and cleans data files, it will consume cloud resources. Singdata Lakehouse will measure the consumption of these cloud resources, which are currently free for a limited time. You will be notified one month in advance before the charges begin.
4. Pricing
4.1 CRU*hour Price
Cloud Vendor | Region | Edition | Unit Price |
---|---|---|---|
Alibaba Cloud | Shanghai | Standard Edition | 3.5 RMB/CRU*hour |
Alibaba Cloud | Singapore | Standard Edition | 0.8 USD/CRU*hour |
Tencent Cloud | Beijing | Standard Edition | 3.5 RMB/CRU*hour |
Tencent Cloud | Shanghai | Standard Edition | 3.5 RMB/CRU*hour |
Tencent Cloud | Guangzhou | Standard Edition | 3.5 RMB/CRU*hour |
AWS | Beijing | Standard Edition | 9.95 RMB/CRU*hour |
AWS | Singapore | Standard Edition | 1.24 USD/CRU*hour |
Note: In addition to the Standard Edition, the platform also offers an Enterprise Edition with enhanced data governance and security capabilities. The CRU*hour unit price for the Enterprise Edition is higher than that of the Standard Edition. Please contact our sales representative for detailed pricing and plans.
4.2 Storage Capacity Price
Cloud Vendor | Region | Edition | Storage Capacity Price |
---|---|---|---|
Alibaba Cloud | Shanghai | Standard Edition | 0.12 RMB/GiB/month |
Alibaba Cloud | Singapore | Standard Edition | 0.017 USD/GiB/month |
Tencent Cloud | Beijing | Standard Edition | 0.12 RMB/GiB/month |
Tencent Cloud | Shanghai | Standard Edition | 0.12 RMB/GiB/month |
Tencent Cloud | Guangzhou | Standard Edition | 0.12 RMB/GiB/month |
AWS | Beijing | Standard Edition | 0.195 RMB/GiB/month |
AWS | Singapore | Standard Edition | 0.025 USD/GiB/month |
Currently, data multi-versioning and result caching fees are temporarily free. A one-month notice will be given before the charges start.
4.3 Data Transfer Pricing
Cloud Vendor | Region | Edition | Data Transfer Price |
---|---|---|---|
Alibaba Cloud | Shanghai | Standard Edition | 0.8 RMB/GB |
Alibaba Cloud | Singapore | Standard Edition | 0.081 USD/GB |
Tencent Cloud | Beijing | Standard Edition | 0.8 RMB/GB |
Tencent Cloud | Shanghai | Standard Edition | 0.8 RMB/GB |
Tencent Cloud | Guangzhou | Standard Edition | 0.8 RMB/GB |
AWS | Beijing | Standard Edition | 0.933 RMB/GiB |
AWS | Singapore | Standard Edition | 0.12 USD/GiB |
Currently, Internet data transfer fees are temporarily free. A one-month notice will be given before the charges start.
5. Cost Examples in Common Scenarios
General Computing Cluster Cost Example
Taking an Alibaba Cloud Shanghai Standard Edition service instance as an example:
- A general-purpose computing cluster with a specification of 2 CRU runs for 1 hour at a unit price of 3.5 RMB per CRU*hour. The cost is: 1 hour * 2 CRU * 3.5 RMB/CRU*hour = 7 RMB
- A general-purpose computing cluster with a specification of 1 CRU runs for 1 minute and 20 seconds at a unit price of 3.5 RMB per CRU*hour. The cost is: 1.33 minutes / 60 minutes * 1 CRU * 3.5 RMB/CRU*hour = 0.078 RMB
- A general-purpose computing cluster with a specification of 1 CRU runs for 30 seconds at a unit price of 3.5 RMB per CRU*hour. Since the 30-second runtime is less than 1 minute, it is billed as 1 minute. The cost is: 1 minute / 60 minutes * 1 CRU * 3.5 RMB/CRU*hour = 0.058 RMB
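The three calculations above can be reproduced with a small helper. This is only a sketch of the per-second billing rule with its 1-minute minimum as described in this document; the function name `cluster_cost` and its parameters are illustrative assumptions, not part of any Singdata API, and the actual bill in the system prevails.

```python
MIN_BILLABLE_SECONDS = 60  # a run shorter than 1 minute is billed as 1 minute


def cluster_cost(spec_cru: float, run_seconds: float, unit_price: float) -> float:
    """Cost of one cluster run, billed per second with a 1-minute minimum.

    spec_cru    -- cluster specification in CRU
    run_seconds -- time from power-on to power-off, in seconds
    unit_price  -- price per CRU*hour (3.5 RMB in the examples above)
    """
    billable_seconds = max(run_seconds, MIN_BILLABLE_SECONDS)
    cru_hours = spec_cru * billable_seconds / 3600
    return cru_hours * unit_price


# The three examples above, at 3.5 RMB per CRU*hour
print(round(cluster_cost(2, 3600, 3.5), 3))  # 2 CRU for 1 hour
print(round(cluster_cost(1, 80, 3.5), 3))    # 1 CRU for 1 minute 20 seconds
print(round(cluster_cost(1, 30, 3.5), 3))    # 1 CRU for 30 seconds, billed as 60
```

Note that a 30-second run and a 60-second run produce the same cost, which is why very short auto-shutdown settings do not reduce the charge below one minute.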
Analytical Computing Cluster Cost Example
Taking an Alibaba Cloud Shanghai Standard Edition service instance as an example:
- An analytical cluster with a specification of 1 CRU runs for 30 minutes with 1 instance and then for 30 minutes with 2 instances, at a unit price of 3.5 RMB per CRU*hour. The cost is: 30 minutes / 60 minutes * 1 CRU * 1 instance * 3.5 RMB/CRU*hour + 30 minutes / 60 minutes * 1 CRU * 2 instances * 3.5 RMB/CRU*hour = 1.75 RMB + 3.5 RMB = 5.25 RMB
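When the instance count changes during a run, the cost is the sum over each period of specification × instances × duration. The sketch below applies that to the calculation above, using the 1 CRU figure from the formula. The function name and the `(minutes, instances)` segment representation are assumptions made for illustration only.

```python
def analytical_cluster_cost(spec_cru, segments, unit_price):
    """Sum the cost of an analytical cluster over periods with different instance counts.

    segments is a list of (minutes, instance_count) pairs (a hypothetical
    representation chosen for this example).
    """
    cru_hours = sum(
        spec_cru * instances * minutes / 60
        for minutes, instances in segments
    )
    return cru_hours * unit_price


# 30 minutes with 1 instance, then 30 minutes with 2 instances, at 3.5 RMB/CRU*hour
cost = analytical_cluster_cost(1, [(30, 1), (30, 2)], 3.5)
print(cost)  # 5.25
```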
Synchronous Computing Cluster Cost Example
Taking an Alibaba Cloud Shanghai Standard Edition service instance as an example:
- A synchronous cluster with a specification of 2 CRU runs for 1 hour at a unit price of 3.5 RMB per CRU*hour. The cost is: 1 hour * 2 CRU * 3.5 RMB/CRU*hour = 7 RMB
Offline Integration Task Cost Example
Offline integration tasks are not charged separately; they incur the running costs of the synchronous computing cluster to which they are submitted. Offline integration tasks can automatically wake up the synchronous computing cluster. As a rule of thumb for estimation, 5 single-concurrent offline integration tasks can fully utilize a synchronous computing cluster with a specification of 1 CRU.
- Assuming a single-concurrent offline integration task of an Alibaba Cloud Shanghai Standard Edition service instance runs for 10 minutes, the computing power consumed is approximately: 1/5 CRU * 10 minutes / 60 minutes = 0.033 CRU*hour;
- Since offline integration tasks wake up the synchronous computing cluster, assuming the synchronous computing cluster of an Alibaba Cloud Shanghai Standard Edition service instance has a specification of 1 CRU and automatically shuts down after the 10-minute task ends, the cost incurred is: 1 CRU * 10 minutes / 60 minutes * 3.5 RMB/CRU*hour = 0.58 RMB;
At this time, the synchronous computing cluster with a specification of 1 CRU is not fully utilized. Submitting 4 more single-concurrent offline integration tasks, assuming they all run for 10 minutes, the cost is still 0.58 RMB. Therefore, reusing the resources of the synchronous computing cluster as much as possible and avoiding using overly large specifications of synchronous computing clusters can effectively save computing resource costs.
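The point of this example is that the charge follows the cluster's running time, not the number of tasks sharing it. A minimal sketch, assuming the 1/5 CRU per single-concurrent task estimate used above; both function names are hypothetical.

```python
def cluster_run_cost(spec_cru, minutes, unit_price):
    """Cost of keeping a synchronous cluster running, regardless of task count."""
    return spec_cru * minutes / 60 * unit_price


def utilization(task_count, spec_cru, cru_per_task=1 / 5):
    """Rough share of the cluster used by single-concurrent offline tasks.

    cru_per_task=1/5 is the estimate from the example above, not a measured value.
    """
    return min(1.0, task_count * cru_per_task / spec_cru)


# 1 CRU cluster running for 10 minutes at 3.5 RMB/CRU*hour
for tasks in (1, 5):
    cost = cluster_run_cost(1, 10, 3.5)
    print(tasks, round(utilization(tasks, 1), 2), round(cost, 2))
```

With 1 task the cluster is about 20% utilized and with 5 tasks it is fully utilized, yet the cost printed for both cases is identical.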
Real-time Integration Task Cost Example
Real-time integration tasks are not charged separately but incur the running costs of the synchronous computing cluster to which the tasks are submitted. Real-time integration tasks need to keep their synchronous computing cluster in a "running" state. Therefore, it is necessary to choose a specification for the synchronous computing cluster that closely matches the resource requirements of the real-time integration tasks to avoid resource waste. It can be estimated that 16 single-concurrent real-time integration tasks can fully utilize a synchronous computing cluster with a specification of 1CRU.
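To pick a specification from this estimate, one can convert the task count to a required CRU value and choose the smallest specification that covers it. The sketch below assumes that resource demand grows linearly with the number of tasks, which the estimate above suggests but does not guarantee. The function name is hypothetical, and the specification list contains only the values shown explicitly in the synchronous compute cluster table.

```python
TASKS_PER_CRU = 16  # estimate from the text: 16 single-concurrent tasks fill 1 CRU


def smallest_sufficient_spec(task_count, available_specs):
    """Return the smallest specification (in CRU) that covers task_count tasks.

    Assumes resource demand scales linearly with the number of tasks.
    """
    required_cru = task_count / TASKS_PER_CRU
    candidates = [spec for spec in sorted(available_specs) if spec >= required_cru]
    if not candidates:
        raise ValueError("no listed specification is large enough")
    return candidates[0]


# Specifications listed explicitly in the synchronous compute cluster table
specs = [0.25, 0.5, 1, 2, 3, 4, 5]
print(smallest_sufficient_spec(4, specs))   # 0.25
print(smallest_sufficient_spec(10, specs))  # 1
print(smallest_sufficient_spec(20, specs))  # 2
```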
Python Script Task Cost Example
When executing Python script tasks, the cost is calculated based on the task execution time and the computing resources consumed. Generally, when a Python script runs for 1 hour, the computing resources used are 0.125 CRU*hours.
- Assume that a Python script task for a Standard Edition service instance in Alibaba Cloud Shanghai runs for 10 minutes. The cost is: 0.125 CRU * 10 minutes / 60 minutes * 3.5 RMB/CRU*hour = 0.073 RMB.
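The same arithmetic as a short sketch, assuming the typical figure of 0.125 CRU*hour per hour of script runtime given above. Actual consumption depends on the script configuration, and the function name is illustrative.

```python
SCRIPT_CRU = 0.125  # typical figure from the text: 1 hour of runtime uses 0.125 CRU*hour


def script_task_cost(run_minutes, unit_price, cru=SCRIPT_CRU):
    """Estimated cost of one script run at the given price per CRU*hour."""
    return cru * run_minutes / 60 * unit_price


print(round(script_task_cost(10, 3.5), 3))  # 0.073
```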
Storage Capacity Cost Example
- Assume that in the workspace of a Standard Edition service instance in Alibaba Cloud Shanghai, the lowest storage capacity during the day is 910 GiB, the highest is 1100 GiB, and the daily average is 1000 GiB. The storage cost for that day is: 1000 GiB * 0.12 RMB/GiB/month / 30 days = 4 RMB
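The daily storage charge can be sketched as the average of the day's capacity samples multiplied by the monthly price and divided by 30, as in the example above. The sample values below are hypothetical and chosen only so that they average 1000 GiB; how many samples the platform takes per day is not specified here.

```python
def daily_storage_cost(samples_gib, monthly_price_per_gib, days_in_month=30):
    """Daily storage cost from capacity samples taken during one day.

    days_in_month=30 follows the divisor used in the example above.
    """
    average_gib = sum(samples_gib) / len(samples_gib)
    return average_gib * monthly_price_per_gib / days_in_month


# Hypothetical samples with a low of 910 GiB, a high of 1100 GiB, and a mean of 1000 GiB
samples = [910, 1100, 990, 1000]
print(round(daily_storage_cost(samples, 0.12), 2))  # 4.0
```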
Network Transmission Price Example
- Assume that for a Standard Edition service instance in Alibaba Cloud Shanghai, the outbound Internet traffic generated in 1 hour is 10 GB. The Internet network transmission cost for that hour is: 10 GB * 0.8 RMB/GB = 8 RMB
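A brief sketch of the same rule: only Internet traffic flowing out of Lakehouse is charged, while uploads are free. The function name is hypothetical, and the inbound figure is included purely to show that it does not affect the result.

```python
def internet_transfer_cost(outbound_gb, inbound_gb, price_per_gb):
    """Internet transfer cost: outbound traffic is billed, inbound is free."""
    # inbound_gb is accepted only to make explicit that uploads are not charged
    return outbound_gb * price_per_gb


# 10 GB out and 50 GB in during one hour, at 0.8 RMB/GB
print(round(internet_transfer_cost(10, 50, 0.8), 2))
```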