Lakehouse Data Update and Cleanup Guide
Overview
In data warehouse operations, it is often necessary to correct erroneous data, clean up expired records, or reset table data. Singdata Lakehouse provides three data modification methods: UPDATE, DELETE, and TRUNCATE, each suitable for different scenarios. This guide categorizes usage by business scenario to help you quickly master safe and efficient data update and cleanup methods.
Quick Navigation
- Conditional Data Update -- Use UPDATE to modify specific rows
- Batch Update -- Use ORDER BY + LIMIT for batch updates
- Conditional Data Deletion -- Use DELETE to clean specific rows
- Truncate Table Data -- Use TRUNCATE to quickly clear entire table
- Update Based on Related Table -- Use subqueries or MERGE INTO for updates
SQL Commands Covered
| Command | Purpose | Applicable Scenario |
|---|---|---|
UPDATE ... SET ... WHERE | Update rows by condition | Fix erroneous data, status changes |
UPDATE ... ORDER BY ... LIMIT | Batch update | Safe updates for large data volumes |
DELETE FROM ... WHERE | Delete rows by condition | Clean expired or invalid data |
TRUNCATE TABLE | Clear entire table | Quickly reset table data |
Prerequisites
The following examples use a simulated employee table employees_update:
Conditional Data Update
Use the UPDATE statement to modify rows that meet conditions. Always use the WHERE clause to limit the update scope.
Verify Result:
| emp_id | emp_name | dept | salary |
|---|---|---|---|
| 1 | Alice | Engineering | 13200 |
| 2 | Bob | Engineering | 10450 |
| 3 | Carol | Marketing | 8500 |
| 4 | David | Marketing | 7800 |
| 5 | Eve | HR | 6000 |
Batch Update
When the number of rows to update is very large, use ORDER BY + LIMIT for batch updates to avoid long locks.
Verify Result:
| emp_id | emp_name | status |
|---|---|---|
| 1 | Alice | reviewed |
| 2 | Bob | reviewed |
| 3 | Carol | active |
| 5 | Eve | active |
Conditional Data Deletion
Use the DELETE statement to delete rows that meet conditions. Again, use the WHERE clause to limit the deletion scope.
Verify Result:
| emp_id | emp_name | dept | salary | status |
|---|---|---|---|---|
| 1 | Alice | Engineering | 13200 | reviewed |
| 2 | Bob | Engineering | 10450 | reviewed |
| 3 | Carol | Marketing | 8500 | active |
| 5 | Eve | HR | 6000 | active |
Truncate Table Data
Use TRUNCATE TABLE to quickly clear all table data while preserving the table structure. More efficient than DELETE.
Verify Result:
| COUNT(*) |
|---|
| 0 |
Update Based on Related Table
When updating a table based on data from another table, use subqueries or MERGE INTO.
Verify Result:
| emp_id | emp_name | salary |
|---|---|---|
| 1 | Alice | 13700 |
| 2 | Bob | 10750 |
Clean Up Test Data
After completing update and cleanup verification, it is recommended to clean up test tables:
Important Notes
- WHERE Clause: Always use
WHEREwithUPDATEandDELETEto avoid accidental full-table operations. - Transactionality: A single
UPDATEorDELETEis an atomic operation -- either all rows succeed or all fail. - TRUNCATE Is Irrecoverable:
TRUNCATEoperations do not retain historical versions and cannot be recovered through Time Travel. - Dynamic Table Limitation: Dynamic Tables do not support direct
UPDATEorDELETE; data is refreshed automatically by upstream table changes. - Batch Operations: For large-volume updates/deletes, use
ORDER BY + LIMITto execute in batches to avoid prolonged resource usage.
