Lakehouse Basic Data Filtering and Sorting Guide
Overview
Data querying is the first step in data analysis. Singdata Lakehouse provides complete SQL filtering and sorting capabilities, supporting everything from simple conditional filtering to complex expression combinations. This guide categorizes usage by common business scenarios to help you quickly master efficient data exploration methods.
Quick Navigation
- Basic Column Selection -- Use SELECT to specify return columns
- Conditional Filtering -- Use WHERE to filter data
- Result Sorting -- Use ORDER BY for sorting
- Limit Returned Rows -- Use LIMIT to control result set size
- Exclude Specific Columns -- Use EXCEPT to quickly exclude sensitive fields
SQL Commands Covered
| Command/Clause | Purpose | Applicable Scenario |
|---|---|---|
SELECT | Specify returned columns | Column pruning, reduce data transfer |
SELECT * EXCEPT(...) | Exclude specified columns | Quickly exclude sensitive fields |
WHERE | Row-level filtering | Filter data by condition |
ORDER BY | Result sorting | Sort by single or multiple columns |
LIMIT | Limit returned rows | Data exploration, paginated queries |
Prerequisites
The following examples use a simulated employee table employees:
Basic Column Selection
Explicitly specify the needed columns when querying to avoid returning unnecessary data. In columnar storage, this can significantly improve query performance.
Result:
| emp_name | dept |
|---|---|
| Alice | Engineering |
| Bob | Engineering |
| Carol | Marketing |
| David | Marketing |
| Eve | HR |
Conditional Filtering
Use the WHERE clause to filter data by condition. Lakehouse supports a rich set of comparison and logical operators.
Scenario 1: Numeric Range Filtering
Scenario 2: Multiple Condition Combinations
Scenario 3: IN and BETWEEN
Scenario 4: Fuzzy Matching
Result Sorting
Use ORDER BY to sort query results, supporting ascending (ASC, default) and descending (DESC) order.
Result (multi-column sort):
| emp_name | dept | salary |
|---|---|---|
| Alice | Engineering | 12000 |
| Bob | Engineering | 9500 |
| Eve | HR | 6000 |
| Carol | Marketing | 8500 |
| David | Marketing | 7800 |
Limit Returned Rows
Use LIMIT to control the number of returned rows, commonly used for data exploration and paginated queries.
Result:
| emp_name | dept | salary |
|---|---|---|
| Alice | Engineering | 12000 |
| Bob | Engineering | 9500 |
| Carol | Marketing | 8500 |
Exclude Specific Columns
Use the EXCEPT clause to quickly exclude unneeded columns, especially useful for wide table queries.
Result:
| emp_name | dept | salary | hire_date |
|---|---|---|---|
| Alice | Engineering | 12000 | 2020-03-15 |
| Bob | Engineering | 9500 | 2021-07-01 |
| Carol | Marketing | 8500 | 2019-11-20 |
| David | Marketing | 7800 | 2022-01-10 |
| Eve | HR | 6000 | 2023-05-05 |
Clean Up Test Data
After completing query verification, it is recommended to clean up test tables:
Important Notes
- Column Pruning Optimization: Explicitly specify
SELECTcolumn names and avoidSELECT *to significantly reduce I/O and network transfer. - Partition Pruning: If the table is partitioned, filter directly on the partition column in
WHEREto skip irrelevant partitions. - Sorting Performance:
ORDER BYtriggers a full sort; for large data volumes, use it withLIMIT. - NULL Value Sorting:
ORDER BYplacesNULLvalues last (ascending) or first (descending) by default; useNULLS FIRST/LASTto control.
