Lakehouse Query Acceleration Index Guide
Overview
In large-scale data query scenarios, full table scans can be time-consuming. Singdata Lakehouse provides multiple index types, including Bloom filters (for equality query acceleration), inverted indexes (for full-text search), and vector indexes (for similarity search). This guide is organized by business scenario to help you quickly master index creation, building, and management methods.
Quick Navigation
- Create Bloom Filter Index -- Accelerate equality filtering on high-cardinality columns
- Create Inverted Index -- Support full-text search on text fields
- Build Index for Existing Data -- Generate indexes for existing data
- View and Drop Indexes -- Manage index lifecycle
- Index Effectiveness Verification -- Confirm queries are using index acceleration
SQL Commands Covered
| Command | Purpose | Use Case |
|---|---|---|
CREATE BLOOMFILTER INDEX | Create Bloom filter index | Equality queries on high-cardinality columns (e.g., ID, phone number) |
CREATE INVERTED INDEX | Create inverted index | Full-text search on text fields (e.g., logs, comments) |
BUILD INDEX | Build index for existing data | Generate indexes for historical data |
SHOW INDEX | View index list | Monitor index status |
DROP INDEX | Drop index | Clean up unused indexes |
Prerequisites
The following examples use a simulated log table app_logs_idx:
Create Bloom Filter Index
Bloom filters are suitable for equality filtering (=, IN) on high-cardinality columns, significantly reducing the amount of data scanned.
Create Inverted Index
Inverted indexes are suitable for full-text search on text fields, supporting Chinese word segmentation and multiple matching modes.
Analyzer Options:
chinese: Chinese word segmentationenglish: English word segmentationkeyword: No segmentation, exact match
Build Index for Existing Data
Newly created indexes do not automatically scan historical data. Use BUILD INDEX to generate indexes for existing data.
View and Drop Indexes
Use SHOW INDEX to view all indexes on a table, and DROP INDEX to clean up indexes no longer needed.
Index Effectiveness Verification
Once created, queries automatically leverage indexes for acceleration without modifying SQL syntax.
Clean Up Test Data
After completing index verification, it is recommended to clean up the test table:
Notes
- Bloom Filter Limitations: Does not support
LIKE, range queries (>,<), or complex types (ARRAY/MAP/STRUCT). - Inverted Index Building:
BUILD INDEXis only effective for inverted indexes and vector indexes; Bloom filters do not support historical data building. - Storage Overhead: Indexes consume additional storage space, typically 5%-20% of the original table.
- Write Performance: Indexes slightly increase
INSERT/UPDATElatency because the index structure must be updated simultaneously. - Automatic Application: The query optimizer automatically determines whether to use an index; no explicit specification is needed in SQL.
