Bloomfilter Index
Bloomfilter Index is an equality filter acceleration index in Lakehouse. It records the presence of column values at the data file level, allowing the query engine to quickly skip files that do not contain the target value before scanning, reducing I/O.
How It Works
Bloomfilter is a probabilistic data structure that quickly determines "whether a value may exist in a certain data block":
- Not present: 100% accurate, the data block is skipped directly
- May be present: needs actual read to verify (very low probability of false positives)
Applicable Scenarios
| Scenario | Effect |
|---|---|
Equality filtering on high-cardinality columns (e.g., user_id = 12345) | Significantly reduces the number of scanned files |
| Point queries (look up single record by ID) | Greatly improves query speed |
Range queries (BETWEEN, >, <) | Ineffective, not suitable |
| Low-cardinality columns (e.g., gender, status) | Limited benefit, not recommended |
Create Example
Related Documents
- Bloomfilter Index Details
- Index Usage Best Practices
- Inverted Index — full-text search scenarios
