Bitmap Type Documentation
Bitmap is a data type in ClickZetta Lakehouse for storing and processing sets of integers. The ClickZetta Lakehouse Bitmap is 64-bit and compressed with the Roaring Bitmap algorithm, which enables efficient storage and processing of large-scale integer sets.
Bitmap represents integer sets through bit-level operations, providing extremely high space compression rates. Compared to directly storing arrays, Bitmap can significantly reduce storage costs while providing fast set operation performance.
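To make the bit-level idea concrete, here is a minimal Python sketch that packs a set of non-negative integers into a single integer, one bit per value. The helper names (`to_bitmask`, `contains`, `cardinality`) are hypothetical and are not ClickZetta functions. The sketch is an uncompressed bitmask only; Roaring Bitmap additionally splits the value space into chunks and picks a compact container per chunk, which is not modeled here.

```python
def to_bitmask(values):
    """Pack non-negative integers into one Python int, one bit per value."""
    mask = 0
    for v in values:
        if v < 0:
            raise ValueError("only non-negative integers can be represented")
        mask |= 1 << v  # set bit number v
    return mask


def contains(mask, value):
    """Return True if bit number `value` is set."""
    return (mask >> value) & 1 == 1


def cardinality(mask):
    """Count the set bits, i.e. the number of stored integers."""
    return bin(mask).count("1")


tags = to_bitmask([1, 3, 5, 7, 9])
print(bin(tags))          # 0b1010101010
print(contains(tags, 5))  # True
print(contains(tags, 4))  # False
print(cardinality(tags))  # 5
```

Five integers fit in ten bits here, and membership tests and counts become bit operations rather than array scans.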
Bitmap Characteristics
- 64-bit Integer Support: Supports storing integers in the range of 0 to 2^64 - 1
- Efficient Compression: Uses Roaring Bitmap algorithm with minimal space overhead
- Fast Operations: Supports union, intersection, complement, and other set operations with excellent performance
- Binary Serialization: Can be converted to and from binary type for convenient data exchange
- Flexible Querying: Supports set inclusion checking, cardinality calculation, and other operations
Syntax
Creating a Table with Bitmap Column
Example:
Building Bitmap Data
Using the bitmap_build Function
Construct a Bitmap object from an integer array:
Example:
Using the GROUP_BITMAP_STATE Function
Constructs a Bitmap result from the input expression (expr). This aggregate function is typically used with grouping on integer-type data: it collects the unique values of each group into a single Bitmap.
Example:
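The grouping behavior described above can be modeled in plain Python. This is only a sketch of the semantics, not ClickZetta SQL: the function below is a hypothetical stand-in that reuses the name `group_bitmap_state`, the input rows are made up for illustration, and a Python `set` stands in for the Bitmap.

```python
from collections import defaultdict


def group_bitmap_state(rows):
    """Collect the unique integer values of each group.

    rows: iterable of (group_key, integer_value) pairs.
    Returns a dict mapping each group key to a set of its distinct values.
    """
    groups = defaultdict(set)
    for key, value in rows:
        groups[key].add(value)  # duplicates collapse automatically
    return dict(groups)


# Hypothetical (user_id, tag_id) rows; tag 3 appears twice for user 1.
rows = [(1, 1), (1, 3), (1, 3), (2, 2), (2, 4)]
result = group_bitmap_state(rows)
print({key: sorted(values) for key, values in result.items()})
# {1: [1, 3], 2: [2, 4]}
```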
Notes
Functional Limitations
- No Comparison Operations: Bitmap type does not support direct comparison operations (<, >, =, !=, etc.)
- No Sorting or Grouping: Bitmap columns cannot be used in ORDER BY, GROUP BY, or DISTINCT operations
- Cannot Be Used as Keys: Bitmap cannot be used as a table's PRIMARY KEY, PARTITION KEY, or CLUSTER KEY
- Query Display Requirements: Displaying Bitmap values in query results requires a ClickZetta Java version later than 3.0.21
Data Validity
- Valid Integer Range: The input array for bitmap_build function must contain valid integer values
- Binary Conversion: The input to binary_to_bitmap must be binary data in valid Bitmap serialization format, such as the output of bitmap_to_binary
- NULL Handling: A Bitmap value itself can be NULL, but NULL elements in the input array are ignored
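The validity rules above can be sketched as a small Python model. It is not the ClickZetta implementation: the function is a hypothetical stand-in that reuses the name `bitmap_build`, Python `None` plays the role of SQL NULL, and two behaviors are assumptions made for illustration (a NULL array yields a NULL Bitmap, and an out-of-range value raises an error).

```python
MAX_VALUE = 2**64 - 1  # upper bound of the 64-bit range


def bitmap_build(values):
    """Model of building a Bitmap from an integer array.

    None elements are skipped. Returning None for a None array and
    raising on out-of-range values are assumptions of this sketch.
    """
    if values is None:
        return None
    result = set()
    for v in values:
        if v is None:
            continue  # NULL elements in the array are ignored
        if not isinstance(v, int) or not 0 <= v <= MAX_VALUE:
            raise ValueError(f"not a valid 64-bit unsigned integer: {v!r}")
        result.add(v)
    return result


print(sorted(bitmap_build([1, None, 3, 3])))  # [1, 3]
print(bitmap_build(None))                     # None
```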
Performance Considerations
- Ideal Use Cases: Bitmap is best suited for storing sparse integer sets or large integer sets
- Set Operations: Large-scale set operations should be completed at the database layer rather than in the application layer
Common Bitmap Functions
For more Bitmap functions, refer to the Bitmap documentation.
Data Construction Functions
bitmap_build
Constructs a Bitmap object from an integer array.
| Parameter | Description |
|---|---|
| array | Array expression containing integers |
Return Value: bitmap
Data Conversion Functions
bitmap_to_array
Converts a Bitmap to an integer array.
Return Value: array<integer>
bitmap_to_binary
Converts a Bitmap to binary type.
Return Value: binary
binary_to_bitmap
Converts binary type to Bitmap.
Return Value: bitmap
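The round trip between the two conversion functions can be illustrated with a Python sketch. The byte layout used here (each value as 8 little-endian bytes, in ascending order) is invented for the example and is not the Roaring Bitmap serialization format that ClickZetta uses; the functions are hypothetical stand-ins that reuse the names `bitmap_to_binary` and `binary_to_bitmap`.

```python
import struct


def bitmap_to_binary(values):
    """Serialize a set of 64-bit unsigned integers (illustrative format)."""
    ordered = sorted(values)
    return struct.pack(f"<{len(ordered)}Q", *ordered)


def binary_to_bitmap(data):
    """Deserialize bytes produced by bitmap_to_binary above."""
    if len(data) % 8 != 0:
        raise ValueError("invalid serialization: length must be a multiple of 8")
    count = len(data) // 8
    return set(struct.unpack(f"<{count}Q", data))


original = {1, 3, 5, 7, 9}
blob = bitmap_to_binary(original)
restored = binary_to_bitmap(blob)
print(len(blob))                 # 40
print(sorted(restored))          # [1, 3, 5, 7, 9]
print(restored == original)      # True
```

The length check shows why the input to a deserializer must come from a matching serializer: arbitrary bytes are rejected rather than silently misread.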
Set Operation Functions
bitmap_and
Computes the intersection (AND operation) of two Bitmaps.
Return Value: bitmap
bitmap_or
Computes the union (OR operation) of two Bitmaps.
Return Value: bitmap
bitmap_xor
Computes the XOR (exclusive OR operation) of two Bitmaps.
Return Value: bitmap
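The semantics of the three set operations can be checked with a short Python model, using the sample tag sets that appear in the Examples section of this document. Python sets stand in for Bitmaps, and the functions are hypothetical stand-ins named after the SQL functions, not the ClickZetta implementation.

```python
# Sample data taken from the result tables in the Examples section.
user_tags = {
    1: {1, 3, 5, 7, 9},
    2: {2, 4, 6, 8, 10},
    3: {1, 2, 3, 4, 5},
    4: {5, 6, 7, 8, 9, 10},
}


def bitmap_and(a, b):
    """Intersection: elements present in both sets."""
    return a & b


def bitmap_or(a, b):
    """Union: elements present in either set."""
    return a | b


def bitmap_xor(a, b):
    """Symmetric difference: elements present in exactly one set."""
    return a ^ b


common_tags = sorted(bitmap_and(user_tags[1], user_tags[3]))
union_tags = sorted(bitmap_or(user_tags[1], user_tags[2]))
exclusive_tags = sorted(bitmap_xor(user_tags[1], user_tags[3]))

print(common_tags)     # [1, 3, 5]
print(union_tags)      # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(exclusive_tags)  # [2, 4, 7, 9]
```

The lists are sorted here for readability; the element order returned by the database functions may differ.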
Statistical Functions
bitmap_cardinality
Calculates the number of elements (cardinality) in a Bitmap.
Return Value: bigint
Query Functions
bitmap_contains
Checks whether a Bitmap contains a specified integer.
| Parameter | Description |
|---|---|
| bitmap | Bitmap object |
| element | Integer value to check |
Return Value: boolean
Examples
Example 1: Create Table and Insert Data
Example 2: Query Elements in Bitmap
Execution Result:
| user_id | tags |
|---|---|
| 1 | [1, 3, 5, 7, 9] |
| 2 | [2, 4, 6, 8, 10] |
| 3 | [1, 2, 3, 4, 5] |
| 4 | [5, 6, 7, 8, 9, 10] |
Example 3: Calculate Bitmap Cardinality (Element Count)
Execution Result:
| user_id | tag_count |
|---|---|
| 1 | 5 |
| 2 | 5 |
| 3 | 5 |
| 4 | 6 |
Example 4: Check if Bitmap Contains Specific Element
SQL Execution:
Execution Result:
| user_id | my_tags | has_tag_5 |
|---|---|---|
| 1 | [1, 3, 5, 7, 9] | TRUE |
| 2 | [2, 4, 6, 8, 10] | FALSE |
| 3 | [1, 2, 3, 4, 5] | TRUE |
| 4 | [5, 6, 7, 8, 9, 10] | TRUE |
Example 5: Calculate Common Tags Between Two Users (Intersection)
SQL Execution:
Execution Result:
| common_tags |
|---|
| [1, 3, 5] |
Example 6: Calculate All Tags Across Two Users (Union)
SQL Execution:
Execution Result:
| union_tags |
|---|
| [1, 3, 5, 7, 9, 2, 4, 6, 8, 10] |
Example 7: Bitmap and Binary Conversion
SQL Execution:
Execution Result:
| user_id | original_tags | restored_tags |
|---|---|---|
| 1 | [1, 3, 5, 7, 9] | [1, 3, 5, 7, 9] |
| 2 | [2, 4, 6, 8, 10] | [2, 4, 6, 8, 10] |
| 3 | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] |
| 4 | [5, 6, 7, 8, 9, 10] | [5, 6, 7, 8, 9, 10] |
Writing Bitmap Data Using SDK
Java SDK Example
Use BulkloadStream in ClickZetta Java SDK to write Bitmap data in bulk. You need to use RoaringBitmap to construct Bitmap objects.
Constructing Bitmap Objects:
Maven Dependency
Add the following dependency to pom.xml, using a version later than 3.0.23:
Python SDK Example
Python Dependencies:
Constructing Bitmap Objects:
Best Practices
- Choose Appropriate Data Types: When you need to store integer sets, prioritize Bitmap, especially for large or sparse collections
- Complete Operations at Database Layer: Fully leverage Bitmap's set operation functions to complete intersection, union, and other operations at the database layer, reducing data transfer
- Use Conversion Functions Appropriately:
  - Use bitmap_to_array for display and debugging
  - Use bitmap_to_binary for persistent storage
- Performance Optimization:
  - Use Bitmap instead of Array for set operations on large-scale datasets
  - Use bitmap_cardinality for counting instead of converting to array and then counting
