BITMAP User Analysis
Overview
BITMAP is a compact set data structure that represents each integer member as a single bit. In user analysis, its core advantage is that intersection, union, and difference over large user ID sets reduce to bitwise operations, which are significantly faster than equivalent JOIN or IN subqueries.
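
To make the "one bit per member" idea concrete, here is a deliberately naive Python sketch that packs IDs into a single integer. The helper names (`build_bitmap`, `cardinality`, `members`) are hypothetical and only illustrate the set semantics; this is not how the Lakehouse stores bitmaps internally.

```python
def build_bitmap(ids):
    """Pack non-negative integer IDs into one Python int, one bit per ID."""
    bm = 0
    for i in ids:
        if i < 0:
            raise ValueError("bitmap members must be non-negative integers")
        bm |= 1 << i  # set the bit at position i
    return bm


def cardinality(bm):
    """Number of members = number of set bits."""
    return bin(bm).count("1")


def members(bm):
    """Recover the member IDs in ascending order."""
    return [i for i in range(bm.bit_length()) if (bm >> i) & 1]


a = build_bitmap([1, 3, 5, 7])
b = build_bitmap([3, 4, 5])

print(members(a & b))  # intersection: [3, 5]
print(members(a | b))  # union: [1, 3, 4, 5, 7]
print(members(a ^ b))  # symmetric difference: [1, 4, 7]
print(cardinality(a))  # 4
```

Each set operation is a single bitwise instruction over the packed bits, which is why no row-by-row matching is needed.
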
Typical scenarios:
- Calculate DAU / MAU (daily/monthly deduplicated active user counts)
- Multi-tag audience selection: users matching "VIP AND active in last 7 days"
- Retention analysis: users active for N consecutive days
- Funnel intersection: users who completed both step A and step B
Core Function Reference
| Function | Input Type | Return Type | Description |
|---|---|---|---|
| `bitmap_build(array)` | `ARRAY<BIGINT>` | bitmap | Build a bitmap object from an array |
| `bitmap_count(bm)` | bitmap | BIGINT | Return the number of elements in the bitmap (cardinality) |
| `bitmap_to_array(bm)` | bitmap | `ARRAY<STRING>` | Convert the bitmap to an array to view its members |
| `TO_BITMAP(n)` | BIGINT | bitmap | Build a bitmap containing a single element |
| `group_bitmap(id)` | BIGINT | BIGINT | Aggregate function; returns the cardinality of unique IDs within the group |
| `group_bitmap_state(id)` | BIGINT | bitmap | Aggregate function; returns a bitmap object (for two-phase aggregation) |
| `group_bitmap_or(bm)` | bitmap | BIGINT | Aggregate function; computes the union cardinality of multiple bitmaps |
| `group_bitmap_and(bm)` | bitmap | BIGINT | Aggregate function; computes the intersection cardinality of multiple bitmaps |
| `group_bitmap_xor(bm)` | bitmap | BIGINT | Aggregate function; computes the XOR cardinality of multiple bitmaps |
| `group_bitmap_merge(bm)` | bitmap | BIGINT | Aggregate function; merges multiple bitmap states and returns the union cardinality |
| `bm1 & bm2` | bitmap | bitmap | Intersection (AND) |
| `bm1 \| bm2` | bitmap | bitmap | Union (OR) |
| `bm1 ^ bm2` | bitmap | bitmap | XOR (symmetric difference) |
Prerequisite Data
Scenario 1: Daily Active Users (DAU)
Results:
| dt | dau |
|---|---|
| 2024-01-01 | 5 |
| 2024-01-02 | 5 |
| 2024-01-03 | 5 |
Scenario 2: Multi-day Deduplicated Active Users (MAU)
Compute the deduplicated active user count across 3 days (union).
Results:
| mau_3days |
|---|
| 9 |
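
The union matters because daily counts are not additive: a user active on several days must be counted once. The Python sketch below models each day's bitmap as a plain `set`. The user IDs are hypothetical, chosen only so that the cardinalities match the result tables above; they are not the original sample data.

```python
# Hypothetical per-day active user sets (illustrative IDs, not the source data).
daily_active = {
    "2024-01-01": {1001, 1002, 1003, 1004, 1005},
    "2024-01-02": {1001, 1002, 1003, 1006, 1007},
    "2024-01-03": {1001, 1004, 1006, 1008, 1009},
}

# DAU: cardinality of each day's set.
dau = {dt: len(users) for dt, users in daily_active.items()}

# 3-day deduplicated count: cardinality of the union of all daily sets.
mau_3days = len(set().union(*daily_active.values()))

# Summing DAU double-counts anyone active on more than one day.
naive_sum = sum(dau.values())

print(dau)        # every day has 5 active users
print(mau_3days)  # 9
print(naive_sum)  # 15, which overstates the deduplicated count
```
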
Scenario 3: Users Retained for N Consecutive Days
Compute users active on all 3 consecutive days (intersection).
Results:
| retained_3days |
|---|
| 1 |
Scenario 4: Retention Between Two Days (Date-specific Intersection)
Compute the count of users active on both Jan 1 and Jan 2.
Results:
| day1_day2_common |
|---|
| 3 |
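
Both retention scenarios are the same operation: intersect the daily sets for the dates of interest and take the cardinality. A minimal Python sketch, again using `set` as a stand-in for a bitmap and hypothetical user IDs chosen to reproduce the counts shown above (`retained` is an illustrative helper, not a Lakehouse function):

```python
from functools import reduce


def retained(*daily_sets):
    """Users present in every one of the given daily sets."""
    return reduce(lambda acc, s: acc & s, daily_sets)


# Hypothetical daily active sets (illustrative IDs, not the source data).
day1 = {1001, 1002, 1003, 1004, 1005}
day2 = {1001, 1002, 1003, 1006, 1007}
day3 = {1001, 1004, 1006, 1008, 1009}

retained_3days = retained(day1, day2, day3)
day1_day2_common = retained(day1, day2)

print(len(retained_3days))    # 1
print(len(day1_day2_common))  # 3
```
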
Scenario 5: Multi-tag Audience Selection (Intersection / Union / Difference)
5.1 Intersection: Users meeting multiple tags simultaneously
Results:
| vip_and_active |
|---|
| 3 |
View which specific users:
Results:
| vip_and_active_users |
|---|
| ["1001", "1003", "1005"] |
5.2 Union: Users meeting either tag (deduplicated)
Results:
| vip_or_active |
|---|
| 7 |
5.3 Difference: Users in A but not in B
Singdata Lakehouse does not support the ~ negation operator. The difference set is implemented equivalently as A XOR (A AND B):
Results:
| vip_not_active |
|---|
| 2 |
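
The XOR workaround is valid because `A AND B` is always a subset of `A`, so XOR-ing it with `A` removes exactly the shared members and adds nothing. The Python sketch below checks this with `set` as a stand-in for a bitmap. The intersection `{1001, 1003, 1005}` comes from the result above; the remaining IDs are hypothetical, chosen to match the reported cardinalities.

```python
# Hypothetical tag sets (only the three shared IDs appear in the source).
vip = {1001, 1002, 1003, 1004, 1005}
active_7d = {1001, 1003, 1005, 1006, 1007}

vip_and_active = vip & active_7d          # intersection
vip_or_active = vip | active_7d           # union
vip_not_active = vip ^ (vip & active_7d)  # difference via A XOR (A AND B)

print(sorted(vip_and_active))  # [1001, 1003, 1005]
print(len(vip_or_active))      # 7
print(sorted(vip_not_active))  # [1002, 1004]

# The XOR form agrees with the native set difference.
assert vip_not_active == vip - active_7d
```
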
Scenario 6: User Count Per Tag (GROUP_BITMAP)
Results:
| tag_name | user_count |
|---|---|
| active_7d | 5 |
| buyer | 5 |
| vip | 5 |
Scenario 7: Two-phase Aggregation (Big Data Optimization)
When data volume is large, use group_bitmap_state to generate intermediate bitmaps by partition first, then use group_bitmap_merge to combine them, reducing memory pressure on individual aggregations.
Results:
| mau_3days |
|---|
| 9 |
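
The two phases can be modeled in a few lines of Python. This is a sketch of the idea only, assuming partitioning by date and using `set` in place of a bitmap state; the rows are hypothetical and the variable names are illustrative.

```python
from collections import defaultdict

# Hypothetical (dt, user_id) activity rows; not the source data.
rows = [
    ("2024-01-01", 1001), ("2024-01-01", 1002), ("2024-01-01", 1003),
    ("2024-01-01", 1004), ("2024-01-01", 1005),
    ("2024-01-02", 1001), ("2024-01-02", 1002), ("2024-01-02", 1003),
    ("2024-01-02", 1006), ("2024-01-02", 1007),
    ("2024-01-03", 1001), ("2024-01-03", 1004), ("2024-01-03", 1006),
    ("2024-01-03", 1008), ("2024-01-03", 1009),
]

# Phase 1: build one intermediate state per partition
# (the role group_bitmap_state plays in the text above).
states = defaultdict(set)
for dt, user_id in rows:
    states[dt].add(user_id)

# Phase 2: merge the per-partition states and count the union
# (the role group_bitmap_merge plays in the text above).
merged = set()
for state in states.values():
    merged |= state
mau_3days = len(merged)

print(mau_3days)  # 9
```

Each phase-1 aggregation only sees one partition's rows, and phase 2 only sees one compact state per partition rather than every raw row.
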
Notes
- `group_bitmap` returns BIGINT, not a bitmap object: you cannot apply the `&`, `|`, or `^` operations to its result. Use `group_bitmap_state` when a bitmap object is needed.
- No direct difference operator: the `~` negation is not supported. Compute the difference `A - B` as `A ^ (A & B)`.
- `bitmap_build` accepts `ARRAY<BIGINT>`: if the user IDs are stored one per row, first aggregate them into an array with `COLLECT_LIST`, then pass the array to `bitmap_build`.
- User IDs must be non-negative integers: BITMAP is based on integer bitmaps and does not support negative or non-integer IDs.
- `bitmap_to_array` result is a string array: the returned array element type is `STRING`; use `CAST` if numeric comparison is needed.
