Create BLOOMFILTER Index
Description
A Bloom Filter is a probabilistic data structure used to determine whether an element belongs to a set. This feature allows users to create Bloom Filter indexes in a table to improve query efficiency. For a detailed introduction, refer to Bloomfilter Index
Syntax
CREATE BLOOMFILTER INDEX [IF NOT EXISTS] index_name ON TABLE
[schema].table_name(column_name)
[COMMENT 'comment']
[PROPERTIES ('key'='value')]
bloomfilter: Index type, using the Bloom filter algorithm.
index_name: The name of the index to be created, which must be under the specified schema and cannot be duplicated within the same schema.
schema: Optional parameter, used to specify the schema name of the table.
table_name: The name of the table to create the index on.
column_name: The name of the column to create the index on, currently only single-column indexes are supported.
COMMENT: Optional parameter, used to add a description of the index.
PROPERTIES: Optional parameter, reserved properties for Lakehouse, facilitating the addition of properties in the future.
Reference Documentation
Usage Notes
- After adding the BLOOMFILTER INDEX, it will only take effect for newly written data. Old data will not be affected. If old data needs to be effective, the data must be rewritten.
- A table can have multiple Bloom filter indexes.
- Type restrictions: interval, struct, map, array, and other types are not supported. If unsupported types are used, the system will report an error.
Instructions
BLOOMFILTER INDEX Instructions
Examples
- Specify BLOOMFILTER index when creating a table
DROP TABLE IF EXISTS t;
CREATE TABLE t (
order_id INT,
customer_id INT,
amount DOUBLE,
order_year string,
order_month string,
INDEX order_id_index (order_id) BLOOMFILTER COMMENT 'BLOOMFILTER'
);
INSERT INTO t
VALUES (1, 101, 100.0, '2023','01'),
(2, 102, 200.0, '2023','02'),
(3, 103, 300.0, '2023','03'),
(4, 104, 400.0, '2023','04'),
(5, 105, 500.0, '2023','04');
SHOW INDEX FROM t;
+----------------+--------------+
| index_name | index_type |
+----------------+--------------+
| order_id_index | bloom_filter |
+----------------+--------------+
DESC INDEX order_id_index;
+--------------------+-------------------------+
| info_name | info_value |
+--------------------+-------------------------+
| name | order_id_index |
| creator | UAT_TEST |
| created_time | 2024-12-26 21:20:43.914 |
| last_modified_time | 2024-12-26 21:20:43.914 |
| comment | BLOOMFILTER |
| index_type | bloom_filter |
| table_name | t |
| table_column | order_id |
+--------------------+-------------------------+
DROP INDEX order_id_index;
- Add BLOOMFILTER Index to Existing Columns
CREATE BLOOMFILTER INDEX customer_id_index
ON TABLE public.t (customer_id)
COMMENT 'xx';
-- After the BLOOMFILTER INDEX is added, it will only take effect for newly written data. Old data will not be affected. If old data needs to be affected, the data must be rewritten.
INSERT OVERWRTE t SELECT * FROM t;
DESC INDEX customer_id_index;
+--------------------+-------------------------+
| info_name | info_value |
+--------------------+-------------------------+
| name | customer_id_index |
| creator | UAT_TEST |
| created_time | 2024-12-26 21:24:57.649 |
| last_modified_time | 2024-12-26 21:24:57.649 |
| comment | xx |
| index_type | bloom_filter |
| table_name | t |
| table_column | customer_id |
+--------------------+-------------------------+
DROP INDEX customer_id_index;