Create Inverted Index
For detailed introduction, refer to Inverted Index introduction
Syntax
CREATE TABLE table_name(
columns_difinition,
INDEX index_name (column_name) INVERTED [COMMENT ''] PROPERTIES('analyzer'='english|chinese|keyword|unicode'),
INDEX index_name (column_name) INVERTED [COMMENT ''] PROPERTIES('analyzer'='english|chinese|keyword|unicode'),
....
INDEX index_name (column_name) INVERTED [COMMENT ''] PROPERTIES('analyzer'='english|chinese|keyword|unicode')
);
columns_difinition: Defines the field information of the table, the last field must be separated by a comma
INDEX: Keyword
index_name: Custom name for the index
column_name: The name of the field to add the index to
INVERTED: Keyword, indicates an inverted index
COMMENT: Specifies the description information of the index
PROPERTIES: Specifies the parameters of the INDEX, currently supports specifying tokenization. Numerical and date types do not need to specify properties, if it is a string type, tokenization must be specified
- Tokenization parameters: 'analyzer'='english|chinese | keyword|unicode', used to specify the text tokenization strategy, suitable for different languages and needs.
- keyword: This type of field will not be tokenized. The entire string is treated as a single root word and stored in the inverted index. When searching, the complete field value needs to be provided for an exact match.
- english: A tokenizer designed specifically for English, only recognizes continuous ASCII letters and numbers, and converts the text to lowercase. Suitable for processing English content, ignoring non-alphanumeric characters.
- chinese: Chinese text tokenization plugin, recognizes Chinese and English characters, filters out punctuation, and converts English letters to lowercase. Suitable for processing mixed Chinese and English text.
- unicode: A tokenizer based on the Unicode text segmentation algorithm, capable of recognizing text boundaries in multiple languages, effectively splitting text into words, and converting letters to lowercase. Suitable for text processing in a multilingual environment.
Reference Documents
Examples
CREATE TABLE inverted_index_test(
id int,
name string,
INDEX id_index (id) INVERTED ,
INDEX name_index (name) INVERTED PROPERTIES('analyzer'='keyword')
);
Adding Inverted Index to Existing Tables
Syntax
CREATE INVERTED INDEX [IF NOT EXISTS] index_name ON TABLE
[schema].table_name(col_name)
[COMMENT 'comment']
PROPERTIES('analyzer'='english|chinese|keyword|unicode')
INVERTED: Index type, inverted index
index_name: Table name, located under schema, index name under schema must be unique
col_name: Column name only supports single column
PROPERTIES: Specify parameters for INDEX, currently supports specifying tokenization. Numerical and date types do not need to specify properties, if it is a string type, tokenization must be specified
Description
Executing CREATE INDEX is only effective for new data. To index existing data, please use the BUILD INDEX command.
Example
- Specify inverted index when creating a table
DROP TABLE IF EXISTS t;
CREATE TABLE t (
order_id INT,
customer_id INT,
amount DOUBLE,
order_year string,
order_month string,
INDEX order_id_index (order_id) INVERTED COMMENT 'INVERTED'
);
INSERT INTO t
VALUES (1, 101, 100.0, '2023','01'),
(2, 102, 200.0, '2023','02'),
(3, 103, 300.0, '2023','03'),
(4, 104, 400.0, '2023','04'),
(5, 105, 500.0, '2023','04');
DESC INDEX order_id_index;
+--------------------+-------------------------+
| info_name | info_value |
+--------------------+-------------------------+
| name | order_id_index |
| creator | UAT_TEST |
| created_time | 2024-12-27 10:39:44.778 |
| last_modified_time | 2024-12-27 10:39:44.778 |
| comment | INVERTED |
| index_type | inverted |
| table_name | t |
| table_column | order_id |
+--------------------+-------------------------+
--Using inverted index query
SELECT *
FROM t
WHERE match_any (order_id, 3);
+----------+-------------+--------+------------+-------------+
| order_id | customer_id | amount | order_year | order_month |
+----------+-------------+--------+------------+-------------+
| 3 | 103 | 300.0 | 2023 | 03 |
+----------+-------------+--------+------------+-------------+
- Add BLOOMFILTER Index to Existing Columns
CREATE INVERTED INDEX order_year_index
ON TABLE public.t (order_year)
PROPERTIES('analyzer'='chinese');
-- After the inverted index is added, only newly written data will take effect. Old data will not take effect. If old data needs to be effective, you can use the BUILD INDEX command.
BUILD INDEX order_year_index ON public.t;
DESC INDEX EXTENDED order_year_index;
+--------------------------+--------------------------+
| info_name | info_value |
+--------------------------+--------------------------+
| name | order_year_index |
| creator | UAT_TEST |
| created_time | 2024-12-27 10:51:58.977 |
| last_modified_time | 2024-12-27 10:51:58.977 |
| comment | |
| properties | (("analyzer","chinese")) |
| index_type | inverted |
| table_name | t |
| table_column | order_year |
| index_size_in_data_file | 0 |
| index_size_in_index_file | 296 |
| total_index_size | 296 |
+--------------------------+--------------------------+
SELECT *
FROM t
WHERE match_all (order_year, '2023');
+----------+-------------+--------+------------+-------------+
| order_id | customer_id | amount | order_year | order_month |
+----------+-------------+--------+------------+-------------+
| 1 | 101 | 100.0 | 2023 | 01 |
| 2 | 102 | 200.0 | 2023 | 02 |
| 3 | 103 | 300.0 | 2023 | 03 |
| 4 | 104 | 400.0 | 2023 | 04 |
| 5 | 105 | 500.0 | 2023 | 04 |
+----------+-------------+--------+------------+-------------+