Reasons for Recommendation
The recommended columns can be used as the sort key for the table
The selected columns frequently appear in filter statements. Setting these columns as the sort key for the table can speed up query execution.
How to Enable
- You can set the property on the workspace, which will analyze all tables under the workspace
ALTER WORKSPACE workspace_name SET properties (auto_index='day[,150,5,100]');
For example:
ALTER WORKSPACE quick_start SET properties (auto_index='day');
ALTER WORKSPACE quick_start SET properties (auto_index='day,150,5,100');
--The first parameter is required, specifying whether to collect daily or monthly. Monthly collection is done on the 1st of each month at 6 PM.
--The second parameter specifies the duration of jobs to be used (in minutes), default is 150.
--The third parameter specifies how many times a job needs to be repeated to be used, default is 5.
--The fourth parameter specifies the maximum number of jobs used per column, default is 100.
Automatically collects the top 10 tables used in filter statements for each schema
How to Query
SELECT * FROM information_schema.sortkey_candidates;
--The returned results are as follows:
instance_id,workspace_id,workspace_name,schema_id,schema_name,table_id,table_name,col,statement,ratio,insert_time,p_date
11111111111,855911111111,aaaaaaaaaaa,23671111111111,ddddddddd,84465111,aaaaaaaaaaa.ddddddddd.member,brandid,alter table aaaaaaaaaaa.ddddddddd.member set properties("hint.sort.columns"="brandid"), 2.21%,05:49.1,2024/11/7
11111111111,855911111111,aaaaaaaaaaa,23671111111111,ddddddddd,84465111,aaaaaaaaaa.ddddddddd.member,birthday,alter table aaaaaaaaaa.ddddddddd.member set properties("hint.sort.columns"="birthday"),0.01%,05:49.1,2024/11/7
11111111111,855911111111,aaaaaaaaaaa,23671111111111,ddddddddd,84465111,aaaaaaaaaaa.ddddddddd.member,id,alter table aaaaaaaaaaa.ddddddddd.member set properties("hint.sort.columns"="id"),0.01%,05:49.1,2024/11/7
--It can be seen that for the aaaaaaaaaaa.ddddddddd.member table, the improvement effect of brandid is the highest, so set it as the sortkey alter table aaaaaaaaaaa.ddddddddd.member set properties("hint.sort.columns"="brandid"), this setting will
The included columns are instance_id, workspace_id, workspace_name, schema_id, schema_name, table_id, table_name, col, statement, ratio, insert_time, p_date
. Among them:
p_date
is the partition column, formatted as yyyy-mm-dd.
insert_time
is the time when the collection results are inserted.
col
is the recommended column, currently a single column.
statement
is the SQL statement on how to set the recommended column as the sorting column.
ratio
is the estimated improvement effect, in percentage.
Usage Recommendations
It is recommended to use the analyze statement to collect stats for the table columns, which will improve the accuracy of the recommendations.