Materialized View Creation and Usage
Description
Creates a new materialized view in the current or specified schema based on a query against existing tables, and populates the view with the query data.
A materialized view is like an auto-updating query cache -- the optimizer automatically recognizes and uses it to replace slow queries without requiring any application code changes.
For more details, see Materialized View.
How to Choose: Materialized View vs Dynamic Table vs View
| Object | Use Case | Reference Method |
|---|---|---|
| Materialized View | Transparently accelerate existing queries; optimizer auto-rewrites without application code changes | Optimizer auto-replaces; application code is unaware |
| Dynamic Table | Build data pipelines (ODS->DWD->ADS); requires explicit reference to the result table | Explicitly query the dynamic table name |
| View | Logical encapsulation; no data stored; real-time execution on each query | Explicitly query the view name |
Syntax
Required Parameters
- mv_name: Specifies the name of the materialized view.
- AS query: The query statement that the materialized view encapsulates.
Optional Parameters
-
IF NOT EXISTS: Optional. If a materialized view with the specified name already exists, no error is raised, but the materialized view will not be created. Cannot be used together with OR REPLACE.
-
OR REPLACE: In traditional databases, this option is used to replace an old object with a new one within the same transaction and delete the old object's data. However, in the Lakehouse, to support add, drop, and modify operations on materialized views, we retain both data and metadata permission information. This means that even when modifying table structure or SQL logic, existing data is not lost. This feature is especially useful for adding or dropping columns, adjusting SQL processing logic, and changing data types. Note that you must use the BUILD DEFERRED ... DISABLE QUERY REWRITE syntax when making modifications. This syntax disables the materialized view rewrite feature. If the user's change is not a simple drop column / add column (where the added column definition must be simply SELECT-passed through from the table without participating in any computation affecting other columns, such as join key, group key, etc.), then after Create Or Replace, the REFRESH task will degenerate into a full refresh. If you want to experience incremental computation, please use DYNAMIC TABLE.
-
<column_list>:
- You can specify column names or add comment information to the materialized view's columns. You can specify column names but not column types; types are inferred from the SELECT results in
AS <query>. If you want to specify a type, you can explicitly CAST in the SELECT results. - If any column in the table is based on an expression, it is recommended to provide a name for each column, or use aliases in
AS <query>.
- You can specify column names or add comment information to the materialized view's columns. You can specify column names but not column types; types are inferred from the SELECT results in
-
PARTITIONED BY (): Specifies a partition, using a column from <column_list> as the partition. Partitioning is a method of speeding up queries by grouping similar rows together at write time. It enables data pruning and query optimization.
-
CLUSTERED BY: Optional. Specifies a Hash Key. The Lakehouse will hash the specified column and distribute data across data buckets based on the hash value. To avoid data skew and hot spots and improve parallel execution, it is recommended to choose columns with a wide range of values and few duplicate keys as the Hash Key. This is typically effective during JOIN operations. It is recommended to use CLUSTERED BY in scenarios with large data volumes, generally with a bucket size between 128MB and 1GB. If no bucketing is specified, the default is 256 buckets.
- SORTED BY: Optional. Specifies the sort order of fields within a bucket. It is recommended to keep SORTED BY and CLUSTERED BY consistent for better performance. When the SORTED BY clause is specified, row data will be sorted according to the specified columns.
-
COMMENT: Specifies the comment information for the materialized view.
-
BUILD DEFERRED: This is a creation mode for materialized views. Unlike
BUILD IMMEDIATE, which generates data immediately,BUILD DEFERREDallows the materialized view to be created without immediately generating data. The default value is BUILD IMMEDIATE. When using the CREATE OR REPLACE syntax, BUILD DEFERRED is required. -
refreshOption: Optional. Refresh options.
- START WITH timestamp_expr: Specifies the start time, supporting a timestamp expression. If START WITH is not written, the refresh starts from the current time.
timestamp_expressionreturns a result that is a standard timestamp type expression. The earliest timestamp specified by TIMESTAMP AS OF depends on the TIME TRAVEL(data_retention_days) parameter. If the specified version does not exist, an error is returned. If not specified, the current timestamp version data is used. For example:'2023-11-07 14:49:18', i.e., a string coercible to timestamp.cast('2023-11-07 14:49:18 Asia/Shanghai' as timestamp).current_timestamp() - interval '12' hours.- Any other expression that is itself a timestamp or coercible to timestamp.
- interval_time: Specifies the time interval. Supports interval time types. If interval_time is not written but START WITH is, it will only refresh once at the time specified by START WITH. The interval_time intervals are as follows:
- START WITH timestamp_expr: Specifies the start time, supporting a timestamp expression. If START WITH is not written, the refresh starts from the current time.
| Syntax | Description | Example |
|---|---|---|
| INTERVAL '[+ | -]' DAY | Specify DAY interval only |
| INTERVAL '[+ | -]' HOUR | Specify HOUR interval only |
| INTERVAL '[+ | -]' MINUTE | Specify MINUTE interval only |
| INTERVAL '[+ | -]' SECOND | Specify SECOND interval only |
| INTERVAL '[+ | -] ' DAY TO HOUR | Specify both DAY and HOUR intervals |
| INTERVAL '[+ | -] ' DAY TO MINUTE | Specify DAY, HOUR, and MINUTE intervals |
| INTERVAL '[+ | -] ' DAY TO SECOND | Specify DAY, HOUR, MINUTE, and SECOND intervals |
day: Value range [0, 2147483647]. hour: Value range [0, 23]. minute: Value range [0, 59]. second: Value range [0, 59.999999999].
-
The minimum INTERVAL value is 1 minute, which can be expressed as 60 SECOND or 1 MINUTE.
- INTERVAL supports quoted or unquoted forms; the following are equivalent:
- INTERVAL "60 SECOND"
- INTERVAL '60 SECOND'
- INTERVAL 60 SECOND
- INTERVAL supported units: SECOND, MINUTE, HOUR, DAY
- INTERVAL units are case-insensitive; HOUR and hour are equivalent.
- INTERVAL supports quoted or unquoted forms; the following are equivalent:
-
Specify the compute cluster in
refreshOption. Automatic refresh consumes resources, so the compute cluster must be explicitly specified. If not specified, the current session's compute cluster is used by default. You can check the current session's compute cluster usingSELECT current_vcluster().
- DISABLE QUERY REWRITE: This means the materialized view does not support the query rewrite feature. Query rewrite is when the database optimizer automatically determines whether a query against the base table of a materialized view can be satisfied by querying the materialized view instead, thereby avoiding aggregation or join operations and reading directly from the pre-computed materialized view.
Notes
- Incremental refresh of materialized views is based on historical versions of the base table. Historical versions depend on the TIME TRAVEL(data_retention_days) parameter; if the specified version does not exist, an error is returned. This parameter defines how long deleted data is retained. The Lakehouse retains data for one day by default. Based on your business requirements, you can adjust the
data_retention_daysparameter to extend or shorten the data retention period. Note that adjusting the data retention period may affect storage costs. Extending the retention period increases storage requirements and may increase associated costs.
Examples
Example 1: Simple Materialized View
This example demonstrates how to create a table (inventory) and a materialized view (mv_inventory_basic) based on that table. Then, a row of data is inserted, followed by selecting data from the materialized view, refreshing it, and selecting again to observe the effect of manual refresh.
The above SQL statements demonstrate how to create and use a materialized view. In practice, materialized views can significantly improve query performance, especially for complex computations or large data volumes. Refreshing the materialized view ensures its data stays consistent with the source table, providing the latest query results. Note: Since no scheduling parameters were used when creating the materialized view, a manual refresh is performed here.
Example 2: Materialized View with Column List and Comment
Example 3: Materialized View with Partitioning and Clustering
Example 4: Materialized View with Refresh Options and Virtual Compute Cluster
Modify the refresh interval of a materialized view with refresh options:
View the modification result:
You can see that the value of refresh_interval_second is now 60, indicating that the refresh interval was successfully modified. For materialized views that auto-refresh on a schedule, to pause or resume automatic refresh, please refer to: Alter Materialized View
Example 5: Comprehensive Example
Example 6: Create Materialized View and Add Comment
Re-creating a MATERIALIZED VIEW can reuse the previous result. The scenario includes adding a column, where the previous MV can be reused and the newly added column shows null for old data.
Example
Related Guides
- Views and Materialized Views: Comparison of view and materialized view usage scenarios, refresh strategies, and typical query acceleration patterns
