Lakehouse View and Materialized View Guide

Overview

Views and Materialized Views are important tools for encapsulating complex query logic and simplifying data access. A View is a virtual table that computes results dynamically at query time; a Materialized View pre-computes and stores results, supporting query rewrite acceleration. This guide is organized by business scenario to help you quickly master view creation and management methods.


SQL Commands Covered

CommandPurposeUse Case
CREATE VIEWCreate a logical viewEncapsulate JOIN/aggregation to simplify queries
CREATE MATERIALIZED VIEWCreate a materialized viewPre-compute high-frequency query results
REFRESH MATERIALIZED VIEWRefresh a materialized viewUpdate pre-computed data
DROP VIEW / DROP MATERIALIZED VIEWDrop a viewClean up abandoned views

Prerequisites

The following examples use a simulated sales detail table sales_detail:

-- Create source table
CREATE TABLE IF NOT EXISTS sales_detail (
    sale_id INT,
    product_id INT,
    region STRING,
    amount DOUBLE,
    sale_date DATE
);

-- Insert test data
INSERT INTO sales_detail VALUES
(1, 101, 'East', 5000, '2024-06-01'),
(2, 102, 'West', 3000, '2024-06-01'),
(3, 101, 'East', 5000, '2024-06-02');

Create a Regular View

Use CREATE VIEW to define a logical view. Views do not store data; they dynamically execute the underlying SQL at query time.

-- Create a view summarizing by region
CREATE VIEW v_region_sales AS
SELECT 
    region,
    COUNT(*) as sale_count,
    SUM(amount) as total_amount
FROM sales_detail
GROUP BY region;

Using the View:

SELECT * FROM v_region_sales ORDER BY total_amount DESC;

Result Explanation:

regionsale_counttotal_amount
East210000
West13000

Create a Materialized View

Use CREATE MATERIALIZED VIEW to create a pre-computed view. Materialized views store actual data and can be automatically rewritten to by the optimizer at query time.

-- Create a materialized view summarizing by date
CREATE MATERIALIZED VIEW mv_daily_sales AS
SELECT 
    sale_date,
    SUM(amount) as daily_total
FROM sales_detail
GROUP BY sale_date;

Query the Materialized View:

SELECT * FROM mv_daily_sales ORDER BY sale_date;

Result Explanation:

sale_datedaily_total
2024-06-018000
2024-06-025000

Refresh a Materialized View

When source table data changes, use REFRESH MATERIALIZED VIEW to update the materialized view.

-- Insert new data into the source table
INSERT INTO sales_detail VALUES (4, 103, 'South', 4000, '2024-06-03');

-- Refresh the materialized view
REFRESH MATERIALIZED VIEW mv_daily_sales;

-- Verify the refresh result
SELECT * FROM mv_daily_sales WHERE sale_date = '2024-06-03';

Result Explanation:

sale_datedaily_total
2024-06-034000

Query Rewrite Verification

The Singdata Lakehouse optimizer automatically rewrites queries on source tables to use materialized views, thereby accelerating response times.

-- Query the source table directly (the optimizer may automatically rewrite it to use mv_daily_sales)
SELECT sale_date, SUM(amount) as total
FROM sales_detail
GROUP BY sale_date;

Drop Views

Use DROP VIEW or DROP MATERIALIZED VIEW to drop views.

-- Drop a regular view
DROP VIEW v_region_sales;

-- Drop a materialized view
DROP MATERIALIZED VIEW mv_daily_sales;

Clean Up Test Data

After completing view verification, it is recommended to clean up the test table:

-- Drop the test table
DROP TABLE IF EXISTS sales_detail;

Notes

  1. View Dependencies: Dropping a source table will invalidate views that depend on it. Use DESC VIEW to check dependency relationships.
  2. Materialized View Refresh: Materialized views are not auto-refreshed by default. For near-real-time data scenarios, consider using Dynamic Tables.
  3. Query Rewrite Conditions: The optimizer only rewrites queries when the materialized view's aggregation dimensions exactly match the query.
  4. Storage Cost: Materialized views consume additional storage space, proportional to the pre-computed result set size.
  5. DROP Syntax: Dropping a materialized view must use DROP MATERIALIZED VIEW, not DROP VIEW.