Lakehouse Execution Plan Analysis Guide (EXPLAIN)

Overview

In SQL query performance tuning, understanding the query execution plan is the first step. Singdata Lakehouse provides the EXPLAIN command to display the logical and physical execution plan of a query, helping you identify full table scans, inefficient JOINs, and missed index usage. This guide categorizes usage by business scenario to help you quickly master how to view and analyze execution plans.


SQL Commands Covered

CommandPurposeApplicable Scenario
EXPLAIN SELECT ...View logical execution planQuick check of query structure

Prerequisites

The following examples use a simulated order table orders_explain:

-- Create test table
CREATE TABLE IF NOT EXISTS orders_explain (
    order_id INT,
    customer_id INT,
    amount DOUBLE,
    order_date DATE
);

-- Insert test data
INSERT INTO orders_explain VALUES
(1, 101, 500, '2024-06-01'),
(2, 102, 300, '2024-06-02'),
(3, 101, 800, '2024-06-03');

View Execution Plan

Use EXPLAIN to view a query's execution plan and understand how the query will be decomposed and executed.

-- View the execution plan of a query
EXPLAIN SELECT order_id, amount FROM orders_explain WHERE customer_id = 101;

Output Description:

  • PhysicalTableScan: Table scan, reads underlying data.
  • Filter: Row filtering, applies WHERE conditions.
  • Project: Column pruning, reads only needed columns.

Identify Full Table Scans

When a query does not use partition columns or index filtering, the execution plan shows a full table scan (TableScan without filter conditions).

-- Query without partition column (may trigger full table scan)
EXPLAIN SELECT * FROM orders_explain WHERE amount > 400;

Optimization Suggestions:

  • If order_date is a partition column, adding order_date >= '2024-06-01' in WHERE can trigger partition pruning.
  • Creating a Bloom Filter index on the amount column can accelerate equality or range filtering.

Verify Index Hit

After creating an index, use EXPLAIN to confirm whether the query automatically uses the index.

-- Create a Bloom Filter index on customer_id
CREATE BLOOMFILTER INDEX idx_customer_bloom ON TABLE orders_explain(customer_id);

-- View the execution plan (should show index filtering)
EXPLAIN SELECT * FROM orders_explain WHERE customer_id = 101;

Output Changes:

  • In the optimized plan, TableScan will include a BloomFilter or IndexScan marker, indicating that non-matching data blocks are skipped.

Compare Plans Before and After Optimization

After optimizing a query (e.g., adding indexes, adjusting JOIN order), compare the execution plan changes.

-- Before optimization: full table scan
EXPLAIN SELECT * FROM orders_explain WHERE customer_id = 101;

-- After optimization: index scan
-- (After creating the index, run EXPLAIN again and compare the output differences)

Evaluation Metrics:

  • Whether TableScan changes to IndexScan or includes filter conditions.
  • Whether the Cost estimate drops significantly.
  • Whether Statistics estimated scan row count decreases.

Clean Up Test Data

After completing execution plan verification, it is recommended to clean up test tables:

-- Drop test table
DROP TABLE IF EXISTS orders_explain;

Important Notes

  1. EXPLAIN Does Not Execute the Query: EXPLAIN only generates the execution plan and does not actually read data or modify tables.
  2. Dynamic Plans: Execution plans are generated by the optimizer based on current statistics; plans may differ after table data changes.
  3. EXTENDED Output Is Large: For complex queries, EXPLAIN EXTENDED output can be hundreds of lines long. Search by keywords (e.g., Join, Scan, Filter) for efficient navigation.
  4. JSON Format: Use EXPLAIN (FORMAT JSON) to output the plan as JSON, facilitating programmatic parsing or visualization tool integration.