Lakehouse Quick Start Experience
Overview
Welcome to Lakehouse! This guide walks you step by step through a series of hands-on exercises that cover the core features and advantages of Lakehouse.
The guide contains the following exercises:
- Run Your First SQL Query (2-3 minutes): Experience Lakehouse's easy-to-use SQL analysis environment.
- Create a Second Compute Cluster (3-6 minutes): Learn how to create and manage different types of computing resources.
- Compute-Storage Separation Architecture Experience (5-8 minutes): Experience independent scaling of computing and storage resources for efficient resource utilization.
- Lakehouse Unified Architecture Experience (7-10 minutes): Learn how to process structured and unstructured data in a unified way.
- Batch and Real-time Unified Experience (5-7 minutes): Experience processing both batch and streaming data on a unified platform.
- Vector Search and Inverted Index Hybrid Search Experience (7-10 minutes): Learn how to combine semantic search with keyword search for efficient hybrid queries.
- Data Transformation and Analysis (5-8 minutes): Experience flexible data processing and analysis capabilities.
- Clean Up Resources (3 minutes): Learn how to clean up created resources to avoid unnecessary resource usage.
Each exercise has clear steps and expected results, helping you quickly learn the core features of Lakehouse and try out its data processing and analysis capabilities.
Preparation
- Log in to Lakehouse Studio and create a new workspace: lakehouse_quick_experience.
- Go to the "Development" page and, in the upper right corner, switch to the newly created workspace.
- Use the new-worksheet entry on the "Development" page to create a new SQL worksheet named "00_Environment_Preparation".
Before starting the exercises, create a dedicated schema and the first virtual compute cluster in this worksheet.
Basic Operations
1. Run Your First SQL Query
In this exercise, you will execute simple SQL queries, create tables, and perform basic analysis. Estimated time: 2-3 minutes.
- Log in to Lakehouse Studio
- Create a new SQL worksheet named "01_My_First_SQL"
- Execute the following SQL commands:
- Create a simple table and insert data:
- Run an aggregation query:
Tip: Without any complex configuration, you can immediately run SQL queries, create tables, insert data, and perform analysis. This demonstrates Lakehouse's ease of use.
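The SQL for these steps is not reproduced in this section. As a rough illustration of the create, insert, and aggregate sequence, here is a minimal sketch that uses Python's built-in sqlite3 module as a local stand-in. The table name `sales_demo`, its columns, and the sample rows are hypothetical, and Lakehouse's own SQL syntax may differ.

```python
import sqlite3

# SQLite in-memory database used as a local stand-in; this is not Lakehouse itself.
conn = sqlite3.connect(":memory:")

# Create a simple table and insert data (hypothetical table and columns).
conn.execute("CREATE TABLE sales_demo (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_demo (region, amount) VALUES (?, ?)",
    [("East", 100.0), ("East", 250.0), ("West", 80.0)],
)

# Run an aggregation query: order count and total amount per region.
totals = conn.execute(
    "SELECT region, COUNT(*) AS orders, SUM(amount) AS total "
    "FROM sales_demo GROUP BY region ORDER BY region"
).fetchall()

for region, orders, total in totals:
    print(region, orders, total)
```

The same three statements (CREATE TABLE, INSERT, and a GROUP BY query) are what you would type into the "01_My_First_SQL" worksheet, adapted to Lakehouse's dialect.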
2. Create a Second Compute Cluster
Next, let's create a different type of compute cluster to understand how to choose the right computing resources for different scenarios. Estimated time: 3-6 minutes.
- Create a new SQL worksheet named "02_Create_Analytics_Cluster"
- Check the current environment:
- Create an Analytics-type Virtual Compute Cluster:
- Switch to the newly created cluster and test:
- View cluster information:
Core Architecture Experience
3. Compute-Storage Separation Architecture Experience
Compute-storage separation is a core architectural feature of Lakehouse, allowing you to flexibly adjust computing resources without affecting data storage. Estimated time: 5-8 minutes.
- Create a new SQL worksheet named "03_Compute_Storage_Separation"
- Prepare the environment:
- Create a test dataset:
- Query data on the first cluster:
- Switch to the second cluster and query the same data:
- Add new data on the second cluster:
- Switch back to the first cluster and see the data changes:
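To make the idea behind these steps concrete, the following sketch imitates compute-storage separation locally. One SQLite database file plays the role of shared storage, and two independent connections play the role of two compute clusters. This is only an analogy under stated assumptions: the names `cluster_a`, `cluster_b`, and `events` are hypothetical, and real virtual clusters are created and switched with Lakehouse commands that are not shown here.

```python
import os
import sqlite3
import tempfile

# One shared database file plays the role of the storage layer.
storage_path = os.path.join(tempfile.mkdtemp(), "shared_storage.db")

# Two independent connections stand in for two compute clusters.
cluster_a = sqlite3.connect(storage_path)
cluster_b = sqlite3.connect(storage_path)

# "Cluster A" creates the test dataset and commits it to shared storage.
cluster_a.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, label TEXT)")
cluster_a.executemany("INSERT INTO events (label) VALUES (?)", [("a",), ("b",)])
cluster_a.commit()

# "Cluster B" queries the same data without any copy step.
count_on_b = cluster_b.execute("SELECT COUNT(*) FROM events").fetchall()[0][0]

# "Cluster B" adds a row; "cluster A" sees the change after the commit.
cluster_b.execute("INSERT INTO events (label) VALUES (?)", ("c",))
cluster_b.commit()
count_on_a = cluster_a.execute("SELECT COUNT(*) FROM events").fetchall()[0][0]

print(count_on_b, count_on_a)

cluster_a.close()
cluster_b.close()
```

The point of the exercise is the same as in the sketch: either compute resource can read and write the data because the data lives in storage that neither of them owns.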
4. Lakehouse Unified Architecture Experience
The Lakehouse unified architecture allows you to directly query files in multiple formats with unified SQL, without complex ETL transformations. Estimated time: 7-10 minutes.
- Create a new SQL worksheet named "04_Unified_Lakehouse"
- Prepare the environment:
- Create and prepare data:
- Export data in multiple file formats to User Volume:
- Create a sales record table and export as CSV format:
- Use the Analytics cluster to directly query files in different formats:
- Join queries across different file formats:
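The workflow above (export to files, read the files in place, join across formats) can be approximated locally with the sketch below. It uses only Python's standard library, with a temporary directory standing in for the User Volume. The file names, columns, and sample rows are hypothetical, and in Lakehouse the same result is obtained with SQL rather than Python.

```python
import csv
import json
import os
import tempfile

# A temporary directory stands in for the User Volume.
volume_dir = tempfile.mkdtemp()
csv_path = os.path.join(volume_dir, "sales.csv")
json_path = os.path.join(volume_dir, "products.json")

# Export sales records as CSV (hypothetical sample data).
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["product_id", "quantity"])
    writer.writerows([[1, 3], [2, 5], [1, 2]])

# Export product records as JSON.
with open(json_path, "w") as f:
    json.dump(
        [{"product_id": 1, "name": "Laptop"}, {"product_id": 2, "name": "Phone"}],
        f,
    )

# Read both files in place and join them on product_id.
with open(json_path) as f:
    names = {p["product_id"]: p["name"] for p in json.load(f)}

quantity_by_product = {}
with open(csv_path, newline="") as f:
    for row in csv.DictReader(f):
        name = names[int(row["product_id"])]
        quantity_by_product[name] = quantity_by_product.get(name, 0) + int(row["quantity"])

print(quantity_by_product)
```

No intermediate load step is involved: both files are read where they sit, which is the property the exercise demonstrates.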
5. Batch and Real-time Unified Experience
Lakehouse supports processing both batch and streaming data on the same platform. Estimated time: 5-7 minutes.
- Create a new SQL worksheet named "05_Batch_and_Real_time"
- Prepare the environment:
- Create historical and real-time order tables:
- Create a unified view:
- Query the unified view:
- Simulate real-time data writes:
- Query the statistics again to see changes:
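A unified view of this kind is commonly built as a UNION ALL over the historical and real-time tables. The sketch below shows that pattern with Python's built-in sqlite3 module as a local stand-in. The table and view names (`orders_history`, `orders_realtime`, `orders_unified`) and the sample rows are hypothetical, and the guide's own definitions may differ.

```python
import sqlite3

# SQLite in-memory database used as a local stand-in; this is not Lakehouse itself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_history  (order_id INTEGER, amount REAL);
    CREATE TABLE orders_realtime (order_id INTEGER, amount REAL);
    INSERT INTO orders_history VALUES (1, 100.0), (2, 200.0);

    -- The unified view exposes both tables through one name.
    CREATE VIEW orders_unified AS
        SELECT order_id, amount, 'history' AS source FROM orders_history
        UNION ALL
        SELECT order_id, amount, 'realtime' AS source FROM orders_realtime;
""")


def order_stats(connection):
    """Return (order count, total amount) across batch and real-time data."""
    return connection.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders_unified"
    ).fetchone()


stats_before = order_stats(conn)

# Simulate a real-time write, then query the same view again.
conn.execute("INSERT INTO orders_realtime VALUES (3, 50.0)")
stats_after = order_stats(conn)

print(stats_before, stats_after)
```

Because the view is evaluated at query time, the newly written row shows up in the second set of statistics without any refresh step.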
6. Vector Search and Inverted Index Hybrid Search Experience
Lakehouse supports efficient vector search and inverted index search, which can be used to implement hybrid queries combining semantic search and keyword search. Estimated time: 7-10 minutes.
- Create a new SQL worksheet named "06_Hybrid_Search"
- Prepare the environment:
- Create a vector index table:
- Insert sample data:
- Test tokenization:
- Vector search alone:
- Inverted index search alone:
- Vector and inverted index hybrid search:
- Complex hybrid query scenarios:
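To show what a hybrid query does conceptually, the following sketch combines a keyword filter from a tiny inverted index with ranking by cosine similarity. It is plain Python, not Lakehouse's index implementation. The documents, the three-dimensional embeddings, and the function names are all hypothetical, and whitespace tokenization stands in for a real tokenizer.

```python
import math

# Hypothetical documents with tiny hand-made embeddings.
docs = [
    {"id": 1, "text": "lakehouse storage architecture", "vec": [0.9, 0.1, 0.0]},
    {"id": 2, "text": "vector search for images", "vec": [0.1, 0.9, 0.1]},
    {"id": 3, "text": "keyword search with inverted index", "vec": [0.2, 0.8, 0.3]},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def build_inverted_index(documents):
    """Map each lowercase token to the set of document ids that contain it."""
    index = {}
    for doc in documents:
        for token in doc["text"].lower().split():
            index.setdefault(token, set()).add(doc["id"])
    return index


def hybrid_search(query_vec, keyword, documents, index, top_k=2):
    """Filter by keyword through the inverted index, then rank by similarity."""
    candidates = index.get(keyword.lower(), set())
    scored = [
        (cosine_similarity(query_vec, d["vec"]), d["id"])
        for d in documents
        if d["id"] in candidates
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


index = build_inverted_index(docs)
results = hybrid_search([0.1, 0.9, 0.2], "search", docs, index)
print(results)
```

Here the keyword step narrows the candidates to documents 2 and 3, and the vector step orders them by semantic closeness to the query. Filtering before ranking is one possible design; scoring both signals and blending them is another.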
7. Data Transformation and Analysis (Batch Processing) Experience
Lakehouse supports offline batch processing and transformation. Estimated time: 7-10 minutes.
- Create a new SQL worksheet named "07_Batch_Processing_and_Transformation"
- Prepare the environment:
- Create and populate a sales data table:
- Create a date dimension table:
- Create a sales summary table:
- Switch to the Analytics cluster for interactive analysis:
- Advanced analysis using window functions:
- Create a business insights view:
- Sales ranking and proportion analysis:
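The ranking and proportion analysis in the last steps is a typical use of window functions. The sketch below shows one way to express it, using Python's built-in sqlite3 module as a local stand-in (window functions need SQLite 3.25 or later). The `sales` table and its rows are hypothetical, and Lakehouse's function set may differ.

```python
import sqlite3

# SQLite in-memory database used as a local stand-in; this is not Lakehouse itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Laptop", 600.0), ("Phone", 300.0), ("Tablet", 100.0)],
)

# RANK() orders products by sales; SUM() OVER () gives the grand total
# on every row so each product's share can be computed in one pass.
ranking = conn.execute("""
    SELECT product,
           RANK() OVER (ORDER BY amount DESC) AS sales_rank,
           ROUND(100.0 * amount / SUM(amount) OVER (), 1) AS pct_of_total
    FROM sales
    ORDER BY sales_rank
""").fetchall()

for product, sales_rank, pct_of_total in ranking:
    print(product, sales_rank, pct_of_total)
```

The empty `OVER ()` clause is what avoids a separate subquery for the total, which keeps the ranking and the proportion in a single statement.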
Clean Up Resources
After completing the exercises, clean up the resources you created to avoid unnecessary resource usage. Estimated time: 3 minutes.
- Create a new SQL worksheet named "10_Clean_Up_Environment"
- Prepare the environment:
- Clean up data tables:
- Clean up created views:
- Clean up files in User Volume:
- Switch to the default cluster:
- Clean up virtual clusters:
- Finally, clean up the created schema:
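The first part of this clean-up sequence (tables, views, exported files) can be sketched locally as follows, using Python's built-in sqlite3 module and a temporary directory as stand-ins. The object and file names are hypothetical. Dropping virtual clusters and the schema relies on Lakehouse-specific commands, so those steps are not imitated here.

```python
import os
import sqlite3
import tempfile

# Set up a table, a view, and one exported file to clean up (all hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, amount REAL);
    CREATE VIEW sales_view AS SELECT product FROM sales;
""")
volume_dir = tempfile.mkdtemp()  # stands in for the User Volume
open(os.path.join(volume_dir, "sales.csv"), "w").close()

# Clean up data tables, then views; IF EXISTS makes the script safe to re-run.
conn.execute("DROP TABLE IF EXISTS sales")
conn.execute("DROP VIEW IF EXISTS sales_view")

# Clean up exported files.
for name in os.listdir(volume_dir):
    os.remove(os.path.join(volume_dir, name))

# Verify that nothing is left behind.
remaining_objects = conn.execute("SELECT name FROM sqlite_master").fetchall()
remaining_files = os.listdir(volume_dir)
print(remaining_objects, remaining_files)
```

Ending with a verification query is a useful habit for any clean-up script, because it confirms that every object created during the exercises is actually gone.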
Summary and Recommended Paths
Congratulations on completing the Lakehouse feature experience! Through these exercises, you have learned about Lakehouse's core features, including the easy-to-use SQL environment, flexible computing resource management, compute-storage separation architecture, lakehouse unification, batch and real-time unification, and powerful search capabilities.
Based on your role and interests, here are the recommended learning paths:
Data Analyst:
- Run your first SQL query
- Create a second compute cluster
- Lakehouse unified architecture experience
- Data transformation and analysis
Data Engineer:
- Run your first SQL query
- Create a second compute cluster
- Lakehouse unified architecture experience
- Batch and real-time unified experience
- Data transformation and analysis
Data Architect / Manager:
- Run your first SQL query
- Create a second compute cluster
- Compute-storage separation architecture experience
- Lakehouse unified architecture experience
- Batch and real-time unified experience
Now you can start applying Lakehouse to actual business scenarios and enjoy a simple and efficient data processing and analysis experience!
References
- Key Concepts
- Virtual Compute Cluster
- Volume
- Vector Index
- Inverted Index
