Data Analysis and SQL Guide

Singdata Lakehouse analytical capabilities span five areas: interactive SQL queries and analysis, data writing and modification, real-time and incremental processing, object and storage management, and SQL syntax and function reference.


Data Querying and Analysis

Write SQL directly in Studio with support for standard SQL syntax, including window functions, CTEs, JOINs, subqueries, and more.

Entry document: Data Querying and Analysis

ScenarioReference
Data deduplication (ROW_NUMBER / DISTINCT / BITMAP)Data Deduplication
Basic filtering and sortingBasic Data Filtering and Sorting
Group aggregation (GROUP BY / ROLLUP / CUBE)Data Group Aggregation
Ranking and percentile analysis (RANK / NTILE / PERCENTILE)Ranking and Percentile Analysis
Time series analysis (YoY / MoM / sliding window)Time Series Analysis
Funnel analysis and user behavior pathsFunnel Analysis and User Behavior
Session analysis (Sessionization)Session Analysis
Retention and cohort analysisRetention and Cohort Analysis
Marketing attribution analysis (first/last/linear/time decay)Marketing Attribution Analysis
Hierarchical queries (org chart / BOM)Hierarchical Query Workaround
String processingString Processing
Data pivoting (rows to columns / columns to rows)Data Pivoting and Transposition
Missing value filling and handlingMissing Value Handling
Data type conversionData Type Conversion
JSON data parsingJSON Data Parsing
Cumulative calculations and running totalsCumulative Calculations and Running Totals
BITMAP user segmentation and analysisBITMAP User Analysis
Data sampling and explorationData Sampling and Exploration
Data comparison and merging (UNION / INTERSECT / EXCEPT)Data Comparison and Merging
Semi-structured data analysisSemi-structured Data Analysis

Data Writing and Modification

Entry document: Data Writing and Modification

ScenarioReference
Batch data insertion (INSERT / INSERT OVERWRITE)Batch Data Insertion
Upsert operations (MERGE INTO)Upsert Operations
Data updates and cleanup (UPDATE / DELETE)Data Updates and Cleanup
Table cloning and quick backup (zero-copy clone)Table Cloning and Quick Backup

Real-time and Incremental Processing

Entry document: Real-time and Incremental Processing

ScenarioReference
Declarative incremental computation (Dynamic Table)Dynamic Table Development Guide
Row-level change capture CDC (Table Stream)Table Stream Change Data Capture
Continuous data ingestion (Pipe)Continuous Data Ingestion

Object and Storage Management

Entry document: Object and Storage Management

ScenarioReference
Views and materialized viewsViews and Materialized Views
Semantic views (business term queries)Semantic View Usage Guide
Volume file managementVolume File Management
Bulk file import/export (COPY INTO)Bulk File Import/Export
External table queries (Parquet / ORC / CSV)External Table Queries
Federated queries (Hive / Databricks / Snowflake)Federated Queries
Cross-instance data sharingCross-instance Data Sharing
Query acceleration indexes (Bloomfilter / inverted / vector)Query Acceleration Indexes
Historical data lookback (Time Travel)Historical Data Lookback

SQL Syntax and Optimization

Entry document: SQL Syntax and Optimization

ScenarioReference
CREATE TABLE syntaxSQL CREATE TABLE Guide
DML considerationsSQL DML Guide
SELECT considerationsSQL SELECT Guide
JOIN patterns and optimizationSQL Join Guide
CTE patternsSQL With CTE Guide
Partitioned table usagePartitioned Table Guide
Generated columnsGenerated Columns Guide
JSON query syntaxJSON Query Syntax
JSON data processingJSON Data Processing Guide
VECTOR data processingVECTOR Data Processing Guide
Execution plan analysis (EXPLAIN)Execution Plan Analysis
Small file compaction optimizationSmall File Compaction Optimization

SQL Function Usage Guide

Entry document: SQL Function Usage Guide

ScenarioReference
Array and Map processingArray and Map Processing in Practice
Approximate aggregate functions (HyperLogLog / KLL)Approximate Aggregate Functions in Practice
Array expansion and flattening (EXPLODE / UNNEST)Array Expansion and Flattening in Practice
Full-text search and text analysisFull-text Search and Text Analysis in Practice
Vector search and RAG applicationsVector Search and RAG Applications in Practice

Connecting BI Tools for Reporting

Most mainstream BI tools connect via JDBC / ODBC.

BI ToolReference
FineBIFineBI Connection Guide
PowerBIPowerBI Connection Guide
TableauTableau Connection Guide
SupersetSuperset Connection Guide
Other toolsEcosystem Integration Overview

Analyzing Data with AI

ScenarioSolutionReference
Ask questions in natural language, AI auto-generates SQLData Analytics Agent (DataGPT)DataGPT Introduction · DataGPT Tutorial
Semantic search / RAG applicationsVector searchVector Search · Vector Search and RAG Applications in Practice
Call large language models in SQLAI functionsAI Function Usage Guide
Query using business terms (no JOIN needed)Semantic viewsSemantic View Overview

Query Performance Optimization

ScenarioSolutionReference
High-frequency complex queries are slowMaterialized views (pre-computation + query rewriting)Materialized Views
Cache repeated query resultsResult CachePerformance Optimization
Large table scans are slowSort columns / partition designTable Design Best Practices
Too many small files affecting performanceSmall file compactionSmall File Compaction Optimization
Slow queries, want to locate bottlenecksJob Profile diagnosticsJob History Analysis