Data Analysis and SQL Guide
Singdata Lakehouse analytical capabilities span five areas: interactive SQL queries and analysis, data writing and modification, real-time and incremental processing, object and storage management, and SQL syntax and function reference.
Data Querying and Analysis
Write SQL directly in Studio with support for standard SQL syntax, including window functions, CTEs, JOINs, subqueries, and more.
Entry document: Data Querying and Analysis
| Scenario | Reference |
|---|---|
| Data deduplication (ROW_NUMBER / DISTINCT / BITMAP) | Data Deduplication |
| Basic filtering and sorting | Basic Data Filtering and Sorting |
| Group aggregation (GROUP BY / ROLLUP / CUBE) | Data Group Aggregation |
| Ranking and percentile analysis (RANK / NTILE / PERCENTILE) | Ranking and Percentile Analysis |
| Time series analysis (YoY / MoM / sliding window) | Time Series Analysis |
| Funnel analysis and user behavior paths | Funnel Analysis and User Behavior |
| Session analysis (Sessionization) | Session Analysis |
| Retention and cohort analysis | Retention and Cohort Analysis |
| Marketing attribution analysis (first/last/linear/time decay) | Marketing Attribution Analysis |
| Hierarchical queries (org chart / BOM) | Hierarchical Query Workaround |
| String processing | String Processing |
| Data pivoting (rows to columns / columns to rows) | Data Pivoting and Transposition |
| Missing value filling and handling | Missing Value Handling |
| Data type conversion | Data Type Conversion |
| JSON data parsing | JSON Data Parsing |
| Cumulative calculations and running totals | Cumulative Calculations and Running Totals |
| BITMAP user segmentation and analysis | BITMAP User Analysis |
| Data sampling and exploration | Data Sampling and Exploration |
| Data comparison and merging (UNION / INTERSECT / EXCEPT) | Data Comparison and Merging |
| Semi-structured data analysis | Semi-structured Data Analysis |
Data Writing and Modification
Entry document: Data Writing and Modification
| Scenario | Reference |
|---|---|
| Batch data insertion (INSERT / INSERT OVERWRITE) | Batch Data Insertion |
| Upsert operations (MERGE INTO) | Upsert Operations |
| Data updates and cleanup (UPDATE / DELETE) | Data Updates and Cleanup |
| Table cloning and quick backup (zero-copy clone) | Table Cloning and Quick Backup |
Real-time and Incremental Processing
Entry document: Real-time and Incremental Processing
| Scenario | Reference |
|---|---|
| Declarative incremental computation (Dynamic Table) | Dynamic Table Development Guide |
| Row-level change capture CDC (Table Stream) | Table Stream Change Data Capture |
| Continuous data ingestion (Pipe) | Continuous Data Ingestion |
Object and Storage Management
Entry document: Object and Storage Management
| Scenario | Reference |
|---|---|
| Views and materialized views | Views and Materialized Views |
| Semantic views (business term queries) | Semantic View Usage Guide |
| Volume file management | Volume File Management |
| Bulk file import/export (COPY INTO) | Bulk File Import/Export |
| External table queries (Parquet / ORC / CSV) | External Table Queries |
| Federated queries (Hive / Databricks / Snowflake) | Federated Queries |
| Cross-instance data sharing | Cross-instance Data Sharing |
| Query acceleration indexes (Bloomfilter / inverted / vector) | Query Acceleration Indexes |
| Historical data lookback (Time Travel) | Historical Data Lookback |
SQL Syntax and Optimization
Entry document: SQL Syntax and Optimization
| Scenario | Reference |
|---|---|
| CREATE TABLE syntax | SQL CREATE TABLE Guide |
| DML considerations | SQL DML Guide |
| SELECT considerations | SQL SELECT Guide |
| JOIN patterns and optimization | SQL Join Guide |
| CTE patterns | SQL With CTE Guide |
| Partitioned table usage | Partitioned Table Guide |
| Generated columns | Generated Columns Guide |
| JSON query syntax | JSON Query Syntax |
| JSON data processing | JSON Data Processing Guide |
| VECTOR data processing | VECTOR Data Processing Guide |
| Execution plan analysis (EXPLAIN) | Execution Plan Analysis |
| Small file compaction optimization | Small File Compaction Optimization |
SQL Function Usage Guide
Entry document: SQL Function Usage Guide
| Scenario | Reference |
|---|---|
| Array and Map processing | Array and Map Processing in Practice |
| Approximate aggregate functions (HyperLogLog / KLL) | Approximate Aggregate Functions in Practice |
| Array expansion and flattening (EXPLODE / UNNEST) | Array Expansion and Flattening in Practice |
| Full-text search and text analysis | Full-text Search and Text Analysis in Practice |
| Vector search and RAG applications | Vector Search and RAG Applications in Practice |
Connecting BI Tools for Reporting
Most mainstream BI tools connect via JDBC / ODBC.
| BI Tool | Reference |
|---|---|
| FineBI | FineBI Connection Guide |
| PowerBI | PowerBI Connection Guide |
| Tableau | Tableau Connection Guide |
| Superset | Superset Connection Guide |
| Other tools | Ecosystem Integration Overview |
Analyzing Data with AI
| Scenario | Solution | Reference |
|---|---|---|
| Ask questions in natural language, AI auto-generates SQL | Data Analytics Agent (DataGPT) | DataGPT Introduction · DataGPT Tutorial |
| Semantic search / RAG applications | Vector search | Vector Search · Vector Search and RAG Applications in Practice |
| Call large language models in SQL | AI functions | AI Function Usage Guide |
| Query using business terms (no JOIN needed) | Semantic views | Semantic View Overview |
Query Performance Optimization
| Scenario | Solution | Reference |
|---|---|---|
| High-frequency complex queries are slow | Materialized views (pre-computation + query rewriting) | Materialized Views |
| Cache repeated query results | Result Cache | Performance Optimization |
| Large table scans are slow | Sort columns / partition design | Table Design Best Practices |
| Too many small files affecting performance | Small file compaction | Small File Compaction Optimization |
| Slow queries, want to locate bottlenecks | Job Profile diagnostics | Job History Analysis |
