Lakehouse AI Features Overview
Singdata Lakehouse integrates AI capabilities natively into the data platform: you can call large language models, run vector search, and build RAG pipelines directly in SQL, without moving data to an external AI platform.
Selection Guide
| What I want to do | Recommended approach |
|---|---|
| Call an LLM in a SQL query (text classification, summarization, extraction, translation) | AI Functions / AI_COMPLETE |
| Manage and switch between multiple LLM models (OpenAI, Qwen, etc.) | AI Gateway |
| Semantic similarity search, RAG retrieval, image search | Vector Search |
| Call external HTTP services (cloud functions, vision APIs, custom models) | External Function |
| Python data processing + AI inference with a PySpark-like interface | Zettapark |
| Encapsulate business semantics for BI tools and AI Agents | Semantic View |
| Query and analyze data through natural-language conversation, with no technical background required | Data Analytics Agent (DataGPT) |
| Let an AI Agent operate on Lakehouse directly | CZ-CLI |
Core Capabilities
AI Functions — Call LLMs in SQL
AI_COMPLETE is the most direct entry point: one SQL statement calls an LLM for every row of data, and results appear directly in the query result set.
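The per-row behavior can be pictured as a map over the result set: each row's text is formatted into a prompt, sent to the model once, and the reply is attached as a new column. The Python sketch below models only that behavior. It is not AI_COMPLETE's syntax or the Lakehouse API; `fake_llm` and `complete_per_row` are hypothetical helpers, and `fake_llm` stands in for a real model call with a simple keyword rule.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call: labels sentiment by keyword."""
    text = prompt.lower()
    if "great" in text or "love" in text:
        return "positive"
    if "bad" in text or "broken" in text:
        return "negative"
    return "neutral"


def complete_per_row(rows: list[dict], column: str, template: str,
                     llm: Callable[[str], str]) -> list[dict]:
    """Call `llm` once per row and attach its answer as a new column,
    mirroring how a per-row SQL function adds a column to a result set."""
    result = []
    for row in rows:
        prompt = template.format(text=row[column])
        result.append({**row, "ai_result": llm(prompt)})
    return result


# Made-up sample rows standing in for a table of reviews.
reviews = [
    {"id": 1, "review": "I love this product"},
    {"id": 2, "review": "Arrived broken"},
    {"id": 3, "review": "It is a phone"},
]

for r in complete_per_row(reviews, "review", "Classify the sentiment: {text}", fake_llm):
    print(r["id"], r["ai_result"])
```

Because the model is invoked once per row, cost and latency grow with the number of rows the query passes to it, so filtering rows before the call is usually worthwhile.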
→ AI Functions Full Documentation · AI_COMPLETE Syntax Reference · AI Gateway Model Management
Vector Search — Semantic Search and RAG
Create vector indexes on tables to support approximate nearest neighbor (ANN) retrieval, for scenarios such as semantic search, knowledge-base Q&A, and image similarity search.
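An ANN index answers the same question as an exact nearest-neighbor scan (which k stored vectors are most similar to a query vector?) but avoids comparing the query against every row, at the cost of occasionally missing a true neighbor. For reference, the Python sketch below computes the exact top-k, assuming cosine similarity as the similarity measure. It shows what an index approximates, not how the Lakehouse vector index is implemented; the vectors and IDs are made-up sample data.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query: list[float], vectors: dict[str, list[float]],
                k: int) -> list[tuple[str, float]]:
    """Score every stored vector and return the k most similar (id, score) pairs.
    An ANN index aims to return (approximately) this answer without a full scan."""
    scored = [(key, cosine(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real embeddings have hundreds of dimensions.
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

print(exact_top_k([1.0, 0.0, 0.0], docs, k=2))
```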
→ Vector Search Full Documentation · Vector Index · Full-Text + Vector Hybrid Search Best Practices
External Function — Call External AI Services
Register HTTP services such as Alibaba Cloud Function Compute or Tencent Cloud SCF as SQL functions, then call capabilities such as image recognition, speech transcription, and custom models directly in queries.
→ External Function Introduction · Development Guide (Python) · Usage Guide
Semantic View — Semantic Layer for AI Agents and BI Tools
Encapsulate multi-table JOINs and aggregation logic as business semantics. BI tools and AI Agents access data through semantic views, hiding the complexity of the underlying table structure and unifying metric definitions.
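Semantic View has its own definition syntax, which this page does not show. As a rough analogy only, an ordinary SQL view in Python's built-in `sqlite3` demonstrates the core idea: the JOIN and the metric definition are written once, and consumers query a single named object. The tables, columns, and the `revenue` metric are hypothetical, and a plain view lacks the business metadata a semantic view adds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'North'), (2, 'South');
INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 80.0);

-- The JOIN and the metric definition ("revenue" = SUM of order amount)
-- live here once instead of being repeated in every BI query.
CREATE VIEW revenue_by_region AS
SELECT c.region          AS region,
       SUM(o.amount)     AS revenue,
       COUNT(o.order_id) AS order_count
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
GROUP BY c.region;
""")

# A BI tool or agent queries the view without knowing the underlying join.
rows = conn.execute(
    "SELECT region, revenue, order_count FROM revenue_by_region ORDER BY region"
).fetchall()
print(rows)  # [('North', 150.0, 2), ('South', 80.0, 1)]
```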
→ Semantic View Overview · Integration with AI Features · Generate Semantic Views with AI Agent
Zettapark — Python Data Processing and AI Inference
A PySpark-like Python interface for running Python code on Lakehouse, suited to feature engineering, model inference, and complex data processing that is difficult to express in SQL.
→ Zettapark Quick Start · Credit Scoring Example · Feature Engineering Example
Typical Scenarios
RAG Knowledge Base Q&A: Ingest documents → vectorize → build a vector index → retrieve the relevant chunks for each user query → AI_COMPLETE generates the answer. → Vector Search Guide · Hybrid Search Best Practices
Batch Text Processing: Sentiment analysis of reviews, contract information extraction, multi-language translation → AI Functions Overview
AI-Enhanced BI: Semantic views unify metric definitions; the Data Analytics Agent enables natural-language data querying → Semantic View Best Practices
Image / Multimodal Processing: Call vision APIs for image classification and OCR → Using Hugging Face Image Recognition Model to Process Image Data
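The RAG Knowledge Base Q&A flow listed above can be traced end to end in a toy Python sketch. Assumptions: a term-count vector stands in for a real embedding model, an in-memory list stands in for the vector index, and the final prompt is printed rather than sent to AI_COMPLETE. `embed`, `retrieve`, and `build_prompt` are hypothetical helpers, not Lakehouse functions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Hypothetical toy 'embedding' (term counts); a real pipeline calls an embedding model."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# 1. Ingest: each string stands in for one document chunk.
chunks = [
    "Vector indexes support approximate nearest neighbor retrieval.",
    "External functions call HTTP services from SQL.",
    "Semantic views unify metric definitions for BI tools.",
]

# 2-3. Vectorize and "index": an in-memory list of (chunk, vector) pairs.
index = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(question: str, k: int = 1) -> list[str]:
    # 4. Score every chunk against the question and keep the top k.
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    # 5. This prompt is what would be passed to the LLM (e.g. via AI_COMPLETE).
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


question = "How do external functions call HTTP services?"
print(build_prompt(question, retrieve(question)))
```

Swapping the toy `embed` for a real embedding model and the list scan for a vector index changes the quality and speed of step 4, but not the shape of the pipeline.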
