Lakehouse AI Features Overview

Singdata Lakehouse integrates AI capabilities natively into the data platform — you can call large language models, run vector search, and build RAG pipelines directly in SQL, without moving data to an external AI platform.


Selection Guide

| What I want to do | Recommended approach |
| --- | --- |
| Call an LLM in a SQL query (text classification, summarization, extraction, translation) | AI Functions / AI_COMPLETE |
| Manage and switch between multiple LLM models (OpenAI, Qwen, etc.) | AI Gateway |
| Semantic similarity search, RAG retrieval, image search | Vector Search |
| Call external HTTP services (cloud functions, vision APIs, custom models) | External Function |
| Python data processing + AI inference with a PySpark-like interface | Zettapark |
| Encapsulate business semantics for BI tools and AI Agents | Semantic View |
| Natural language conversational data analysis, zero-barrier data querying | Data Analytics Agent (DataGPT) |
| Let an AI Agent operate Lakehouse directly | CZ-CLI |

Core Capabilities

AI Functions — Call LLMs in SQL

AI_COMPLETE is the most direct entry point: one SQL statement calls an LLM for every row of data, and results appear directly in the query result set.

-- Sentiment analysis on each user review
-- Replace endpoint:my_llm with the LLM endpoint name configured in your AI Gateway
SELECT
    review_id,
    review_text,
    AI_COMPLETE(
        'endpoint:my_llm',
        'Classify the sentiment of the following review as "positive", "negative", or "neutral": ' || review_text
    ) AS sentiment
FROM user_reviews;
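Conceptually, the query above builds one prompt per row by concatenating a fixed instruction with review_text, then makes one model call for that row. The Python sketch below mimics that row-wise behavior locally so the shape of the result set is easy to see. It is only an illustration: `fake_llm`, `classify_rows`, and the sample rows are hypothetical, and `fake_llm` is a keyword matcher standing in for the endpoint, not the Lakehouse API or a real model.

```python
from typing import Callable, Dict, List

# Same instruction text as in the SQL example.
INSTRUCTION = (
    'Classify the sentiment of the following review as '
    '"positive", "negative", or "neutral": '
)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM endpoint: plain keyword matching."""
    review = prompt[len(INSTRUCTION):].lower()
    if any(word in review for word in ("great", "love", "excellent")):
        return "positive"
    if any(word in review for word in ("bad", "broken", "terrible")):
        return "negative"
    return "neutral"


def classify_rows(rows: List[Dict], llm: Callable[[str], str]) -> List[Dict]:
    """Mimic the SQL: build one prompt and make one model call per row."""
    output = []
    for row in rows:
        prompt = INSTRUCTION + row["review_text"]  # plays the role of || in SQL
        output.append({**row, "sentiment": llm(prompt)})
    return output


# Hypothetical sample data standing in for the user_reviews table.
user_reviews = [
    {"review_id": 1, "review_text": "Great product, I love it"},
    {"review_id": 2, "review_text": "Arrived broken"},
    {"review_id": 3, "review_text": "It is a phone"},
]

results = classify_rows(user_reviews, fake_llm)
for r in results:
    print(r["review_id"], r["sentiment"])
```

Because every row triggers its own model call, cost and latency grow with the number of rows, so it is usually worth filtering the input before calling the model.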

AI Functions Full Documentation · AI_COMPLETE Syntax Reference · AI Gateway Model Management


Vector Search

Create vector indexes on tables to support approximate nearest neighbor (ANN) retrieval. This suits semantic search, knowledge base Q&A, image similarity, and similar scenarios.

-- Semantic similarity search: find the 5 most relevant documents
-- Replace endpoint:my_embedding with the Embedding endpoint name configured in your AI Gateway
SELECT doc_id, content
FROM knowledge_base
ORDER BY cosine_distance(embedding, AI_EMBEDDING('endpoint:my_embedding', 'user question')) ASC
LIMIT 5;
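To make the ordering in this query concrete, the following Python sketch computes an exact (brute-force) top-k by cosine distance. It assumes the common definition of cosine distance as 1 minus cosine similarity; whether the Lakehouse function uses exactly that definition should be checked in its reference. The function names and the tiny two-dimensional vectors are hypothetical. A vector index returns approximate results much faster on large tables, so this shows what the ranking means, not how the index works.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed definition: 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Exact equivalent of ORDER BY cosine_distance(...) ASC LIMIT k."""
    scored = [(doc_id, cosine_distance(vec, query)) for doc_id, vec in docs]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Hypothetical stand-in for the knowledge_base table: (doc_id, embedding).
knowledge_base = [
    ("doc_a", [1.0, 0.0]),
    ("doc_b", [0.0, 1.0]),
    ("doc_c", [1.0, 1.0]),
]

nearest = top_k([1.0, 0.1], knowledge_base, k=2)
print([doc_id for doc_id, _ in nearest])
```

Sorting ascending matters: a smaller cosine distance means a closer match, which is why the SQL uses ASC before LIMIT.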

Vector Search Full Documentation · Vector Index · Full-Text + Vector Hybrid Search Best Practices


External Function — Call External AI Services

Register HTTP services such as Alibaba Cloud Function Compute or Tencent Cloud SCF as SQL functions, and call vision recognition, speech transcription, custom models, and other capabilities directly in queries.

External Function Introduction · Development Guide (Python) · Usage Guide


Semantic View — Semantic Layer for AI Agents and BI Tools

Encapsulate multi-table JOINs and aggregation logic as business semantics. BI tools and AI Agents access data through semantic views, hiding the complexity of the underlying table structure and unifying metric definitions.

Semantic View Overview · Integration with AI Features · Generate Semantic Views with AI Agent


Zettapark — Python Data Processing and AI Inference

A PySpark-like Python interface for running Python scripts on Lakehouse — suitable for feature engineering, model inference, and complex data processing scenarios that SQL cannot cover.

Zettapark Quick Start · Credit Scoring Example · Feature Engineering Example


Typical Scenarios

RAG Knowledge Base Q&A: Ingest documents → vectorize → build vector index → retrieve relevant chunks on user query → AI_COMPLETE generates the answer → Vector Search Guide · Hybrid Search Best Practices
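The RAG flow described above can be sketched end to end in a few lines of Python. This is a toy illustration under stated assumptions, not the Lakehouse implementation: `embed` is a hypothetical bag-of-words stand-in for an embedding model such as the one behind AI_EMBEDDING, retrieval is a brute-force scan rather than a vector index, and the final generation step is left as a prompt that would be passed to AI_COMPLETE.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Hypothetical embedding: word counts instead of a real model."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity over sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat empty text as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def retrieve(question: str, index: List[Tuple[str, Counter]], k: int = 2) -> List[str]:
    """Return the k chunks closest to the question (brute-force scan)."""
    q_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine_distance(item[1], q_vec))
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: List[str]) -> str:
    """Assemble the text that would be sent to the LLM for the answer."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Ingest and vectorize: hypothetical document chunks.
chunks = [
    "vector indexes support approximate nearest neighbor retrieval",
    "semantic views unify metric definitions for bi tools",
    "external functions register http services as sql functions",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Retrieve on user query, then build the generation prompt.
question = "how do vector indexes support retrieval"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Keeping retrieval and prompt assembly as separate steps mirrors the arrows in the scenario: the retrieval stage can be swapped for a vector index query without touching the generation step.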

Batch Text Processing: Review sentiment analysis, contract information extraction, multi-language translation → AI Functions Overview

AI-Enhanced BI: Semantic views unify metric definitions; Data Analytics Agent enables natural language data querying → Semantic View Best Practices

Image / Multimodal Processing: Call vision APIs for image classification, OCR → Using Hugging Face Image Recognition Model to Process Image Data