AI_COMPLETE

AI_COMPLETE is the core scalar function for generative AI tasks on the Singdata Lakehouse platform. It allows you to invoke large language models (LLMs) directly within the SQL environment, generating responses based on text prompts or multimodal inputs, enabling tasks such as text completion, translation, sentiment analysis, code generation, and complex reasoning.

Singdata pushes AI computation down to the storage layer and execution engine, so data can be intelligently processed within the platform without being transferred to external environments — safeguarding data security while significantly reducing task latency.


Syntax

AI_COMPLETE supports two calling forms:

Text mode

ai_complete(<model>, <prompt> [, json '{}'])

Image mode

ai_complete(<model>, (<prompt> AS prompt, <image_url> AS image) [, json '{}'])


Parameter Description

model (required)

Specifies the language model to invoke. Two sources are supported:

Source 1: API Gateway Endpoint (recommended)

The platform administrator pre-configures model services in the API Gateway (including provider, version, and authentication credentials). Regular users reference them using the endpoint: prefix without needing to know the underlying connection details.

'endpoint:<endpoint_name>' -- Examples 'endpoint:qwen3-max-preview' 'endpoint:qwen3.5-plus' 'endpoint:doubao-seed-2-0-pro-260215'

Source 2: API Connection object

Users create connection objects themselves via CREATE API CONNECTION, suitable for scenarios requiring custom service addresses, authentication keys, or connections to privately deployed models.

-- Create a connection object CREATE API CONNECTION conn_bailian TYPE ai_function PROVIDER = 'bailian' BASE_URL = 'https://dashscope.aliyuncs.com/api/v1' API_KEY = 'sk-xxxxxxxxxxxxxxxxxxxxxxxx'; -- Reference using the connection: prefix SELECT ai_complete('conn_bailian:qwen3.5-plus', 'Briefly introduce the basic principles of quantum computing.');

CREATE API CONNECTION field descriptions:

FieldDescription
TYPEFixed as ai_function
PROVIDERModel provider identifier, e.g. 'bailian', 'openai', 'anthropic'
BASE_URLAPI base address of the model service
API_KEYAuthentication key required to call the service

prompt (required)

The input content to send to the model, of type STRING.

Text mode: pass a string directly:

SELECT ai_complete('endpoint:qwen3-max-preview', 'Explain what a vector database is in one sentence');

Dynamic content can be concatenated using CONCAT or ||:

SELECT ai_complete( 'endpoint:qwen3-max-preview', CONCAT('Summarize the following text in 20 words: ', content) ) AS summary FROM articles;

Image mode: use named tuple syntax to pass both a text prompt and an image URL:

SELECT ai_complete( 'endpoint:doubao-seed-2-0-pro-260215', ('What is in the image?' AS prompt, GET_PRESIGNED_URL(USER VOLUME, 'images/product.jpg', 36000) AS image) );


options (optional)

Passed using the json '{}' literal syntax to control execution behavior and model parameters:

Parameter keyTypeDescription
model.params.temperatureFLOATOutput randomness, range [0, 2]; lower values produce more deterministic output
model.params.max_tokensINTMaximum number of output tokens
model.params.top_pFLOATNucleus sampling probability, range (0, 1]
model.params.enable_thinkingBOOLWhether to enable thinking mode; recommended to set to false for batch processing
response.timeoutSTRINGTimeout per request in seconds, e.g. "60"
task.concurrencySTRINGConcurrency level for batch processing, e.g. "5"

-- Disable thinking mode SELECT ai_complete( 'endpoint:qwen3-max-preview', 'What is artificial intelligence? Answer in one sentence', json '{"model.params":{"enable_thinking":false}}' ) AS result; -- Combine multiple parameters SELECT ai_complete( 'endpoint:qwen3-max-preview', question, json '{"model.params":{"enable_thinking":false},"task.concurrency":"5"}' ) AS answer FROM questions;


Return Value

Returns a STRING — the response text generated by the model.

  • When prompt is NULL or an empty string, returns NULL
  • When the endpoint does not exist, throws CZLH-67000: No available endpoints found
  • When the model format is invalid (missing endpoint: or connection: prefix), throws CZLH-65000: Invalid model coordinates

Usage Examples

Basic Text Completion

SELECT ai_complete('endpoint:qwen3-max-preview', 'What is the capital of China?') AS result;

Batch Processing Table Data

SELECT id, ai_complete( 'endpoint:qwen3-max-preview', question, json '{"model.params":{"enable_thinking":false},"task.concurrency":"5"}' ) AS answer FROM questions;

Dynamic Prompt with CONCAT

SELECT product_id, ai_complete( 'endpoint:qwen3-max-preview', CONCAT('Write a selling point description of no more than 30 words for the following product: ', product_name) ) AS selling_point FROM products;

Image Description

SELECT ai_complete( 'endpoint:doubao-seed-2-0-pro-260215', ('What is in the image?' AS prompt, GET_PRESIGNED_URL(USER VOLUME, 'images/product.jpg', 36000) AS image) ) AS result;

Image with a Specific Question

SELECT ai_complete( 'endpoint:doubao-seed-2-0-pro-260215', ('What is the price of this product? Answer in one sentence.' AS prompt, GET_PRESIGNED_URL(USER VOLUME, 'images/product.jpg', 36000) AS image) ) AS result;

Batch Image Processing

SELECT relative_path, ai_complete( 'endpoint:doubao-seed-2-0-pro-260215', ('Describe the product in the image in one sentence.' AS prompt, GET_PRESIGNED_URL(USER VOLUME, relative_path, 36000) AS image), json '{"model.params":{"enable_thinking":false},"task.concurrency":"3"}' ) AS description FROM (SHOW USER VOLUME DIRECTORY SUBDIRECTORY 'images/products') LIMIT 10;

Combining with Other AI Functions

-- AI_COMPLETE generates text → AI_CLASSIFY categorizes it SELECT ai_classify( 'endpoint:qwen3.5-plus', ai_complete('endpoint:qwen3-max-preview', 'Write a technology news headline'), ARRAY('Technology', 'Sports', 'Finance', 'Entertainment') ) AS category;


Notes

  • Model selection: Image mode requires a model that supports multimodal input (e.g. doubao-seed-2-0-pro-260215). Text-only models such as qwen3-max-preview do not support image input — passing an image does not cause an error, but the image content is ignored and the response is generated from the text prompt only.
  • Thinking mode: The qwen3 model series enables thinking mode by default, which increases latency and token consumption. For batch processing scenarios, it is recommended to disable it via options: json '{"model.params":{"enable_thinking":false}}'.
  • Image field cannot be NULL: Passing NULL for the image field in image mode throws invalid type of image field: void. Ensure GET_PRESIGNED_URL() returns a valid URL.
  • NULL and empty strings: When prompt is NULL or an empty string, the function returns NULL without error.
  • Error codes: A missing endpoint throws CZLH-67000; an invalid model format (missing endpoint: prefix) throws CZLH-65000.
  • Token limits: Different models have different context window limits. Inputs exceeding the limit will be truncated or cause an error. Pay attention to input length when processing long text.