AI_SUMMARIZE
Overview
AI_SUMMARIZE is an AI text summarization function provided by Singdata Lakehouse. It generates concise summaries of input text in Chinese, English, Japanese, and other languages. The max_words parameter controls summary length, so no prompt writing is required: a single line of SQL handles text summarization.
Singdata pushes AI computation down to the storage and execution engine layer. AI_SUMMARIZE can be called directly on any text column in a SQL query and freely combined with filters, aggregations, JOINs, and other operations, with no need to export data to external systems.
Syntax
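The call shape can be sketched from the parameter tables in the next section. The positional order shown here (model, then content, then the optional max_words) is an assumption inferred from those tables, not a confirmed signature.

```sql
-- Assumed positional form; max_words is optional and defaults to 50
AI_SUMMARIZE(model, content [, max_words])
```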
Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| model | STRING | Model identifier specifying the AI model to use for summarization |
| content | STRING | Text to summarize; supports CHAR, VARCHAR, STRING types |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_words | INT | 50 | Target word count for model output. Set to 0 to return the original text (no summarization); negative values cause an error |
model Parameter Details
The model parameter supports two sources:
Source 1: API Gateway Endpoint (Recommended)
A platform administrator pre-configures model services in the API Gateway. Regular users reference them with the endpoint: prefix.
Source 2: API Connection Object
Return Value
Returns STRING containing a summary of the input text.
- Returns `NULL` when input is `NULL`
- Returns empty string `''` when input is empty string `''`
- `max_words` is a target, not a hard limit; actual output may vary by ±20%
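A quick way to check the NULL and empty-string behavior is a query like the sketch below. The endpoint name `my_summarizer` is hypothetical, and the positional argument order is an assumption.

```sql
-- 'endpoint:my_summarizer' is a hypothetical endpoint name
SELECT
  AI_SUMMARIZE('endpoint:my_summarizer', CAST(NULL AS STRING)) AS null_in,  -- documented to return NULL
  AI_SUMMARIZE('endpoint:my_summarizer', '')                   AS empty_in; -- documented to return ''
```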
Error Behavior
| Error scenario | Error message | Resolution |
|---|---|---|
| max_words is negative | max_words must be non-negative, got: -1 | max_words must be ≥ 0 |
| Endpoint does not exist | CZLH-67000 No available endpoints found | Check that the endpoint name is correct |
| Invalid model format (missing prefix) | CZLH-65000 Invalid model coordinates | Use 'endpoint:name' or 'connection:model' format |
| Missing required parameter | CZLH-65000 AI function must have at least two arguments | Ensure both model and content are provided |
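The statements below sketch calls that would be expected to hit each row of the table. Endpoint names are hypothetical, the positional argument order is an assumption, and the exact message text may differ from what the comments quote.

```sql
-- Negative max_words: expected "max_words must be non-negative, got: -1"
SELECT AI_SUMMARIZE('endpoint:my_summarizer', 'Some text to summarize.', -1);

-- Endpoint that has not been configured: expected CZLH-67000
SELECT AI_SUMMARIZE('endpoint:no_such_endpoint', 'Some text to summarize.');

-- Model string without a prefix: expected CZLH-65000 Invalid model coordinates
SELECT AI_SUMMARIZE('my_summarizer', 'Some text to summarize.');

-- Only one argument: expected CZLH-65000 AI function must have at least two arguments
SELECT AI_SUMMARIZE('Some text to summarize.');
```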
Usage Notes
- `model` is required; omitting it causes the first string argument to be misinterpreted as the model name.
- `max_words` is a target, not a hard limit. Actual output may vary by ±20%, so do not rely on exact word counts in downstream processing.
- When `max_words=0`, the function returns the original text without calling the model.
- Output language automatically follows the input language; no additional specification needed.
- For large tables, use `WHERE` to filter down to the rows that need processing first, avoiding unnecessary model calls.
- LLM output is non-deterministic; the same input may produce slightly different results across executions.
Examples
Basic Usage
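A minimal sketch of a basic call with an explicit length target. The endpoint name `my_summarizer` and the sample text are illustrative, and passing max_words as the third positional argument is an assumption.

```sql
-- Summarize a Chinese passage in roughly 20 words.
-- The output is expected to be in Chinese, since output language follows the input.
SELECT AI_SUMMARIZE(
  'endpoint:my_summarizer',  -- hypothetical endpoint configured by an administrator
  '今天上午，公司发布了新一代数据平台，支持在 SQL 中直接调用大模型完成文本摘要、分类和翻译，无需将数据导出到外部系统。',
  20
) AS summary;
```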
Using Default Word Count
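When max_words is omitted, the target falls back to the documented default of 50 words. A sketch, again with a hypothetical endpoint name and sample text:

```sql
-- No third argument: the summary targets about 50 words (the default)
SELECT AI_SUMMARIZE(
  'endpoint:my_summarizer',  -- hypothetical endpoint name
  'The quarterly report covers revenue growth across three regions, a delayed product launch, two new partnerships, and revised hiring plans for the engineering and support teams.'
) AS summary;
```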
English Text Example
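For English input the call is the same and the summary is expected to come back in English. The endpoint name and the sample text below are illustrative only.

```sql
-- English input with a 15-word target; actual length may vary by about ±20%
SELECT AI_SUMMARIZE(
  'endpoint:my_summarizer',  -- hypothetical endpoint name
  'The maintenance window was moved from Saturday to Sunday because a dependent service needed an urgent patch. All scheduled jobs will be paused for two hours and resumed automatically.',
  15
) AS summary;
```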
max_words=0 Returns Original Text
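A sketch of the pass-through case. With a target of 0 the function is documented to return the input unchanged without calling the model; the endpoint name is hypothetical and the positional third argument is an assumption.

```sql
-- max_words = 0: the original text is returned as-is
SELECT AI_SUMMARIZE(
  'endpoint:my_summarizer',  -- hypothetical endpoint name
  'This sentence should come back unchanged.',
  0
) AS unchanged_text;
```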
Batch Processing Table Data
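For table data, filtering before summarizing keeps the number of model calls down. The sketch below assumes a hypothetical table `support_tickets` with columns `ticket_id`, `status`, and `body`, plus a hypothetical endpoint name.

```sql
-- Summarize only the rows that need it; each remaining row triggers one model call
SELECT
  ticket_id,
  AI_SUMMARIZE('endpoint:my_summarizer', body, 30) AS body_summary
FROM support_tickets          -- hypothetical table
WHERE status = 'open'         -- narrow the row set first
  AND body IS NOT NULL
  AND body <> '';
```

Because output is non-deterministic, it may be worth writing the results to a table once rather than recomputing summaries in every query.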
Using an API Connection
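With an API Connection, the model string uses the 'connection:model' format in place of the endpoint: prefix. In the sketch below, `my_llm_conn` is a hypothetical connection name assumed to have been created beforehand, and the model name is taken from the Limitations table.

```sql
-- 'my_llm_conn' is a hypothetical, pre-created API Connection
SELECT AI_SUMMARIZE(
  'my_llm_conn:qwen3-max-preview',
  'The data team migrated nightly ETL jobs to the new scheduler, reducing average run time and removing two manual steps from the release checklist.',
  20
) AS summary;
```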
Limitations
| Item | Description |
|---|---|
| model parameter | Must use 'endpoint:name' or 'connection:model' format; cannot be omitted |
| Input length | Limited by the underlying model's context window (qwen3-max-preview: approximately 32K tokens) |
| Output length | max_words is a target; actual output may vary by ±20% |
| Aggregate summarization | No aggregate function version (e.g. summarizing multiple rows grouped by GROUP BY) |
| Model dependency | Requires a configured Endpoint in the AI Gateway, or a pre-created API Connection |
| Result determinism | LLM output is non-deterministic; the same input may produce slightly different results across executions |
