AI_CLASSIFY
AI_CLASSIFY is an AI text/image classification function provided by Singdata Lakehouse. It automatically assigns input content to user-defined categories — no model training, no prompt writing. One line of SQL is all it takes.
Syntax
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | STRING | Yes | Model identifier; supports endpoint: and connection: sources |
| content | STRING or image reference | Yes | Text to classify, or GET_PRESIGNED_URL(...) AS image |
| labels | ARRAY | Yes | Category array: ARRAY('category1', 'category2', ...) |
| options | JSON literal | No | Optional parameters (timeout, concurrency, model params) |
Return value: STRING — the best-matching category name (plain string, not JSON).
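As a rough sketch of how the four parameters fit together, the Python helper below assembles an AI_CLASSIFY expression as SQL text. It assumes the arguments are positional and follow the order of the parameter table (model, content, labels, options). The helper names, the endpoint name `my-classifier`, and the column `product_name` are hypothetical placeholders, not names defined by Singdata Lakehouse.

```python
from typing import List, Optional


def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def ai_classify_expr(model: str, content_sql: str, labels: List[str],
                     options_sql: Optional[str] = None) -> str:
    """Render an AI_CLASSIFY call as SQL text.

    Assumption: arguments are positional, in the order of the parameter
    table. content_sql and options_sql are inserted verbatim, so they must
    already be valid SQL (a column name, a literal, and so on).
    """
    if not labels:
        raise ValueError("at least one label is required")
    label_array = "ARRAY(" + ", ".join(sql_literal(label) for label in labels) + ")"
    args = [sql_literal(model), content_sql, label_array]
    if options_sql is not None:
        args.append(options_sql)
    return "AI_CLASSIFY(" + ", ".join(args) + ")"


expr = ai_classify_expr(
    "endpoint:my-classifier",             # hypothetical endpoint name
    "product_name",                       # hypothetical column
    ["electronics", "clothing", "food"],  # labels from the product example
)
print(expr)
```

Because the function returns a plain STRING, an expression built this way can be used anywhere a string column is valid, for example in a SELECT list.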
model Parameter
Method 1: API Gateway Endpoint (Recommended)
A platform administrator pre-configures model services in the API Gateway. Regular users reference them with the endpoint: prefix, without needing to know the underlying connection details.
Method 2: API Connection Object
Users create their own connection objects with CREATE API CONNECTION. This method suits custom service addresses, custom authentication keys, or privately deployed models.
CREATE API CONNECTION field descriptions:
| Field | Description |
|---|---|
| TYPE | Fixed as ai_function |
| PROVIDER | Model provider identifier, e.g. 'bailian', 'openai', 'anthropic' |
| BASE_URL | Base API URL of the model service |
| API_KEY | Authentication key for calling the service |
Quick Start
Use Cases
Case 1: Product Classification
| product_name | category |
|---|---|
| iPhone | electronics |
| Dior dress | clothing |
| Oreo cookies | food |
Case 2: Image Classification
Case 3: News Classification
Case 4: Customer Support Ticket Routing
Case 5: Batch Classification with options
Multilingual Support
AI_CLASSIFY supports classification in 29+ languages, depending on the model you choose, including:
| Language family | Supported languages |
|---|---|
| CJK | Chinese, Japanese, Korean |
| Latin | English, French, Spanish, Portuguese, German, Italian |
| Southeast Asian | Vietnamese, Thai, Indonesian |
| Other | Arabic, Russian, Polish, Dutch, Turkish, and more |
Same-language classification
Input and labels in the same language:
Cross-language classification
Input and labels can be in different languages:
options Parameter
| Parameter | Type | Description |
|---|---|---|
| model.params.enable_thinking | boolean | Set to false to disable thinking mode for faster responses (recommended for batch classification) |
| response.timeout | string (seconds) | Per-call timeout |
| task.concurrency | string (integer) | Batch processing concurrency |
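The value types in this table are easy to get wrong, since the timeout and concurrency are strings rather than numbers. The sketch below checks a settings dictionary against the table and serializes it to JSON. It makes several assumptions: the dotted names are kept as flat keys exactly as written in the table (the function may instead expect nested objects), both string options are treated as whole numbers, and the values `"60"` and `"8"` are purely illustrative.

```python
import json

# Option names from the table, mapped to the Python type each one implies.
OPTION_TYPES = {
    "model.params.enable_thinking": bool,
    "response.timeout": str,   # seconds, passed as a string
    "task.concurrency": str,   # integer, passed as a string
}


def build_options(settings: dict) -> str:
    """Validate settings against OPTION_TYPES and return them as JSON text."""
    for name, value in settings.items():
        expected = OPTION_TYPES.get(name)
        if expected is None:
            raise KeyError(f"unknown option: {name}")
        if not isinstance(value, expected):
            raise TypeError(f"{name} expects {expected.__name__}")
        # Assumption: string-typed options hold whole numbers only.
        if expected is str and not value.isdigit():
            raise ValueError(f"{name} must be a string of digits, got {value!r}")
    return json.dumps(settings, sort_keys=True)


batch_options = build_options({
    "model.params.enable_thinking": False,  # faster responses for batch runs
    "response.timeout": "60",               # illustrative value
    "task.concurrency": "8",                # illustrative value
})
print(batch_options)
```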
NULL and Empty Input Behavior
| Input | Return value | Notes |
|---|---|---|
| content is NULL | NULL | NULL is passed through |
| content is empty string | "" | Returns empty string (not NULL) |
| Normal text | Matching category name | Plain string |
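Downstream code often needs to tell the three cases in this table apart. The following Python sketch is a local model of the table's return values only; it says nothing about how the function is implemented or whether the model is called for empty input. `fake_classifier` is a hypothetical stand-in for the real model call.

```python
from typing import Callable, Optional


def classify_like_ai_classify(content: Optional[str],
                              classify: Callable[[str], str]) -> Optional[str]:
    """Mirror the documented return values for NULL, empty, and normal input."""
    if content is None:
        return None            # NULL is passed through
    if content == "":
        return ""              # empty string in, empty string out (not NULL)
    return classify(content)   # normal text: the matching category name


def fake_classifier(text: str) -> str:
    """Hypothetical stand-in that always answers 'electronics'."""
    return "electronics"


for value in (None, "", "iPhone"):
    print(repr(value), "->", repr(classify_like_ai_classify(value, fake_classifier)))
```

A consumer that treats "no category" uniformly therefore has to check for both NULL and the empty string.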
Best Practices
- Use descriptive category names: use meaningful names (e.g. "electronics" rather than "cat_1"). The model understands categories through their semantic meaning.
- Keep the number of categories reasonable: 2–10 categories work best. Too many categories may reduce accuracy.
- Disable thinking for speed: for batch classification, set `enable_thinking:false` to significantly reduce response time.
- Filter before classifying: for large tables, use `WHERE` to narrow the scope first, avoiding unnecessary model calls.
- Leverage cross-language capability: labels can be in English even when the input is in another language, which keeps downstream processing consistent.
- Image classification: pass images via `GET_PRESIGNED_URL(USER VOLUME, path, expiry) AS image`; the model classifies based on image content.
- Guard against empty strings: for columns that may contain empty strings, add `WHERE content IS NOT NULL AND content != ''` before classifying.
Limitations
| Item | Description |
|---|---|
| Model parameter | An endpoint must be specified |
| Minimum labels | 1 (recommended ≥ 2; with a single label, that label is always returned) |
| Maximum labels | Recommended ≤ 20; too many reduces accuracy |
| Return value | Single label (one category name string) |
| Image input | Must use GET_PRESIGNED_URL(...) AS image syntax |
| Quota | Subject to AI Gateway tenant token quota limits |
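The label limits above can be checked on the client before a query is submitted. This is a minimal sketch of such a check, not part of the product: `check_labels` is a hypothetical helper, and the thresholds are taken directly from the table (minimum 1, recommended at least 2, recommended at most 20).

```python
from typing import List


def check_labels(labels: List[str]) -> List[str]:
    """Raise if there are no labels; otherwise return a list of warnings."""
    if len(labels) < 1:
        raise ValueError("AI_CLASSIFY needs at least one label")
    warnings = []
    if len(labels) == 1:
        warnings.append("single label: that label is always returned")
    if len(labels) > 20:
        warnings.append("more than 20 labels: accuracy may drop")
    return warnings


print(check_labels(["electronics", "clothing", "food"]))  # no warnings
print(check_labels(["electronics"]))                      # single-label warning
```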
Error Handling
| Error scenario | Error message | Resolution |
|---|---|---|
| Endpoint does not exist | CZLH-67000 No available endpoints found | Check that the endpoint name is correct |
| Quota exceeded | Tenant quota exceeded | Contact your administrator to increase quota |
| Image not found | Failed to fetch image from URL | Check the Volume file path |
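Client code can turn this table into a simple lookup. The sketch below matches on fragments of the documented messages; it assumes the error text surfaced by your client contains the message as a substring, which may not hold for every driver. `suggest_resolution` is a hypothetical helper name.

```python
from typing import Optional

# (message fragment, resolution) pairs copied from the error table.
KNOWN_ERRORS = [
    ("No available endpoints found", "Check that the endpoint name is correct"),
    ("Tenant quota exceeded", "Contact your administrator to increase quota"),
    ("Failed to fetch image from URL", "Check the Volume file path"),
]


def suggest_resolution(error_message: str) -> Optional[str]:
    """Return the documented resolution for a known error, else None."""
    for fragment, resolution in KNOWN_ERRORS:
        if fragment in error_message:
            return resolution
    return None


print(suggest_resolution("CZLH-67000 No available endpoints found"))
```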
