AI_EMBEDDING
Overview
Converts text into embedding vectors. An embedding vector is an abstract numerical representation of the semantic features of text, which can be used to measure the degree of semantic similarity between texts. It is suitable for downstream tasks such as semantic search, text similarity computation, cluster analysis, and recommendation systems.
Syntax
Parameter Description
Required Parameters
model
Specifies the model to use for generating embedding vectors. Two reference methods are supported:
Method 1: Call via API Gateway endpoint
Method 2: Call via API Connection object
First create a connection object using CREATE API CONNECTION, then reference it in the format <connection_name>:<model_name>.
input
The input text used to generate the embedding vector. This can be a single word, a sentence, a paragraph, or a value from a column in a data table.
Optional Parameters
model_parameters
Model hyperparameters passed in as a JSON object. Supported parameters may vary by model. text-embedding-v4 supports the following parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| input | STRING | 'document' | Specifies the purpose type of the input content. Values are 'document' (document content) or 'query' (query text). In retrieval scenarios, use 'document' when indexing documents and 'query' for user queries; the model optimizes vectors differently for each purpose, which improves retrieval accuracy. For symmetric tasks such as clustering and classification, use the default 'document'. |
| dimensions | STRING | '1024' | Specifies the output vector dimension. text-embedding-v4 supports eight output dimensions: '64', '128', '256', '512', '768', '1024', '1536', '2048'. Higher dimensions generally preserve richer semantic information but consume more storage space and computational resources. |
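As an illustration of how the model_parameters JSON object might be assembled on the client side, the Python sketch below validates the two documented text-embedding-v4 options before serializing them. The helper name build_model_parameters is hypothetical and not part of the product; only the parameter names, allowed values, and defaults come from the table above.

```python
import json

# Allowed values taken from the parameter table for text-embedding-v4.
ALLOWED_INPUT_TYPES = {"document", "query"}
ALLOWED_DIMENSIONS = {"64", "128", "256", "512", "768", "1024", "1536", "2048"}


def build_model_parameters(input_type="document", dimensions="1024"):
    """Return a JSON string for model_parameters (hypothetical helper)."""
    dimensions = str(dimensions)  # the table lists dimensions as a STRING value
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"unsupported input type: {input_type!r}")
    if dimensions not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unsupported dimensions: {dimensions!r}")
    return json.dumps({"input": input_type, "dimensions": dimensions})


# Parameters for embedding a user query at a reduced dimension.
query_params = build_model_parameters("query", 512)
print(query_params)  # {"input": "query", "dimensions": "512"}
```

Rejecting unsupported values before the call is a design choice of this sketch; how the function itself reacts to an invalid value is not described here.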
Return Value
The embedding vector derived from the input text, of type ARRAY<FLOAT>.
- Use the SIZE() function to get the vector dimension.
- Use the COSINE_SIMILARITY() function to compute the cosine similarity between two vectors.
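For readers who want to see what these two operations mean on a returned vector, here is a minimal Python sketch of the textbook cosine similarity formula (dot product divided by the product of the vector lengths). It illustrates the math only; how COSINE_SIMILARITY() is implemented inside the engine is not stated in this document, and the toy vectors below are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional vectors standing in for real embeddings.
v1 = [0.1, 0.3, -0.2, 0.5]
v2 = [0.2, 0.6, -0.4, 1.0]    # same direction as v1, twice the length
v3 = [-0.1, -0.3, 0.2, -0.5]  # opposite direction to v1

print(len(v1))                    # vector dimension, analogous to SIZE()
print(cosine_similarity(v1, v2))  # close to 1: same direction
print(cosine_similarity(v1, v3))  # close to -1: opposite direction
```

Because the result depends only on direction and not on length, scaling a vector leaves its similarity to other vectors unchanged.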
Usage Notes
- Results are deterministic: Embedding models are deterministic; the same input text always returns the same vector.
- NULL input returns NULL: When input is NULL, the function returns NULL without error.
- Distinguish input types in retrieval scenarios: Use "input": "document" when indexing documents and "input": "query" for user queries to improve retrieval accuracy.
- Filter NULLs in batch processing: In batch processing scenarios, it is recommended to filter out NULL rows in advance to avoid query failures caused by mixing NULL and non-NULL data.
Limitations
- Empty strings are not supported: When input is an empty string '', the function throws the error "input.texts should not be null" rather than returning NULL. Filter empty values before calling: WHERE input IS NOT NULL AND LENGTH(input) > 0.
- Maximum input length is 8,192 tokens: Exceeding this limit causes an error; the input is not automatically truncated. For Chinese text, this corresponds to roughly 26,000 characters at most.
- Maximum 10 items per batch: When calling in batch, a single request can process at most 10 inputs.
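The NULL, empty-string, and batch-size constraints above can be handled before any request is made. The following Python sketch, with hypothetical helper names clean_inputs and batches, drops unusable values and splits the rest into groups of at most 10. It assumes the texts are already available as a Python list and does not check the 8,192-token limit, which would require the model's tokenizer.

```python
MAX_BATCH_SIZE = 10  # documented limit: at most 10 inputs per request


def clean_inputs(texts):
    """Drop None and empty strings, mirroring
    WHERE input IS NOT NULL AND LENGTH(input) > 0."""
    return [t for t in texts if t is not None and len(t) > 0]


def batches(items, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Sample rows: two real texts, one None, one empty string, then 21 more texts.
rows = ["first doc", None, "", "second doc"] + [f"doc {i}" for i in range(21)]
usable = clean_inputs(rows)
chunks = list(batches(usable))
print(len(usable), [len(c) for c in chunks])  # 23 [10, 10, 3]
```

Filtering first and batching second keeps every batch free of values that would make the whole request fail.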
Model Specifications (text-embedding-v4)
| Attribute | Specification |
|---|---|
| Model Family | Qwen3-Embedding |
| Supported Dimensions | 64 / 128 / 256 / 512 / 768 / 1024 (default) / 1536 / 2048 |
| Maximum Input Length | 8,192 tokens |
| Supported Languages | Chinese, English, Japanese, Korean, German, French, Spanish, Portuguese, Russian, Indonesian, and 100+ other languages |
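To make the storage side of the dimension trade-off concrete, the Python sketch below estimates the raw size of the stored vectors for each supported dimension. It assumes each FLOAT element occupies 4 bytes and ignores compression, array overhead, and indexes, none of which are specified in this document, so treat the figures as rough lower bounds rather than actual table sizes.

```python
BYTES_PER_FLOAT = 4  # assumption: 32-bit floats, no compression or array overhead

# Dimensions supported by text-embedding-v4, from the specification table.
SUPPORTED_DIMENSIONS = [64, 128, 256, 512, 768, 1024, 1536, 2048]


def raw_vector_bytes(num_rows, dimensions):
    """Raw payload size of num_rows vectors with the given dimension."""
    return num_rows * dimensions * BYTES_PER_FLOAT


for dims in SUPPORTED_DIMENSIONS:
    mib = raw_vector_bytes(1_000_000, dims) / (1024 ** 2)
    print(f"{dims:>5} dims: {mib:,.0f} MiB per million rows")
```

Under these assumptions the default of 1024 dimensions needs 4,096,000,000 bytes per million rows, and halving the dimension halves that figure.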
