EXTERNAL FUNCTION
Overview
An EXTERNAL FUNCTION (also called a REMOTE FUNCTION) is a user-defined function (UDF) written in Python or Java, created in Singdata Lakehouse, and executed by a remote service. The supported remote services are Alibaba Cloud Function Compute (FC) and Tencent Cloud Function Service. During execution, the function can call:
- Online services: services exposed as APIs, such as online AI model services (for example, large language model APIs or the AI APIs offered by cloud platforms)
- Offline functions: self-contained packages that bundle the function code with its dependency libraries, models, and data files, for example an image recognition model downloaded from Hugging Face
Creating an API CONNECTION stores the access information for the external cloud function service in Singdata Lakehouse metadata. An EXTERNAL FUNCTION then calls that service over HTTP to process data and return results.
Once the user has granted authorization in advance, the Lakehouse platform automatically creates the corresponding function in the cloud function service under the customer's account whenever an external function is created. When an external function is used in a SQL query, it connects securely to the external computing service, processes the data, and returns the results to the query.
Main Process of Creating an EXTERNAL FUNCTION
For details, see Usage Process: External Function.
- Activate a cloud function service (such as Alibaba Cloud Function Compute FC) and an object storage service
- Package the function code and executables together with the dependency libraries, models, and data files, and upload the package to object storage
- Grant Singdata Lakehouse permission to operate these services and to access the function files
- Run the API CONNECTION and EXTERNAL FUNCTION DDL statements to create the UDF, then use it in queries
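The packaging step can be sketched with the Python standard library alone. This is a minimal illustration, not a Singdata tool: the directory layout and function names are hypothetical, the 500 MB threshold from Usage Restrictions is assumed here to mean 500 MiB of compressed archive, and the ABI check only inspects the `cpython-XY` tag that CPython places in extension module file names, so untagged native libraries are not verified.

```python
import os
import re
import tempfile
import zipfile

# Hypothetical reading of the documented limit: 500 MiB after compression.
MAX_PACKAGE_BYTES = 500 * 1024 * 1024

# CPython tags extension modules like "_mod.cpython-310-x86_64-linux-gnu.so".
_CPYTHON_TAG = re.compile(r"\.cpython-(\d+)")


def build_package(src_dir, zip_path):
    """Zip every file under src_dir (paths relative to it); return the archive size in bytes.

    Write zip_path outside src_dir so the archive does not include itself.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, arcname=os.path.relpath(full, src_dir))
    return os.path.getsize(zip_path)


def deployment_mode(package_bytes):
    """Code package up to the limit; container image above it."""
    return "code_package" if package_bytes <= MAX_PACKAGE_BYTES else "container_image"


def incompatible_native_libs(src_dir):
    """List .so files tagged for a CPython version other than 3.10.

    Untagged libraries (e.g. libfoo.so) and abi3 modules cannot be judged
    from the file name, so they are not flagged.
    """
    bad = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            match = _CPYTHON_TAG.search(name)
            if name.endswith(".so") and match and match.group(1) != "310":
                bad.append(os.path.join(root, name))
    return sorted(bad)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "my_udf")  # hypothetical function directory
        os.makedirs(os.path.join(src, "lib"))
        with open(os.path.join(src, "handler.py"), "w") as f:
            f.write("def evaluate(x):\n    return x\n")
        # An extension built for Python 3.9, which a 3.10 runtime will not load.
        open(os.path.join(src, "lib", "_fast.cpython-39-x86_64-linux-gnu.so"), "wb").close()
        size = build_package(src, os.path.join(tmp, "my_udf.zip"))
        print(deployment_mode(size))  # code_package
        print([os.path.basename(p) for p in incompatible_native_libs(src)])
```

The resulting archive is what would be uploaded to object storage; a package that comes out as `container_image` would instead follow the container image route described under Usage Restrictions.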
Execution Process of an EXTERNAL FUNCTION
- A user calls the external function in a Singdata Lakehouse SQL statement
- Singdata Lakehouse sends an HTTP request to the running function, using the service address and authentication information stored in the API CONNECTION
- Singdata Lakehouse receives the response and returns the result to the query
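The request/response cycle above can be simulated locally, with a toy HTTP service standing in for the cloud function. Everything about the wire format here is an assumption for illustration only: the `{"rows": ...}` / `{"results": ...}` JSON shape, the bearer-token header, the hypothetical `ApiConnection` record (standing in for the service address and credentials an API CONNECTION stores), and the choice to send rows in batches. The actual protocol is defined by Singdata Lakehouse and the development guides.

```python
import json
import threading
from dataclasses import dataclass
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener

EXPECTED_TOKEN = "demo-token"  # hypothetical credential held by the service


def evaluate(text):
    """The UDF body that runs remotely; here it just upper-cases a string."""
    return None if text is None else text.upper()


class FunctionHandler(BaseHTTPRequestHandler):
    """Plays the role of the cloud function service."""

    def do_POST(self):
        # Read the whole body first so the connection is left in a clean state.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        if self.headers.get("Authorization") != f"Bearer {EXPECTED_TOKEN}":
            self.send_response(401)
            self.end_headers()
            return
        # One result per input row, in input order.
        results = [evaluate(*row) for row in payload["rows"]]
        body = json.dumps({"results": results}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in the demo


@dataclass
class ApiConnection:
    """Hypothetical stand-in for the access info an API CONNECTION stores."""
    endpoint: str
    token: str


def call_external_function(conn, rows):
    """Engine side: POST one batch of rows, return the results in row order."""
    req = Request(
        conn.endpoint,
        data=json.dumps({"rows": rows}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {conn.token}"},
    )
    opener = build_opener(ProxyHandler({}))  # ignore proxy env vars for the local demo
    with opener.open(req, timeout=10) as resp:
        return json.loads(resp.read())["results"]


def start_local_service():
    """Serve FunctionHandler on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), FunctionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_local_service()
    conn = ApiConnection(f"http://127.0.0.1:{server.server_port}/invoke", EXPECTED_TOKEN)
    print(call_external_function(conn, [["hello"], ["lakehouse"], [None]]))
    server.shutdown()
```

Running it prints `['HELLO', 'LAKEHOUSE', None]`. A request with the wrong token is rejected with HTTP 401, which mirrors why the stored authentication information matters.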
Advantages of EXTERNAL FUNCTION
- Remote functions give SQL access to rich external data processing capabilities beyond the traditional SQL computing model. For example, they can call large language models (LLMs) or image, audio, and video processing services, extending SQL to unstructured data
- They can access external networks directly, without being constrained by the Singdata Lakehouse network
Usage Restrictions
- Only Java and Python are currently supported, with the runtimes Java 8 and Python 3.10
- Native libraries (including .so files) must be compatible with the Python 3.10 ABI
- If the program and its dependency files exceed 500 MB after compression, the function must be created from a container image. For an example, see Practice: Using Hugging Face Image Recognition Model to Process Image Data
- Supported custom function types: UDF, UDAF, UDTF
EXTERNAL FUNCTION Fees
- Remote service call fees: charged by the cloud vendor according to its function computing pricing (see the Alibaba Cloud Function Compute and Tencent Cloud pricing pages)
- Compute fees for the Singdata Lakehouse computing resources used
- Data transfer fees: any outbound data transfer over the public network is charged; intranet traffic is free
Developing UDFs with REMOTE FUNCTION
Please refer to the following development guides:
- Development Guide: External Function (Java)
- Development Guide: External Function (Python3)
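As a rough orientation before reading the guides, the three supported custom function types (UDF, UDAF, UDTF) differ in how input rows map to output. The sketch below is hypothetical: the class names and the `evaluate`, `new_buffer`/`iterate`/`merge`/`terminate`, and `process` methods only illustrate each type's contract and are not the Singdata SDK, whose real interfaces are described in the development guides.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


class CleanText:
    """UDF: one input row -> one output value."""

    def evaluate(self, s: Optional[str]) -> Optional[str]:
        return None if s is None else " ".join(s.split()).lower()


class AverageLength:
    """UDAF: many input rows -> one value, via mergeable partial states."""

    def new_buffer(self) -> List[int]:
        return [0, 0]  # [total_length, count]

    def iterate(self, buf: List[int], s: Optional[str]) -> None:
        if s is not None:  # NULLs do not count toward the average
            buf[0] += len(s)
            buf[1] += 1

    def merge(self, buf: List[int], other: List[int]) -> None:
        buf[0] += other[0]
        buf[1] += other[1]

    def terminate(self, buf: List[int]) -> Optional[float]:
        return buf[0] / buf[1] if buf[1] else None


class SplitWords:
    """UDTF: one input row -> zero or more output rows."""

    def process(self, s: Optional[str]) -> Iterator[Tuple[int, str]]:
        if s is None:
            return
        for pos, word in enumerate(s.split()):
            yield pos, word


def aggregate(udaf: AverageLength,
              partitions: Iterable[Iterable[Optional[str]]]) -> Optional[float]:
    """Simulate partial aggregation per partition followed by a final merge."""
    final = udaf.new_buffer()
    for part in partitions:
        buf = udaf.new_buffer()
        for value in part:
            udaf.iterate(buf, value)
        udaf.merge(final, buf)
    return udaf.terminate(final)


if __name__ == "__main__":
    print(CleanText().evaluate("  Hello   World "))             # hello world
    print(aggregate(AverageLength(), [["ab", None], ["abcd"]]))  # 3.0
    print(list(SplitWords().process("to be")))                   # [(0, 'to'), (1, 'be')]
```

The UDAF is written with a separate `merge` step because aggregates are commonly computed as partial states over data partitions and then combined; whether and how Singdata splits the work this way is not stated here.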