Lakehouse SQL: CREATE FUNCTION Syntax and Usage

Overview

The CREATE FUNCTION statement is used to create SQL scalar functions or table functions in Lakehouse. These functions can accept a set of parameters and return a scalar value or a set of rows. This article will provide a detailed explanation of its syntax, parameters, and usage examples.

Syntax


CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
    function_name ( [ function_parameter [, ...] ] )
     [ RETURNS data_type | RETURNS TABLE(column_spec)]
   AS|RETURN { expression | query };

function_parameter
    parameter_name data_type;

column_spec
    column_name data_type;

Parameter Description

OR REPLACE
- If specified, it will replace an existing function with the same name and signature (number and type of parameters).
- Cannot be used with IF NOT EXISTS.
IF NOT EXISTS: If specified, the function will only be created if it does not already exist.
function_name: The name of the function, which can be schema-qualified.
function_parameter
- parameter_name: The parameter name must be unique within the function.
- data_type: Any supported data type.
RETURNS data_type
- The return data type for scalar functions. If not provided, it will be derived from the function body.
AS|RETURN {expression | query}
- The body of the function. For scalar functions, it can be an expression or a query; for table functions, it must be a query.

RETURNS TABLE：Defines the returned table structure, requiring column names and data types to be specified.

Examples

Creating and Using SQL Scalar Functions


-- Create a function with a specified return type
CREATE FUNCTION public.area(x DOUBLE, y DOUBLE) 
RETURNS DOUBLE
RETURN x * y;

-- Use the function in a query
SELECT public.area(3, 4);
-- Output: 12.0

-- Create a function using AS to specify the logic
CREATE FUNCTION public.area2(x DOUBLE, y DOUBLE)
RETURNS double
AS x * y;

SELECT public.area2(3, 4);

Specifying Return Type


CREATE FUNCTION public.hello()
RETURNS STRING
AS 'Hello World!';
SELECT public.hello();

Using Built-in Functions in Expressions


CREATE FUNCTION public.roll_dice(num_dice INT, num_sides INT)
RETURNS INT
COMMENT 'Roll a number of n-sided dice'
RETURN (rand() * num_sides)::INT + 1;

SELECT public.roll_dice(3, 10);

Returns a table - valued type:


CREATE TABLE employee (
    id INT,
    name STRING,
    deptno INT
);
INSERT INTO employee (id, name, deptno)
VALUES
    (1, 'Alice', 10),
    (2, 'Bob', 10),
    (3, 'Charlie', 20),
    (4, 'David', 10),
    (5, 'Eve', 20);
CREATE OR REPLACE FUNCTION ga1_1.getemps(deptno INT)
    RETURNS TABLE (name STRING)
    RETURN SELECT name FROM employee e WHERE e.deptno = deptno;
SELECT * FROM ga1_1.getemps(10);

Notes

When using a function, you must specify the schema where the function was created; otherwise, it will result in a "function not found" error. You can avoid this error by setting the parameter cz.sql.remote.udf.lookup.policy. This parameter dynamically switches the resolution priority between UDFs and built-in functions. The default behavior requires specifying the schema prefix when using a UDF. Example:


-- Create a function
CREATE FUNCTION public.lower()
RETURNS STRING
AS 'Hello World!';

-- Use the function; the schema must be specified, otherwise it will result in a "function not found" error
SELECT public.lower();

-- Policy 1: Prioritize built-in functions; the schema prefix is not required. If the name conflicts with a built-in function, the built-in function will be used.
SET cz.sql.remote.udf.lookup.policy = builtin_first;
SELECT lower();

-- Policy 2: Prioritize UDFs (suitable for MC/Spark job scenarios). If the name conflicts with a built-in function, the UDF will be used.
SET cz.sql.remote.udf.lookup.policy = udf_first;
SELECT lower();