Lakehouse SQL: CREATE FUNCTION Syntax and Usage

Overview

The CREATE FUNCTION statement is used to create SQL scalar functions or table functions in Lakehouse. These functions can accept a set of parameters and return a scalar value or a set of rows. This article will provide a detailed explanation of its syntax, parameters, and usage examples.

Syntax

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
    function_name ( [ function_parameter [, ...] ] )
     [ RETURNS data_type | RETURNS TABLE(column_spec)]
   AS|RETURN { expression | query };

function_parameter
    parameter_name data_type;

column_spec
    column_name data_type;

Parameter Description

  • OR REPLACE
    • If specified, it will replace an existing function with the same name and signature (number and type of parameters).
    • Cannot be used with IF NOT EXISTS.
  • IF NOT EXISTS: If specified, the function will only be created if it does not already exist.
  • function_name: The name of the function, which can be schema-qualified.
  • function_parameter
    • parameter_name: The parameter name must be unique within the function.
    • data_type: Any supported data type.
  • RETURNS data_type
    • The return data type for scalar functions. If not provided, it will be derived from the function body.
  • AS|RETURN {expression | query}
    • The body of the function. For scalar functions, it can be an expression or a query; for table functions, it must be a query.
  • RETURNS TABLE:Defines the returned table structure, requiring column names and data types to be specified.

Examples

Creating and Using SQL Scalar Functions

-- Create a function with a specified return type
CREATE FUNCTION public.area(x DOUBLE, y DOUBLE) 
RETURNS DOUBLE
RETURN x * y;

-- Use the function in a query
SELECT public.area(3, 4);
-- Output: 12.0

-- Create a function using AS to specify the logic
CREATE FUNCTION public.area2(x DOUBLE, y DOUBLE)
RETURNS double
AS x * y;

SELECT public.area2(3, 4);

Specifying Return Type

CREATE FUNCTION public.hello()
RETURNS STRING
AS 'Hello World!';
SELECT public.hello();

Using Built-in Functions in Expressions

CREATE FUNCTION public.roll_dice(num_dice INT, num_sides INT)
RETURNS INT
COMMENT 'Roll a number of n-sided dice'
RETURN (rand() * num_sides)::INT + 1;

SELECT public.roll_dice(3, 10);

Returns a table - valued type:

CREATE TABLE employee (
    id INT,
    name STRING,
    deptno INT
);
INSERT INTO employee (id, name, deptno)
VALUES
    (1, 'Alice', 10),
    (2, 'Bob', 10),
    (3, 'Charlie', 20),
    (4, 'David', 10),
    (5, 'Eve', 20);
CREATE OR REPLACE FUNCTION ga1_1.getemps(deptno INT)
    RETURNS TABLE (name STRING)
    RETURN SELECT name FROM employee e WHERE e.deptno = deptno;
SELECT * FROM ga1_1.getemps(10);

Notes

  • When using a function, you must specify the schema where the function was created; otherwise, it will result in a "function not found" error. You can avoid this error by setting the parameter cz.sql.remote.udf.lookup.policy. This parameter dynamically switches the resolution priority between UDFs and built-in functions. The default behavior requires specifying the schema prefix when using a UDF. Example:
    -- Create a function
    CREATE FUNCTION public.lower()
    RETURNS STRING
    AS 'Hello World!';
    
    -- Use the function; the schema must be specified, otherwise it will result in a "function not found" error
    SELECT public.lower();
    
    -- Policy 1: Prioritize built-in functions; the schema prefix is not required. If the name conflicts with a built-in function, the built-in function will be used.
    SET cz.sql.remote.udf.lookup.policy = builtin_first;
    SELECT lower();
    
    -- Policy 2: Prioritize UDFs (suitable for MC/Spark job scenarios). If the name conflicts with a built-in function, the UDF will be used.
    SET cz.sql.remote.udf.lookup.policy = udf_first;
    SELECT lower();