UDF Function Development Guideline
Contents
UDF Function Development Guideline#
1. Background#
Although there are already hundreds of built-in functions, they can not satisfy the needs in some cases. In the past, this could only be done by developing new built-in functions. Built-in function development requires a relatively long cycle because it needs to recompile binary files and users have to wait for new version release. In order to help users to quickly develop computing functions that are not provided by OpenMLDB, we develop the mechanism of user dynamic registration function.
SQL functions can be categorised into single-line functions and aggregate functions. At present, only single-line UDFs are supported. An introduction to single-line functions and aggregate functions can be seen here.
2. Development Procedures#
2.1 Develop UDF functions#
2.1.1 Naming Specification of C++ Built-in Function#
The naming of C++ built-in function should follow the snake_case style.
The name should clearly express the function’s purpose.
The name of a function should not be the same as the name of a built-in function or other custom functions. The list of all built-in functions can be seen here.
2.1.2#
The types of the built-in C++ functions’ parameters should be BOOL, NUMBER, TIMESTAMP, DATE, or STRING. The SQL types corresponding to C++ types are shown as follows:
SQL Type |
C/C++ Type |
---|---|
BOOL |
|
SMALLINT |
|
INT |
|
BIGINT |
|
FLOAT |
|
DOUBLE |
|
STRING |
|
TIMESTAMP |
|
DATE |
|
2.1.3 Parameters and Return Values#
Return Value:
If the output type of the UDF is a basic type, it will be processed as a return value.
If the output type of the UDF is STRING, TIMESTAMP or DATE, it will return through the last parameter of the function.
Parameters:
If the parameter is a basic type, it will be passed by value.
If the output type of the UDF is STRING, TIMESTAMP or DATE, it will be passed by pointer.
The first parameter must be
UDFContext* ctx
. The definition of UDFContext is:
struct UDFContext {
ByteMemoryPool* pool; // Used for memory allocation.
void* ptr; // Used for the storage of temporary variables. Single-line functions don't use it at present.
};
Function Declaration:
The functions must be declared by extern “C”.
2.1.4 Memory Management#
It is not allowed to use
new
operator ormalloc
function to request new memory space in UDF functions.If you need to request additional memory space dynamically, please use the memory management interface provided by OpenMLDB. OpenMLDB will automatically free the memory space after the function is executed.
char *buffer = ctx->pool->Alloc(size);
The maximum size of the space allocated at a time cannot exceed 2M bytes.
2.1.5 Implement the Function#
The head file
udf/openmldb_udf.h
should be included.Realize the logic of the function.
#include "udf/openmldb_udf.h" // The headfile
// Realize a UDF which slices the first 2 characters of a given string.
extern "C"
void cut2(UDFContext* ctx, StringRef* input, StringRef* output) {
if (input == nullptr || output == nullptr) {
return;
}
uint32_t size = input->size_ <= 2 ? input->size_ : 2;
//To apply memory space in UDF functions, please use ctx->pool.
char *buffer = ctx->pool->Alloc(size);
memcpy(buffer, input->data_, size);
output->size_ = size;
output->data_ = buffer;
}
For more UDF implementation, see here.
2.2 Compile the Dynamic Library#
Copy the
include
directory (https://github.com/4paradigm/OpenMLDB/tree/main/include
) to a certain path (like/work/OpenMLDB/
) for later compiling.Run the compiling command.
-I
specifies the path ofinclude
directory.-o
specifies the name of the dynamic library.
g++ -shared -o libtest_udf.so examples/test_udf.cc -I /work/OpenMLDB/include -std=c++11 -fPIC
2.3 Copy the Dynamic Library#
The compiled dynamic libraries should be copied into the udf
directories for both TaskManager and tablets. Please create a new udf
directory if it does not exist.
The
udf
directory of a tablet ispath_to_tablet/udf
.The
udf
directory of TaskManager ispath_to_taskmanager/taskmanager/bin/udf
.
For example, if the deployment paths of a tablet and TaskManager are both /work/openmldb
, the structure of the directory is shown below:
/work/openmldb/
├── bin
├── conf
├── taskmanager
│ ├── bin
│ │ ├── taskmanager.sh
│ │ └── udf
│ │ └── libtest_udf.so
│ ├── conf
│ └── lib
├── tools
└── udf
└── libtest_udf.so
Note
Note that, for multiple tablets, the library needs to be copied to every one.
Moreover, dynamic libraries should not be deleted before the execution of
DROP FUNCTION
.
2.4 Register, Drop and Show the Functions#
For registering, please use CREATE FUNCTION.
CREATE FUNCTION cut2(x STRING) RETURNS STRING OPTIONS (FILE='libtest_udf.so');
Note
The types of parameters and return values must be consistent with the implementation of the code.
FILE
specifies the file name of the dynamic library. It is not necessary to include a path.A UDF function can only work on one type. Please create multiple functions for multiple types.
After successful registration, the function can be used.
SELECT cut2(c1) FROM t1;
You can view registered functions through SHOW FUNCTIONS
.
SHOW FUNCTIONS;
Please use the DROP FUNCTION
to delete a registered function.
DROP FUNCTION cut2;