Built-in Functions
Contents
Built-in Functions#
Must read#
Types in documents here may a little different from real types in OpenMLDB SQL, for the purpose of simplify. Those type are synonymics.
Type literal |
Same as any of those types in OpenMLDB SQL |
---|---|
|
|
|
|
|
|
|
|
Functions#
function abs#
abs()
Description:
Return the absolute value of expr.
Parameters:
expr
Since: 0.1.0
Example:
SELECT ABS(-32);
-- output 32
Supported Types:
[
bool
][
number
]
function acos#
acos()
Description:
Return the arc cosine of expr.
Parameters:
expr
Since: 0.1.0
Example:
SELECT ACOS(1);
-- output 0
Supported Types:
[
number
]
function add#
add()
Description:
Compute sum of two arguments.
Since: 0.1.0
Example:
select add(1, 2);
-- output 3
Supported Types:
[
bool
,bool
][
bool
,number
][
bool
,timestamp
][
int16
,timestamp
][
int32
,timestamp
][
int64
,timestamp
][
number
,bool
][
number
,number
][
timestamp
,bool
][
timestamp
,int16
][
timestamp
,int32
][
timestamp
,int64
][
timestamp
,timestamp
]
function add_months#
add_months()
Description:
adds an integer months to a given date, returning the resulting date.
Parameters:
start_date Date value to add
num_months Integer value as number of months to add, can be positive or negative
Since: 0.8.0
The resulting day component will remain the same as that specified in date, unless the resulting month has fewer days than the day component of the given date, in which case the day will be the last day of the resulting month. Returns NULL if given an invalid date, or a NULL argument.
SELECT add_months('2016-08-31', 1);
-- 2016-09-30
SELECT add_months('2016-08-31', -1);
-- 2016-07-31
SELECT add_months('2012-01-31', 1);
-- 2012-02-29
Supported Types:
[
date
,int16
][
date
,int32
][
date
,int64
]
function array_contains#
array_contains()
Description:
array_contains(array, value) - Returns true if the array contains the value.
Since: 0.7.0
Example:
select array_contains([2,2], 2) as c0;
-- output true
Supported Types:
[array_bool,
bool
][array_date,
date
][array_double,
double
][array_float,
float
][array_int16,
int16
][array_int32,
int32
][array_int64,
int64
][array_string,
string
][array_timestamp,
timestamp
]
function asin#
asin()
Description:
Return the arc sine of expr.
Parameters:
expr
Since: 0.1.0
Example:
SELECT ASIN(0.0);
-- output 0.000000
Supported Types:
[
number
]
function at#
at()
Description:
alias to lag
function atan#
atan()
Description:
Return the arc tangent of expr If called with one parameter, this function returns the arc tangent of expr. If called with two parameters X and Y, this function returns the arc tangent of Y / X.
Parameters:
X
Y
Since: 0.1.0
Example:
SELECT ATAN(-0.0);
-- output -0.000000
SELECT ATAN(0, -0);
-- output 3.141593
Supported Types:
[
bool
,bool
][
bool
,number
][
number
][
number
,bool
][
number
,number
]
function atan2#
atan2()
Description:
Return the arc tangent of Y / X…
Parameters:
X
Y
Since: 0.1.0
Example:
SELECT ATAN2(0, -0);
-- output 3.141593
Supported Types:
[
bool
,bool
][
bool
,number
][
number
,bool
][
number
,number
]
function avg#
avg()
Description:
Compute average of values.
Parameters:
value Specify value column to aggregate on.
Since: 0.1.0
Example:
value |
---|
0 |
1 |
2 |
3 |
4 |
SELECT avg(value) OVER w;
-- output 2
Supported Types:
[
list<number>
]
function avg_cate#
avg_cate()
Description:
Compute average of values grouped by category key and output string. Each group is represented as ‘K:V’ and separated by comma in outputs and are sorted by key in ascend order.
Parameters:
value Specify value column to aggregate on.
catagory Specify catagory column to group by.
Example:
value |
catagory |
---|---|
0 |
x |
1 |
y |
2 |
x |
3 |
y |
4 |
x |
SELECT avg_cate(value, catagory) OVER w;
-- output "x:2,y:2"
Supported Types:
[
list<number>
,list<date>
][
list<number>
,list<int16>
][
list<number>
,list<int32>
][
list<number>
,list<int64>
][
list<number>
,list<string>
][
list<number>
,list<timestamp>
]
function avg_cate_where#
avg_cate_where()
Description:
Compute average of values matching specified condition grouped by category key and output string. Each group is represented as ‘K:V’, separated by comma, and sorted by key in ascend order.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
false |
y |
2 |
false |
x |
3 |
true |
y |
4 |
true |
x |
SELECT avg_cate_where(value, condition, category) OVER w;
-- output "x:2,y:3"
Supported Types:
[
list<number>
,list<bool>
,list<date>
][
list<number>
,list<bool>
,list<int16>
][
list<number>
,list<bool>
,list<int32>
][
list<number>
,list<bool>
,list<int64>
][
list<number>
,list<bool>
,list<string>
][
list<number>
,list<bool>
,list<timestamp>
]
function avg_where#
avg_where()
Description:
Compute average of values match specified condition.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
Since: 0.1.0
Example:
value |
---|
0 |
1 |
2 |
3 |
4 |
SELECT avg_where(value, value > 2) OVER w;
-- output 3.5
Supported Types:
[
list<number>
,list<bool>
]
function bigint#
bigint()
Description:
alias to int64
function bool#
bool()
Description:
Cast string expression to bool.
Since: 0.1.0
Example:
select bool("true");
-- output true
Supported Types:
[
string
]
function ceil#
ceil()
Description:
Return the smallest integer value not less than the expr.
Parameters:
expr
Since: 0.1.0
Example:
SELECT CEIL(1.23);
-- output 2
Supported Types:
[
bool
][
number
]
function ceiling#
ceiling()
Description:
alias to ceil
function char#
char()
Description:
Returns the ASCII character having the binary equivalent to expr. If n >= 256 the result is equivalent to char(n % 256).
Since: 0.6.0
Example:
SELECT char(65);
--output "A"
Supported Types:
[
int32
]
function char_length#
char_length()
Description:
Returns the length of the string. It is measured in characters and multibyte character string is not supported.
Since: 0.6.0
Example:
SELECT CHAR_LENGTH('Spark SQL ');
--output 10
Supported Types:
[
string
]
function character_length#
character_length()
Description:
alias to char_length
function concat#
concat()
Description:
This function returns a string resulting from the joining of two or more string values in an end-to-end manner. (To add a separating value during joining, see concat_ws.)
Since: 0.1.0
Example:
select concat("1", 2, 3, 4, 5.6, 7.8, Timestamp(1590115420000L));
-- output "12345.67.82020-05-22 10:43:40"
Supported Types:
[…]
function concat_ws#
concat_ws()
Description:
Returns a string resulting from the joining of two or more string value in an end-to-end manner. It separates those concatenated string values with the delimiter specified in the first function argument.
Since: 0.1.0
Example:
select concat_ws("-", "1", 2, 3, 4, 5.6, 7.8, Timestamp(1590115420000L));
-- output "1-2-3-4-5.6-7.8-2020-05-22 10:43:40"
Supported Types:
[
any
, …]
function cos#
cos()
Description:
Return the cosine of expr.
Parameters:
expr It is a single argument in radians.
Since: 0.1.0
Example:
SELECT COS(0);
-- output 1.000000
The value returned by cos() is always in the range: -1 to 1.
Supported Types:
[
number
]
function cot#
cot()
Description:
Return the cotangent of expr.
Parameters:
expr
Since: 0.1.0
Example:
SELECT COT(1);
-- output 0.6420926159343306
Supported Types:
[
number
]
function count#
count()
Description:
Compute number of values.
Parameters:
value Specify value column to aggregate on.
Since: 0.1.0
Example:
value |
---|
0 |
1 |
2 |
3 |
4 |
SELECT count(value) OVER w;
-- output 5
Supported Types:
[
list<any>
][
list<row>
]
function count_cate#
count_cate()
Description:
Compute count of values grouped by category key and output string. Each group is represented as ‘K:V’ and separated by comma in outputs and are sorted by key in ascend order.
Parameters:
value Specify value column to aggregate on.
catagory Specify catagory column to group by.
Example:
value |
catagory |
---|---|
0 |
x |
1 |
y |
2 |
x |
3 |
y |
4 |
x |
SELECT count_cate(value, catagory) OVER w;
-- output "x:3,y:2"
Supported Types:
[
list<any>
,list<date>
][
list<any>
,list<int16>
][
list<any>
,list<int32>
][
list<any>
,list<int64>
][
list<any>
,list<string>
][
list<any>
,list<timestamp>
]
function count_cate_where#
count_cate_where()
Description:
Compute count of values matching specified condition grouped by category key and output string. Each group is represented as ‘K:V’ and separated by comma in outputs and are sorted by key in ascend order.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
false |
y |
2 |
false |
x |
3 |
true |
y |
4 |
true |
x |
SELECT count_cate_where(value, condition, category) OVER w;
-- output "x:2,y:1"
Supported Types:
[
list<any>
,list<bool>
,list<date>
][
list<any>
,list<bool>
,list<int16>
][
list<any>
,list<bool>
,list<int32>
][
list<any>
,list<bool>
,list<int64>
][
list<any>
,list<bool>
,list<string>
][
list<any>
,list<bool>
,list<timestamp>
]
function count_where#
count_where()
Description:
Compute number of values match specified condition.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
Since: 0.1.0
Example:
value |
---|
0 |
1 |
2 |
3 |
4 |
SELECT count_where(value, value > 2) OVER w;
-- output 2
Supported Types:
[
list<any>
,list<bool>
][
list<row>
,list<bool>
]
function date#
date()
Description:
Cast timestamp or string expression to date (date >= 1900-01-01)
Since: 0.1.0
Supported string style:
yyyy-mm-dd
yyyymmdd
yyyy-mm-dd hh:mm:ss
Example:
select date(timestamp(1590115420000));
-- output 2020-05-22
select date("2020-05-22");
-- output 2020-05-22
Supported Types:
[
string
][
timestamp
]
function date_format#
date_format()
Description:
Formats the date value according to the format string.
Example:
select date_format(date(timestamp(1590115420000)),"%Y-%m-%d");
--output "2020-05-22"
Supported Types:
[
date
,string
][
timestamp
,string
]
function datediff#
datediff()
Description:
days difference from date1 to date2
Since: 0.7.0
Supported date string style:
yyyy-mm-dd
yyyymmdd
yyyy-mm-dd HH:MM:SS
yyyy-mm-ddTHH:MM:SS.fff+HH:MM (RFC3399 format)
Dates from string are transformed into the same time zone (which is currently always UTC+8) before differentiation, dates from date type by default is at UTC+8, you may see a +1/-1 difference if the two date string have different time zones.
Hint: since openmldb date type limits range from year 1900, to datadiff from/to a date before 1900, pass it as string.
Example:
select datediff("2021-05-10", "2021-05-01");
-- output 9
select datediff("2021-04-10", "2021-05-01");
-- output -21
select datediff(Date("2021-04-10"), Date("2021-05-01"));
-- output -21
Supported Types:
[
date
,date
][
date
,string
][
string
,date
][
string
,string
]
function day#
day()
Description:
alias to dayofmonth
function dayofmonth#
dayofmonth()
Description:
Return the day of the month for a timestamp or date.
Since: 0.1.0
Note: This function equals the [day()](/openmldb_sql/Files/udfs_8h.md#function-day)
function.
Example:
select dayofmonth(timestamp(1590115420000));
-- output 22
select day(timestamp(1590115420000));
-- output 22
Supported Types:
[
date
][
int64
][
timestamp
]
function dayofweek#
dayofweek()
Description:
Return the day of week for a timestamp or date.
Since: 0.4.0
Note: This function equals the [week()](/openmldb_sql/Files/udfs_8h.md#function-week)
function.
Example:
select dayofweek(timestamp(1590115420000));
-- output 6
Supported Types:
[
date
][
int64
][
timestamp
]
function dayofyear#
dayofyear()
Description:
Return the day of year for a timestamp or date. Returns 0 given an invalid date.
Since: 0.1.0
Example:
select dayofyear(timestamp(1590115420000));
-- output 143
select dayofyear(1590115420000);
-- output 143
select dayofyear(date("2020-05-22"));
-- output 143
select dayofyear(date("2020-05-32"));
-- output 0
Supported Types:
[
date
][
int64
][
timestamp
]
function degrees#
degrees()
Description:
Convert radians to degrees.
Parameters:
expr
Since: 0.5.0
Example:
SELECT degrees(3.141592653589793);
-- output 180.0
Supported Types:
[
double
]
function distinct_count#
distinct_count()
Description:
Compute number of distinct values.
Parameters:
value Specify value column to aggregate on.
Since: 0.1.0
Example:
value |
---|
0 |
0 |
2 |
2 |
4 |
SELECT distinct_count(value) OVER w;
-- output 3
Supported Types:
[
list<any>
]
function double#
double()
Description:
Cast string expression to double.
Since: 0.1.0
Example:
select double("1.23");
-- output 1.23
Supported Types:
[
string
]
function drawdown#
drawdown()
Description:
Compute drawdown of values.
Parameters:
value Specify value column to aggregate on.
Since: 0.7.2
Drawdown is defined as the max decline percentage from a historical peak to a subsequent valley. It is commonly used as an indicator of risk in quant-trading to measure the max loss.
It requires that values are ordered so that it can only be used with WINDOW (PARTITION BY xx ORDER BY xx). GROUP BY and full table aggregation are not supported.
It requires that all values are non-negative. Negative values will be ignored.
Example:
value |
---|
1 |
8 |
5 |
2 |
10 |
4 |
SELECT drawdown(value) OVER w;
-- output 0.75 (decline from 8 to 2)
Supported Types:
[
list<number>
]
function earth_distance#
earth_distance()
Description:
Returns the great circle distance between two points on the surface of the Earth. Km as return unit. add a minus (-) sign if heading west (W) or south (S).
Parameters:
ll1 First latitude in degree
ll2 First longitude in degree
rl1 second latitude in degree
rl2 Second longitude in degree
Since: 0.8.0
Example:
select earth_distance(40, 73, 41, 74)
-- output 139.7
Supported Types:
[
any
,any
,any
,any
]
function entropy#
entropy()
Description:
Calculate Shannon entropy of a column of values. Null values are skipped.
Parameters:
value Specify value column to aggregate on.
Since: 0.8.0
Example:
| col1 | | 1 | | 1 | | 2 | | 3 |
select entropy(col1) from t1
-- output 1.5
Supported Types:
[
list<any>
]
function ew_avg#
ew_avg()
Description:
Compute exponentially-weighted average of values. It’s equivalent to pandas ewm(alpha={alpha}, adjust=True, ignore_na=True, com=None, span=None, halflife=None, min_periods=0)
Parameters:
value Specify value column to aggregate on.
alpha Specify smoothing factor alpha (0 <= alpha <= 1). If NULL or 0, fall back to normal
avg
Since: 0.7.2
It requires that values are ordered so that it can only be used with WINDOW (PARTITION BY xx ORDER BY xx). Undefined behaviour if it is used with GROUP BY and full table aggregation.
Example:
value |
---|
0 |
1 |
2 |
3 |
4 |
SELECT ew_avg(value, 0.5) OVER w;
-- output 3.161290
Supported Types:
[
list<number>
,list<double>
]
function exp#
exp()
Description:
Return the value of e (the base of natural logarithms) raised to the power of expr.
Parameters:
expr
Since: 0.1.0
SELECT EXP(0);
-- output 1
Supported Types:
[
number
]
function farm_fingerprint#
farm_fingerprint()
Description:
alias to hash64
function first_value#
first_value()
Description:
Returns the value of expr from the latest row (last row) of the window frame.
Since: 0.1.0
Example:
select id, gp, ts, first_value(ts) over w as agg from t1
window w as (partition by gp order by ts rows between 3 preceding and current row);
id |
gp |
ts |
agg |
---|---|---|---|
1 |
100 |
98 |
98 |
2 |
100 |
99 |
99 |
3 |
100 |
100 |
100 |
Supported Types:
function float#
float()
Description:
Cast string expression to float.
Since: 0.1.0
Example:
select float("1.23");
-- output 1.23
Supported Types:
[
string
]
function floor#
floor()
Description:
Return the largest integer value not less than the expr.
Parameters:
expr
Since: 0.1.0
Example:
SELECT FLOOR(1.23);
-- output 1
Supported Types:
[
bool
][
number
]
function get_json_object#
get_json_object()
Description:
Extracts a JSON object from JSON Pointer
Parameters:
expr A string expression contains well formed JSON
path A string expression of JSON string representation from JSON Pointer
Since: 0.9.0
NOTE JSON string is not fully validated. Which means that the function may still returns values even though returned string does not valid for JSON. It’s your responsibility to make sure input string is valid JSON
Example:
select get_json_object('{"boo": "baz"}', "/boo")
-- baz
select get_json_object('{"boo": [1, 2]}', "/boo/0")
-- 1
select get_json_object('{"m~n": 1}', "/m~0n")
-- 1
select get_json_object('{"m/n": 1}', "/m~1n")
-- 1
select get_json_object('{"foo": {"bar": bz}}', "/foo")
-- {"bar": bz}
--- returns value even input JSON is not a valid JSON
Supported Types:
[
string
,string
]
function hash64#
hash64()
Description:
Returns a hash value of the arguments. It is not a cryptographic hash function and should not be used as such.
Since: 0.7.0
Example:
SELECT hash64(cast(90 as int));
--output -3754664774081171349
Supported Types:
[
any
]
function hex#
hex()
Description:
Convert integer to hexadecimal.
Since: 0.6.0
Example:
select hex("Spark SQL");
--output "537061726B2053514C"
Supported Types:
[
number
][
string
]
function hour#
hour()
Description:
Return the hour for a timestamp.
Since: 0.1.0
Example:
select hour(timestamp(1590115420000));
-- output 10
Supported Types:
[
int64
][
timestamp
]
function identity#
identity()
Description:
Return value.
Since: 0.1.0
Example:
select identity(1);
-- output 1
Supported Types:
[
any
]
function if_null#
if_null()
Description:
If input is not null, return input value; else return default value.
Parameters:
input Input value
default Default value if input is null
Since: 0.1.0
Example:
SELECT if_null("hello", "default"), if_null(cast(null as string), "default");
-- output ["hello", "default"]
Supported Types:
[
any
,any
]
function ifnull#
ifnull()
Description:
alias to if_null
function ilike_match#
ilike_match()
Description:
pattern match same as ILIKE predicate
Parameters:
target string to match
pattern the glob match pattern
Since: 0.4.0
Rules:
Special characters:
underscore(_): exact one character
precent(%): zero or more characters.
Escape character: backslash() is the default escape character
case insensitive
backslash: sql string literal use backslash() for escape sequences, write ‘’ as backslash itself
Return NULL if target or pattern is NULL
Example:
select ilike_match('Mike', 'Mi_k')
-- output: true
select ilike_match('Mike', 'mi_k')
-- output: true
Supported Types:
[
string
,string
][
string
,string
,string
]
function inc#
inc()
Description:
Return expression + 1.
Since: 0.1.0
Example:
select inc(1);
-- output 2
Supported Types:
[
number
]
function int#
int()
Description:
alias to int32
function int16#
int16()
Description:
Cast string expression to int16.
Parameters:
str Input string, refer int64 for supported format
Since: 0.1.0
Returns NULL if string is invalid or represented number out of range of output type.
Example:
select int16("123");
-- output 123
Supported Types:
[
string
]
function int32#
int32()
Description:
Cast string expression to int32.
Parameters:
str Input string, refer int64 for supported format
Since: 0.1.0
Returns NULL if string is invalid or represented number out of range of output type.
Example:
select int32("12345");
-- output 12345
Supported Types:
[
string
]
function int64#
int64()
Description:
Cast string expression to int64.
Parameters:
str Input string
Since: 0.1.0
Returns NULL if string is invalid or represented number out of range of output type.
Valid string input can be represented as the regexp
\s*(\+|-)?(0[xX])?[0-9a-fA-F]+\s*
string is parsed unsigned, use minus(
-
) or plus(+
) indicate signsdefault base of interpreted integer value is 10, and 16 if string starts with
0x
or0X
Example:
select int64("1590115420000");
-- output 1590115420000
Supported Types:
[
string
]
function is_null#
is_null()
Description:
Check if input value is null, return bool.
Parameters:
input Input value
Since: 0.1.0
Supported Types:
[
any
]
function isnull#
isnull()
Description:
alias to is_null
function join#
join()
Description:
For each string value from specified column of window, join by delimeter. Null values are skipped.
Parameters:
input List of string to join
delimeter Join delimeter
Since: 0.6.5
Example:
select `join`(split("k1:v1,k2:v2", ","), " ");
-- "k1:v1 k2:v2"
Supported Types:
[
list<string>
,string
]
function json_array_length#
json_array_length()
Description:
Returns the number of elements in the outermost JSON array.
Parameters:
jsonArray JSON arry in string
Since: 0.9.0
Null returned if input is not valid JSON array string.
Example:
select json_array_length('[1, 2]')
-- 2
SELECT json_array_length('[1,2,3,{"f1":1,"f2":[5,6]},4]');
-- 5
select json_array_length('[1, 2')
-- NULL
Supported Types:
[
string
]
function lag#
lag()
Description:
Returns value evaluated at the row that is offset rows before the current row within the partition. Offset is evaluated with respect to the current row.
Parameters:
offset The number of rows forwarded from the current row, must not negative
Note: This function equals the [at()](/openmldb_sql/Files/udfs_8h.md#function-at)
function.
The offset in window is nth_value()
, not [lag()](/openmldb_sql/Files/udfs_8h.md#function-lag)/at()
. The old [at()](/openmldb_sql/Files/udfs_8h.md#function-at)
(version < 0.5.0) is start from the last row of window(may not be the current row), it’s more like nth_value()
Example:
c1 |
c2 |
---|---|
0 |
1 |
1 |
1 |
2 |
2 |
3 |
2 |
4 |
2 |
SELECT lag(c1, 1) over w as co from t1 window w as(partition by c2 order by c1 rows between unbounded preceding and current row);
-- output
-- | co |
-- |----|
-- |NULL|
-- |0 |
-- |NULL|
-- |2 |
-- |3 |
SELECT at(c1, 1) over w as co from t1 window w as(partition by c2 order by c1 rows between unbounded preceding and current row);
-- output
-- | co |
-- |----|
-- |NULL|
-- |0 |
-- |NULL|
-- |2 |
-- |3 |
Supported Types:
[
list<any>
,int64
]
function last_day#
last_day()
Description:
Return the last day of the month to which the date belongs to.
Since: 0.6.1
Example:
select last_day(timestamp("2020-05-22 10:43:40"));
-- output 2020-05-31
select last_day(timestamp("2020-02-12 10:43:40"));
-- output 2020-02-29
select last_day(timestamp("2021-02-12"));
-- output 2021-02-28
Supported Types:
[
date
][
int64
][
timestamp
]
function lcase#
lcase()
Description:
Convert all the characters to lowercase. Note that characters with values > 127 are simply returned.
Since: 0.5.0
Example:
SELECT LCASE('SQl') as str1;
--output "sql"
Supported Types:
[
string
]
function like_match#
like_match()
Description:
pattern match same as LIKE predicate
Parameters:
target string to match
pattern the glob match pattern
Since: 0.4.0
Rules:
Special characters:
underscore(_): exact one character
precent(%): zero or more characters.
Escape character is backslash() by default
case sensitive
backslash: sql string literal use backslash() for escape sequences, write ‘’ as backslash itself
if one or more of target, pattern then the result is null
Example:
select like_match('Mike', 'Mi_k')
-- output: true
select like_match('Mike', 'mi_k')
-- output: false
Supported Types:
[
string
,string
][
string
,string
,string
]
function list_except_by_key#
list_except_by_key()
Description:
Return list of elements in list1 but keys not in except_str.
Parameters:
list1 List of string, with each element as the format of
key:vaule
.except_str String joined list, as
key1,key2
, split by comma(,)
Since: 0.8.1
Example:
select `join`(list_except_by_key(split("a:1,b:2,c:0", ","), "a,c"), " ");
-- output b:2
Supported Types:
[
list<string>
,string
]
function list_except_by_value#
list_except_by_value()
Description:
Return list of elements in list1 but values not in except_str.
Parameters:
list1 List of string, with each element as the format of
key:vaule
.except_str String joined list, as
value1,value2
, split by comma(,). Empty string filters list whose value is empty
Since: 0.8.1
Example:
select `join`(list_except_by_value(split("a:1,b:2,c:0", ","), "0,1"), " ");
-- output b:2
Supported Types:
[
list<string>
,string
]
function ln#
ln()
Description:
Return the natural logarithm of expr.
Parameters:
expr
Since: 0.1.0
Example:
SELECT LN(1);
-- output 0.000000
Supported Types:
[
bool
][
number
]
function log#
log()
Description:
log(base, expr) If called with one parameter, this function returns the natural logarithm of expr. If called with two parameters, this function returns the logarithm of expr to the base.
Parameters:
base
expr
Since: 0.1.0
Example:
SELECT LOG(1);
-- output 0.000000
SELECT LOG(10,100);
-- output 2
Supported Types:
[
bool
][
bool
,any
][
number
][
number
,any
]
function log10#
log10()
Description:
Return the base-10 logarithm of expr.
Parameters:
expr
Since: 0.1.0
Example:
SELECT LOG10(100);
-- output 2
Supported Types:
[
bool
][
number
]
function log2#
log2()
Description:
Return the base-2 logarithm of expr.
Parameters:
expr
Since: 0.1.0
Example:
SELECT LOG2(65536);
-- output 16
Supported Types:
[
bool
][
number
]
function lower#
lower()
Description:
alias to lcase
function make_tuple#
make_tuple()
Description:
Supported Types:
[…]
function max#
max()
Description:
Compute maximum of values.
Parameters:
value Specify value column to aggregate on.
Since: 0.1.0
Example:
value |
---|
0 |
1 |
2 |
3 |
4 |
SELECT max(value) OVER w;
-- output 4
Supported Types:
[
list<date>
][
list<number>
][
list<string>
][
list<timestamp>
]
function max_cate#
max_cate()
Description:
Compute maximum of values grouped by category key and output string. Each group is represented as ‘K:V’ and separated by comma in outputs and are sorted by key in ascend order.
Parameters:
value Specify value column to aggregate on.
catagory Specify catagory column to group by.
Example:
value |
catagory |
---|---|
0 |
x |
1 |
y |
2 |
x |
3 |
y |
4 |
x |
SELECT max_cate(value, catagory) OVER w;
-- output "x:4,y:3"
Supported Types:
[
list<number>
,list<date>
][
list<number>
,list<int16>
][
list<number>
,list<int32>
][
list<number>
,list<int64>
][
list<number>
,list<string>
][
list<number>
,list<timestamp>
]
function max_cate_where#
max_cate_where()
Description:
Compute maximum of values matching specified condition grouped by category key and output string. Each group is represented as ‘K:V’ and separated by comma in outputs and are sorted by key in ascend order.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
false |
y |
2 |
false |
x |
3 |
true |
y |
4 |
true |
x |
SELECT max_cate_where(value, condition, category) OVER w;
-- output "x:4,y:3"
Supported Types:
[
list<number>
,list<bool>
,list<date>
][
list<number>
,list<bool>
,list<int16>
][
list<number>
,list<bool>
,list<int32>
][
list<number>
,list<bool>
,list<int64>
][
list<number>
,list<bool>
,list<string>
][
list<number>
,list<bool>
,list<timestamp>
]
function max_where#
max_where()
Description:
Compute maximum of values match specified condition.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
Since: 0.1.0
Example:
value |
---|
0 |
1 |
2 |
3 |
4 |
SELECT max_where(value, value <= 2) OVER w;
-- output 2
Supported Types:
[
list<number>
,list<bool>
]
function maximum#
maximum()
Description:
Compute maximum of two arguments.
Since: 0.1.0
Supported Types:
[
any
,string
][
bool
,bool
][
bool
,number
][
date
,date
][
number
,bool
][
number
,number
][
string
,bool
][
string
,date
][
string
,number
][
string
,timestamp
][
timestamp
,timestamp
]
function median#
median()
Description:
Compute the median of values.
Parameters:
value Specify value column to aggregate on.
Since: 0.6.0
Example:
value |
---|
1 |
2 |
3 |
4 |
SELECT median(value) OVER w;
-- output 2.5
Supported Types:
[
list<number>
]
function min#
min()
Description:
Compute minimum of values.
Parameters:
value Specify value column to aggregate on.
Since: 0.1.0
Example:
value |
---|
0 |
1 |
2 |
3 |
4 |
SELECT min(value) OVER w;
-- output 0
Supported Types:
[
list<date>
][
list<number>
][
list<string>
][
list<timestamp>
]
function min_cate#
min_cate()
Description:
Compute minimum of values grouped by category key and output string. Each group is represented as ‘K:V’ and separated by comma in outputs and are sorted by key in ascend order.
Parameters:
value Specify value column to aggregate on.
catagory Specify catagory column to group by.
Example:
value |
catagory |
---|---|
0 |
x |
1 |
y |
2 |
x |
3 |
y |
4 |
x |
SELECT min_cate(value, catagory) OVER w;
-- output "x:0,y:1"
Supported Types:
[
list<number>
,list<date>
][
list<number>
,list<int16>
][
list<number>
,list<int32>
][
list<number>
,list<int64>
][
list<number>
,list<string>
][
list<number>
,list<timestamp>
]
function min_cate_where#
min_cate_where()
Description:
Compute minimum of values matching specified condition grouped by category key and output string. Each group is represented as ‘K:V’ and separated by comma in outputs and are sorted by key in ascend order.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
false |
y |
2 |
false |
x |
1 |
true |
y |
4 |
true |
x |
3 |
true |
y |
SELECT min_cate_where(value, condition, category) OVER w;
-- output "x:0,y:1"
Supported Types:
[
list<number>
,list<bool>
,list<date>
][
list<number>
,list<bool>
,list<int16>
][
list<number>
,list<bool>
,list<int32>
][
list<number>
,list<bool>
,list<int64>
][
list<number>
,list<bool>
,list<string>
][
list<number>
,list<bool>
,list<timestamp>
]
function min_where#
min_where()
Description:
Compute minimum of values match specified condition.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
Since: 0.1.0
Example:
value |
---|
0 |
1 |
2 |
3 |
4 |
SELECT min_where(value, value > 2) OVER w;
-- output 3
Supported Types:
[
list<number>
,list<bool>
]
function minimum#
minimum()
Description:
Compute minimum of two arguments.
Since: 0.1.0
Supported Types:
[
any
,string
][
bool
,bool
][
bool
,number
][
date
,date
][
number
,bool
][
number
,number
][
string
,bool
][
string
,date
][
string
,number
][
string
,timestamp
][
timestamp
,timestamp
]
function minute#
minute()
Description:
Return the minute for a timestamp.
Since: 0.1.0
Example:
select minute(timestamp(1590115420000));
-- output 43
Supported Types:
[
int64
][
timestamp
]
function month#
month()
Description:
Return the month part of a timestamp or date.
Since: 0.1.0
Example:
select month(timestamp(1590115420000));
-- output 5
Supported Types:
[
date
][
int64
][
timestamp
]
function nth_value_where#
nth_value_where()
Description:
Returns the value of expr from the idx th row matches the condition.
Parameters:
value Expr of the matched row
idx Idx th matched row (start from 1 or -1). If positive, count from first row of window; if negative, count from last row of window; 0 is invalid, results NULL.
cond Match expression of the row.
Since: 0.8.0
Example:
select col1, cond, gp, nth_value_where(col1, 2, cond) over (partition by gp order by col1 rows between 10 preceding and current row) as agg from t1;
col1 |
cond |
gp |
agg |
---|---|---|---|
1 |
true |
100 |
NULL |
2 |
false |
100 |
NULL |
3 |
NULL |
100 |
NULL |
4 |
true |
100 |
4 |
Supported Types:
[
list<any>
,list<int16>
,list<bool>
][
list<any>
,list<int32>
,list<bool>
][
list<any>
,list<int64>
,list<bool>
]
function nvl#
nvl()
Description:
alias to if_null
function nvl2#
nvl2()
Description:
nvl2(expr1, expr2, expr3) - Returns expr2 if expr1 is not null, or expr3 otherwise.
Parameters:
expr1 Condition expression
expr2 Return value if expr1 is not null
expr3 Return value if expr1 is null
Since: 0.2.3
Example:
SELECT nvl2(NULL, 2, 1);
-- output 1
Supported Types:
[
any
,any
,any
]
function pmod#
pmod()
Description:
Compute pmod of two arguments. If any param is NULL, output NULL. If divisor is 0, output NULL.
Parameters:
dividend any numeric number or NULL
divisor any numeric number or NULL
Since: 0.7.0
Example:
select pmod(-10, 3);
-- output 2
select pmod(10, -3);
-- output 1
select pmod(10, 3);
-- output 1
select pmod(-10, 0);
-- output NULL
select pmod(-10, NULL);
-- output NULL
select pmod(NULL, 2);
-- output NULL
Supported Types:
[
bool
,bool
][
bool
,number
][
number
,bool
][
number
,number
]
function pow#
pow()
Description:
Return the value of expr1 to the power of expr2.
Parameters:
expr1
expr2
Since: 0.1.0
Example:
SELECT POW(2, 10);
-- output 1024.000000
Supported Types:
[
bool
,bool
][
bool
,number
][
number
,bool
][
number
,number
]
function power#
power()
Description:
alias to pow
function radians#
radians()
Description:
Returns the argument X, converted from degrees to radians. (Note that π radians equals 180 degrees.)
Since: 0.6.0
Example:
SELECT RADIANS(90.0);
--output 1.570796326794896619231
Supported Types:
[
number
]
function regexp_like#
regexp_like()
Description:
pattern match same as RLIKE predicate (based on RE2)
Parameters:
target string to match
pattern the regular expression match pattern
Since: 0.6.1
Rules:
Accept standard POSIX (egrep) syntax regular expressions
dot (.) : matches any single-width ASCII character in an expression, with the exception of line break characters.
asterisk (*) : matches the preceding token zero or more times.
plus sign (+) : matches the preceding token one or more times.
question mark (?) : identifies the preceding character as being optional.
vertical bar (|) : separates tokens, one of which must be matched, much like a logical OR statement.
parenthesis (‘(’ and ‘)’) : groups multiple tokens together to disambiguate or simplify references to them.
open square bracket ([) and close square bracket (]) : enclose specific characters or a range of characters to be matched. The characters enclosed inside square brackets are known as a character class.
caret (^) : the caret has two different meanings in a regular expression, depending on where it appears: As the first character in a character class, a caret negates the characters in that character class. As the first character in a regular expression, a caret identifies the beginning of a term. In this context, the caret is often referred to as an anchor character.
dollar sign ($) : as the last character in a regular expression, a dollar sign identifies the end of a term. In this context, the dollar sign is often referred to as an anchor character.
backslash () : used to invoke the actual character value for a metacharacter in a regular expression.
case sensitive
backslash: sql string literal use backslash() for escape sequences, write ‘’ as backslash itself
Return NULL if target or pattern is NULL
Example:
select regexp_like('Mike', 'Mi.k')
-- output: true
select regexp_like('append', 'ap*end')
-- output: true
Supported Types:
[
string
,string
][
string
,string
,string
]
function replace#
replace()
Description:
replace(str, search[, replace]) - Replaces all occurrences of search
with replace
Since: 0.5.2
if replace is not given or is empty string, matched search
s removed from final string
Example:
select replace("ABCabc", "abc")
-- output "ABC"
Supported Types:
[
string
,string
][
string
,string
,string
]
function reverse#
reverse()
Description:
Returns the reversed given string.
Since: 0.4.0
Example:
SELECT REVERSE('abc') as str1;
--output "cba"
Supported Types:
[
string
]
function round#
round()
Description:
Returns expr rounded to d decimal places using HALF_UP rounding mode.
Parameters:
numeric_expr Expression evaluated to numeric
d Integer decimal place, if omitted, default to 0
Since: 0.1.0
When d
is a positive, numeric_expr
is rounded to the number of decimal positions specified by d
. When d
is a negative , numeric_expr
is rounded on the left side of the decimal point. Return type is the same as the type first parameter.
Example:
SELECT round(1.23);
-- 1 (double type)
SELECT round(1.23, 1)
-- 1.2 (double type)
SELECT round(123, -1)
-- 120 (int32 type)
Supported Types:
[
number
, …][
number
,int32
]
function second#
second()
Description:
Return the second for a timestamp.
Since: 0.1.0
Example:
select second(timestamp(1590115420000));
-- output 40
Supported Types:
[
int64
][
timestamp
]
function sin#
sin()
Description:
Return the sine of expr.
Parameters:
expr It is a single argument in radians.
Since: 0.1.0
Example:
SELECT SIN(0);
-- output 0.000000
The value returned by sin() is always in the range: -1 to 1.
Supported Types:
[
number
]
function size#
size()
Description:
Get the size of a List (e.g., result of split)
Since: 0.7.0
Example:
select size(split("a b c", " "));
-- output 3
Supported Types:
[
list<string>
]
function smallint#
smallint()
Description:
alias to int16
function split#
split()
Description:
Split string to list by delimeter. Null values are skipped.
Parameters:
input Input string
delimeter Delimeter of string
Since: 0.6.5
Example:
select `join`(split("k1:1,k2:2", ","), " ") as out;
-- output "k1:1 k2:2"
Supported Types:
[
string
,string
]
function split_array#
split_array()
Description:
Split string to array of string by delimeter.
Since: 0.7.0
select array_contains(split_array("2,1", ","), "1") as c0;
-- output true
Supported Types:
[
string
,string
]
function split_by_key#
split_by_key()
Description:
Split string by delimeter and split each segment as kv pair, then add each key to output list. Null or illegal segments are skipped.
Parameters:
input Input string
delimeter Delimeter of string
kv_delimeter Delimeter of kv pair
Since: 0.6.5
Example:
select `join`(split_by_key("k1:1, k2:2", ",", ":"), " ") as out;
-- output "k1 k2"
Supported Types:
[
string
,string
,string
]
function split_by_value#
split_by_value()
Description:
Split string by delimeter and split each segment as kv pair, then add each value to output list. Null or illegal segments are skipped.
Parameters:
input Input string
delimeter Delimeter of string
kv_delimeter Delimeter of kv pair
Since: 0.6.5
Example:
select `join`(split_by_value("k1:1, k2:2", ",", ":"), " ") as out;
-- output "1 2"
Supported Types:
[
string
,string
,string
]
function sqrt#
sqrt()
Description:
Return square root of expr.
Parameters:
expr It is a single argument in radians.
Since: 0.1.0
Example:
SELECT SQRT(100);
-- output 10.000000
Supported Types:
[
number
]
function std#
std()
Description:
alias to stddev
function stddev#
stddev()
Description:
Compute sample standard deviation of values, i.e., sqrt( sum((x_i - avg)^2) / (n-1) )
Parameters:
value Specify value column to aggregate on.
Since: 0.7.2
Alias function: std
, stddev_samp
Example:
value |
---|
1 |
2 |
3 |
4 |
SELECT stddev(value) OVER w;
-- output 1.290994
Supported Types:
[
list<number>
]
function stddev_pop#
stddev_pop()
Description:
Compute population standard deviation of values, i.e., sqrt( sum((x_i - avg)^2) / n )
Parameters:
value Specify value column to aggregate on.
Since: 0.7.2
Example:
value |
---|
1 |
2 |
3 |
4 |
SELECT stddev_pop(value) OVER w;
-- output 1.118034
Supported Types:
[
list<number>
]
function stddev_samp#
stddev_samp()
Description:
alias to stddev
function strcmp#
strcmp()
Description:
Returns 0 if the strings are the same, -1 if the first argument is smaller than the second according to the current sort order, and 1 otherwise.
Since: 0.1.0
Example:
select strcmp("text", "text1");
-- output -1
select strcmp("text1", "text");
-- output 1
select strcmp("text", "text");
-- output 0
Supported Types:
[
string
,string
]
function string#
string()
Description:
Return string converted from timestamp expression.
Since: 0.1.0
Example:
select string(timestamp(1590115420000));
-- output "2020-05-22 10:43:40"
Supported Types:
[
bool
][
date
][
number
][
timestamp
]
function substr#
substr()
Description:
alias to substring
function substring#
substring()
Description:
Return a substring len
characters long from string str, starting at position pos
. Alias function: substr
Parameters:
str
pos define the begining of the substring.
len length of substring. If len is less than 1, the result is the empty string.
Since: 0.1.0
Example:
select substr("hello world", 3, 6);
-- output "llo wo"
If
pos
is positive, the begining of the substring ispos
charactors from the start of string.If
pos
is negative, the beginning of the substring ispos
characters from the end of the string, rather than the beginning.
Supported Types:
[
string
,int32
][
string
,int32
,int32
]
function sum#
sum()
Description:
Compute sum of values.
Parameters:
value Specify value column to aggregate on.
Example:
value |
---|
0 |
1 |
2 |
3 |
4 |
SELECT sum(value) OVER w;
-- output 10
Supported Types:
[
list<number>
][
list<timestamp>
]
function sum_cate#
sum_cate()
Description:
Compute sum of values grouped by category key and output string. Each group is represented as ‘K:V’ and separated by comma in outputs and are sorted by key in ascend order.
Parameters:
value Specify value column to aggregate on.
catagory Specify catagory column to group by.
Example:
value |
catagory |
---|---|
0 |
x |
1 |
y |
2 |
x |
3 |
y |
4 |
x |
SELECT sum_cate(value, catagory) OVER w;
-- output "x:6,y:4"
Supported Types:
[
list<number>
,list<date>
][
list<number>
,list<int16>
][
list<number>
,list<int32>
][
list<number>
,list<int64>
][
list<number>
,list<string>
][
list<number>
,list<timestamp>
]
function sum_cate_where#
sum_cate_where()
Description:
Compute sum of values matching specified condition grouped by category key and output string. Each group is represented as ‘K:V’ and separated by comma in outputs and are sorted by key in ascend order.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
false |
y |
2 |
false |
x |
3 |
true |
y |
4 |
true |
x |
SELECT sum_cate_where(value, condition, category) OVER w;
-- output "x:4,y:3"
Supported Types:
[
list<number>
,list<bool>
,list<date>
][
list<number>
,list<bool>
,list<int16>
][
list<number>
,list<bool>
,list<int32>
][
list<number>
,list<bool>
,list<int64>
][
list<number>
,list<bool>
,list<string>
][
list<number>
,list<bool>
,list<timestamp>
]
function sum_where#
sum_where()
Description:
Compute sum of values match specified condition.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
Since: 0.1.0
Example:
value |
---|
0 |
1 |
2 |
3 |
4 |
SELECT sum_where(value, value > 2) OVER w;
-- output 7
Supported Types:
[
list<number>
,list<bool>
]
function tan#
tan()
Description:
Return the tangent of expr.
Parameters:
expr It is a single argument in radians.
Since: 0.1.0
Example:
SELECT TAN(0);
-- output 0.000000
Supported Types:
[
number
]
function timestamp#
timestamp()
Description:
Cast int64, date or string expression to timestamp.
Since: 0.1.0
Supported string style:
yyyy-mm-dd
yyyymmdd
yyyy-mm-dd hh:mm:ss
Example:
select timestamp(1590115420000);
-- output 1590115420000
select timestamp("2020-05-22");
-- output 1590076800000
select timestamp("2020-05-22 10:43:40");
-- output 1590115420000
Supported Types:
[
date
][
string
]
function top#
top()
Description:
Compute top k of values and output string separated by comma. The outputs are sorted in desc order.
Parameters:
value Specify value column to aggregate on.
k Fetch top n keys.
Since: 0.1.0
Example:
value |
---|
1 |
2 |
3 |
4 |
4 |
SELECT top(value, 3) OVER w;
-- output "4,4,3"
Supported Types:
[
list<date>
,list<int32>
][
list<date>
,list<int64>
][
list<number>
,list<int32>
][
list<number>
,list<int64>
][
list<string>
,list<int32>
][
list<string>
,list<int64>
][
list<timestamp>
,list<int32>
][
list<timestamp>
,list<int64>
]
function top1_ratio#
top1_ratio()
Description:
Compute the top1 occurring value’s ratio.
Parameters:
col Expr to the key
Since: 0.6.5
Calculate the most frequently occurring value from the list, and output ratio as count_of_mode / count_of_all
. NULL values are ignored. 0 returned if input list do not has non-null value.
Example:
SELECT key, top1_ratio(key) over () as ratio FROM t1;
key |
ratio |
---|---|
1 |
1.0 |
2 |
0.5 |
NULL |
0.5 |
Supported Types:
[
list<date>
][
list<number>
][
list<string>
][
list<timestamp>
]
function top_n_key_avg_cate_where#
top_n_key_avg_cate_where()
Description:
Compute average of values matching specified condition grouped by category key. Output string for top N category keys in descend order. Each group is represented as ‘K:V’ and separated by comma(,). Empty string returned if no rows selected.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
n Fetch top n keys.
Since: 0.1.0
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
false |
y |
2 |
false |
x |
3 |
true |
y |
4 |
true |
x |
5 |
true |
z |
6 |
false |
z |
SELECT top_n_key_avg_cate_where(value, condition, catagory, 2)
OVER w;
-- output "z:5,y:3"
Supported Types:
[
list<number>
,list<bool>
,list<date>
,list<int32>
][
list<number>
,list<bool>
,list<date>
,list<int64>
][
list<number>
,list<bool>
,list<int16>
,list<int32>
][
list<number>
,list<bool>
,list<int16>
,list<int64>
][
list<number>
,list<bool>
,list<int32>
,list<int32>
][
list<number>
,list<bool>
,list<int32>
,list<int64>
][
list<number>
,list<bool>
,list<int64>
,list<int32>
][
list<number>
,list<bool>
,list<int64>
,list<int64>
][
list<number>
,list<bool>
,list<string>
,list<int32>
][
list<number>
,list<bool>
,list<string>
,list<int64>
][
list<number>
,list<bool>
,list<timestamp>
,list<int32>
][
list<number>
,list<bool>
,list<timestamp>
,list<int64>
]
function top_n_key_count_cate_where#
top_n_key_count_cate_where()
Description:
Compute count of values matching specified condition grouped by category key. Output string for top N category keys in descend order. Each group is represented as ‘K:V’ and separated by comma(,). Empty string returned if no rows selected.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
n Fetch top n keys.
Since: 0.1.0
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
true |
y |
2 |
false |
x |
3 |
true |
y |
4 |
false |
x |
5 |
true |
z |
6 |
true |
z |
SELECT top_n_key_count_cate_where(value, condition, catagory, 2)
OVER w;
-- output "z:2,y:2"
Supported Types:
[
list<any>
,list<bool>
,list<date>
,list<int32>
][
list<any>
,list<bool>
,list<date>
,list<int64>
][
list<any>
,list<bool>
,list<int16>
,list<int32>
][
list<any>
,list<bool>
,list<int16>
,list<int64>
][
list<any>
,list<bool>
,list<int32>
,list<int32>
][
list<any>
,list<bool>
,list<int32>
,list<int64>
][
list<any>
,list<bool>
,list<int64>
,list<int32>
][
list<any>
,list<bool>
,list<int64>
,list<int64>
][
list<any>
,list<bool>
,list<string>
,list<int32>
][
list<any>
,list<bool>
,list<string>
,list<int64>
][
list<any>
,list<bool>
,list<timestamp>
,list<int32>
][
list<any>
,list<bool>
,list<timestamp>
,list<int64>
]
function top_n_key_max_cate_where#
top_n_key_max_cate_where()
Description:
Compute maximum of values matching specified condition grouped by category key. Output string for top N category keys in descend order. Each group is represented as ‘K:V’ and separated by comma(,). Empty string returned if no rows selected.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
n Fetch top n keys.
Since: 0.1.0
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
false |
y |
2 |
false |
x |
3 |
true |
y |
4 |
true |
x |
5 |
true |
z |
6 |
false |
z |
SELECT top_n_key_max_cate_where(value, condition, catagory, 2)
OVER w;
-- output "z:5,y:3"
Supported Types:
[
list<number>
,list<bool>
,list<date>
,list<int32>
][
list<number>
,list<bool>
,list<date>
,list<int64>
][
list<number>
,list<bool>
,list<int16>
,list<int32>
][
list<number>
,list<bool>
,list<int16>
,list<int64>
][
list<number>
,list<bool>
,list<int32>
,list<int32>
][
list<number>
,list<bool>
,list<int32>
,list<int64>
][
list<number>
,list<bool>
,list<int64>
,list<int32>
][
list<number>
,list<bool>
,list<int64>
,list<int64>
][
list<number>
,list<bool>
,list<string>
,list<int32>
][
list<number>
,list<bool>
,list<string>
,list<int64>
][
list<number>
,list<bool>
,list<timestamp>
,list<int32>
][
list<number>
,list<bool>
,list<timestamp>
,list<int64>
]
function top_n_key_min_cate_where#
top_n_key_min_cate_where()
Description:
Compute minimum of values matching specified condition grouped by category key. Output string for top N category keys in descend order. Each group is represented as ‘K:V’ and separated by comma(,). Empty string returned if no rows selected.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
n Fetch top n keys.
Since: 0.1.0
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
true |
y |
2 |
false |
x |
3 |
true |
y |
4 |
false |
x |
5 |
true |
z |
6 |
true |
z |
SELECT top_n_key_min_cate_where(value, condition, catagory, 2)
OVER w;
-- output "z:5,y:1"
Supported Types:
[
list<number>
,list<bool>
,list<date>
,list<int32>
][
list<number>
,list<bool>
,list<date>
,list<int64>
][
list<number>
,list<bool>
,list<int16>
,list<int32>
][
list<number>
,list<bool>
,list<int16>
,list<int64>
][
list<number>
,list<bool>
,list<int32>
,list<int32>
][
list<number>
,list<bool>
,list<int32>
,list<int64>
][
list<number>
,list<bool>
,list<int64>
,list<int32>
][
list<number>
,list<bool>
,list<int64>
,list<int64>
][
list<number>
,list<bool>
,list<string>
,list<int32>
][
list<number>
,list<bool>
,list<string>
,list<int64>
][
list<number>
,list<bool>
,list<timestamp>
,list<int32>
][
list<number>
,list<bool>
,list<timestamp>
,list<int64>
]
function top_n_key_ratio_cate#
top_n_key_ratio_cate()
Description:
Ratios (cond match cnt / total cnt) for groups.
Parameters:
value Specify value column to aggregate on.
condition Ratio filter condition .
catagory Specify catagory column to group by.
n Top N.
Since: 0.8.1
For each group, ratio value is value
expr count matches condtion divide total rows count. NULL groups or NULL values are never take into count. Output string for top N category keys in descend order. Each group is represented as ‘K:V’ and separated by comma(,). Empty string returned if no rows selected.
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
2 |
true |
x |
4 |
true |
x |
1 |
true |
y |
3 |
false |
y |
5 |
true |
z |
6 |
true |
z |
SELECT top_n_key_ratio_cate_where(value, condition, catagory, 2) from t;
-- output "z:1.000000,y:0.500000"
Supported Types:
[
list<any>
,list<bool>
,list<date>
,list<int32>
][
list<any>
,list<bool>
,list<date>
,list<int64>
][
list<any>
,list<bool>
,list<int16>
,list<int32>
][
list<any>
,list<bool>
,list<int16>
,list<int64>
][
list<any>
,list<bool>
,list<int32>
,list<int32>
][
list<any>
,list<bool>
,list<int32>
,list<int64>
][
list<any>
,list<bool>
,list<int64>
,list<int32>
][
list<any>
,list<bool>
,list<int64>
,list<int64>
][
list<any>
,list<bool>
,list<string>
,list<int32>
][
list<any>
,list<bool>
,list<string>
,list<int64>
][
list<any>
,list<bool>
,list<timestamp>
,list<int32>
][
list<any>
,list<bool>
,list<timestamp>
,list<int64>
]
function top_n_key_sum_cate_where#
top_n_key_sum_cate_where()
Description:
Compute sum of values matching specified condition grouped by category key. Output string for top N category keys in descend order. Each group is represented as ‘K:V’ and separated by comma(,). Empty string returned if no rows selected.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
n Fetch top n keys.
Since: 0.1.0
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
true |
y |
2 |
false |
x |
3 |
true |
y |
4 |
false |
x |
5 |
true |
z |
6 |
true |
z |
SELECT top_n_key_sum_cate_where(value, condition, catagory, 2)
OVER w;
-- output "z:11,y:4"
Supported Types:
[
list<number>
,list<bool>
,list<date>
,list<int32>
][
list<number>
,list<bool>
,list<date>
,list<int64>
][
list<number>
,list<bool>
,list<int16>
,list<int32>
][
list<number>
,list<bool>
,list<int16>
,list<int64>
][
list<number>
,list<bool>
,list<int32>
,list<int32>
][
list<number>
,list<bool>
,list<int32>
,list<int64>
][
list<number>
,list<bool>
,list<int64>
,list<int32>
][
list<number>
,list<bool>
,list<int64>
,list<int64>
][
list<number>
,list<bool>
,list<string>
,list<int32>
][
list<number>
,list<bool>
,list<string>
,list<int64>
][
list<number>
,list<bool>
,list<timestamp>
,list<int32>
][
list<number>
,list<bool>
,list<timestamp>
,list<int64>
]
function top_n_value_avg_cate_where#
top_n_value_avg_cate_where()
Description:
Compute average of values matching specified condition grouped by category key. Output string for top N aggregate values in descend order. Each group is represented as ‘K:V’ and separated by comma(,). Empty string returned if no rows selected.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
n Fetch top n keys.
Since: 0.6.4
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
false |
y |
2 |
false |
x |
3 |
false |
y |
4 |
true |
x |
5 |
true |
z |
6 |
false |
z |
SELECT top_n_value_avg_cate_where(value, condition, catagory, 2)
OVER w;
-- output "z:5,x:4"
Supported Types:
[
list<number>
,list<bool>
,list<date>
,list<int32>
][
list<number>
,list<bool>
,list<date>
,list<int64>
][
list<number>
,list<bool>
,list<int16>
,list<int32>
][
list<number>
,list<bool>
,list<int16>
,list<int64>
][
list<number>
,list<bool>
,list<int32>
,list<int32>
][
list<number>
,list<bool>
,list<int32>
,list<int64>
][
list<number>
,list<bool>
,list<int64>
,list<int32>
][
list<number>
,list<bool>
,list<int64>
,list<int64>
][
list<number>
,list<bool>
,list<string>
,list<int32>
][
list<number>
,list<bool>
,list<string>
,list<int64>
][
list<number>
,list<bool>
,list<timestamp>
,list<int32>
][
list<number>
,list<bool>
,list<timestamp>
,list<int64>
]
function top_n_value_count_cate_where#
top_n_value_count_cate_where()
Description:
Compute count of values matching specified condition grouped by category key. Output string for top N aggregate values in descend order. Each group is represented as ‘K:V’ and separated by comma(,). Empty string returned if no rows selected.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
n Top N.
Since: 0.6.4
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
true |
y |
2 |
true |
x |
3 |
false |
y |
4 |
true |
x |
5 |
true |
z |
6 |
true |
z |
SELECT top_n_value_count_cate_where(value, condition, catagory, 2)
OVER w;
-- output "x:3,y:2"
Supported Types:
[
list<any>
,list<bool>
,list<date>
,list<int32>
][
list<any>
,list<bool>
,list<date>
,list<int64>
][
list<any>
,list<bool>
,list<int16>
,list<int32>
][
list<any>
,list<bool>
,list<int16>
,list<int64>
][
list<any>
,list<bool>
,list<int32>
,list<int32>
][
list<any>
,list<bool>
,list<int32>
,list<int64>
][
list<any>
,list<bool>
,list<int64>
,list<int32>
][
list<any>
,list<bool>
,list<int64>
,list<int64>
][
list<any>
,list<bool>
,list<string>
,list<int32>
][
list<any>
,list<bool>
,list<string>
,list<int64>
][
list<any>
,list<bool>
,list<timestamp>
,list<int32>
][
list<any>
,list<bool>
,list<timestamp>
,list<int64>
]
function top_n_value_max_cate_where#
top_n_value_max_cate_where()
Description:
Compute maximum of values matching specified condition grouped by category key. Output string for top N aggregate values in descend order. Each group is represented as ‘K:V’ and separated by comma(,). Empty string returned if no rows selected.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
n Fetch Top n.
Since: 0.6.4
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
false |
y |
2 |
false |
x |
3 |
true |
y |
4 |
true |
x |
5 |
true |
z |
6 |
false |
z |
SELECT top_n_value_max_cate_where(value, condition, catagory, 2)
OVER w;
-- output "z:5,x:4"
Supported Types:
[
list<number>
,list<bool>
,list<date>
,list<int32>
][
list<number>
,list<bool>
,list<date>
,list<int64>
][
list<number>
,list<bool>
,list<int16>
,list<int32>
][
list<number>
,list<bool>
,list<int16>
,list<int64>
][
list<number>
,list<bool>
,list<int32>
,list<int32>
][
list<number>
,list<bool>
,list<int32>
,list<int64>
][
list<number>
,list<bool>
,list<int64>
,list<int32>
][
list<number>
,list<bool>
,list<int64>
,list<int64>
][
list<number>
,list<bool>
,list<string>
,list<int32>
][
list<number>
,list<bool>
,list<string>
,list<int64>
][
list<number>
,list<bool>
,list<timestamp>
,list<int32>
][
list<number>
,list<bool>
,list<timestamp>
,list<int64>
]
function top_n_value_min_cate_where#
top_n_value_min_cate_where()
Description:
Compute minimum of values matching specified condition grouped by category key. Output string for top N aggregate values in descend order. Each group is represented as ‘K:V’ and separated by comma(,). Empty string returned if no rows selected.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
n Fetch top n keys.
Since: 0.6.4
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
true |
y |
2 |
true |
x |
3 |
true |
y |
4 |
false |
x |
5 |
true |
z |
6 |
true |
z |
SELECT top_n_value_min_cate_where(value, condition, catagory, 2)
OVER w;
-- output "z:5,x:2"
Supported Types:
[
list<number>
,list<bool>
,list<date>
,list<int32>
][
list<number>
,list<bool>
,list<date>
,list<int64>
][
list<number>
,list<bool>
,list<int16>
,list<int32>
][
list<number>
,list<bool>
,list<int16>
,list<int64>
][
list<number>
,list<bool>
,list<int32>
,list<int32>
][
list<number>
,list<bool>
,list<int32>
,list<int64>
][
list<number>
,list<bool>
,list<int64>
,list<int32>
][
list<number>
,list<bool>
,list<int64>
,list<int64>
][
list<number>
,list<bool>
,list<string>
,list<int32>
][
list<number>
,list<bool>
,list<string>
,list<int64>
][
list<number>
,list<bool>
,list<timestamp>
,list<int32>
][
list<number>
,list<bool>
,list<timestamp>
,list<int64>
]
function top_n_value_ratio_cate#
top_n_value_ratio_cate()
Description:
Ratios (cond match cnt / total cnt) for groups.
Parameters:
value Specify value column to aggregate on.
condition Ratio filter condition .
catagory Specify catagory column to group by.
n Top N.
Since: 0.8.1
For each group, ratio value is value
expr count matches condtion divide total rows count. NULL groups or NULL values are never take into count. Output string for top N aggregate values in descend order. Each group is represented as ‘K:V’ and separated by comma(,). Empty string returned if no rows selected.
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
2 |
true |
x |
4 |
true |
x |
1 |
true |
y |
3 |
false |
y |
5 |
true |
z |
6 |
true |
z |
SELECT top_n_value_ratio_cate_where(value, condition, catagory, 2) from t;
-- output "z:1.000000,x:1.000000"
Supported Types:
[
list<any>
,list<bool>
,list<date>
,list<int32>
][
list<any>
,list<bool>
,list<date>
,list<int64>
][
list<any>
,list<bool>
,list<int16>
,list<int32>
][
list<any>
,list<bool>
,list<int16>
,list<int64>
][
list<any>
,list<bool>
,list<int32>
,list<int32>
][
list<any>
,list<bool>
,list<int32>
,list<int64>
][
list<any>
,list<bool>
,list<int64>
,list<int32>
][
list<any>
,list<bool>
,list<int64>
,list<int64>
][
list<any>
,list<bool>
,list<string>
,list<int32>
][
list<any>
,list<bool>
,list<string>
,list<int64>
][
list<any>
,list<bool>
,list<timestamp>
,list<int32>
][
list<any>
,list<bool>
,list<timestamp>
,list<int64>
]
function top_n_value_sum_cate_where#
top_n_value_sum_cate_where()
Description:
Compute sum of values matching specified condition grouped by category key. Output string for top N aggregate values in descend order. Each group is represented as ‘K:V’ and separated by comma(,). Empty string returned if no rows selected.
Parameters:
value Specify value column to aggregate on.
condition Specify condition column.
catagory Specify catagory column to group by.
n Top N.
Since: 0.6.4
Example:
value |
condition |
catagory |
---|---|---|
0 |
true |
x |
1 |
true |
y |
2 |
false |
x |
3 |
false |
y |
4 |
true |
x |
5 |
true |
z |
6 |
true |
z |
SELECT top_n_value_sum_cate_where(value, condition, catagory, 2)
OVER w;
-- output "z:11,x:4"
Supported Types:
[
list<number>
,list<bool>
,list<date>
,list<int32>
][
list<number>
,list<bool>
,list<date>
,list<int64>
][
list<number>
,list<bool>
,list<int16>
,list<int32>
][
list<number>
,list<bool>
,list<int16>
,list<int64>
][
list<number>
,list<bool>
,list<int32>
,list<int32>
][
list<number>
,list<bool>
,list<int32>
,list<int64>
][
list<number>
,list<bool>
,list<int64>
,list<int32>
][
list<number>
,list<bool>
,list<int64>
,list<int64>
][
list<number>
,list<bool>
,list<string>
,list<int32>
][
list<number>
,list<bool>
,list<string>
,list<int64>
][
list<number>
,list<bool>
,list<timestamp>
,list<int32>
][
list<number>
,list<bool>
,list<timestamp>
,list<int64>
]
function topn_frequency#
topn_frequency()
Description:
Return the topN keys sorted by their frequency.
Since: 0.6.5
Supported Types:
[
list<date>
,list<int32>
][
list<number>
,list<int32>
][
list<string>
,list<int32>
][
list<timestamp>
,list<int32>
]
function truncate#
truncate()
Description:
Return the nearest integer that is not greater in magnitude than the expr.
Parameters:
expr
Since: 0.1.0
Example:
SELECT TRUNCATE(1.23);
-- output 1.0
Supported Types:
[
bool
][
number
]
function ucase#
ucase()
Description:
Convert all the characters to uppercase. Note that characters values > 127 are simply returned.
Since: 0.4.0
Example:
SELECT UCASE('Sql') as str1;
--output "SQL"
Supported Types:
[
string
]
function unhex#
unhex()
Description:
Convert hexadecimal to binary string.
Since: 0.7.0
Example:
select unhex("537061726B2053514C");
--output "Spark SQL"
select unhex("7B");
--output "{"
select unhex("zfk");
--output NULL
Supported Types:
[
string
]
function unix_timestamp#
unix_timestamp()
Description:
Cast date or string expression to unix_timestamp. If empty string or NULL is provided, return current timestamp.
Since: 0.7.0
Supported string style:
yyyy-mm-dd
yyyymmdd
yyyy-mm-dd hh:mm:ss
Example:
select unix_timestamp("2020-05-22");
-- output 1590076800
select unix_timestamp("2020-05-22 10:43:40");
-- output 1590115420
select unix_timestamp("");
-- output 1670404338 (the current timestamp)
Supported Types:
[
date
][
string
]
function upper#
upper()
Description:
alias to ucase
function var_pop#
var_pop()
Description:
Compute population variance of values, i.e., sum((x_i - avg)^2) / n
Parameters:
value Specify value column to aggregate on.
Since: 0.8.0
Example:
value |
---|
0 |
3 |
6 |
SELECT var_pop(value) OVER w;
-- output 6.0
Supported Types:
[
list<number>
]
function var_samp#
var_samp()
Description:
Compute population variance of values, i.e., sum((x_i - avg)^2) / (n-1)
Parameters:
value Specify value column to aggregate on.
Since: 0.8.0
Example:
value |
---|
0 |
3 |
6 |
SELECT var_samp(value) OVER w;
-- output 9.0
Supported Types:
[
list<number>
]
function variance#
variance()
Description:
alias to var_samp
function week#
week()
Description:
alias to weekofyear
function weekofyear#
weekofyear()
Description:
Return the week of year for a timestamp or date.
Since: 0.1.0
Example:
select weekofyear(timestamp(1590115420000));
-- output 21
select week(timestamp(1590115420000));
-- output 21
Supported Types:
[
date
][
int64
][
timestamp
]
function window_split#
window_split()
Description:
For each string value from specified column of window, split by delimeter and add segment to output list. Null values are skipped.
Since: 0.6.5
Supported Types:
[
list<string>
,list<string>
]
function window_split_by_key#
window_split_by_key()
Description:
For each string value from specified column of window, split by delimeter and then split each segment as kv pair, then add each key to output list. Null and illegal segments are skipped.
Since: 0.6.5
Supported Types:
[
list<string>
,list<string>
,list<string>
]
function window_split_by_value#
window_split_by_value()
Description:
For each string value from specified column of window, split by delimeter and then split each segment as kv pair, then add each value to output list. Null and illegal segments are skipped.
Since: 0.6.5
Supported Types:
[
list<string>
,list<string>
,list<string>
]
function year#
year()
Description:
Return the year part of a timestamp or date.
Since: 0.1.0
Example:
select year(timestamp(1590115420000));
-- output 2020
Supported Types:
[
date
][
int64
][
timestamp
]