创建 DEPLOYMENT
Contents
创建 DEPLOYMENT#
Syntax#
CreateDeploymentStmt
::= 'DEPLOY' [DeployOptionList] DeploymentName SelectStmt
DeployOptionList
::= DeployOption*
DeployOption
::= 'OPTIONS' '(' DeployOptionItem (',' DeployOptionItem)* ')'
DeploymentName
::= identifier
DeployOption
的定义详见DEPLOYMENT属性DeployOption(可选)。
SelectStmt
的定义详见Select查询语句。
DEPLOY
语句可以将SQL部署到线上。OpenMLDB仅支持部署Select查询语句,并且需要满足OpenMLDB SQL上线规范和要求。
Example
在集群版的在线请求模式下,部署上线一个SQL脚本。
CREATE DATABASE db1;
-- SUCCEED
USE db1;
-- SUCCEED: Database changed
CREATE TABLE demo_table1(c1 string, c2 int, c3 bigint, c4 float, c5 double, c6 timestamp, c7 date);
-- SUCCEED: Create successfully
DEPLOY demo_deploy SELECT c1, c2, sum(c3) OVER w1 AS w1_c3_sum FROM demo_table1 WINDOW w1 AS (PARTITION BY demo_table1.c1 ORDER BY demo_table1.c6 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW);
-- SUCCEED
我们可以使用 SHOW DEPLOYMENT demo_deploy;
命令查看部署的详情,执行结果如下:
--------- -------------------
DB Deployment
--------- -------------------
demo_db demo_deploy
--------- -------------------
1 row in set
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DEPLOY demo_data_service SELECT
c1,
c2,
sum(c3) OVER (w1) AS w1_c3_sum
FROM
demo_table1
WINDOW w1 AS (PARTITION BY demo_table1.c1
ORDER BY demo_table1.c6 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
;
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 row in set
# Input Schema
--- ------- ------------ ------------
# Field Type IsConstant
--- ------- ------------ ------------
1 c1 Varchar NO
2 c2 Int32 NO
3 c3 Int64 NO
4 c4 Float NO
5 c5 Double NO
6 c6 Timestamp NO
7 c7 Date NO
--- ------- ------------ ------------
# Output Schema
--- ----------- ---------- ------------
# Field Type IsConstant
--- ----------- ---------- ------------
1 c1 Varchar NO
2 c2 Int32 NO
3 w1_c3_sum Int64 NO
--- ----------- ---------- ------------
DeployOption(可选)#
DeployOption
::= 'OPTIONS' '(' DeployOptionItem (',' DeployOptionItem)* ')'
DeployOptionItem
::= 'LONG_WINDOWS' '=' LongWindowDefinitions
| 'SKIP_INDEX_CHECK' '=' string_literal
| 'SYNC' '=' string_literal
| 'RANGE_BIAS' '=' RangeBiasValueExpr
| 'ROWS_BIAS' '=' RowsBiasValueExpr
RangeBiasValueExpr ::= int_literal | interval_literal | string_literal
RowsBiasValueExpr ::= int_literal | string_literal
长窗口优化#
LongWindowDefinitions
::= 'LongWindowDefinition (, LongWindowDefinition)*'
LongWindowDefinition
::= WindowName':'[BucketSize]
WindowName
::= string_literal
BucketSize
::= int_literal | interval_literal
interval_literal ::= int_literal 's'|'m'|'h'|'d'
其中BucketSize
为用于性能优化的可选项,OpenMLDB会根据BucketSize
设置的粒度对表中数据进行预聚合,默认为1d
。
限制条件#
目前长窗口优化有以下几点限制:
SelectStmt
仅支持只涉及一个物理表的情况,即不支持包含join
或union
的SelectStmt
。支持的聚合运算仅限:
sum
,avg
,count
,min
,max
,count_where
,min_where
,max_where
,sum_where
,avg_where
。执行
deploy
命令的时候不允许表中有数据。对于带 where 条件的运算,如
count_where
,min_where
,max_where
,sum_where
,avg_where
,有额外限制:主表必须是内存表 (
storage_mode = 'Memory'
)BucketSize
类型应为范围类型,即取值应为interval_literal
类,比如,long_windows='w1:1d'
是支持的, 不支持long_windows='w1:100'
。where 条件必须是
<column ref> op <const value> 或者 <const value> op <column ref>
的格式。支持的 where op:
>, <, >=, <=, =, !=
where 关联的列
<column ref>
,数据类型不能是 date 或者 timestamp
为了得到最佳的性能提升,数据需按
timestamp
列的递增顺序导入。
Example
DEPLOY demo_deploy OPTIONS(long_windows="w1:1d") SELECT c1, sum(c2) OVER w1 FROM demo_table1
WINDOW w1 AS (PARTITION BY c1 ORDER BY c2 ROWS_RANGE BETWEEN 5d PRECEDING AND CURRENT ROW);
-- SUCCEED
关闭索引类型校验#
默认情况下SKIP_INDEX_CHECK
选项为false
, deploy SQL时如果存在和期望索引key与ts相同的现有索引,还会校验现有索引和期望索引的TTL类型是否一致,并更新表的索引,如果集群版本是0.8.0或更早的,将不支持更新索引的TTL类型。如果这个选项设置为true
, deploy的时候不会校验现有索引,也不会修改现有索引的TTL,仅创建新的期望索引。
Example
DEPLOY demo OPTIONS (SKIP_INDEX_CHECK="TRUE")
SELECT * FROM t1 LAST JOIN t2 ORDER BY t2.col3 ON t1.col1 = t2.col1;
设置同步/异步#
执行deploy的时候可以通过SYNC
选项来设置同步/异步模式, 默认为true
即同步模式。如果deploy语句中涉及的相关表有数据,并且需要添加索引的情况下,执行deploy会发起数据加载等任务,如果SYNC
选项设置为false
就会返回一个任务id。可以通过SHOW JOBS FROM NAMESERVER LIKE '{job_id}'
来查看任务执行状态。
Example
deploy demo options(SYNC="false") SELECT t1.col1, t2.col2, sum(col4) OVER w1 as w1_col4_sum FROM t1 LAST JOIN t2 ORDER BY t2.col3 ON t1.col2 = t2.col2
WINDOW w1 AS (PARTITION BY t1.col2 ORDER BY t1.col3 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW);
设置偏移BIAS#
如果你并不希望数据根据deploy的索引淘汰,或者希望晚一点淘汰,可以在deploy时设置偏移BIAS,常用于数据时间戳并不实时的情况、测试等情况。如果deploy后的索引ttl为abs 3h,但是数据的时间戳是3h前的(以系统时间为基准),那么这条数据就会被淘汰,无法参与计算。设置一定时间或永久的偏移,则可以让数据更久的停留在在线表中。
时间偏移,单位可以是s
、m
、h
、d
,也可以是整数,单位为ms
,也可以是inf
,表示永不淘汰;如果是行数偏移,可以是整数,单位是row
,也可以是inf
,表示永不淘汰。两种偏移中,0均表示不偏移。
注意,我们只将偏移加在deploy的解析索引中,也就是新索引,它们并不是最终索引。最终索引的计算方式是,如果是创建索引,最终索引是解析索引 + 偏移
;如果是更新索引,最终索引是merge(旧索引, 新索引 + 偏移)
。
而时间偏移的单位是min
,我们会在内部将其转换为min
,并且取上界。比如,新索引ttl是abs 2min,加上偏移20s,结果是2min + ub(20s) = 3min
,然后和旧索引1min取上界,最终索引ttl是max(1min, 3min) = 3min
。
Example
DEPLOY demo OPTIONS(RANGE_BIAS="inf", ROWS_BIAS="inf") SELECT t1.col1, t2.col2, sum(col4) OVER w1 as w1_col4_sum FROM t1 LAST JOIN t2 ORDER BY t2.col3 ON t1.col2 = t2.col2
WINDOW w1 AS (PARTITION BY t1.col2 ORDER BY t1.col3 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW);