In this blog, we would like to introduce an excellent standalone tool from the OpenMLDB community — OpenMLDB SQL Emulator (https://github.com/vagetablechicken/OpenMLDBSQLEmulator). This tool allows users to develop and debug OpenMLDB SQL more efficiently and conveniently.
To efficiently implement time-series feature calculations, OpenMLDB SQL has improved and extended standard SQL. In practical use, beginners often encounter problems such as unfamiliar syntax and confusing execution modes when using OpenMLDB SQL. Suppose one directly develops and debugs on OpenMLDB itself, due to issues including deployment, index building, handling large data volume, and so on. In that case, he/she will end up wasting a lot of time on irrelevant tasks and might not be able to pinpoint the root cause easily.
The OpenMLDB SQL Emulator is a lightweight simulation development and debugging tool for OpenMLDB SQL. It allows for SQL validation and debugging operations without the need for deploying an OpenMLDB cluster. We strongly recommend this tool to our application developers. They can use this tool to quickly verify the correctness and deployability of SQL, before switching to the actual OpenMLDB environment for deployment.
Installation
Download the runtime package emulator-1.0.jar
from the project page at https://github.com/vagetablechicken/OpenMLDBSQLEmulator/releases. Use the following method to start (Note that the current release version 1.0 corresponds to SQL syntax for OpenMLDB 0.8.3):
java -jar emulator-1.0.jar
Note: If you want to execute SQL using the run
command to validate results, you will also need to download toydb_run_engine
from the same page and store it in the system's /tmp
directory.
Creation of Virtual Databases and Tables
Once started, it will directly enter the default database emudb
, and no additional database creation is required.
- Databases don't need to be explicitly created. Just use
use <db name>
or specify the database name when creating tables, and the database will be created automatically. - Use the
addtable
command ort
to create a virtual table. Repeatedly creating a table with the same name is considered an update operation and will use the latest table schema. We use a simplified SQL-like syntax to manage tables. For example, the following example creates a table with two columns:
addtable t1 a int, b int64
- Use the
showtables
command or st to view all current databases and tables.
Validation of OpenMLDB SQL
To verify whether OpenMLDB SQL can be deployed in the cluster setting, you can use DEPLOY
. Note that DEPLOYMENT
and index need to be managed. For instance, if a certain DEPLOYMENT
is not needed, it needs to be manually deleted. Similarly, if unnecessary indexes are created, they must be cleaned up.
Hence, it is suggested to test and verify in the Emulator instead. You can use val
and valreq
to perform online batch mode and online request mode (i.e., service deployment) OpenMLDB SQL validation. For example, testing whether a SQL can be DEPLOY
ed online using the valreq
command:
addtable t1 a int, b int64
valreq select count(*) over w1 from t1 window w1 as (partition by a order by b rows between unbounded preceding and current row);
If the test fails, it will print SQL compilation errors. If successful, it will print validate * success
. The entire process happens in a virtual environment, without any concerns about resource utilization after table creation, and without any side effects. Any SQL that passes the valreq
validation will definitely be able to be deployed in a real cluster.
Testing of OpenMLDB SQL
OpenMLDB Emulator can also return computation results to verify if the implemented SQL gives correct calculation results. You can continuously perform computations and online validations until the implemented SQL meets the expectation. This functionality can be achieved through the run
command in the Emulator. Note that using the run
command requires additional support from toydb_run_engine
. You can use the built-in toydb
from the emulator package, or download it from this page (https://github.com/vagetablechicken/OpenMLDBSQLEmulator/releases) and place it directly in /tmp
.
Assuming Emulator already has toydb
installed, here are the steps for testing SQL:
# step 1, generate a yaml template
gencase
# step 2, modify the yaml file to add table and data
# ...
# step 3, load yaml to get table catalog,
# then using val/valreq sql to validate the sql in emulator
loadcase
valreq <sql>
# step 4, dump the sql, this will rewrite the yaml file
dumpcase <sql>
# step 5, run sql using toydb
run
The command gencase
generates a YAML template file, defaulting to the directory /tmp/emu-case.yaml
. You'll need to edit this YAML file as shown below. When editing, consider the following:
- Modify table names, table schema, and their data, which can not be changed in the Emulator.
- Modify the run mode to accept batch or request mode.
- You may leave the SQL section blank. SQL can be written to the file in the Emulator using
dumpcase <sql>
. A common usage is to validate SQL first, then dump it to the case, and finally, use therun
command to confirm if the SQL calculation meets expectations. - The table's indexes don't need manual filling. When using
dumpcase
, indexes can be automatically generated based on the table schema (indexes are not specific to SQL and are only needed to create at least one index when creating the table). If you are not usingdumpcase
, please manually specify at least one index.
# call toydb_run_engine to run this yaml file
# you can generate yaml cases for reproduction by emulator dump or by yourself
# you can set the global default db
db: emudb
cases:
- id: 0
desc: describe this case
# you can set batch mode
mode: request
db: emudb # you can set default db for case, if not set, use the global default db
inputs:
- name: t1
db: emudb # you can set db for each table, if not set, use the default db(table db > case db > global db)
# must set table schema, emulator can't do this
columns: ["id int", "pk1 string","col1 int32", "std_ts timestamp"]
# gen by emulator, just to init table, not the deployment index
indexs: []
# must set the data, emulator can't do this
data: |
1, A, 1, 1590115420000
2, B, 1, 1590115420000
# query: only support single query, to check the result by `expect`
sql: |
# optional, you can just check the output, or add your expect
# expect:
# schema: id:int, pk1:string, col1:int, std_ts:timestamp, w1_col1_sum:int, w2_col1_sum:int, w3_col1_sum:int
# order: id
# data: |
# 1, A, 1, 1590115420000, 1, 1, 1
# 2, B, 1, 1590115420000, 1, 1, 1
For simplicity, let's not make any modifications and directly use this template to demonstrate how to modify the running case. In the Emulator, executing loadcase
will load the table information from this case into the Emulator. You can confirm the successful loading of the case's tables by using st/showtables
.
emudb> st
emudb={t1=id:int32,pk1:string,col1:int32,std_ts:timestamp}
You can see that the table information has been successfully loaded. Now, we can use valreq
to confirm if the SQL we've written is syntactically correct and deployable. Subsequently, you can perform a computation test on this SQL using the dumpcase
and run
commands. For example:
valreq select count(*) over w1 from t1 window w1 as (partition by id order by std_ts rows between unbounded preceding and current row);
dumpcase select count(*) over w1 from t1 window w1 as (partition by id order by std_ts rows between unbounded preceding and current row);
run
dumpcase
command actually writes the SQL and default indexes into the case file, and the run
command executes this case file. Therefore, if you are skilled enough, you can directly modify this case file and then run it in the Emulator using run
, or alternatively, use toydb_run_engine --yaml_path=...
to run it. After execution, you will obtain the computed results for debugging and inspection purposes.
More
The OpenMLDB SQL Emulator also features a genddl
function that helps users generate optimal index creation statements directly from SQL. This feature aids in avoiding redundant indexes and currently supports only single database. In future, there will be improvements in index handling, providing simpler and more convenient operations to guide users in index management.
Additionally, for usage of the Emulator, it's recommended to utilize the ?help
and ?list-all
prompts. Commands are in lowercase, but SQL parameter inputs are case-insensitive and do not require additional double quotes, aligning with CLI conventions. Functionalities such as command history and exporting the current environment will be added to facilitate user operations and integration with real OpenMLDB clusters in future updates.
For more information on OpenMLDB:
- Official website: https://openmldb.ai/
- GitHub: https://github.com/4paradigm/OpenMLDB
- Documentation: https://openmldb.ai/docs/en/
- Join us on Slack !