Cluster vs. Standalone Versions
1. Installation and Deployment
For details, see Install and Deploy.
2. Usage
2.1. Workflows
Step | Difference |
---|---|
Database and table creation | None |
Offline data preparation | None |
Offline feature extraction | None |
SQL deployment | None |
Online data preparation | In the cluster version, the offline and online storage engines are separate, so online data must be imported explicitly. |
Online real-time feature extraction | None |
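The explicit online import that the cluster version requires can be sketched with the OpenMLDB CLI as follows. This is a minimal sketch, not a prescribed procedure: the database demo_db, the table t1, and the CSV path are hypothetical, and the restriction of online imports to mode='append' is an assumption to verify against your OpenMLDB version.

```sql
-- Hypothetical database, table, and file path; adjust to your deployment.
USE demo_db;
-- Direct the import to the online storage engine.
SET @@execute_mode='online';
-- Assumption: online imports only support the append mode.
LOAD DATA INFILE 'file:///tmp/t1.csv' INTO TABLE t1 OPTIONS (format='csv', header=true, mode='append');
```

In the standalone version this second import is not needed, because offline and online storage are not separated.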
2.2. Execution Modes
The cluster version supports the system variable execute_mode, which sets the execution mode. The offline and online execution modes correspond to the offline and online databases, respectively. The standalone version has no concept of “execution mode”.
In the cluster version, set the execution mode in the CLI with one of the following commands:
> SET @@execute_mode='offline';
> SET @@execute_mode='online';
In offline execution mode, commands run asynchronously by default. To run them synchronously, so that each command blocks until its job has finished, set:
> SET @@sync_job=true;
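A synchronous offline session in the cluster version might look like the sketch below, which switches to offline mode, enables sync_job, and then runs an offline query whose result is written out with SELECT INTO. The database demo_db, the table t1 with columns c1 and c2, and the output path are hypothetical.

```sql
-- Hypothetical database and table; adjust to your deployment.
USE demo_db;
-- Run subsequent commands against the offline database.
SET @@execute_mode='offline';
-- Block each command until its offline job has finished.
SET @@sync_job=true;
-- Returns only after the job completes; the cluster version writes a directory here.
SELECT c1, c2 FROM t1 INTO OUTFILE '/tmp/t1_features' OPTIONS (format='csv', mode='overwrite');
```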
2.3. Offline Task Management
Offline task management is unique to the cluster version.
In the standalone version, the LOAD DATA and SELECT INTO commands block until they finish. The cluster version instead submits a task for each of these commands and provides the SHOW JOBS and SHOW JOB commands to inspect the status of offline tasks. For details, see Offline Task Management.
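The asynchronous workflow can be sketched as follows: an offline LOAD DATA submits a task and returns, and SHOW JOBS and SHOW JOB are then used to check on it. The table demo_db.t1, the CSV path, and the job ID 1 are hypothetical; use the ID that SHOW JOBS actually reports.

```sql
-- Offline mode, asynchronous (the default): commands submit a task and return.
SET @@execute_mode='offline';
SET @@sync_job=false;
-- Hypothetical table and file path.
LOAD DATA INFILE 'file:///tmp/t1.csv' INTO TABLE demo_db.t1 OPTIONS (format='csv', mode='append');
-- List submitted offline tasks and their states.
SHOW JOBS;
-- Inspect one task; 1 is a placeholder job ID.
SHOW JOB 1;
```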
2.4. SQL
The cluster and standalone versions differ in the following SQL capabilities:
Statement | Standalone version | Cluster version |
---|---|---|
Offline task management statements (SHOW JOBS, SHOW JOB, etc.) | Not supported | Supported |
Execution mode (SET @@execute_mode = ...) | Not supported | Supported |
CREATE TABLE | Does not support the properties REPLICANUM, DISTRIBUTION, and PARTITIONNUM | Supports REPLICANUM, DISTRIBUTION, and PARTITIONNUM, which relate to distributed computing and storage |
SELECT INTO | Outputs a file | Outputs a directory |
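The cluster-only CREATE TABLE properties can be sketched as below. The table schema, index, and values are hypothetical; REPLICANUM=3 assumes at least three tablet servers are available, and DISTRIBUTION is omitted because it requires concrete tablet endpoints from your own deployment.

```sql
-- Hypothetical schema; PARTITIONNUM and REPLICANUM are accepted only by the cluster version.
CREATE TABLE t1 (
    c1 string,
    c2 int,
    c3 bigint,
    c4 timestamp,
    INDEX(KEY=c1, TS=c4)
) OPTIONS (PARTITIONNUM=8, REPLICANUM=3);
```

The standalone version would reject the OPTIONS clause above; the same table can be created there without it.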
In the online execution mode of the cluster version, only simple single-table queries are supported:
- Only columns, expressions, single-row processing functions (scalar functions), and expressions combining them are allowed.
- The query must not contain a GROUP BY clause, a HAVING clause, or a WINDOW clause.
- The query must read from a single table; JOIN-based multi-table computation is not supported.
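The restrictions above can be illustrated with a few queries against a hypothetical table t1 (columns c1 string, c2 int, c3 bigint) and a hypothetical second table t2. The first query stays within the allowed forms; the commented-out ones use constructs that the online mode does not accept.

```sql
SET @@execute_mode='online';
-- Allowed: columns, expressions, and scalar functions on a single table.
SELECT c1, c2 + 1 AS c2_plus_one, abs(c3) AS c3_abs FROM t1;
-- Not allowed online: aggregation with GROUP BY.
-- SELECT c1, count(c2) FROM t1 GROUP BY c1;
-- Not allowed online: multi-table computation via JOIN.
-- SELECT t1.c1, t2.c2 FROM t1 JOIN t2 ON t1.c1 = t2.c1;
```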