Data Import Quickstart
There are two versions of OpenMLDB: the standalone version and the cluster version.
In the standalone version, all datasets are stored in memory, and LOAD DATA is the only way to import data. In the cluster version, datasets are stored separately in the offline and online storage engines, and the offline and online ends do not share data.
This tutorial focuses on the data import methods of the cluster version.
Data Import Methods of the Cluster Version
1 Offline Import (LOAD DATA)
OpenMLDB doesn't have a dedicated offline storage engine. Instead, the user must specify the offline storage path by setting the TaskManager configuration option offline.data.prefix. Third-party storage such as a local directory, HDFS or S3 can be used.
There is only one way to import data offline: the LOAD DATA command. By default, LOAD DATA makes a hard copy, meaning OpenMLDB copies the original data into the path given by offline.data.prefix. Files in csv and parquet format are supported.
LOAD DATA can also create a soft link, which is configured with the option deep_copy=false. With a soft link, OpenMLDB saves only the storage path of the dataset and does not copy the data. Both csv and parquet files are supported in this mode as well.
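To make the difference between a hard copy and a soft link concrete, the sketch below composes the SQL an offline import would send. It assumes OpenMLDB's LOAD DATA INFILE ... INTO TABLE ... OPTIONS(...) syntax and the CLI statement SET @@execute_mode='offline'; check both against the version you run. The helper offline_load_statements, the table demo_db.t1 and the file paths are hypothetical and only for illustration.

```python
from typing import List

SUPPORTED_FORMATS = ("csv", "parquet")


def offline_load_statements(path: str, table: str, fmt: str = "csv",
                            deep_copy: bool = True) -> List[str]:
    """Return SQL for an offline LOAD DATA import (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; use csv or parquet")
    options = [f"format='{fmt}'"]
    if not deep_copy:
        # Soft link: OpenMLDB records only the dataset path and copies nothing
        # into offline.data.prefix.
        options.append("deep_copy=false")
    # Without deep_copy=false, the default hard copy copies the files
    # into the directory configured by offline.data.prefix.
    return [
        "SET @@execute_mode='offline';",
        f"LOAD DATA INFILE '{path}' INTO TABLE {table} OPTIONS({', '.join(options)});",
    ]


if __name__ == "__main__":
    # Soft-link import of a parquet file (hypothetical path and table).
    for stmt in offline_load_statements("hdfs:///data/t1.parquet", "demo_db.t1",
                                        fmt="parquet", deep_copy=False):
        print(stmt)
```

The format check only mirrors the two formats named above; it is a client-side convenience, not a restriction enforced by this code on OpenMLDB's behalf.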
Note
If the offline path of a table is a soft link, OpenMLDB doesn't support appending data to the table, because it doesn't have write access to the files under the soft-link path. You can, however, overwrite the offline path of the table. When the path is overwritten, the data in the original directory is not removed; only the path recorded in OpenMLDB changes.
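The rule in the note above can be expressed as a small client-side guard. This is a sketch under assumptions: it assumes LOAD DATA accepts a mode option with the values error_if_exists, overwrite and append, and the helper soft_link_load with its current_path_is_soft_link flag is hypothetical. The guard encodes only what the note states (no append onto a soft-linked path) and makes no claim about other combinations.

```python
VALID_MODES = ("error_if_exists", "overwrite", "append")


def soft_link_load(path: str, table: str, mode: str = "overwrite",
                   current_path_is_soft_link: bool = True) -> str:
    """Build a soft-link LOAD DATA statement for a csv file (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}")
    if mode == "append" and current_path_is_soft_link:
        # OpenMLDB has no write access to files under a soft-linked path,
        # so appending to such a table is not supported.
        raise ValueError("cannot append: the table's offline path is a soft link")
    # mode='overwrite' only changes the path recorded in OpenMLDB;
    # files under the previous directory are left in place.
    return (f"LOAD DATA INFILE '{path}' INTO TABLE {table} "
            f"OPTIONS(format='csv', deep_copy=false, mode='{mode}');")
```

Because overwriting does not delete the old directory, any cleanup of the previously linked files has to be done outside OpenMLDB.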
2 Online Import
The online mode of the OpenMLDB cluster version provides an online storage engine, which stores data in memory. Only hard copy can be used for online import.
2.1 LOAD DATA
The LOAD DATA command can be used in the online request and online preview modes to load csv and parquet files.
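A minimal sketch of an online import, assuming the CLI switches to online mode with SET @@execute_mode='online' and uses the same LOAD DATA syntax as offline imports. The helper online_load_statements is hypothetical; it rejects deep_copy=False because, as stated above, only hard copy is available for online import.

```python
from typing import List


def online_load_statements(path: str, table: str, fmt: str = "csv",
                           deep_copy: bool = True) -> List[str]:
    """Return SQL for an online LOAD DATA import (hypothetical helper)."""
    if not deep_copy:
        # Online import copies data into the in-memory storage engine;
        # soft links are not available here.
        raise ValueError("online import only supports hard copy")
    if fmt not in ("csv", "parquet"):
        raise ValueError(f"unsupported format {fmt!r}; use csv or parquet")
    return [
        "SET @@execute_mode='online';",
        f"LOAD DATA INFILE '{path}' INTO TABLE {table} OPTIONS(format='{fmt}');",
    ]
```

Since the offline and online ends do not share data, a table that needs both offline and online data would be loaded once in each mode.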
2.2 Stream
Data can also be loaded from Pulsar, Kafka and RocketMQ; see the following links for details.
Note
The openmldb-import tool can be used for bulk loading, importing data quickly into the standalone version or into the online storage of the cluster version.
The bulk load tool is still in development and has the following restrictions:
Only csv files can be loaded.
The tool runs only on a single machine and needs a lot of memory on that machine; the available memory may need to be larger than the size of the data to be imported.
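Before starting a bulk load, the restrictions above can be checked up front. The sketch below is a hypothetical pre-flight helper, not part of openmldb-import: it accepts only .csv files and, as a rough heuristic taken from the memory guidance above, requires the available memory (passed in by the caller) to exceed the file size.

```python
import os


def check_bulk_load_input(path: str, available_memory_bytes: int) -> None:
    """Pre-flight check before running openmldb-import (hypothetical helper)."""
    # The tool only loads csv files.
    if not path.lower().endswith(".csv"):
        raise ValueError(f"{path}: only csv files can be bulk loaded")
    size = os.path.getsize(path)
    # Heuristic: the single machine's memory should exceed the data size.
    if available_memory_bytes <= size:
        raise MemoryError(
            f"{path} is {size} bytes but only {available_memory_bytes} bytes of memory are available"
        )
```

Real memory use depends on the data and the tool version, so passing this check does not guarantee the import will fit.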