# LOAD DATA INFILE
The `LOAD DATA INFILE` statement efficiently reads data from files into tables in the database. `LOAD DATA INFILE` and `SELECT INTO OUTFILE` complement each other. To export data from a table to a file, use [SELECT INTO OUTFILE](../dql/SELECT_INTO_STATEMENT.md). To import file data into a table, use `LOAD DATA INFILE`. Note that the order of columns in the imported file schema should match the order of columns in the table schema.
```{note}
Regardless of the load_mode, the `filePath` in INFILE can be a single filename, a directory, or a path using the `*` wildcard.
- For load_mode=cluster, the path format is equivalent to that accepted by `DataFrameReader.read.load(String)`. You can use the Spark shell to read the desired file path and confirm whether it can be read successfully. If the directory contains multiple file formats, only files in the FORMAT specified in LoadDataInfileOptionsList are selected.
- For load_mode=local, glob is used to select all matching files, and the format of individual files is not checked. Therefore, ensure that all matching files are in CSV format; it is recommended to use `*.csv` to restrict the file extension.
```
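As a sketch of the wildcard behavior described above (the directory path and table name `t1` are illustrative, not from a real deployment):

```sql
-- Load every CSV file under /data/input into table t1.
-- With load_mode='local', '*.csv' restricts matching to CSV files,
-- since local mode does not check the format of individual files.
LOAD DATA INFILE 'file:///data/input/*.csv' INTO TABLE t1
OPTIONS (load_mode = 'local', format = 'csv', mode = 'append');
```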
## Syntax
```sql
LoadDataInfileStmt
::= 'LOAD' 'DATA' 'INFILE' filePath 'INTO' 'TABLE' tableName LoadDataInfileOptionsList
filePath
::= URI
tableName
::= string_literal
LoadDataInfileOptionsList
::= 'OPTIONS' '(' LoadDataInfileOptionItem (',' LoadDataInfileOptionItem)* ')'
LoadDataInfileOptionItem
::= 'DELIMITER' '=' string_literal
|'HEADER' '=' bool_literal
|'NULL_VALUE' '=' string_literal
|'FORMAT' '=' string_literal
|'QUOTE' '=' string_literal
|'MODE' '=' string_literal
|'DEEP_COPY' '=' bool_literal
|'LOAD_MODE' '=' string_literal
|'THREAD' '=' int_literal
URI
::= 'file://FilePathPattern'
|'hdfs://FilePathPattern'
|'hive://[db.]table'
|'FilePathPattern'
FilePathPattern
::= string_literal
```
The `FilePathPattern` supports the wildcard character `*`, with the same matching rules as `ls FilePathPattern`.
Loading data from Hive is supported, but it requires extra settings; see [Hive Support](#Hive-support).
The following table describes the parameters of `LOAD DATA INFILE`.
| Parameter | Type | Default Value | Note |
| ------------- | ------- | ----------------- | ---- |
| delimiter | String | , | Defines the column separator; the default value is `,`. |
| header | Boolean | true | Indicates whether the file to import has a header row. If the value is `true`, the file has a header. |
| null_value | String | null | Defines the string that is treated as `NULL` when loading data. |
| format | String | csv | Defines the format of the input file.<br />`csv` is the default format.<br />`parquet` is supported in the cluster version. |
| quote | String | " | Defines the string surrounding the input data. The string length must be <= 1.<br />load_mode='cluster': the default is `"`; the content surrounded by a pair of quote characters is parsed as a whole. For example, if the quote character is `#`, the original data `1, 1.0, #This is a string, with comma#, normal_string` is converted to four fields: the integer 1, the float 1.0, the string "This is a string, with comma", and the string "normal_string" (even though it is unquoted).<br />load_mode='local': the default is `\0`, meaning the string surrounding the input data is empty. |
| mode | String | "error_if_exists" | Defines the input mode.<br />`error_if_exists` is the default mode: an error is thrown if the offline table already has data. This mode is only supported by the offline execution mode.<br />`overwrite`: the imported data overwrites the existing contents. This mode is only supported by the offline execution mode.<br />`append`: the data is appended to the existing table. Both offline and online execution modes support this mode. |
| deep_copy | Boolean | true | Defines whether a deep copy is performed. Only offline load supports `deep_copy=false`; in that case you can specify the `INFILE` path as the offline storage address of the table to avoid a hard copy. |
| load_mode | String | cluster | `load_mode='local'` only supports loading local `csv` files into `online` storage; the data is loaded synchronously by the client process.<br />`load_mode='cluster'` is only supported by the cluster version; it loads the data via Spark, synchronously or asynchronously. |
| thread | Integer | 1 | Only effective for local loading, i.e., `load_mode='local'` or the standalone version; defines the number of threads used for data loading. The maximum value is `50`. |
| writer_type | String | single | The writer type for inserting data during cluster online loading. The optional values are `single` and `batch`, with the default being `single`. `single` reads and writes data on the fly, saving memory. `batch` reads an entire RDD partition, validates the data types, and then writes to the cluster, requiring more memory. In some cases, `batch` mode is advantageous for filtering out data that has not been written, facilitating the retry of that portion of data. |
| put_if_absent | Boolean | false | When the source data has no duplicate rows and does not duplicate existing rows in the table, this option avoids inserting duplicate data, which is especially useful when retrying after a job failure. Equivalent to using `INSERT OR IGNORE`. For more details, see the following. |
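As a sketch of the `quote` option described above (the file path and table name are illustrative):

```sql
-- Input line in data.csv: 1, 1.0, "This is a string, with comma", normal_string
-- With quote='"', the quoted third field is parsed as a single string
-- instead of being split at its embedded comma.
LOAD DATA INFILE 'data.csv' INTO TABLE t1
OPTIONS (delimiter = ',', quote = '"', mode = 'append');
```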
```{note}
In the cluster version, the `LOAD DATA INFILE` statement determines whether to import data into online or offline storage based on the current execution mode (`execute_mode`). The standalone version has no such storage distinction: data is only imported into online storage, and the `deep_copy` option is not supported.
The specific rules are described below.
```
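For instance, when retrying a failed online import, the `put_if_absent` option described in the table above can prevent rows that were already written from being inserted twice (the file path and table name are illustrative):

```sql
set @@execute_mode='online';
-- Skip rows that already exist in t1, similar to INSERT OR IGNORE.
LOAD DATA INFILE 'data.csv' INTO TABLE t1
OPTIONS (mode = 'append', put_if_absent = true);
```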
```{warning} INFILE Path
:class: warning
In the cluster version, if `load_mode='cluster'`, the `INFILE` path is read by the batch job. If it is a relative path, it must be accessible from wherever the batch job runs.
In a production environment, the batch job is usually scheduled by the YARN cluster, making it difficult to know which executor will run it. In a test environment with a multi-node deployment, it is likewise difficult to determine the executor.
Therefore, use absolute paths whenever possible. In local testing, use `file://` for local files; in a production environment, HDFS or another shared file system is recommended.
```
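Following the recommendation above, absolute URIs make the path unambiguous for the batch job (the paths below are illustrative):

```sql
-- Local testing: an absolute file:// path on the machine running the job
LOAD DATA INFILE 'file:///tmp/data/data.csv' INTO TABLE t1 OPTIONS (mode = 'append');
-- Production: an HDFS path readable by the Spark batch job
LOAD DATA INFILE 'hdfs:///user/openmldb/data/data.csv' INTO TABLE t1 OPTIONS (mode = 'append');
```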
## SQL Statement Template
```sql
LOAD DATA INFILE 'file_path' INTO TABLE 'table_name' OPTIONS (key = value, ...);
```
## Hive Support
OpenMLDB supports importing data from Hive, but it requires additional settings and has certain limitations. For details, see [Hive Support](../../integration/offline_data_sources/hive.md).
## Examples
Read data from the `data.csv` file into the `t1` table in online storage. Use `,` as the column delimiter.
```sql
set @@execute_mode='online';
LOAD DATA INFILE 'data.csv' INTO TABLE t1 OPTIONS(delimiter = ',', mode = 'append');
```
Read data from the `data.csv` file into the `t1` table. Use `,` as the column delimiter, and replace the string "NA" with NULL.
```sql
LOAD DATA INFILE 'data.csv' INTO TABLE t1 OPTIONS(delimiter = ',', mode = 'append', null_value='NA');
```
Copy the `data_path` into the `t1` table as offline data.
```sql
set @@execute_mode='offline';
LOAD DATA INFILE 'data_path' INTO TABLE t1 OPTIONS(deep_copy=false);
```
Import tables from the Hive data warehouse in offline mode:
```sql
set @@execute_mode='offline';
LOAD DATA INFILE 'hive://db1.t1' INTO TABLE t1;
```
## Online Import Rules
Online import only allows `mode='append'` and cannot be used with `overwrite` or `error_if_exists`.
If the `insert_memory_usage_limit` session variable is set, the import fails when the server's memory usage exceeds the configured limit.
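A minimal sketch of an online import with a memory limit (the threshold value and file path are illustrative):

```sql
set @@execute_mode='online';
-- Fail the import if server memory usage exceeds the configured threshold
set @@insert_memory_usage_limit=80;
LOAD DATA INFILE 'data.csv' INTO TABLE t1 OPTIONS (mode = 'append');
```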
## Offline Import Rules
The offline information for a table can be viewed using `desc