AutoFE#
AutoFE selects the top features from a dataset and generates the SQL that computes them. You can use the generated SQL as your feature extraction script.
Usage#
git clone https://github.com/4paradigm/OpenMLDB.git
cd OpenMLDB/python/openmldb_autofe
pip install .
openmldb_autofe <yaml_path>
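If you prefer to launch AutoFE from a Python script rather than from the shell, the call can be wrapped roughly as follows. This is only a sketch: `build_command` and `run_autofe` are hypothetical helper names, and the only thing taken from the steps above is the `openmldb_autofe <yaml_path>` command itself.

```python
import shutil
import subprocess


def build_command(yaml_path):
    """Return the argv list for `openmldb_autofe <yaml_path>`."""
    return ["openmldb_autofe", yaml_path]


def run_autofe(yaml_path):
    """Run the AutoFE CLI on a config file (hypothetical wrapper)."""
    # Fail early with a readable message if the package is not installed.
    if shutil.which("openmldb_autofe") is None:
        raise RuntimeError(
            "openmldb_autofe not found on PATH; run `pip install .` first"
        )
    # check=True raises CalledProcessError on a non-zero exit code.
    return subprocess.run(build_command(yaml_path), check=True)


# Example (the file name is an assumption):
# run_autofe("autofe.yaml")
```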
yaml config#
A more detailed example is available in the AutoFE test yaml.
The required options are shown below:
apiserver: 127.0.0.1:9080 # AutoFE connects to OpenMLDB through the APIServer
db: demo_db # the database AutoFE uses for feature selection
tables:
  - table: t1
    schema: "id string, vendor_id int, ..., trip_duration int" # table schema
    file_path: file://... # AutoFE computes real feature values during selection, so it needs the data
  - table: t2
    ...
main_table: t1 # with a single table, set it to that table; with multiple tables, choose the main one
label: trip_duration # the label column in the main table
windows:
  - name: w1 # time window on the main table
    partition_by: vendor_id
    order_by: pickup_datetime
    window_type: rows_range
    start: 1d PRECEDING
    end: CURRENT ROW
  - name: w2 # union time window; UNION currently supports only tables with the same schema
    union: t2
    partition_by: vendor_id
    order_by: pickup_datetime
    window_type: rows_range
    start: 1d PRECEDING
    end: CURRENT ROW
# offline_feature_path: # defaults to file:///tmp/autofe_offline_feature if not set. If the OpenMLDB cluster is distributed, make sure both the TaskManager and the AutoFE process can read this path
topk: 10 # the number of top features to select
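To make the `windows` options more concrete, the sketch below renders each entry as an OpenMLDB-style `WINDOW` definition. It illustrates how the fields plausibly line up with the SQL and is not AutoFE's actual implementation; `window_clause` is a hypothetical helper, and the dictionaries simply repeat the two windows from the config above.

```python
def window_clause(w):
    """Render one `windows` entry as `<name> AS (...)` (illustrative only)."""
    # A union window pulls in rows from another table with the same schema.
    union = f"UNION {w['union']} " if w.get("union") else ""
    frame = w["window_type"].upper()  # e.g. rows_range -> ROWS_RANGE
    return (
        f"{w['name']} AS ({union}"
        f"PARTITION BY {w['partition_by']} ORDER BY {w['order_by']} "
        f"{frame} BETWEEN {w['start']} AND {w['end']})"
    )


windows = [
    {"name": "w1", "partition_by": "vendor_id", "order_by": "pickup_datetime",
     "window_type": "rows_range", "start": "1d PRECEDING", "end": "CURRENT ROW"},
    {"name": "w2", "union": "t2", "partition_by": "vendor_id",
     "order_by": "pickup_datetime", "window_type": "rows_range",
     "start": "1d PRECEDING", "end": "CURRENT ROW"},
]

# Several named windows can share one WINDOW clause, separated by commas.
sql_window = "WINDOW " + ", ".join(window_clause(w) for w in windows)
print(sql_window)
```

Under these assumptions, `w1` aggregates over the main table's rows from the previous day for the same `vendor_id`, while `w2` does the same over the rows of `t2`.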