
AutoFE selects the top features for a given dataset and generates the SQL for extracting them. You can use the generated SQL directly as the feature extraction script.


git clone <OpenMLDB repository URL>
cd python/openmldb_autofe
pip install .
openmldb_autofe <yaml_path>

YAML Configuration

For more detailed configurations, see the AutoFE test yaml.

The required options are shown below:

apiserver: # address of the OpenMLDB APIServer; AutoFE connects to OpenMLDB through it
db: demo_db # database AutoFE uses while running feature selection
tables:
  - table: t1
    schema: "id string, vendor_id int, ..., trip_duration int" # table schema
    file_path: file://... # feature selection runs on real data, so a data file is required

  - table: t2

main_table: t1 # the main table; set it even when there is only one table
label: trip_duration # the label column in main table

windows:
  - name: w1 # main table time window
    partition_by: vendor_id
    order_by: pickup_datetime
    window_type: rows_range
    start: 1d PRECEDING
    end: CURRENT ROW

  - name: w2 # union time window; UNION currently supports only tables with the same schema
    union: t2
    partition_by: vendor_id
    order_by: pickup_datetime
    window_type: rows_range
    start: 1d PRECEDING
    end: CURRENT ROW

# offline_feature_path: # defaults to file:///tmp/autofe_offline_feature if not set. If the OpenMLDB cluster is distributed, make sure both the TaskManager and the AutoFE process can read this path

topk: 10 # number of top features to select
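
To make the window options more concrete, the following sketch shows how the two window entries in the configuration above could map onto an OpenMLDB SQL WINDOW clause. It is an illustration only, not AutoFE's actual implementation: the helper `window_to_sql` and the in-memory `windows` list are hypothetical names introduced here, and the SQL that AutoFE really generates may differ in layout.

```python
def window_to_sql(win: dict) -> str:
    """Render one window entry as an OpenMLDB-style window definition.

    Hypothetical helper for illustration; not part of openmldb_autofe.
    """
    parts = []
    # A union window pulls in rows from another table with the same schema.
    if win.get("union"):
        parts.append(f"UNION {win['union']}")
    parts.append(f"PARTITION BY {win['partition_by']}")
    parts.append(f"ORDER BY {win['order_by']}")
    # window_type becomes the frame keyword, e.g. rows_range -> ROWS_RANGE.
    frame = win["window_type"].upper()
    parts.append(f"{frame} BETWEEN {win['start']} AND {win['end']}")
    return f"{win['name']} AS ({' '.join(parts)})"


# The same values as the w1 and w2 entries in the configuration above.
windows = [
    {
        "name": "w1",
        "partition_by": "vendor_id",
        "order_by": "pickup_datetime",
        "window_type": "rows_range",
        "start": "1d PRECEDING",
        "end": "CURRENT ROW",
    },
    {
        "name": "w2",
        "union": "t2",
        "partition_by": "vendor_id",
        "order_by": "pickup_datetime",
        "window_type": "rows_range",
        "start": "1d PRECEDING",
        "end": "CURRENT ROW",
    },
]

window_clause = "WINDOW " + ", ".join(window_to_sql(w) for w in windows)
print(window_clause)
```

Under these assumptions, w1 aggregates over the main table's rows from the preceding day for each vendor_id, while w2 additionally includes the matching rows from t2 in the same time range.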