EvalML Components and Pipelines

EvalML searches over and trains multiple machine learning pipelines to find the best one for your data. Each pipeline is made up of components that can learn from the data, transform the data, and ultimately predict labels for new data. Below is an example of an EvalML pipeline. You can also take a more in-depth look at components or learn how to construct and use your own pipelines.

XGBoost Pipeline

The EvalML XGBoost Pipeline is made up of four different components: a one-hot encoder, a missing value imputer, a feature selector and an XGBoost estimator. To initialize a pipeline you need a parameters dictionary.

Parameters

The parameters dictionary must be a two-level dictionary: each top-level key is a component name, and its value is that component's parameters dictionary, which maps parameter names to parameter values. An example is shown below, and component parameters can be found here.
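The two-level shape can be checked with a few lines of plain Python before the dictionary is handed to a pipeline. This is only a sketch: `check_parameters` is a hypothetical helper written for illustration and is not part of EvalML.

```python
def check_parameters(parameters):
    """Return True if `parameters` looks like {component name: {parameter name: value}}.

    Hypothetical helper for illustration only; it is not an EvalML function.
    """
    if not isinstance(parameters, dict):
        raise TypeError("parameters must be a dictionary")
    for component_name, component_parameters in parameters.items():
        # First level: component name -> component parameters dictionary.
        if not isinstance(component_name, str):
            raise TypeError(f"component name {component_name!r} must be a string")
        if not isinstance(component_parameters, dict):
            raise TypeError(f"parameters for {component_name!r} must be a dictionary")
        # Second level: parameter name -> parameter value.
        for parameter_name in component_parameters:
            if not isinstance(parameter_name, str):
                raise TypeError(f"parameter name {parameter_name!r} must be a string")
    return True


two_level = {"Simple Imputer": {"impute_strategy": "mean"}}
flat = {"impute_strategy": "mean"}  # missing the component-name level

print(check_parameters(two_level))
try:
    check_parameters(flat)
except TypeError as err:
    print("rejected:", err)
```

The flat dictionary is rejected because its values are parameter values rather than per-component dictionaries, which is the most common way to get the nesting wrong.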

[1]:
from evalml.demos import load_breast_cancer
from evalml.pipelines import XGBoostBinaryPipeline

X, y = load_breast_cancer()

# Top-level keys are component names; each value maps that
# component's parameter names to parameter values.
parameters = {
    "Simple Imputer": {
        "impute_strategy": "mean",
    },
    "RF Classifier Select From Model": {
        "percent_features": 0.5,
        "number_features": X.shape[1],
        "n_estimators": 20,
        "max_depth": 5,
    },
    "XGBoost Classifier": {
        "n_estimators": 20,
        "eta": 0.5,
        "min_child_weight": 5,
        "max_depth": 10,
    },
}

xgp = XGBoostBinaryPipeline(parameters=parameters, random_state=5)
xgp.graph()
[1]:
[Figure: graph of the XGBoost pipeline's components and their parameters (../_images/pipelines_overview_4_0.svg)]

The graph above shows each component and its parameters. Each component takes in data and passes its output to the next component. You can see more detailed information by calling .describe():

[2]:
xgp.describe()
******************************************
* XGBoost Binary Classification Pipeline *
******************************************

Problem Type: Binary Classification
Model Family: XGBoost

Pipeline Steps
==============
1. One Hot Encoder
         * top_n : 10
2. Simple Imputer
         * impute_strategy : mean
         * fill_value : None
3. XGBoost Classifier
         * eta : 0.5
         * max_depth : 10
         * min_child_weight : 5
         * n_estimators : 20
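
The way each step hands its output to the next can be illustrated with a toy pipeline in plain Python. This is a minimal sketch and not EvalML's implementation: `MeanImputer`, `ThresholdClassifier`, and `ToyPipeline` are hypothetical classes invented for this example, and the classifier's rule is deliberately trivial.

```python
class MeanImputer:
    """Toy transformer: replaces None with the column mean learned in fit."""

    def fit(self, X, y=None):
        self.means_ = []
        for j in range(len(X[0])):
            observed = [row[j] for row in X if row[j] is not None]
            self.means_.append(sum(observed) / len(observed))
        return self

    def transform(self, X):
        return [
            [self.means_[j] if value is None else value for j, value in enumerate(row)]
            for row in X
        ]


class ThresholdClassifier:
    """Toy estimator: predicts 1 when a row's sum exceeds the mean training row sum."""

    def fit(self, X, y=None):
        row_sums = [sum(row) for row in X]
        self.threshold_ = sum(row_sums) / len(row_sums)
        return self

    def predict(self, X):
        return [int(sum(row) > self.threshold_) for row in X]


class ToyPipeline:
    """Runs transformers in order, then passes the result to the estimator."""

    def __init__(self, transformers, estimator):
        self.transformers = transformers
        self.estimator = estimator

    def fit(self, X, y=None):
        for transformer in self.transformers:
            # Each transformer learns from the data, then hands on its output.
            X = transformer.fit(X, y).transform(X)
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        for transformer in self.transformers:
            X = transformer.transform(X)
        return self.estimator.predict(X)


X_toy = [[1.0, None], [3.0, 4.0], [None, 8.0]]
y_toy = [0, 0, 1]
pipeline = ToyPipeline([MeanImputer()], ThresholdClassifier()).fit(X_toy, y_toy)
print(pipeline.predict(X_toy))  # [0, 0, 1]
```

The estimator never sees the missing values, because the imputer has already filled them in by the time the data reaches it.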

You can then fit an individual pipeline and score it with an objective. An objective can be either the string name of an EvalML objective or an EvalML objective class. You can find more objectives here.

[3]:
xgp.fit(X, y)
xgp.score(X, y, objectives=['f1'])
[3]:
OrderedDict([('F1', 0.9916434540389972)])
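
Accepting either a string or a class usually comes down to a small lookup step before scoring. The sketch below shows one way that could work and computes F1 by hand on a tiny example. The `F1` class, the `OBJECTIVES` registry, and `resolve_objective` are hypothetical stand-ins written for illustration; EvalML's own lookup and objective classes may differ.

```python
class F1:
    """Toy F1 objective for binary labels in {0, 1} (not EvalML's class)."""

    name = "F1"

    @staticmethod
    def score(y_true, y_pred):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denominator = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives.
        return 2 * tp / denominator if denominator else 0.0


# Hypothetical registry mapping lowercase string names to objective classes.
OBJECTIVES = {"f1": F1}


def resolve_objective(objective):
    """Accept either a string name or an objective class and return the class."""
    if isinstance(objective, str):
        return OBJECTIVES[objective.lower()]
    return objective


y_true = [1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1]

for objective in ("f1", F1):
    resolved = resolve_objective(objective)
    print(resolved.name, resolved.score(y_true, y_pred))
```

Both loop iterations resolve to the same class, so the string form and the class form give identical scores.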