Automated Machine Learning (AutoML) Search — EvalML 0.82.0 documentation

Background

Machine Learning

Machine learning (ML) is the process of constructing a mathematical model of a system based on a sample dataset collected from that system.

One of the main goals of training an ML model is to teach the model to separate the signal present in the data from the noise inherent in the system and in the data collection process. If this is done effectively, the model can then be used to make accurate predictions about the system when presented with new, similar data. Additionally, introspecting on an ML model can reveal key information about the system being modeled, such as which inputs and transformations of the inputs are most useful to the ML model for learning the signal in the data, and are therefore the most predictive.

There are a variety of ML problem types. Supervised learning describes the case where the collected data contains an output value to be modeled and a set of inputs with which to train the model. EvalML focuses on training supervised learning models.

EvalML supports three common supervised ML problem types. The first is regression, where the target value to model is a continuous numeric value. The other two are binary and multiclass classification, where the target value consists of discrete values or categories: exactly two for binary classification and more than two for multiclass classification. The choice of which supervised ML problem type is most appropriate depends on domain expertise and on how the model will be evaluated and used.

EvalML is currently building support for supervised time series problems: time series regression, time series binary classification, and time series multiclass classification. While we’ve added some features to tackle these kinds of problems, this functionality is still under active development, so please keep that in mind before using it.

AutoML and Search

AutoML is the process of automating the construction, training and evaluation of ML models. Given a dataset and some configuration, AutoML searches for the most effective and accurate ML model or models to fit the dataset. During the search, AutoML explores different combinations of model type, model parameters and model architecture.

An effective AutoML solution offers several advantages over constructing and tuning ML models by hand. AutoML can assist with many of the difficult aspects of ML, such as avoiding overfitting and underfitting, handling imbalanced data, detecting data leakage and other potential issues with the problem setup, and automatically applying best-practice data cleaning, feature engineering, feature selection and various modeling techniques. AutoML can also use search algorithms to sweep the hyperparameter search space efficiently, often reaching model performance that would be difficult to achieve through manual tuning.
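To make the idea of sweeping a hyperparameter search space concrete, here is a minimal random-search sketch in plain Python. It is not EvalML's tuner. The search space, the toy objective and the random_search helper are hypothetical stand-ins, and random search is used only because it is the simplest strategy to show.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, int]


def random_search(
    objective: Callable[[Params], float],
    space: Dict[str, List[int]],
    n_trials: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Sample n_trials parameter sets uniformly from `space` and keep the lowest score."""
    rng = random.Random(seed)
    best_params: Params = {}
    best_score = float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)  # lower is better, like Log Loss Binary
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space and a toy objective whose minimum is at depth 6 with 100 trees.
space = {"max_depth": [2, 4, 6, 8], "n_estimators": [50, 100, 200]}


def toy_objective(p: Params) -> float:
    return (p["max_depth"] - 6) ** 2 + ((p["n_estimators"] - 100) / 50) ** 2


best_params, best_score = random_search(toy_objective, space, n_trials=200)
print(best_params, best_score)
```

Smarter tuners use the scores of earlier candidates to decide what to try next instead of sampling blindly, which usually reaches good parameters in fewer trials.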

AutoML in EvalML

EvalML supports all of the above and more.

In its simplest usage, the AutoML search interface requires only the input data, the target data and a problem_type specifying what kind of supervised ML problem to model.

** Graphing methods, such as verbose AutoMLSearch, require ipywidgets to be installed when running in Jupyter Notebook or Jupyter Lab.

** Graphing in Jupyter Lab also requires jupyterlab-plotly, and installing it requires npm.

[1]:

import evalml
from evalml.utils import infer_feature_types

X, y = evalml.demos.load_fraud(n_rows=650)

             Number of Features
Boolean                       1
Categorical                   6
Numeric                       5

Number of training examples: 650
Targets
False    86.31%
True     13.69%
Name: count, dtype: object


To provide data to EvalML, it is recommended that you initialize a Woodwork accessor on your data. This allows you to easily control how EvalML will treat each of your features before training a model.

EvalML also accepts pandas input, and will run type inference on top of the input pandas data. If you’d like to change the types inferred by EvalML, you can use the infer_feature_types utility method, which takes pandas or numpy input and converts it to a Woodwork data structure. The feature_types parameter can be used to specify what types specific columns should be.

Feature types such as Natural Language must be specified in this way. Otherwise, Woodwork will infer them as the Unknown type, and AutoMLSearch will drop them.

In the example below, we reformat the expiration_date column into a date string that can be parsed as a datetime, and then explicitly set the types of several columns. In particular, provider, which would otherwise have been inferred as a natural language column, is specified as categorical.

[2]:

X.ww["expiration_date"] = X["expiration_date"].apply(
    # Raw values look like "MM/YY"; rebuild them as "20YY-MM-01" date strings
    lambda x: "20{}-{}-01".format(x.split("/")[1], x.split("/")[0])
)
X = infer_feature_types(
    X,
    feature_types={
        "store_id": "categorical",
        "expiration_date": "datetime",
        "lat": "categorical",
        "lng": "categorical",
        "provider": "categorical",
    },
)
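The expiration_date values are card-style month/year strings. The following standalone sketch converts such a string into an ISO date that a datetime parser reads unambiguously. It assumes raw values look like "08/24" (month first, two-digit year) and that pinning the day to 01 is acceptable. card_expiry_to_iso is a hypothetical helper.

```python
from datetime import datetime


def card_expiry_to_iso(mm_yy: str) -> str:
    """Convert 'MM/YY' (e.g. '08/24') to 'YYYY-MM-01' (e.g. '2024-08-01')."""
    month, year = mm_yy.split("/")
    # Zero-pad the month so '9/20' also becomes a valid YYYY-MM-DD string.
    return "20{}-{:0>2}-01".format(year, month)


iso = card_expiry_to_iso("08/24")
parsed = datetime.strptime(iso, "%Y-%m-%d")
print(iso, parsed.year, parsed.month)  # 2024-08-01 2024 8
```

Keeping the month in the month field matters because the DateTime Featurizer extracts month and day_of_week from the parsed date.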


In order to validate the results of the pipeline creation and optimization process, we will save some of our data as a holdout set.

[3]:

X_train, X_holdout, y_train, y_holdout = evalml.preprocessing.split_data(
    X, y, problem_type="binary", test_size=0.2
)

Data Checks

Before calling AutoMLSearch.search, we should run some sanity checks on our data to catch common issues before starting a potentially time-consuming search. EvalML has various data checks that make this easy. Each data check returns a collection of warnings and errors if it detects potential issues with the input data. This lets users inspect their data and avoid confusing errors that may arise during the search process. You can learn about each of the available data checks in our data checks guide.

Here, we will run the DefaultDataChecks class, which contains a series of data checks that are generally useful.

[4]:

from evalml.data_checks import DefaultDataChecks

data_checks = DefaultDataChecks("binary", "log loss binary")
data_checks.validate(X_train, y_train)

[4]:

[]

Since there were no warnings or errors returned, we can safely continue with the search process.
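Conceptually, each data check inspects the data and returns a list of messages, and an empty list (as above) means nothing was flagged. The toy sketch below shows that contract in plain Python. toy_data_checks, its thresholds and its message format are hypothetical and much simpler than EvalML's DefaultDataChecks.

```python
from typing import Any, Dict, List, Sequence


def toy_data_checks(
    columns: Dict[str, Sequence[Any]],
    target: Sequence[Any],
    max_null_fraction: float = 0.95,
) -> List[Dict[str, str]]:
    """Return a list of {'level', 'message'} dicts; an empty list means no issues."""
    messages: List[Dict[str, str]] = []
    # A target with a single class cannot be modeled as a classification problem.
    if len(set(target)) < 2:
        messages.append({"level": "error", "message": "Target has fewer than 2 unique values."})
    # Columns that are almost entirely missing carry little signal.
    for name, values in columns.items():
        null_fraction = sum(v is None for v in values) / len(values)
        if null_fraction >= max_null_fraction:
            messages.append(
                {"level": "warning", "message": f"Column '{name}' is {null_fraction:.0%} null."}
            )
    return messages


print(toy_data_checks({"amount": [10, 20, 30, 40]}, [0, 1, 0, 1]))  # []
```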

Holdout Set for Pipeline Ranking

If the holdout_set_size parameter is set and the input dataset has more than 500 rows, AutoMLSearch will set aside a holdout set of size holdout_set_size from the training data. Alternatively, a holdout set can be specified manually with the X_holdout and y_holdout parameters of AutoMLSearch(). In this example, AutoML search will use the holdout set created previously.

During the AutoML search process, the mean of the objective scores across all cross-validation folds is calculated (shown in the “mean_cv_score” column of the pipeline rankings). This score is passed to the AutoML search tuner to further optimize the hyperparameters of the next batch of pipelines.

Afterward, each pipeline is fitted on the entire training dataset and scored on the holdout set. This score appears in the “ranking_score” column of the pipeline rankings and is used to rank pipeline performance.

If a dataset has fewer than 500 rows or holdout_set_size=0 (the default setting), AutoMLSearch does not create a holdout set. Unless one is passed in through X_holdout and y_holdout, the “mean_cv_score” is used as the ranking_score instead.
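The ranking rules above can be restated as two small functions. This is a simplified sketch, not EvalML's code. The helper names are hypothetical, and the sketch assumes that a holdout set passed through X_holdout and y_holdout is always used, as in the next example.

```python
from typing import Sequence


def uses_holdout(n_rows: int, holdout_set_size: float = 0.0, manual_holdout: bool = False) -> bool:
    """Decide whether pipelines are ranked on a holdout set."""
    if manual_holdout:  # X_holdout / y_holdout were passed in
        return True
    # Otherwise a holdout set is carved out only for >500 rows and a nonzero size.
    return holdout_set_size > 0 and n_rows > 500


def ranking_score(cv_fold_scores: Sequence[float], holdout_score: float, use_holdout: bool) -> float:
    """Return the holdout score if a holdout set is used, else the mean CV score."""
    mean_cv_score = sum(cv_fold_scores) / len(cv_fold_scores)
    return holdout_score if use_holdout else mean_cv_score


print(uses_holdout(650))                        # False: holdout_set_size defaults to 0
print(uses_holdout(650, holdout_set_size=0.2))  # True
print(uses_holdout(400, holdout_set_size=0.2))  # False: too few rows
```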

[5]:

automl = evalml.automl.AutoMLSearch(
    X_train=X_train,
    y_train=y_train,
    X_holdout=X_holdout,
    y_holdout=y_holdout,
    problem_type="binary",
    verbose=True,
)
automl.search(interactive_plot=False)

AutoMLSearch will use the holdout set to score and rank pipelines.
Removing columns ['currency'] because they are of 'Unknown' type
Using default limit of max_batches=2.


*****************************
* Beginning pipeline search *
*****************************

Optimizing for Log Loss Binary.
Lower score is better.

Using SequentialEngine to train and score pipelines.
Searching up to 2 batches for a total of None pipelines.
Allowed model families:

Evaluating Baseline Pipeline: Mode Baseline Binary Classification Pipeline
Mode Baseline Binary Classification Pipeline:
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 4.921
        Starting holdout set scoring
        Finished holdout set scoring - Log Loss Binary: 4.991

*****************************
* Evaluating Batch Number 1 *
*****************************

Random Forest Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + One Hot Encoder + Oversampler + RF Classifier Select From Model:
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.254
        Starting holdout set scoring
        Finished holdout set scoring - Log Loss Binary: 0.219

*****************************
* Evaluating Batch Number 2 *
*****************************

LightGBM Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Oversampler:
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.300
        Starting holdout set scoring
        Finished holdout set scoring - Log Loss Binary: 0.161
Extra Trees Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Oversampler:
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.361
        Starting holdout set scoring
        Finished holdout set scoring - Log Loss Binary: 0.348
Elastic Net Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Standard Scaler + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Standard Scaler + Oversampler:
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.375
        Starting holdout set scoring
        Finished holdout set scoring - Log Loss Binary: 0.400
XGBoost Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Oversampler:
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.260
        Starting holdout set scoring
        Finished holdout set scoring - Log Loss Binary: 0.167
Logistic Regression Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Standard Scaler + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Standard Scaler + Oversampler:
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.374
        Starting holdout set scoring
        Finished holdout set scoring - Log Loss Binary: 0.402

Search finished after 35.08 seconds
Best pipeline: LightGBM Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Oversampler
Best pipeline Log Loss Binary: 0.160955

[5]:

{1: {'Random Forest Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + One Hot Encoder + Oversampler + RF Classifier Select From Model': 6.49235200881958,
  'Total time of batch': 6.623929977416992},
 2: {'LightGBM Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Oversampler': 3.9312703609466553,
  'Extra Trees Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Oversampler': 5.874424695968628,
  'Elastic Net Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Standard Scaler + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Standard Scaler + Oversampler': 5.292828559875488,
  'XGBoost Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Oversampler': 4.316744089126587,
  'Logistic Regression Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Standard Scaler + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Standard Scaler + Oversampler': 7.292546272277832,
  'Total time of batch': 27.519416093826294}}

With the verbose argument set to True, the AutoML search logs its progress, reporting each pipeline and parameter set evaluated during the search. When interactive_plot=True (the example above sets it to False), a search iteration plot tracks the current pipeline’s validation score (the gray point) against the best pipeline validation score so far (the blue line).

There are several mechanisms to control the AutoML search time. One is the max_batches parameter, which controls the maximum number of rounds of AutoML to evaluate, where each round may train and score a variable number of pipelines. Another is the max_iterations parameter, which controls the maximum number of candidate pipelines to evaluate during AutoML. If neither is set, AutoML falls back to a default batch limit, as the log above reports: “Using default limit of max_batches=2.” The first pipeline to be evaluated is always a baseline model representing a trivial solution.
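The interaction between the baseline, max_batches and max_iterations can be sketched with a hypothetical driver over precomputed batches. The pipeline names are placeholders. Counting the baseline toward max_iterations is an assumption of this sketch, not documented EvalML behavior.

```python
from typing import List, Optional


def plan_search(
    batches: List[List[str]],
    max_batches: Optional[int] = None,
    max_iterations: Optional[int] = None,
) -> List[str]:
    """Return the pipelines that would be evaluated, in order."""
    evaluated = ["baseline"]  # the trivial baseline is always evaluated first
    for batch_number, batch in enumerate(batches, start=1):
        if max_batches is not None and batch_number > max_batches:
            break
        for name in batch:
            if max_iterations is not None and len(evaluated) >= max_iterations:
                return evaluated
            evaluated.append(name)
    return evaluated


# Batch sizes mirror the run above: 1 pipeline in batch 1, 5 in batch 2.
batches = [["rf"], ["lightgbm", "extra_trees", "elastic_net", "xgboost", "logistic"]]
print(len(plan_search(batches, max_batches=2)))  # 7, baseline included
print(plan_search(batches, max_iterations=3))    # ['baseline', 'rf', 'lightgbm']
```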

The AutoML interface supports a variety of other parameters. For a comprehensive list, please refer to the API reference.

We also provide a standalone search method which does all of the above in a single line, and returns the AutoMLSearch instance and data check results. If there were data check errors, AutoML will not be run and no AutoMLSearch instance will be returned.

Detecting Problem Type

EvalML includes a simple method, detect_problem_type, to help determine the problem type given the target data.

This function returns the predicted problem type as a ProblemTypes enum member: ProblemTypes.BINARY, ProblemTypes.MULTICLASS, or ProblemTypes.REGRESSION. If the target data is invalid (for instance, when there is only one unique label), the function raises an error instead.

[6]:

import pandas as pd
from evalml.problem_types import detect_problem_type

y_binary = pd.Series([0, 1, 1, 0, 1, 1])
detect_problem_type(y_binary)

[6]:

<ProblemTypes.BINARY: 'binary'>
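The following simplified heuristic in plain Python illustrates the kind of reasoning involved. It is not EvalML's implementation. The function name, the rules for numeric targets and the threshold of 10 classes are assumptions chosen for illustration, and missing values are not handled.

```python
from typing import Any, Sequence


def sketch_detect_problem_type(target: Sequence[Any], max_multiclass_classes: int = 10) -> str:
    """Guess 'binary', 'multiclass' or 'regression' from target values."""
    values = list(target)
    n_unique = len(set(values))
    if n_unique < 2:
        raise ValueError("Target has fewer than 2 unique values; cannot detect a problem type.")
    if n_unique == 2:
        return "binary"
    is_numeric = all(isinstance(v, (int, float)) for v in values)
    if not is_numeric:
        return "multiclass"  # three or more string labels
    has_fractions = any(not float(v).is_integer() for v in values)
    if has_fractions or n_unique > max_multiclass_classes:
        return "regression"  # continuous, or too many distinct integers to treat as classes
    return "multiclass"  # a handful of integer labels


print(sketch_detect_problem_type([0, 1, 1, 0, 1, 1]))  # binary
```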

Objective parameter

AutoMLSearch takes in an objective parameter to determine which objective to optimize for. By default, this parameter is set to auto, which allows AutoML to choose LogLossBinary for binary classification problems, LogLossMulticlass for multiclass classification problems, and R2 for regression problems.

Note that the objective parameter is used only to rank pipelines and to help choose which pipelines to iterate over. It is not used to optimize each individual pipeline at fit time.
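Log Loss Binary is the default binary objective, and the search log notes that a lower score is better. The minimal plain-Python sketch below shows how it is computed. Clipping probabilities to [eps, 1 - eps] with eps=1e-15 is an assumption borrowed from common implementations, not something this page specifies.

```python
import math
from typing import Sequence


def log_loss_binary(y_true: Sequence[int], p_positive: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the true labels under predicted P(y=1)."""
    total = 0.0
    for y, p in zip(y_true, p_positive):
        p = min(max(p, eps), 1 - eps)  # avoid log(0) for fully confident predictions
        total += -math.log(p) if y == 1 else -math.log(1 - p)
    return total / len(y_true)


print(round(log_loss_binary([1, 0], [0.9, 0.2]), 4))  # 0.1643
print(round(log_loss_binary([1], [0.0]), 2))          # 34.54: a confident wrong answer is costly
```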

To get the default objective for each problem type, you can use the get_default_primary_search_objective function.

[7]:

from evalml.automl import get_default_primary_search_objective

binary_objective = get_default_primary_search_objective("binary")
multiclass_objective = get_default_primary_search_objective("multiclass")
regression_objective = get_default_primary_search_objective("regression")

print(binary_objective.name)
print(multiclass_objective.name)
print(regression_objective.name)

Log Loss Binary
Log Loss Multiclass
R2

Using custom pipelines

EvalML’s AutoML algorithm generates a set of pipelines to search over. To provide a custom set instead, set allowed_component_graphs to a dictionary that maps pipeline names to component graphs (here, lists of component names). AutoMLSearch will use these to generate Pipeline instances. Note: this prevents AutoML from generating any other pipelines to search over.

[8]:

automl_custom = evalml.automl.AutoMLSearch(
    X_train=X_train,
    y_train=y_train,
    problem_type="binary",  # the fraud target has two classes
    verbose=True,
    allowed_component_graphs={
        "My_pipeline": ["Simple Imputer", "Random Forest Classifier"],
        "My_other_pipeline": ["One Hot Encoder", "Random Forest Classifier"],
    },
)

AutoMLSearch will use mean CV score to rank pipelines.
Removing columns ['currency'] because they are of 'Unknown' type
Using default limit of max_batches=2.

Stopping the search early

To stop the search early, hit Ctrl-C. This will bring up a prompt asking for confirmation. Responding with y will immediately stop the search. Responding with n will continue the search.

Callback functions

AutoMLSearch supports several callback functions, which can be specified as parameters when initializing an AutoMLSearch object. They are:

start_iteration_callback
add_result_callback
error_callback

Start Iteration Callback

Use start_iteration_callback to specify a function to call before each pipeline training iteration. This callback function must take three positional parameters: the pipeline class, the pipeline parameters, and the AutoMLSearch object.

[9]:

## start_iteration_callback example function
def start_iteration_callback_example(pipeline_class, pipeline_params, automl_obj):
    print("Training pipeline with the following parameters:", pipeline_params)

Add Result Callback

Use add_result_callback to specify a function to call after each pipeline training iteration. This callback function must take three positional parameters: a dictionary containing the training results for the new pipeline, an untrained_pipeline containing the parameters used during training, and the AutoMLSearch object.

[10]:

## add_result_callback example function
def add_result_callback_example(pipeline_results_dict, untrained_pipeline, automl_obj):
    print(
        "Results for trained pipeline with the following parameters:",
        pipeline_results_dict,
    )

Error Callback

Use error_callback to specify a function to call when search() encounters an error and raises an Exception. This callback function takes three positional parameters: the Exception raised, the traceback, and the AutoMLSearch object. It must also accept **kwargs so that AutoMLSearch can pass along other parameters it uses by default.

EvalML defines several error callback functions, which can be found under evalml.automl.callbacks. They are:

silent_error_callback
raise_error_callback
log_and_save_error_callback
raise_and_save_error_callback
log_error_callback (default used when error_callback is None)

[11]:

# error_callback example; this is implemented in the evalml library
import logging

logger = logging.getLogger(__name__)


def raise_error_callback(exception, traceback, automl, **kwargs):
    """Raises the exception thrown by the AutoMLSearch object. Also logs the exception as an error."""
    logger.error(f"AutoMLSearch raised a fatal exception: {str(exception)}")
    logger.error("\n".join(traceback))
    raise exception
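The order in which the three callbacks fire can be illustrated with a hypothetical search loop. toy_search is not part of EvalML. In this sketch, a pipeline name stands in for the pipeline class, a (name, params) tuple stands in for the untrained pipeline, and a plain object() stands in for the AutoMLSearch instance. The traceback is passed as a list of strings, which matches how raise_error_callback above joins it.

```python
import traceback
from typing import Any, Callable, Dict, List, Optional, Tuple

Candidate = Tuple[str, Dict[str, Any], Callable[[Dict[str, Any]], float]]


def toy_search(
    candidates: List[Candidate],
    start_iteration_callback: Optional[Callable] = None,
    add_result_callback: Optional[Callable] = None,
    error_callback: Optional[Callable] = None,
) -> None:
    automl_obj = object()  # stand-in for the AutoMLSearch instance
    for name, params, fit_and_score in candidates:
        if start_iteration_callback is not None:
            start_iteration_callback(name, params, automl_obj)
        try:
            score = fit_and_score(params)
        except Exception as exc:
            if error_callback is None:
                raise
            tb_lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
            error_callback(exc, tb_lines, automl_obj, pipeline_name=name)
            continue  # keep searching if the callback did not re-raise
        if add_result_callback is not None:
            add_result_callback({"name": name, "score": score}, (name, params), automl_obj)


events = []
toy_search(
    [("good", {"max_depth": 6}, lambda p: 0.25), ("broken", {}, lambda p: 1 / 0)],
    start_iteration_callback=lambda cls, params, automl: events.append(("start", cls)),
    add_result_callback=lambda results, pipeline, automl: events.append(("result", results["name"])),
    error_callback=lambda exc, tb, automl, **kwargs: events.append(("error", type(exc).__name__)),
)
print(events)
```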

View Rankings

A summary of all the pipelines built can be returned as a pandas DataFrame, sorted by ranking_score from best to worst.

For AutoML searches completed with a holdout set, the ranking score is the holdout score of the pipeline fitted on the entire training dataset.
For AutoML searches completed without a holdout set, the ranking score is the average score across all cross-validation folds.

[12]:

automl.rankings

[12]:

	id	pipeline_name	search_order	ranking_score	holdout_score	mean_cv_score	standard_deviation_cv_score	percent_better_than_baseline	high_variance_cv	parameters
0	2	LightGBM Classifier w/ Label Encoder + Select ...	2	0.160955	0.160955	0.299971	0.206176	93.904575	False	{'Label Encoder': {'positive_label': None}, 'N...
1	5	XGBoost Classifier w/ Label Encoder + Select C...	5	0.166761	0.166761	0.260214	0.148578	94.712440	False	{'Label Encoder': {'positive_label': None}, 'N...
2	1	Random Forest Classifier w/ Label Encoder + Dr...	1	0.219145	0.219145	0.254382	0.045124	94.830946	False	{'Label Encoder': {'positive_label': None}, 'D...
3	3	Extra Trees Classifier w/ Label Encoder + Sele...	3	0.348408	0.348408	0.361341	0.021758	92.657543	False	{'Label Encoder': {'positive_label': None}, 'N...
4	4	Elastic Net Classifier w/ Label Encoder + Sele...	4	0.400375	0.400375	0.374725	0.050027	92.385573	False	{'Label Encoder': {'positive_label': None}, 'N...
5	6	Logistic Regression Classifier w/ Label Encode...	6	0.401581	0.401581	0.374364	0.049925	92.392914	False	{'Label Encoder': {'positive_label': None}, 'N...
6	0	Mode Baseline Binary Classification Pipeline	0	4.990660	4.990660	4.921248	0.112910	0.000000	False	{'Label Encoder': {'positive_label': None}, 'B...
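The percent_better_than_baseline values in this table can be reproduced by comparing each pipeline's mean_cv_score (not its holdout score) with the baseline's mean_cv_score. The sketch below assumes a lower-is-better objective such as Log Loss Binary. The function name is hypothetical.

```python
def percent_better_than_baseline(score: float, baseline_score: float) -> float:
    """Relative improvement over the baseline, in percent, when lower scores are better."""
    return (baseline_score - score) / abs(baseline_score) * 100


baseline_cv = 4.921248  # Mode Baseline mean_cv_score from the table
print(round(percent_better_than_baseline(0.299971, baseline_cv), 2))  # LightGBM: 93.9
print(round(percent_better_than_baseline(0.254382, baseline_cv), 2))  # Random Forest: 94.83
```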

Recommendation Score

If you would like a more robust evaluation of the performance of your models, EvalML additionally provides a recommendation score alongside the selected objective. The recommendation score is a weighted average of a number of default objectives for your problem type, normalized and scaled so that the final score can be interpreted as a percentage from 0 to 100. This weighted score provides a more holistic understanding of model performance, and prioritizes model generalizability rather than one single objective which may not completely serve your use case.

[13]:

automl.get_recommendation_scores(use_pipeline_names=True)

[13]:

{'Baseline Classifier': 25.0,
 'Random Forest Classifier': 89.2028059447534,
 'LightGBM Classifier': 91.29441485901573,
 'Extra Trees Classifier': 76.4891509448369,
 'Elastic Net Classifier': 64.98618569828929,
 'XGBoost Classifier': 90.93050768077345,
 'Logistic Regression Classifier': 64.88094236798518}

[14]:

automl.get_recommendation_scores(priority="F1", use_pipeline_names=True)

[14]:

{'Baseline Classifier': 16.666666666666664,
 'Random Forest Classifier': 87.42552654381409,
 'LightGBM Classifier': 90.02960990601049,
 'Extra Trees Classifier': 68.38407164438401,
 'Elastic Net Classifier': 53.893229489916436,
 'XGBoost Classifier': 89.78700512051563,
 'Logistic Regression Classifier': 53.8230672697137}

To see what objectives are included in the recommendation score, you can use:

[15]:

evalml.objectives.get_default_recommendation_objectives("binary")

[15]:

{'AUC', 'Balanced Accuracy Binary', 'F1', 'Log Loss Binary'}

If you would like to automatically rank your pipelines by this recommendation score, you can set use_recommendation=True when initializing AutoMLSearch.

[16]:

automl_recommendation = evalml.automl.AutoMLSearch(
    X_train=X_train,
    y_train=y_train,
    X_holdout=X_holdout,
    y_holdout=y_holdout,
    problem_type="binary",
    use_recommendation=True,
)
automl_recommendation.search(interactive_plot=False)

automl_recommendation.rankings[
    [
        "id",
        "pipeline_name",
        "search_order",
        "recommendation_score",
        "holdout_score",
        "mean_cv_score",
    ]
]

[16]:

	id	pipeline_name	search_order	recommendation_score	holdout_score	mean_cv_score
0	2	LightGBM Classifier w/ Label Encoder + Select ...	2	91.294415	0.160955	0.299971
1	5	XGBoost Classifier w/ Label Encoder + Select C...	5	90.930508	0.166761	0.260214
2	1	Random Forest Classifier w/ Label Encoder + Dr...	1	89.202806	0.219145	0.254382
3	3	Extra Trees Classifier w/ Label Encoder + Sele...	3	76.489151	0.348408	0.361341
4	4	Elastic Net Classifier w/ Label Encoder + Sele...	4	64.986186	0.400375	0.374725
5	6	Logistic Regression Classifier w/ Label Encode...	6	64.880942	0.401581	0.374364
6	0	Mode Baseline Binary Classification Pipeline	0	25.000000	4.990660	4.921248

There is a helper function on the AutoMLSearch object, get_recommendation_score_breakdown, that helps you understand how the recommendation score was calculated. It displays the raw scores of the objectives included in the score calculation. Here, we look at the pipeline with id=3, the Extra Trees pipeline:

[17]:

automl_recommendation.get_recommendation_score_breakdown(3)

[17]:

{'F1': 0.5217391304347826,
 'Balanced Accuracy Binary': 0.7619047619047619,
 'Log Loss Binary': 0.3484078428021002,
 'AUC': 0.845734126984127}
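The recommendation scores above can be reproduced from this breakdown with a weighted average. This is a sketch under assumptions inferred from the outputs on this page, not EvalML's documented formula. It makes three assumptions:

* Each objective is mapped to [0, 1], where higher is better.
* Log loss is normalized as 1 - log_loss / baseline_log_loss, using the baseline's holdout log loss of 4.990660.
* priority="F1" gives F1 a weight of 0.5 and splits the remaining 0.5 evenly across the other objectives.

recommendation_score is a hypothetical helper.

```python
from typing import Dict, Optional


def recommendation_score(
    normalized: Dict[str, float], priority: Optional[str] = None, priority_weight: float = 0.5
) -> float:
    """Weighted average of normalized objective scores, scaled to 0-100."""
    names = list(normalized)
    if priority is None:
        weights = {name: 1 / len(names) for name in names}  # equal weights
    else:
        others = [name for name in names if name != priority]
        weights = {name: (1 - priority_weight) / len(others) for name in others}
        weights[priority] = priority_weight
    return 100 * sum(weights[name] * normalized[name] for name in names)


# Raw breakdown for pipeline id=3 (Extra Trees), with log loss normalized against the baseline.
extra_trees = {
    "F1": 0.5217391304347826,
    "Balanced Accuracy Binary": 0.7619047619047619,
    "Log Loss Binary": 1 - 0.3484078428021002 / 4.990660,
    "AUC": 0.845734126984127,
}
print(round(recommendation_score(extra_trees), 2))                 # 76.49
print(round(recommendation_score(extra_trees, priority="F1"), 2))  # 68.38
```

Under the same assumptions, a constant baseline scores 25.0, or 16.67 with priority="F1". This uses F1 0, balanced accuracy 0.5, AUC 0.5 and normalized log loss 0, and it matches the Baseline Classifier entries above.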

Describe Pipeline

Each pipeline is given an id. We can get more information about any particular pipeline using that id. Here, we will get more information about the pipeline with id = 1.

[18]:

automl.describe_pipeline(1)


**************************************************************************************************************************************************************************
* Random Forest Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + One Hot Encoder + Oversampler + RF Classifier Select From Model *
**************************************************************************************************************************************************************************

Problem Type: binary
Model Family: Random Forest

Pipeline Steps
==============
1. Label Encoder
         * positive_label : None
2. Drop Columns Transformer
         * columns : ['currency']
3. DateTime Featurizer
         * features_to_extract : ['year', 'month', 'day_of_week', 'hour']
         * encode_as_categories : False
         * time_index : None
4. Imputer
         * categorical_impute_strategy : most_frequent
         * numeric_impute_strategy : mean
         * boolean_impute_strategy : most_frequent
         * categorical_fill_value : None
         * numeric_fill_value : None
         * boolean_fill_value : None
5. One Hot Encoder
         * top_n : 10
         * features_to_encode : None
         * categories : None
         * drop : if_binary
         * handle_unknown : ignore
         * handle_missing : error
6. Oversampler
         * sampling_ratio : 0.25
         * k_neighbors_default : 5
         * n_jobs : -1
         * sampling_ratio_dict : None
         * categorical_features : [3, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
         * k_neighbors : 5
7. RF Classifier Select From Model
         * number_features : None
         * n_estimators : 10
         * max_depth : None
         * percent_features : 0.5
         * threshold : median
         * n_jobs : -1
8. Random Forest Classifier
         * n_estimators : 100
         * max_depth : 6
         * n_jobs : -1

Training
========
Training for binary problems.
Total training time (including CV): 6.5 seconds

Cross Validation
----------------
             Log Loss Binary  MCC Binary  Gini   AUC  Precision    F1  Balanced Accuracy Binary  Accuracy Binary # Training # Validation
0                      0.240       0.823 0.844 0.922      1.000 0.829                     0.854            0.960        346          174
1                      0.305       0.524 0.493 0.747      1.000 0.467                     0.652            0.908        347          173
2                      0.218       0.875 0.839 0.920      1.000 0.884                     0.896            0.971        347          173
mean                   0.254       0.741 0.726 0.863      1.000 0.727                     0.801            0.946          -            -
std                    0.045       0.189 0.201 0.101      0.000 0.227                     0.130            0.034          -            -
coef of var            0.177       0.255 0.277 0.117      0.000 0.312                     0.163            0.036          -            -
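The mean, std and coef of var rows summarize the three folds. The std row matches the sample standard deviation (ddof=1, the pandas default) rather than the population one, and coef of var is std divided by mean. The stdlib-only sketch below reproduces these rows from the rounded per-fold scores copied from the table. The helper name `summarize` is illustrative only. The table was presumably computed from unrounded scores, so the third decimal can differ slightly.

```python
from statistics import mean, stdev

# Per-fold scores copied (already rounded) from the Cross Validation table above
fold_scores = {
    "Log Loss Binary": [0.240, 0.305, 0.218],
    "MCC Binary": [0.823, 0.524, 0.875],
    "AUC": [0.922, 0.747, 0.920],
}


def summarize(scores):
    """Return (mean, sample std, coefficient of variation) for a list of fold scores."""
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation, i.e. ddof=1
    return m, s, s / m


summary = {name: summarize(values) for name, values in fold_scores.items()}
for name, (m, s, cv) in summary.items():
    print(f"{name:16s} mean={m:.3f} std={s:.3f} coef of var={cv:.3f}")
```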

Get Pipeline

We can also retrieve any pipeline object by its id:

[19]:

pipeline = automl.get_pipeline(1)
print(pipeline.name)
print(pipeline.parameters)

Random Forest Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + One Hot Encoder + Oversampler + RF Classifier Select From Model
{'Label Encoder': {'positive_label': None}, 'Drop Columns Transformer': {'columns': ['currency']}, 'DateTime Featurizer': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False, 'time_index': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Oversampler': {'sampling_ratio': 0.25, 'k_neighbors_default': 5, 'n_jobs': -1, 'sampling_ratio_dict': None, 'categorical_features': [3, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49], 'k_neighbors': 5}, 'RF Classifier Select From Model': {'number_features': None, 'n_estimators': 10, 'max_depth': None, 'percent_features': 0.5, 'threshold': 'median', 'n_jobs': -1}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}

Get best pipeline

If you specifically want the best pipeline, use the convenient best_pipeline accessor. The returned pipeline has already been fitted on the X and y data passed to AutoMLSearch. To turn off this default behavior, set train_best_pipeline=False when initializing AutoMLSearch.

[20]:

best_pipeline = automl.best_pipeline
print(best_pipeline.name)
print(best_pipeline.parameters)
best_pipeline.predict(X_train)

LightGBM Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Oversampler
{'Label Encoder': {'positive_label': None}, 'Numeric Pipeline - Select Columns By Type Transformer': {'column_types': ['category', 'EmailAddress', 'URL'], 'exclude': True}, 'Numeric Pipeline - Label Encoder': {'positive_label': None}, 'Numeric Pipeline - Drop Columns Transformer': {'columns': ['currency']}, 'Numeric Pipeline - DateTime Featurizer': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False, 'time_index': None}, 'Numeric Pipeline - Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Numeric Pipeline - Select Columns Transformer': {'columns': ['card_id', 'store_id', 'amount', 'customer_present', 'lat', 'lng', 'datetime_month', 'datetime_day_of_week', 'datetime_hour']}, 'Categorical Pipeline - Select Columns Transformer': {'columns': ['expiration_date', 'provider', 'region', 'country']}, 'Categorical Pipeline - Label Encoder': {'positive_label': None}, 'Categorical Pipeline - Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Categorical Pipeline - One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Oversampler': {'sampling_ratio': 0.25, 'k_neighbors_default': 5, 'n_jobs': -1, 'sampling_ratio_dict': None, 'categorical_features': [3, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48], 'k_neighbors': 5}, 'LightGBM Classifier': {'boosting_type': 'gbdt', 'learning_rate': 0.1, 'n_estimators': 100, 'max_depth': 0, 'num_leaves': 31, 'min_child_samples': 20, 
'n_jobs': -1, 'bagging_freq': 0, 'bagging_fraction': 0.9, 'verbose': -1}}

[20]:

id
144    False
253     True
221    False
432    False
384    False
       ...
128    False
98     False
472    False
642    False
494    False
Name: fraud, Length: 520, dtype: bool

Training and Scoring Multiple Pipelines using AutoMLSearch

AutoMLSearch will automatically fit the best pipeline on the entire training data. It also provides an easy API for training and scoring other pipelines.

If you’d like to train one or more pipelines on the entire training data, you can use the train_pipelines method.

Similarly, if you’d like to score one or more pipelines on a particular dataset, you can use the score_pipelines method.

[21]:

trained_pipelines = automl.train_pipelines([automl.get_pipeline(i) for i in [0, 1, 2]])
trained_pipelines

[21]:

{'Mode Baseline Binary Classification Pipeline': pipeline = BinaryClassificationPipeline(component_graph={'Label Encoder': ['Label Encoder', 'X', 'y'], 'Baseline Classifier': ['Baseline Classifier', 'Label Encoder.x', 'Label Encoder.y']}, parameters={'Label Encoder':{'positive_label': None}, 'Baseline Classifier':{'strategy': 'mode'}}, custom_name='Mode Baseline Binary Classification Pipeline', random_seed=0),
 'Random Forest Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + One Hot Encoder + Oversampler + RF Classifier Select From Model': pipeline = BinaryClassificationPipeline(component_graph={'Label Encoder': ['Label Encoder', 'X', 'y'], 'Drop Columns Transformer': ['Drop Columns Transformer', 'X', 'Label Encoder.y'], 'DateTime Featurizer': ['DateTime Featurizer', 'Drop Columns Transformer.x', 'Label Encoder.y'], 'Imputer': ['Imputer', 'DateTime Featurizer.x', 'Label Encoder.y'], 'One Hot Encoder': ['One Hot Encoder', 'Imputer.x', 'Label Encoder.y'], 'Oversampler': ['Oversampler', 'One Hot Encoder.x', 'Label Encoder.y'], 'RF Classifier Select From Model': ['RF Classifier Select From Model', 'Oversampler.x', 'Oversampler.y'], 'Random Forest Classifier': ['Random Forest Classifier', 'RF Classifier Select From Model.x', 'Oversampler.y']}, parameters={'Label Encoder':{'positive_label': None}, 'Drop Columns Transformer':{'columns': ['currency']}, 'DateTime Featurizer':{'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False, 'time_index': None}, 'Imputer':{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'One Hot Encoder':{'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Oversampler':{'sampling_ratio': 0.25, 'k_neighbors_default': 5, 'n_jobs': -1, 'sampling_ratio_dict': None, 'categorical_features': [3, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49], 'k_neighbors': 5}, 'RF Classifier Select From Model':{'number_features': None, 'n_estimators': 10, 'max_depth': None, 'percent_features': 0.5, 'threshold': 'median', 'n_jobs': -1}, 
'Random Forest Classifier':{'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}, random_seed=0),
 'LightGBM Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Oversampler': pipeline = BinaryClassificationPipeline(component_graph={'Label Encoder': ['Label Encoder', 'X', 'y'], 'Numeric Pipeline - Select Columns By Type Transformer': ['Select Columns By Type Transformer', 'X', 'Label Encoder.y'], 'Numeric Pipeline - Label Encoder': ['Label Encoder', 'Numeric Pipeline - Select Columns By Type Transformer.x', 'Label Encoder.y'], 'Numeric Pipeline - Drop Columns Transformer': ['Drop Columns Transformer', 'Numeric Pipeline - Select Columns By Type Transformer.x', 'Numeric Pipeline - Label Encoder.y'], 'Numeric Pipeline - DateTime Featurizer': ['DateTime Featurizer', 'Numeric Pipeline - Drop Columns Transformer.x', 'Numeric Pipeline - Label Encoder.y'], 'Numeric Pipeline - Imputer': ['Imputer', 'Numeric Pipeline - DateTime Featurizer.x', 'Numeric Pipeline - Label Encoder.y'], 'Numeric Pipeline - Select Columns Transformer': ['Select Columns Transformer', 'Numeric Pipeline - Imputer.x', 'Numeric Pipeline - Label Encoder.y'], 'Categorical Pipeline - Select Columns Transformer': ['Select Columns Transformer', 'X', 'Label Encoder.y'], 'Categorical Pipeline - Label Encoder': ['Label Encoder', 'Categorical Pipeline - Select Columns Transformer.x', 'Label Encoder.y'], 'Categorical Pipeline - Imputer': ['Imputer', 'Categorical Pipeline - Select Columns Transformer.x', 'Categorical Pipeline - Label Encoder.y'], 'Categorical Pipeline - One Hot Encoder': ['One Hot Encoder', 'Categorical Pipeline - Imputer.x', 'Categorical Pipeline - Label Encoder.y'], 'Oversampler': ['Oversampler', 'Numeric Pipeline - Select Columns Transformer.x', 'Categorical Pipeline - One Hot Encoder.x', 'Categorical Pipeline - Label Encoder.y'], 'LightGBM Classifier': ['LightGBM Classifier', 'Oversampler.x', 
'Oversampler.y']}, parameters={'Label Encoder':{'positive_label': None}, 'Numeric Pipeline - Select Columns By Type Transformer':{'column_types': ['category', 'EmailAddress', 'URL'], 'exclude': True}, 'Numeric Pipeline - Label Encoder':{'positive_label': None}, 'Numeric Pipeline - Drop Columns Transformer':{'columns': ['currency']}, 'Numeric Pipeline - DateTime Featurizer':{'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False, 'time_index': None}, 'Numeric Pipeline - Imputer':{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Numeric Pipeline - Select Columns Transformer':{'columns': ['card_id', 'store_id', 'amount', 'customer_present', 'lat', 'lng', 'datetime_month', 'datetime_day_of_week', 'datetime_hour']}, 'Categorical Pipeline - Select Columns Transformer':{'columns': ['expiration_date', 'provider', 'region', 'country']}, 'Categorical Pipeline - Label Encoder':{'positive_label': None}, 'Categorical Pipeline - Imputer':{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Categorical Pipeline - One Hot Encoder':{'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Oversampler':{'sampling_ratio': 0.25, 'k_neighbors_default': 5, 'n_jobs': -1, 'sampling_ratio_dict': None, 'categorical_features': [3, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48], 'k_neighbors': 5}, 'LightGBM Classifier':{'boosting_type': 'gbdt', 'learning_rate': 0.1, 'n_estimators': 100, 'max_depth': 0, 'num_leaves': 31, 
'min_child_samples': 20, 'n_jobs': -1, 'bagging_freq': 0, 'bagging_fraction': 0.9, 'verbose': -1}}, random_seed=0)}
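Each pipeline repr above includes a component_graph. This is a dict that maps each component name to a list. The first element of the list is the component type, and the remaining elements name its inputs. An input is either the raw "X" or "y", or another component's output such as "Imputer.x". The sketch below is plain Python, not EvalML's own ComponentGraph class. It derives an execution order from the Random Forest pipeline's graph using Kahn's topological sort. The helper names `upstream` and `execution_order` are hypothetical.

```python
from collections import deque

# component_graph copied from the Random Forest pipeline repr above:
# name -> [component type, X input, y input]; inputs are "X", "y" or "<name>.x" / "<name>.y"
component_graph = {
    "Label Encoder": ["Label Encoder", "X", "y"],
    "Drop Columns Transformer": ["Drop Columns Transformer", "X", "Label Encoder.y"],
    "DateTime Featurizer": ["DateTime Featurizer", "Drop Columns Transformer.x", "Label Encoder.y"],
    "Imputer": ["Imputer", "DateTime Featurizer.x", "Label Encoder.y"],
    "One Hot Encoder": ["One Hot Encoder", "Imputer.x", "Label Encoder.y"],
    "Oversampler": ["Oversampler", "One Hot Encoder.x", "Label Encoder.y"],
    "RF Classifier Select From Model": ["RF Classifier Select From Model", "Oversampler.x", "Oversampler.y"],
    "Random Forest Classifier": ["Random Forest Classifier", "RF Classifier Select From Model.x", "Oversampler.y"],
}


def upstream(inputs):
    """Return the names of the components whose outputs feed these inputs."""
    return {ref.rsplit(".", 1)[0] for ref in inputs if ref not in ("X", "y")}


def execution_order(graph):
    """Order components so each one runs after all of its inputs (Kahn's algorithm)."""
    deps = {name: upstream(spec[1:]) for name, spec in graph.items()}
    children = {name: [] for name in graph}
    for name, parents in deps.items():
        for parent in parents:
            children[parent].append(name)
    pending = {name: len(parents) for name, parents in deps.items()}
    ready = deque(name for name in graph if pending[name] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(graph):
        raise ValueError("component graph has a cycle")
    return order


order = execution_order(component_graph)
for step, name in enumerate(order, start=1):
    print(f"{step}. {name} <- {component_graph[name][1:]}")
```

The resulting order matches the numbered steps printed by describe_pipeline. Note that both final components read y from "Oversampler.y" rather than "Label Encoder.y". This is because oversampling adds rows, so X and y must come from the same step.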

[22]:

pipeline_holdout_scores = automl.score_pipelines(
    list(trained_pipelines.values()),
    X_holdout,
    y_holdout,
    ["Accuracy Binary", "F1", "AUC"],
)
pipeline_holdout_scores

[22]:

{'Mode Baseline Binary Classification Pipeline': OrderedDict([('Accuracy Binary',
               0.8615384615384616),
              ('F1', 0.0),
              ('AUC', 0.5)]),
 'Random Forest Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + One Hot Encoder + Oversampler + RF Classifier Select From Model': OrderedDict([('Accuracy Binary',
               0.9615384615384616),
              ('F1', 0.8387096774193548),
              ('AUC', 0.9122023809523809)]),
 'LightGBM Classifier w/ Label Encoder + Select Columns By Type Transformer + Label Encoder + Drop Columns Transformer + DateTime Featurizer + Imputer + Select Columns Transformer + Select Columns Transformer + Label Encoder + Imputer + One Hot Encoder + Oversampler': OrderedDict([('Accuracy Binary',
               0.9692307692307692),
              ('F1', 0.8750000000000001),
              ('AUC', 0.9201388888888888)])}
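The mode baseline's scores follow from always predicting the majority class:

- Accuracy equals the majority-class share.
- F1 is 0 because no positives are ever predicted.
- AUC is 0.5 because every row receives the same score.

The stdlib-only sketch below illustrates this. It assumes a 130-row holdout with 112 negatives and 18 positives. That split is not shown in this section; it is inferred because 0.8615384615384616 equals 112/130, and 130 rows is consistent with the 520 training rows above under an 80/20 split. The metric helpers are simplified illustrations, not EvalML's objective implementations.

```python
# ASSUMED holdout labels: 112 negatives and 18 positives (112 / 130 == 0.8615384615384616)
y_true = [False] * 112 + [True] * 18

# A mode baseline predicts the majority class for every row
majority = max(set(y_true), key=y_true.count)
y_pred = [majority] * len(y_true)
y_score = [0.0] * len(y_true)  # the same positive-class probability for every row


def accuracy(t, p):
    return sum(a == b for a, b in zip(t, p)) / len(t)


def f1(t, p):
    tp = sum(a and b for a, b in zip(t, p))
    fp = sum((not a) and b for a, b in zip(t, p))
    fn = sum(a and (not b) for a, b in zip(t, p))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def auc(t, s):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [x for x, label in zip(s, t) if label]
    neg = [x for x, label in zip(s, t) if not label]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


baseline = (accuracy(y_true, y_pred), f1(y_true, y_pred), auc(y_true, y_score))
print(baseline)
```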

Saving AutoMLSearch and pipelines from AutoMLSearch

There are two ways to save results from AutoMLSearch.

First, you can save the AutoMLSearch object itself by calling .save(<filepath>). This preserves the search state, from which all pipelines can be reloaded.
Second, to save a single pipeline from AutoMLSearch for future use, call that pipeline's own .save(<filepath>) method. Both kinds of object are reloaded with the corresponding .load(<filepath>) class method.
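The save/load round trip can be pictured with Python's standard pickle module. This is a stand-in only. EvalML's .save() serializes the whole object, and the .cloudpickle file extension suggests cloudpickle is used, which this section does not state outright. The payload here is a small parameter dict copied from the best pipeline printed above. The helpers `save_obj` and `load_obj` are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in payload: a subset of the best pipeline's parameters printed above
params = {
    "Oversampler": {"sampling_ratio": 0.25, "k_neighbors_default": 5},
    "LightGBM Classifier": {"boosting_type": "gbdt", "learning_rate": 0.1, "n_estimators": 100},
}


def save_obj(obj, path):
    """Serialize obj to a binary file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_obj(path):
    """Deserialize an object previously written by save_obj."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.pkl")
    save_obj(params, path)
    restored = load_obj(path)

# The reloaded object is equal in value but is a new object
print(restored == params, restored is params)
```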

[23]:

# saving the entire automl search
automl.save("automl.cloudpickle")
automl2 = evalml.automl.AutoMLSearch.load("automl.cloudpickle")
# saving the best pipeline using .save()
best_pipeline.save("pipeline.cloudpickle")
best_pipeline_copy = evalml.pipelines.PipelineBase.load("pipeline.cloudpickle")

Limiting the AutoML Search Space

The AutoML search algorithm first trains each component in the pipeline with its default parameter values. After the first iteration, it tunes the parameters of these components within each component's predefined hyperparameter ranges. To restrict the search to particular hyperparameter ranges, pass a search_parameters argument to AutoMLSearch. These values limit the hyperparameter search space or the pipeline parameter space.

The hyperparameter ranges for each component are listed in its API reference. Parameters must be passed as nested dictionaries keyed by component name. To define a hyperparameter range, the associated values must be skopt.space Real, Integer, or Categorical objects.

If instead you'd like to set specific values for the initial batch of the AutoML search, pass ordinary values (not skopt.space objects) in search_parameters. The initial batch's component parameters will then be set to those values.
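The two kinds of value in search_parameters, ranges and plain values, can be illustrated with a conceptual sketch. A range tells a tuner where it may sample, and a plain value is used as given. This is not EvalML's tuner and does not model its batch schedule. `Choice` is a hypothetical stand-in for skopt.space.Categorical, so the sketch runs without scikit-optimize. The helper names `split_search_parameters` and `sample_candidate` are likewise hypothetical.

```python
import random


class Choice:
    """Hypothetical stand-in for skopt.space.Categorical: a list of allowed values."""

    def __init__(self, categories):
        self.categories = list(categories)


# Same shape as search_parameters: component name -> {parameter: range or plain value}
search_parameters = {
    "Imputer": {
        "numeric_impute_strategy": Choice(["median", "most_frequent"]),  # a range to sample from
        "categorical_impute_strategy": "most_frequent",  # a plain value used as given
    }
}


def split_search_parameters(params):
    """Separate range objects (tunable) from plain values (used as given)."""
    ranges, plain = {}, {}
    for component, component_params in params.items():
        for name, value in component_params.items():
            target = ranges if isinstance(value, Choice) else plain
            target.setdefault(component, {})[name] = value
    return ranges, plain


def sample_candidate(params, rng):
    """Draw one parameter set: sample every range, copy every plain value."""
    ranges, plain = split_search_parameters(params)
    candidate = {component: dict(values) for component, values in plain.items()}
    for component, component_params in ranges.items():
        for name, space in component_params.items():
            candidate.setdefault(component, {})[name] = rng.choice(space.categories)
    return candidate


rng = random.Random(0)  # seeded for reproducibility
candidates = [sample_candidate(search_parameters, rng) for _ in range(5)]
for candidate in candidates:
    print(candidate)
```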

[24]:

from evalml import AutoMLSearch
from evalml.demos import load_fraud
from skopt.space import Categorical
from evalml.model_family import ModelFamily
import woodwork as ww

X, y = load_fraud(n_rows=1000)

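# example of setting a parameter to a single value for the initial batch
# (shown for illustration; this dictionary is replaced by the range example below)
search_parameters = {"Imputer": {"numeric_impute_strategy": "mean"}}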


# limit the numeric impute strategy to include only `median` and `most_frequent`
# `mean` is the default value for this argument, but it doesn't need to be included in the specified hyperparameter range for this to work
search_parameters = {
    "Imputer": {"numeric_impute_strategy": Categorical(["median", "most_frequent"])}
}

# using this custom hyperparameter means that our Imputer components in these pipelines will only search through
# 'median' and 'most_frequent' strategies for 'numeric_impute_strategy'
automl_constrained = AutoMLSearch(
    X_train=X,
    y_train=y,
    problem_type="binary",
    search_parameters=search_parameters,
    verbose=True,
)

             Number of Features
Boolean                       1
Categorical                   6
Numeric                       5

Number of training examples: 1000
Targets
False    85.90%
True     14.10%
Name: count, dtype: object
AutoMLSearch will use mean CV score to rank pipelines.
Using default limit of max_batches=2.

/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-evalml/envs/v0.82.0/lib/python3.8/site-packages/woodwork/type_sys/utils.py:40: UserWarning:

The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.

/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-evalml/envs/v0.82.0/lib/python3.8/site-packages/woodwork/type_sys/utils.py:40: UserWarning:

Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.