Machine learning (ML) is the process of constructing a mathematical model of a system based on a sample dataset collected from that system.
One of the main goals of training an ML model is to teach the model to separate the signal present in the data from the noise inherent in the system and in the data collection process. If this is done effectively, the model can then be used to make accurate predictions about the system when presented with new, similar data. Additionally, introspecting on an ML model can reveal key information about the system being modeled, such as which inputs and transformations of those inputs are most useful to the model for learning the signal in the data, and are therefore the most predictive.
There are a variety of ML problem types. Supervised learning describes the case where the collected data contains an output value to be modeled and a set of inputs with which to train the model. EvalML focuses on training supervised learning models.
EvalML supports three common supervised ML problem types. The first is regression, where the target value to model is a continuous numeric value. Next are binary and multiclass classification, where the target value to model consists of exactly two or more than two discrete values or categories, respectively. The choice of which supervised ML problem type is most appropriate depends on domain expertise and on how the model will be evaluated and used.
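For reference, these problem types can be specified either as strings (as in the search example below) or via the ProblemTypes enum; a minimal sketch, assuming the evalml.problem_types module:

from evalml.problem_types import ProblemTypes

print(ProblemTypes.REGRESSION)  # continuous numeric target
print(ProblemTypes.BINARY)      # exactly two discrete classes
print(ProblemTypes.MULTICLASS)  # more than two discrete classes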
AutoML is the process of automating the construction, training and evaluation of ML models. Given a dataset and some configuration, AutoML searches for the most effective and accurate ML model or models to fit the dataset. During the search, AutoML will explore different combinations of model type, model parameters and model architecture.
An effective AutoML solution offers several advantages over constructing and tuning ML models by hand. AutoML can assist with many of the difficult aspects of ML, such as avoiding overfitting and underfitting, handling imbalanced data, detecting data leakage and other potential issues with the problem setup, and automatically applying best-practice data cleaning, feature engineering, feature selection and various modeling techniques. AutoML can also leverage search algorithms to optimally sweep the hyperparameter search space, resulting in model performance which would be difficult to achieve by manual training; a concrete sketch of this idea follows below.
EvalML supports all of the above and more.
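To make the idea of a hyperparameter sweep concrete, here is a minimal, framework-agnostic sketch (not EvalML's search algorithm) that randomly samples the regularization strength of a scikit-learn logistic regression and keeps the best cross-validated configuration:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
best_c, best_score = None, -np.inf
for _ in range(5):
    c = 10 ** rng.uniform(-3, 2)  # sample C on a log scale
    score = cross_val_score(LogisticRegression(C=c, max_iter=5000), X, y).mean()
    if score > best_score:
        best_c, best_score = c, score
print(f"best C: {best_c:.4f}, best CV accuracy: {best_score:.3f}")

An AutoML search generalizes this loop across many model families, preprocessing steps and hyperparameters at once.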
In its simplest usage, the AutoML search interface requires only the input data, the target data and a problem_type specifying what kind of supervised ML problem to model.
[1]:
import evalml

X, y = evalml.demos.load_breast_cancer()
automl = evalml.automl.AutoMLSearch(problem_type='binary')
automl.search(X, y)
Using default limit of max_pipelines=5.
Generating pipelines to search over...

*****************************
* Beginning pipeline search *
*****************************

Optimizing for Log Loss Binary. Lower score is better.
Searching up to 5 pipelines.
Allowed model families: linear_model, catboost, xgboost, random_forest
(1/5) Mode Baseline Binary Classification P... Elapsed:00:00
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.660
(2/5) CatBoost Classifier w/ Simple Imputer    Elapsed:00:00
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.094
(3/5) XGBoost Classifier w/ Simple Imputer     Elapsed:00:22
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.101
(4/5) Random Forest Classifier w/ Simple Im... Elapsed:00:22
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.123
(5/5) Logistic Regression Classifier w/ Sim... Elapsed:00:24
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.091

Search finished after 00:25
Best pipeline: Logistic Regression Classifier w/ Simple Imputer + Standard Scaler
Best pipeline Log Loss Binary: 0.091164
The AutoML search will log its progress, reporting each pipeline and parameter set evaluated during the search.
By default, AutoML will search a fixed number of pipeline and parameter combinations (5). The first pipeline to be evaluated will always be a baseline model representing a trivial solution.
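A baseline can be as simple as always predicting the most frequent class in the training data; the following is a minimal sketch of that idea, not EvalML's exact baseline implementation:

import numpy as np

def mode_baseline_predict(y_train, n_rows):
    # Predict the most frequent training label for every row.
    values, counts = np.unique(y_train, return_counts=True)
    return np.full(n_rows, values[np.argmax(counts)])

Any candidate pipeline that cannot beat such a trivial model is not learning useful signal from the data.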
The AutoML interface supports a variety of other parameters. For a comprehensive list, please refer to the API reference.
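For example, the number of pipelines searched (max_pipelines, whose default of 5 is reported in the log above) and the objective being optimized can both be overridden; a sketch, with keyword names taken from the log output and the API reference rather than guaranteed for every version:

automl = evalml.automl.AutoMLSearch(
    problem_type='binary',
    objective='f1',      # optimize F1 instead of the default Log Loss Binary
    max_pipelines=10,    # evaluate 10 pipelines instead of the default 5
)
automl.search(X, y)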
A summary of all the pipelines built can be returned as a pandas DataFrame, sorted by score.
[2]:
automl.rankings
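Because rankings is a pandas DataFrame, it supports ordinary pandas operations; for instance, a sketch selecting a few columns (names assumed to match the per-pipeline fields shown in the results output below):

automl.rankings[['id', 'pipeline_name', 'score']].head(3)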
Each pipeline is given an id. We can get more information about any particular pipeline using that id. Here, we will get more information about the pipeline with id = 1.
[3]:
automl.describe_pipeline(1)
*****************************************
* CatBoost Classifier w/ Simple Imputer *
*****************************************

Problem Type: Binary Classification
Model Family: CatBoost

Pipeline Steps
==============
1. Simple Imputer
         * impute_strategy : most_frequent
         * fill_value : None
2. CatBoost Classifier
         * n_estimators : 1000
         * eta : 0.03
         * max_depth : 6
         * bootstrap_type : None

Training
========
Training for Binary Classification problems.
Total training time (including CV): 22.2 seconds

Cross Validation
----------------
             Log Loss Binary  Accuracy Binary  Balanced Accuracy Binary     F1  Precision    AUC  MCC Binary  # Training  # Testing
0                      0.106            0.958                     0.949  0.967      0.951  0.995       0.910     379.000    190.000
1                      0.082            0.979                     0.975  0.983      0.975  0.994       0.955     379.000    190.000
2                      0.093            0.974                     0.976  0.979      0.991  0.990       0.944     380.000    189.000
mean                   0.094            0.970                     0.967  0.976      0.973  0.993       0.936           -          -
std                    0.012            0.011                     0.015  0.008      0.020  0.003       0.024           -          -
coef of var            0.128            0.011                     0.016  0.009      0.021  0.003       0.025           -          -
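The summary rows are simple aggregates of the per-fold scores. For instance, the mean, standard deviation and coefficient of variation of the Log Loss Binary column can be reproduced directly:

import numpy as np

cv_log_loss = np.array([0.106, 0.082, 0.093])  # per-fold scores from the table above
mean = cv_log_loss.mean()
std = cv_log_loss.std(ddof=1)  # sample standard deviation, matching pandas' default
print(mean, std, std / mean)   # ~0.094, ~0.012, ~0.128 (coef of var = std / mean)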
We can also get the pipeline object for any pipeline using its id:
[4]:
pipeline = automl.get_pipeline(1)
print(pipeline.name)
print(pipeline.parameters)
CatBoost Classifier w/ Simple Imputer
{'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None}, 'CatBoost Classifier': {'n_estimators': 1000, 'eta': 0.03, 'max_depth': 6, 'bootstrap_type': None}}
If we specifically want to get the best pipeline, there is a convenient accessor for that.
[5]:
best_pipeline = automl.best_pipeline
print(best_pipeline.name)
print(best_pipeline.parameters)
Logistic Regression Classifier w/ Simple Imputer + Standard Scaler
{'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None}, 'Logistic Regression Classifier': {'penalty': 'l2', 'C': 1.0, 'n_jobs': -1}}
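The returned pipeline object can then be trained and used to make predictions; a short sketch, assuming the scikit-learn-style fit/predict interface EvalML pipelines expose:

best_pipeline.fit(X, y)
predictions = best_pipeline.predict(X)
print(predictions[:5])

In practice the pipeline should be fit on training data and evaluated on a held-out set rather than on the data used during the search.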
The AutoMLSearch class records detailed results information under the results field, including information about the cross-validation scoring and parameters.
[6]:
automl.results
{'pipeline_results': {0: {'id': 0, 'pipeline_name': 'Mode Baseline Binary Classification Pipeline', 'pipeline_class': evalml.pipelines.classification.baseline_binary.ModeBaselineBinaryPipeline, 'pipeline_summary': 'Baseline Classifier', 'parameters': {'Baseline Classifier': {'strategy': 'random_weighted'}}, 'score': 0.660320827581381, 'high_variance_cv': False, 'training_time': 0.02292943000793457, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.6608932451679239), ('Accuracy Binary', 0.6263157894736842), ('Balanced Accuracy Binary', 0.5), ('F1', 0.7702265372168284), ('Precision', 0.6263157894736842), ('AUC', 0.5), ('MCC Binary', 0.0), ('# Training', 379), ('# Testing', 190)]), 'score': 0.6608932451679239, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.6608932451679239), ('Accuracy Binary', 0.6263157894736842), ('Balanced Accuracy Binary', 0.5), ('F1', 0.7702265372168284), ('Precision', 0.6263157894736842), ('AUC', 0.5), ('MCC Binary', 0.0), ('# Training', 379), ('# Testing', 190)]), 'score': 0.6608932451679239, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.6591759924082952), ('Accuracy Binary', 0.6296296296296297), ('Balanced Accuracy Binary', 0.5), ('F1', 0.7727272727272727), ('Precision', 0.6296296296296297), ('AUC', 0.5), ('MCC Binary', 0.0), ('# Training', 380), ('# Testing', 189)]), 'score': 0.6591759924082952, 'binary_classification_threshold': 0.5}]}, 1: {'id': 1, 'pipeline_name': 'CatBoost Classifier w/ Simple Imputer', 'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline, 'pipeline_summary': 'CatBoost Classifier w/ Simple Imputer', 'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None}, 'CatBoost Classifier': {'n_estimators': 1000, 'eta': 0.03, 'max_depth': 6, 'bootstrap_type': None}}, 'score': 0.09355285580998496, 'high_variance_cv': False, 'training_time': 22.245115518569946, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.10583268649418161), ('Accuracy Binary', 0.9578947368421052), ('Balanced Accuracy Binary', 0.9493431175287016), ('F1', 0.9669421487603305), ('Precision', 0.9512195121951219), ('AUC', 0.9945555687063559), ('MCC Binary', 0.909956827190137), ('# Training', 379), ('# Testing', 190)]), 'score': 0.10583268649418161, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.08186397218927995), ('Accuracy Binary', 0.9789473684210527), ('Balanced Accuracy Binary', 0.9746715587643509), ('F1', 0.9833333333333334), ('Precision', 0.9752066115702479), ('AUC', 0.9943188543022844), ('MCC Binary', 0.955011564828661), ('# Training', 379), ('# Testing', 190)]), 'score': 0.08186397218927995, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.09296190874649334), ('Accuracy Binary', 0.9735449735449735), ('Balanced Accuracy Binary', 0.9760504201680673), ('F1', 0.9787234042553192), ('Precision', 0.9913793103448276), ('AUC', 0.9899159663865547), ('MCC Binary', 0.9443109474170326), ('# Training', 380), ('# Testing', 189)]), 'score': 0.09296190874649334, 'binary_classification_threshold': 0.5}]}, 2: {'id': 2, 'pipeline_name': 'XGBoost Classifier w/ Simple Imputer', 'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline, 'pipeline_summary': 'XGBoost Classifier w/ Simple Imputer', 'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent', 
'fill_value': None}, 'XGBoost Classifier': {'eta': 0.1, 'max_depth': 6, 'min_child_weight': 1, 'n_estimators': 100}}, 'score': 0.10096523570751793, 'high_variance_cv': True, 'training_time': 0.4224998950958252, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.11449876085695762), ('Accuracy Binary', 0.9578947368421052), ('Balanced Accuracy Binary', 0.9521836903775595), ('F1', 0.9666666666666667), ('Precision', 0.9586776859504132), ('AUC', 0.9915966386554622), ('MCC Binary', 0.9097672817424011), ('# Training', 379), ('# Testing', 190)]), 'score': 0.11449876085695762, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.07421583775339011), ('Accuracy Binary', 0.9736842105263158), ('Balanced Accuracy Binary', 0.9676293052432241), ('F1', 0.979253112033195), ('Precision', 0.9672131147540983), ('AUC', 0.9959758551307847), ('MCC Binary', 0.943843520216036), ('# Training', 379), ('# Testing', 190)]), 'score': 0.07421583775339011, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.11418110851220609), ('Accuracy Binary', 0.9576719576719577), ('Balanced Accuracy Binary', 0.9605042016806722), ('F1', 0.9658119658119659), ('Precision', 0.9826086956521739), ('AUC', 0.9885954381752701), ('MCC Binary', 0.9112159507396058), ('# Training', 380), ('# Testing', 189)]), 'score': 0.11418110851220609, 'binary_classification_threshold': 0.5}]}, 3: {'id': 3, 'pipeline_name': 'Random Forest Classifier w/ Simple Imputer', 'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline, 'pipeline_summary': 'Random Forest Classifier w/ Simple Imputer', 'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}, 'score': 0.12253681387225619, 'high_variance_cv': False, 'training_time': 1.4093050956726074, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.13984688783161608), ('Accuracy Binary', 0.9421052631578948), ('Balanced Accuracy Binary', 0.9338975026630371), ('F1', 0.9543568464730291), ('Precision', 0.9426229508196722), ('AUC', 0.9893478518167831), ('MCC Binary', 0.8757606542930872), ('# Training', 379), ('# Testing', 190)]), 'score': 0.13984688783161608, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.1201072101539428), ('Accuracy Binary', 0.9631578947368421), ('Balanced Accuracy Binary', 0.9563853710498283), ('F1', 0.9709543568464729), ('Precision', 0.9590163934426229), ('AUC', 0.989347851816783), ('MCC Binary', 0.9211492315750531), ('# Training', 379), ('# Testing', 190)]), 'score': 0.1201072101539428, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.10765634363120971), ('Accuracy Binary', 0.9682539682539683), ('Balanced Accuracy Binary', 0.9689075630252101), ('F1', 0.9745762711864406), ('Precision', 0.9829059829059829), ('AUC', 0.9927971188475391), ('MCC Binary', 0.9325680982740896), ('# Training', 380), ('# Testing', 189)]), 'score': 0.10765634363120971, 'binary_classification_threshold': 0.5}]}, 4: {'id': 4, 'pipeline_name': 'Logistic Regression Classifier w/ Simple Imputer + Standard Scaler', 'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline, 'pipeline_summary': 'Logistic Regression Classifier w/ Simple Imputer + Standard Scaler', 'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent', 
'fill_value': None}, 'Logistic Regression Classifier': {'penalty': 'l2', 'C': 1.0, 'n_jobs': -1}}, 'score': 0.09116380517655309, 'high_variance_cv': False, 'training_time': 0.9941501617431641, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.09347817517438463), ('Accuracy Binary', 0.9789473684210527), ('Balanced Accuracy Binary', 0.9775121316132087), ('F1', 0.9831932773109243), ('Precision', 0.9831932773109243), ('AUC', 0.9936087110900698), ('MCC Binary', 0.9550242632264173), ('# Training', 379), ('# Testing', 190)]), 'score': 0.09347817517438463, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.08320464479579018), ('Accuracy Binary', 0.9736842105263158), ('Balanced Accuracy Binary', 0.9647887323943662), ('F1', 0.9794238683127572), ('Precision', 0.9596774193548387), ('AUC', 0.9975144987572493), ('MCC Binary', 0.9445075449666159), ('# Training', 379), ('# Testing', 190)]), 'score': 0.08320464479579018, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.09680859555948443), ('Accuracy Binary', 0.9735449735449735), ('Balanced Accuracy Binary', 0.9760504201680673), ('F1', 0.9787234042553192), ('Precision', 0.9913793103448276), ('AUC', 0.9906362545018007), ('MCC Binary', 0.9443109474170326), ('# Training', 380), ('# Testing', 189)]), 'score': 0.09680859555948443, 'binary_classification_threshold': 0.5}]}}, 'search_order': [0, 1, 2, 3, 4]}
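Since results is a plain dictionary, it can be traversed programmatically; for example, using the search_order and pipeline_results keys shown above to print each pipeline's score:

for pipeline_id in automl.results['search_order']:
    result = automl.results['pipeline_results'][pipeline_id]
    print(pipeline_id, result['pipeline_name'], result['score'])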