Automated Machine Learning (AutoML) Search¶
Background¶
Machine Learning¶
Machine learning (ML) is the process of constructing a mathematical model of a system based on a sample dataset collected from that system.
One of the main goals of training an ML model is to teach the model to separate the signal present in the data from the noise inherent in the system and in the data collection process. If this is done effectively, the model can then be used to make accurate predictions about the system when presented with new, similar data. Additionally, introspecting on an ML model can reveal key information about the system being modeled, such as which inputs and transformations of the inputs are most useful to the ML model for learning the signal in the data, and are therefore the most predictive.
There are a variety of ML problem types. Supervised learning describes the case where the collected data contains an output value to be modeled and a set of inputs with which to train the model. EvalML focuses on training supervised learning models.
EvalML supports three common supervised ML problem types. The first is regression, where the target value to model is a continuous numeric value. Next are binary and multiclass classification, where the target value to model consists of two or more discrete values or categories. The choice of which supervised ML problem type is most appropriate depends on domain expertise and on how the model will be evaluated and used.
EvalML is currently building support for supervised time series problems: time series regression, time series binary classification, and time series multiclass classification. While we’ve added some features to tackle these kinds of problems, this functionality is still under active development, so please keep that in mind before using it.
AutoML and Search¶
AutoML is the process of automating the construction, training and evaluation of ML models. Given data and some configuration, AutoML searches for the most effective and accurate ML model or models to fit the dataset. During the search, AutoML will explore different combinations of model type, model parameters and model architecture.
An effective AutoML solution offers several advantages over constructing and tuning ML models by hand. AutoML can assist with many of the difficult aspects of ML, such as avoiding overfitting and underfitting, handling imbalanced data, detecting data leakage and other potential issues with the problem setup, and automatically applying best-practice data cleaning, feature engineering, feature selection and various modeling techniques. AutoML can also leverage search algorithms to optimally sweep the hyperparameter search space, resulting in model performance which would be difficult to achieve by manual training.
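As a rough illustration of the search idea described above, the sketch below runs a random search over a small, made-up space of model families and one hyperparameter. Everything in it is hypothetical: toy_validation_loss stands in for training and cross-validating a real model, and the search space and scores are invented for the example. It is not how EvalML's search algorithm is implemented.

```python
import random

# Hypothetical search space: model families and one hyperparameter.
SEARCH_SPACE = {
    "model_family": ["linear", "tree", "boosting"],
    "max_depth": list(range(2, 11)),
}

def toy_validation_loss(model_family, max_depth):
    """Stand-in for training and cross-validating a model (lower is better)."""
    base_loss = {"linear": 0.60, "tree": 0.35, "boosting": 0.30}[model_family]
    return base_loss + 0.01 * abs(max_depth - 6)

def random_search(n_candidates, seed=0):
    """Sample candidates at random, score each one, and rank by loss."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_candidates):
        candidate = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        results.append((toy_validation_loss(**candidate), candidate))
    return sorted(results, key=lambda item: item[0])

rankings = random_search(n_candidates=20)
best_loss, best_candidate = rankings[0]
print(best_candidate, round(best_loss, 3))
```

A real AutoML system replaces the toy loss with cross-validated training and usually samples candidates more cleverly than uniformly at random.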
AutoML in EvalML¶
EvalML supports all of the above and more.
In its simplest usage, the AutoML search interface requires only the input data, the target data and a problem_type specifying what kind of supervised ML problem to model.
** Graphing methods, like verbose AutoMLSearch, require ipywidgets to be installed when used in Jupyter Notebook or Jupyter Lab.
** Graphing in Jupyter Lab also requires jupyterlab-plotly. To install it, make sure you have npm installed.
[1]:
import evalml
from evalml.utils import infer_feature_types
X, y = evalml.demos.load_fraud(n_rows=250)
Number of Features
Boolean 1
Categorical 6
Numeric 5
Number of training examples: 250
Targets
False 88.40%
True 11.60%
Name: fraud, dtype: object
To provide data to EvalML, it is recommended that you initialize a Woodwork accessor on your data. This allows you to easily control how EvalML will treat each of your features before training a model.
EvalML also accepts pandas input, and will run type inference on top of the input pandas data. If you’d like to change the types inferred by EvalML, you can use the infer_feature_types utility method, which takes pandas or numpy input and converts it to a Woodwork data structure. The feature_types parameter can be used to specify what types specific columns should be.
Feature types such as Natural Language must be specified in this way; otherwise Woodwork will infer them as Unknown type and they will be dropped during the AutoMLSearch.
In the example below, we reformat a couple of features to make them easily consumable by the model, and then specify that the provider column, which would otherwise have been inferred as natural language, is a categorical column.
[2]:
X.ww['expiration_date'] = X['expiration_date'].apply(lambda x: '20{}-01-{}'.format(x.split("/")[1], x.split("/")[0]))
X = infer_feature_types(X, feature_types={'store_id': 'categorical',
                                          'expiration_date': 'datetime',
                                          'lat': 'categorical',
                                          'lng': 'categorical',
                                          'provider': 'categorical'})
In order to validate the results of the pipeline creation and optimization process, we will save some of our data as a holdout set.
[3]:
X_train, X_holdout, y_train, y_holdout = evalml.preprocessing.split_data(X, y, problem_type='binary', test_size=.2)
Data Checks¶
Before calling AutoMLSearch.search, we should run some sanity checks on our data to ensure that the input data will not run into common issues during a potentially time-consuming search. EvalML has various data checks that make this easy. Each data check will return a collection of warnings and errors if it detects potential issues with the input data. This allows users to inspect their data to avoid confusing errors that may arise during the search process. You can learn about each of the data checks available through our data checks guide.
Here, we will run the DefaultDataChecks class, which contains a series of data checks that are generally useful.
[4]:
from evalml.data_checks import DefaultDataChecks
data_checks = DefaultDataChecks("binary", "log loss binary")
data_checks.validate(X_train, y_train)
[4]:
{'warnings': [], 'errors': [], 'actions': []}
Since there were no warnings or errors returned, we can safely continue with the search process.
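The dictionary returned by validate can also be checked programmatically before starting a search. The sketch below assumes only the three keys shown in the output above; has_blocking_issues is a hypothetical helper, and the sample error entry is made up for illustration.

```python
def has_blocking_issues(results):
    """Return True if the data check results contain at least one error."""
    return len(results.get("errors", [])) > 0

clean_results = {"warnings": [], "errors": [], "actions": []}
# Made-up entry, only to show the failing branch; real entries may be structured differently.
failing_results = {"warnings": [], "errors": ["example error"], "actions": []}

for results in (clean_results, failing_results):
    if has_blocking_issues(results):
        print("Fix the reported errors before running the search.")
    else:
        print("No errors found; safe to start the search.")
```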
[5]:
automl = evalml.automl.AutoMLSearch(X_train=X_train, y_train=y_train, problem_type='binary', verbose=True)
automl.search()
Using default limit of max_batches=1.
Removing columns ['currency'] because they are of 'Unknown' type
Generating pipelines to search over...
8 pipelines ready for search.
*****************************
* Beginning pipeline search *
*****************************
Optimizing for Log Loss Binary.
Lower score is better.
Using SequentialEngine to train and score pipelines.
Searching up to 1 batches for a total of 9 pipelines.
Allowed model families: linear_model, linear_model, xgboost, lightgbm, catboost, random_forest, decision_tree, extra_trees
Evaluating Baseline Pipeline: Mode Baseline Binary Classification Pipeline
Mode Baseline Binary Classification Pipeline:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 3.970
*****************************
* Evaluating Batch Number 1 *
*****************************
Elastic Net Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.607
Logistic Regression Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.620
XGBoost Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.269
LightGBM Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.316
CatBoost Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + Oversampler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.624
Random Forest Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.292
Decision Tree Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 3.579
High coefficient of variation (cv >= 0.5) within cross validation scores.
Decision Tree Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler may not perform as estimated on unseen data.
Extra Trees Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.354
Search finished after 00:23
Best pipeline: XGBoost Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler
Best pipeline Log Loss Binary: 0.269267
With the verbose argument set to True, the AutoML search will log its progress, reporting each pipeline and parameter set evaluated during the search.
There are a number of mechanisms to control the AutoML search time. One way is to set the max_batches parameter, which controls the maximum number of rounds of AutoML to evaluate, where each round may train and score a variable number of pipelines. Another way is to set the max_iterations parameter, which controls the maximum number of candidate models to be evaluated during AutoML. By default, AutoML will search for a single batch. The first pipeline to be evaluated will always be a baseline model representing a trivial solution.
The AutoML interface supports a variety of other parameters. For a comprehensive list, please refer to the API reference.
We also provide a standalone search method which does all of the above in a single line, and returns the AutoMLSearch instance and data check results. If there were data check errors, AutoML will not be run and no AutoMLSearch instance will be returned.
Detecting Problem Type¶
EvalML includes a simple method, detect_problem_type, to help determine the problem type given the target data.
This function returns the predicted problem type as a ProblemTypes enum, choosing from ProblemTypes.BINARY, ProblemTypes.MULTICLASS, and ProblemTypes.REGRESSION. If the target data is invalid (for instance when there is only 1 unique label), the function will raise an error instead.
[6]:
import pandas as pd
from evalml.problem_types import detect_problem_type
y_binary = pd.Series([0, 1, 1, 0, 1, 1])
detect_problem_type(y_binary)
[6]:
<ProblemTypes.BINARY: 'binary'>
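To make the idea concrete, here is a simplified, pure-Python heuristic for guessing a problem type from target values. It is only a sketch and does not reproduce the rules used by detect_problem_type; guess_problem_type and the max_classes threshold are assumptions made for this example.

```python
def guess_problem_type(target, max_classes=10):
    """Guess 'binary', 'multiclass' or 'regression' from a list of target values."""
    unique_values = set(target)
    if len(unique_values) < 2:
        raise ValueError("Target must contain at least two unique values.")
    if len(unique_values) == 2:
        return "binary"
    # bool is a subclass of int, so exclude it explicitly from the numeric check.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in unique_values
    )
    # Assumption: many distinct numeric values suggest a continuous target.
    if is_numeric and len(unique_values) > max_classes:
        return "regression"
    return "multiclass"

print(guess_problem_type([0, 1, 1, 0, 1, 1]))
print(guess_problem_type(["a", "b", "c", "a"]))
print(guess_problem_type([x * 0.5 for x in range(50)]))
```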
Objective parameter¶
AutoMLSearch takes in an objective parameter to determine which objective to optimize for. By default, this parameter is set to auto, which allows AutoML to choose LogLossBinary for binary classification problems, LogLossMulticlass for multiclass classification problems, and R2 for regression problems.
It should be noted that the objective parameter is only used in ranking and helping choose the pipelines to iterate over, but is not used to optimize each individual pipeline during fit-time.
To get the default objective for each problem type, you can use the get_default_primary_search_objective function.
[7]:
from evalml.automl import get_default_primary_search_objective
binary_objective = get_default_primary_search_objective("binary")
multiclass_objective = get_default_primary_search_objective("multiclass")
regression_objective = get_default_primary_search_objective("regression")
print(binary_objective.name)
print(multiclass_objective.name)
print(regression_objective.name)
Log Loss Binary
Log Loss Multiclass
R2
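For intuition about the default classification objective, the sketch below computes binary log loss by hand. It uses the standard definition of log loss; binary_log_loss is a hypothetical helper written for this example and is not EvalML's implementation, and the clipping constant eps is an assumption.

```python
import math

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels; lower is better."""
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        prob = min(max(prob, eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(prob) if label == 1 else -math.log(1 - prob)
    return total / len(y_true)

y_true = [0, 1, 1, 0]
mostly_right = [0.1, 0.8, 0.6, 0.3]   # probability of the positive class
mostly_wrong = [0.9, 0.2, 0.4, 0.7]

print(binary_log_loss(y_true, mostly_right))
print(binary_log_loss(y_true, mostly_wrong))
```

Predictions that put high probability on the wrong class are penalized heavily, so the second set of probabilities scores much worse than the first.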
Using custom pipelines¶
EvalML’s AutoML algorithm generates a set of pipelines to search with. To provide a custom set instead, set allowed_component_graphs to a dictionary of custom component graphs. AutoMLSearch will use these to generate Pipeline instances. Note: this will prevent AutoML from generating other pipelines to search over.
[8]:
from evalml.pipelines import MulticlassClassificationPipeline
automl_custom = evalml.automl.AutoMLSearch(
    X_train=X_train,
    y_train=y_train,
    problem_type='multiclass',
    verbose=True,
    allowed_component_graphs={"My_pipeline": ['Simple Imputer', 'Random Forest Classifier'],
                              "My_other_pipeline": ['One Hot Encoder', 'Random Forest Classifier']})
Using default limit of max_batches=1.
Removing columns ['currency'] because they are of 'Unknown' type
2 pipelines ready for search.
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-evalml/envs/v0.40.0/lib/python3.8/site-packages/evalml/automl/automl_algorithm/iterative_algorithm.py:478: ParameterNotUsedWarning:
Parameters for components {'Oversampler', 'Drop Columns Transformer'} will not be used to instantiate the pipeline since they don't appear in the pipeline
Stopping the search early¶
To stop the search early, hit Ctrl-C. This will bring up a prompt asking for confirmation. Responding with y will immediately stop the search. Responding with n will continue the search.
Callback functions¶
AutoMLSearch supports several callback functions, which can be specified as parameters when initializing an AutoMLSearch object. They are:
start_iteration_callback
add_result_callback
error_callback
Start Iteration Callback¶
Users can set start_iteration_callback to specify the function that is called before each pipeline training iteration. This callback function must take three positional parameters: the pipeline class, the pipeline parameters, and the AutoMLSearch object.
[9]:
# start_iteration_callback example function
def start_iteration_callback_example(pipeline_class, pipeline_params, automl_obj):
    print("Training pipeline with the following parameters:", pipeline_params)
Add Result Callback¶
Users can set add_result_callback to specify the function that is called after each pipeline training iteration. This callback function must take three positional parameters: a dictionary containing the training results for the new pipeline, an untrained_pipeline containing the parameters used during training, and the AutoMLSearch object.
[10]:
# add_result_callback example function
def add_result_callback_example(pipeline_results_dict, untrained_pipeline, automl_obj):
    print("Results for trained pipeline with the following parameters:", pipeline_results_dict)
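As a sketch of how these two callbacks might be used together, the functions below record what they receive in plain lists so the information can be inspected after the search. The names search_log, record_start and record_result are hypothetical, and the placeholder arguments at the bottom only simulate what AutoMLSearch would pass.

```python
search_log = {"started": [], "finished": []}

def record_start(pipeline_class, pipeline_params, automl_obj):
    """start_iteration_callback: remember each pipeline's parameters before training."""
    search_log["started"].append(pipeline_params)

def record_result(pipeline_results_dict, untrained_pipeline, automl_obj):
    """add_result_callback: remember the results dictionary of each trained pipeline."""
    search_log["finished"].append(pipeline_results_dict)

# With EvalML installed, these would be passed when initializing the search, e.g.
# AutoMLSearch(..., start_iteration_callback=record_start, add_result_callback=record_result)

# Simulated calls with placeholder values, so the sketch runs without EvalML.
record_start(None, {"Imputer": {"numeric_impute_strategy": "mean"}}, None)
record_result({"placeholder_key": 0.5}, None, None)
print(len(search_log["started"]), len(search_log["finished"]))
```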
Error Callback¶
Users can set error_callback to specify the function that is called when search() errors and raises an Exception. This callback function takes three positional parameters: the Exception raised, the traceback, and the AutoMLSearch object. This callback function must also accept kwargs, so AutoMLSearch is able to pass along other parameters used by default.
EvalML defines several error callback functions, which can be found under evalml.automl.callbacks. They are:
silent_error_callback
raise_error_callback
log_and_save_error_callback
raise_and_save_error_callback
log_error_callback (default used when error_callback is None)
[11]:
# error_callback example; this is implemented in the evalml library
import logging

logger = logging.getLogger(__name__)  # stand-in logger so the excerpt runs on its own

def raise_error_callback(exception, traceback, automl, **kwargs):
    """Raises the exception thrown by the AutoMLSearch object. Also logs the exception as an error."""
    logger.error(f'AutoMLSearch raised a fatal exception: {str(exception)}')
    logger.error("\n".join(traceback))
    raise exception
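A custom error callback follows the same signature. The sketch below collects errors in a list instead of raising, so that they can be reviewed afterwards; collect_error_callback and collected_errors are hypothetical names, and the call at the end simulates what AutoMLSearch would pass, including a made-up extra keyword argument.

```python
collected_errors = []

def collect_error_callback(exception, traceback, automl, **kwargs):
    """Store the exception and traceback lines instead of raising."""
    collected_errors.append({
        "exception": exception,
        "traceback": "\n".join(traceback),
        "extra": dict(kwargs),  # any additional keyword arguments passed along
    })

# Simulated call; the traceback lines and the extra keyword are made up.
collect_error_callback(
    ValueError("bad input"),
    ["Traceback (most recent call last):", "ValueError: bad input"],
    None,
    fold_num=0,
)
print(collected_errors[0]["exception"], collected_errors[0]["extra"])
```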
View Rankings¶
A summary of all the pipelines built can be returned as a pandas DataFrame which is sorted by score. The mean_cv_score column contains the average score across all cross-validation folds, while the validation_score column is computed from the first cross-validation fold.
[12]:
automl.rankings
[12]:
|   | id | pipeline_name | search_order | mean_cv_score | standard_deviation_cv_score | validation_score | percent_better_than_baseline | high_variance_cv | parameters |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | XGBoost Classifier w/ Label Encoder + Drop Col... | 3 | 0.269267 | 0.163355 | 0.195522 | 93.218170 | False | {'Label Encoder': {'positive_label': None}, 'D... |
| 1 | 6 | Random Forest Classifier w/ Label Encoder + Dr... | 6 | 0.292264 | 0.032252 | 0.285956 | 92.638975 | False | {'Label Encoder': {'positive_label': None}, 'D... |
| 2 | 4 | LightGBM Classifier w/ Label Encoder + Drop Co... | 4 | 0.316412 | 0.156072 | 0.229386 | 92.030785 | False | {'Label Encoder': {'positive_label': None}, 'D... |
| 3 | 8 | Extra Trees Classifier w/ Label Encoder + Drop... | 8 | 0.353909 | 0.019897 | 0.339161 | 91.086375 | False | {'Label Encoder': {'positive_label': None}, 'D... |
| 4 | 1 | Elastic Net Classifier w/ Label Encoder + Drop... | 1 | 0.607364 | 0.180430 | 0.431472 | 84.702786 | False | {'Label Encoder': {'positive_label': None}, 'D... |
| 5 | 2 | Logistic Regression Classifier w/ Label Encode... | 2 | 0.620303 | 0.189890 | 0.440208 | 84.376900 | False | {'Label Encoder': {'positive_label': None}, 'D... |
| 6 | 5 | CatBoost Classifier w/ Label Encoder + Drop Co... | 5 | 0.623625 | 0.000803 | 0.622742 | 84.293237 | False | {'Label Encoder': {'positive_label': None}, 'D... |
| 7 | 7 | Decision Tree Classifier w/ Label Encoder + Dr... | 7 | 3.578892 | 0.752281 | 2.779762 | 9.861200 | True | {'Label Encoder': {'positive_label': None}, 'D... |
| 8 | 0 | Mode Baseline Binary Classification Pipeline | 0 | 3.970423 | 0.266060 | 4.124033 | 0.000000 | False | {'Label Encoder': {'positive_label': None}, 'B... |
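The percent_better_than_baseline column can be reproduced from the mean cross-validation scores. The formula below is an assumption that matches the values in the table above for an objective where lower is better; percent_better is a hypothetical helper, not an EvalML function.

```python
def percent_better(score, baseline_score):
    """Relative improvement over the baseline, in percent, when lower scores are better."""
    return 100 * (baseline_score - score) / baseline_score

baseline = 3.970423        # Mode Baseline Binary Classification Pipeline
xgboost = 0.269267         # XGBoost Classifier pipeline
decision_tree = 3.578892   # Decision Tree Classifier pipeline

print(round(percent_better(xgboost, baseline), 2))
print(round(percent_better(decision_tree, baseline), 2))
```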
Describe Pipeline¶
Each pipeline is given an id. We can get more information about any particular pipeline using that id. Here, we will get more information about the pipeline with id = 1.
[13]:
automl.describe_pipeline(1)
*********************************************************************************************************************************************************************
* Elastic Net Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler *
*********************************************************************************************************************************************************************
Problem Type: binary
Model Family: Linear
Pipeline Steps
==============
1. Label Encoder
* positive_label : None
2. Drop Columns Transformer
* columns : ['currency']
3. DateTime Featurization Component
* features_to_extract : ['year', 'month', 'day_of_week', 'hour']
* encode_as_categories : False
* time_index : None
4. Imputer
* categorical_impute_strategy : most_frequent
* numeric_impute_strategy : mean
* categorical_fill_value : None
* numeric_fill_value : None
5. One Hot Encoder
* top_n : 10
* features_to_encode : None
* categories : None
* drop : if_binary
* handle_unknown : ignore
* handle_missing : error
6. Oversampler
* sampling_ratio : 0.25
* k_neighbors_default : 5
* n_jobs : -1
* sampling_ratio_dict : None
* categorical_features : [3, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43]
* k_neighbors : 5
7. Standard Scaler
8. Elastic Net Classifier
* penalty : elasticnet
* C : 1.0
* l1_ratio : 0.15
* n_jobs : -1
* multi_class : auto
* solver : saga
Training
========
Training for binary problems.
Total training time (including CV): 3.0 seconds
Cross Validation
----------------
Log Loss Binary MCC Binary Gini AUC Precision F1 Balanced Accuracy Binary Accuracy Binary # Training # Validation
0 0.431 0.101 0.322 0.661 0.250 0.167 0.537 0.851 133 67
1 0.599 0.326 0.148 0.574 0.429 0.400 0.654 0.866 133 67
2 0.792 0.070 0.056 0.528 0.136 0.207 0.553 0.652 134 66
mean 0.607 0.166 0.175 0.588 0.272 0.258 0.581 0.789 - -
std 0.180 0.140 0.135 0.068 0.147 0.125 0.063 0.120 - -
coef of var 0.297 0.843 0.771 0.115 0.542 0.484 0.109 0.151 - -
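The summary rows of the cross-validation table can be checked by hand. The sketch below recomputes the mean, standard deviation and coefficient of variation for the Log Loss Binary column from the three rounded fold scores; using the sample standard deviation is an assumption that happens to match the reported values.

```python
import statistics

fold_scores = [0.431, 0.599, 0.792]  # Log Loss Binary for folds 0, 1 and 2

mean_score = statistics.mean(fold_scores)
std_score = statistics.stdev(fold_scores)   # sample standard deviation (n - 1)
coef_of_var = std_score / mean_score

print(round(mean_score, 3), round(std_score, 3), round(coef_of_var, 3))
```

Because the inputs are already rounded to three decimals, the recomputed values can differ from the table in the last digit.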
Get Pipeline¶
We can get the object of any pipeline via its id as well:
[14]:
pipeline = automl.get_pipeline(1)
print(pipeline.name)
print(pipeline.parameters)
Elastic Net Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler
{'Label Encoder': {'positive_label': None}, 'Drop Columns Transformer': {'columns': ['currency']}, 'DateTime Featurization Component': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False, 'time_index': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Oversampler': {'sampling_ratio': 0.25, 'k_neighbors_default': 5, 'n_jobs': -1, 'sampling_ratio_dict': None, 'categorical_features': [3, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], 'k_neighbors': 5}, 'Elastic Net Classifier': {'penalty': 'elasticnet', 'C': 1.0, 'l1_ratio': 0.15, 'n_jobs': -1, 'multi_class': 'auto', 'solver': 'saga'}}
Get best pipeline¶
If you specifically want to get the best pipeline, there is a convenient accessor for that. The pipeline returned is already fitted on the input X, y data that we passed to AutoMLSearch. To turn off this default behavior, set train_best_pipeline=False when initializing AutoMLSearch.
[15]:
best_pipeline = automl.best_pipeline
print(best_pipeline.name)
print(best_pipeline.parameters)
best_pipeline.predict(X_train)
XGBoost Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler
{'Label Encoder': {'positive_label': None}, 'Drop Columns Transformer': {'columns': ['currency']}, 'DateTime Featurization Component': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False, 'time_index': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Oversampler': {'sampling_ratio': 0.25, 'k_neighbors_default': 5, 'n_jobs': -1, 'sampling_ratio_dict': None, 'categorical_features': [3, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], 'k_neighbors': 5}, 'XGBoost Classifier': {'eta': 0.1, 'max_depth': 6, 'min_child_weight': 1, 'n_estimators': 100, 'n_jobs': -1, 'eval_metric': 'logloss'}}
[15]:
id
93 False
243 False
211 False
44 False
6 False
...
148 False
171 False
94 False
246 False
95 False
Name: fraud, Length: 200, dtype: bool
Training and Scoring Multiple Pipelines using AutoMLSearch¶
AutoMLSearch will automatically fit the best pipeline on the entire training data. It also provides an easy API for training and scoring other pipelines.
If you’d like to train one or more pipelines on the entire training data, you can use the train_pipelines method.
Similarly, if you’d like to score one or more pipelines on a particular dataset, you can use the score_pipelines method.
[16]:
trained_pipelines = automl.train_pipelines([automl.get_pipeline(i) for i in [0, 1, 2]])
trained_pipelines
[16]:
{'Mode Baseline Binary Classification Pipeline': pipeline = BinaryClassificationPipeline(component_graph={'Label Encoder': ['Label Encoder', 'X', 'y'], 'Baseline Classifier': ['Baseline Classifier', 'Label Encoder.x', 'Label Encoder.y']}, parameters={'Label Encoder':{'positive_label': None}, 'Baseline Classifier':{'strategy': 'mode'}}, custom_name='Mode Baseline Binary Classification Pipeline', random_seed=0),
'Elastic Net Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler': pipeline = BinaryClassificationPipeline(component_graph={'Label Encoder': ['Label Encoder', 'X', 'y'], 'Drop Columns Transformer': ['Drop Columns Transformer', 'X', 'Label Encoder.y'], 'DateTime Featurization Component': ['DateTime Featurization Component', 'Drop Columns Transformer.x', 'Label Encoder.y'], 'Imputer': ['Imputer', 'DateTime Featurization Component.x', 'Label Encoder.y'], 'One Hot Encoder': ['One Hot Encoder', 'Imputer.x', 'Label Encoder.y'], 'Oversampler': ['Oversampler', 'One Hot Encoder.x', 'Label Encoder.y'], 'Standard Scaler': ['Standard Scaler', 'Oversampler.x', 'Oversampler.y'], 'Elastic Net Classifier': ['Elastic Net Classifier', 'Standard Scaler.x', 'Oversampler.y']}, parameters={'Label Encoder':{'positive_label': None}, 'Drop Columns Transformer':{'columns': ['currency']}, 'DateTime Featurization Component':{'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False, 'time_index': None}, 'Imputer':{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'One Hot Encoder':{'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Oversampler':{'sampling_ratio': 0.25, 'k_neighbors_default': 5, 'n_jobs': -1, 'sampling_ratio_dict': None, 'categorical_features': [3, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], 'k_neighbors': 5}, 'Elastic Net Classifier':{'penalty': 'elasticnet', 'C': 1.0, 'l1_ratio': 0.15, 'n_jobs': -1, 'multi_class': 'auto', 'solver': 'saga'}}, random_seed=0),
'Logistic Regression Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler': pipeline = BinaryClassificationPipeline(component_graph={'Label Encoder': ['Label Encoder', 'X', 'y'], 'Drop Columns Transformer': ['Drop Columns Transformer', 'X', 'Label Encoder.y'], 'DateTime Featurization Component': ['DateTime Featurization Component', 'Drop Columns Transformer.x', 'Label Encoder.y'], 'Imputer': ['Imputer', 'DateTime Featurization Component.x', 'Label Encoder.y'], 'One Hot Encoder': ['One Hot Encoder', 'Imputer.x', 'Label Encoder.y'], 'Oversampler': ['Oversampler', 'One Hot Encoder.x', 'Label Encoder.y'], 'Standard Scaler': ['Standard Scaler', 'Oversampler.x', 'Oversampler.y'], 'Logistic Regression Classifier': ['Logistic Regression Classifier', 'Standard Scaler.x', 'Oversampler.y']}, parameters={'Label Encoder':{'positive_label': None}, 'Drop Columns Transformer':{'columns': ['currency']}, 'DateTime Featurization Component':{'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False, 'time_index': None}, 'Imputer':{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'One Hot Encoder':{'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Oversampler':{'sampling_ratio': 0.25, 'k_neighbors_default': 5, 'n_jobs': -1, 'sampling_ratio_dict': None, 'categorical_features': [3, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], 'k_neighbors': 5}, 'Logistic Regression Classifier':{'penalty': 'l2', 'C': 1.0, 'n_jobs': -1, 'multi_class': 'auto', 'solver': 'lbfgs'}}, random_seed=0)}
[17]:
pipeline_holdout_scores = automl.score_pipelines([trained_pipelines[name] for name in trained_pipelines.keys()],
                                                 X_holdout,
                                                 y_holdout,
                                                 ['Accuracy Binary', 'F1', 'AUC'])
pipeline_holdout_scores
[17]:
{'Mode Baseline Binary Classification Pipeline': OrderedDict([('Accuracy Binary',
0.88),
('F1', 0.0),
('AUC', 0.5)]),
'Elastic Net Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler': OrderedDict([('Accuracy Binary',
0.4),
('F1', 0.21052631578947367),
('AUC', 0.5037878787878788)]),
'Logistic Regression Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler': OrderedDict([('Accuracy Binary',
0.4),
('F1', 0.21052631578947367),
('AUC', 0.5113636363636364)])}
Saving AutoMLSearch and pipelines from AutoMLSearch¶
There are two ways to save results from AutoMLSearch.
You can save the AutoMLSearch object itself, calling .save(<filepath>) to do so. This will allow you to save the AutoMLSearch state and reload all pipelines from this.
If you want to save a pipeline from AutoMLSearch for future use, pipeline classes themselves have a .save(<filepath>) method.
[18]:
# saving the entire automl search
automl.save("automl.cloudpickle")
automl2 = evalml.automl.AutoMLSearch.load("automl.cloudpickle")
# saving the best pipeline using .save()
best_pipeline.save("pipeline.cloudpickle")
best_pipeline_copy = evalml.pipelines.PipelineBase.load("pipeline.cloudpickle")
Limiting the AutoML Search Space¶
The AutoML search algorithm first trains each component in the pipeline with its default values. After the first iteration, it then tweaks the parameters of these components using the pre-defined hyperparameter ranges that these components have. To limit the search over certain hyperparameter ranges, you can specify a custom_hyperparameters argument with your AutoMLSearch parameters. These parameters will limit the hyperparameter search space.
Hyperparameter ranges can be found through the API reference for each component. Parameter arguments must be specified as dictionaries, but the associated values can be single values or skopt.space Real, Integer, Categorical values.
If, however, you’d like to specify certain values for the initial batch of the AutoML search algorithm, you can use the pipeline_parameters argument. This will set the initial batch’s component parameters to the values passed by this argument.
[19]:
from evalml import AutoMLSearch
from evalml.demos import load_fraud
from skopt.space import Categorical
from evalml.model_family import ModelFamily
import woodwork as ww
X, y = load_fraud(n_rows=1000)
# example of setting parameter to just one value
custom_hyperparameters = {'Imputer': {
    'numeric_impute_strategy': 'mean'
}}
# limit the numeric impute strategy to include only `median` and `most_frequent`
# `mean` is the default value for this argument, but it doesn't need to be included in the specified hyperparameter range for this to work
custom_hyperparameters = {'Imputer': {
    'numeric_impute_strategy': Categorical(['median', 'most_frequent'])
}}
# set the initial batch numeric impute strategy to 'median'
pipeline_parameters = {'Imputer': {
    'numeric_impute_strategy': 'median'
}}
# using this custom hyperparameter means that our Imputer components in these pipelines will only search through
# 'median' and 'most_frequent' strategies for 'numeric_impute_strategy', and the initial batch parameter will be
# set to 'median'
automl_constrained = AutoMLSearch(X_train=X, y_train=y, problem_type='binary',
                                  pipeline_parameters=pipeline_parameters,
                                  custom_hyperparameters=custom_hyperparameters,
                                  verbose=True)
Number of Features
Boolean 1
Categorical 6
Numeric 5
Number of training examples: 1000
Targets
False 85.90%
True 14.10%
Name: fraud, dtype: object
Using default limit of max_batches=1.
Generating pipelines to search over...
8 pipelines ready for search.
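The difference between the two arguments can be sketched without EvalML. In the toy model below, a custom range replaces the default range for later batches, while the initial value is used only for the first batch. The default range listed here and the helper names are assumptions for illustration, not EvalML internals.

```python
import random

# Assumed default range for the example; see the API reference for the real one.
default_ranges = {"Imputer": {"numeric_impute_strategy": ["mean", "median", "most_frequent"]}}
custom_ranges = {"Imputer": {"numeric_impute_strategy": ["median", "most_frequent"]}}
initial_parameters = {"Imputer": {"numeric_impute_strategy": "median"}}

def effective_range(component, parameter):
    """Use the custom range when one is given, otherwise fall back to the default."""
    custom = custom_ranges.get(component, {})
    return custom.get(parameter, default_ranges[component][parameter])

def propose(component, parameter, batch_number, rng):
    """First batch uses the initial value; later batches sample from the effective range."""
    if batch_number == 0:
        return initial_parameters[component][parameter]
    return rng.choice(effective_range(component, parameter))

rng = random.Random(0)
proposals = [propose("Imputer", "numeric_impute_strategy", batch, rng) for batch in range(5)]
print(proposals)
```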
Imbalanced Data¶
The AutoML search algorithm now has functionality to handle imbalanced data during classification! AutoMLSearch now provides two additional parameters, sampler_method and sampler_balanced_ratio, that allow you to let AutoMLSearch know whether to sample imbalanced data, and how to do so. sampler_method takes in either Undersampler, Oversampler, auto, or None as the sampler to use, and sampler_balanced_ratio specifies the minority/majority ratio that you want to sample to. Details on the Undersampler and Oversampler components can be found in the documentation.
This can be used for imbalanced datasets, like the fraud dataset, which has a ‘minority:majority’ ratio below 0.2 (14.10% True to 85.90% False in the sample loaded above, or roughly 0.16).
[20]:
automl_auto = AutoMLSearch(X_train=X, y_train=y, problem_type='binary')
automl_auto.allowed_pipelines[-1]
[20]:
pipeline = BinaryClassificationPipeline(component_graph={'Label Encoder': ['Label Encoder', 'X', 'y'], 'DateTime Featurization Component': ['DateTime Featurization Component', 'X', 'Label Encoder.y'], 'Imputer': ['Imputer', 'DateTime Featurization Component.x', 'Label Encoder.y'], 'One Hot Encoder': ['One Hot Encoder', 'Imputer.x', 'Label Encoder.y'], 'Oversampler': ['Oversampler', 'One Hot Encoder.x', 'Label Encoder.y'], 'Extra Trees Classifier': ['Extra Trees Classifier', 'Oversampler.x', 'Oversampler.y']}, parameters={'Label Encoder':{'positive_label': None}, 'DateTime Featurization Component':{'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False, 'time_index': None}, 'Imputer':{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'One Hot Encoder':{'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Oversampler':{'sampling_ratio': 0.25, 'k_neighbors_default': 5, 'n_jobs': -1, 'sampling_ratio_dict': None}, 'Extra Trees Classifier':{'n_estimators': 100, 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_jobs': -1}}, random_seed=0)
The Oversampler is chosen as the default sampling component here because the data's minority:majority ratio (roughly 0.16) is below the default sampler_balanced_ratio = 0.25. If you specified a lower ratio, for instance sampler_balanced_ratio = 0.1, then there would be no sampling component added here. This is because if a ratio of 0.1 is considered balanced, then this dataset's ratio of roughly 0.16 is also balanced.
The Oversampler uses SMOTE under the hood, and automatically selects whether to use SMOTE, SMOTEN, or SMOTENC based on the data it receives.
[21]:
automl_auto_ratio = AutoMLSearch(X_train=X, y_train=y, problem_type='binary', sampler_balanced_ratio=0.1)
automl_auto_ratio.allowed_pipelines[-1]
[21]:
pipeline = BinaryClassificationPipeline(component_graph={'Label Encoder': ['Label Encoder', 'X', 'y'], 'DateTime Featurization Component': ['DateTime Featurization Component', 'X', 'Label Encoder.y'], 'Imputer': ['Imputer', 'DateTime Featurization Component.x', 'Label Encoder.y'], 'One Hot Encoder': ['One Hot Encoder', 'Imputer.x', 'Label Encoder.y'], 'Extra Trees Classifier': ['Extra Trees Classifier', 'One Hot Encoder.x', 'Label Encoder.y']}, parameters={'Label Encoder':{'positive_label': None}, 'DateTime Featurization Component':{'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False, 'time_index': None}, 'Imputer':{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'One Hot Encoder':{'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Extra Trees Classifier':{'n_estimators': 100, 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_jobs': -1}}, random_seed=0)
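The decision shown in the two cells above can be approximated with a few lines of plain Python. This is only a sketch of the rule as described in this section, not EvalML's implementation: the helper names `minority_majority_ratio` and `needs_sampler` are hypothetical, the strict `<` comparison is an assumption, and the class counts are derived from the 14.10% / 85.90% split reported for the 1000-row fraud sample.

```python
from collections import Counter


def minority_majority_ratio(y):
    """Return (size of smallest class) / (size of largest class)."""
    counts = Counter(y)
    return min(counts.values()) / max(counts.values())


def needs_sampler(y, sampler_balanced_ratio=0.25):
    """Assumed rule: add a sampler only if the data is less balanced than the threshold."""
    return minority_majority_ratio(y) < sampler_balanced_ratio


# 1000 rows with 14.10% positives, as in the fraud sample above.
y = [False] * 859 + [True] * 141

ratio = minority_majority_ratio(y)      # about 0.164
with_default = needs_sampler(y)         # threshold 0.25: a sampler would be added
with_low_ratio = needs_sampler(y, 0.1)  # threshold 0.1: no sampler would be added
```

Under these assumptions the default threshold triggers sampling while 0.1 does not, which matches the presence and absence of the Oversampler in the two pipelines printed above.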
Additionally, you can add more fine-grained sampling ratios by passing in a sampling_ratio_dict in pipeline parameters. For this dictionary, AutoMLSearch expects the keys to be int values from 0 to n-1 for the classes, and the values to be the sampler_balanced_ratio associated with each target. This dictionary overrides the AutoML argument sampler_balanced_ratio. Below, you can see the scenario for the Oversampler component on this dataset. Note that the logic for the Undersampler is included in the commented section.
[22]:
# In this case, the majority class is the negative class
# for the oversampler, we don't want to oversample this class, so class 0 (majority) will have a ratio of 1 to itself
# for the minority class 1, we want to oversample it to have a minority/majority ratio of 0.5, which means we want minority to have 1/2 the samples as the majority
sampler_ratio_dict = {0: 1, 1: 0.5}
pipeline_parameters = {"Oversampler": {"sampler_balanced_ratio": sampler_ratio_dict}}
automl_auto_ratio_dict = AutoMLSearch(X_train=X, y_train=y, problem_type='binary', pipeline_parameters=pipeline_parameters)
automl_auto_ratio_dict.allowed_pipelines[-1]
# Undersampler case
# we don't want to undersample the minority class, so class 1 (minority) will have a ratio of 1 to itself
# for the majority class 0, we want to undersample it to have a minority/majority ratio of 0.5, which means we want majority to have 2x the samples as the minority
# sampler_ratio_dict = {0: 0.5, 1: 1}
# pipeline_parameters = {"Undersampler": {"sampler_balanced_ratio": sampler_ratio_dict}}
# automl_auto_ratio_dict = AutoMLSearch(X_train=X, y_train=y, problem_type='binary', sampler_method='Undersampler', pipeline_parameters=pipeline_parameters)
[22]:
pipeline = BinaryClassificationPipeline(component_graph={'Label Encoder': ['Label Encoder', 'X', 'y'], 'DateTime Featurization Component': ['DateTime Featurization Component', 'X', 'Label Encoder.y'], 'Imputer': ['Imputer', 'DateTime Featurization Component.x', 'Label Encoder.y'], 'One Hot Encoder': ['One Hot Encoder', 'Imputer.x', 'Label Encoder.y'], 'Oversampler': ['Oversampler', 'One Hot Encoder.x', 'Label Encoder.y'], 'Extra Trees Classifier': ['Extra Trees Classifier', 'Oversampler.x', 'Oversampler.y']}, parameters={'Label Encoder':{'positive_label': None}, 'DateTime Featurization Component':{'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False, 'time_index': None}, 'Imputer':{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'One Hot Encoder':{'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Oversampler':{'sampling_ratio': 0.25, 'k_neighbors_default': 5, 'n_jobs': -1, 'sampling_ratio_dict': None, 'sampler_balanced_ratio': {0: 1, 1: 0.5}}, 'Extra Trees Classifier':{'n_estimators': 100, 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_jobs': -1}}, random_seed=0)
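To make the per-class ratios concrete, the sketch below converts a ratio dictionary into target sample counts, following the interpretation given in the code comments above (each value is a minority/majority ratio for that class). The functions `oversample_targets` and `undersample_targets` are hypothetical helpers written for illustration, and the counts use the whole 1000-row sample; inside a search the sampler would only see the training portion of each split, so real counts would differ.

```python
def oversample_targets(class_counts, ratio_dict):
    """Grow each class to at least ratio * (majority count); never shrink a class."""
    majority = max(class_counts.values())
    return {
        label: max(count, int(majority * ratio_dict[label]))
        for label, count in class_counts.items()
    }


def undersample_targets(class_counts, ratio_dict):
    """Shrink each class to at most (minority count) / ratio; never grow a class."""
    minority = min(class_counts.values())
    return {
        label: min(count, int(minority / ratio_dict[label]))
        for label, count in class_counts.items()
    }


# Class 0 is the majority (not fraud), class 1 the minority (fraud).
counts = {0: 859, 1: 141}

after_oversampling = oversample_targets(counts, {0: 1, 1: 0.5})
# the minority grows to about half the majority: {0: 859, 1: 429}

after_undersampling = undersample_targets(counts, {0: 0.5, 1: 1})
# the majority shrinks to twice the minority: {0: 282, 1: 141}
```

In both cases the class given a ratio of 1 is left untouched, which is why the comments describe it as having "a ratio of 1 to itself".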
Adding ensemble methods to AutoML¶
Stacking¶
Stacking is an ensemble machine learning algorithm that involves training a model to best combine the predictions of several base learning algorithms. First, each base learning algorithm is trained using the given data. Then, the combining algorithm or meta-learner is trained on the predictions made by those base learning algorithms to make a final prediction.
AutoML enables stacking using the ensembling flag during initialization; this is set to False by default. The stacking ensemble pipeline runs in its own batch after a whole cycle of training has occurred (each allowed pipeline trains for one batch). Note that this means a large number of iterations may need to run before the stacking ensemble runs. It is also important to note that only the first CV fold is calculated for stacking ensembles because the model internally uses CV folds.
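The two stacking stages described above can be illustrated without any ML library. The following is a toy sketch, not EvalML's stacked ensemble: the classes `MeanThresholdStump` and `LogisticMetaLearner`, the two-fold split, and the data are all made up for illustration. It assumes the common practice of training the meta-learner on out-of-fold base predictions, which is one way a stacker can use CV folds internally.

```python
import math


class MeanThresholdStump:
    """Toy base learner: scores one feature relative to the midpoint of the class means."""

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [row[self.feature] for row, label in zip(X, y) if label == 1]
        neg = [row[self.feature] for row, label in zip(X, y) if label == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        self.threshold = (mean_pos + mean_neg) / 2
        self.sign = 1.0 if mean_pos >= mean_neg else -1.0
        return self

    def score(self, X):
        # Positive scores lean towards class 1, negative scores towards class 0.
        return [self.sign * (row[self.feature] - self.threshold) for row in X]


class LogisticMetaLearner:
    """Toy meta-learner: logistic regression fitted with plain gradient descent."""

    def fit(self, Z, y, lr=0.5, epochs=300):
        self.w, self.b = [0.0] * len(Z[0]), 0.0
        for _ in range(epochs):
            for z, label in zip(Z, y):
                error = self.prob(z) - label
                self.w = [w - lr * error * v for w, v in zip(self.w, z)]
                self.b -= lr * error
        return self

    def prob(self, z):
        logit = self.b + sum(w * v for w, v in zip(self.w, z))
        logit = max(-30.0, min(30.0, logit))  # keep math.exp well within range
        return 1.0 / (1.0 + math.exp(-logit))

    def predict(self, Z):
        return [1 if self.prob(z) >= 0.5 else 0 for z in Z]


def fit_stack(X, y, n_features, n_folds=2):
    # Stage 1: collect out-of-fold base scores, so the meta-learner never sees
    # a score produced by a base model that was trained on that same row.
    Z = [[0.0] * n_features for _ in X]
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held = [i for i in range(len(X)) if i % n_folds == fold]
        for f in range(n_features):
            stump = MeanThresholdStump(f).fit([X[i] for i in train], [y[i] for i in train])
            for i, s in zip(held, stump.score([X[i] for i in held])):
                Z[i][f] = s
    # Stage 2: train the meta-learner on the base learners' scores.
    meta = LogisticMetaLearner().fit(Z, y)
    # Refit every base learner on all of the data for use at prediction time.
    bases = [MeanThresholdStump(f).fit(X, y) for f in range(n_features)]
    return bases, meta


def predict_stack(bases, meta, X):
    columns = [base.score(X) for base in bases]
    return meta.predict([list(row) for row in zip(*columns)])


# Made-up data: feature 0 is informative, feature 1 is mostly noise.
X = [(0.1, 0.9), (0.2, 0.4), (0.3, 0.6), (0.4, 0.2),
     (0.6, 0.7), (0.7, 0.3), (0.8, 0.8), (0.9, 0.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

bases, meta = fit_stack(X, y, n_features=2)
predictions = predict_stack(bases, meta, X)
```

The meta-learner ends up with one weight per base learner, so inspecting `meta.w` gives a rough sense of how much the ensemble relies on each of them.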
[23]:
X, y = evalml.demos.load_breast_cancer()
automl_with_ensembling = AutoMLSearch(X_train=X, y_train=y,
problem_type="binary",
allowed_model_families=[ModelFamily.LINEAR_MODEL],
max_batches=4,
ensembling=True,
verbose=True)
automl_with_ensembling.search()
Number of Features
Numeric 30
Number of training examples: 569
Targets
benign 62.74%
malignant 37.26%
Name: target, dtype: object
Generating pipelines to search over...
Ensembling will run every 3 batches.
2 pipelines ready for search.
*****************************
* Beginning pipeline search *
*****************************
Optimizing for Log Loss Binary.
Lower score is better.
Using SequentialEngine to train and score pipelines.
Searching up to 4 batches for a total of 14 pipelines.
Allowed model families: linear_model, linear_model
Evaluating Baseline Pipeline: Mode Baseline Binary Classification Pipeline
Mode Baseline Binary Classification Pipeline:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 12.868
*****************************
* Evaluating Batch Number 1 *
*****************************
Elastic Net Classifier w/ Label Encoder + Imputer + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.077
Logistic Regression Classifier w/ Label Encoder + Imputer + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.077
*****************************
* Evaluating Batch Number 2 *
*****************************
Logistic Regression Classifier w/ Label Encoder + Imputer + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.097
Logistic Regression Classifier w/ Label Encoder + Imputer + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.080
Logistic Regression Classifier w/ Label Encoder + Imputer + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.085
Logistic Regression Classifier w/ Label Encoder + Imputer + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.091
Logistic Regression Classifier w/ Label Encoder + Imputer + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.097
*****************************
* Evaluating Batch Number 3 *
*****************************
Elastic Net Classifier w/ Label Encoder + Imputer + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.075
Elastic Net Classifier w/ Label Encoder + Imputer + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.075
Elastic Net Classifier w/ Label Encoder + Imputer + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.075
Elastic Net Classifier w/ Label Encoder + Imputer + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.076
Elastic Net Classifier w/ Label Encoder + Imputer + Standard Scaler:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.079
*****************************
* Evaluating Batch Number 4 *
*****************************
Stacked Ensemble Classification Pipeline:
Starting cross validation
Finished cross validation - mean Log Loss Binary: 0.103
Search finished after 00:25
Best pipeline: Elastic Net Classifier w/ Label Encoder + Imputer + Standard Scaler
Best pipeline Log Loss Binary: 0.075387
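The batch schedule in the log above ("Ensembling will run every 3 batches", "Searching up to 4 batches for a total of 14 pipelines") can be reproduced with a small calculation. This is a sketch inferred from that log rather than taken from the library: `expected_pipeline_count` is a hypothetical helper, the value of 5 pipelines per tuning batch is read off batches 2 and 3, and the position of the ensemble batch is assumed to be the batch that follows one tuning batch per allowed pipeline.

```python
def expected_pipeline_count(n_allowed, max_batches, pipelines_per_batch=5, ensembling=True):
    """Count the pipelines evaluated, including the baseline, under the assumed schedule."""
    # Baseline pipeline plus batch 1, which runs every allowed pipeline once
    # with its default parameters.
    total = 1 + n_allowed
    # One tuning batch per allowed pipeline, followed by one ensemble batch.
    cycle = n_allowed + 1
    for batch in range(2, max_batches + 1):
        is_ensemble_batch = ensembling and (batch - 1) % cycle == 0
        total += 1 if is_ensemble_batch else pipelines_per_batch
    return total


# Two allowed pipelines (Elastic Net and Logistic Regression) and max_batches=4.
with_ensembling = expected_pipeline_count(n_allowed=2, max_batches=4)
without_ensembling = expected_pipeline_count(n_allowed=2, max_batches=4, ensembling=False)
```

With these assumptions the first call gives 14, the total reported in the log: 1 baseline, 2 pipelines in batch 1, 5 each in batches 2 and 3, and 1 stacked ensemble in batch 4. It also shows why max_batches had to be at least 4 here for the ensemble to run at all.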
We can view more information about the best performing pipeline by calling .describe(). In this run, that is the Elastic Net Classifier pipeline rather than the stacking ensemble, which scored a mean Log Loss Binary of 0.103.
[24]:
automl_with_ensembling.best_pipeline.describe()
***********************************************************************
* Elastic Net Classifier w/ Label Encoder + Imputer + Standard Scaler *
***********************************************************************
Problem Type: binary
Model Family: Linear
Number of features: 30
Pipeline Steps
==============
1. Label Encoder
* positive_label : None
2. Imputer
* categorical_impute_strategy : most_frequent
* numeric_impute_strategy : median
* categorical_fill_value : None
* numeric_fill_value : None
3. Standard Scaler
4. Elastic Net Classifier
* penalty : elasticnet
* C : 8.123565600467177
* l1_ratio : 0.47997717237505744
* n_jobs : -1
* multi_class : auto
* solver : saga
Access raw results¶
The AutoMLSearch class records detailed results information under the results field, including information about the cross-validation scoring and parameters.
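Because the results field is a nested dictionary, it can be summarized with ordinary Python. The sketch below works on a hand-trimmed copy of three entries from the output shown in this section (only the keys it needs are kept), so it runs without a fitted AutoMLSearch object. `rank_pipelines` is a hypothetical helper, and the percent-better formula is an assumption that happens to agree with the recorded percent_better_than_baseline values for a lower-is-better objective such as Log Loss Binary.

```python
# Trimmed copy of three entries from the results output in this section.
results = {
    "pipeline_results": {
        0: {
            "pipeline_name": "Mode Baseline Binary Classification Pipeline",
            "mean_cv_score": 3.970423187263591,
            "cv_data": [
                {"all_objective_scores": {"Log Loss Binary": 4.124033002377396}},
                {"all_objective_scores": {"Log Loss Binary": 4.124033002377395}},
                {"all_objective_scores": {"Log Loss Binary": 3.6632035570359824}},
            ],
        },
        1: {
            "pipeline_name": "Elastic Net Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler",
            "mean_cv_score": 0.6073641356206173,
            "cv_data": [
                {"all_objective_scores": {"Log Loss Binary": 0.4314721295243963}},
                {"all_objective_scores": {"Log Loss Binary": 0.5986075324162158}},
                {"all_objective_scores": {"Log Loss Binary": 0.79201274492124}},
            ],
        },
        3: {
            "pipeline_name": "XGBoost Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler",
            "mean_cv_score": 0.26926735779748645,
            "cv_data": [
                {"all_objective_scores": {"Log Loss Binary": 0.19552199778162352}},
                {"all_objective_scores": {"Log Loss Binary": 0.45649296312983995}},
                {"all_objective_scores": {"Log Loss Binary": 0.15578711248099583}},
            ],
        },
    }
}

BASELINE_ID = 0  # the baseline pipeline has id 0 in the output shown in this section


def rank_pipelines(results, objective="Log Loss Binary"):
    """Return one summary row per pipeline, best (lowest) mean CV score first."""
    rows = results["pipeline_results"]
    baseline = rows[BASELINE_ID]["mean_cv_score"]
    ranked = []
    for pipeline_id, row in rows.items():
        ranked.append({
            "id": pipeline_id,
            "estimator": row["pipeline_name"].split(" w/ ")[0],
            "mean_cv_score": row["mean_cv_score"],
            "fold_scores": [fold["all_objective_scores"][objective] for fold in row["cv_data"]],
            # Assumed formula for a lower-is-better objective.
            "percent_better": 100 * (baseline - row["mean_cv_score"]) / baseline,
        })
    return sorted(ranked, key=lambda r: r["mean_cv_score"])


ranked = rank_pipelines(results)
best = ranked[0]
```

On this trimmed copy the XGBoost entry ranks first. Applied to a real search, the same traversal would take automl.results directly, and the sort direction would need to be reversed for objectives where a higher score is better.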
[25]:
automl.results
[25]:
{'pipeline_results': {0: {'id': 0,
'pipeline_name': 'Mode Baseline Binary Classification Pipeline',
'pipeline_class': evalml.pipelines.binary_classification_pipeline.BinaryClassificationPipeline,
'pipeline_summary': 'Baseline Classifier w/ Label Encoder',
'parameters': {'Label Encoder': {'positive_label': None},
'Baseline Classifier': {'strategy': 'mode'}},
'mean_cv_score': 3.970423187263591,
'standard_deviation_cv_score': 0.26606000431837074,
'high_variance_cv': False,
'training_time': 0.9727675914764404,
'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
4.124033002377396),
('MCC Binary', 0.0),
('Gini', 0.0),
('AUC', 0.5),
('Precision', 0.0),
('F1', 0.0),
('Balanced Accuracy Binary', 0.5),
('Accuracy Binary', 0.8805970149253731),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 4.124033002377396,
'binary_classification_threshold': 9.16384630183206e-53},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
4.124033002377395),
('MCC Binary', 0.0),
('Gini', 0.0),
('AUC', 0.5),
('Precision', 0.0),
('F1', 0.0),
('Balanced Accuracy Binary', 0.5),
('Accuracy Binary', 0.8805970149253731),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 4.124033002377395,
'binary_classification_threshold': 9.16384630183206e-53},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
3.6632035570359824),
('MCC Binary', 0.0),
('Gini', 0.0),
('AUC', 0.5),
('Precision', 0.0),
('F1', 0.0),
('Balanced Accuracy Binary', 0.5),
('Accuracy Binary', 0.8939393939393939),
('# Training', 134),
('# Validation', 66)]),
'mean_cv_score': 3.6632035570359824,
'binary_classification_threshold': 9.16384630183206e-53}],
'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 0,
'MCC Binary': 0,
'Gini': 0,
'AUC': 0,
'Precision': 0,
'F1': 0,
'Balanced Accuracy Binary': 0,
'Accuracy Binary': 0},
'percent_better_than_baseline': 0,
'validation_score': 4.124033002377396},
1: {'id': 1,
'pipeline_name': 'Elastic Net Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler',
'pipeline_class': evalml.pipelines.binary_classification_pipeline.BinaryClassificationPipeline,
'pipeline_summary': 'Elastic Net Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler',
'parameters': {'Label Encoder': {'positive_label': None},
'Drop Columns Transformer': {'columns': ['currency']},
'DateTime Featurization Component': {'features_to_extract': ['year',
'month',
'day_of_week',
'hour'],
'encode_as_categories': False,
'time_index': None},
'Imputer': {'categorical_impute_strategy': 'most_frequent',
'numeric_impute_strategy': 'mean',
'categorical_fill_value': None,
'numeric_fill_value': None},
'One Hot Encoder': {'top_n': 10,
'features_to_encode': None,
'categories': None,
'drop': 'if_binary',
'handle_unknown': 'ignore',
'handle_missing': 'error'},
'Oversampler': {'sampling_ratio': 0.25,
'k_neighbors_default': 5,
'n_jobs': -1,
'sampling_ratio_dict': None,
'categorical_features': [3,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43],
'k_neighbors': 5},
'Elastic Net Classifier': {'penalty': 'elasticnet',
'C': 1.0,
'l1_ratio': 0.15,
'n_jobs': -1,
'multi_class': 'auto',
'solver': 'saga'}},
'mean_cv_score': 0.6073641356206173,
'standard_deviation_cv_score': 0.18042974370220446,
'high_variance_cv': False,
'training_time': 2.983219861984253,
'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.4314721295243963),
('MCC Binary', 0.10148381259321973),
('Gini', 0.3220338983050848),
('AUC', 0.6610169491525424),
('Precision', 0.25),
('F1', 0.16666666666666666),
('Balanced Accuracy Binary', 0.5370762711864407),
('Accuracy Binary', 0.8507462686567164),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.4314721295243963,
'binary_classification_threshold': 0.5340008530295668},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.5986075324162158),
('MCC Binary', 0.3256659533260589),
('Gini', 0.14830508474576254),
('AUC', 0.5741525423728813),
('Precision', 0.42857142857142855),
('F1', 0.39999999999999997),
('Balanced Accuracy Binary', 0.6536016949152542),
('Accuracy Binary', 0.8656716417910447),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.5986075324162158,
'binary_classification_threshold': 0.4490165676883502},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.79201274492124),
('MCC Binary', 0.06958890006392211),
('Gini', 0.0556900726392251),
('AUC', 0.5278450363196125),
('Precision', 0.13636363636363635),
('F1', 0.20689655172413793),
('Balanced Accuracy Binary', 0.5532687651331719),
('Accuracy Binary', 0.6515151515151515),
('# Training', 134),
('# Validation', 66)]),
'mean_cv_score': 0.79201274492124,
'binary_classification_threshold': 0.3327217472321081}],
'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 84.70278590028053,
'MCC Binary': inf,
'Gini': inf,
'AUC': 8.767150928167878,
'Precision': 27.164502164502164,
'F1': 25.785440613026818,
'Balanced Accuracy Binary': 8.131557707828907,
'Accuracy Binary': -9.573345394240917},
'percent_better_than_baseline': 84.70278590028053,
'validation_score': 0.4314721295243963},
2: {'id': 2,
'pipeline_name': 'Logistic Regression Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler',
'pipeline_class': evalml.pipelines.binary_classification_pipeline.BinaryClassificationPipeline,
'pipeline_summary': 'Logistic Regression Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler + Standard Scaler',
'parameters': {'Label Encoder': {'positive_label': None},
'Drop Columns Transformer': {'columns': ['currency']},
'DateTime Featurization Component': {'features_to_extract': ['year',
'month',
'day_of_week',
'hour'],
'encode_as_categories': False,
'time_index': None},
'Imputer': {'categorical_impute_strategy': 'most_frequent',
'numeric_impute_strategy': 'mean',
'categorical_fill_value': None,
'numeric_fill_value': None},
'One Hot Encoder': {'top_n': 10,
'features_to_encode': None,
'categories': None,
'drop': 'if_binary',
'handle_unknown': 'ignore',
'handle_missing': 'error'},
'Oversampler': {'sampling_ratio': 0.25,
'k_neighbors_default': 5,
'n_jobs': -1,
'sampling_ratio_dict': None,
'categorical_features': [3,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43],
'k_neighbors': 5},
'Logistic Regression Classifier': {'penalty': 'l2',
'C': 1.0,
'n_jobs': -1,
'multi_class': 'auto',
'solver': 'lbfgs'}},
'mean_cv_score': 0.6203031849130022,
'standard_deviation_cv_score': 0.1898901497265282,
'high_variance_cv': False,
'training_time': 4.226473569869995,
'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.4402075637709514),
('MCC Binary', 0.14283901342792224),
('Gini', 0.2838983050847457),
('AUC', 0.6419491525423728),
('Precision', 0.3333333333333333),
('F1', 0.18181818181818182),
('Balanced Accuracy Binary', 0.5455508474576272),
('Accuracy Binary', 0.8656716417910447),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.4402075637709514,
'binary_classification_threshold': 0.5232642260235977},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.6020345867674461),
('MCC Binary', 0.2457400868151266),
('Gini', 0.14406779661016955),
('AUC', 0.5720338983050848),
('Precision', 0.4),
('F1', 0.3076923076923077),
('Balanced Accuracy Binary', 0.5995762711864407),
('Accuracy Binary', 0.8656716417910447),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.6020345867674461,
'binary_classification_threshold': 0.4745908587234502},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.8186674042006088),
('MCC Binary', 0.06958890006392211),
('Gini', 0.03147699757869238),
('AUC', 0.5157384987893462),
('Precision', 0.13636363636363635),
('F1', 0.20689655172413793),
('Balanced Accuracy Binary', 0.5532687651331719),
('Accuracy Binary', 0.6515151515151515),
('# Training', 134),
('# Validation', 66)]),
'mean_cv_score': 0.8186674042006088,
'binary_classification_threshold': 0.3256059490546064}],
'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 84.3769000014199,
'MCC Binary': inf,
'Gini': inf,
'AUC': 7.657384987893456,
'Precision': 28.989898989898993,
'F1': 23.21356804115425,
'Balanced Accuracy Binary': 6.613196125907994,
'Accuracy Binary': -9.075832956429963},
'percent_better_than_baseline': 84.3769000014199,
'validation_score': 0.4402075637709514},
3: {'id': 3,
'pipeline_name': 'XGBoost Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler',
'pipeline_class': evalml.pipelines.binary_classification_pipeline.BinaryClassificationPipeline,
'pipeline_summary': 'XGBoost Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler',
'parameters': {'Label Encoder': {'positive_label': None},
'Drop Columns Transformer': {'columns': ['currency']},
'DateTime Featurization Component': {'features_to_extract': ['year',
'month',
'day_of_week',
'hour'],
'encode_as_categories': False,
'time_index': None},
'Imputer': {'categorical_impute_strategy': 'most_frequent',
'numeric_impute_strategy': 'mean',
'categorical_fill_value': None,
'numeric_fill_value': None},
'One Hot Encoder': {'top_n': 10,
'features_to_encode': None,
'categories': None,
'drop': 'if_binary',
'handle_unknown': 'ignore',
'handle_missing': 'error'},
'Oversampler': {'sampling_ratio': 0.25,
'k_neighbors_default': 5,
'n_jobs': -1,
'sampling_ratio_dict': None,
'categorical_features': [3,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43],
'k_neighbors': 5},
'XGBoost Classifier': {'eta': 0.1,
'max_depth': 6,
'min_child_weight': 1,
'n_estimators': 100,
'n_jobs': -1,
'eval_metric': 'logloss'}},
'mean_cv_score': 0.26926735779748645,
'standard_deviation_cv_score': 0.1633547848901683,
'high_variance_cv': False,
'training_time': 2.974979877471924,
'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.19552199778162352),
('MCC Binary', 0.0),
('Gini', 0.7033898305084747),
('AUC', 0.8516949152542374),
('Precision', 0.0),
('F1', 0.0),
('Balanced Accuracy Binary', 0.5),
('Accuracy Binary', 0.8805970149253731),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.19552199778162352,
'binary_classification_threshold': 0.9167545941225166},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.45649296312983995),
('MCC Binary', 0.36540212737375455),
('Gini', 0.4110169491525424),
('AUC', 0.7055084745762712),
('Precision', 0.6666666666666666),
('F1', 0.36363636363636365),
('Balanced Accuracy Binary', 0.6165254237288136),
('Accuracy Binary', 0.8955223880597015),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.45649296312983995,
'binary_classification_threshold': 0.9022083865093031},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.15578711248099583),
('MCC Binary', 0.3600976668493281),
('Gini', 0.8062953995157385),
('AUC', 0.9031476997578692),
('Precision', 1.0),
('F1', 0.25),
('Balanced Accuracy Binary', 0.5714285714285714),
('Accuracy Binary', 0.9090909090909091),
('# Training', 134),
('# Validation', 66)]),
'mean_cv_score': 0.15578711248099583,
'binary_classification_threshold': 0.9453852686467735}],
'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 93.21816982478724,
'MCC Binary': inf,
'Gini': inf,
'AUC': 32.01170298627925,
'Precision': 55.55555555555555,
'F1': 20.454545454545457,
'Balanced Accuracy Binary': 6.265133171912829,
'Accuracy Binary': 1.0025629428614624},
'percent_better_than_baseline': 93.21816982478724,
'validation_score': 0.19552199778162352},
4: {'id': 4,
'pipeline_name': 'LightGBM Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler',
'pipeline_class': evalml.pipelines.binary_classification_pipeline.BinaryClassificationPipeline,
'pipeline_summary': 'LightGBM Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler',
'parameters': {'Label Encoder': {'positive_label': None},
'Drop Columns Transformer': {'columns': ['currency']},
'DateTime Featurization Component': {'features_to_extract': ['year',
'month',
'day_of_week',
'hour'],
'encode_as_categories': False,
'time_index': None},
'Imputer': {'categorical_impute_strategy': 'most_frequent',
'numeric_impute_strategy': 'mean',
'categorical_fill_value': None,
'numeric_fill_value': None},
'One Hot Encoder': {'top_n': 10,
'features_to_encode': None,
'categories': None,
'drop': 'if_binary',
'handle_unknown': 'ignore',
'handle_missing': 'error'},
'Oversampler': {'sampling_ratio': 0.25,
'k_neighbors_default': 5,
'n_jobs': -1,
'sampling_ratio_dict': None,
'categorical_features': [3,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43],
'k_neighbors': 5},
'LightGBM Classifier': {'boosting_type': 'gbdt',
'learning_rate': 0.1,
'n_estimators': 100,
'max_depth': 0,
'num_leaves': 31,
'min_child_samples': 20,
'n_jobs': -1,
'bagging_freq': 0,
'bagging_fraction': 0.9}},
'mean_cv_score': 0.31641155204854204,
'standard_deviation_cv_score': 0.15607221058209464,
'high_variance_cv': False,
'training_time': 2.419414520263672,
'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.2293858184062148),
('MCC Binary', 0.6842908506285673),
('Gini', 0.7415254237288136),
('AUC', 0.8707627118644068),
('Precision', 1.0),
('F1', 0.6666666666666666),
('Balanced Accuracy Binary', 0.75),
('Accuracy Binary', 0.9402985074626866),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.2293858184062148,
'binary_classification_threshold': 0.5325189767997198},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.4965934577153518),
('MCC Binary', 0.36540212737375455),
('Gini', 0.47457627118644075),
('AUC', 0.7372881355932204),
('Precision', 0.6666666666666666),
('F1', 0.36363636363636365),
('Balanced Accuracy Binary', 0.6165254237288136),
('Accuracy Binary', 0.8955223880597015),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.4965934577153518,
'binary_classification_threshold': 0.7752331605063649},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.22325538002405956),
('MCC Binary', 0.5132181379714255),
('Gini', 0.8062953995157385),
('AUC', 0.9031476997578692),
('Precision', 1.0),
('F1', 0.4444444444444445),
('Balanced Accuracy Binary', 0.6428571428571428),
('Accuracy Binary', 0.9242424242424242),
('# Training', 134),
('# Validation', 66)]),
'mean_cv_score': 0.22325538002405956,
'binary_classification_threshold': 0.9436049942755129}],
'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 92.03078520537726,
'MCC Binary': inf,
'Gini': inf,
'AUC': 33.70661824051654,
'Precision': 88.88888888888889,
'F1': 49.158249158249156,
'Balanced Accuracy Binary': 16.979418886198548,
'Accuracy Binary': 3.497663199155754},
'percent_better_than_baseline': 92.03078520537726,
'validation_score': 0.2293858184062148},
5: {'id': 5,
'pipeline_name': 'CatBoost Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + Oversampler',
'pipeline_class': evalml.pipelines.binary_classification_pipeline.BinaryClassificationPipeline,
'pipeline_summary': 'CatBoost Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + Oversampler',
'parameters': {'Label Encoder': {'positive_label': None},
'Drop Columns Transformer': {'columns': ['currency']},
'DateTime Featurization Component': {'features_to_extract': ['year',
'month',
'day_of_week',
'hour'],
'encode_as_categories': False,
'time_index': None},
'Imputer': {'categorical_impute_strategy': 'most_frequent',
'numeric_impute_strategy': 'mean',
'categorical_fill_value': None,
'numeric_fill_value': None},
'Oversampler': {'sampling_ratio': 0.25,
'k_neighbors_default': 5,
'n_jobs': -1,
'sampling_ratio_dict': None,
'categorical_features': [4, 7, 8, 3],
'k_neighbors': 5},
'CatBoost Classifier': {'n_estimators': 10,
'eta': 0.03,
'max_depth': 6,
'bootstrap_type': None,
'silent': True,
'allow_writing_files': False,
'n_jobs': -1}},
'mean_cv_score': 0.6236249685739278,
'standard_deviation_cv_score': 0.0008026761021476908,
'high_variance_cv': False,
'training_time': 1.39483642578125,
'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.6227420171620137),
('MCC Binary', 0.5879652413195868),
('Gini', 0.7415254237288136),
('AUC', 0.8707627118644068),
('Precision', 1.0),
('F1', 0.5454545454545454),
('Balanced Accuracy Binary', 0.6875),
('Accuracy Binary', 0.9253731343283582),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.6227420171620137,
'binary_classification_threshold': 0.4898933112926254},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.6243105477991306),
('MCC Binary', 0.47636443708895493),
('Gini', 0.3347457627118644),
('AUC', 0.6673728813559322),
('Precision', 1.0),
('F1', 0.4),
('Balanced Accuracy Binary', 0.625),
('Accuracy Binary', 0.9104477611940298),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.6243105477991306,
'binary_classification_threshold': 0.5135974515444475},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.6238223407606391),
('MCC Binary', 0.7374135894078153),
('Gini', 0.9515738498789348),
('AUC', 0.9757869249394674),
('Precision', 1.0),
('F1', 0.7272727272727273),
('Balanced Accuracy Binary', 0.7857142857142857),
('Accuracy Binary', 0.9545454545454546),
('# Training', 134),
('# Validation', 66)]),
'mean_cv_score': 0.6238223407606391,
'binary_classification_threshold': 0.49602309674485134}],
'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 84.29323678709098,
'MCC Binary': inf,
'Gini': inf,
'AUC': 33.79741727199356,
'Precision': 100.0,
'F1': 55.757575757575765,
'Balanced Accuracy Binary': 19.940476190476186,
'Accuracy Binary': 4.507764209256759},
'percent_better_than_baseline': 84.29323678709098,
'validation_score': 0.6227420171620137},
6: {'id': 6,
'pipeline_name': 'Random Forest Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler',
'pipeline_class': evalml.pipelines.binary_classification_pipeline.BinaryClassificationPipeline,
'pipeline_summary': 'Random Forest Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler',
'parameters': {'Label Encoder': {'positive_label': None},
'Drop Columns Transformer': {'columns': ['currency']},
'DateTime Featurization Component': {'features_to_extract': ['year',
'month',
'day_of_week',
'hour'],
'encode_as_categories': False,
'time_index': None},
'Imputer': {'categorical_impute_strategy': 'most_frequent',
'numeric_impute_strategy': 'mean',
'categorical_fill_value': None,
'numeric_fill_value': None},
'One Hot Encoder': {'top_n': 10,
'features_to_encode': None,
'categories': None,
'drop': 'if_binary',
'handle_unknown': 'ignore',
'handle_missing': 'error'},
'Oversampler': {'sampling_ratio': 0.25,
'k_neighbors_default': 5,
'n_jobs': -1,
'sampling_ratio_dict': None,
'categorical_features': [3,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43],
'k_neighbors': 5},
'Random Forest Classifier': {'n_estimators': 100,
'max_depth': 6,
'n_jobs': -1}},
'mean_cv_score': 0.29226383693917385,
'standard_deviation_cv_score': 0.032252391685893105,
'high_variance_cv': False,
'training_time': 2.5629451274871826,
'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.2859561431522344),
('MCC Binary', 0.5879652413195868),
('Gini', 0.5423728813559321),
('AUC', 0.771186440677966),
('Precision', 1.0),
('F1', 0.5454545454545454),
('Balanced Accuracy Binary', 0.6875),
('Accuracy Binary', 0.9253731343283582),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.2859561431522344,
'binary_classification_threshold': 0.5140851372128995},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.32720410418439),
('MCC Binary', 0.36540212737375455),
('Gini', 0.4279661016949152),
('AUC', 0.7139830508474576),
('Precision', 0.6666666666666666),
('F1', 0.36363636363636365),
('Balanced Accuracy Binary', 0.6165254237288136),
('Accuracy Binary', 0.8955223880597015),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.32720410418439,
'binary_classification_threshold': 0.4792774241693588},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.2636312634808971),
('MCC Binary', 0.5132181379714255),
('Gini', 0.8159806295399514),
('AUC', 0.9079903147699757),
('Precision', 1.0),
('F1', 0.4444444444444445),
('Balanced Accuracy Binary', 0.6428571428571428),
('Accuracy Binary', 0.9242424242424242),
('# Training', 134),
('# Validation', 66)]),
'mean_cv_score': 0.2636312634808971,
'binary_classification_threshold': 0.5786532369228834}],
'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 92.63897516323439,
'MCC Binary': inf,
'Gini': inf,
'AUC': 29.771993543179974,
'Precision': 88.88888888888889,
'F1': 45.11784511784512,
'Balanced Accuracy Binary': 14.896085552865213,
'Accuracy Binary': 3.0001507613447997},
'percent_better_than_baseline': 92.63897516323439,
'validation_score': 0.2859561431522344},
7: {'id': 7,
'pipeline_name': 'Decision Tree Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler',
'pipeline_class': evalml.pipelines.binary_classification_pipeline.BinaryClassificationPipeline,
'pipeline_summary': 'Decision Tree Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler',
'parameters': {'Label Encoder': {'positive_label': None},
'Drop Columns Transformer': {'columns': ['currency']},
'DateTime Featurization Component': {'features_to_extract': ['year',
'month',
'day_of_week',
'hour'],
'encode_as_categories': False,
'time_index': None},
'Imputer': {'categorical_impute_strategy': 'most_frequent',
'numeric_impute_strategy': 'mean',
'categorical_fill_value': None,
'numeric_fill_value': None},
'One Hot Encoder': {'top_n': 10,
'features_to_encode': None,
'categories': None,
'drop': 'if_binary',
'handle_unknown': 'ignore',
'handle_missing': 'error'},
'Oversampler': {'sampling_ratio': 0.25,
'k_neighbors_default': 5,
'n_jobs': -1,
'sampling_ratio_dict': None,
'categorical_features': [3,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43],
'k_neighbors': 5},
'Decision Tree Classifier': {'criterion': 'gini',
'max_features': 'auto',
'max_depth': 6,
'min_samples_split': 2,
'min_weight_fraction_leaf': 0.0}},
'mean_cv_score': 3.5788918030647223,
'standard_deviation_cv_score': 0.752281287139661,
'high_variance_cv': True,
'training_time': 2.3931379318237305,
'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
2.779761586468773),
('MCC Binary', 0.0),
('Gini', 0.24788135593220328),
('AUC', 0.6239406779661016),
('Precision', 0.11940298507462686),
('F1', 0.21333333333333335),
('Balanced Accuracy Binary', 0.5),
('Accuracy Binary', 0.11940298507462686),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 2.779761586468773,
'binary_classification_threshold': -9.163846384083647e-53},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
3.6835496247911563),
('MCC Binary', 0.3256659533260589),
('Gini', 0.43432203389830515),
('AUC', 0.7171610169491526),
('Precision', 0.42857142857142855),
('F1', 0.39999999999999997),
('Balanced Accuracy Binary', 0.6536016949152542),
('Accuracy Binary', 0.8656716417910447),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 3.6835496247911563,
'binary_classification_threshold': 0.9999999885588493},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
4.273364197934237),
('MCC Binary', 0.12053151055354616),
('Gini', 0.19370460048426152),
('AUC', 0.5968523002421308),
('Precision', 0.16666666666666666),
('F1', 0.24),
('Balanced Accuracy Binary', 0.5871670702179177),
('Accuracy Binary', 0.7121212121212122),
('# Training', 134),
('# Validation', 66)]),
'mean_cv_score': 4.273364197934237,
'binary_classification_threshold': 9.16384630183206e-53}],
'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 9.86120032380507,
'MCC Binary': inf,
'Gini': inf,
'AUC': 14.598466505246178,
'Precision': 23.8213693437574,
'F1': 28.444444444444443,
'Balanced Accuracy Binary': 8.025625504439072,
'Accuracy Binary': -31.931252826775204},
'percent_better_than_baseline': 9.86120032380507,
'validation_score': 2.779761586468773},
8: {'id': 8,
'pipeline_name': 'Extra Trees Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler',
'pipeline_class': evalml.pipelines.binary_classification_pipeline.BinaryClassificationPipeline,
'pipeline_summary': 'Extra Trees Classifier w/ Label Encoder + Drop Columns Transformer + DateTime Featurization Component + Imputer + One Hot Encoder + Oversampler',
'parameters': {'Label Encoder': {'positive_label': None},
'Drop Columns Transformer': {'columns': ['currency']},
'DateTime Featurization Component': {'features_to_extract': ['year',
'month',
'day_of_week',
'hour'],
'encode_as_categories': False,
'time_index': None},
'Imputer': {'categorical_impute_strategy': 'most_frequent',
'numeric_impute_strategy': 'mean',
'categorical_fill_value': None,
'numeric_fill_value': None},
'One Hot Encoder': {'top_n': 10,
'features_to_encode': None,
'categories': None,
'drop': 'if_binary',
'handle_unknown': 'ignore',
'handle_missing': 'error'},
'Oversampler': {'sampling_ratio': 0.25,
'k_neighbors_default': 5,
'n_jobs': -1,
'sampling_ratio_dict': None,
'categorical_features': [3,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43],
'k_neighbors': 5},
'Extra Trees Classifier': {'n_estimators': 100,
'max_features': 'auto',
'max_depth': 6,
'min_samples_split': 2,
'min_weight_fraction_leaf': 0.0,
'n_jobs': -1}},
'mean_cv_score': 0.3539086319913039,
'standard_deviation_cv_score': 0.01989728866370217,
'high_variance_cv': False,
'training_time': 2.5589206218719482,
'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.33916127352005254),
('MCC Binary', 0.10893529869649102),
('Gini', 0.44915254237288127),
('AUC', 0.7245762711864406),
('Precision', 0.14),
('F1', 0.24137931034482757),
('Balanced Accuracy Binary', 0.573093220338983),
('Accuracy Binary', 0.34328358208955223),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.33916127352005254,
'binary_classification_threshold': 0.10564566937992552},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.3460248979547855),
('MCC Binary', 0.3256659533260589),
('Gini', 0.30508474576271194),
('AUC', 0.652542372881356),
('Precision', 0.42857142857142855),
('F1', 0.39999999999999997),
('Balanced Accuracy Binary', 0.6536016949152542),
('Accuracy Binary', 0.8656716417910447),
('# Training', 133),
('# Validation', 67)]),
'mean_cv_score': 0.3460248979547855,
'binary_classification_threshold': 0.2616091320867158},
{'all_objective_scores': OrderedDict([('Log Loss Binary',
0.3765397244990737),
('MCC Binary', 0.11873608643007195),
('Gini', 0.18644067796610164),
('AUC', 0.5932203389830508),
('Precision', 0.25),
('F1', 0.18181818181818182),
('Balanced Accuracy Binary', 0.5460048426150121),
('Accuracy Binary', 0.8636363636363636),
('# Training', 134),
('# Validation', 66)]),
'mean_cv_score': 0.3765397244990737,
'binary_classification_threshold': 0.38599058059200875}],
'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 91.08637504620214,
'MCC Binary': inf,
'Gini': inf,
'AUC': 15.677966101694917,
'Precision': 27.285714285714285,
'F1': 27.439916405433646,
'Balanced Accuracy Binary': 9.089991928974984,
'Accuracy Binary': -19.418061209105986},
'percent_better_than_baseline': 91.08637504620214,
'validation_score': 0.33916127352005254}},
'search_order': [0, 1, 2, 3, 4, 5, 6, 7, 8]}
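The per-pipeline summary fields in the output above can be reproduced from the fold-level scores. The following is a minimal sketch, not EvalML code: it rebuilds a reduced copy of the structure for pipelines 6, 7 and 8 with values copied from the output, and assumes the top-level key is named pipeline_results (that key is not visible in the excerpt above). In this output, mean_cv_score matches the mean of the three fold scores, and standard_deviation_cv_score appears to be the sample standard deviation.

```python
from statistics import mean, stdev

# Reduced stand-in for the printed results. The key "pipeline_results" is an
# assumption; the numbers are copied from the output above.
results = {
    "pipeline_results": {
        6: {"mean_cv_score": 0.29226383693917385,
            "cv_data": [{"mean_cv_score": 0.2859561431522344},
                        {"mean_cv_score": 0.32720410418439},
                        {"mean_cv_score": 0.2636312634808971}]},
        7: {"mean_cv_score": 3.5788918030647223,
            "cv_data": [{"mean_cv_score": 2.779761586468773},
                        {"mean_cv_score": 3.6835496247911563},
                        {"mean_cv_score": 4.273364197934237}]},
        8: {"mean_cv_score": 0.3539086319913039,
            "cv_data": [{"mean_cv_score": 0.33916127352005254},
                        {"mean_cv_score": 0.3460248979547855},
                        {"mean_cv_score": 0.3765397244990737}]},
    }
}


def fold_scores(entry):
    """Return the objective score recorded for each cross-validation fold."""
    return [fold["mean_cv_score"] for fold in entry["cv_data"]]


pipelines = results["pipeline_results"]

# Log Loss Binary is minimized, so a lower mean_cv_score ranks first.
ranking = sorted(pipelines, key=lambda pid: pipelines[pid]["mean_cv_score"])

for pid in ranking:
    scores = fold_scores(pipelines[pid])
    print(f"pipeline {pid}: mean={mean(scores):.4f}, sample std={stdev(scores):.4f}")
```

With these values, the Random Forest pipeline (id 6) ranks first and the Decision Tree pipeline (id 7) ranks last.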
Parallel AutoML¶
By default, all pipelines in an AutoML batch are evaluated in series. Pipelines can be evaluated in parallel to improve performance during AutoML search. This is accomplished by a futures style submission and evaluation of pipelines in a batch. As of this writing, the pipelines use a threaded model for concurrent evaluation. This is similar to the currently implemented n_jobs parameter in the estimators, which uses increased numbers of threads to train and evaluate estimators.
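To make the futures-style pattern concrete, here is a minimal sketch using only the standard library. It is not EvalML's engine implementation: evaluate_pipeline and the batch of names are hypothetical stand-ins, and the "score" is a placeholder so the example runs without any data.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def evaluate_pipeline(name):
    """Hypothetical stand-in for fitting and scoring one pipeline."""
    return name, float(len(name))  # placeholder score, not a real metric


def evaluate_batch(batch, max_workers=4):
    """Submit every pipeline in the batch, then collect results as they finish."""
    scores = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submission returns immediately with a future for each pipeline.
        futures = [executor.submit(evaluate_pipeline, name) for name in batch]
        # Results are gathered in completion order, not submission order.
        for future in as_completed(futures):
            name, score = future.result()
            scores[name] = score
    return scores


batch = ["Logistic Regression", "Elastic Net", "Random Forest"]
scores = evaluate_batch(batch)
print(scores)
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor in a sketch like this changes the parallelism from threads to processes, at the cost of requiring the submitted function and its arguments to be picklable.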
Quick Start¶
To quickly add parallelism to the pipeline search, pass a string to AutoMLSearch during initialization to set up the parallel engine and client within the AutoMLSearch object. The current options are “cf_threaded”, “cf_process”, “dask_threaded” and “dask_process”; each indicates the futures backend to use and whether to use thread- or process-level parallelism.
[26]:
automl_cf_threaded = AutoMLSearch(X_train=X, y_train=y,
problem_type="binary",
allowed_model_families=[ModelFamily.LINEAR_MODEL],
engine="cf_threaded")
automl_cf_threaded.search(show_iteration_plot = False)
automl_cf_threaded.close_engine()
Parallelism with Concurrent Futures¶
The EngineBase class is robust and extensible enough to support futures-like implementations from a variety of libraries. The CFEngine extends the EngineBase to use the native Python concurrent.futures library. The CFEngine supports both thread- and process-level parallelism. The type of parallelism can be chosen using either the ThreadPoolExecutor or the ProcessPoolExecutor. If either executor is passed a max_workers parameter, that value sets the number of threads or processes spawned. If not, the executor's own defaults apply: ProcessPoolExecutor defaults to the number of processors available, while the ThreadPoolExecutor default depends on the Python version (five times the number of processors on Python 3.5 to 3.7, and min(32, number of processors + 4) since Python 3.8).
Here, the CFEngine is invoked with a ThreadPoolExecutor capped at four worker threads, which gives thread-level parallelism.
[27]:
from concurrent.futures import ThreadPoolExecutor
from evalml.automl.engine.cf_engine import CFEngine, CFClient
cf_engine = CFEngine(CFClient(ThreadPoolExecutor(max_workers=4)))
automl_cf_threaded = AutoMLSearch(X_train=X, y_train=y,
problem_type="binary",
allowed_model_families=[ModelFamily.LINEAR_MODEL],
engine=cf_engine)
automl_cf_threaded.search(show_iteration_plot = False)
automl_cf_threaded.close_engine()
Note: the cell demonstrating process-level parallelism is rendered as markdown because it is incompatible with our ReadTheDocs build. It runs successfully locally.
from concurrent.futures import ProcessPoolExecutor
# Repeat the process, but using process-level parallelism
cf_engine = CFEngine(CFClient(ProcessPoolExecutor(max_workers=2)))
# Pass the engine object built above so the two-worker process pool is actually used
automl_cf_process = AutoMLSearch(X_train=X, y_train=y,
                                 problem_type="binary",
                                 engine=cf_engine)
automl_cf_process.search(show_iteration_plot = False)
automl_cf_process.close_engine()
Parallelism with Dask¶
Thread- or process-level parallelism can be explicitly invoked for the DaskEngine (as well as the CFEngine). The processes parameter of the LocalCluster can be set to True and the number of processes set using n_workers. If processes is set to False, then the resulting parallelism will be threaded and n_workers will represent the threads used. Examples of both follow.
[28]:
from dask.distributed import LocalCluster
from evalml.automl.engine import DaskEngine
dask_engine_p2 = DaskEngine(cluster=LocalCluster(processes=True, n_workers = 2))
automl_dask_p2 = AutoMLSearch(X_train=X, y_train=y,
problem_type="binary",
allowed_model_families=[ModelFamily.LINEAR_MODEL],
engine=dask_engine_p2)
automl_dask_p2.search(show_iteration_plot = False)
# Explicitly shutdown the automl object's LocalCluster
automl_dask_p2.close_engine()
Loky-backed parallel loops cannot be called in a multiprocessing, setting n_jobs=1
[29]:
dask_engine_t4 = DaskEngine(cluster=LocalCluster(processes=False, n_workers = 4))
automl_dask_t4 = AutoMLSearch(X_train=X, y_train=y,
problem_type="binary",
allowed_model_families=[ModelFamily.LINEAR_MODEL],
engine=dask_engine_t4)
automl_dask_t4.search(show_iteration_plot = False)
automl_dask_t4.close_engine()
As we can see, a significant performance gain can result from simply using something other than the default SequentialEngine: in the timings below, the search runs roughly 2x faster with two processes and roughly 4x to 4.6x faster with multiple threads. Exact figures depend on the machine and the workload.
[30]:
print("Sequential search duration: %s" % str(automl.search_duration))
print("Concurrent futures (threaded) search duration: %s" % str(automl_cf_threaded.search_duration))
print("Dask (two processes) search duration: %s" % str(automl_dask_p2.search_duration))
print("Dask (four threads) search duration: %s" % str(automl_dask_t4.search_duration))
Sequential search duration: 23.4571533203125
Concurrent futures (threaded) search duration: 5.07236385345459
Dask (two processes) search duration: 11.090572834014893
Dask (four threads) search duration: 5.851441860198975
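The speedup factors implied by these durations can be checked with a few lines of arithmetic. This sketch copies the four durations printed above into a plain dictionary (the dictionary and its labels are illustrative, not part of EvalML) and divides the sequential duration by each of the others.

```python
# Durations in seconds, copied from the printed output above.
durations = {
    "Sequential": 23.4571533203125,
    "Concurrent futures (threaded)": 5.07236385345459,
    "Dask (two processes)": 11.090572834014893,
    "Dask (four threads)": 5.851441860198975,
}

baseline = durations["Sequential"]

# Speedup factor = sequential time / parallel time.
speedups = {name: baseline / seconds for name, seconds in durations.items()}

for name, factor in speedups.items():
    # A factor of 2.0 means the search took half as long, i.e. a 100% speedup.
    print(f"{name}: {factor:.2f}x ({(factor - 1) * 100:.0f}% speedup)")
```

For this particular run, the factors come out at about 2.1x for two Dask processes, 4.0x for four Dask threads, and 4.6x for the threaded concurrent.futures engine.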