Machine learning (ML) is the process of constructing a mathematical model of a system based on a sample dataset collected from that system.
One of the main goals of training an ML model is to teach the model to separate the signal present in the data from the noise inherent in the system and in the data collection process. If this is done effectively, the model can then be used to make accurate predictions about the system when presented with new, similar data. Additionally, introspecting on an ML model can reveal key information about the system being modeled, such as which inputs and transformations of the inputs are most useful to the ML model for learning the signal in the data, and are therefore the most predictive.
There are a variety of ML problem types. Supervised learning describes the case where the collected data contains an output value to be modeled and a set of inputs with which to train the model. EvalML focuses on training supervised learning models.
EvalML supports three common supervised ML problem types. The first is regression, where the target value to model is a continuous numeric value. Next are binary and multiclass classification, where the target value to model consists of two or more discrete values or categories. The choice of which supervised ML problem type is most appropriate depends on domain expertise and on how the model will be evaluated and used.
AutoML is the process of automating the construction, training and evaluation of ML models. Given a dataset and some configuration, AutoML searches for the most effective and accurate ML model or models to fit the dataset. During the search, AutoML will explore different combinations of model type, model parameters and model architecture.
An effective AutoML solution offers several advantages over constructing and tuning ML models by hand. AutoML can assist with many of the difficult aspects of ML, such as avoiding overfitting and underfitting, handling imbalanced data, detecting data leakage and other potential issues with the problem setup, and automatically applying best-practice data cleaning, feature engineering, feature selection and various modeling techniques. AutoML can also leverage search algorithms to optimally sweep the hyperparameter search space, resulting in model performance that would be difficult to achieve by manual training.
EvalML supports all of the above and more.
In its simplest usage, the AutoML search interface requires only the input data, the target data and a problem_type specifying what kind of supervised ML problem to model.
** Graphing methods, such as those used by AutoMLSearch, require ipywidgets to be installed when run on Jupyter Notebook or Jupyter Lab.
** If graphing on Jupyter Lab, jupyterlab-plotly is also required. To install it, make sure you have npm installed.
Note: To provide data to EvalML, it is recommended that you create a DataTable object using the Woodwork project.
EvalML also accepts pandas input, and will run type inference on top of the input pandas data. If you’d like to change the types inferred by EvalML, you can use the infer_feature_types utility method as follows. This method takes pandas or numpy input and converts it to a Woodwork data structure. Its feature_types parameter specifies what types specific columns should be. In the example below, we specify that the provider column, which would otherwise have been inferred as natural language, is categorical.
[1]:
import evalml
from evalml.utils import infer_feature_types

X, y = evalml.demos.load_fraud(n_rows=1000, return_pandas=True)
X = infer_feature_types(X, feature_types={'provider': 'categorical'})
             Number of Features
Boolean                       1
Categorical                   6
Numeric                       5

Number of training examples: 1000
Targets
False    85.90%
True     14.10%
Name: fraud, dtype: object
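The override behavior described above can be pictured with a small, library-free sketch. Everything in it is hypothetical: `infer_column_type`, `infer_types`, the type labels, and the sample values are invented for illustration and do not reflect how EvalML or Woodwork actually infer types.

```python
from typing import Any, Dict, List, Optional


def infer_column_type(values: List[Any]) -> str:
    """Guess a coarse type label from a column's non-missing values (toy heuristic)."""
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"
    if all(isinstance(v, bool) for v in present):
        return "boolean"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    # Strings that contain spaces are treated as free text in this toy rule.
    if any(isinstance(v, str) and " " in v for v in present):
        return "natural_language"
    return "categorical"


def infer_types(columns: Dict[str, List[Any]],
                feature_types: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Infer a type per column, then let explicit overrides win."""
    feature_types = feature_types or {}
    unknown = set(feature_types) - set(columns)
    if unknown:
        raise KeyError(f"overrides refer to missing columns: {sorted(unknown)}")
    inferred = {name: infer_column_type(values) for name, values in columns.items()}
    inferred.update(feature_types)
    return inferred


# Made-up sample data; not taken from the fraud demo dataset.
columns = {
    "amount": [10.5, 3.0, None],
    "is_weekend": [True, False, True],
    "provider": ["Card Network A", "Card Network B", "Card Network A"],
}

print(infer_types(columns))
print(infer_types(columns, feature_types={"provider": "categorical"}))
```

The point of the sketch is only the ordering: inference runs first, and any type named in `feature_types` replaces the inferred one.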
[2]:
automl = evalml.automl.AutoMLSearch(X_train=X, y_train=y, problem_type='binary')
automl.search()
Using default limit of max_batches=1.

Generating pipelines to search over...
*****************************
* Beginning pipeline search *
*****************************

Optimizing for Log Loss Binary.
Lower score is better.

Searching up to 1 batches for a total of 9 pipelines.
Allowed model families: xgboost, lightgbm, linear_model, decision_tree, extra_trees, random_forest, catboost

Batch 1: (1/9) Mode Baseline Binary Classification P... Elapsed:00:00
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 4.870
Batch 1: (2/9) Logistic Regression Classifier w/ Imp... Elapsed:00:00
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.382
Batch 1: (3/9) Random Forest Classifier w/ Imputer +... Elapsed:00:03
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.252
Batch 1: (4/9) XGBoost Classifier w/ Imputer + DateT... Elapsed:00:06
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.246
Batch 1: (5/9) CatBoost Classifier w/ Imputer + Date... Elapsed:00:08
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.538
Batch 1: (6/9) Elastic Net Classifier w/ Imputer + D... Elapsed:00:09
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.408
Batch 1: (7/9) Extra Trees Classifier w/ Imputer + D... Elapsed:00:11
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.357
Batch 1: (8/9) LightGBM Classifier w/ Imputer + Date... Elapsed:00:13
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.352
        High coefficient of variation (cv >= 0.2) within cross validation scores. LightGBM Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder may not perform as estimated on unseen data.
Batch 1: (9/9) Decision Tree Classifier w/ Imputer +... Elapsed:00:15
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.483
        High coefficient of variation (cv >= 0.2) within cross validation scores. Decision Tree Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder may not perform as estimated on unseen data.

Search finished after 00:16
Best pipeline: XGBoost Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder
Best pipeline Log Loss Binary: 0.245965
The AutoML search will log its progress, reporting each pipeline and parameter set evaluated during the search.
There are a number of mechanisms to control the AutoML search time. One way is to set the max_batches parameter which controls the maximum number of rounds of AutoML to evaluate, where each round may train and score a variable number of pipelines. Another way is to set the max_iterations parameter which controls the maximum number of candidate models to be evaluated during AutoML. By default, AutoML will search for a single batch. The first pipeline to be evaluated will always be a baseline model representing a trivial solution.
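To make the difference between the two budgets concrete, here is a minimal, library-free sketch of a search loop that stops on either limit. The function `run_budgeted_search`, the candidate names, and the rule that whichever limit is reached first ends the search are all assumptions made for illustration; this is not EvalML's search code.

```python
from typing import Iterable, List, Optional, Sequence


def run_budgeted_search(batches: Iterable[Sequence[str]],
                        max_batches: Optional[int] = None,
                        max_iterations: Optional[int] = None) -> List[str]:
    """Evaluate candidates batch by batch until a budget is exhausted."""
    if max_batches is None and max_iterations is None:
        max_batches = 1  # mirrors the single-batch default described above
    evaluated: List[str] = []
    for batch_number, batch in enumerate(batches, start=1):
        if max_batches is not None and batch_number > max_batches:
            break
        for candidate in batch:
            if max_iterations is not None and len(evaluated) >= max_iterations:
                return evaluated
            evaluated.append(candidate)  # stand-in for "train and score"
    return evaluated


# Hypothetical batches: the first one starts with a trivial baseline.
batches = [
    ["baseline", "logistic_regression", "random_forest"],
    ["random_forest_tuned_1", "random_forest_tuned_2"],
    ["random_forest_tuned_3"],
]

print(run_budgeted_search(batches))                    # one batch by default
print(run_budgeted_search(batches, max_batches=2))     # two full batches
print(run_budgeted_search(batches, max_iterations=4))  # stops mid-batch
```

A batch budget always ends on a batch boundary, while an iteration budget can cut a batch short, which is why the two are worth distinguishing.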
The AutoML interface supports a variety of other parameters. For a comprehensive list, please refer to the API reference.
EvalML includes a simple method, detect_problem_type, to help determine the problem type given the target data.
This function returns the predicted problem type as a member of the ProblemTypes enum: ProblemTypes.BINARY, ProblemTypes.MULTICLASS, or ProblemTypes.REGRESSION. If the target data is invalid (for instance, when there is only one unique label), the function raises an error instead.
[3]:
import pandas as pd
from evalml.problem_types import detect_problem_type

y = pd.Series([0, 1, 1, 0, 1, 1])
detect_problem_type(y)
<ProblemTypes.BINARY: 'binary'>
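For intuition about what such a detector has to decide, here is a simplified heuristic written in plain Python. The function `guess_problem_type` and its `max_classes` cutoff are hypothetical and chosen only for illustration; the rules EvalML actually applies may differ.

```python
from typing import Any, Sequence


def guess_problem_type(target: Sequence[Any], max_classes: int = 10) -> str:
    """Guess 'binary', 'multiclass' or 'regression' from target values (toy rules)."""
    values = [v for v in target if v is not None]
    unique = set(values)
    if len(unique) < 2:
        raise ValueError("target needs at least two unique values")
    if len(unique) == 2:
        return "binary"
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    has_fraction = any(isinstance(v, float) and not v.is_integer() for v in values)
    # Many distinct numbers, or any fractional number, is treated as continuous.
    if numeric and (has_fraction or len(unique) > max_classes):
        return "regression"
    return "multiclass"


print(guess_problem_type([0, 1, 1, 0, 1, 1]))
print(guess_problem_type(["cat", "dog", "bird", "dog"]))
print(guess_problem_type([0.5, 1.2, 3.3, 4.1]))
```

As with the real function, a target with a single unique label is rejected with an error rather than assigned a type.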
AutoMLSearch takes in an objective parameter to determine which objective to optimize for. By default, this parameter is set to auto, which allows AutoML to choose LogLossBinary for binary classification problems, LogLossMulticlass for multiclass classification problems, and R2 for regression problems.
Note that the objective parameter is used only to rank pipelines and to help choose which pipelines to iterate over; it is not used to optimize each individual pipeline during fit time.
To get the default objective for each problem type, you can use the get_default_primary_search_objective function.
[4]:
from evalml.automl import get_default_primary_search_objective

binary_objective = get_default_primary_search_objective("binary")
multiclass_objective = get_default_primary_search_objective("multiclass")
regression_objective = get_default_primary_search_objective("regression")

print(binary_objective.name)
print(multiclass_objective.name)
print(regression_objective.name)
Log Loss Binary
Log Loss Multiclass
R2
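Since binary log loss is the default objective for binary problems and "lower is better", a small worked version of the standard formula may help when reading the scores. This is a plain-Python sketch of the textbook definition, not EvalML's LogLossBinary implementation; the function name `log_loss_binary` and the clipping constant `eps` are assumptions.

```python
import math
from typing import Sequence


def log_loss_binary(y_true: Sequence[int], y_prob: Sequence[float],
                    eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the true labels under predicted P(y=1)."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        # Clip so that log(0) never occurs for fully confident predictions.
        prob = min(max(prob, eps), 1.0 - eps)
        total -= math.log(prob) if label else math.log(1.0 - prob)
    return total / len(y_true)


labels = [1, 0, 1, 0]
print(log_loss_binary(labels, [0.5, 0.5, 0.5, 0.5]))  # uninformative predictions
print(log_loss_binary(labels, [0.9, 0.1, 0.8, 0.2]))  # confident and correct
print(log_loss_binary(labels, [0.1, 0.9, 0.2, 0.8]))  # confident and wrong
```

Confident wrong predictions are penalized heavily, which is why a trivial baseline can score far worse than any trained model under this objective.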
Before beginning the search, AutoMLSearch.search runs a set of data checks on the input data to catch common issues ahead of a potentially time-consuming search. If the data checks find any potential errors, an exception is raised before the search begins, allowing users to inspect their data and avoid confusing errors that might otherwise arise later in the search process.
This behavior is controlled by the data_checks parameter, which accepts a DataChecks object, a list of DataCheck objects, None, or one of the strings "disabled" and "auto". By default, this parameter is set to "auto", which runs the default collection of data checks defined in the DefaultDataChecks class. If set to "disabled" or None, no data checks will run.
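The control flow described here (run checks first, fail fast, allow opting out) can be sketched without the library. The names `check_target` and `run_checks` and the two rules inside the check are hypothetical; they only imitate the shape of the `data_checks` switch and are not the checks that DefaultDataChecks performs.

```python
from typing import Any, Callable, List, Optional, Sequence, Union

Check = Callable[[Sequence[Any]], List[str]]


def check_target(y: Sequence[Any]) -> List[str]:
    """Return error messages for the target; an empty list means no problems found."""
    errors = []
    if any(v is None for v in y):
        errors.append("target contains missing values")
    if len({v for v in y if v is not None}) < 2:
        errors.append("target has fewer than two unique values")
    return errors


def run_checks(y: Sequence[Any],
               data_checks: Union[str, None, List[Check]] = "auto") -> None:
    """Raise before any training starts if an enabled check reports an error."""
    if data_checks is None or data_checks == "disabled":
        return  # checks switched off
    checks = [check_target] if data_checks == "auto" else list(data_checks)
    errors = [message for check in checks for message in check(y)]
    if errors:
        raise ValueError("; ".join(errors))


run_checks([0, 1, 0, 1])                      # passes silently
run_checks([1, 1, 1], data_checks="disabled")  # skipped, so no error
try:
    run_checks([1, 1, 1])
except ValueError as err:
    print("search aborted:", err)
```

Failing before the first pipeline is trained is the design choice that matters: the error message points at the data rather than at a model that happened to choke on it.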
EvalML’s AutoML algorithm generates a set of pipelines to search with. To provide a custom set instead, set allowed_pipelines to a list of custom pipeline classes. Note: this will prevent AutoML from generating other pipelines to search over.
[5]:
from evalml.pipelines import MulticlassClassificationPipeline


class CustomMulticlassClassificationPipeline(MulticlassClassificationPipeline):
    component_graph = ['Simple Imputer', 'Random Forest Classifier']


automl_custom = evalml.automl.AutoMLSearch(X_train=X, y_train=y, problem_type='multiclass', allowed_pipelines=[CustomMulticlassClassificationPipeline])
Using default limit of max_batches=1.
To stop the search early, hit Ctrl-C. This will bring up a prompt asking for confirmation. Responding with y will immediately stop the search. Responding with n will continue the search.
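The confirm-before-stopping pattern can be sketched in a few lines of plain Python. The function `search_with_confirmation`, the prompt wording, and the choice to skip the interrupted candidate after an "n" answer are assumptions for illustration, not a description of EvalML's internals.

```python
from typing import Callable, Iterable, List


def search_with_confirmation(candidates: Iterable[str],
                             evaluate: Callable[[str], str],
                             ask: Callable[[str], str] = input) -> List[str]:
    """Evaluate candidates; on Ctrl-C ask whether to stop or keep going."""
    finished: List[str] = []
    for candidate in candidates:
        try:
            finished.append(evaluate(candidate))
        except KeyboardInterrupt:
            answer = ask("Stop the search? (y/n) ").strip().lower()
            if answer == "y":
                break  # stop immediately, keep what has finished so far
            # Any other answer: continue with the next candidate.
    return finished


def flaky_evaluate(name: str) -> str:
    """Simulate the user pressing Ctrl-C while candidate 'b' is training."""
    if name == "b":
        raise KeyboardInterrupt
    return name.upper()


print(search_with_confirmation(["a", "b", "c"], flaky_evaluate, ask=lambda _: "y"))
print(search_with_confirmation(["a", "b", "c"], flaky_evaluate, ask=lambda _: "n"))
```

Passing `ask` as a parameter keeps the sketch testable without a terminal; in interactive use the default `input` would show the prompt.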
AutoMLSearch supports several callback functions, which can be specified as parameters when initializing an AutoMLSearch object. They are:
start_iteration_callback
add_result_callback
error_callback
Users can set start_iteration_callback to set what function is called before each pipeline training iteration. This callback function must take three positional parameters: the pipeline class, the pipeline parameters, and the AutoMLSearch object.
[6]:
# start_iteration_callback example function
def start_iteration_callback_example(pipeline_class, pipeline_params, automl_obj):
    print("Training pipeline with the following parameters:", pipeline_params)
Users can set add_result_callback to set what function is called after each pipeline training iteration. This callback function must take three positional parameters: a dictionary containing the training results for the new pipeline, an untrained_pipeline containing the parameters used during training, and the AutoMLSearch object.
[7]:
# add_result_callback example function
def add_result_callback_example(pipeline_results_dict, untrained_pipeline, automl_obj):
    print("Results for trained pipeline with the following parameters:", pipeline_results_dict)
Users can set error_callback to specify what function is called when search() errors and raises an Exception. This callback function takes three positional parameters: the Exception raised, the traceback, and the AutoMLSearch object. It must also accept kwargs, so that AutoMLSearch can pass along other parameters used by default.
EvalML defines several error callback functions, which can be found under evalml.automl.callbacks. They are:
silent_error_callback
raise_error_callback
log_and_save_error_callback
raise_and_save_error_callback
log_error_callback (default used when error_callback is None)
[8]:
# error_callback example; this is implemented in the evalml library
# (`logger` is assumed to be a logger object defined elsewhere in the library module)
def raise_error_callback(exception, traceback, automl, **kwargs):
    """Raises the exception thrown by the AutoMLSearch object.

    Also logs the exception as an error."""
    logger.error(f'AutoMLSearch raised a fatal exception: {str(exception)}')
    logger.error("\n".join(traceback))
    raise exception
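To show when each of the three callbacks would fire relative to training, here is a toy search loop in plain Python. The function `run_with_callbacks`, the `search_state` dictionary standing in for the AutoMLSearch object, and the extra `pipeline_id` keyword passed to the error callback are all hypothetical; only the callback argument order follows the descriptions above.

```python
import traceback
from typing import Any, Callable, Dict, List, Optional, Tuple

Candidate = Tuple[str, Dict[str, Any]]  # (pipeline name, parameters)


def run_with_callbacks(candidates: List[Candidate],
                       train: Callable[[str, Dict[str, Any]], float],
                       start_iteration_callback: Optional[Callable] = None,
                       add_result_callback: Optional[Callable] = None,
                       error_callback: Optional[Callable] = None) -> Dict[str, Any]:
    """Train each candidate, invoking the callbacks around every iteration."""
    search_state: Dict[str, Any] = {"results": {}}  # stand-in for the search object
    for pipeline_id, (name, parameters) in enumerate(candidates):
        if start_iteration_callback is not None:
            start_iteration_callback(name, parameters, search_state)
        try:
            score = train(name, parameters)
        except Exception as exc:
            if error_callback is not None:
                tb_lines = traceback.format_tb(exc.__traceback__)
                # Extra keyword arguments are why the callback must accept **kwargs.
                error_callback(exc, tb_lines, search_state, pipeline_id=pipeline_id)
            continue
        result = {"id": pipeline_id, "pipeline_name": name, "score": score}
        search_state["results"][pipeline_id] = result
        if add_result_callback is not None:
            add_result_callback(result, (name, parameters), search_state)
    return search_state


events: List[Tuple[str, str]] = []


def toy_train(name: str, parameters: Dict[str, Any]) -> float:
    if name == "broken":
        raise ValueError("cannot train")
    return 0.5


state = run_with_callbacks(
    [("baseline", {}), ("broken", {}), ("forest", {"n_estimators": 100})],
    toy_train,
    start_iteration_callback=lambda name, params, s: events.append(("start", name)),
    add_result_callback=lambda result, untrained, s: events.append(("result", result["pipeline_name"])),
    error_callback=lambda exc, tb, s, **kwargs: events.append(("error", str(exc))),
)
print(events)
```

In this sketch the error callback records the failure and the loop moves on, which resembles a "silent" or "log" style handler; a "raise" style handler would re-raise inside the callback and end the search.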
A summary of all the pipelines built can be returned as a pandas DataFrame which is sorted by score. The score column contains the average score across all cross-validation folds while the validation_score column is computed from the first cross-validation fold.
[9]:
automl.rankings
Each pipeline is given an id. We can get more information about any particular pipeline using that id. Here, we will get more information about the pipeline with id = 1.
[10]:
automl.describe_pipeline(1)
********************************************************************************************************************
* Logistic Regression Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder + Standard Scaler *
********************************************************************************************************************

Problem Type: binary
Model Family: Linear

Pipeline Steps
==============
1. Imputer
         * categorical_impute_strategy : most_frequent
         * numeric_impute_strategy : mean
         * categorical_fill_value : None
         * numeric_fill_value : None
2. DateTime Featurization Component
         * features_to_extract : ['year', 'month', 'day_of_week', 'hour']
         * encode_as_categories : False
3. One Hot Encoder
         * top_n : 10
         * features_to_encode : None
         * categories : None
         * drop : None
         * handle_unknown : ignore
         * handle_missing : error
4. Standard Scaler
5. Logistic Regression Classifier
         * penalty : l2
         * C : 1.0
         * n_jobs : -1
         * multi_class : auto
         * solver : lbfgs

Training
========
Training for binary problems.
Total training time (including CV): 3.6 seconds

Cross Validation
----------------
             Log Loss Binary  MCC Binary    AUC  Precision     F1  Balanced Accuracy Binary  Accuracy Binary  # Training  # Validation
0                      0.396       0.347  0.707      0.632  0.364                     0.615            0.874     666.000       334.000
1                      0.406       0.239  0.707      0.455  0.290                     0.585            0.853     667.000       333.000
2                      0.346       0.292  0.773      0.667  0.271                     0.578            0.871     667.000       333.000
mean                   0.382       0.293  0.729      0.584  0.308                     0.593            0.866           -             -
std                    0.032       0.054  0.039      0.114  0.049                     0.020            0.012           -             -
coef of var            0.084       0.183  0.053      0.195  0.159                     0.033            0.013           -             -
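The summary rows of the cross-validation table can be reproduced by hand. The sketch below uses the three Log Loss Binary fold scores reported for pipeline 1 in this guide's results output and assumes the sample standard deviation and a high-variance cutoff of 0.2 (the value quoted in the search log); the helper name `summarize_cv` is made up.

```python
import statistics
from typing import Dict, Sequence, Union

# Log Loss Binary per fold for pipeline 1, as reported in this guide.
fold_scores = [0.39602978616144296, 0.4056225268634727, 0.34570303458917806]


def summarize_cv(scores: Sequence[float],
                 cv_threshold: float = 0.2) -> Dict[str, Union[float, bool]]:
    """Compute the mean, spread and variance flag for a list of fold scores."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    coef_of_var = std / mean if mean else float("inf")
    return {
        "score": mean,                   # average over all folds
        "validation_score": scores[0],   # first fold only
        "std": std,
        "coef_of_var": coef_of_var,
        "high_variance_cv": coef_of_var >= cv_threshold,
    }


summary = summarize_cv(fold_scores)
for key, value in summary.items():
    print(key, value)
```

Rounded to three decimals, the mean, std and coefficient of variation come out as 0.382, 0.032 and 0.084, matching the table, and the pipeline stays below the 0.2 cutoff that triggers the high-variance warning in the log.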
We can also get the object for any pipeline via its id:
[11]:
pipeline = automl.get_pipeline(1)
print(pipeline.name)
print(pipeline.parameters)
Logistic Regression Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder + Standard Scaler
{'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'DateTime Featurization Component': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': None, 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Logistic Regression Classifier': {'penalty': 'l2', 'C': 1.0, 'n_jobs': -1, 'multi_class': 'auto', 'solver': 'lbfgs'}}
If you specifically want to get the best pipeline, there is a convenient accessor for that. The pipeline returned is already fitted on the input X, y data that we passed to AutoMLSearch. To turn off this default behavior, set train_best_pipeline=False when initializing AutoMLSearch.
[12]:
best_pipeline = automl.best_pipeline
print(best_pipeline.name)
print(best_pipeline.parameters)
best_pipeline.predict(X)
Logistic Regression Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder + Standard Scaler {'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'DateTime Featurization Component': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': None, 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'XGBoost Classifier': {'eta': 0.1, 'max_depth': 6, 'min_child_weight': 1, 'n_estimators': 100}}
<DataColumn: fraud (Physical Type = boolean) (Logical Type = Boolean) (Semantic Tags = set())>
There are two ways to save results from AutoMLSearch.
You can save the AutoMLSearch object itself by calling .save(<filepath>). This saves the AutoMLSearch state, from which all pipelines can later be reloaded.
If you want to save a single pipeline from AutoMLSearch for future use, you can either call the pipeline's own .save(<filepath>) method or pickle the pipeline object.
** Stacked Ensembling pipelines cannot currently be pickled
[13]:
# saving the best pipeline using .save()
# best_pipeline.save("file_path_here")

# saving the best pipeline using pickle
import pickle

pickled_pipeline = pickle.dumps(best_pipeline)
best_unpickled_pipeline = pickle.loads(pickled_pipeline)
best_unpickled_pipeline
GeneratedPipelineBinary(parameters={'Imputer':{'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'DateTime Featurization Component':{'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False}, 'One Hot Encoder':{'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': None, 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'XGBoost Classifier':{'eta': 0.1, 'max_depth': 6, 'min_child_weight': 1, 'n_estimators': 100},})
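The example above pickles to an in-memory bytes object. Writing the pickle to disk follows the same pattern; the sketch below uses a plain dictionary as a stand-in for a pipeline so that it runs without EvalML, and the helper names `save_object` and `load_object` and the file name are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any

# Stand-in for a fitted pipeline; a real pipeline object would be passed instead.
pipeline_stand_in = {
    "name": "XGBoost Classifier w/ Imputer",
    "parameters": {"Imputer": {"numeric_impute_strategy": "mean"}},
}


def save_object(obj: Any, path: str) -> None:
    """Serialize obj to path using pickle."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_object(path: str) -> Any:
    """Load a pickled object; only do this for files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "best_pipeline.pkl")
    save_object(pipeline_stand_in, path)
    restored = load_object(path)

print(restored == pipeline_stand_in)
```

Unpickling executes code embedded in the file, so pickled pipelines should only be loaded from sources you control.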
The AutoML search algorithm first trains each component in the pipeline with its default values. After the first iteration, it tweaks the parameters of these components using their pre-defined hyperparameter ranges. To limit the search to certain hyperparameter ranges, you can pass a pipeline_parameters argument containing your pipeline parameters; these parameters also restrict the hyperparameter search space. Hyperparameter ranges can be found in the API reference for each component. Parameter arguments must be specified as dictionaries, but the associated values can be single values, lists/tuples, or skopt.space Real, Integer, or Categorical values.
[14]:
from evalml import AutoMLSearch
from evalml.demos import load_fraud
from skopt.space import Categorical
from evalml.model_family import ModelFamily
import woodwork as ww

X, y = load_fraud(n_rows=1000)

# example of setting parameter to just one value
pipeline_hyperparameters = {'Imputer': {
    'numeric_impute_strategy': 'mean'
}}

# limit the numeric impute strategy to include only `median` and `most_frequent`
# `mean` is the default value for this argument, but it doesn't need to be included in the specified hyperparameter range for this to work
pipeline_hyperparameters = {'Imputer': {
    'numeric_impute_strategy': ['median', 'most_frequent']
}}

# example using skopt.space.Categorical
pipeline_hyperparameters = {'Imputer': {
    'numeric_impute_strategy': Categorical(['median', 'most_frequent'])
}}

# using this pipeline parameter means that our Imputer components in the pipelines will only search through 'median' and 'most_frequent' strategies for 'numeric_impute_strategy'
automl = AutoMLSearch(X_train=X, y_train=y, problem_type='binary', pipeline_parameters=pipeline_hyperparameters)
automl.search()
automl.best_pipeline.hyperparameters
             Number of Features
Boolean                       1
Categorical                   6
Numeric                       5

Number of training examples: 1000
Targets
False    85.90%
True     14.10%
Name: fraud, dtype: object
Using default limit of max_batches=1.

Generating pipelines to search over...
*****************************
* Beginning pipeline search *
*****************************

Optimizing for Log Loss Binary.
Lower score is better.

Searching up to 1 batches for a total of 9 pipelines.
Allowed model families: xgboost, lightgbm, linear_model, decision_tree, extra_trees, random_forest, catboost

Batch 1: (1/9) Mode Baseline Binary Classification P... Elapsed:00:00
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 4.870
Batch 1: (2/9) Logistic Regression Classifier w/ Imp... Elapsed:00:00
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.382
Batch 1: (3/9) Random Forest Classifier w/ Imputer +... Elapsed:00:02
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.252
Batch 1: (4/9) XGBoost Classifier w/ Imputer + DateT... Elapsed:00:04
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.246
Batch 1: (5/9) CatBoost Classifier w/ Imputer + Date... Elapsed:00:06
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.538
Batch 1: (6/9) Elastic Net Classifier w/ Imputer + D... Elapsed:00:07
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.408
Batch 1: (7/9) Extra Trees Classifier w/ Imputer + D... Elapsed:00:09
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.357
Batch 1: (8/9) LightGBM Classifier w/ Imputer + Date... Elapsed:00:11
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.352
        High coefficient of variation (cv >= 0.2) within cross validation scores. LightGBM Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder may not perform as estimated on unseen data.
Batch 1: (9/9) Decision Tree Classifier w/ Imputer +... Elapsed:00:13
        Starting cross validation
        Finished cross validation - mean Log Loss Binary: 0.483
        High coefficient of variation (cv >= 0.2) within cross validation scores. Decision Tree Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder may not perform as estimated on unseen data.

Search finished after 00:15
Best pipeline: XGBoost Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder
Best pipeline Log Loss Binary: 0.245965
{'Imputer': {'categorical_impute_strategy': ['most_frequent'], 'numeric_impute_strategy': Categorical(categories=('median', 'most_frequent'), prior=None)}, 'DateTime Featurization Component': {}, 'One Hot Encoder': {}, 'XGBoost Classifier': {'eta': Real(low=1e-06, high=1, prior='uniform', transform='identity'), 'max_depth': Integer(low=1, high=10, prior='uniform', transform='identity'), 'min_child_weight': Real(low=1, high=10, prior='uniform', transform='identity'), 'n_estimators': Integer(low=1, high=1000, prior='uniform', transform='identity')}}
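One way to picture how user-supplied values narrow the search space is a merge followed by sampling. The sketch below is library-free and handles only discrete choices; `merge_search_space`, `sample_parameters`, and the default range listed for `numeric_impute_strategy` are assumptions for illustration, and continuous skopt ranges such as Real or Integer are not modeled.

```python
import random
from typing import Any, Dict, List

SearchSpace = Dict[str, Dict[str, List[Any]]]


def merge_search_space(default_ranges: SearchSpace,
                       pipeline_parameters: Dict[str, Dict[str, Any]]) -> SearchSpace:
    """Replace default ranges with the user's single values or lists of choices."""
    merged = {component: dict(ranges) for component, ranges in default_ranges.items()}
    for component, params in pipeline_parameters.items():
        for name, value in params.items():
            choices = list(value) if isinstance(value, (list, tuple)) else [value]
            merged.setdefault(component, {})[name] = choices
    return merged


def sample_parameters(space: SearchSpace, rng: random.Random) -> Dict[str, Dict[str, Any]]:
    """Draw one value per parameter from the (possibly narrowed) space."""
    return {component: {name: rng.choice(choices) for name, choices in ranges.items()}
            for component, ranges in space.items()}


# Assumed default ranges for the sketch.
default_ranges = {
    "Imputer": {
        "categorical_impute_strategy": ["most_frequent"],
        "numeric_impute_strategy": ["mean", "median", "most_frequent"],
    }
}
narrowed = merge_search_space(
    default_ranges, {"Imputer": {"numeric_impute_strategy": ["median", "most_frequent"]}}
)

rng = random.Random(0)
draws = [sample_parameters(narrowed, rng) for _ in range(50)]
print(narrowed)
print({d["Imputer"]["numeric_impute_strategy"] for d in draws})
```

After the merge, no draw can return `mean` for `numeric_impute_strategy`, which is the effect the hyperparameters output above reflects for the real search.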
The AutoMLSearch class records detailed results information under the results field, including information about the cross-validation scoring and parameters.
[15]:
automl.results
{'pipeline_results': {0: {'id': 0, 'pipeline_name': 'Mode Baseline Binary Classification Pipeline', 'pipeline_class': evalml.pipelines.classification.baseline_binary.ModeBaselineBinaryPipeline, 'pipeline_summary': 'Baseline Classifier', 'parameters': {'Baseline Classifier': {'strategy': 'mode'}}, 'score': 4.869977201906586, 'high_variance_cv': False, 'training_time': 0.20953726768493652, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 4.860246977726953), ('MCC Binary', 0.0), ('AUC', 0.5), ('Precision', 0.0), ('F1', 0.0), ('Balanced Accuracy Binary', 0.5), ('Accuracy Binary', 0.8592814371257484), ('# Training', 666), ('# Validation', 334)]), 'score': 4.860246977726953, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 4.874842313996403), ('MCC Binary', 0.0), ('AUC', 0.5), ('Precision', 0.0), ('F1', 0.0), ('Balanced Accuracy Binary', 0.5), ('Accuracy Binary', 0.8588588588588588), ('# Training', 667), ('# Validation', 333)]), 'score': 4.874842313996403, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 4.874842313996403), ('MCC Binary', 0.0), ('AUC', 0.5), ('Precision', 0.0), ('F1', 0.0), ('Balanced Accuracy Binary', 0.5), ('Accuracy Binary', 0.8588588588588588), ('# Training', 667), ('# Validation', 333)]), 'score': 4.874842313996403, 'binary_classification_threshold': 0.5}], 'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 0, 'MCC Binary': nan, 'AUC': 0, 'Precision': nan, 'F1': nan, 'Balanced Accuracy Binary': 0, 'Accuracy Binary': 0}, 'percent_better_than_baseline': 0, 'validation_score': 4.860246977726953}, 1: {'id': 1, 'pipeline_name': 'Logistic Regression Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder + Standard Scaler', 'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline, 'pipeline_summary': 'Logistic Regression Classifier w/ Imputer + DateTime Featurization Component 
+ One Hot Encoder + Standard Scaler', 'parameters': {'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'DateTime Featurization Component': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': None, 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Logistic Regression Classifier': {'penalty': 'l2', 'C': 1.0, 'n_jobs': -1, 'multi_class': 'auto', 'solver': 'lbfgs'}}, 'score': 0.3824517825380312, 'high_variance_cv': False, 'training_time': 2.227890968322754, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.39602978616144296), ('MCC Binary', 0.34668583405177644), ('AUC', 0.7067239973311586), ('Precision', 0.631578947368421), ('F1', 0.3636363636363636), ('Balanced Accuracy Binary', 0.6154644525168655), ('Accuracy Binary', 0.874251497005988), ('# Training', 666), ('# Validation', 334)]), 'score': 0.39602978616144296, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.4056225268634727), ('MCC Binary', 0.23941337188423428), ('AUC', 0.7067400684421961), ('Precision', 0.45454545454545453), ('F1', 0.28985507246376807), ('Balanced Accuracy Binary', 0.5854039577443833), ('Accuracy Binary', 0.8528528528528528), ('# Training', 667), ('# Validation', 333)]), 'score': 0.4056225268634727, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.34570303458917806), ('MCC Binary', 0.29183959232423795), ('AUC', 0.7734712096414225), ('Precision', 0.6666666666666666), ('F1', 0.2711864406779661), ('Balanced Accuracy Binary', 0.5781133759857164), ('Accuracy Binary', 0.8708708708708709), ('# Training', 667), ('# Validation', 333)]), 'score': 0.34570303458917806, 'binary_classification_threshold': 0.5}], 
'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 92.14674388232655, 'MCC Binary': nan, 'AUC': 45.795685027651814, 'Precision': nan, 'F1': nan, 'Balanced Accuracy Binary': 18.598785749797674, 'Accuracy Binary': 0.8139725559017473}, 'percent_better_than_baseline': 92.14674388232655, 'validation_score': 0.39602978616144296}, 2: {'id': 2, 'pipeline_name': 'Random Forest Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder', 'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline, 'pipeline_summary': 'Random Forest Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder', 'parameters': {'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'DateTime Featurization Component': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': None, 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}, 'score': 0.25233445502464286, 'high_variance_cv': False, 'training_time': 2.164407730102539, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.25526012051031655), ('MCC Binary', 0.7474984936795003), ('AUC', 0.8596634294610422), ('Precision', 1.0), ('F1', 0.7466666666666666), ('Balanced Accuracy Binary', 0.7978723404255319), ('Accuracy Binary', 0.9431137724550899), ('# Training', 666), ('# Validation', 334)]), 'score': 0.25526012051031655, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.26047409646960146), ('MCC Binary', 0.7474173647473783), ('AUC', 0.8171403065020086), ('Precision', 1.0), ('F1', 0.7466666666666666), ('Balanced Accuracy Binary', 0.7978723404255319), ('Accuracy Binary', 0.9429429429429429), ('# 
Training', 667), ('# Validation', 333)]), 'score': 0.26047409646960146, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.24126914809401068), ('MCC Binary', 0.8043143718855572), ('AUC', 0.8396072013093291), ('Precision', 1.0), ('F1', 0.810126582278481), ('Balanced Accuracy Binary', 0.8404255319148937), ('Accuracy Binary', 0.954954954954955), ('# Training', 667), ('# Validation', 333)]), 'score': 0.24126914809401068, 'binary_classification_threshold': 0.5}], 'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 94.81857009667613, 'MCC Binary': nan, 'AUC': 67.760729151492, 'Precision': nan, 'F1': nan, 'Balanced Accuracy Binary': 62.4113475177305, 'Accuracy Binary': 10.244959336261747}, 'percent_better_than_baseline': 94.81857009667613, 'validation_score': 0.25526012051031655}, 3: {'id': 3, 'pipeline_name': 'XGBoost Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder', 'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline, 'pipeline_summary': 'XGBoost Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder', 'parameters': {'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'DateTime Featurization Component': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': None, 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'XGBoost Classifier': {'eta': 0.1, 'max_depth': 6, 'min_child_weight': 1, 'n_estimators': 100}}, 'score': 0.2459646914051953, 'high_variance_cv': False, 'training_time': 2.041226387023926, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.2188653516692427), ('MCC Binary', 0.7607358893588453), ('AUC', 0.8457261472310771), ('Precision', 0.967741935483871), 
('F1', 0.7692307692307693), ('Balanced Accuracy Binary', 0.8174067758914672), ('Accuracy Binary', 0.9461077844311377), ('# Training', 666), ('# Validation', 334)]), 'score': 0.2188653516692427, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.2992125606755998), ('MCC Binary', 0.7165182554614234), ('AUC', 0.7985418836482667), ('Precision', 0.90625), ('F1', 0.7341772151898734), ('Balanced Accuracy Binary', 0.8032658830531171), ('Accuracy Binary', 0.9369369369369369), ('# Training', 667), ('# Validation', 333)]), 'score': 0.2992125606755998, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.21981616187074343), ('MCC Binary', 0.7610249407325576), ('AUC', 0.84846005058771), ('Precision', 0.9142857142857143), ('F1', 0.7804878048780487), ('Balanced Accuracy Binary', 0.8351807766701383), ('Accuracy Binary', 0.9459459459459459), ('# Training', 667), ('# Validation', 333)]), 'score': 0.21981616187074343, 'binary_classification_threshold': 0.5}], 'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 94.94936667652365, 'MCC Binary': nan, 'AUC': 66.18187209780359, 'Precision': nan, 'F1': nan, 'Balanced Accuracy Binary': 63.723562374314824, 'Accuracy Binary': 9.778486422742397}, 'percent_better_than_baseline': 94.94936667652365, 'validation_score': 0.2188653516692427}, 4: {'id': 4, 'pipeline_name': 'CatBoost Classifier w/ Imputer + DateTime Featurization Component', 'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline, 'pipeline_summary': 'CatBoost Classifier w/ Imputer + DateTime Featurization Component', 'parameters': {'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'DateTime Featurization Component': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False}, 'CatBoost Classifier': 
{'n_estimators': 10, 'eta': 0.03, 'max_depth': 6, 'bootstrap_type': None, 'silent': True, 'allow_writing_files': False}}, 'score': 0.538465967277899, 'high_variance_cv': False, 'training_time': 0.966094970703125, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.531987385868022), ('MCC Binary', 0.7474984936795003), ('AUC', 0.8546593520646453), ('Precision', 1.0), ('F1', 0.7466666666666666), ('Balanced Accuracy Binary', 0.7978723404255319), ('Accuracy Binary', 0.9431137724550899), ('# Training', 666), ('# Validation', 334)]), 'score': 0.531987385868022, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.5402437720832401), ('MCC Binary', 0.6719193663425421), ('AUC', 0.8084734414521648), ('Precision', 1.0), ('F1', 0.6571428571428571), ('Balanced Accuracy Binary', 0.7446808510638298), ('Accuracy Binary', 0.9279279279279279), ('# Training', 667), ('# Validation', 333)]), 'score': 0.5402437720832401, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.5431667438824346), ('MCC Binary', 0.8043143718855572), ('AUC', 0.8094405594405595), ('Precision', 1.0), ('F1', 0.810126582278481), ('Balanced Accuracy Binary', 0.8404255319148937), ('Accuracy Binary', 0.954954954954955), ('# Training', 667), ('# Validation', 333)]), 'score': 0.5431667438824346, 'binary_classification_threshold': 0.5}], 'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 88.9431521965423, 'MCC Binary': nan, 'AUC': 64.83822353049132, 'Precision': nan, 'F1': nan, 'Balanced Accuracy Binary': 58.86524822695034, 'Accuracy Binary': 9.662304313391665}, 'percent_better_than_baseline': 88.9431521965423, 'validation_score': 0.531987385868022}, 5: {'id': 5, 'pipeline_name': 'Elastic Net Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder + Standard Scaler', 'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline, 'pipeline_summary': 
'Elastic Net Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder + Standard Scaler', 'parameters': {'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'DateTime Featurization Component': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': None, 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Elastic Net Classifier': {'alpha': 0.5, 'l1_ratio': 0.5, 'n_jobs': -1, 'max_iter': 1000, 'penalty': 'elasticnet', 'loss': 'log'}}, 'score': 0.4084788011821419, 'high_variance_cv': False, 'training_time': 2.1921446323394775, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.4095832612718903), ('MCC Binary', 0.0), ('AUC', 0.5), ('Precision', 0.0), ('F1', 0.0), ('Balanced Accuracy Binary', 0.5), ('Accuracy Binary', 0.8592814371257484), ('# Training', 666), ('# Validation', 334)]), 'score': 0.4095832612718903, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.4083022043442566), ('MCC Binary', 0.0), ('AUC', 0.5), ('Precision', 0.0), ('F1', 0.0), ('Balanced Accuracy Binary', 0.5), ('Accuracy Binary', 0.8588588588588588), ('# Training', 667), ('# Validation', 333)]), 'score': 0.4083022043442566, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.4075509379302789), ('MCC Binary', 0.0), ('AUC', 0.5), ('Precision', 0.0), ('F1', 0.0), ('Balanced Accuracy Binary', 0.5), ('Accuracy Binary', 0.8588588588588588), ('# Training', 667), ('# Validation', 333)]), 'score': 0.4075509379302789, 'binary_classification_threshold': 0.5}], 'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 91.61230567933207, 'MCC Binary': nan, 'AUC': 0, 'Precision': nan, 'F1': nan, 'Balanced Accuracy Binary': 
0, 'Accuracy Binary': 0}, 'percent_better_than_baseline': 91.61230567933207, 'validation_score': 0.4095832612718903}, 6: {'id': 6, 'pipeline_name': 'Extra Trees Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder', 'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline, 'pipeline_summary': 'Extra Trees Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder', 'parameters': {'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'DateTime Featurization Component': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': None, 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Extra Trees Classifier': {'n_estimators': 100, 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_jobs': -1}}, 'score': 0.35683886218960126, 'high_variance_cv': False, 'training_time': 2.0275673866271973, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.3531391633892876), ('MCC Binary', 0.0), ('AUC', 0.8072503521387798), ('Precision', 0.0), ('F1', 0.0), ('Balanced Accuracy Binary', 0.5), ('Accuracy Binary', 0.8592814371257484), ('# Training', 666), ('# Validation', 334)]), 'score': 0.3531391633892876, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.3588500477508707), ('MCC Binary', 0.0), ('AUC', 0.7897634280613003), ('Precision', 0.0), ('F1', 0.0), ('Balanced Accuracy Binary', 0.5), ('Accuracy Binary', 0.8588588588588588), ('# Training', 667), ('# Validation', 333)]), 'score': 0.3588500477508707, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.35852737542864566), ('MCC Binary', 0.0), ('AUC', 
0.8048653474185388), ('Precision', 0.0), ('F1', 0.0), ('Balanced Accuracy Binary', 0.5), ('Accuracy Binary', 0.8588588588588588), ('# Training', 667), ('# Validation', 333)]), 'score': 0.35852737542864566, 'binary_classification_threshold': 0.5}], 'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 92.67267900042941, 'MCC Binary': nan, 'AUC': 60.1252751745746, 'Precision': nan, 'F1': nan, 'Balanced Accuracy Binary': 0, 'Accuracy Binary': 0}, 'percent_better_than_baseline': 92.67267900042941, 'validation_score': 0.3531391633892876}, 7: {'id': 7, 'pipeline_name': 'LightGBM Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder', 'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline, 'pipeline_summary': 'LightGBM Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder', 'parameters': {'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'DateTime Featurization Component': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': None, 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'LightGBM Classifier': {'boosting_type': 'gbdt', 'learning_rate': 0.1, 'n_estimators': 100, 'max_depth': 0, 'num_leaves': 31, 'min_child_samples': 20, 'n_jobs': -1, 'bagging_freq': 0, 'bagging_fraction': 0.9}}, 'score': 0.3522743138094068, 'high_variance_cv': True, 'training_time': 1.9921033382415771, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.3157031461443878), ('MCC Binary', 0.7607358893588453), ('AUC', 0.8560308399436578), ('Precision', 0.967741935483871), ('F1', 0.7692307692307693), ('Balanced Accuracy Binary', 0.8174067758914672), ('Accuracy Binary', 0.9461077844311377), ('# Training', 666), ('# Validation', 334)]), 'score': 
0.3157031461443878, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.4524744387873468), ('MCC Binary', 0.7165182554614234), ('AUC', 0.7794227049546198), ('Precision', 0.90625), ('F1', 0.7341772151898734), ('Balanced Accuracy Binary', 0.8032658830531171), ('Accuracy Binary', 0.9369369369369369), ('# Training', 667), ('# Validation', 333)]), 'score': 0.4524744387873468, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.28864535649648576), ('MCC Binary', 0.7610249407325576), ('AUC', 0.872042850766255), ('Precision', 0.9142857142857143), ('F1', 0.7804878048780487), ('Balanced Accuracy Binary', 0.8351807766701383), ('Accuracy Binary', 0.9459459459459459), ('# Training', 667), ('# Validation', 333)]), 'score': 0.28864535649648576, 'binary_classification_threshold': 0.5}], 'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 92.76640733202012, 'MCC Binary': nan, 'AUC': 67.16642637763553, 'Precision': nan, 'F1': nan, 'Balanced Accuracy Binary': 63.723562374314824, 'Accuracy Binary': 9.778486422742397}, 'percent_better_than_baseline': 92.76640733202012, 'validation_score': 0.3157031461443878}, 8: {'id': 8, 'pipeline_name': 'Decision Tree Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder', 'pipeline_class': evalml.pipelines.utils.make_pipeline.<locals>.GeneratedPipeline, 'pipeline_summary': 'Decision Tree Classifier w/ Imputer + DateTime Featurization Component + One Hot Encoder', 'parameters': {'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'DateTime Featurization Component': {'features_to_extract': ['year', 'month', 'day_of_week', 'hour'], 'encode_as_categories': False}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': None, 'handle_unknown': 'ignore', 'handle_missing': 
'error'}, 'Decision Tree Classifier': {'criterion': 'gini', 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0}}, 'score': 0.4828046888599471, 'high_variance_cv': True, 'training_time': 1.505995750427246, 'cv_data': [{'all_objective_scores': OrderedDict([('Log Loss Binary', 0.3541905956044188), ('MCC Binary', 0.6701630346788403), ('AUC', 0.7739639706427459), ('Precision', 0.96), ('F1', 0.6666666666666666), ('Balanced Accuracy Binary', 0.7535769886574246), ('Accuracy Binary', 0.9281437125748503), ('# Training', 666), ('# Validation', 334)]), 'score': 0.3541905956044188, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.6977250643948881), ('MCC Binary', 0.1917498745727438), ('AUC', 0.47604523136438026), ('Precision', 1.0), ('F1', 0.08163265306122448), ('Balanced Accuracy Binary', 0.5212765957446809), ('Accuracy Binary', 0.8648648648648649), ('# Training', 667), ('# Validation', 333)]), 'score': 0.6977250643948881, 'binary_classification_threshold': 0.5}, {'all_objective_scores': OrderedDict([('Log Loss Binary', 0.3964984065805345), ('MCC Binary', 0.7313072646477947), ('AUC', 0.8358131230471656), ('Precision', 0.9655172413793104), ('F1', 0.7368421052631579), ('Balanced Accuracy Binary', 0.7961240886772801), ('Accuracy Binary', 0.93993993993994), ('# Training', 667), ('# Validation', 333)]), 'score': 0.3964984065805345, 'binary_classification_threshold': 0.5}], 'percent_better_than_baseline_all_objectives': {'Log Loss Binary': 90.08609960903041, 'MCC Binary': nan, 'AUC': 39.05482167028611, 'Precision': nan, 'F1': nan, 'Balanced Accuracy Binary': 38.06517820529238, 'Accuracy Binary': 6.051587647713523}, 'percent_better_than_baseline': 90.08609960903041, 'validation_score': 0.3541905956044188}}, 'search_order': [0, 1, 2, 3, 4, 5, 6, 7, 8], 'errors': []}
Stacking is an ensemble machine learning algorithm that involves training a model to best combine the predictions of several base learning algorithms. First, each base learning algorithm is trained using the given data. Then, the combining algorithm, or meta-learner, is trained on the predictions made by those base learning algorithms to make a final prediction.
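The two steps described above can be sketched in plain Python. This is a minimal illustration, not EvalML's implementation: the helper names (fit_stump, fit_logistic, fit_stacking), the synthetic data, and the choice to train the meta-learner on out-of-fold base predictions are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(rows, labels, feature):
    """Base learner: score one feature relative to the midpoint of the class means."""
    pos = [r[feature] for r, y in zip(rows, labels) if y == 1]
    neg = [r[feature] for r, y in zip(rows, labels) if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    midpoint = (mean_pos + mean_neg) / 2.0
    sign = 1.0 if mean_pos >= mean_neg else -1.0
    return lambda row: sigmoid(sign * (row[feature] - midpoint))


def fit_logistic(inputs, labels, epochs=1000, lr=0.5):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    weights, bias, n = [0.0] * len(inputs[0]), 0.0, len(inputs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(inputs, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return lambda x: sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_stacking(rows, labels, n_features, n_folds=3):
    # Step 1: collect out-of-fold predictions from every base learner, so the
    # meta-learner never sees a prediction made on a row the base learner trained on.
    meta_inputs = [None] * len(rows)
    for fold in range(n_folds):
        train = [i for i in range(len(rows)) if i % n_folds != fold]
        held_out = [i for i in range(len(rows)) if i % n_folds == fold]
        fold_rows = [rows[i] for i in train]
        fold_labels = [labels[i] for i in train]
        stumps = [fit_stump(fold_rows, fold_labels, f) for f in range(n_features)]
        for i in held_out:
            meta_inputs[i] = [stump(rows[i]) for stump in stumps]
    # Step 2: train the meta-learner on the base learners' predictions.
    meta = fit_logistic(meta_inputs, labels)
    # Step 3: refit the base learners on all rows for use at prediction time.
    base = [fit_stump(rows, labels, f) for f in range(n_features)]
    return lambda row: meta([learner(row) for learner in base])


# Synthetic two-feature binary problem (made up for this sketch).
random.seed(0)
rows, labels = [], []
for i in range(300):
    y = i % 2
    rows.append([random.gauss(1.5 * y, 1.0), random.gauss(-1.5 * y, 1.0)])
    labels.append(y)

train_rows, train_labels = rows[:200], labels[:200]
test_rows, test_labels = rows[200:], labels[200:]

model = fit_stacking(train_rows, train_labels, n_features=2)
hits = sum((model(r) >= 0.5) == bool(y) for r, y in zip(test_rows, test_labels))
accuracy = hits / len(test_rows)
print(f"held-out accuracy: {accuracy:.2f}")
```

Training the meta-learner on out-of-fold predictions is a common way to keep it from simply trusting base learners that have memorized their training rows; other schemes (for example, a separate holdout set) would fit the description above equally well.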
AutoML enables stacking through the ensembling flag at initialization; the flag is set to False by default. The stacking ensemble pipeline runs in its own batch after a whole cycle of training has completed (each allowed pipeline trains for one batch), so a large number of iterations may need to run before the stacking ensemble runs. Note also that only the first CV fold is calculated for stacking ensembles, because the model internally uses CV folds.
[16]:
X, y = evalml.demos.load_breast_cancer()
automl_with_ensembling = AutoMLSearch(X_train=X, y_train=y,
                                      problem_type="binary",
                                      allowed_model_families=[ModelFamily.RANDOM_FOREST, ModelFamily.LINEAR_MODEL],
                                      max_batches=5,
                                      ensembling=True)
automl_with_ensembling.search()
Numerical binary classification target classes must be [0, 1], got [benign, malignant] instead
Generating pipelines to search over...
Ensembling will run every 4 batches.
*****************************
* Beginning pipeline search *
*****************************

Optimizing for Log Loss Binary.
Lower score is better.

Searching up to 5 batches for a total of 20 pipelines.
Allowed model families: linear_model, random_forest

Batch 1: (1/20) Mode Baseline Binary Classification P... Elapsed:00:00
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 12.868
Batch 1: (2/20) Logistic Regression Classifier w/ Imp... Elapsed:00:00
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.074
    High coefficient of variation (cv >= 0.2) within cross validation scores. Logistic Regression Classifier w/ Imputer + Standard Scaler may not perform as estimated on unseen data.
Batch 1: (3/20) Random Forest Classifier w/ Imputer Elapsed:00:01
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.132
Batch 1: (4/20) Elastic Net Classifier w/ Imputer + S... Elapsed:00:02
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.505
Batch 2: (5/20) Logistic Regression Classifier w/ Imp... Elapsed:00:03
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.093
    High coefficient of variation (cv >= 0.2) within cross validation scores. Logistic Regression Classifier w/ Imputer + Standard Scaler may not perform as estimated on unseen data.
Batch 2: (6/20) Logistic Regression Classifier w/ Imp... Elapsed:00:03
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.087
    High coefficient of variation (cv >= 0.2) within cross validation scores. Logistic Regression Classifier w/ Imputer + Standard Scaler may not perform as estimated on unseen data.
Batch 2: (7/20) Logistic Regression Classifier w/ Imp... Elapsed:00:04
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.082
    High coefficient of variation (cv >= 0.2) within cross validation scores. Logistic Regression Classifier w/ Imputer + Standard Scaler may not perform as estimated on unseen data.
Batch 2: (8/20) Logistic Regression Classifier w/ Imp... Elapsed:00:05
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.076
    High coefficient of variation (cv >= 0.2) within cross validation scores. Logistic Regression Classifier w/ Imputer + Standard Scaler may not perform as estimated on unseen data.
Batch 2: (9/20) Logistic Regression Classifier w/ Imp... Elapsed:00:06
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.093
    High coefficient of variation (cv >= 0.2) within cross validation scores. Logistic Regression Classifier w/ Imputer + Standard Scaler may not perform as estimated on unseen data.
Batch 3: (10/20) Random Forest Classifier w/ Imputer Elapsed:00:07
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.128
Batch 3: (11/20) Random Forest Classifier w/ Imputer Elapsed:00:12
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.171
Batch 3: (12/20) Random Forest Classifier w/ Imputer Elapsed:00:16
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.172
Batch 3: (13/20) Random Forest Classifier w/ Imputer Elapsed:00:19
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.128
Batch 3: (14/20) Random Forest Classifier w/ Imputer Elapsed:00:24
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.133
Batch 4: (15/20) Elastic Net Classifier w/ Imputer + S... Elapsed:00:29
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.308
Batch 4: (16/20) Elastic Net Classifier w/ Imputer + S... Elapsed:00:30
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.452
Batch 4: (17/20) Elastic Net Classifier w/ Imputer + S... Elapsed:00:30
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.661
Batch 4: (18/20) Elastic Net Classifier w/ Imputer + S... Elapsed:00:31
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.216
Batch 4: (19/20) Elastic Net Classifier w/ Imputer + S... Elapsed:00:32
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.661
Batch 5: (20/20) Stacked Ensemble Classification Pipeline Elapsed:00:33
    Starting cross validation
    Finished cross validation - mean Log Loss Binary: 0.106

Search finished after 00:42
Best pipeline: Logistic Regression Classifier w/ Imputer + Standard Scaler
Best pipeline Log Loss Binary: 0.073551
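The batch layout in the search log (an ensemble batch every 4 batches, 20 pipelines across 5 batches) can be reproduced with a little arithmetic. The function below is a sketch inferred from that log alone, not taken from EvalML's source: batch_schedule is a hypothetical helper, and the value of 5 pipelines per tuning batch is an assumption read off the log.

```python
def batch_schedule(n_pipelines, max_batches, pipelines_per_batch=5):
    """Return (ensemble_every, batch_sizes) for a search with ensembling enabled.

    Assumed layout, inferred from the log:
      * batch 1 runs the baseline plus each allowed pipeline once,
      * each following batch tunes one allowed pipeline,
      * after a full cycle over the allowed pipelines, one batch runs the ensemble.
    """
    ensemble_every = n_pipelines + 1  # one tuning batch per pipeline, then the ensemble
    sizes = []
    for batch in range(1, max_batches + 1):
        if batch == 1:
            sizes.append(1 + n_pipelines)      # baseline + default pipelines
        elif (batch - 1) % ensemble_every == 0:
            sizes.append(1)                    # the stacked ensemble on its own
        else:
            sizes.append(pipelines_per_batch)  # tuning batch for one pipeline
    return ensemble_every, sizes


# Three allowed pipelines (logistic regression, random forest, elastic net)
# and max_batches=5, as in the search run here.
ensemble_every, sizes = batch_schedule(n_pipelines=3, max_batches=5)
print(ensemble_every, sizes, sum(sizes))  # → 4 [4, 5, 5, 5, 1] 20
```

Under these assumptions, setting max_batches below n_pipelines + 2 would end the search before the ensemble batch is reached, which is why a large number of iterations may be needed when many model families are allowed.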
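We can view more information about the best performing pipeline by calling .describe(). In this run the best pipeline is the Logistic Regression Classifier (mean Log Loss Binary of 0.074), which scored better than the stacking ensemble (0.106).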
[17]:
automl_with_ensembling.best_pipeline.describe()
***************************************************************
* Logistic Regression Classifier w/ Imputer + Standard Scaler *
***************************************************************

Problem Type: binary
Model Family: Linear
Number of features: 30

Pipeline Steps
==============
1. Imputer
    * categorical_impute_strategy : most_frequent
    * numeric_impute_strategy : mean
    * categorical_fill_value : None
    * numeric_fill_value : None
2. Standard Scaler
3. Logistic Regression Classifier
    * penalty : l2
    * C : 1.0
    * n_jobs : -1
    * multi_class : auto
    * solver : lbfgs