Exploring search results¶
After finishing a pipeline search, we can inspect the results. First, let’s build a search of 5 different pipelines to explore.
[1]:
import evalml
from evalml import AutoClassificationSearch
X, y = evalml.demos.load_breast_cancer()
automl = AutoClassificationSearch(objective="f1", max_pipelines=5)
automl.search(X, y)
*****************************
* Beginning pipeline search *
*****************************
Optimizing for F1.
Greater score is better.
Searching up to 5 pipelines.
Allowed model families: catboost, xgboost, random_forest, linear_model
✔ Mode Baseline Binary Classification... 0%| | Elapsed:00:00
✔ Cat Boost Binary Classification Pip... 20%|██ | Elapsed:00:20
✔ Logistic Regression Binary Pipeline: 40%|████ | Elapsed:00:22
✔ Random Forest Binary Classification... 60%|██████ | Elapsed:00:23
✔ XGBoost Binary Classification Pipel... 80%|████████ | Elapsed:00:24
✔ Optimization finished 80%|████████ | Elapsed:00:24
View Rankings¶
A summary of all the pipelines built can be returned as a pandas DataFrame. It is sorted by score. EvalML knows based on our objective function whether higher or lower is better.
[2]:
automl.rankings
[2]:
|   | id | pipeline_name | score | high_variance_cv | parameters |
|---|---|---|---|---|---|
| 0 | 2 | Logistic Regression Binary Pipeline | 0.982019 | False | {'One Hot Encoder': {'top_n': 10}, 'Simple Imp... |
| 1 | 1 | Cat Boost Binary Classification Pipeline | 0.976169 | False | {'Simple Imputer': {'impute_strategy': 'most_f... |
| 2 | 4 | XGBoost Binary Classification Pipeline | 0.970716 | False | {'One Hot Encoder': {'top_n': 10}, 'Simple Imp... |
| 3 | 3 | Random Forest Binary Classification Pipeline | 0.968074 | False | {'One Hot Encoder': {'top_n': 10}, 'Simple Imp... |
| 4 | 0 | Mode Baseline Binary Classification Pipeline | 0.771060 | False | {'strategy': 'random_weighted'} |
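Because the rankings are a plain pandas DataFrame sorted by score, you can slice and filter them like any other frame. A minimal sketch, shown on a mocked-up frame with the same columns (the live values come from `automl.rankings`):

```python
import pandas as pd

# Mocked-up stand-in for automl.rankings: same columns, illustrative values
rankings = pd.DataFrame({
    "id": [2, 1, 4, 3, 0],
    "pipeline_name": [
        "Logistic Regression Binary Pipeline",
        "Cat Boost Binary Classification Pipeline",
        "XGBoost Binary Classification Pipeline",
        "Random Forest Binary Classification Pipeline",
        "Mode Baseline Binary Classification Pipeline",
    ],
    "score": [0.982019, 0.976169, 0.970716, 0.968074, 0.771060],
    "high_variance_cv": [False] * 5,
})

# The frame is sorted by score, so the first row holds the best pipeline's id
best_id = rankings.iloc[0]["id"]
print(best_id)  # 2

# Keep only pipelines that scored above a chosen threshold
strong = rankings[rankings["score"] > 0.97]
print(len(strong))  # 3
```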
Describe Pipeline¶
Each pipeline is given an id. We can get more information about any particular pipeline using that id. Here, we will get more information about the pipeline with id = 1.
[3]:
automl.describe_pipeline(1)
********************************************
* Cat Boost Binary Classification Pipeline *
********************************************
Problem Type: Binary Classification
Model Family: CatBoost
Number of features: 30
Pipeline Steps
==============
1. Simple Imputer
* impute_strategy : most_frequent
* fill_value : None
2. CatBoost Classifier
* n_estimators : 1000
* eta : 0.03
* max_depth : 6
Training
========
Training for Binary Classification problems.
Total training time (including CV): 20.8 seconds
Cross Validation
----------------
F1 Accuracy Binary Balanced Accuracy Binary Precision AUC Log Loss Binary MCC Binary # Training # Testing
0 0.962 0.953 0.954 0.974 0.987 0.148 0.900 379.000 190.000
1 0.983 0.979 0.972 0.967 0.995 0.085 0.955 379.000 190.000
2 0.983 0.979 0.974 0.975 0.997 0.067 0.955 380.000 189.000
mean 0.976 0.970 0.967 0.972 0.993 0.100 0.937 - -
std 0.013 0.015 0.011 0.004 0.005 0.043 0.032 - -
coef of var 0.013 0.016 0.012 0.004 0.005 0.427 0.034 - -
Get Pipeline¶
We can also get the object of any pipeline via its id:
[4]:
automl.get_pipeline(1)
[4]:
<evalml.pipelines.classification.catboost_binary.CatBoostBinaryClassificationPipeline at 0x7f4664ec53c8>
Get best pipeline¶
If we specifically want the best pipeline, there is a convenient accessor:
[5]:
automl.best_pipeline
[5]:
<evalml.pipelines.classification.logistic_regression_binary.LogisticRegressionBinaryPipeline at 0x7f4664f18710>
Feature Importances¶
We can get the feature importances of the resulting pipeline:
[6]:
pipeline = automl.get_pipeline(1)
pipeline.feature_importances
[6]:
|   | feature | importance |
|---|---|---|
| 0 | worst texture | 11.023433 |
| 1 | worst area | 9.133809 |
| 2 | worst radius | 8.412493 |
| 3 | mean concave points | 8.321510 |
| 4 | worst concave points | 7.129320 |
| 5 | mean texture | 6.039252 |
| 6 | worst perimeter | 5.919564 |
| 7 | worst concavity | 5.786680 |
| 8 | worst smoothness | 3.957557 |
| 9 | area error | 3.534828 |
| 10 | worst symmetry | 3.071672 |
| 11 | radius error | 2.783052 |
| 12 | mean concavity | 2.629071 |
| 13 | compactness error | 2.393736 |
| 14 | perimeter error | 1.716863 |
| 15 | mean compactness | 1.635428 |
| 16 | worst compactness | 1.599189 |
| 17 | smoothness error | 1.535518 |
| 18 | concave points error | 1.481802 |
| 19 | mean smoothness | 1.470453 |
| 20 | mean radius | 1.321665 |
| 21 | texture error | 1.298512 |
| 22 | mean symmetry | 1.240927 |
| 23 | mean area | 1.228987 |
| 24 | mean perimeter | 1.076955 |
| 25 | concavity error | 0.982101 |
| 26 | worst fractal dimension | 0.967438 |
| 27 | mean fractal dimension | 0.826398 |
| 28 | fractal dimension error | 0.823615 |
| 29 | symmetry error | 0.658173 |
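The feature importances come back as a pandas DataFrame, so standard pandas operations apply. A sketch of pulling out the top features, shown here on a small mocked-up frame rather than the live pipeline:

```python
import pandas as pd

# Mocked-up stand-in for pipeline.feature_importances (illustrative subset)
fi = pd.DataFrame({
    "feature": ["worst texture", "worst area", "worst radius", "symmetry error"],
    "importance": [11.023433, 9.133809, 8.412493, 0.658173],
})

# Top 3 features by importance, as a plain list of names
top3 = fi.nlargest(3, "importance")["feature"].tolist()
print(top3)  # ['worst texture', 'worst area', 'worst radius']
```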
We can also create a bar plot of the feature importances:
[7]:
pipeline.graph_feature_importance()
Precision-Recall Curve¶
For binary classification, you can view the precision-recall curve of a classifier
[8]:
# get the predicted probabilities associated with the "true" label
y_pred_proba = pipeline.predict_proba(X)[:, 1]
evalml.pipelines.graph_utils.graph_precision_recall_curve(y, y_pred_proba)
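If you want the raw precision/recall values rather than the plot, scikit-learn's precision_recall_curve accepts the same inputs. A sketch on toy labels and scores (not the breast cancer data):

```python
from sklearn.metrics import precision_recall_curve

# Toy labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(list(precision))  # ends at 1.0 by construction
print(list(recall))     # ends at 0.0 by construction
```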
ROC Curve¶
For binary classification, you can view the ROC curve of a classifier
[9]:
# get the predicted probabilities associated with the "true" label
y_pred_proba = pipeline.predict_proba(X)[:, 1]
evalml.pipelines.graph_utils.graph_roc_curve(y, y_pred_proba)
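Similarly, if you need the raw curve values or the AUC itself rather than a plot, scikit-learn's roc_curve and roc_auc_score work on the same inputs. A sketch on toy labels and scores (not the breast cancer data):

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Toy labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75
```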
Confusion Matrix¶
For binary or multiclass classification, you can view a confusion matrix of the classifier’s predictions
[10]:
y_pred = pipeline.predict(X)
evalml.pipelines.graph_utils.graph_confusion_matrix(y, y_pred)
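The underlying counts are available from scikit-learn's confusion_matrix if you want them as an array rather than a plot. A sketch on toy predictions:

```python
from sklearn.metrics import confusion_matrix

# Toy true labels and predictions
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

# Rows are true labels, columns are predicted labels
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 0]
#  [1 3]]
```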
Access raw results¶
You can also get access to all the underlying data, like this:
[11]:
automl.results
[11]:
{'pipeline_results': {0: {'id': 0,
'pipeline_name': 'Mode Baseline Binary Classification Pipeline',
'pipeline_class': evalml.pipelines.classification.baseline_binary.ModeBaselineBinaryPipeline,
'pipeline_summary': 'Baseline Classifier',
'parameters': {'strategy': 'random_weighted'},
'score': 0.7710601157203097,
'high_variance_cv': False,
'training_time': 0.03633546829223633,
'cv_data': [{'all_objective_scores': OrderedDict([('F1',
0.7702265372168284),
('Accuracy Binary', 0.6263157894736842),
('Balanced Accuracy Binary', 0.5),
('Precision', 0.6263157894736842),
('AUC', 0.5),
('Log Loss Binary', 0.6608932451679239),
('MCC Binary', 0.0),
('# Training', 379),
('# Testing', 190)]),
'score': 0.7702265372168284},
{'all_objective_scores': OrderedDict([('F1', 0.7702265372168284),
('Accuracy Binary', 0.6263157894736842),
('Balanced Accuracy Binary', 0.5),
('Precision', 0.6263157894736842),
('AUC', 0.5),
('Log Loss Binary', 0.6608932451679239),
('MCC Binary', 0.0),
('# Training', 379),
('# Testing', 190)]),
'score': 0.7702265372168284},
{'all_objective_scores': OrderedDict([('F1', 0.7727272727272727),
('Accuracy Binary', 0.6296296296296297),
('Balanced Accuracy Binary', 0.5),
('Precision', 0.6296296296296297),
('AUC', 0.5),
('Log Loss Binary', 0.6591759924082954),
('MCC Binary', 0.0),
('# Training', 380),
('# Testing', 189)]),
'score': 0.7727272727272727}]},
1: {'id': 1,
'pipeline_name': 'Cat Boost Binary Classification Pipeline',
'pipeline_class': evalml.pipelines.classification.catboost_binary.CatBoostBinaryClassificationPipeline,
'pipeline_summary': 'CatBoost Classifier w/ Simple Imputer',
'parameters': {'Simple Imputer': {'impute_strategy': 'most_frequent',
'fill_value': None},
'CatBoost Classifier': {'n_estimators': 1000,
'eta': 0.03,
'max_depth': 6}},
'score': 0.9761688451243576,
'high_variance_cv': False,
'training_time': 20.784214735031128,
'cv_data': [{'all_objective_scores': OrderedDict([('F1',
0.9617021276595743),
('Accuracy Binary', 0.9526315789473684),
('Balanced Accuracy Binary', 0.9536631554030062),
('Precision', 0.9741379310344828),
('AUC', 0.9874541365842111),
('Log Loss Binary', 0.14774257954380435),
('MCC Binary', 0.9001633057441626),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9617021276595743},
{'all_objective_scores': OrderedDict([('F1', 0.9834710743801653),
('Accuracy Binary', 0.9789473684210527),
('Balanced Accuracy Binary', 0.971830985915493),
('Precision', 0.967479674796748),
('AUC', 0.9946739259083914),
('Log Loss Binary', 0.08460273019201768),
('MCC Binary', 0.9554966130892879),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9834710743801653},
{'all_objective_scores': OrderedDict([('F1', 0.9833333333333334),
('Accuracy Binary', 0.9788359788359788),
('Balanced Accuracy Binary', 0.9743697478991598),
('Precision', 0.9752066115702479),
('AUC', 0.9973589435774309),
('Log Loss Binary', 0.06679970055788743),
('MCC Binary', 0.9546019995535027),
('# Training', 380),
('# Testing', 189)]),
'score': 0.9833333333333334}]},
2: {'id': 2,
'pipeline_name': 'Logistic Regression Binary Pipeline',
'pipeline_class': evalml.pipelines.classification.logistic_regression_binary.LogisticRegressionBinaryPipeline,
'pipeline_summary': 'Logistic Regression Classifier w/ One Hot Encoder + Simple Imputer + Standard Scaler',
'parameters': {'One Hot Encoder': {'top_n': 10},
'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None},
'Logistic Regression Classifier': {'penalty': 'l2', 'C': 1.0}},
'score': 0.982018787419635,
'high_variance_cv': False,
'training_time': 1.3472530841827393,
'cv_data': [{'all_objective_scores': OrderedDict([('F1', 0.979253112033195),
('Accuracy Binary', 0.9736842105263158),
('Balanced Accuracy Binary', 0.9676293052432241),
('Precision', 0.9672131147540983),
('AUC', 0.9906497810391762),
('Log Loss Binary', 0.09825657399977614),
('MCC Binary', 0.943843520216036),
('# Training', 379),
('# Testing', 190)]),
'score': 0.979253112033195},
{'all_objective_scores': OrderedDict([('F1', 0.9752066115702479),
('Accuracy Binary', 0.968421052631579),
('Balanced Accuracy Binary', 0.9605870517220974),
('Precision', 0.959349593495935),
('AUC', 0.9988164279796425),
('Log Loss Binary', 0.05792932780492265),
('MCC Binary', 0.9327267201397125),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9752066115702479},
{'all_objective_scores': OrderedDict([('F1', 0.9915966386554622),
('Accuracy Binary', 0.9894179894179894),
('Balanced Accuracy Binary', 0.988655462184874),
('Precision', 0.9915966386554622),
('AUC', 0.9968787515006002),
('Log Loss Binary', 0.06446799374034665),
('MCC Binary', 0.9773109243697479),
('# Training', 380),
('# Testing', 189)]),
'score': 0.9915966386554622}]},
3: {'id': 3,
'pipeline_name': 'Random Forest Binary Classification Pipeline',
'pipeline_class': evalml.pipelines.classification.random_forest_binary.RFBinaryClassificationPipeline,
'pipeline_summary': 'Random Forest Classifier w/ One Hot Encoder + Simple Imputer',
'parameters': {'One Hot Encoder': {'top_n': 10},
'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None},
'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6}},
'score': 0.9680735152717395,
'high_variance_cv': False,
'training_time': 1.7742538452148438,
'cv_data': [{'all_objective_scores': OrderedDict([('F1',
0.9617021276595743),
('Accuracy Binary', 0.9526315789473684),
('Balanced Accuracy Binary', 0.9536631554030062),
('Precision', 0.9741379310344828),
('AUC', 0.9844952065333176),
('Log Loss Binary', 0.15369056167628),
('MCC Binary', 0.9001633057441626),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9617021276595743},
{'all_objective_scores': OrderedDict([('F1', 0.963265306122449),
('Accuracy Binary', 0.9526315789473684),
('Balanced Accuracy Binary', 0.9394602911587171),
('Precision', 0.9365079365079365),
('AUC', 0.9908273168422297),
('Log Loss Binary', 0.12245669921123793),
('MCC Binary', 0.8996571384709533),
('# Training', 379),
('# Testing', 190)]),
'score': 0.963265306122449},
{'all_objective_scores': OrderedDict([('F1', 0.979253112033195),
('Accuracy Binary', 0.9735449735449735),
('Balanced Accuracy Binary', 0.9672268907563025),
('Precision', 0.9672131147540983),
('AUC', 0.9975990396158464),
('Log Loss Binary', 0.11890545454349591),
('MCC Binary', 0.9433286178446474),
('# Training', 380),
('# Testing', 189)]),
'score': 0.979253112033195}]},
4: {'id': 4,
'pipeline_name': 'XGBoost Binary Classification Pipeline',
'pipeline_class': evalml.pipelines.classification.xgboost_binary.XGBoostBinaryPipeline,
'pipeline_summary': 'XGBoost Classifier w/ One Hot Encoder + Simple Imputer',
'parameters': {'One Hot Encoder': {'top_n': 10},
'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None},
'XGBoost Classifier': {'eta': 0.1,
'max_depth': 6,
'min_child_weight': 1,
'n_estimators': 100}},
'score': 0.9707162184435432,
'high_variance_cv': False,
'training_time': 0.7069911956787109,
'cv_data': [{'all_objective_scores': OrderedDict([('F1',
0.9617021276595743),
('Accuracy Binary', 0.9526315789473684),
('Balanced Accuracy Binary', 0.9536631554030062),
('Precision', 0.9741379310344828),
('AUC', 0.9863889217658894),
('Log Loss Binary', 0.16201562031423428),
('MCC Binary', 0.9001633057441626),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9617021276595743},
{'all_objective_scores': OrderedDict([('F1', 0.9711934156378601),
('Accuracy Binary', 0.9631578947368421),
('Balanced Accuracy Binary', 0.9535447982009706),
('Precision', 0.9516129032258065),
('AUC', 0.9945555687063559),
('Log Loss Binary', 0.080714067422454),
('MCC Binary', 0.9216584956231404),
('# Training', 379),
('# Testing', 190)]),
'score': 0.9711934156378601},
{'all_objective_scores': OrderedDict([('F1', 0.979253112033195),
('Accuracy Binary', 0.9735449735449735),
('Balanced Accuracy Binary', 0.9672268907563025),
('Precision', 0.9672131147540983),
('AUC', 0.9971188475390156),
('Log Loss Binary', 0.07802530307330131),
('MCC Binary', 0.9433286178446474),
('# Training', 380),
('# Testing', 189)]),
'score': 0.979253112033195}]}},
'search_order': [0, 1, 2, 3, 4]}
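Since the results object is a nested dict, you can compute your own summaries from it. A sketch of averaging the per-fold CV scores for each pipeline, using a small mocked-up dict trimmed to the same shape (the live object is `automl.results`):

```python
# Mocked-up stand-in for automl.results, trimmed to the keys this sketch uses
results = {
    "pipeline_results": {
        0: {"pipeline_name": "Mode Baseline Binary Classification Pipeline",
            "cv_data": [{"score": 0.770}, {"score": 0.770}, {"score": 0.773}]},
        1: {"pipeline_name": "Cat Boost Binary Classification Pipeline",
            "cv_data": [{"score": 0.962}, {"score": 0.983}, {"score": 0.983}]},
    },
    "search_order": [0, 1],
}

# Mean CV score per pipeline id, in search order
mean_scores = {}
for pid in results["search_order"]:
    cv = results["pipeline_results"][pid]["cv_data"]
    mean_scores[pid] = sum(fold["score"] for fold in cv) / len(cv)

print(mean_scores)
```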