Regression Example

[1]:

from evalml import AutoRegressionSearch
from evalml.demos import load_diabetes

# Load the diabetes demo dataset as features X and target y
X, y = load_diabetes()

# Search up to 5 pipelines, optimizing for R2
automl = AutoRegressionSearch(objective="R2", max_pipelines=5)

automl.search(X, y)

*****************************
* Beginning pipeline search *
*****************************

Optimizing for R2. Greater score is better.

Searching up to 5 pipelines.
Possible model types: random_forest, catboost, linear_model

✔ Random Forest Regressor w/ One Hot ...    20%|██        | Elapsed:00:09
✔ Random Forest Regressor w/ One Hot ...    40%|████      | Elapsed:00:16
✔ Linear Regressor w/ One Hot Encoder...    60%|██████    | Elapsed:00:16
✔ Random Forest Regressor w/ One Hot ...    80%|████████  | Elapsed:00:25
✔ CatBoost Regressor w/ Simple Imputer:    100%|██████████| Elapsed:00:26
✔ Optimization finished                    100%|██████████| Elapsed:00:26
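
The search log above reports that it optimizes R2 and that a greater score is better. As a reminder of what that score measures, here is a minimal plain-Python sketch of the standard coefficient of determination, R2 = 1 - SS_res / SS_tot. The function name `r2_score` and the sample values are illustrative assumptions and are not taken from evalml; evalml's own objective may handle edge cases differently.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not from the diabetes dataset)
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

score = r2_score(y_true, y_pred)

# A model that always predicts the mean scores exactly 0
mean_prediction = [sum(y_true) / len(y_true)] * len(y_true)
baseline = r2_score(y_true, mean_prediction)

print(round(score, 3), baseline)
```

A perfect model scores 1, a model no better than predicting the mean scores 0, and worse models can score below 0, which is why the search treats a greater R2 as better.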

[2]:

automl.rankings

[2]:

	id	pipeline_class_name	score	high_variance_cv	parameters
0	2	LinearRegressionPipeline	0.488703	False	{'impute_strategy': 'mean', 'normalize': True,...
1	0	RFRegressionPipeline	0.422322	False	{'n_estimators': 569, 'max_depth': 22, 'impute...
2	3	RFRegressionPipeline	0.383134	False	{'n_estimators': 609, 'max_depth': 7, 'impute_...
3	1	RFRegressionPipeline	0.381204	False	{'n_estimators': 369, 'max_depth': 10, 'impute...
4	4	CatBoostRegressionPipeline	0.250449	False	{'impute_strategy': 'most_frequent', 'n_estima...
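
In the rankings output above, the `id` column appears to record the order in which pipelines were evaluated, while the rows are ordered by score. The following plain-Python sketch reproduces that ordering from the displayed values. The `results` list and the `greater_is_better` flag are hypothetical names used only for illustration; they are not part of evalml.

```python
# (id, pipeline_class_name, score) copied from the rankings output above
results = [
    (0, "RFRegressionPipeline", 0.422322),
    (1, "RFRegressionPipeline", 0.381204),
    (2, "LinearRegressionPipeline", 0.488703),
    (3, "RFRegressionPipeline", 0.383134),
    (4, "CatBoostRegressionPipeline", 0.250449),
]

# R2 is a "greater is better" objective, so sort scores in descending order
greater_is_better = True
ranked = sorted(results, key=lambda row: row[2], reverse=greater_is_better)

ranked_ids = [row[0] for row in ranked]
best_id, best_name, best_score = ranked[0]

print(ranked_ids)
print(best_id, best_name)
```

The top row has id 2, so it is that id, not 0, that identifies the highest-scoring pipeline.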

[3]:

automl.best_pipeline

[3]:

<evalml.pipelines.regression.linear_regression.LinearRegressionPipeline at 0x7fc084451cc0>

[4]:

automl.get_pipeline(0)

[4]:

<evalml.pipelines.regression.random_forest.RFRegressionPipeline at 0x7fc08558bf60>

[5]:

automl.describe_pipeline(0)

************************************************************************************************
* Random Forest Regressor w/ One Hot Encoder + Simple Imputer + RF Regressor Select From Model *
************************************************************************************************

Problem Types: Regression
Model Type: Random Forest
Objective to Optimize: R2 (greater is better)
Number of features: 8

Pipeline Steps
==============
1. One Hot Encoder
2. Simple Imputer
         * impute_strategy : most_frequent
3. RF Regressor Select From Model
         * percent_features : 0.8593661614465293
         * threshold : -inf
4. Random Forest Regressor
         * n_estimators : 569
         * max_depth : 22

Training
========
Training for Regression problems.
Total training time (including CV): 10.0 seconds

Cross Validation
----------------
               R2    MAE      MSE  MedianAE  MaxError  ExpVariance # Training # Testing
0           0.427 46.033 3276.018    39.699   161.858        0.428    294.000   148.000
1           0.450 48.953 3487.566    44.344   160.513        0.451    295.000   147.000
2           0.390 47.401 3477.117    41.297   171.420        0.390    295.000   147.000
mean        0.422 47.462 3413.567    41.780   164.597        0.423          -         -
std         0.031  1.461  119.235     2.360     5.947        0.031          -         -
coef of var 0.072  0.031    0.035     0.056     0.036        0.073          -         -
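
The `mean`, `std`, and `coef of var` rows in the cross-validation table appear consistent with the sample standard deviation (n - 1 in the denominator) and with the coefficient of variation defined as std / mean. The sketch below recomputes them from the three fold scores using only the standard library. The helper name `summarize` is hypothetical, and because the inputs are the rounded values shown in the table, the last digit can differ slightly from the printed summary.

```python
import statistics

# Per-fold scores copied from the cross-validation table above
r2_folds = [0.427, 0.450, 0.390]
mae_folds = [46.033, 48.953, 47.401]


def summarize(scores):
    """Return (mean, sample std, coefficient of variation) for fold scores."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    return mean, std, std / mean


r2_mean, r2_std, r2_cov = summarize(r2_folds)
mae_mean, mae_std, mae_cov = summarize(mae_folds)

print(f"R2:  mean={r2_mean:.3f} std={r2_std:.3f} cov={r2_cov:.3f}")
print(f"MAE: mean={mae_mean:.3f} std={mae_std:.3f} cov={mae_cov:.3f}")
```

A small coefficient of variation means the score is stable across folds relative to its size.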