Regression Example

[1]:

import evalml
from evalml.demos import load_diabetes

# Load the diabetes regression dataset: a feature matrix X and a numeric target y
X, y = load_diabetes()

# Search up to 5 regression pipelines, ranking them by R2 (greater is better)
clf = evalml.AutoRegressor(objective="R2", max_pipelines=5)

clf.fit(X, y)

*****************************
* Beginning pipeline search *
*****************************

Optimizing for R2. Greater score is better.

Searching up to 5 pipelines.
Possible model types: linear_model, random_forest

✔ Random Forest Regressor w/ One Hot ...     0%|          | Elapsed:00:09
✔ Random Forest Regressor w/ One Hot ...    20%|██        | Elapsed:00:14
✔ Linear Regressor w/ One Hot Encoder...    40%|████      | Elapsed:00:14
✔ Random Forest Regressor w/ One Hot ...    40%|████      | Elapsed:00:23
✔ Random Forest Regressor w/ One Hot ...    80%|████████  | Elapsed:00:33
✔ Random Forest Regressor w/ One Hot ...   100%|██████████| Elapsed:00:33

✔ Optimization finished
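The search optimizes R2, the coefficient of determination, for which a greater score is better. As a rough illustration (a plain-Python sketch of the standard formula, not evalml's own implementation), R2 is one minus the ratio of the residual sum of squares to the total sum of squares:

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over by the predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true
    # (zero, and therefore undefined here, if y_true is constant)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
print(r2_score(y_true, y_true))                # 1.0 (perfect predictions)
print(r2_score(y_true, [6.0, 6.0, 6.0, 6.0]))  # 0.0 (always predicting the mean)
```

A score of 1.0 means perfect predictions, 0.0 means no better than predicting the mean, and a model that does worse than the mean gets a negative score.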

[2]:

clf.rankings

[2]:

   id  pipeline_name             score     high_variance_cv  parameters
0  2   LinearRegressionPipeline  0.488703  False             {'impute_strategy': 'mean', 'normalize': True,...
1  0   RFRegressionPipeline      0.422322  False             {'n_estimators': 569, 'max_depth': 22, 'impute...
2  4   RFRegressionPipeline      0.391463  False             {'n_estimators': 715, 'max_depth': 7, 'impute_...
3  3   RFRegressionPipeline      0.383134  False             {'n_estimators': 609, 'max_depth': 7, 'impute_...
4  1   RFRegressionPipeline      0.381204  False             {'n_estimators': 369, 'max_depth': 10, 'impute...
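The rankings table is sorted by score in descending order because R2 is a greater-is-better objective, while the `id` column records the order in which pipelines were searched. In this notebook `get_pipeline(0)` returns the Random Forest pipeline with id 0 (ranked second), not the top-ranked row. The sketch below reproduces that ordering from the scores shown, using plain Python dictionaries as a stand-in for evalml's internal data structures:

```python
# Scores copied from the rankings table above: id -> (pipeline_name, R2 score)
results = {
    0: ("RFRegressionPipeline", 0.422322),
    1: ("RFRegressionPipeline", 0.381204),
    2: ("LinearRegressionPipeline", 0.488703),
    3: ("RFRegressionPipeline", 0.383134),
    4: ("RFRegressionPipeline", 0.391463),
}

# R2 is greater-is-better, so rank by score in descending order
ranking = sorted(results.items(), key=lambda item: item[1][1], reverse=True)

for rank, (pipeline_id, (name, score)) in enumerate(ranking):
    print(rank, pipeline_id, name, score)

# The best pipeline is the first entry in the ranking...
best_id = ranking[0][0]
print(best_id, results[best_id][0])  # 2 LinearRegressionPipeline

# ...whereas a lookup by id (as get_pipeline(0) does above) uses search order
print(results[0][0])  # RFRegressionPipeline
```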

[3]:

clf.best_pipeline

[3]:

<evalml.pipelines.regression.linear_regression.LinearRegressionPipeline at 0x7fddee4d0048>

[4]:

clf.get_pipeline(0)  # look up a pipeline by its id, not by its rank

[4]:

<evalml.pipelines.regression.random_forest.RFRegressionPipeline at 0x7fddee4e8898>

[5]:

clf.describe_pipeline(0)  # summarize the pipeline with id 0

************************************************************************************************
* Random Forest Regressor w/ One Hot Encoder + Simple Imputer + RF Regressor Select From Model *
************************************************************************************************

Problem Types: Regression
Model Type: Random Forest
Objective to Optimize: R2 (greater is better)
Number of features: 8

Pipeline Steps
==============
1. One Hot Encoder
2. Simple Imputer
         * impute_strategy : most_frequent
3. RF Regressor Select From Model
         * percent_features : 0.8593661614465293
         * threshold : -inf
4. Random Forest Regressor
         * n_estimators : 569
         * max_depth : 22

Training
========
Training for Regression problems.
Total training time (including CV): 9.1 seconds

Cross Validation
----------------
               R2    MAE      MSE  MedianAE  MaxError  ExpVariance # Training # Testing
0           0.427 46.033 3276.018    39.699   161.858        0.428    294.000   148.000
1           0.450 48.953 3487.566    44.344   160.513        0.451    295.000   147.000
2           0.390 47.401 3477.117    41.297   171.420        0.390    295.000   147.000
mean        0.422 47.462 3413.567    41.780   164.597        0.423          -         -
std         0.031  1.461  119.235     2.360     5.947        0.031          -         -
coef of var 0.072  0.031    0.035     0.056     0.036        0.073          -         -
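The summary rows of the cross-validation table can be reproduced from the per-fold values. This sketch assumes `std` is the sample standard deviation (n - 1 denominator) and `coef of var` is std divided by mean, which reproduces the MAE column shown. It also assumes the three folds come from a KFold-style split of the 442 rows (294 + 148 in fold 0) in which the first `n % k` folds receive one extra test row; that reproduces the `# Training` and `# Testing` columns, but the notebook does not state which splitter evalml uses. The helper names `fold_sizes` and `summarize` are hypothetical.

```python
import statistics
from typing import List, Tuple


def fold_sizes(n_samples: int, n_folds: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: (train, test) sizes per fold, KFold-style."""
    base, extra = divmod(n_samples, n_folds)
    sizes = []
    for fold in range(n_folds):
        # The first `extra` folds each get one additional test row
        n_test = base + (1 if fold < extra else 0)
        sizes.append((n_samples - n_test, n_test))
    return sizes


def summarize(fold_scores: List[float]) -> Tuple[float, float, float]:
    """Hypothetical helper: mean, sample std, and coefficient of variation."""
    mean = statistics.mean(fold_scores)
    std = statistics.stdev(fold_scores)  # sample standard deviation (n - 1)
    return mean, std, std / mean


print(fold_sizes(442, 3))  # [(294, 148), (295, 147), (295, 147)]

# Per-fold MAE values copied from the table above
mean, std, coef_var = summarize([46.033, 48.953, 47.401])
print(f"{mean:.3f} {std:.3f} {coef_var:.3f}")  # 47.462 1.461 0.031
```

The R2 column follows the same pattern, though its rounded fold values give a std of about 0.030 rather than the 0.031 shown, presumably because the table's summary rows were computed from unrounded scores.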