Regression Example

[1]:
from evalml import AutoRegressionSearch
from evalml.demos import load_diabetes

# Load the diabetes demo dataset as features X and target y.
X, y = load_diabetes()

# Search up to 5 pipelines, ranking them by R2.
automl = AutoRegressionSearch(objective="R2", max_pipelines=5)

automl.search(X, y)
*****************************
* Beginning pipeline search *
*****************************

Optimizing for R2. Greater score is better.

Searching up to 5 pipelines.
Possible model types: random_forest, catboost, linear_model

✔ Random Forest Regressor w/ One Hot ...    20%|██        | Elapsed:00:09
✔ Random Forest Regressor w/ One Hot ...    40%|████      | Elapsed:00:16
✔ Linear Regressor w/ One Hot Encoder...    60%|██████    | Elapsed:00:16
✔ Random Forest Regressor w/ One Hot ...    80%|████████  | Elapsed:00:25
✔ CatBoost Regressor w/ Simple Imputer:    100%|██████████| Elapsed:00:26
✔ Optimization finished                    100%|██████████| Elapsed:00:26
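
The search above optimizes R2, the coefficient of determination, where a greater score is better. As a rough illustration of what that score measures, here is a minimal pure-Python sketch of the standard definition, R2 = 1 - SS_res / SS_tot. The helper name `r2_score_sketch` and the sample values are made up for illustration; this is not evalml's own implementation.

```python
def r2_score_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values, not taken from the diabetes dataset.
y_true = [3.0, 5.0, 7.0, 9.0]
perfect = r2_score_sketch(y_true, y_true)       # predictions match exactly
baseline = r2_score_sketch(y_true, [6.0] * 4)   # always predict the mean
print(perfect, baseline)
```

A perfect model scores 1, and a model that always predicts the target mean scores 0, so the scores in the rankings can be read as the fraction of target variance each pipeline explains on held-out folds.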
[2]:
automl.rankings
[2]:
   id         pipeline_class_name     score  high_variance_cv                                         parameters
0   2    LinearRegressionPipeline  0.488703             False  {'impute_strategy': 'mean', 'normalize': True,...
1   0        RFRegressionPipeline  0.422322             False  {'n_estimators': 569, 'max_depth': 22, 'impute...
2   3        RFRegressionPipeline  0.383134             False  {'n_estimators': 609, 'max_depth': 7, 'impute_...
3   1        RFRegressionPipeline  0.381204             False  {'n_estimators': 369, 'max_depth': 10, 'impute...
4   4  CatBoostRegressionPipeline  0.250449             False  {'impute_strategy': 'most_frequent', 'n_estima...
[3]:
automl.best_pipeline
[3]:
<evalml.pipelines.regression.linear_regression.LinearRegressionPipeline at 0x7fc084451cc0>
[4]:
automl.get_pipeline(0)
[4]:
<evalml.pipelines.regression.random_forest.RFRegressionPipeline at 0x7fc08558bf60>
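
Judging from the outputs above, `get_pipeline(0)` returns the pipeline whose `id` is 0 (a random forest pipeline), not the first row of the rankings, while `best_pipeline` returns the top-scoring linear regression pipeline with `id` 2. The following plain-Python sketch mimics that distinction using rows copied from the rankings. The `rankings` list and `by_id` dictionary are illustrative stand-ins, not evalml objects.

```python
# (id, pipeline_class_name, score) rows copied from the rankings output.
rankings = [
    (2, "LinearRegressionPipeline", 0.488703),
    (0, "RFRegressionPipeline", 0.422322),
    (3, "RFRegressionPipeline", 0.383134),
    (1, "RFRegressionPipeline", 0.381204),
    (4, "CatBoostRegressionPipeline", 0.250449),
]

# Lookup by id, independent of where the row sits in the rankings.
by_id = {pid: name for pid, name, _ in rankings}

# The best pipeline is the row with the greatest R2 score.
best_id, best_name, best_score = max(rankings, key=lambda row: row[2])

print(by_id[0])            # class name of the pipeline with id 0
print(best_id, best_name)  # id and class name of the top-scoring pipeline
```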
[5]:
automl.describe_pipeline(0)
************************************************************************************************
* Random Forest Regressor w/ One Hot Encoder + Simple Imputer + RF Regressor Select From Model *
************************************************************************************************

Problem Types: Regression
Model Type: Random Forest
Objective to Optimize: R2 (greater is better)
Number of features: 8

Pipeline Steps
==============
1. One Hot Encoder
2. Simple Imputer
         * impute_strategy : most_frequent
3. RF Regressor Select From Model
         * percent_features : 0.8593661614465293
         * threshold : -inf
4. Random Forest Regressor
         * n_estimators : 569
         * max_depth : 22

Training
========
Training for Regression problems.
Total training time (including CV): 10.0 seconds

Cross Validation
----------------
               R2    MAE      MSE  MedianAE  MaxError  ExpVariance # Training # Testing
0           0.427 46.033 3276.018    39.699   161.858        0.428    294.000   148.000
1           0.450 48.953 3487.566    44.344   160.513        0.451    295.000   147.000
2           0.390 47.401 3477.117    41.297   171.420        0.390    295.000   147.000
mean        0.422 47.462 3413.567    41.780   164.597        0.423          -         -
std         0.031  1.461  119.235     2.360     5.947        0.031          -         -
coef of var 0.072  0.031    0.035     0.056     0.036        0.073          -         -
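
The summary rows of the cross-validation table can be reproduced from the per-fold values. The sketch below recomputes them for the MAE column, assuming that `std` is the sample standard deviation (dividing by n - 1) and that `coef of var` is `std / mean`. Those assumptions match the printed numbers but are not confirmed by the output itself.

```python
import statistics

# Per-fold MAE values copied from the cross-validation table.
mae_folds = [46.033, 48.953, 47.401]

mae_mean = statistics.mean(mae_folds)
mae_std = statistics.stdev(mae_folds)  # sample standard deviation (n - 1)
mae_cv = mae_std / mae_mean            # coefficient of variation

print(f"mean={mae_mean:.3f} std={mae_std:.3f} coef of var={mae_cv:.3f}")
# prints: mean=47.462 std=1.461 coef of var=0.031
```

A small coefficient of variation means the metric is stable across folds, which agrees with the `high_variance_cv` value of False reported for this pipeline in the rankings.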