Regression Example

[1]:
import evalml
from evalml.demos import load_diabetes

# Load the diabetes regression demo dataset (features X, target y).
X, y = load_diabetes()

# Search up to 5 candidate pipelines, optimizing for R2.
clf = evalml.AutoRegressor(objective="R2", max_pipelines=5)

clf.fit(X, y)
*****************************
* Beginning pipeline search *
*****************************

Optimizing for R2. Greater score is better.

Searching up to 5 pipelines.
Possible model types: linear_model, random_forest

✔ Random Forest Regressor w/ One Hot ...     0%|          | Elapsed:00:05
✔ Random Forest Regressor w/ One Hot ...    20%|██        | Elapsed:00:09
✔ Linear Regressor w/ One Hot Encoder...    40%|████      | Elapsed:00:09
✔ Random Forest Regressor w/ One Hot ...    40%|████      | Elapsed:00:15
✔ Random Forest Regressor w/ One Hot ...    80%|████████  | Elapsed:00:21
✔ Random Forest Regressor w/ One Hot ...   100%|██████████| Elapsed:00:21

✔ Optimization finished
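
The search log above reports "Optimizing for R2. Greater score is better." As a reminder of what that score measures, here is a minimal sketch of the coefficient of determination in plain Python. It is not EvalML's implementation; the helper name `r2` and the toy data are assumptions made for illustration only.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values, not taken from the diabetes dataset.
y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2(y_true, y_true)             # predictions match exactly
baseline = r2(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
print(perfect, baseline)
```

A perfect fit scores 1, and a model no better than predicting the mean scores 0, which is why a greater score is better.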
[2]:
clf.rankings
[2]:
   id  pipeline_name             score     high_variance_cv  parameters
0  2   LinearRegressionPipeline  0.488703  False             {'impute_strategy': 'mean', 'normalize': True,...
1  0   RFRegressionPipeline      0.422322  False             {'n_estimators': 569, 'max_depth': 22, 'impute...
2  4   RFRegressionPipeline      0.391463  False             {'n_estimators': 715, 'max_depth': 7, 'impute_...
3  3   RFRegressionPipeline      0.383134  False             {'n_estimators': 609, 'max_depth': 7, 'impute_...
4  1   RFRegressionPipeline      0.381204  False             {'n_estimators': 369, 'max_depth': 10, 'impute...
[3]:
clf.best_pipeline
[3]:
<evalml.pipelines.regression.linear_regression.LinearRegressionPipeline at 0x1308f16d0>
[4]:
clf.get_pipeline(0)
[4]:
<evalml.pipelines.regression.random_forest.RFRegressionPipeline at 0x12d737610>
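
Note that `clf.get_pipeline(0)` returned a random forest pipeline even though the top-ranked row is the linear regression pipeline: judging from the outputs above, the argument is matched against the `id` column, not the rank position. The sketch below illustrates that distinction in plain Python using the id, name, and score values copied from the rankings output. The variable names are hypothetical and no EvalML calls are involved.

```python
# (id, pipeline_name, score) rows copied from the rankings output above,
# in the order they were displayed (best score first).
rankings = [
    (2, "LinearRegressionPipeline", 0.488703),
    (0, "RFRegressionPipeline", 0.422322),
    (4, "RFRegressionPipeline", 0.391463),
    (3, "RFRegressionPipeline", 0.383134),
    (1, "RFRegressionPipeline", 0.381204),
]

# R2 is "greater is better", so the best pipeline has the highest score.
best_id, best_name, best_score = max(rankings, key=lambda row: row[2])

# Looking a pipeline up by id is different from taking a row by rank.
name_by_id = {pid: name for pid, name, _ in rankings}

print("best:", best_id, best_name)
print("id 0:", name_by_id[0])
print("rank 0:", rankings[0][1])
```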
[5]:
clf.describe_pipeline(0)
************************************************************************************************
* Random Forest Regressor w/ One Hot Encoder + Simple Imputer + RF Regressor Select From Model *
************************************************************************************************

Problem Types: Regression
Model Type: Random Forest
Objective to Optimize: R2 (greater is better)
Number of features: 8

Pipeline Steps
==============
1. One Hot Encoder
2. Simple Imputer
         * impute_strategy : most_frequent
3. RF Regressor Select From Model
         * percent_features : 0.8593661614465293
         * threshold : -inf
4. Random Forest Regressor
         * n_estimators : 569
         * max_depth : 22

Training
========
Training for Regression problems.
Total training time (including CV): 5.6 seconds

Cross Validation
----------------
               R2    MAE      MSE  MSLE  MedianAE  MaxError  ExpVariance # Training # Testing
0           0.427 46.033 3276.018 0.194    39.699   161.858        0.428    294.000   148.000
1           0.450 48.953 3487.566 0.193    44.344   160.513        0.451    295.000   147.000
2           0.390 47.401 3477.117 0.193    41.297   171.420        0.390    295.000   147.000
mean        0.422 47.462 3413.567 0.193    41.780   164.597        0.423          -         -
std         0.031  1.461  119.235 0.000     2.360     5.947        0.031          -         -
coef of var 0.072  0.031    0.035 0.002     0.056     0.036        0.073          -         -
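
The summary rows of the cross-validation table can be recomputed from the per-fold scores. The sketch below does this for the R2 column using the rounded values printed above. It assumes the `std` row is a sample standard deviation (n - 1 denominator) and that `coef of var` is the standard deviation divided by the mean. Because the inputs are already rounded to three decimals, the results can differ from the table in the last digit.

```python
from statistics import mean, stdev

# R2 per fold, copied (already rounded) from the cross-validation table.
r2_folds = [0.427, 0.450, 0.390]

r2_mean = mean(r2_folds)
# Sample standard deviation (n - 1 denominator); this choice is an assumption.
r2_std = stdev(r2_folds)
# Coefficient of variation: spread relative to the mean.
r2_coef_of_var = r2_std / r2_mean

print(f"mean={r2_mean:.3f} std={r2_std:.3f} coef_of_var={r2_coef_of_var:.3f}")
```

A small coefficient of variation means the score is stable across folds, which is consistent with `high_variance_cv` being `False` for this pipeline in the rankings.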