Regression Example

[1]:
from evalml import AutoRegressionSearch
from evalml.demos import load_diabetes

# Load the diabetes demo dataset: X holds the features, y the target.
X, y = load_diabetes()

# Search up to 5 candidate pipelines, ranking them by R2.
automl = AutoRegressionSearch(objective="R2", max_pipelines=5)

automl.search(X, y)
*****************************
* Beginning pipeline search *
*****************************

Optimizing for R2. Greater score is better.

Searching up to 5 pipelines.
✔ Linear Regression Pipeline:               20%|██        | Elapsed:00:00
✔ Random Forest Regression Pipeline:        40%|████      | Elapsed:00:05
✔ Random Forest Regression Pipeline:        60%|██████    | Elapsed:00:14
✔ Random Forest Regression Pipeline:        80%|████████  | Elapsed:00:23
✔ Linear Regression Pipeline:              100%|██████████| Elapsed:00:23
✔ Optimization finished                    100%|██████████| Elapsed:00:23
[2]:
automl.rankings
[2]:
   id                      pipeline_name      score  high_variance_cv  parameters
0   2  Random Forest Regression Pipeline   0.388582             False  {'impute_strategy': 'most_frequent', 'percent_...
1   1  Random Forest Regression Pipeline   0.362982             False  {'impute_strategy': 'median', 'percent_feature...
2   3  Random Forest Regression Pipeline   0.361598             False  {'impute_strategy': 'median', 'percent_feature...
3   0         Linear Regression Pipeline  -3.441755             False  {'impute_strategy': 'median', 'fit_intercept':...
4   4         Linear Regression Pipeline  -3.441755             False  {'impute_strategy': 'most_frequent', 'fit_inte...
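
Both Linear Regression pipelines score below zero. R2 is one minus the ratio of the residual sum of squares to the total sum of squares, so it turns negative whenever a model's predictions fit worse than always predicting the mean of the target. The sketch below illustrates this with a hand-written helper and made-up numbers; the function `r2` and the toy data are assumptions for illustration, not EvalML's implementation or the diabetes data.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (illustrative helper)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy target values (hypothetical, not taken from the dataset above).
y_true = [1.0, 2.0, 3.0, 4.0]

# Always predicting the mean of the target gives exactly zero.
mean_baseline = [2.5, 2.5, 2.5, 2.5]
baseline_score = r2(y_true, mean_baseline)

# Predictions that are worse than the mean baseline give a negative score.
reversed_preds = [4.0, 3.0, 2.0, 1.0]
poor_score = r2(y_true, reversed_preds)

print(baseline_score, poor_score)
```

A score near -3.44 therefore means the squared error of those pipelines is several times larger than that of a constant mean prediction.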
[3]:
automl.best_pipeline
[3]:
<evalml.pipelines.regression.random_forest.RFRegressionPipeline at 0x7f19bc3c8290>
[4]:
automl.get_pipeline(0)
[4]:
<evalml.pipelines.regression.linear_regression.LinearRegressionPipeline at 0x7f19bc3acc90>
[5]:
automl.describe_pipeline(0)
******************************
* Linear Regression Pipeline *
******************************

Supported Problem Types: Regression
Model Family: Linear Model
Objective to Optimize: R2 (greater is better)
Number of features: 10

Pipeline Steps
==============
1. One Hot Encoder
         * top_n : 10
2. Simple Imputer
         * impute_strategy : median
         * fill_value : None
3. Standard Scaler
4. Linear Regressor
         * fit_intercept : False
         * normalize : True

Training
========
Training for Regression problems.
Total training time (including CV): 0.1 seconds

Cross Validation
----------------
                R2     MAE       MSE  MedianAE  MaxError  ExpVariance # Training # Testing
0           -3.872 157.576 27852.855   156.885   312.634        0.471    294.000   148.000
1           -3.114 151.160 26101.724   147.264   271.512        0.487    295.000   147.000
2           -3.340 148.098 24722.766   149.071   294.176        0.510    295.000   147.000
mean        -3.442 152.278 26225.782   151.073   292.774        0.490          -         -
std          0.389   4.837  1568.727     5.113    20.597        0.020          -         -
coef of var -0.113   0.032     0.060     0.034     0.070        0.040          -         -
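
The summary rows can be reproduced from the per-fold values. The sketch below does so for the R2 column, using only the three fold scores printed above. It assumes that `std` is the sample standard deviation (dividing by n - 1) and that `coef of var` is `std / mean`; both assumptions agree with the printed numbers but are not confirmed by the output itself.

```python
import statistics

# R2 per cross-validation fold, copied from the table above.
r2_folds = [-3.872, -3.114, -3.340]

mean_r2 = statistics.mean(r2_folds)
std_r2 = statistics.stdev(r2_folds)  # sample standard deviation (n - 1)
coef_of_var_r2 = std_r2 / mean_r2    # negative here because the mean is negative

print(f"mean {mean_r2:.3f}  std {std_r2:.3f}  coef of var {coef_of_var_r2:.3f}")
```

The negative coefficient of variation comes only from the sign of the mean; its magnitude (about 11% of the mean) is what indicates how much the score varies across folds.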