Regression Example
[1]:
from evalml import AutoRegressionSearch
from evalml.demos import load_diabetes

# Load the diabetes demo dataset as features X and target y.
X, y = load_diabetes()

# Search up to 5 pipelines, scoring each by R2 (greater is better).
automl = AutoRegressionSearch(objective="R2", max_pipelines=5)
automl.search(X, y)
*****************************
* Beginning pipeline search *
*****************************
Optimizing for R2. Greater score is better.
Searching up to 5 pipelines.
✔ Linear Regression Pipeline: 20%|██ | Elapsed:00:00
✔ Random Forest Regression Pipeline: 40%|████ | Elapsed:00:05
✔ Random Forest Regression Pipeline: 60%|██████ | Elapsed:00:14
✔ Random Forest Regression Pipeline: 80%|████████ | Elapsed:00:23
✔ Linear Regression Pipeline: 100%|██████████| Elapsed:00:23
✔ Optimization finished 100%|██████████| Elapsed:00:23
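The search optimizes for R2, where a greater score is better. As a minimal sketch of what that score measures, the block below computes R2 by hand with the standard definition, 1 - SS_res / SS_tot. The helper `r2_score` and the sample targets are illustrative assumptions, not part of evalml. It shows that R2 is 1 for perfect predictions, 0 for always predicting the mean, and negative when predictions are worse than the mean.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot


# Made-up targets for illustration only; not taken from the diabetes dataset.
y_true = [100.0, 150.0, 200.0]

perfect = r2_score(y_true, y_true)                 # predictions equal the targets
baseline = r2_score(y_true, [150.0, 150.0, 150.0]) # always predict the mean
offset = r2_score(y_true, [0.0, 50.0, 100.0])      # every prediction is 100 too low

print(perfect, baseline, offset)
```

A constant offset in the predictions is enough to push R2 well below zero, even when the predictions track the shape of the target exactly.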
[2]:
automl.rankings
[2]:
| | id | pipeline_name | score | high_variance_cv | parameters |
|---|---|---|---|---|---|
| 0 | 2 | Random Forest Regression Pipeline | 0.388582 | False | {'impute_strategy': 'most_frequent', 'percent_... |
| 1 | 1 | Random Forest Regression Pipeline | 0.362982 | False | {'impute_strategy': 'median', 'percent_feature... |
| 2 | 3 | Random Forest Regression Pipeline | 0.361598 | False | {'impute_strategy': 'median', 'percent_feature... |
| 3 | 0 | Linear Regression Pipeline | -3.441755 | False | {'impute_strategy': 'median', 'fit_intercept':... |
| 4 | 4 | Linear Regression Pipeline | -3.441755 | False | {'impute_strategy': 'most_frequent', 'fit_inte... |
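The rankings table has two numeric columns that are easy to confuse: the unlabeled first column is the rank position, while `id` reflects the order in which pipelines were searched. The sketch below rebuilds the ordering in plain Python from the ids, names, and scores shown in the table. The `results` list and variable names are illustrative assumptions and do not reflect how evalml stores results internally.

```python
# (id, pipeline_name, score) copied from the rankings table, listed in search order.
results = [
    (0, "Linear Regression Pipeline", -3.441755),
    (1, "Random Forest Regression Pipeline", 0.362982),
    (2, "Random Forest Regression Pipeline", 0.388582),
    (3, "Random Forest Regression Pipeline", 0.361598),
    (4, "Linear Regression Pipeline", -3.441755),
]

# R2 is "greater is better", so rank by score in descending order.
# sorted() is stable, so tied scores keep their search order.
ranked = sorted(results, key=lambda row: row[2], reverse=True)
ranked_ids = [row[0] for row in ranked]

# Look up a pipeline by id rather than by rank position.
by_id = {row[0]: row for row in results}
best_id = ranked_ids[0]
name_of_id_0 = by_id[0][1]

print(ranked_ids, best_id, name_of_id_0)
```

In this notebook, `automl.get_pipeline(0)` returns a linear regression pipeline, which matches the row whose `id` is 0 rather than the row at rank 0.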
[3]:
automl.best_pipeline
[3]:
<evalml.pipelines.regression.random_forest.RFRegressionPipeline at 0x7f19bc3c8290>
[4]:
automl.get_pipeline(0)
[4]:
<evalml.pipelines.regression.linear_regression.LinearRegressionPipeline at 0x7f19bc3acc90>
[5]:
automl.describe_pipeline(0)
******************************
* Linear Regression Pipeline *
******************************
Supported Problem Types: Regression
Model Family: Linear Model
Objective to Optimize: R2 (greater is better)
Number of features: 10
Pipeline Steps
==============
1. One Hot Encoder
* top_n : 10
2. Simple Imputer
* impute_strategy : median
* fill_value : None
3. Standard Scaler
4. Linear Regressor
* fit_intercept : False
* normalize : True
Training
========
Training for Regression problems.
Total training time (including CV): 0.1 seconds
Cross Validation
----------------
| | R2 | MAE | MSE | MedianAE | MaxError | ExpVariance | # Training | # Testing |
|---|---|---|---|---|---|---|---|---|
| 0 | -3.872 | 157.576 | 27852.855 | 156.885 | 312.634 | 0.471 | 294.000 | 148.000 |
| 1 | -3.114 | 151.160 | 26101.724 | 147.264 | 271.512 | 0.487 | 295.000 | 147.000 |
| 2 | -3.340 | 148.098 | 24722.766 | 149.071 | 294.176 | 0.510 | 295.000 | 147.000 |
| mean | -3.442 | 152.278 | 26225.782 | 151.073 | 292.774 | 0.490 | - | - |
| std | 0.389 | 4.837 | 1568.727 | 5.113 | 20.597 | 0.020 | - | - |
| coef of var | -0.113 | 0.032 | 0.060 | 0.034 | 0.070 | 0.040 | - | - |
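The summary rows of the cross-validation table can be reproduced from the three per-fold values. The sketch below does this for the R2 column using only the standard library. It assumes the `std` row is the sample standard deviation (n - 1 in the denominator) and that `coef of var` is `std / mean`. The source does not state either definition, but the printed numbers are consistent with both.

```python
import statistics

# Per-fold R2 scores copied from the cross-validation table.
r2_per_fold = [-3.872, -3.114, -3.340]

mean_r2 = statistics.mean(r2_per_fold)
std_r2 = statistics.stdev(r2_per_fold)  # sample standard deviation (assumed)
coef_of_var = std_r2 / mean_r2          # negative here because the mean is negative

print(round(mean_r2, 3), round(std_r2, 3), round(coef_of_var, 3))
```

The coefficient of variation takes the sign of the mean, which is why it is negative for R2 in this table and positive for the error metrics.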