Regression Example
[1]:
import evalml
from evalml.demos import load_diabetes
# Load the diabetes demo dataset: X holds the features, y the regression target.
X, y = load_diabetes()
# Search up to 5 candidate pipelines, scoring each by R2 (greater is better).
clf = evalml.AutoRegressor(objective="R2", max_pipelines=5)
clf.fit(X, y)
*****************************
* Beginning pipeline search *
*****************************
Optimizing for R2. Greater score is better.
Searching up to 5 pipelines.
Possible model types: linear_model, random_forest
✔ Random Forest Regressor w/ One Hot ... 0%| | Elapsed:00:05
✔ Random Forest Regressor w/ One Hot ... 20%|██ | Elapsed:00:09
✔ Linear Regressor w/ One Hot Encoder... 40%|████ | Elapsed:00:09
✔ Random Forest Regressor w/ One Hot ... 40%|████ | Elapsed:00:15
✔ Random Forest Regressor w/ One Hot ... 80%|████████ | Elapsed:00:21
✔ Random Forest Regressor w/ One Hot ... 100%|██████████| Elapsed:00:21
✔ Optimization finished
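The search above optimizes R2. As a minimal sketch, assuming the objective follows the usual definition of the coefficient of determination (1 minus the ratio of residual to total sum of squares), the score can be computed by hand as below. The function name `r2_score_sketch` and the toy values are illustrative and are not part of EvalML.

```python
def r2_score_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only; they are not taken from the diabetes data.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
toy_r2 = r2_score_sketch(y_true, y_pred)
print(round(toy_r2, 4))
```

A perfect model scores 1.0 and a model that always predicts the mean scores 0.0, which is why a greater score is better.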
[2]:
clf.rankings
[2]:
| | id | pipeline_name | score | high_variance_cv | parameters |
|---|---|---|---|---|---|
| 0 | 2 | LinearRegressionPipeline | 0.488703 | False | {'impute_strategy': 'mean', 'normalize': True,... |
| 1 | 0 | RFRegressionPipeline | 0.422322 | False | {'n_estimators': 569, 'max_depth': 22, 'impute... |
| 2 | 4 | RFRegressionPipeline | 0.391463 | False | {'n_estimators': 715, 'max_depth': 7, 'impute_... |
| 3 | 3 | RFRegressionPipeline | 0.383134 | False | {'n_estimators': 609, 'max_depth': 7, 'impute_... |
| 4 | 1 | RFRegressionPipeline | 0.381204 | False | {'n_estimators': 369, 'max_depth': 10, 'impute... |
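The ordering of the rankings can be reproduced by hand. The sketch below is not EvalML code: it copies the `id`, `pipeline_name`, and `score` values from the table into plain dictionaries and sorts them by score in descending order, since a greater R2 is better.

```python
# Scores copied from the rankings table, listed in search order (by id).
results = [
    {"id": 0, "pipeline_name": "RFRegressionPipeline", "score": 0.422322},
    {"id": 1, "pipeline_name": "RFRegressionPipeline", "score": 0.381204},
    {"id": 2, "pipeline_name": "LinearRegressionPipeline", "score": 0.488703},
    {"id": 3, "pipeline_name": "RFRegressionPipeline", "score": 0.383134},
    {"id": 4, "pipeline_name": "RFRegressionPipeline", "score": 0.391463},
]

# R2 is a greater-is-better objective, so sort from highest to lowest score.
rankings = sorted(results, key=lambda row: row["score"], reverse=True)
best = rankings[0]

for row in rankings:
    print(row["id"], row["pipeline_name"], row["score"])
```

The top row is the pipeline with id 2, the linear regression pipeline, and the remaining ids follow in the same order as the table.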
[3]:
clf.best_pipeline
[3]:
<evalml.pipelines.regression.linear_regression.LinearRegressionPipeline at 0x1308f16d0>
[4]:
clf.get_pipeline(0)
[4]:
<evalml.pipelines.regression.random_forest.RFRegressionPipeline at 0x12d737610>
[5]:
clf.describe_pipeline(0)
************************************************************************************************
* Random Forest Regressor w/ One Hot Encoder + Simple Imputer + RF Regressor Select From Model *
************************************************************************************************
Problem Types: Regression
Model Type: Random Forest
Objective to Optimize: R2 (greater is better)
Number of features: 8
Pipeline Steps
==============
1. One Hot Encoder
2. Simple Imputer
    * impute_strategy : most_frequent
3. RF Regressor Select From Model
    * percent_features : 0.8593661614465293
    * threshold : -inf
4. Random Forest Regressor
    * n_estimators : 569
    * max_depth : 22
Training
========
Training for Regression problems.
Total training time (including CV): 5.6 seconds
Cross Validation
----------------
| | R2 | MAE | MSE | MSLE | MedianAE | MaxError | ExpVariance | # Training | # Testing |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.427 | 46.033 | 3276.018 | 0.194 | 39.699 | 161.858 | 0.428 | 294.000 | 148.000 |
| 1 | 0.450 | 48.953 | 3487.566 | 0.193 | 44.344 | 160.513 | 0.451 | 295.000 | 147.000 |
| 2 | 0.390 | 47.401 | 3477.117 | 0.193 | 41.297 | 171.420 | 0.390 | 295.000 | 147.000 |
| mean | 0.422 | 47.462 | 3413.567 | 0.193 | 41.780 | 164.597 | 0.423 | - | - |
| std | 0.031 | 1.461 | 119.235 | 0.000 | 2.360 | 5.947 | 0.031 | - | - |
| coef of var | 0.072 | 0.031 | 0.035 | 0.002 | 0.056 | 0.036 | 0.073 | - | - |
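The summary rows of the cross-validation table can be checked by hand. The sketch below recomputes them for the R2 column from the three fold scores, assuming `std` is the sample standard deviation (n - 1 in the denominator) and `coef of var` is `std` divided by `mean`. It starts from the rounded fold values shown in the table, so the last digit may differ slightly from the printed summary.

```python
import statistics

# R2 per cross-validation fold, copied from the table (rounded to 3 decimals).
r2_by_fold = [0.427, 0.450, 0.390]

mean_r2 = statistics.mean(r2_by_fold)
# statistics.stdev is the sample standard deviation (divides by n - 1).
std_r2 = statistics.stdev(r2_by_fold)
# Coefficient of variation: spread relative to the mean score.
coef_of_var = std_r2 / mean_r2

print(f"mean={mean_r2:.3f} std={std_r2:.3f} coef of var={coef_of_var:.3f}")
```

A small coefficient of variation means the score is stable across folds.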