Regression Example
[1]:
from evalml import AutoRegressionSearch
from evalml.demos import load_diabetes

# Load the diabetes dataset as a feature matrix X and target vector y
X, y = load_diabetes()

# Search up to 5 pipelines, ranking candidates by the R2 objective
automl = AutoRegressionSearch(objective="R2", max_pipelines=5)
automl.search(X, y)
*****************************
* Beginning pipeline search *
*****************************
Optimizing for R2.
Greater score is better.
Searching up to 5 pipelines.
Allowed model families: xgboost, linear_model, random_forest, catboost
✔ Mean Baseline Regression Pipeline: 0%| | Elapsed:00:00
✔ Cat Boost Regression Pipeline: 20%|██ | Elapsed:00:03
✔ Linear Regression Pipeline: 40%|████ | Elapsed:00:03
✔ Random Forest Regression Pipeline: 60%|██████ | Elapsed:00:04
✔ XGBoost Regression Pipeline: 80%|████████ | Elapsed:00:04
✔ Optimization finished 80%|████████ | Elapsed:00:04
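As a hedged aside (not part of the original notebook), a holdout split can be carved off before searching so the winning pipeline can later be scored on data the search never saw. The sketch below assumes scikit-learn's train_test_split, which is available alongside evalml; the 80/20 split and random_state are illustrative.

from sklearn.model_selection import train_test_split

# Illustrative holdout: reserve 20% of the rows for final evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

automl = AutoRegressionSearch(objective="R2", max_pipelines=5)
automl.search(X_train, y_train)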
[2]:
automl.rankings
[2]:
| | id | pipeline_name | score | high_variance_cv | parameters |
|---|---|---|---|---|---|
| 0 | 2 | Linear Regression Pipeline | 0.488703 | False | {'One Hot Encoder': {'top_n': 10}, 'Simple Imp... |
| 1 | 3 | Random Forest Regression Pipeline | 0.447924 | False | {'One Hot Encoder': {'top_n': 10}, 'Simple Imp... |
| 2 | 1 | Cat Boost Regression Pipeline | 0.446477 | False | {'Simple Imputer': {'impute_strategy': 'most_f... |
| 3 | 4 | XGBoost Regression Pipeline | 0.331082 | False | {'One Hot Encoder': {'top_n': 10}, 'Simple Imp... |
| 4 | 0 | Mean Baseline Regression Pipeline | -0.004217 | False | {'strategy': 'mean'} |
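The leaderboard printed above is a pandas DataFrame, so it can be filtered with ordinary pandas operations. A small sketch, assuming the column names shown in the table; the 0.4 cutoff is an arbitrary example value.

# Keep only pipelines whose cross-validated R2 beat an arbitrary 0.4 cutoff
good_pipelines = automl.rankings[automl.rankings["score"] > 0.4]
print(good_pipelines[["id", "pipeline_name", "score"]])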
[3]:
automl.best_pipeline
[3]:
<evalml.pipelines.regression.linear_regression.LinearRegressionPipeline at 0x7f024ddd3a58>
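best_pipeline returns the top-ranked pipeline object. A hedged sketch of evaluating it on the holdout split from the earlier aside: whether the returned pipeline is already fitted varies across evalml versions, so it is refit explicitly here, and the R2 is computed with scikit-learn's r2_score to keep the example self-contained.

from sklearn.metrics import r2_score

best = automl.best_pipeline
best.fit(X_train, y_train)   # refit in case the returned pipeline is untrained
predictions = best.predict(X_test)
print("Holdout R2:", r2_score(y_test, predictions))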
[4]:
automl.get_pipeline(0)
[4]:
<evalml.pipelines.regression.baseline_regression.MeanBaselineRegressionPipeline at 0x7f024ddd3978>
[5]:
automl.describe_pipeline(0)
*************************************
* Mean Baseline Regression Pipeline *
*************************************
Problem Type: Regression
Model Family: Baseline
Number of features: 10
Pipeline Steps
==============
1. Baseline Regressor
* strategy : mean
Training
========
Training for Regression problems.
Total training time (including CV): 0.0 seconds
Cross Validation
----------------
R2 Root Mean Squared Error MAE MSE MedianAE MaxError ExpVariance # Training # Testing
0 -0.007 75.863 63.324 5755.216 57.190 186.810 -0.000 294.000 148.000
1 -0.000 79.654 68.759 6344.747 67.966 193.966 0.000 295.000 147.000
2 -0.006 75.705 65.485 5731.187 63.817 170.817 -0.000 295.000 147.000
mean -0.004 77.074 65.856 5943.717 62.991 183.864 -0.000 - -
std 0.004 2.236 2.736 347.510 5.435 11.852 0.000 - -
coef of var -0.866 0.029 0.042 0.058 0.086 0.064 -0.866 - -
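To review every candidate rather than just the baseline, the same describe_pipeline call can be looped over the ids from the rankings table; this small sketch uses only calls already shown above.

# Print the summary for every pipeline evaluated during the search
for pipeline_id in automl.rankings["id"]:
    automl.describe_pipeline(pipeline_id)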