Regression Example
[1]:
from evalml import AutoRegressionSearch
from evalml.demos import load_diabetes

# load the diabetes demo dataset
X, y = load_diabetes()

# search up to 5 pipelines, optimizing for R2
automl = AutoRegressionSearch(objective="R2", max_pipelines=5)
automl.search(X, y)
*****************************
* Beginning pipeline search *
*****************************
Optimizing for R2. Greater score is better.
Searching up to 5 pipelines.
✔ XGBoost Regression Pipeline: 20%|██ | Elapsed:00:03
✔ Cat Boost Regression Pipeline: 40%|████ | Elapsed:00:12
✔ Random Forest Regression Pipeline: 60%|██████ | Elapsed:00:15
✔ XGBoost Regression Pipeline: 80%|████████ | Elapsed:00:21
✔ XGBoost Regression Pipeline: 100%|██████████| Elapsed:00:26
✔ Optimization finished 100%|██████████| Elapsed:00:26
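The search above optimizes R2, but any of the regression objectives reported in the cross-validation tables (for example MAE) can be passed instead. A minimal sketch, holding out 20% of the data with scikit-learn's train_test_split before searching; the holdout split is an addition here, not part of the run shown above:

from sklearn.model_selection import train_test_split
from evalml import AutoRegressionSearch
from evalml.demos import load_diabetes

# hold out 20% of the diabetes data for a final evaluation
X, y = load_diabetes()
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.2, random_state=0)

# optimize mean absolute error instead of R2
automl_mae = AutoRegressionSearch(objective="MAE", max_pipelines=5)
automl_mae.search(X_train, y_train)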
[2]:
automl.rankings
[2]:
|   | id | pipeline_name | score | high_variance_cv | parameters |
|---|----|---------------|-------|------------------|------------|
| 0 | 1 | Cat Boost Regression Pipeline | 0.397415 | False | {'impute_strategy': 'most_frequent', 'n_estima... |
| 1 | 0 | XGBoost Regression Pipeline | 0.245869 | True | {'impute_strategy': 'most_frequent', 'percent_... |
| 3 | 2 | Random Forest Regression Pipeline | 0.051449 | True | {'impute_strategy': 'most_frequent', 'percent_... |
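automl.rankings returns the results ordered by score. Assuming it is a pandas DataFrame, as the tabular output above suggests, it can be sliced and filtered with ordinary pandas operations; a short sketch:

rankings = automl.rankings

# keep just the key columns
print(rankings[["id", "pipeline_name", "score"]])

# id of the top-ranked pipeline
best_id = rankings.iloc[0]["id"]

# pipelines flagged for high variance across CV folds
print(rankings.loc[rankings["high_variance_cv"], "pipeline_name"])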
[3]:
automl.best_pipeline
[3]:
<evalml.pipelines.regression.catboost.CatBoostRegressionPipeline at 0x7fa9afd32550>
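best_pipeline returns the pipeline object with the best score, here the CatBoost pipeline. A sketch of using it for predictions, assuming the object exposes scikit-learn-style fit and predict methods; if the returned pipeline is not already trained, fit it first:

best = automl.best_pipeline

# best.fit(X, y)  # uncomment if the pipeline still needs to be trained
predictions = best.predict(X)  # predictions on the training data, for illustration only
print(predictions[:5])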
[4]:
automl.get_pipeline(0)
[4]:
<evalml.pipelines.regression.xgboost_regression.XGBoostRegressionPipeline at 0x7fa9b055ce10>
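get_pipeline looks a pipeline up by the id shown in the rankings table, so every searched pipeline can be retrieved in a loop; a sketch, assuming rankings behaves like a DataFrame:

# retrieve each ranked pipeline by id and print its class name
for pipeline_id in automl.rankings["id"]:
    pipeline = automl.get_pipeline(pipeline_id)
    print(pipeline_id, type(pipeline).__name__)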
[5]:
automl.describe_pipeline(0)
*******************************
* XGBoost Regression Pipeline *
*******************************
Problem Type: Regression
Model Family: XGBoost
Number of features: 8
Pipeline Steps
==============
1. One Hot Encoder
* top_n : 10
2. Simple Imputer
* impute_strategy : most_frequent
* fill_value : None
3. RF Regressor Select From Model
* percent_features : 0.8487792213962843
* threshold : -inf
4. XGBoost Regressor
* eta : 0.38438170729269994
* max_depth : 7
* min_child_weight : 1.5104167958569887
* n_estimators : 397
Training
========
Training for Regression problems.
Total training time (including CV): 3.7 seconds
Cross Validation
----------------
Warning! High variance within cross validation scores. Model may not perform as estimated on unseen data.
R2 MAE MSE MedianAE MaxError ExpVariance # Training # Testing
0 0.265 51.909 4204.782 45.175 174.089 0.266 294.000 148.000
1 0.339 50.432 4190.876 40.601 162.048 0.340 295.000 147.000
2 0.134 56.410 4935.882 47.643 206.828 0.135 295.000 147.000
mean 0.246 52.917 4443.847 44.473 180.989 0.247 - -
std 0.104 3.114 426.172 3.573 23.174 0.104 - -
coef of var 0.424 0.059 0.096 0.080 0.128 0.419 - -
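The "coef of var" row is the fold-to-fold standard deviation divided by the mean. Recomputing it for R2 from the three fold scores above shows how spread out the folds are (the printed values are rounded to three decimals, so the result differs slightly from the 0.424 shown):

import numpy as np

# R2 scores of the three CV folds, as printed above
r2_folds = np.array([0.265, 0.339, 0.134])

mean = r2_folds.mean()        # ~0.246
std = r2_folds.std(ddof=1)    # sample standard deviation, ~0.104
print(std / mean)             # ~0.42, the coefficient of variation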