Regression Example

[1]:

import evalml
from evalml import AutoRegressionSearch


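# Load the diabetes demo dataset that ships with evalml
X, y = evalml.demos.load_diabetes()

# Evaluate up to 5 candidate pipelines, ranking them by R2
automl = AutoRegressionSearch(objective="R2", max_pipelines=5)

# Run the search; progress is printed as each pipeline finishes
automl.search(X, y)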

*****************************
* Beginning pipeline search *
*****************************

Optimizing for R2. Greater score is better.

Searching up to 5 pipelines.

✔ XGBoost Regression Pipeline:              20%|██        | Elapsed:00:03
✔ Cat Boost Regression Pipeline:            40%|████      | Elapsed:00:12
✔ Random Forest Regression Pipeline:        60%|██████    | Elapsed:00:15
✔ XGBoost Regression Pipeline:              80%|████████  | Elapsed:00:21
✔ XGBoost Regression Pipeline:             100%|██████████| Elapsed:00:26
✔ Optimization finished                    100%|██████████| Elapsed:00:26
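
The search above optimizes for R2, where a greater score is better. As a minimal sketch of what that objective measures, the block below computes the coefficient of determination by hand using the standard textbook definition, 1 - SS_res / SS_tot. The function name `r2_manual` and the toy values are illustrative assumptions and are not part of evalml or of the diabetes dataset.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

r2 = r2_manual(y_true, y_pred)
# A model that always predicts the mean scores exactly 0
r2_baseline = r2_manual(y_true, [sum(y_true) / len(y_true)] * len(y_true))

print(round(r2, 4), r2_baseline)
```

A score of 1 means perfect predictions, 0 matches a constant predictor of the mean, and negative values are worse than that baseline.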

[2]:

automl.rankings

[2]:

   id                      pipeline_name     score  high_variance_cv  parameters
0   1      Cat Boost Regression Pipeline  0.397415             False  {'impute_strategy': 'most_frequent', 'n_estima...
1   0        XGBoost Regression Pipeline  0.245869              True  {'impute_strategy': 'most_frequent', 'percent_...
3   2  Random Forest Regression Pipeline  0.051449              True  {'impute_strategy': 'most_frequent', 'percent_...

[3]:

automl.best_pipeline

[3]:

<evalml.pipelines.regression.catboost.CatBoostRegressionPipeline at 0x7fa9afd32550>

[4]:

automl.get_pipeline(0)

[4]:

<evalml.pipelines.regression.xgboost_regression.XGBoostRegressionPipeline at 0x7fa9b055ce10>

[5]:

automl.describe_pipeline(0)

*******************************
* XGBoost Regression Pipeline *
*******************************

Problem Type: Regression
Model Family: XGBoost
Number of features: 8

Pipeline Steps
==============
1. One Hot Encoder
         * top_n : 10
2. Simple Imputer
         * impute_strategy : most_frequent
         * fill_value : None
3. RF Regressor Select From Model
         * percent_features : 0.8487792213962843
         * threshold : -inf
4. XGBoost Regressor
         * eta : 0.38438170729269994
         * max_depth : 7
         * min_child_weight : 1.5104167958569887
         * n_estimators : 397

Training
========
Training for Regression problems.
Total training time (including CV): 3.7 seconds

Cross Validation
----------------
Warning! High variance within cross validation scores. Model may not perform as estimated on unseen data.
               R2    MAE      MSE  MedianAE  MaxError  ExpVariance # Training # Testing
0           0.265 51.909 4204.782    45.175   174.089        0.266    294.000   148.000
1           0.339 50.432 4190.876    40.601   162.048        0.340    295.000   147.000
2           0.134 56.410 4935.882    47.643   206.828        0.135    295.000   147.000
mean        0.246 52.917 4443.847    44.473   180.989        0.247          -         -
std         0.104  3.114  426.172     3.573    23.174        0.104          -         -
coef of var 0.424  0.059    0.096     0.080     0.128        0.419          -         -
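
The summary rows of the cross-validation table can be reproduced from the per-fold scores. The sketch below does so for the R2 and MAE columns with Python's `statistics` module, assuming the `std` row is the sample standard deviation (n - 1 denominator), which is what matches the printed values. The inputs are the rounded numbers shown in the table, so the last digit can differ slightly from the output above; the variable names are illustrative only.

```python
from statistics import mean, stdev

# Per-fold scores copied (already rounded) from the table above
r2_folds = [0.265, 0.339, 0.134]
mae_folds = [51.909, 50.432, 56.410]

r2_mean = mean(r2_folds)
r2_std = stdev(r2_folds)      # sample standard deviation, n - 1 denominator
r2_cv = r2_std / r2_mean      # coefficient of variation = std / mean

mae_mean = mean(mae_folds)
mae_std = stdev(mae_folds)
mae_cv = mae_std / mae_mean

print(f"R2 : mean={r2_mean:.3f} std={r2_std:.3f} coef of var={r2_cv:.3f}")
print(f"MAE: mean={mae_mean:.3f} std={mae_std:.3f} coef of var={mae_cv:.3f}")
```

The R2 scores vary by roughly 42% of their mean across folds, while MAE varies by about 6%, which is consistent with the high-variance warning being raised for the R2 objective.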