estimators#
EvalML estimator components.
Subpackages#
Package Contents#
Classes Summary#
Autoregressive Integrated Moving Average Model. The three parameters (p, d, q) are the AR order, the degree of differencing, and the MA order. More information here: https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima.model.ARIMA.html. |
|
Classifier that predicts using the specified strategy. |
|
Baseline regressor that uses a simple strategy to make predictions. This is useful as a simple baseline regressor to compare with other regressors. |
|
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features. |
|
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features. |
|
Decision Tree Classifier. |
|
Decision Tree Regressor. |
|
Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator. |
|
Elastic Net Regressor. |
|
A component that fits and predicts given data. |
|
Holt-Winters Exponential Smoothing Forecaster. |
|
Extra Trees Classifier. |
|
Extra Trees Regressor. |
|
K-Nearest Neighbors Classifier. |
|
LightGBM Classifier. |
|
LightGBM Regressor. |
|
Linear Regressor. |
|
Logistic Regression Classifier. |
|
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. |
|
Random Forest Classifier. |
|
Random Forest Regressor. |
|
Support Vector Machine Classifier. |
|
Support Vector Machine Regressor. |
|
Time series estimator that predicts using the naive forecasting approach. |
|
Vowpal Wabbit Binary Classifier. |
|
Vowpal Wabbit Multiclass Classifier. |
|
Vowpal Wabbit Regressor. |
|
XGBoost Classifier. |
|
XGBoost Regressor. |
Contents#
- class evalml.pipelines.components.estimators.ARIMARegressor(time_index: Optional[Hashable] = None, trend: Optional[str] = None, start_p: int = 2, d: int = 0, start_q: int = 2, max_p: int = 5, max_d: int = 2, max_q: int = 5, seasonal: bool = True, sp: int = 1, n_jobs: int = - 1, random_seed: Union[int, float] = 0, maxiter: int = 10, use_covariates: bool = True, **kwargs)[source]#
Autoregressive Integrated Moving Average Model. The three parameters (p, d, q) are the AR order, the degree of differencing, and the MA order. More information here: https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima.model.ARIMA.html.
Currently ARIMARegressor isn’t supported via conda install. It’s recommended that it be installed via PyPI.
- Parameters
time_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.
trend (str) – Controls the deterministic trend. Options are [‘n’, ‘c’, ‘t’, ‘ct’] where ‘c’ is a constant term, ‘t’ indicates a linear trend, and ‘ct’ is both. Can also be an iterable when defining a polynomial, such as [1, 1, 0, 1].
start_p (int) – Minimum Autoregressive order. Defaults to 2.
d (int) – Minimum Differencing degree. Defaults to 0.
start_q (int) – Minimum Moving Average order. Defaults to 2.
max_p (int) – Maximum Autoregressive order. Defaults to 5.
max_d (int) – Maximum Differencing degree. Defaults to 2.
max_q (int) – Maximum Moving Average order. Defaults to 5.
seasonal (boolean) – Whether to fit a seasonal model to ARIMA. Defaults to True.
sp (int or str) – Period for seasonal differencing, specifically the number of periods in each season. If “detect”, this model will automatically detect this parameter (given the time series is a standard frequency) and will fall back to 1 (no seasonality) if it cannot be detected. Defaults to 1.
n_jobs (int or None) – Non-negative integer describing level of parallelism used for pipelines. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “start_p”: Integer(1, 3), “d”: Integer(0, 2), “start_q”: Integer(1, 3), “max_p”: Integer(3, 10), “max_d”: Integer(2, 5), “max_q”: Integer(3, 10), “seasonal”: [True, False],}
max_cols
7
max_rows
1000
model_family
ModelFamily.ARIMA
modifies_features
True
modifies_target
False
name
ARIMA Regressor
supported_problem_types
[ProblemTypes.TIME_SERIES_REGRESSION]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns array of 0's with a length of 1 as feature_importance is not defined for ARIMA regressor.
Fits ARIMA regressor to data.
Find the prediction intervals using the fitted ARIMARegressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using fitted ARIMA regressor.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) numpy.ndarray #
Returns array of 0’s with a length of 1 as feature_importance is not defined for ARIMA regressor.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)[source]#
Fits ARIMA regressor to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- Raises
ValueError – If y was not passed in.
- get_prediction_intervals(self, X: pandas.DataFrame, y: pandas.Series = None, coverage: List[float] = None) Dict[str, pandas.Series] [source]#
Find the prediction intervals using the fitted ARIMARegressor.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Optional.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) pandas.Series [source]#
Make predictions using fitted ARIMA regressor.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data.
- Returns
Predicted values.
- Return type
pd.Series
- Raises
ValueError – If X was passed to fit but not passed in predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.BaselineClassifier(strategy='mode', random_seed=0, **kwargs)[source]#
Classifier that predicts using the specified strategy.
This is useful as a simple baseline classifier to compare with other classifiers.
- Parameters
strategy (str) – Method used to predict. Valid options are “mode”, “random” and “random_weighted”. Defaults to “mode”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.BASELINE
modifies_features
True
modifies_target
False
name
Baseline Classifier
supported_problem_types
[ProblemTypes.BINARY, ProblemTypes.MULTICLASS]
training_only
False
Methods
Returns class labels. Will return None before fitting.
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature. Since baseline classifiers do not use input features to calculate predictions, returns an array of zeroes.
Fits baseline classifier component to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using the baseline classification strategy.
Make prediction probabilities using the baseline classification strategy.
Saves component at file path.
Updates the parameter dictionary of the component.
- property classes_(self)#
Returns class labels. Will return None before fitting.
- Returns
Class names
- Return type
list[str] or list(float)
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Returns importance associated with each feature. Since baseline classifiers do not use input features to calculate predictions, returns an array of zeroes.
- Returns
An array of zeroes
- Return type
pd.Series
- fit(self, X, y=None)[source]#
Fits baseline classifier component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- Raises
ValueError – If y is None.
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X)[source]#
Make predictions using the baseline classification strategy.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- predict_proba(self, X)[source]#
Make prediction probabilities using the baseline classification strategy.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted probability values.
- Return type
pd.DataFrame
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.BaselineRegressor(strategy='mean', random_seed=0, **kwargs)[source]#
Baseline regressor that uses a simple strategy to make predictions. This is useful as a simple baseline regressor to compare with other regressors.
- Parameters
strategy (str) – Method used to predict. Valid options are “mean”, “median”. Defaults to “mean”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.BASELINE
modifies_features
True
modifies_target
False
name
Baseline Regressor
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature. Since baseline regressors do not use input features to calculate predictions, returns an array of zeroes.
Fits baseline regression component to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using the baseline regression strategy.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Returns importance associated with each feature. Since baseline regressors do not use input features to calculate predictions, returns an array of zeroes.
- Returns
An array of zeroes.
- Return type
np.ndarray (float)
- fit(self, X, y=None)[source]#
Fits baseline regression component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- Raises
ValueError – If input y is None.
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X)[source]#
Make predictions using the baseline regression strategy.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.CatBoostClassifier(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=True, allow_writing_files=False, random_seed=0, n_jobs=- 1, **kwargs)[source]#
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.
For more information, check out https://catboost.ai/
- Parameters
n_estimators (float) – The maximum number of trees to build. Defaults to 10.
eta (float) – The learning rate. Defaults to 0.03.
max_depth (int) – The maximum tree depth for base learners. Defaults to 6.
bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.
silent (boolean) – Whether to use the “silent” logging mode. Defaults to True.
allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(4, 100), “eta”: Real(0.000001, 1), “max_depth”: Integer(4, 10),}
model_family
ModelFamily.CATBOOST
modifies_features
True
modifies_target
False
name
CatBoost Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance of fitted CatBoost classifier.
Fits CatBoost classifier component to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using the fitted CatBoost classifier.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Feature importance of fitted CatBoost classifier.
- fit(self, X, y=None)[source]#
Fits CatBoost classifier component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X)[source]#
Make predictions using the fitted CatBoost classifier.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.DataFrame
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.CatBoostRegressor(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=False, allow_writing_files=False, random_seed=0, n_jobs=- 1, **kwargs)[source]#
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.
For more information, check out https://catboost.ai/
- Parameters
n_estimators (float) – The maximum number of trees to build. Defaults to 10.
eta (float) – The learning rate. Defaults to 0.03.
max_depth (int) – The maximum tree depth for base learners. Defaults to 6.
bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.
silent (boolean) – Whether to use the “silent” logging mode. Defaults to True.
allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(4, 100), “eta”: Real(0.000001, 1), “max_depth”: Integer(4, 10),}
model_family
ModelFamily.CATBOOST
modifies_features
True
modifies_target
False
name
CatBoost Regressor
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance of fitted CatBoost regressor.
Fits CatBoost regressor component to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Feature importance of fitted CatBoost regressor.
- fit(self, X, y=None)[source]#
Fits CatBoost regressor component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.DecisionTreeClassifier(criterion='gini', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)[source]#
Decision Tree Classifier.
- Parameters
criterion ({"gini", "entropy"}) – The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. Defaults to “gini”.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “criterion”: [“gini”, “entropy”], “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family
ModelFamily.DECISION_TREE
modifies_features
True
modifies_target
False
name
Decision Tree Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) pandas.Series #
Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.DecisionTreeRegressor(criterion='mse', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)[source]#
Decision Tree Regressor.
- Parameters
criterion ({"mse", "friedman_mse", "mae", "poisson"}) –
The function to measure the quality of a split. Supported criteria are:
”mse” for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node
”friedman_mse”, which uses mean squared error with Friedman”s improvement score for potential splits
”mae” for the mean absolute error, which minimizes the L1 loss using the median of each terminal node,
”poisson” which uses reduction in Poisson deviance to find splits.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “criterion”: [“mse”, “friedman_mse”, “mae”], “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family
ModelFamily.DECISION_TREE
modifies_features
True
modifies_target
False
name
Decision Tree Regressor
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) pandas.Series #
Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.ElasticNetClassifier(penalty='elasticnet', C=1.0, l1_ratio=0.15, multi_class='auto', solver='saga', n_jobs=- 1, random_seed=0, **kwargs)[source]#
Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator.
- Parameters
penalty ({"l1", "l2", "elasticnet", "none"}) – The norm used in penalization. Defaults to “elasticnet”.
C (float) – Inverse of regularization strength. Must be a positive float. Defaults to 1.0.
l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty=’elasticnet’. Setting l1_ratio=0 is equivalent to using penalty=’l2’, while setting l1_ratio=1 is equivalent to using penalty=’l1’. For 0 < l1_ratio <1, the penalty is a combination of L1 and L2. Defaults to 0.15.
multi_class ({"auto", "ovr", "multinomial"}) – If the option chosen is “ovr”, then a binary problem is fit for each label. For “multinomial” the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. “multinomial” is unavailable when solver=”liblinear”. “auto” selects “ovr” if the data is binary, or if solver=”liblinear”, and otherwise selects “multinomial”. Defaults to “auto”.
solver ({"newton-cg", "lbfgs", "liblinear", "sag", "saga"}) –
Algorithm to use in the optimization problem. For small datasets, “liblinear” is a good choice, whereas “sag” and “saga” are faster for large ones. For multiclass problems, only “newton-cg”, “sag”, “saga” and “lbfgs” handle multinomial loss; “liblinear” is limited to one-versus-rest schemes.
”newton-cg”, “lbfgs”, “sag” and “saga” handle L2 or no penalty
”liblinear” and “saga” also handle L1 penalty
”saga” also supports “elasticnet” penalty
”liblinear” does not support setting penalty=’none’
Defaults to “saga”.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “C”: Real(0.01, 10), “l1_ratio”: Real(0, 1)}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Elastic Net Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance for fitted ElasticNet classifier.
Fits ElasticNet classifier component to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Feature importance for fitted ElasticNet classifier.
- fit(self, X, y)[source]#
Fits ElasticNet classifier component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.ElasticNetRegressor(alpha=0.0001, l1_ratio=0.15, max_iter=1000, normalize=False, random_seed=0, **kwargs)[source]#
Elastic Net Regressor.
- Parameters
alpha (float) – Constant that multiplies the penalty terms. Defaults to 0.0001.
l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty=’elasticnet’. Setting l1_ratio=0 is equivalent to using penalty=’l2’, while setting l1_ratio=1 is equivalent to using penalty=’l1’. For 0 < l1_ratio <1, the penalty is a combination of L1 and L2. Defaults to 0.15.
max_iter (int) – The maximum number of iterations. Defaults to 1000.
normalize (boolean) – If True, the regressors will be normalized before regression by subtracting the mean and dividing by the l2-norm. Defaults to False.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “alpha”: Real(0, 1), “l1_ratio”: Real(0, 1),}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Elastic Net Regressor
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance for fitted ElasticNet regressor.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Feature importance for fitted ElasticNet regressor.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.Estimator(parameters: dict = None, component_obj: Type[evalml.pipelines.components.ComponentBase] = None, random_seed: Union[int, float] = 0, **kwargs)[source]#
A component that fits and predicts given data.
To implement a new Estimator, define your own class which is a subclass of Estimator, including a name and a list of acceptable ranges for any parameters to be tuned during the automl search (hyperparameters). Define an __init__ method which sets up any necessary state and objects. Make sure your __init__ only uses standard keyword arguments and calls super().__init__() with a parameters dict. You may also override the fit, transform, fit_transform and other methods in this class if appropriate.
To see some examples, check out the definitions of any Estimator component subclass.
- Parameters
parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
ModelFamily.NONE
Returns string name of this component.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Problem types this estimator supports.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) pandas.Series #
Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)[source]#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] [source]#
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- property model_family(cls)#
Returns ModelFamily of this component.
- property name(cls)#
Returns string name of this component.
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series [source]#
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series [source]#
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- property supported_problem_types(cls)#
Problem types this estimator supports.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.ExponentialSmoothingRegressor(trend: Optional[str] = None, damped_trend: bool = False, seasonal: Optional[str] = None, sp: int = 2, n_jobs: int = - 1, random_seed: Union[int, float] = 0, **kwargs)[source]#
Holt-Winters Exponential Smoothing Forecaster.
Currently ExponentialSmoothingRegressor isn’t supported via conda install. It’s recommended that it be installed via PyPI.
- Parameters
trend (str) – Type of trend component. Defaults to None.
damped_trend (bool) – If the trend component should be damped. Defaults to False.
seasonal (str) – Type of seasonal component. Takes one of {“additive”, None}. Can also be multiplicative if
0 (none of the target data is) –
None. (but AutoMLSearch wiill not tune for this. Defaults to) –
sp (int) – The number of seasonal periods to consider. Defaults to 2.
n_jobs (int or None) – Non-negative integer describing level of parallelism used for pipelines. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “trend”: [None, “additive”], “damped_trend”: [True, False], “seasonal”: [None, “additive”], “sp”: Integer(2, 8),}
model_family
ModelFamily.EXPONENTIAL_SMOOTHING
modifies_features
True
modifies_target
False
name
Exponential Smoothing Regressor
supported_problem_types
[ProblemTypes.TIME_SERIES_REGRESSION]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns array of 0's with a length of 1 as feature_importance is not defined for Exponential Smoothing regressor.
Fits Exponential Smoothing Regressor to data.
Find the prediction intervals using the fitted ExponentialSmoothingRegressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using fitted Exponential Smoothing regressor.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) pandas.Series #
Returns array of 0’s with a length of 1 as feature_importance is not defined for Exponential Smoothing regressor.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)[source]#
Fits Exponential Smoothing Regressor to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features]. Ignored.
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- Raises
ValueError – If y was not passed in.
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] [source]#
Find the prediction intervals using the fitted ExponentialSmoothingRegressor.
Calculates the prediction intervals by using a simulation of the time series following a specified state space model.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Optional.
coverage (List[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) pandas.Series [source]#
Make predictions using fitted Exponential Smoothing regressor.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features]. Ignored except to set forecast horizon.
y (pd.Series) – Target data.
- Returns
Predicted values.
- Return type
pd.Series
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.ExtraTreesClassifier(n_estimators=100, max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, n_jobs=- 1, random_seed=0, **kwargs)[source]#
Extra Trees Classifier.
- Parameters
n_estimators (float) – The number of trees in the forest. Defaults to 100.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
2. (Defaults to) –
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family
ModelFamily.EXTRA_TREES
modifies_features
True
modifies_target
False
name
Extra Trees Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) pandas.Series #
Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.ExtraTreesRegressor(n_estimators: int = 100, max_features: str = 'auto', max_depth: int = 6, min_samples_split: int = 2, min_weight_fraction_leaf: float = 0.0, n_jobs: int = - 1, random_seed: Union[int, float] = 0, **kwargs)[source]#
Extra Trees Regressor.
- Parameters
n_estimators (float) – The number of trees in the forest. Defaults to 100.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
2. (Defaults to) –
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family
ModelFamily.EXTRA_TREES
modifies_features
True
modifies_target
False
name
Extra Trees Regressor
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits estimator to data.
Find the prediction intervals using the fitted ExtraTreesRegressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) pandas.Series #
Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] [source]#
Find the prediction intervals using the fitted ExtraTreesRegressor.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Optional.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, random_seed=0, **kwargs)[source]#
K-Nearest Neighbors Classifier.
- Parameters
n_neighbors (int) – Number of neighbors to use by default. Defaults to 5.
weights ({‘uniform’, ‘distance’} or callable) –
Weight function used in prediction. Can be:
‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. in this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
Defaults to “uniform”.
algorithm ({‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}) –
Algorithm used to compute the nearest neighbors:
‘ball_tree’ will use BallTree
‘kd_tree’ will use KDTree
‘brute’ will use a brute-force search.
‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit method. Defaults to “auto”. Note: fitting on sparse input will override the setting of this parameter, using brute force.
leaf_size (int) – Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30.
p (int) – Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Defaults to 2.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_neighbors”: Integer(2, 12), “weights”: [“uniform”, “distance”], “algorithm”: [“auto”, “ball_tree”, “kd_tree”, “brute”], “leaf_size”: Integer(10, 30), “p”: Integer(1, 5),}
model_family
ModelFamily.K_NEIGHBORS
modifies_features
True
modifies_target
False
name
KNN Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns array of 0's matching the input number of features as feature_importance is not defined for KNN classifiers.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Returns array of 0’s matching the input number of features as feature_importance is not defined for KNN classifiers.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.LightGBMClassifier(boosting_type='gbdt', learning_rate=0.1, n_estimators=100, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=- 1, random_seed=0, **kwargs)[source]#
LightGBM Classifier.
- Parameters
boosting_type (string) – Type of boosting to use. Defaults to “gbdt”. - ‘gbdt’ uses traditional Gradient Boosting Decision Tree - “dart”, uses Dropouts meet Multiple Additive Regression Trees - “goss”, uses Gradient-based One-Side Sampling - “rf”, uses Random Forest
learning_rate (float) – Boosting learning rate. Defaults to 0.1.
n_estimators (int) – Number of boosted trees to fit. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.
num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.
min_child_samples (int) – Minimum number of data needed in a child (leaf). Defaults to 20.
bagging_fraction (float) – LightGBM will randomly select a subset of features on each iteration (tree) without resampling if this is smaller than 1.0. For example, if set to 0.8, LightGBM will select 80% of features before training each tree. This can be used to speed up training and deal with overfitting. Defaults to 0.9.
bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled. k means perform bagging at every k iteration. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100 % of the data to use for the next k iterations. Defaults to 0.
n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “learning_rate”: Real(0.000001, 1), “boosting_type”: [“gbdt”, “dart”, “goss”, “rf”], “n_estimators”: Integer(10, 100), “max_depth”: Integer(0, 10), “num_leaves”: Integer(2, 100), “min_child_samples”: Integer(1, 100), “bagging_fraction”: Real(0.000001, 1), “bagging_freq”: Integer(0, 1),}
model_family
ModelFamily.LIGHTGBM
modifies_features
True
modifies_target
False
name
LightGBM Classifier
SEED_MAX
SEED_BOUNDS.max_bound
SEED_MIN
0
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits LightGBM classifier component to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using the fitted LightGBM classifier.
Make prediction probabilities using the fitted LightGBM classifier.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) pandas.Series #
Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
- fit(self, X, y=None)[source]#
Fits LightGBM classifier component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X)[source]#
Make predictions using the fitted LightGBM classifier.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.DataFrame
- predict_proba(self, X)[source]#
Make prediction probabilities using the fitted LightGBM classifier.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted probability values.
- Return type
pd.DataFrame
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.LightGBMRegressor(boosting_type='gbdt', learning_rate=0.1, n_estimators=20, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=- 1, random_seed=0, **kwargs)[source]#
LightGBM Regressor.
- Parameters
boosting_type (string) – Type of boosting to use. Defaults to “gbdt”. - ‘gbdt’ uses traditional Gradient Boosting Decision Tree - “dart”, uses Dropouts meet Multiple Additive Regression Trees - “goss”, uses Gradient-based One-Side Sampling - “rf”, uses Random Forest
learning_rate (float) – Boosting learning rate. Defaults to 0.1.
n_estimators (int) – Number of boosted trees to fit. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.
num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.
min_child_samples (int) – Minimum number of data needed in a child (leaf). Defaults to 20.
bagging_fraction (float) – LightGBM will randomly select a subset of features on each iteration (tree) without resampling if this is smaller than 1.0. For example, if set to 0.8, LightGBM will select 80% of features before training each tree. This can be used to speed up training and deal with overfitting. Defaults to 0.9.
bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled. k means perform bagging at every k iteration. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100 % of the data to use for the next k iterations. Defaults to 0.
n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “learning_rate”: Real(0.000001, 1), “boosting_type”: [“gbdt”, “dart”, “goss”, “rf”], “n_estimators”: Integer(10, 100), “max_depth”: Integer(0, 10), “num_leaves”: Integer(2, 100), “min_child_samples”: Integer(1, 100), “bagging_fraction”: Real(0.000001, 1), “bagging_freq”: Integer(0, 1),}
model_family
ModelFamily.LIGHTGBM
modifies_features
True
modifies_target
False
name
LightGBM Regressor
SEED_MAX
SEED_BOUNDS.max_bound
SEED_MIN
0
supported_problem_types
[ProblemTypes.REGRESSION]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits LightGBM regressor to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using fitted LightGBM regressor.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) pandas.Series #
Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
- fit(self, X, y=None)[source]#
Fits LightGBM regressor to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X)[source]#
Make predictions using fitted LightGBM regressor.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.LinearRegressor(fit_intercept=True, normalize=False, n_jobs=- 1, random_seed=0, **kwargs)[source]#
Linear Regressor.
- Parameters
fit_intercept (boolean) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered). Defaults to True.
normalize (boolean) – If True, the regressors will be normalized before regression by subtracting the mean and dividing by the l2-norm. This parameter is ignored when fit_intercept is set to False. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “fit_intercept”: [True, False], “normalize”: [True, False]}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Linear Regressor
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance for fitted linear regressor.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Feature importance for fitted linear regressor.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.LogisticRegressionClassifier(penalty='l2', C=1.0, multi_class='auto', solver='lbfgs', n_jobs=- 1, random_seed=0, **kwargs)[source]#
Logistic Regression Classifier.
- Parameters
penalty ({"l1", "l2", "elasticnet", "none"}) – The norm used in penalization. Defaults to “l2”.
C (float) – Inverse of regularization strength. Must be a positive float. Defaults to 1.0.
multi_class ({"auto", "ovr", "multinomial"}) – If the option chosen is “ovr”, then a binary problem is fit for each label. For “multinomial” the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. “multinomial” is unavailable when solver=”liblinear”. “auto” selects “ovr” if the data is binary, or if solver=”liblinear”, and otherwise selects “multinomial”. Defaults to “auto”.
solver ({"newton-cg", "lbfgs", "liblinear", "sag", "saga"}) –
Algorithm to use in the optimization problem. For small datasets, “liblinear” is a good choice, whereas “sag” and “saga” are faster for large ones. For multiclass problems, only “newton-cg”, “sag”, “saga” and “lbfgs” handle multinomial loss; “liblinear” is limited to one-versus-rest schemes.
”newton-cg”, “lbfgs”, “sag” and “saga” handle L2 or no penalty
”liblinear” and “saga” also handle L1 penalty
”saga” also supports “elasticnet” penalty
”liblinear” does not support setting penalty=’none’
Defaults to “lbfgs”.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “penalty”: [“l2”], “C”: Real(0.01, 10),}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Logistic Regression Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance for fitted logistic regression classifier.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Feature importance for fitted logistic regression classifier.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.ProphetRegressor(time_index: Optional[Hashable] = None, changepoint_prior_scale: float = 0.05, seasonality_prior_scale: int = 10, holidays_prior_scale: int = 10, seasonality_mode: str = 'additive', stan_backend: str = 'CMDSTANPY', interval_width: float = 0.95, random_seed: Union[int, float] = 0, **kwargs)[source]#
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
More information here: https://facebook.github.io/prophet/
- Parameters
time_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.
changepoint_prior_scale (float) – Determines the strength of the sparse prior for fitting on rate changes. Increasing this value increases the flexibility of the trend. Defaults to 0.05.
seasonality_prior_scale (int) – Similar to changepoint_prior_scale. Adjusts the extent to which the seasonality model will fit the data. Defaults to 10.
holidays_prior_scale (int) – Similar to changepoint_prior_scale. Adjusts the extent to which holidays will fit the data. Defaults to 10.
seasonality_mode (str) – Determines how this component fits the seasonality. Options are “additive” and “multiplicative”. Defaults to “additive”.
stan_backend (str) – Determines the backend that should be used to run Prophet. Options are “CMDSTANPY” and “PYSTAN”. Defaults to “CMDSTANPY”.
interval_width (float) – Determines the confidence of the prediction interval range when calling get_prediction_intervals. Accepts values in the range (0,1). Defaults to 0.95.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “changepoint_prior_scale”: Real(0.001, 0.5), “seasonality_prior_scale”: Real(0.01, 10), “holidays_prior_scale”: Real(0.01, 10), “seasonality_mode”: [“additive”, “multiplicative”],}
model_family
ModelFamily.PROPHET
modifies_features
True
modifies_target
False
name
Prophet Regressor
supported_problem_types
[ProblemTypes.TIME_SERIES_REGRESSION]
training_only
False
Methods
Build the Prophet data to pass fit and predict on.
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns array of 0's with len(1) as feature_importance is not defined for Prophet regressor.
Fits Prophet regressor component to data.
Get parameters for the Prophet regressor.
Find the prediction intervals using the fitted ProphetRegressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using fitted Prophet regressor.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- static build_prophet_df(X: pandas.DataFrame, y: Optional[pandas.Series] = None, time_index: str = 'ds') pandas.DataFrame [source]#
Build the Prophet data to pass fit and predict on.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls) dict #
Returns the default parameters for this component.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) numpy.ndarray #
Returns array of 0’s with len(1) as feature_importance is not defined for Prophet regressor.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)[source]#
Fits Prophet regressor component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] [source]#
Find the prediction intervals using the fitted ProphetRegressor.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (List[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) pandas.Series [source]#
Make predictions using fitted Prophet regressor.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
- Returns
Predicted values.
- Return type
pd.Series
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.RandomForestClassifier(n_estimators=100, max_depth=6, n_jobs=- 1, random_seed=0, **kwargs)[source]#
Random Forest Classifier.
- Parameters
n_estimators (float) – The number of trees in the forest. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 10),}
model_family
ModelFamily.RANDOM_FOREST
modifies_features
True
modifies_target
False
name
Random Forest Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) pandas.Series #
Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.RandomForestRegressor(n_estimators: int = 100, max_depth: int = 6, n_jobs: int = - 1, random_seed: Union[int, float] = 0, **kwargs)[source]#
Random Forest Regressor.
- Parameters
n_estimators (float) – The number of trees in the forest. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 32),}
model_family
ModelFamily.RANDOM_FOREST
modifies_features
True
modifies_target
False
name
Random Forest Regressor
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits estimator to data.
Find the prediction intervals using the fitted RandomForestRegressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) pandas.Series #
Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] [source]#
Find the prediction intervals using the fitted RandomForestRegressor.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Optional.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.SVMClassifier(C=1.0, kernel='rbf', gamma='auto', probability=True, random_seed=0, **kwargs)[source]#
Support Vector Machine Classifier.
- Parameters
C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.
kernel ({"poly", "rbf", "sigmoid"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.
gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. Defaults to “auto”. - If gamma=’scale’ is passed then it uses 1 / (n_features * X.var()) as value of gamma - If “auto” (default), uses 1 / n_features
probability (boolean) – Whether to enable probability estimates. Defaults to True.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “C”: Real(0, 10), “kernel”: [“poly”, “rbf”, “sigmoid”], “gamma”: [“scale”, “auto”],}
model_family
ModelFamily.SVM
modifies_features
True
modifies_target
False
name
SVM Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance only works with linear kernels.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Feature importance only works with linear kernels.
If the kernel isn’t linear, we return a numpy array of zeros.
- Returns
Feature importance of fitted SVM classifier or a numpy array of zeroes if the kernel is not linear.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.SVMRegressor(C=1.0, kernel='rbf', gamma='auto', random_seed=0, **kwargs)[source]#
Support Vector Machine Regressor.
- Parameters
C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.
kernel ({"poly", "rbf", "sigmoid"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.
gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. Defaults to “auto”. - If gamma=’scale’ is passed then it uses 1 / (n_features * X.var()) as value of gamma - If “auto” (default), uses 1 / n_features
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “C”: Real(0, 10), “kernel”: [“poly”, “rbf”, “sigmoid”], “gamma”: [“scale”, “auto”],}
model_family
ModelFamily.SVM
modifies_features
True
modifies_target
False
name
SVM Regressor
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance of fitted SVM regresor.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Feature importance of fitted SVM regresor.
Only works with linear kernels. If the kernel isn’t linear, we return a numpy array of zeros.
- Returns
The feature importance of the fitted SVM regressor, or an array of zeroes if the kernel is not linear.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.TimeSeriesBaselineEstimator(gap=1, forecast_horizon=1, random_seed=0, **kwargs)[source]#
Time series estimator that predicts using the naive forecasting approach.
This is useful as a simple baseline estimator for time series problems.
- Parameters
gap (int) – Gap between prediction date and target date and must be a positive integer. If gap is 0, target date will be shifted ahead by 1 time period. Defaults to 1.
forecast_horizon (int) – Number of time steps the model is expected to predict.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.BASELINE
modifies_features
True
modifies_target
False
name
Time Series Baseline Estimator
supported_problem_types
[ ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits time series baseline estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using fitted time series baseline estimator.
Make prediction probabilities using fitted time series baseline estimator.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Returns importance associated with each feature.
Since baseline estimators do not use input features to calculate predictions, returns an array of zeroes.
- Returns
An array of zeroes.
- Return type
np.ndarray (float)
- fit(self, X, y=None)[source]#
Fits time series baseline estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- Raises
ValueError – If input y is None.
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X)[source]#
Make predictions using fitted time series baseline estimator.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
ValueError – If input y is None.
- predict_proba(self, X)[source]#
Make prediction probabilities using fitted time series baseline estimator.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted probability values.
- Return type
pd.DataFrame
- Raises
ValueError – If input y is None.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.VowpalWabbitBinaryClassifier(loss_function='logistic', learning_rate=0.5, decay_learning_rate=1.0, power_t=0.5, passes=1, random_seed=0, **kwargs)[source]#
Vowpal Wabbit Binary Classifier.
- Parameters
loss_function (str) – Specifies the loss function to use. One of {“squared”, “classic”, “hinge”, “logistic”, “quantile”}. Defaults to “logistic”.
learning_rate (float) – Boosting learning rate. Defaults to 0.5.
decay_learning_rate (float) – Decay factor for learning_rate. Defaults to 1.0.
power_t (float) – Power on learning rate decay. Defaults to 0.5.
passes (int) – Number of training passes. Defaults to 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
None
model_family
ModelFamily.VOWPAL_WABBIT
modifies_features
True
modifies_target
False
name
Vowpal Wabbit Binary Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.TIME_SERIES_BINARY,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance for Vowpal Wabbit classifiers. This is not implemented.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Feature importance for Vowpal Wabbit classifiers. This is not implemented.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.VowpalWabbitMulticlassClassifier(loss_function='logistic', learning_rate=0.5, decay_learning_rate=1.0, power_t=0.5, passes=1, random_seed=0, **kwargs)[source]#
Vowpal Wabbit Multiclass Classifier.
- Parameters
loss_function (str) – Specifies the loss function to use. One of {“squared”, “classic”, “hinge”, “logistic”, “quantile”}. Defaults to “logistic”.
learning_rate (float) – Boosting learning rate. Defaults to 0.5.
decay_learning_rate (float) – Decay factor for learning_rate. Defaults to 1.0.
power_t (float) – Power on learning rate decay. Defaults to 0.5.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
None
model_family
ModelFamily.VOWPAL_WABBIT
modifies_features
True
modifies_target
False
name
Vowpal Wabbit Multiclass Classifier
supported_problem_types
[ ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance for Vowpal Wabbit classifiers. This is not implemented.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Feature importance for Vowpal Wabbit classifiers. This is not implemented.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.VowpalWabbitRegressor(learning_rate=0.5, decay_learning_rate=1.0, power_t=0.5, passes=1, random_seed=0, **kwargs)[source]#
Vowpal Wabbit Regressor.
- Parameters
learning_rate (float) – Boosting learning rate. Defaults to 0.5.
decay_learning_rate (float) – Decay factor for learning_rate. Defaults to 1.0.
power_t (float) – Power on learning rate decay. Defaults to 0.5.
passes (int) – Number of training passes. Defaults to 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
None
model_family
ModelFamily.VOWPAL_WABBIT
modifies_features
True
modifies_target
False
name
Vowpal Wabbit Regressor
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance for Vowpal Wabbit regressor.
Fits estimator to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Feature importance for Vowpal Wabbit regressor.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#
Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series #
Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.XGBoostClassifier(eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, random_seed=0, eval_metric='logloss', n_jobs=12, **kwargs)[source]#
XGBoost Classifier.
- Parameters
eta (float) – Boosting learning rate. Defaults to 0.1.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0
n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.
random_seed (int) – Seed for the random number generator. Defaults to 0.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to 12.
Attributes
hyperparameter_ranges
{ “eta”: Real(0.000001, 1), “max_depth”: Integer(1, 10), “min_child_weight”: Real(1, 10), “n_estimators”: Integer(1, 1000),}
model_family
ModelFamily.XGBOOST
modifies_features
True
modifies_target
False
name
XGBoost Classifier
SEED_MAX
None
SEED_MIN
None
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance of fitted XGBoost classifier.
Fits XGBoost classifier component to data.
Find the prediction intervals using the fitted regressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using the fitted XGBoost classifier.
Make predictions using the fitted CatBoost classifier.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self)#
Feature importance of fitted XGBoost classifier.
- fit(self, X, y=None)[source]#
Fits XGBoost classifier component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] #
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X)[source]#
Make predictions using the fitted XGBoost classifier.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.DataFrame
- predict_proba(self, X)[source]#
Make predictions using the fitted CatBoost classifier.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.DataFrame
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
- class evalml.pipelines.components.estimators.XGBoostRegressor(eta: float = 0.1, max_depth: int = 6, min_child_weight: int = 1, n_estimators: int = 100, random_seed: Union[int, float] = 0, n_jobs: int = 12, **kwargs)[source]#
XGBoost Regressor.
- Parameters
eta (float) – Boosting learning rate. Defaults to 0.1.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0
n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.
random_seed (int) – Seed for the random number generator. Defaults to 0.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to 12.
Attributes
hyperparameter_ranges
{ “eta”: Real(0.000001, 1), “max_depth”: Integer(1, 20), “min_child_weight”: Real(1, 10), “n_estimators”: Integer(1, 1000),}
model_family
ModelFamily.XGBOOST
modifies_features
True
modifies_target
False
name
XGBoost Regressor
SEED_MAX
None
SEED_MIN
None
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance of fitted XGBoost regressor.
Fits XGBoost regressor component to data.
Find the prediction intervals using the fitted ProphetRegressor.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using fitted XGBoost regressor.
Make probability estimates for labels.
Saves component at file path.
Updates the parameter dictionary of the component.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- property feature_importance(self) pandas.Series #
Feature importance of fitted XGBoost regressor.
- fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)[source]#
Fits XGBoost regressor component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None) Dict[str, pandas.Series] [source]#
Find the prediction intervals using the fitted ProphetRegressor.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data. Ignored.
coverage (List[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)#
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- predict(self, X: pandas.DataFrame) pandas.Series [source]#
Make predictions using fitted XGBoost regressor.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- predict_proba(self, X: pandas.DataFrame) pandas.Series #
Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- update_parameters(self, update_dict, reset_fit=True)#
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.