components

EvalML component classes.

Package Contents

Classes Summary

ARIMARegressor

Autoregressive Integrated Moving Average Model. The three parameters (p, d, q) are the AR order, the degree of differencing, and the MA order. More information here: https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima.model.ARIMA.html.

BaselineClassifier

Classifier that predicts using the specified strategy.

BaselineRegressor

Baseline regressor that uses a simple strategy to make predictions. This is useful as a simple baseline regressor to compare with other regressors.

CatBoostClassifier

CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.

CatBoostRegressor

CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.

ComponentBase

Base class for all components.

ComponentBaseMeta

Metaclass that overrides creating a new component by wrapping methods with validators and setters.

DateTimeFeaturizer

Transformer that can automatically extract features from datetime columns.

DecisionTreeClassifier

Decision Tree Classifier.

DecisionTreeRegressor

Decision Tree Regressor.

DFSTransformer

Featuretools DFS component that generates features for the input features.

DropColumns

Drops specified columns in input data.

DropNaNRowsTransformer

Transformer to drop rows with NaN values.

DropNullColumns

Transformer to drop features whose percentage of NaN values exceeds a specified threshold.

DropRowsTransformer

Transformer to drop rows specified by row indices.

ElasticNetClassifier

Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator.

ElasticNetRegressor

Elastic Net Regressor.

EmailFeaturizer

Transformer that can automatically extract features from emails.

Estimator

A component that fits and predicts given data.

ExponentialSmoothingRegressor

Holt-Winters Exponential Smoothing Forecaster.

ExtraTreesClassifier

Extra Trees Classifier.

ExtraTreesRegressor

Extra Trees Regressor.

FeatureSelector

Selects top features based on importance weights.

Imputer

Imputes missing data according to a specified imputation strategy.

KNeighborsClassifier

K-Nearest Neighbors Classifier.

LabelEncoder

A transformer that encodes target labels using values between 0 and num_classes - 1.

LightGBMClassifier

LightGBM Classifier.

LightGBMRegressor

LightGBM Regressor.

LinearDiscriminantAnalysis

Reduces the number of features by using Linear Discriminant Analysis.

LinearRegressor

Linear Regressor.

LogisticRegressionClassifier

Logistic Regression Classifier.

LogTransformer

Applies a log transformation to the target data.

LSA

Transformer to calculate the Latent Semantic Analysis values of text input.

MultiseriesTimeSeriesBaselineRegressor

Multiseries time series regressor that predicts using the naive forecasting approach.

NaturalLanguageFeaturizer

Transformer that can automatically featurize text columns using featuretools' nlp_primitives.

OneHotEncoder

A transformer that encodes categorical features in a one-hot numeric array.

OrdinalEncoder

A transformer that encodes ordinal features as an array of ordinal integers representing the relative order of categories.

Oversampler

SMOTE Oversampler component. Will automatically select whether to use SMOTE, SMOTEN, or SMOTENC based on inputs to the component.

PCA

Reduces the number of features by using Principal Component Analysis (PCA).

PerColumnImputer

Imputes missing data according to a specified imputation strategy per column.

PolynomialDecomposer

Removes trends and seasonality from time series by fitting a polynomial and moving average to the data.

ProphetRegressor

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

RandomForestClassifier

Random Forest Classifier.

RandomForestRegressor

Random Forest Regressor.

ReplaceNullableTypes

Transformer to replace features with the new nullable dtypes with a dtype that is compatible in EvalML.

RFClassifierRFESelector

Selects relevant features using recursive feature elimination with a Random Forest Classifier.

RFClassifierSelectFromModel

Selects top features based on importance weights using a Random Forest classifier.

RFRegressorRFESelector

Selects relevant features using recursive feature elimination with a Random Forest Regressor.

RFRegressorSelectFromModel

Selects top features based on importance weights using a Random Forest regressor.

SelectByType

Selects columns by specified Woodwork logical type or semantic tag in input data.

SelectColumns

Selects specified columns in input data.

SimpleImputer

Imputes missing data according to a specified imputation strategy. Natural language columns are ignored.

StackedEnsembleBase

Stacked Ensemble Base Class.

StackedEnsembleClassifier

Stacked Ensemble Classifier.

StackedEnsembleRegressor

Stacked Ensemble Regressor.

StandardScaler

A transformer that standardizes input features by removing the mean and scaling to unit variance.

STLDecomposer

Removes trends and seasonality from time series using the STL algorithm.

SVMClassifier

Support Vector Machine Classifier.

SVMRegressor

Support Vector Machine Regressor.

TargetEncoder

A transformer that encodes categorical features into target encodings.

TargetImputer

Imputes missing target data according to a specified imputation strategy.

TimeSeriesBaselineEstimator

Time series estimator that predicts using the naive forecasting approach.

TimeSeriesFeaturizer

Transformer that delays input features and target variable for time series problems.

TimeSeriesImputer

Imputes missing data according to a specified time series-specific imputation strategy.

TimeSeriesRegularizer

Transformer that regularizes an inconsistently spaced datetime column.

Transformer

A component that transforms data and may or may not need fitting. These components are used before an estimator.

Undersampler

Undersampling transformer that downsamples the majority classes in the dataset.

URLFeaturizer

Transformer that can automatically extract features from URLs.

VARMAXRegressor

Vector Autoregressive Moving Average with eXogenous regressors model. The two parameters (p, q) are the AR order and the MA order. More information here: https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.varmax.VARMAX.html.

XGBoostClassifier

XGBoost Classifier.

XGBoostRegressor

XGBoost Regressor.

Contents

class evalml.pipelines.components.ARIMARegressor(time_index: Optional[Hashable] = None, trend: Optional[str] = None, start_p: int = 2, d: int = 0, start_q: int = 2, max_p: int = 5, max_d: int = 2, max_q: int = 5, seasonal: bool = True, sp: int = 1, n_jobs: int = -1, random_seed: Union[int, float] = 0, maxiter: int = 10, use_covariates: bool = True, **kwargs)

Autoregressive Integrated Moving Average Model. The three parameters (p, d, q) are the AR order, the degree of differencing, and the MA order. More information here: https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima.model.ARIMA.html.
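To make the "d" (degree of differencing) part of (p, d, q) concrete, here is a minimal stdlib sketch of ordinary differencing applied d times, plus seasonal differencing with period sp. The helpers `difference` and `seasonal_difference` are hypothetical; they illustrate the transform conceptually and are not the code ARIMARegressor runs internally.

```python
from typing import List, Sequence


def difference(series: Sequence[float], d: int = 1) -> List[float]:
    """Apply first-order differencing d times; each pass shortens the series by one."""
    result = list(series)
    for _ in range(d):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


def seasonal_difference(series: Sequence[float], sp: int) -> List[float]:
    """Subtract the value observed one season (sp periods) earlier."""
    if sp < 1:
        raise ValueError("sp must be a positive integer")
    return [series[i] - series[i - sp] for i in range(sp, len(series))]


# A quadratic trend becomes constant after two rounds of differencing.
y = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
print(difference(y, d=1))  # [3.0, 5.0, 7.0, 9.0, 11.0]
print(difference(y, d=2))  # [2.0, 2.0, 2.0, 2.0]

# A repeating 3-period pattern whose level rises by 2 each season.
print(seasonal_difference([10, 20, 30, 12, 22, 32], sp=3))  # [2, 2, 2]
```

Each round of differencing drops one observation, so differencing of degree d shortens the series by d points.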

Currently ARIMARegressor isn’t supported via conda install. It’s recommended that it be installed via PyPI.

Parameters
  • time_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.

  • trend (str) – Controls the deterministic trend. Options are [‘n’, ‘c’, ‘t’, ‘ct’], where ‘n’ means no trend, ‘c’ is a constant term, ‘t’ indicates a linear trend, and ‘ct’ is both. Can also be an iterable defining a polynomial, such as [1, 1, 0, 1]. Defaults to None.

  • start_p (int) – Minimum Autoregressive order. Defaults to 2.

  • d (int) – Minimum Differencing degree. Defaults to 0.

  • start_q (int) – Minimum Moving Average order. Defaults to 2.

  • max_p (int) – Maximum Autoregressive order. Defaults to 5.

  • max_d (int) – Maximum Differencing degree. Defaults to 2.

  • max_q (int) – Maximum Moving Average order. Defaults to 5.

  • seasonal (boolean) – Whether to fit a seasonal model to ARIMA. Defaults to True.

  • sp (int or str) – Period for seasonal differencing, specifically the number of periods in each season. If “detect”, this model will automatically detect this parameter (given the time series is a standard frequency) and will fall back to 1 (no seasonality) if it cannot be detected. Defaults to 1.

  • n_jobs (int or None) – Integer describing the level of parallelism used for pipelines. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
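The trend options above can be read as a polynomial mask over powers of the time index t. The sketch below follows the statsmodels convention linked above, where [1, 1, 0, 1] means a + b·t + c·t³ and ‘ct’ is equivalent to [1, 1]. The helper `trend_terms` and the alias table are hypothetical illustrations, not part of EvalML.

```python
from typing import List, Sequence, Union

# Hypothetical alias table: each string option expressed as a polynomial mask.
TREND_MASKS = {"n": [], "c": [1], "t": [0, 1], "ct": [1, 1]}


def trend_terms(trend: Union[str, Sequence[int]], t: float) -> List[float]:
    """Return the deterministic regressors t**k for every enabled power k."""
    mask = TREND_MASKS[trend] if isinstance(trend, str) else list(trend)
    return [t ** power for power, enabled in enumerate(mask) if enabled]


print(trend_terms("ct", 3.0))          # [1.0, 3.0]
print(trend_terms([1, 1, 0, 1], 2.0))  # [1.0, 2.0, 8.0]
print(trend_terms("n", 5.0))           # []
```

The fitted model multiplies each of these regressors by its own coefficient, which is why a zero in the mask simply drops that power of t.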

Attributes

hyperparameter_ranges

{"start_p": Integer(1, 3), "d": Integer(0, 2), "start_q": Integer(1, 3), "max_p": Integer(3, 10), "max_d": Integer(2, 5), "max_q": Integer(3, 10), "seasonal": [True, False]}

max_cols

7

max_rows

1000

model_family

ModelFamily.ARIMA

modifies_features

True

modifies_target

False

name

ARIMA Regressor

supported_problem_types

[ProblemTypes.TIME_SERIES_REGRESSION]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns an array of zeros of length 1, because feature importance is not defined for the ARIMA regressor.

fit

Fits ARIMA regressor to data.

get_prediction_intervals

Find the prediction intervals using the fitted ARIMARegressor.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using fitted ARIMA regressor.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict
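The convention Component.default_parameters == Component().parameters, together with clone, can be sketched with a hypothetical ToyComponent that reads its defaults off the __init__ signature. This is an illustration under assumptions, not EvalML's ComponentBase: here default_parameters is a plain classmethod (the convention above accesses it as an attribute), and random_seed is deliberately kept out of parameters.

```python
import inspect
from typing import Any, Dict


class ToyComponent:
    """Hypothetical component illustrating the conventions; not EvalML's ComponentBase."""

    def __init__(self, strategy: str = "mean", window: int = 5, random_seed: int = 0):
        self._parameters = {"strategy": strategy, "window": window}
        self.random_seed = random_seed

    @classmethod
    def default_parameters(cls) -> Dict[str, Any]:
        # Collect keyword defaults from __init__; "self" has no default, so it is skipped.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name != "random_seed" and param.default is not inspect.Parameter.empty
        }

    @property
    def parameters(self) -> Dict[str, Any]:
        return dict(self._parameters)

    def clone(self) -> "ToyComponent":
        # Same parameters and random seed, but a fresh (unfitted) object.
        return type(self)(**self.parameters, random_seed=self.random_seed)


assert ToyComponent.default_parameters() == ToyComponent().parameters
original = ToyComponent(window=7, random_seed=42)
cloned = original.clone()
print(cloned.parameters, cloned.random_seed)  # {'strategy': 'mean', 'window': 7} 42
```

Deriving defaults from the constructor signature keeps the two sides of the convention from drifting apart when a default changes.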

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict
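The return contract of describe can be illustrated with a hypothetical standalone helper. It models only the optional name printout and the returned dictionary shape documented above; whatever else the real method logs is not reproduced here.

```python
from typing import Any, Dict, Optional


def describe(
    name: str,
    parameters: Dict[str, Any],
    print_name: bool = False,
    return_dict: bool = False,
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper mirroring the documented return contract of describe()."""
    if print_name:
        print(name)
    if return_dict:
        # Copy the parameters so callers cannot mutate the component's state.
        return {"name": name, "parameters": dict(parameters)}
    return None


info = describe("ARIMA Regressor", {"d": 0, "max_d": 2}, return_dict=True)
print(info)  # {'name': 'ARIMA Regressor', 'parameters': {'d': 0, 'max_d': 2}}
```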

property feature_importance(self) → numpy.ndarray

Returns an array of zeros of length 1, because feature importance is not defined for the ARIMA regressor.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)

Fits ARIMA regressor to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If y was not passed in.

get_prediction_intervals(self, X: pandas.DataFrame, y: pandas.Series = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]

Find the prediction intervals using the fitted ARIMARegressor.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Optional.

  • coverage (list[float]) – A list of floats between 0 and 1 giving the coverage levels for which to calculate the lower and upper bounds of the prediction interval.

  • predictions (pd.Series) – Not used for ARIMA regressor.

Returns

Prediction intervals, keyed as {coverage}_lower and {coverage}_upper.

Return type

dict

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) → pandas.Series

Make predictions using fitted ARIMA regressor.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data.

Returns

Predicted values.

Return type

pd.Series

Raises

ValueError – If X was passed to fit but not passed in predict.

predict_proba(self, X: pandas.DataFrame) → pandas.Series

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.
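A save/load round trip with an explicit pickle protocol can be sketched as follows. The signature above defaults to cloudpickle; this sketch swaps in the standard-library pickle module (cloudpickle extends it to objects such as lambdas) and pickles a SimpleNamespace standing in for a component. The `save` and `load` functions here are hypothetical stand-ins, not the EvalML methods.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def save(obj, file_path, pickle_protocol=pickle.HIGHEST_PROTOCOL):
    # The protocol selects the pickle data stream format written to disk.
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load(file_path):
    # The protocol is detected automatically when reading.
    with open(file_path, "rb") as f:
        return pickle.load(f)


component = SimpleNamespace(name="Baseline Regressor", parameters={"strategy": "median"})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "component.pkl")
    save(component, path)
    restored = load(path)

print(restored.parameters)  # {'strategy': 'median'}
```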

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.
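The reset_fit behavior of update_parameters can be sketched with a hypothetical ToyComponent: new parameters are merged into the existing dictionary, and by default the fitted flag is cleared because anything learned under the old parameters may no longer apply. This mirrors the documented contract only; it is not EvalML's implementation.

```python
from typing import Any, Dict


class ToyComponent:
    """Hypothetical component mirroring the documented update_parameters contract."""

    def __init__(self, strategy: str = "mean", window: int = 5):
        self.parameters: Dict[str, Any] = {"strategy": strategy, "window": window}
        self._is_fitted = False

    def fit(self, X, y):
        # A real component would learn from X and y here.
        self._is_fitted = True
        return self

    def update_parameters(self, update_dict: Dict[str, Any], reset_fit: bool = True) -> None:
        self.parameters.update(update_dict)
        if reset_fit:
            # New parameters invalidate whatever was learned during fit.
            self._is_fitted = False


component = ToyComponent().fit([[1.0]], [1.0])
component.update_parameters({"window": 10})
print(component.parameters, component._is_fitted)  # {'strategy': 'mean', 'window': 10} False
```

Passing reset_fit=False keeps the fitted flag, which is only safe when the changed parameters do not affect what fit learned.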

class evalml.pipelines.components.BaselineClassifier(strategy='mode', random_seed=0, **kwargs)

Classifier that predicts using the specified strategy.

This is useful as a simple baseline classifier to compare with other classifiers.

Parameters
  • strategy (str) – Method used to predict. Valid options are “mode”, “random” and “random_weighted”. Defaults to “mode”.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
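The three strategies can be sketched with the standard library, assuming "mode" always predicts the most frequent training class, "random" draws classes uniformly, and "random_weighted" draws them in proportion to their training frequency. The functions `baseline_proba` and `baseline_predict` are hypothetical; the per-class probabilities are what such a classifier could assign to every row regardless of its features.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def baseline_proba(y_train: Sequence[Hashable], strategy: str = "mode") -> Dict[Hashable, float]:
    """Per-class probabilities shared by every row, since features are ignored."""
    counts = Counter(y_train)
    total = len(y_train)
    if strategy == "mode":
        # most_common breaks ties by first occurrence in y_train.
        mode = counts.most_common(1)[0][0]
        return {c: 1.0 if c == mode else 0.0 for c in counts}
    if strategy == "random":
        return {c: 1.0 / len(counts) for c in counts}
    if strategy == "random_weighted":
        return {c: n / total for c, n in counts.items()}
    raise ValueError(f"Unknown strategy: {strategy}")


def baseline_predict(
    y_train: Sequence[Hashable], n_rows: int, strategy: str = "mode", random_seed: int = 0
) -> List[Hashable]:
    proba = baseline_proba(y_train, strategy)
    if strategy == "mode":
        return [max(proba, key=proba.get)] * n_rows
    # "random" uses equal weights; "random_weighted" uses training frequencies.
    rng = random.Random(random_seed)
    return rng.choices(list(proba), weights=list(proba.values()), k=n_rows)


y = ["a", "b", "b", "c"]
print(baseline_proba(y, "random_weighted"))  # {'a': 0.25, 'b': 0.5, 'c': 0.25}
print(baseline_predict(y, 3))                # ['b', 'b', 'b']
```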

Attributes

hyperparameter_ranges

{}

model_family

ModelFamily.BASELINE

modifies_features

True

modifies_target

False

name

Baseline Classifier

supported_problem_types

[ProblemTypes.BINARY, ProblemTypes.MULTICLASS]

training_only

False

Methods

classes_

Returns class labels. Will return None before fitting.

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature. Since baseline classifiers do not use input features to calculate predictions, returns an array of zeroes.

fit

Fits baseline classifier component to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using the baseline classification strategy.

predict_proba

Make prediction probabilities using the baseline classification strategy.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

property classes_(self)

Returns class labels. Will return None before fitting.

Returns

Class names

Return type

list[str] or list[float]

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)

Returns importance associated with each feature. Since baseline classifiers do not use input features to calculate predictions, returns an array of zeroes.

Returns

An array of zeroes

Return type

pd.Series

fit(self, X, y=None)

Fits baseline classifier component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If y is None.

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper bounds are then computed by offsetting each prediction by the rolling standard deviation multiplied by the percent point (quantile) function evaluated at that bound's tail probability.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between 0 and 1 giving the coverage levels for which to calculate the lower and upper bounds of the prediction interval.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keyed as {coverage}_lower and {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
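The rolling-standard-deviation procedure described above can be sketched with the standard library. Assumptions of this sketch, not confirmed by the text: the quantile comes from a standard normal distribution (which makes each interval symmetric), the first window − 1 positions reuse the first full-window deviation, and plain lists replace pandas Series. The function names are hypothetical.

```python
from statistics import NormalDist, stdev
from typing import Dict, List, Sequence


def rolling_std(values: Sequence[float], window: int = 5) -> List[float]:
    """Sample standard deviation over a trailing window, back-filled at the start."""
    if window < 2 or len(values) < window:
        raise ValueError("need window >= 2 and at least `window` predictions")
    stds = [stdev(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]
    # The first window - 1 positions have no full window, so reuse the first value.
    return [stds[0]] * (window - 1) + stds


def prediction_intervals(
    predictions: Sequence[float], coverage: Sequence[float] = (0.95,), window: int = 5
) -> Dict[str, List[float]]:
    std = rolling_std(predictions, window)
    intervals: Dict[str, List[float]] = {}
    for c in coverage:
        # Assumed normal quantile at the upper tail probability (1 + c) / 2.
        z = NormalDist().inv_cdf((1 + c) / 2)
        intervals[f"{c}_lower"] = [p - z * s for p, s in zip(predictions, std)]
        intervals[f"{c}_upper"] = [p + z * s for p, s in zip(predictions, std)]
    return intervals


preds = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0]
result = prediction_intervals(preds, coverage=[0.8, 0.95])
print(sorted(result))  # ['0.8_lower', '0.8_upper', '0.95_lower', '0.95_upper']
```

Higher coverage widens every interval, because the quantile grows while the rolling deviation stays the same.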

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X)

Make predictions using the baseline classification strategy.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

predict_proba(self, X)

Make prediction probabilities using the baseline classification strategy.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted probability values.

Return type

pd.DataFrame

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.BaselineRegressor(strategy='mean', random_seed=0, **kwargs)

Baseline regressor that uses a simple strategy to make predictions. This is useful as a simple baseline regressor to compare with other regressors.

Parameters
  • strategy (str) – Method used to predict. Valid options are “mean”, “median”. Defaults to “mean”.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
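The two regression strategies reduce to predicting one constant learned from the training target, which is also why feature importance is all zeros. A minimal stdlib sketch follows; the function name `baseline_regression_predict` is hypothetical.

```python
from statistics import mean, median
from typing import List, Sequence


def baseline_regression_predict(
    y_train: Sequence[float], n_rows: int, strategy: str = "mean"
) -> List[float]:
    """Predict the same constant for every row; input features are ignored."""
    if strategy == "mean":
        value = mean(y_train)
    elif strategy == "median":
        value = median(y_train)
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return [value] * n_rows


y = [1.0, 2.0, 3.0, 10.0]
print(baseline_regression_predict(y, 2))            # [4.0, 4.0]
print(baseline_regression_predict(y, 3, "median"))  # [2.5, 2.5, 2.5]
```

The median is less affected by the outlier 10.0, which is one reason to prefer it as a baseline on skewed targets.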

Attributes

hyperparameter_ranges

{}

model_family

ModelFamily.BASELINE

modifies_features

True

modifies_target

False

name

Baseline Regressor

supported_problem_types

[ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature. Since baseline regressors do not use input features to calculate predictions, returns an array of zeroes.

fit

Fits baseline regression component to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using the baseline regression strategy.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)

Returns importance associated with each feature. Since baseline regressors do not use input features to calculate predictions, returns an array of zeroes.

Returns

An array of zeroes.

Return type

np.ndarray (float)

fit(self, X, y=None)

Fits baseline regression component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If input y is None.

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper bounds are then computed by offsetting each prediction by the rolling standard deviation multiplied by the percent point (quantile) function evaluated at that bound's tail probability.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between 0 and 1 giving the coverage levels for which to calculate the lower and upper bounds of the prediction interval.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keyed as {coverage}_lower and {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X)

Make predictions using the baseline regression strategy.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

predict_proba(self, X: pandas.DataFrame) → pandas.Series

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.CatBoostClassifier(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=True, allow_writing_files=False, random_seed=0, n_jobs=-1, **kwargs)

CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.

For more information, check out https://catboost.ai/

Parameters
  • n_estimators (int) – The maximum number of trees to build. Defaults to 10.

  • eta (float) – The learning rate. Defaults to 0.03.

  • max_depth (int) – The maximum tree depth for base learners. Defaults to 6.

  • bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.

  • silent (boolean) – Whether to use the “silent” logging mode. Defaults to True.

  • allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.

  • n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{"n_estimators": Integer(4, 100), "eta": Real(0.000001, 1), "max_depth": Integer(4, 10)}

model_family

ModelFamily.CATBOOST

modifies_features

True

modifies_target

False

name

CatBoost Classifier

supported_problem_types

[ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS]

training_only

False
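The Integer(...) and Real(...) entries in hyperparameter_ranges describe a search space from which AutoML tuning draws candidate values; they appear to be scikit-optimize-style dimensions. The classes below are simplified, hypothetical stand-ins used only to show how one random configuration could be sampled from the CatBoost ranges. Uniform sampling and inclusive integer bounds are assumptions of this sketch, not statements about the real tuner.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass(frozen=True)
class Integer:
    """Hypothetical stand-in for an integer search dimension with inclusive bounds."""
    low: int
    high: int

    def sample(self, rng: random.Random) -> int:
        return rng.randint(self.low, self.high)  # randint includes both ends


@dataclass(frozen=True)
class Real:
    """Hypothetical stand-in for a continuous dimension, sampled uniformly here."""
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        return rng.uniform(self.low, self.high)


hyperparameter_ranges: Dict[str, Union[Integer, Real]] = {
    "n_estimators": Integer(4, 100),
    "eta": Real(0.000001, 1),
    "max_depth": Integer(4, 10),
}


def sample_parameters(ranges: Dict[str, Union[Integer, Real]], seed: int = 0) -> Dict[str, Any]:
    # A seeded generator makes the drawn configuration reproducible.
    rng = random.Random(seed)
    return {name: dim.sample(rng) for name, dim in ranges.items()}


print(sample_parameters(hyperparameter_ranges, seed=1))
```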

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Feature importance of fitted CatBoost classifier.

fit

Fits CatBoost classifier component to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using the fitted CatBoost classifier.

predict_proba

Make prediction probabilities using the fitted CatBoost classifier.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)

Feature importance of fitted CatBoost classifier.

fit(self, X, y=None)

Fits CatBoost classifier component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper bounds are then computed by offsetting each prediction by the rolling standard deviation multiplied by the percent point (quantile) function evaluated at that bound's tail probability.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between 0 and 1 giving the coverage levels for which to calculate the lower and upper bounds of the prediction interval.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keyed as {coverage}_lower and {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X)

Make predictions using the fitted CatBoost classifier.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

predict_proba(self, X)

Make prediction probabilities using the fitted CatBoost classifier.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted probability values.

Return type

pd.DataFrame

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.CatBoostRegressor(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=False, allow_writing_files=False, random_seed=0, n_jobs=-1, **kwargs)[source]#

CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.

For more information, check out https://catboost.ai/

Parameters
  • n_estimators (int) – The maximum number of trees to build. Defaults to 10.

  • eta (float) – The learning rate. Defaults to 0.03.

  • max_depth (int) – The maximum tree depth for base learners. Defaults to 6.

  • bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.

  • silent (boolean) – Whether to use the “silent” logging mode. Defaults to False.

  • allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.

  • n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all available cores. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{ “n_estimators”: Integer(4, 100), “eta”: Real(0.000001, 1), “max_depth”: Integer(4, 10),}

model_family

ModelFamily.CATBOOST

modifies_features

True

modifies_target

False

name

CatBoost Regressor

supported_problem_types

[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.MULTISERIES_TIME_SERIES_REGRESSION,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Feature importance of fitted CatBoost regressor.

fit

Fits CatBoost regressor component to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using the fitted CatBoost regressor.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)#

Feature importance of fitted CatBoost regressor.

fit(self, X, y=None)[source]#

Fits CatBoost regressor component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]#

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and computes their rolling standard deviation with a window size of 5. Each bound is the prediction plus the rolling standard deviation multiplied by the normal percent point (quantile) function evaluated at that bound's lower-tail probability: (1 - coverage) / 2 for the lower bound and (1 + coverage) / 2 for the upper bound.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, with keys in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
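The interval construction described above can be sketched in plain Python. This is a minimal re-implementation under stated assumptions, not EvalML's code: it works on lists instead of pandas objects, uses the standard library's statistics.NormalDist for the quantile function, and backfills the first four positions (where a size-5 window is still incomplete) with the first complete value; how EvalML fills those positions is an assumption here. The helper names rolling_std and prediction_intervals are hypothetical.

```python
from statistics import NormalDist, stdev
from typing import Dict, List


def rolling_std(values: List[float], window: int = 5) -> List[float]:
    """Sample standard deviation over a trailing window.

    Positions before the first full window are backfilled with the first
    complete value (an assumption made for this sketch).
    """
    if len(values) < window:
        raise ValueError("need at least `window` values")
    stds = [stdev(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]
    return [stds[0]] * (window - 1) + stds


def prediction_intervals(
    predictions: List[float], coverage: List[float], window: int = 5
) -> Dict[str, List[float]]:
    """Hypothetical helper using the {coverage}_lower / {coverage}_upper key layout."""
    std = rolling_std(predictions, window)
    normal = NormalDist()  # standard normal: mean 0, sigma 1
    result = {}
    for c in coverage:
        # Lower-tail probability at each bound, e.g. 0.025 and 0.975 for c = 0.95.
        z_lower = normal.inv_cdf((1 - c) / 2)
        z_upper = normal.inv_cdf((1 + c) / 2)
        result[f"{c}_lower"] = [p + z_lower * s for p, s in zip(predictions, std)]
        result[f"{c}_upper"] = [p + z_upper * s for p, s in zip(predictions, std)]
    return result


preds = [10.0, 12.0, 11.0, 13.0, 12.5, 14.0, 13.5, 15.0]
intervals = prediction_intervals(preds, coverage=[0.8, 0.95])
print(sorted(intervals))  # ['0.8_lower', '0.8_upper', '0.95_lower', '0.95_upper']
```

Because the two quantiles are symmetric around zero, each interval is centered on its prediction, and a higher coverage level yields a wider interval.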

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X)[source]#

Make predictions using the fitted CatBoost regressor.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

predict_proba(self, X: pandas.DataFrame) → pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.ComponentBase(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]#

Base class for all components.

Parameters
  • parameters (dict) – Dictionary of parameters for the component. Defaults to None.

  • component_obj (obj) – Third-party object used to implement the component. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

load

Loads component at file path.

modifies_features

Returns whether this component modifies (subsets or transforms) the features variable during transform.

modifies_target

Returns whether this component modifies (subsets or transforms) the target variable during transform.

name

Returns string name of this component.

needs_fitting

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

training_only

Returns whether or not this component should be evaluated during training-time only, or during both training and prediction time.

update_parameters

Updates the parameter dictionary of the component.

clone(self)[source]#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.
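One way to picture clone is re-instantiating the same class from the stored parameters and seed. The sketch below uses a hypothetical ToyComponent, not EvalML's implementation, to show the resulting properties: the clone has equal parameters and the same seed but is a distinct object, and in this sketch it starts out unfitted.

```python
class ToyComponent:
    """Hypothetical stand-in for a component; not part of EvalML."""

    def __init__(self, alpha=1.0, random_seed=0):
        self._parameters = {"alpha": alpha}
        self.random_seed = random_seed
        self._is_fitted = False

    @property
    def parameters(self):
        return dict(self._parameters)  # a copy, so callers cannot mutate internal state

    def fit(self, X, y=None):
        self._is_fitted = True
        return self

    def clone(self):
        # Rebuild from the init parameters and seed; fitted state is not carried over.
        return self.__class__(**self.parameters, random_seed=self.random_seed)


original = ToyComponent(alpha=0.5, random_seed=42).fit([[1.0]])
cloned = original.clone()
print(cloned.parameters, cloned.random_seed, cloned._is_fitted)  # {'alpha': 0.5} 42 False
```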

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict
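The convention Component.default_parameters == Component().parameters can be satisfied by reading defaults straight from the __init__ signature. The sketch below assumes that approach for a hypothetical ToyEstimator; it is written as a plain classmethod rather than a class property for simplicity, and it leaves random_seed out because the seed is stored separately in this sketch.

```python
import inspect


class ToyEstimator:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, max_depth=6, eta=0.03, random_seed=0, **kwargs):
        # The seed is tracked separately, so it is not part of `parameters` here.
        self._parameters = {"max_depth": max_depth, "eta": eta, **kwargs}
        self.random_seed = random_seed

    @property
    def parameters(self):
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Every named __init__ argument that has a default, except the seed.
        # `self` and `**kwargs` have no default, so they are skipped automatically.
        sig = inspect.signature(cls.__init__)
        return {
            name: p.default
            for name, p in sig.parameters.items()
            if p.default is not inspect.Parameter.empty and name != "random_seed"
        }


print(ToyEstimator.default_parameters())  # {'max_depth': 6, 'eta': 0.03}
```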

describe(self, print_name=False, return_dict=False)[source]#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.
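The Raises clause implies a delegation pattern: fit forwards to component_obj.fit when a third-party object provides one. A minimal sketch with a hypothetical ComponentSketch class; the MethodPropertyNotFoundError defined here is a local stand-in, not EvalML's exception class.

```python
class MethodPropertyNotFoundError(Exception):
    """Local stand-in for EvalML's exception of the same name."""


class ComponentSketch:
    def __init__(self, component_obj=None):
        self._component_obj = component_obj

    def fit(self, X, y=None):
        # getattr returns None both when component_obj is None and when it lacks
        # a fit method, without masking AttributeErrors raised inside a real fit.
        fit_method = getattr(self._component_obj, "fit", None)
        if fit_method is None:
            raise MethodPropertyNotFoundError(
                "Component requires a fit method or a component_obj that implements fit"
            )
        fit_method(X, y)
        return self


class Recorder:
    """Hypothetical third-party object that records what it was fit on."""

    def fit(self, X, y=None):
        self.seen = (X, y)
        return self


component = ComponentSketch(Recorder()).fit([[1.0], [2.0]], [0, 1])
```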

static load(file_path)[source]#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property modifies_features(cls)#

Returns whether this component modifies (subsets or transforms) the features variable during transform.

For Estimator objects, this attribute determines if the return value from predict or predict_proba should be used as features or targets.

property modifies_target(cls)#

Returns whether this component modifies (subsets or transforms) the target variable during transform.

For Estimator objects, this attribute determines if the return value from predict or predict_proba should be used as features or targets.

property name(cls)#

Returns string name of this component.

needs_fitting(self)#

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)[source]#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.
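save and load form a serialization round trip. The signature above defaults pickle_protocol to cloudpickle.DEFAULT_PROTOCOL; the sketch below substitutes the standard-library pickle module (and pickle.DEFAULT_PROTOCOL) so it runs without cloudpickle, and the save and load functions here are hypothetical module-level stand-ins for the component methods.

```python
import os
import pickle
import tempfile


def save(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Write obj to file_path using the given pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load(file_path):
    """Read back an object written by save()."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


state = {"name": "Toy Component", "parameters": {"max_depth": 6}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "component.pkl")
    save(state, path, pickle_protocol=pickle.HIGHEST_PROTOCOL)
    restored = load(path)
print(restored == state)  # True
```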

property training_only(cls)#

Returns whether or not this component should be evaluated during training-time only, or during both training and prediction time.

update_parameters(self, update_dict, reset_fit=True)[source]#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.
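The effect of reset_fit can be shown with a hypothetical ToyComponent (not EvalML's implementation). The sketch only updates the stored dictionary; whether the real method also rebuilds an underlying third-party object is not covered here.

```python
class ToyComponent:
    """Hypothetical component; not EvalML's implementation."""

    def __init__(self, **parameters):
        self._parameters = dict(parameters)
        self._is_fitted = False

    @property
    def parameters(self):
        return dict(self._parameters)

    def fit(self, X, y=None):
        self._is_fitted = True
        return self

    def update_parameters(self, update_dict, reset_fit=True):
        self._parameters.update(update_dict)
        if reset_fit:
            # New parameters invalidate the previous fit.
            self._is_fitted = False


c = ToyComponent(max_depth=6).fit([[1.0]])
c.update_parameters({"max_depth": 8})
print(c.parameters, c._is_fitted)  # {'max_depth': 8} False
```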

class evalml.pipelines.components.ComponentBaseMeta[source]#

Metaclass that overrides creating a new component by wrapping methods with validators and setters.

Attributes

FIT_METHODS

[‘fit’, ‘fit_transform’]

METHODS_TO_CHECK

[‘predict’, ‘predict_proba’, ‘transform’, ‘inverse_transform’, ‘get_trend_dataframe’]

PROPERTIES_TO_CHECK

[‘feature_importance’]

Methods

check_for_fit

check_for_fit wraps a method that validates if self._is_fitted is True.

register

Register a virtual subclass of an ABC.

set_fit

Wrapper for the fit method.

classmethod check_for_fit(cls, method)[source]#

check_for_fit wraps a method that validates if self._is_fitted is True.

The wrapper raises an exception if self._is_fitted is False; otherwise it calls the wrapped method and returns its result.

Parameters

method (callable) – Method to wrap.

Returns

The wrapped method.

Raises

ComponentNotYetFittedError – If component is not yet fitted.

register(cls, subclass)#

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

classmethod set_fit(cls, method)#

Wraps the fit method so that the component is marked as fitted (self._is_fitted is set to True) after fit returns.
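The attributes above suggest how fit checking is wired up: methods named in FIT_METHODS are wrapped so they mark the component as fitted, and methods named in METHODS_TO_CHECK are wrapped with a fitted-state check. The sketch below is a simplified, hypothetical metaclass built on those assumptions, not EvalML's code: it derives from type rather than an ABC metaclass, wraps only plain methods (not the feature_importance property), and defines its own ComponentNotYetFittedError.

```python
from functools import wraps


class ComponentNotYetFittedError(Exception):
    """Local stand-in for EvalML's exception of the same name."""


class FitCheckMeta(type):
    FIT_METHODS = ["fit", "fit_transform"]
    METHODS_TO_CHECK = ["predict", "predict_proba", "transform", "inverse_transform"]

    @classmethod
    def check_for_fit(cls, method):
        @wraps(method)
        def _check_for_fit(self, *args, **kwargs):
            if not getattr(self, "_is_fitted", False):
                raise ComponentNotYetFittedError(
                    f"This {type(self).__name__} is not fitted yet. Call fit() first."
                )
            return method(self, *args, **kwargs)

        return _check_for_fit

    @classmethod
    def set_fit(cls, method):
        @wraps(method)
        def _set_fit(self, *args, **kwargs):
            result = method(self, *args, **kwargs)
            self._is_fitted = True  # mark as fitted only after fit succeeds
            return result

        return _set_fit

    def __new__(mcs, name, bases, namespace):
        # Wrap methods defined directly in the class body being created.
        for attr, value in list(namespace.items()):
            if attr in mcs.FIT_METHODS and callable(value):
                namespace[attr] = mcs.set_fit(value)
            elif attr in mcs.METHODS_TO_CHECK and callable(value):
                namespace[attr] = mcs.check_for_fit(value)
        return super().__new__(mcs, name, bases, namespace)


class Doubler(metaclass=FitCheckMeta):
    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return [2 * x for x in X]


d = Doubler()
try:
    d.transform([1, 2])
except ComponentNotYetFittedError as err:
    print(type(err).__name__)  # ComponentNotYetFittedError
print(d.fit([1, 2]).transform([1, 2]))  # [2, 4]
```

Doing the wrapping in the metaclass means every subclass gets the checks without having to call them explicitly in each method body.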

class evalml.pipelines.components.DateTimeFeaturizer(features_to_extract=None, encode_as_categories=False, time_index=None, random_seed=0, **kwargs)[source]#

Transformer that can automatically extract features from datetime columns.

Parameters
  • features_to_extract (list) – List of features to extract. Valid options include “year”, “month”, “day_of_week”, “hour”. Defaults to None, which extracts all four.

  • encode_as_categories (bool) – Whether day-of-week and month features should be encoded as pandas “category” dtype. This allows OneHotEncoders to encode these features. Defaults to False.

  • time_index (str) – Name of the column containing the datetime information used to order the data. Ignored.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

DateTime Featurizer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fit the datetime featurizer component.

fit_transform

Fits on X and transforms X.

get_feature_names

Gets the categories of each datetime feature.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data X by creating new features using existing DateTime columns, and then dropping those DateTime columns.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fit the datetime featurizer component.

Parameters
  • X (pd.DataFrame) – Input features.

  • y (pd.Series, optional) – Target data. Ignored.

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

get_feature_names(self)[source]#

Gets the categories of each datetime feature.

Returns

Dictionary, where each key-value pair is a column name and a dictionary mapping the unique feature values to their integer encoding.

Return type

dict

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data X by creating new features using existing DateTime columns, and then dropping those DateTime columns.

Parameters
  • X (pd.DataFrame) – Input features.

  • y (pd.Series, optional) – Ignored.

Returns

Transformed X.

Return type

pd.DataFrame
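The transform described above (derive new columns from each datetime column, then drop the original) can be sketched with the standard library. Assumptions in this sketch: rows are plain dicts rather than a pandas DataFrame, new columns are named "<column>_<feature>", and day of week uses Python's datetime.weekday() encoding (Monday is 0); EvalML's column naming and encodings may differ.

```python
from datetime import datetime
from typing import Dict, List, Optional

EXTRACTORS = {
    "year": lambda dt: dt.year,
    "month": lambda dt: dt.month,
    "day_of_week": lambda dt: dt.weekday(),  # Monday == 0 (Python's convention)
    "hour": lambda dt: dt.hour,
}


def featurize_datetimes(rows: List[Dict], features_to_extract: Optional[List[str]] = None) -> List[Dict]:
    features = list(EXTRACTORS) if features_to_extract is None else features_to_extract
    out = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if isinstance(value, datetime):
                # Replace the datetime column with its extracted parts.
                for feat in features:
                    new_row[f"{col}_{feat}"] = EXTRACTORS[feat](value)
            else:
                new_row[col] = value  # non-datetime columns pass through unchanged
        out.append(new_row)
    return out


rows = [{"signup": datetime(2021, 3, 15, 9, 30), "amount": 12.5}]
print(featurize_datetimes(rows))
# [{'signup_year': 2021, 'signup_month': 3, 'signup_day_of_week': 0, 'signup_hour': 9, 'amount': 12.5}]
```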

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.DecisionTreeClassifier(criterion='gini', max_features='sqrt', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)[source]#

Decision Tree Classifier.

Parameters
  • criterion ({"gini", "entropy"}) – The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. Defaults to “gini”.

  • max_features (int, float, {"sqrt", "log2"} or None) –

    The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.

    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None, then max_features = n_features.

    The search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features.

    Defaults to “sqrt”.

  • max_depth (int) – The maximum depth of the tree. Defaults to 6.

  • min_samples_split (int or float) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

    Defaults to 2.

  • min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
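The max_features and min_samples_split rules above reduce to simple arithmetic. The hypothetical helpers below follow the documented rules; the max(1, ...) floor for max_features and the floor of 2 for a fractional min_samples_split are assumptions borrowed from scikit-learn's tree implementation, not statements in this document.

```python
import math


def resolve_max_features(max_features, n_features):
    """Number of features examined per split under the documented rules."""
    if max_features is None:
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    return max(1, int(max_features * n_features))  # float: fraction of n_features


def resolve_min_samples_split(min_samples_split, n_samples):
    """Minimum number of samples needed to split an internal node."""
    if isinstance(min_samples_split, int):
        return min_samples_split
    return max(2, math.ceil(min_samples_split * n_samples))  # float: fraction of n_samples


# With 50 features: "sqrt" -> 7, "log2" -> 5, 0.25 -> 12, 7 -> 7, None -> 50
for option in ["sqrt", "log2", 0.25, 7, None]:
    print(option, resolve_max_features(option, n_features=50))
```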

Attributes

hyperparameter_ranges

{ “criterion”: [“gini”, “entropy”], “max_features”: [“sqrt”, “log2”], “max_depth”: Integer(4, 10),}

model_family

ModelFamily.DECISION_TREE

modifies_features

True

modifies_target

False

name

Decision Tree Classifier

supported_problem_types

[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) → pandas.Series#

Returns importance associated with each feature.

Returns

Importance associated with each feature.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]#

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and computes their rolling standard deviation with a window size of 5. Each bound is the prediction plus the rolling standard deviation multiplied by the normal percent point (quantile) function evaluated at that bound's lower-tail probability: (1 - coverage) / 2 for the lower bound and (1 + coverage) / 2 for the upper bound.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, with keys in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) → pandas.Series#

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) → pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.DecisionTreeRegressor(criterion='squared_error', max_features='sqrt', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)[source]#

Decision Tree Regressor.

Parameters
  • criterion ({"squared_error", "friedman_mse", "absolute_error", "poisson"}) –

    The function to measure the quality of a split. Supported criteria are:

    • “squared_error” for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node.

    • “friedman_mse”, which uses mean squared error with Friedman's improvement score for potential splits.

    • “absolute_error” for the mean absolute error, which minimizes the L1 loss using the median of each terminal node.

    • “poisson”, which uses reduction in Poisson deviance to find splits.

    Defaults to “squared_error”.

  • max_features (int, float, {"sqrt", "log2"} or None) –

    The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.

    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None, then max_features = n_features.

    The search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features.

    Defaults to “sqrt”.

  • max_depth (int) – The maximum depth of the tree. Defaults to 6.

  • min_samples_split (int or float) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

    Defaults to 2.

  • min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
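To make the “squared_error” and “absolute_error” criteria concrete, the sketch below computes the node impurity each one minimizes (spread around the node mean, and spread around the node median) and scores a split by its impurity decrease, which for “squared_error” is the variance reduction mentioned above. The helper names are hypothetical and this is a simplified illustration, not scikit-learn's implementation.

```python
from statistics import mean, median


def squared_error(values):
    """Mean squared deviation from the node mean (the node's variance)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def absolute_error(values):
    """Mean absolute deviation from the node median."""
    med = median(values)
    return sum(abs(v - med) for v in values) / len(values)


def impurity_decrease(parent, left, right, criterion):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    children = (len(left) * criterion(left) + len(right) * criterion(right)) / n
    return criterion(parent) - children


node = [1.0, 2.0, 3.0, 4.0, 100.0]
print(squared_error(node), absolute_error(node))  # 1522.0 20.2

# Isolating the outlier in its own leaf removes almost all of the squared error.
print(impurity_decrease(node, node[:4], node[4:], squared_error))  # 1521.0
```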

Attributes

hyperparameter_ranges

{ “criterion”: [“squared_error”, “friedman_mse”, “absolute_error”], “max_features”: [“sqrt”, “log2”], “max_depth”: Integer(4, 10),}

model_family

ModelFamily.DECISION_TREE

modifies_features

True

modifies_target

False

name

Decision Tree Regressor

supported_problem_types

[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.MULTISERIES_TIME_SERIES_REGRESSION,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) → pandas.Series#

Returns importance associated with each feature.

Returns

Importance associated with each feature.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]#

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and computes their rolling standard deviation with a window size of 5. Each bound is the prediction plus the rolling standard deviation multiplied by the normal percent point (quantile) function evaluated at that bound's lower-tail probability: (1 - coverage) / 2 for the lower bound and (1 + coverage) / 2 for the upper bound.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, with keys in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) → pandas.Series#

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) → pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.DFSTransformer(index='index', features=None, random_seed=0, **kwargs)[source]#

Featuretools DFS component that generates features for the input features.

Parameters
  • index (string) – The name of the column that contains the indices. If no column with this name exists, then featuretools.EntitySet() creates a column with this name to serve as the index column. Defaults to ‘index’.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

  • features (list) – List of features to run DFS on. Defaults to None. Features will only be computed if the columns used by the feature exist in the input and if the feature itself is not already in the input. If features is an empty list, no transformation is applied to the input data.
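The selection rule for features can be sketched in plain Python. FeatureSpec and features_to_compute are hypothetical names used only for illustration; real Featuretools feature objects carry much more information, and the actual filtering happens inside DFSTransformer.

```python
from typing import List, NamedTuple, Tuple


class FeatureSpec(NamedTuple):
    """Hypothetical stand-in for a Featuretools feature definition."""

    name: str
    base_columns: Tuple[str, ...]


def features_to_compute(features: List[FeatureSpec], input_columns: List[str]) -> List[FeatureSpec]:
    """Keep features whose source columns are present and whose output is not already a column."""
    available = set(input_columns)
    return [
        f
        for f in features
        if set(f.base_columns) <= available  # every source column exists in the input
        and f.name not in available  # the feature is not already in the input
    ]


specs = [
    FeatureSpec("DAY(date)", ("date",)),
    FeatureSpec("MONTH(date)", ("date",)),
    FeatureSpec("NUM_WORDS(text)", ("text",)),
]
print([f.name for f in features_to_compute(specs, ["date", "MONTH(date)"])])  # ['DAY(date)']
```

An empty features list yields an empty selection, which mirrors the documented behavior that an empty list leaves the input untouched.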

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

DFS Transformer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

contains_pre_existing_features

Determines whether or not features from a DFS Transformer match pipeline input features.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the DFSTransformer component.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Computes the feature matrix for the input X using Featuretools’ DFS (Deep Feature Synthesis) algorithm.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

static contains_pre_existing_features(dfs_features: Optional[List[featuretools.feature_base.FeatureBase]], input_feature_names: List[str], target: Optional[str] = None)

Determines whether or not features from a DFS Transformer match pipeline input features.

Parameters
  • dfs_features (Optional[List[FeatureBase]]) – List of features output from a DFS Transformer.

  • input_feature_names (List[str]) – List of input features into the DFS Transformer.

  • target (Optional[str]) – The target whose values we are trying to predict. This is used to know which column to ignore if the target column is present in the list of features in the DFS Transformer’s parameters.
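A minimal sketch of the check, assuming it returns False when no feature list is supplied and otherwise returns True only if every DFS feature name other than the target already appears among the input feature names. Feature objects are modeled here as plain name strings instead of FeatureBase instances, so this illustrates the documented intent rather than EvalML's implementation.

```python
from typing import List, Optional


def contains_pre_existing_features(
    dfs_feature_names: Optional[List[str]],
    input_feature_names: List[str],
    target: Optional[str] = None,
) -> bool:
    """Sketch: plain names stand in for FeatureBase objects (an assumption)."""
    if dfs_feature_names is None:
        return False  # no DFS feature list to compare against
    # Ignore the target column if it appears in the DFS feature list.
    names = [n for n in dfs_feature_names if n != target]
    inputs = set(input_feature_names)
    return all(n in inputs for n in names)


print(contains_pre_existing_features(["a", "DAY(date)", "y"], ["a", "DAY(date)"], target="y"))  # True
print(contains_pre_existing_features(["a", "b"], ["a"]))  # False
```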

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict
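The convention Component.default_parameters == Component().parameters can be illustrated with a hypothetical ToyComponent that derives its defaults from its __init__ signature via inspect.signature. In EvalML, default_parameters is accessed without parentheses; this sketch uses a plain classmethod for simplicity, and it assumes random_seed is kept out of parameters.

```python
import inspect


class ToyComponent:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, columns=None, threshold=0.5, random_seed=0):
        # random_seed is stored separately, not in `parameters` (an assumption here).
        self.parameters = {"columns": columns, "threshold": threshold}
        self.random_seed = random_seed

    @classmethod
    def default_parameters(cls):
        # Collect the default of every __init__ argument except self and random_seed.
        sig = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in sig.parameters.items()
            if name not in ("self", "random_seed")
            and param.default is not inspect.Parameter.empty
        }


print(ToyComponent.default_parameters())  # {'columns': None, 'threshold': 0.5}
assert ToyComponent.default_parameters() == ToyComponent().parameters
```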

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the DFSTransformer component.

Parameters
  • X (pd.DataFrame, np.array) – The input data to transform, of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Computes the feature matrix for the input X using Featuretools’ DFS (Deep Feature Synthesis) algorithm.

Parameters
  • X (pd.DataFrame or np.ndarray) – The input data to transform, of shape [n_samples, n_features].

  • y (pd.Series, optional) – Ignored.

Returns

Feature matrix

Return type

pd.DataFrame

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.DropColumns(columns=None, random_seed=0, **kwargs)

Drops specified columns in input data.

Parameters
  • columns (list(string)) – Names of the columns to drop. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Drop Columns Transformer

needs_fitting

False

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the transformer by checking if column names are present in the dataset.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data X by dropping columns.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the transformer by checking if column names are present in the dataset.

Parameters
  • X (pd.DataFrame) – Data to check.

  • y (pd.Series, ignored) – Targets.

Returns

self

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data X by dropping columns.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Targets.

Returns

Transformed X.

Return type

pd.DataFrame
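The fit/transform split described for DropColumns can be sketched with a dictionary of columns standing in for a DataFrame. DropColumnsSketch is a hypothetical class, and raising ValueError for a missing column is an assumption of this sketch rather than a documented guarantee.

```python
from typing import Dict, List, Optional

Table = Dict[str, list]  # column name -> column values; stand-in for a DataFrame


class DropColumnsSketch:
    """Hypothetical re-implementation for illustration; not EvalML's class."""

    def __init__(self, columns: Optional[List[str]] = None):
        self.columns = list(columns or [])

    def fit(self, X: Table, y=None):
        # Check that every requested column is present (ValueError is an assumption).
        missing = [c for c in self.columns if c not in X]
        if missing:
            raise ValueError(f"Columns not found in input data: {missing}")
        return self

    def transform(self, X: Table, y=None) -> Table:
        # Keep every column that was not listed for dropping.
        return {name: values for name, values in X.items() if name not in self.columns}


X = {"id": [1, 2], "age": [30, 40], "notes": ["a", "b"]}
print(DropColumnsSketch(["id", "notes"]).fit(X).transform(X))  # {'age': [30, 40]}
```

Because needs_fitting is False for this component, transform can be called without fitting first; the sketch mirrors that by not tracking any fitted state.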

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.DropNaNRowsTransformer(parameters=None, component_obj=None, random_seed=0, **kwargs)

Transformer to drop rows with NaN values.

Parameters

random_seed (int) – Seed for the random number generator. Is not used by this component. Defaults to 0.

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

True

name

Drop NaN Rows Transformer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data using fitted component.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data using fitted component.

Parameters
  • X (pd.DataFrame) – Features.

  • y (pd.Series, optional) – Target data.

Returns

Data with NaN rows dropped.

Return type

(pd.DataFrame, pd.Series)
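The row-dropping behavior can be sketched with plain lists. This assumes a row is removed when any of its features, or its target value, is missing, and that X and y stay aligned afterwards; drop_nan_rows and is_missing are hypothetical helpers. The component itself works on pandas objects, so surviving rows there presumably keep their original index labels.

```python
import math
from typing import List, Optional, Sequence, Tuple


def is_missing(value) -> bool:
    # Treat None and float NaN as missing values.
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_nan_rows(
    X: List[Sequence], y: Optional[List] = None
) -> Tuple[List[Sequence], Optional[List]]:
    """Drop every row that has a missing feature or a missing target value."""
    keep = [
        i
        for i, row in enumerate(X)
        if not any(is_missing(v) for v in row)
        and (y is None or not is_missing(y[i]))
    ]
    X_t = [X[i] for i in keep]
    y_t = None if y is None else [y[i] for i in keep]
    return X_t, y_t


nan = float("nan")
X = [[1.0, 2.0], [nan, 3.0], [4.0, 5.0], [6.0, 7.0]]
y = [0, 1, None, 1]
print(drop_nan_rows(X, y))  # ([[1.0, 2.0], [6.0, 7.0]], [0, 1])
```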

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.DropNullColumns(pct_null_threshold=1.0, random_seed=0, **kwargs)

Transformer to drop features whose percentage of NaN values exceeds a specified threshold.

Parameters
  • pct_null_threshold (float) – The proportion of NaN values in an input feature at which that feature is dropped. Must be in the range [0, 1], inclusive. If equal to 0.0, will drop columns with any null values. If equal to 1.0, will drop columns with all null values. Defaults to 1.0.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
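The threshold rule can be sketched as follows. The comparison used for thresholds strictly between 0 and 1 (drop when the null fraction is at least the threshold) is inferred from the two documented boundary cases and is an assumption, as is the ValueError for out-of-range thresholds; columns_to_drop and null_fraction are hypothetical helpers.

```python
import math
from typing import Dict, List

Table = Dict[str, list]  # column name -> values; stand-in for a DataFrame


def null_fraction(values: list) -> float:
    # Count None and float NaN as null.
    missing = sum(v is None or (isinstance(v, float) and math.isnan(v)) for v in values)
    return missing / len(values) if values else 0.0


def columns_to_drop(X: Table, pct_null_threshold: float = 1.0) -> List[str]:
    if not 0.0 <= pct_null_threshold <= 1.0:
        raise ValueError("pct_null_threshold must be between 0 and 1, inclusive")
    dropped = []
    for name, values in X.items():
        frac = null_fraction(values)
        if pct_null_threshold == 0.0:
            drop = frac > 0.0  # any null value triggers the drop
        else:
            drop = frac >= pct_null_threshold  # with 1.0, only all-null columns
        if drop:
            dropped.append(name)
    return dropped


X = {"a": [1, 2, 3, 4], "b": [1, None, 3, 4], "c": [None, None, None, None]}
print(columns_to_drop(X, 1.0))  # ['c']
print(columns_to_drop(X, 0.0))  # ['b', 'c']
```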

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Drop Null Columns Transformer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data X by dropping columns that exceed the threshold of null values.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data X by dropping columns that exceed the threshold of null values.

Parameters
  • X (pd.DataFrame) – Data to transform

  • y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.DropRowsTransformer(indices_to_drop=None, random_seed=0)

Transformer to drop rows specified by row indices.

Parameters
  • indices_to_drop (list) – List of indices to drop in the input data. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Is not used by this component. Defaults to 0.

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

True

name

Drop Rows Transformer

training_only

True

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data using fitted component.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If indices to drop do not exist in input features or target.

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data using fitted component.

Parameters
  • X (pd.DataFrame) – Features.

  • y (pd.Series, optional) – Target data.

Returns

Data with row indices dropped.

Return type

(pd.DataFrame, pd.Series)
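A sketch of the two steps, with dictionaries keyed by index label standing in for the DataFrame and Series: fit validates that every index to drop exists (raising ValueError, per the Raises entry for fit), and transform removes those rows from both X and y. DropRowsSketch is a hypothetical class, not EvalML's implementation.

```python
from typing import Dict, Iterable, Optional, Tuple


class DropRowsSketch:
    """Hypothetical illustration of dropping rows by index label; not EvalML code."""

    def __init__(self, indices_to_drop: Optional[Iterable] = None):
        self.indices_to_drop = set(indices_to_drop or [])

    def fit(self, X: Dict, y: Optional[Dict] = None):
        # Every index to drop must exist in X, and in y when y is given.
        missing = self.indices_to_drop - set(X)
        if y is not None:
            missing |= self.indices_to_drop - set(y)
        if missing:
            raise ValueError(f"Indices {sorted(missing)} not found in input")
        return self

    def transform(self, X: Dict, y: Optional[Dict] = None) -> Tuple[Dict, Optional[Dict]]:
        # Remove the listed labels from both the features and the target.
        X_t = {i: row for i, row in X.items() if i not in self.indices_to_drop}
        y_t = None if y is None else {i: v for i, v in y.items() if i not in self.indices_to_drop}
        return X_t, y_t


X = {10: [1.0, 2.0], 11: [3.0, 4.0], 12: [5.0, 6.0]}
y = {10: 0, 11: 1, 12: 0}
print(DropRowsSketch([11]).fit(X, y).transform(X, y))
# ({10: [1.0, 2.0], 12: [5.0, 6.0]}, {10: 0, 12: 0})
```

Since training_only is True for this component, the row removal is presumably applied only while training, not when making predictions.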

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.ElasticNetClassifier(penalty='elasticnet', C=1.0, l1_ratio=0.15, multi_class='auto', solver='saga', n_jobs=-1, random_seed=0, **kwargs)

Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator.

Parameters
  • penalty ({"l1", "l2", "elasticnet", "none"}) – The norm used in penalization. Defaults to “elasticnet”.

  • C (float) – Inverse of regularization strength. Must be a positive float. Defaults to 1.0.

  • l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty=‘elasticnet’. Setting l1_ratio=0 is equivalent to using penalty=‘l2’, while setting l1_ratio=1 is equivalent to using penalty=‘l1’. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. Defaults to 0.15.

  • multi_class ({"auto", "ovr", "multinomial"}) – If the option chosen is “ovr”, then a binary problem is fit for each label. For “multinomial”, the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. “multinomial” is unavailable when solver=“liblinear”. “auto” selects “ovr” if the data is binary, or if solver=“liblinear”, and otherwise selects “multinomial”. Defaults to “auto”.

  • solver ({"newton-cg", "lbfgs", "liblinear", "sag", "saga"}) –

    Algorithm to use in the optimization problem. For small datasets, “liblinear” is a good choice, whereas “sag” and “saga” are faster for large ones. For multiclass problems, only “newton-cg”, “sag”, “saga” and “lbfgs” handle multinomial loss; “liblinear” is limited to one-versus-rest schemes.

    • “newton-cg”, “lbfgs”, “sag” and “saga” handle L2 or no penalty

    • “liblinear” and “saga” also handle L1 penalty

    • “saga” also supports “elasticnet” penalty

    • “liblinear” does not support setting penalty=‘none’

    Defaults to “saga”.

  • n_jobs (int) – Number of parallel threads used to fit the model. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
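The solver, penalty, and multi_class rules in the parameter list can be encoded as a small lookup table plus a resolver for “auto”. SUPPORTED_PENALTIES, check_solver_penalty, and resolve_multi_class are hypothetical helpers that restate the documented rules; scikit-learn performs its own validation, and its accepted values and messages may differ between versions.

```python
# Penalties each solver accepts, restated from the list of solver notes.
SUPPORTED_PENALTIES = {
    "newton-cg": {"l2", "none"},
    "lbfgs": {"l2", "none"},
    "sag": {"l2", "none"},
    "saga": {"l1", "l2", "elasticnet", "none"},
    "liblinear": {"l1", "l2"},
}


def check_solver_penalty(solver: str, penalty: str) -> None:
    if penalty not in SUPPORTED_PENALTIES[solver]:
        raise ValueError(f"solver={solver!r} does not support penalty={penalty!r}")


def resolve_multi_class(multi_class: str, solver: str, n_classes: int) -> str:
    """Resolve "auto" following the rules listed for multi_class."""
    if multi_class == "multinomial" and solver == "liblinear":
        raise ValueError('"multinomial" is unavailable when solver="liblinear"')
    if multi_class != "auto":
        return multi_class
    return "ovr" if n_classes == 2 or solver == "liblinear" else "multinomial"


# The component defaults: penalty="elasticnet", solver="saga", multi_class="auto".
check_solver_penalty("saga", "elasticnet")  # passes silently
print(resolve_multi_class("auto", "saga", n_classes=3))  # multinomial
print(resolve_multi_class("auto", "saga", n_classes=2))  # ovr
```

This also shows why the defaults fit together: “saga” is the only listed solver that supports the “elasticnet” penalty.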

Attributes

hyperparameter_ranges

{ “C”: Real(0.01, 10), “l1_ratio”: Real(0, 1)}

model_family

ModelFamily.LINEAR_MODEL

modifies_features

True

modifies_target

False

name

Elastic Net Classifier

supported_problem_types

[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Feature importance for fitted ElasticNet classifier.

fit

Fits ElasticNet classifier component to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)

Feature importance for fitted ElasticNet classifier.

fit(self, X, y)

Fits ElasticNet classifier component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) → pandas.Series

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) → pandas.Series

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.ElasticNetRegressor(alpha=0.0001, l1_ratio=0.15, max_iter=1000, random_seed=0, **kwargs)

Elastic Net Regressor.

Parameters
  • alpha (float) – Constant that multiplies the penalty terms. Defaults to 0.0001.

  • l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Setting l1_ratio=0 is equivalent to a pure L2 penalty, while setting l1_ratio=1 is equivalent to a pure L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. Defaults to 0.15.

  • max_iter (int) – The maximum number of iterations. Defaults to 1000.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
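To make the roles of alpha and l1_ratio concrete, the function below evaluates the elastic net objective in the form used by scikit-learn's ElasticNet, which this component is assumed to wrap: 1 / (2 * n_samples) * ||y - Xw||_2^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2. The intercept is omitted for brevity, and elastic_net_objective is a hypothetical helper.

```python
from typing import List, Sequence


def elastic_net_objective(
    X: List[Sequence[float]],
    y: Sequence[float],
    w: Sequence[float],
    alpha: float = 0.0001,
    l1_ratio: float = 0.15,
) -> float:
    """Elastic net objective in scikit-learn's ElasticNet form (no intercept)."""
    n = len(y)
    residuals = [yi - sum(xij * wj for xij, wj in zip(row, w)) for row, yi in zip(X, y)]
    data_term = sum(r * r for r in residuals) / (2 * n)  # mean squared error / 2
    l1 = sum(abs(wj) for wj in w)  # ||w||_1
    l2_sq = sum(wj * wj for wj in w)  # ||w||_2^2
    return data_term + alpha * l1_ratio * l1 + 0.5 * alpha * (1 - l1_ratio) * l2_sq


X = [[1.0], [2.0]]
y = [1.0, 2.0]
w = [1.0]
# Perfect fit, so only the penalty remains: 0.1*0.5*1 + 0.5*0.1*0.5*1, approximately 0.075.
print(elastic_net_objective(X, y, w, alpha=0.1, l1_ratio=0.5))
```

With l1_ratio=1 the last term vanishes (pure L1), and with l1_ratio=0 the L1 term vanishes (pure L2), matching the description of the mixing parameter.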

Attributes

hyperparameter_ranges

{ “alpha”: Real(0, 1), “l1_ratio”: Real(0, 1),}

model_family

ModelFamily.LINEAR_MODEL

modifies_features

True

modifies_target

False

name

Elastic Net Regressor

supported_problem_types

[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.MULTISERIES_TIME_SERIES_REGRESSION,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Feature importance for fitted ElasticNet regressor.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)

Feature importance for fitted ElasticNet regressor.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
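One plausible reading of the interval procedure described for get_prediction_intervals, sketched with the standard library: each bound is the prediction minus or plus z times the rolling standard deviation, where z is the standard normal quantile that leaves (1 - coverage) / 2 in each tail. How EvalML chooses z and fills the first window - 1 positions is not spelled out in the description, so the backfilling, the default coverage, and the helper names rolling_std and prediction_intervals are assumptions.

```python
from statistics import NormalDist, stdev
from typing import Dict, List, Sequence


def rolling_std(values: Sequence[float], window: int = 5) -> List[float]:
    """Trailing-window sample std; the first window - 1 slots are backfilled (assumption)."""
    if len(values) < window:
        raise ValueError("need at least `window` predictions")
    stds = [stdev(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]
    return [stds[0]] * (window - 1) + stds


def prediction_intervals(
    predictions: Sequence[float], coverage: Sequence[float] = (0.95,), window: int = 5
) -> Dict[str, List[float]]:
    std = rolling_std(predictions, window)
    result = {}
    for c in coverage:
        # Two-sided interval: (1 - c) / 2 of the probability mass in each tail.
        z = NormalDist().inv_cdf((1 + c) / 2)
        result[f"{c}_lower"] = [p - z * s for p, s in zip(predictions, std)]
        result[f"{c}_upper"] = [p + z * s for p, s in zip(predictions, std)]
    return result


preds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
intervals = prediction_intervals(preds, coverage=[0.95])
print(sorted(intervals))  # ['0.95_lower', '0.95_upper']
```

The keys follow the documented {coverage}_lower / {coverage}_upper format; the sketch operates on a list of predictions instead of calling a fitted estimator.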

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) pandas.Series#

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.
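The save/load pair appears to be a pickle round-trip of the whole component object, with pickle_protocol selecting the stream format. The sketch below illustrates that round-trip using the standard-library pickle module in place of cloudpickle; `ToyComponent`, `save_component`, and `load_component` are hypothetical names, not EvalML APIs, and EvalML's own implementation may differ in details.

```python
import os
import pickle
import tempfile


class ToyComponent:
    """Hypothetical stand-in for a component; not an EvalML class."""

    def __init__(self, alpha=1.0, random_seed=0):
        self.parameters = {"alpha": alpha}
        self.random_seed = random_seed


def save_component(component, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    # Serialize the entire object (parameters and any fitted state) to disk.
    with open(file_path, "wb") as f:
        pickle.dump(component, f, protocol=pickle_protocol)


def load_component(file_path):
    # Reconstruct the object exactly as it was saved.
    with open(file_path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "component.pkl")
    save_component(ToyComponent(alpha=0.5, random_seed=3), path)
    restored = load_component(path)
    print(restored.parameters, restored.random_seed)  # {'alpha': 0.5} 3
```

cloudpickle accepts the same dump/load calls and can additionally serialize objects the standard module cannot, such as lambdas.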

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.EmailFeaturizer(random_seed=0, **kwargs)[source]#

Transformer that can automatically extract features from emails.

Parameters

random_seed (int) – Seed for the random number generator. Defaults to 0.
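The summary does not list which features are extracted. As a rough illustration only, the sketch below derives two plausible email features, the domain and whether it belongs to a free-mail provider. The function name, the output column names, and the `FREE_EMAIL_DOMAINS` set are hypothetical and are not taken from EvalML.

```python
from typing import Dict, List, Optional

# Hypothetical, deliberately tiny list; a real featurizer would need a much larger one.
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}


def email_features(column: str, emails: List[Optional[str]]) -> Dict[str, list]:
    """Return new feature columns derived from a column of email addresses."""
    domains: List[Optional[str]] = []
    is_free: List[Optional[bool]] = []
    for email in emails:
        if not email or "@" not in email:
            # Missing or malformed addresses yield missing features.
            domains.append(None)
            is_free.append(None)
            continue
        # Everything after the last "@", normalized to lowercase.
        domain = email.rsplit("@", 1)[1].strip().lower()
        domains.append(domain)
        is_free.append(domain in FREE_EMAIL_DOMAINS)
    return {f"{column}_domain": domains, f"{column}_is_free": is_free}


print(email_features("contact", ["Alice@Gmail.com", "bob@company.org"]))
```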

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Email Featurizer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data X.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)#

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features]

  • y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)#

Transforms data X.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.Estimator(parameters: dict = None, component_obj: Type[evalml.pipelines.components.ComponentBase] = None, random_seed: Union[int, float] = 0, **kwargs)[source]#

A component that fits and predicts given data.

To implement a new Estimator, define a subclass of Estimator that includes a name and a dictionary of acceptable ranges (hyperparameter_ranges) for any parameters to be tuned during the AutoML search. Define an __init__ method that sets up any necessary state and objects. Make sure your __init__ only uses standard keyword arguments and calls super().__init__() with a parameters dict. You may also override fit, predict, predict_proba, and the other methods of this class where appropriate.

To see some examples, check out the definitions of any Estimator component subclass.

Parameters
  • parameters (dict) – Dictionary of parameters for the component. Defaults to None.

  • component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

modifies_features

True

modifies_target

False

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

model_family

Returns ModelFamily of this component.

name

Returns string name of this component.

needs_fitting

Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

supported_problem_types

Problem types this estimator supports.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict
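If a component stores its constructor arguments verbatim, the convention Component.default_parameters == Component().parameters means the defaults are exactly the keyword defaults of __init__. The sketch below checks that property for a hypothetical `ToyEstimator` by comparing a no-argument instance against the defaults read with inspect.signature; the class, its parameters, and `signature_defaults` are illustrative only.

```python
import inspect


class ToyEstimator:
    """Hypothetical component that records its constructor arguments."""

    def __init__(self, n_estimators=100, max_depth=6, random_seed=0):
        # This sketch keeps random_seed out of `parameters` (an assumption).
        self.parameters = {"n_estimators": n_estimators, "max_depth": max_depth}
        self.random_seed = random_seed

    @classmethod
    def default_parameters(cls):
        # The convention: defaults are whatever a no-argument instance holds.
        return dict(cls().parameters)


def signature_defaults(cls, exclude=("self", "random_seed")):
    """Read keyword defaults directly from __init__'s signature."""
    params = inspect.signature(cls.__init__).parameters
    return {
        name: p.default
        for name, p in params.items()
        if name not in exclude and p.default is not inspect.Parameter.empty
    }


print(ToyEstimator.default_parameters() == signature_defaults(ToyEstimator))  # True
```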

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) pandas.Series#

Returns importance associated with each feature.

Returns

Importance associated with each feature.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)[source]#

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series][source]#

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates their rolling standard deviation using a window size of 5. Each bound is then computed by offsetting the prediction by the rolling standard deviation multiplied by the percent point (quantile) function evaluated at that bound's tail probability.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
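The rolling-window procedure for get_prediction_intervals can be sketched in plain Python. This sketch assumes a standard normal distribution for the quantile function (statistics.NormalDist), uses the sample standard deviation, and fills the first window - 1 positions with the first full-window value; those edge-handling choices are assumptions, and the helper names are hypothetical rather than EvalML functions.

```python
from statistics import NormalDist, stdev
from typing import Dict, List


def rolling_std(values: List[float], window: int = 5) -> List[float]:
    """Sample standard deviation over a trailing window of `window` points."""
    if len(values) < window:
        raise ValueError("need at least `window` predictions")
    full = [stdev(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]
    # Assumption: positions without a full window reuse the first full-window value.
    return [full[0]] * (window - 1) + full


def prediction_intervals(
    predictions: List[float], coverage: List[float], window: int = 5
) -> Dict[str, List[float]]:
    std = rolling_std(predictions, window)
    normal = NormalDist()  # standard normal; assumed distribution for the quantile function
    result: Dict[str, List[float]] = {}
    for c in coverage:
        # Tail probabilities of a central interval, e.g. c = 0.95 -> 0.025 and 0.975.
        z_lower = normal.inv_cdf((1 - c) / 2)
        z_upper = normal.inv_cdf((1 + c) / 2)
        result[f"{c}_lower"] = [p + z_lower * s for p, s in zip(predictions, std)]
        result[f"{c}_upper"] = [p + z_upper * s for p, s in zip(predictions, std)]
    return result


intervals = prediction_intervals([10.0, 11.0, 13.0, 12.0, 14.0, 15.0, 14.5], coverage=[0.95])
print(sorted(intervals))  # ['0.95_lower', '0.95_upper']
```

Because z_lower is negative, adding it pulls the lower bound below the prediction, so each interval is symmetric around its prediction and widens wherever recent predictions are more volatile.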

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property model_family(cls)#

Returns ModelFamily of this component.

property name(cls)#

Returns string name of this component.

needs_fitting(self)#

Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) pandas.Series[source]#

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) pandas.Series[source]#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

property supported_problem_types(cls)#

Problem types this estimator supports.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.ExponentialSmoothingRegressor(trend: Optional[str] = None, damped_trend: bool = False, seasonal: Optional[str] = None, sp: int = 2, n_jobs: int = -1, random_seed: Union[int, float] = 0, **kwargs)[source]#

Holt-Winters Exponential Smoothing Forecaster.

Currently ExponentialSmoothingRegressor isn’t supported via conda install. It’s recommended that it be installed via PyPI.

Parameters
  • trend (str) – Type of trend component. Defaults to None.

  • damped_trend (bool) – If the trend component should be damped. Defaults to False.

  • seasonal (str) – Type of seasonal component. Takes one of {“additive”, None}. Can also be “multiplicative” if none of the target data is 0, but AutoMLSearch will not tune for this. Defaults to None.

  • sp (int) – The number of seasonal periods to consider. Defaults to 2.

  • n_jobs (int or None) – Non-negative integer describing level of parallelism used for pipelines. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{ “trend”: [None, “additive”], “damped_trend”: [True, False], “seasonal”: [None, “additive”], “sp”: Integer(2, 8),}

model_family

ModelFamily.EXPONENTIAL_SMOOTHING

modifies_features

True

modifies_target

False

name

Exponential Smoothing Regressor

supported_problem_types

[ProblemTypes.TIME_SERIES_REGRESSION]

training_only

False
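hyperparameter_ranges mixes categorical lists with an Integer range. To make the shape of that search space concrete, the sketch below draws random configurations from it using the standard library. It is a hypothetical stand-in for EvalML's tuners, which are not described in this section, and it assumes Integer(2, 8) denotes the inclusive integer range 2 to 8, as in scikit-optimize.

```python
import random
from typing import Any, Dict, List, Tuple, Union

# Categorical choices are lists; integer ranges are inclusive (low, high) tuples.
SearchSpace = Dict[str, Union[List[Any], Tuple[int, int]]]

EXPONENTIAL_SMOOTHING_SPACE: SearchSpace = {
    "trend": [None, "additive"],
    "damped_trend": [True, False],
    "seasonal": [None, "additive"],
    "sp": (2, 8),  # stands in for Integer(2, 8)
}


def sample_configuration(space: SearchSpace, rng: random.Random) -> Dict[str, Any]:
    config = {}
    for name, choices in space.items():
        if isinstance(choices, tuple):
            low, high = choices
            config[name] = rng.randint(low, high)  # randint includes both endpoints
        else:
            config[name] = rng.choice(choices)
    return config


rng = random.Random(0)
print(sample_configuration(EXPONENTIAL_SMOOTHING_SPACE, rng))
```

Multiplicative seasonality is absent from this space, consistent with the seasonal parameter note that AutoMLSearch does not tune for it.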

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns an array of zeros of length 1, as feature_importance is not defined for the Exponential Smoothing regressor.

fit

Fits Exponential Smoothing Regressor to data.

get_prediction_intervals

Find the prediction intervals using the fitted ExponentialSmoothingRegressor.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using fitted Exponential Smoothing regressor.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) pandas.Series#

Returns an array of zeros of length 1, as feature_importance is not defined for the Exponential Smoothing regressor.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)[source]#

Fits Exponential Smoothing Regressor to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features]. Ignored.

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If y was not passed in.

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series][source]#

Find the prediction intervals using the fitted ExponentialSmoothingRegressor.

Calculates the prediction intervals by using a simulation of the time series following a specified state space model.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Optional.

  • coverage (List[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Not used for Exponential Smoothing regressor.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict
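Simulation-based intervals can be illustrated with the simplest exponential smoothing state space model: additive errors with no trend or seasonality, often written ETS(A,N,N). The sketch below simulates many future paths and reads the interval bounds off the empirical quantiles at each step. The model choice, the Gaussian errors, and the names level, alpha, sigma, and n_paths are assumptions of this sketch, not details of ExponentialSmoothingRegressor.

```python
import random
from typing import Dict, List


def empirical_quantile(sorted_values: List[float], q: float) -> float:
    """Quantile by linear interpolation between neighboring order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def simulate_intervals(
    level: float, alpha: float, sigma: float, horizon: int,
    coverage: List[float], n_paths: int = 2000, seed: int = 0,
) -> Dict[str, List[float]]:
    rng = random.Random(seed)
    per_step: List[List[float]] = [[] for _ in range(horizon)]
    for _ in range(n_paths):
        current = level
        for t in range(horizon):
            error = rng.gauss(0.0, sigma)
            per_step[t].append(current + error)  # observation: y_t = l_{t-1} + e_t
            current += alpha * error             # state update: l_t = l_{t-1} + alpha * e_t
    for values in per_step:
        values.sort()
    result: Dict[str, List[float]] = {}
    for c in coverage:
        result[f"{c}_lower"] = [empirical_quantile(v, (1 - c) / 2) for v in per_step]
        result[f"{c}_upper"] = [empirical_quantile(v, (1 + c) / 2) for v in per_step]
    return result


bounds = simulate_intervals(level=100.0, alpha=0.3, sigma=2.0, horizon=4, coverage=[0.9])
```

Because each simulated error also moves the level, the spread of the simulated values, and therefore the interval width, grows with the forecast horizon.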

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) pandas.Series[source]#

Make predictions using fitted Exponential Smoothing regressor.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features]. Ignored except to set forecast horizon.

  • y (pd.Series) – Target data.

Returns

Predicted values.

Return type

pd.Series

predict_proba(self, X: pandas.DataFrame) pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.ExtraTreesClassifier(n_estimators=100, max_features='sqrt', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, n_jobs=-1, random_seed=0, **kwargs)[source]#

Extra Trees Classifier.

Parameters
  • n_estimators (int) – The number of trees in the forest. Defaults to 100.

  • max_features (int, float or {"sqrt", "log2"}) –

    The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.

    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None, then max_features = n_features.

    The search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features. Defaults to “sqrt”.

  • max_depth (int) – The maximum depth of the tree. Defaults to 6.

  • min_samples_split (int or float) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

    Defaults to 2.

  • min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.

  • n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processors. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
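The max_features and min_samples_split rules above can be made concrete with a small resolver. The sketch follows the listed rules and adds two guards that scikit-learn's tree estimators are understood to apply as well: a computed max_features is never below 1, and a fractional min_samples_split is never below 2. Treat those guards as assumptions; the function names are hypothetical.

```python
import math
from typing import Optional, Union


def resolve_max_features(max_features: Optional[Union[int, float, str]], n_features: int) -> int:
    """Number of features considered at each split."""
    if max_features is None:
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        # A fraction of the available features, truncated toward zero.
        return max(1, int(max_features * n_features))
    raise ValueError(f"unsupported max_features: {max_features!r}")


def resolve_min_samples_split(min_samples_split: Union[int, float], n_samples: int) -> int:
    """Minimum number of samples needed to split an internal node."""
    if isinstance(min_samples_split, int):
        return min_samples_split
    # A fraction of the training samples, rounded up.
    return max(2, math.ceil(min_samples_split * n_samples))


print(resolve_max_features("sqrt", 100), resolve_min_samples_split(0.25, 1000))  # 10 250
```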

Attributes

hyperparameter_ranges

{ “n_estimators”: Integer(10, 1000), “max_features”: [“sqrt”, “log2”], “max_depth”: Integer(4, 10),}

model_family

ModelFamily.EXTRA_TREES

modifies_features

True

modifies_target

False

name

Extra Trees Classifier

supported_problem_types

[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) pandas.Series#

Returns importance associated with each feature.

Returns

Importance associated with each feature.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series]#

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates their rolling standard deviation using a window size of 5. Each bound is then computed by offsetting the prediction by the rolling standard deviation multiplied by the percent point (quantile) function evaluated at that bound's tail probability.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) pandas.Series#

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.ExtraTreesRegressor(n_estimators: int = 100, max_features: str = 'sqrt', max_depth: int = 6, min_samples_split: int = 2, min_weight_fraction_leaf: float = 0.0, n_jobs: int = -1, random_seed: Union[int, float] = 0, **kwargs)[source]#

Extra Trees Regressor.

Parameters
  • n_estimators (int) – The number of trees in the forest. Defaults to 100.

  • max_features (int, float or {"sqrt", "log2"}) –

    The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.

    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None, then max_features = n_features.

    The search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features. Defaults to “sqrt”.

  • max_depth (int) – The maximum depth of the tree. Defaults to 6.

  • min_samples_split (int or float) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

    Defaults to 2.

  • min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.

  • n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processors. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{ “n_estimators”: Integer(10, 1000), “max_features”: [“sqrt”, “log2”], “max_depth”: Integer(4, 10),}

model_family

ModelFamily.EXTRA_TREES

modifies_features

True

modifies_target

False

name

Extra Trees Regressor

supported_problem_types

[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.MULTISERIES_TIME_SERIES_REGRESSION,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted ExtraTreesRegressor.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) pandas.Series#

Returns importance associated with each feature.

Returns

Importance associated with each feature.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series][source]#

Find the prediction intervals using the fitted ExtraTreesRegressor.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Optional.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) pandas.Series#

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.
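The save and load pair round-trips a component through a pickle file. The sketch below shows that round trip with a hypothetical ToyComponent class and the standard-library pickle module standing in for cloudpickle (the library named in the default argument); it illustrates the documented contract and is not EvalML's own code.

```python
import os
import pickle
import tempfile


class ToyComponent:
    """Hypothetical stand-in for a component; not part of EvalML."""

    def __init__(self, alpha=1.0):
        self.parameters = {"alpha": alpha}

    def save(self, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
        # Serialize the whole object using the requested pickle protocol.
        with open(file_path, "wb") as f:
            pickle.dump(self, f, protocol=pickle_protocol)

    @staticmethod
    def load(file_path):
        # Deserialize and return the stored object.
        with open(file_path, "rb") as f:
            return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), "component.pkl")
ToyComponent(alpha=0.5).save(path)
restored = ToyComponent.load(path)
print(restored.parameters)  # {'alpha': 0.5}
```

Choosing a lower pickle_protocol keeps the file readable by older Python versions; higher protocols are generally more compact and faster.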

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.
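update_parameters merges update_dict into the stored parameters and, when reset_fit is True, marks the component as unfitted. A minimal sketch of that behaviour, using a hypothetical ToyComponent whose _is_fitted flag mirrors the attribute named above; EvalML's actual method may do more than this.

```python
class ToyComponent:
    """Hypothetical component illustrating update_parameters; not EvalML's code."""

    def __init__(self, **parameters):
        self._parameters = dict(parameters)
        self._is_fitted = False

    def fit(self, X=None, y=None):
        self._is_fitted = True
        return self

    def update_parameters(self, update_dict, reset_fit=True):
        # Merge the new values into the existing parameter dictionary.
        self._parameters.update(update_dict)
        # Optionally mark the component as needing to be refit.
        if reset_fit:
            self._is_fitted = False


comp = ToyComponent(n_neighbors=5, p=2).fit()
comp.update_parameters({"n_neighbors": 7})
print(comp._parameters, comp._is_fitted)  # {'n_neighbors': 7, 'p': 2} False
```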

class evalml.pipelines.components.FeatureSelector(parameters=None, component_obj=None, random_seed=0, **kwargs)

Selects top features based on importance weights.

Parameters
  • parameters (dict) – Dictionary of parameters for the component. Defaults to None.

  • component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

modifies_features

True

modifies_target

False

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Fit and transform data using the feature selector.

get_names

Get names of selected features.

load

Loads component at file path.

name

Returns string name of this component.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.
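Assuming clone rebuilds the component from its stored parameters and random seed, as the description suggests, the copy starts out unfitted. The hypothetical ToyComponent below sketches that idea; whether EvalML carries over any fitted state is not stated here, and the sketch deliberately does not.

```python
class ToyComponent:
    """Hypothetical component; illustrates clone(), not EvalML's implementation."""

    def __init__(self, threshold=0.5, random_seed=0):
        self.parameters = {"threshold": threshold}
        self.random_seed = random_seed
        self.is_fitted = False

    def fit(self, X=None, y=None):
        self.is_fitted = True
        return self

    def clone(self):
        # Rebuild from the stored parameters and seed; fitted state is not copied.
        return self.__class__(**self.parameters, random_seed=self.random_seed)


original = ToyComponent(threshold=0.8, random_seed=42).fit()
cloned = original.clone()
print(cloned.parameters, cloned.random_seed, cloned.is_fitted)  # {'threshold': 0.8} 42 False
```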

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict
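One way to satisfy the convention Component.default_parameters == Component().parameters is to read the keyword defaults straight from __init__. The sketch below does this with inspect.signature on a hypothetical ToyComponent; it is written as a classmethod for simplicity and is not necessarily how EvalML computes the value.

```python
import inspect


class ToyComponent:
    """Hypothetical component; not EvalML's implementation."""

    def __init__(self, n_neighbors=5, weights="uniform", random_seed=0):
        self.random_seed = random_seed
        self.parameters = {"n_neighbors": n_neighbors, "weights": weights}

    @classmethod
    def default_parameters(cls):
        # Collect keyword defaults from __init__, skipping self and random_seed.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name not in ("self", "random_seed")
            and param.default is not inspect.Parameter.empty
        }


assert ToyComponent.default_parameters() == ToyComponent().parameters
print(ToyComponent.default_parameters())  # {'n_neighbors': 5, 'weights': 'uniform'}
```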

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features]

  • y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)

Fit and transform data using the feature selector.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

get_names(self)

Get names of selected features.

Returns

List of the names of features selected.

Return type

list[str]
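Selecting top features based on importance weights can be sketched in plain Python: rank features by importance, keep the n_features best, and report their names in original column order. The helpers select_top_features and transform are hypothetical, as is the fixed-count rule; a real selector may instead use a threshold or a percentage of features.

```python
def select_top_features(feature_names, importances, n_features):
    """Hypothetical helper: names of the n_features highest-importance features.

    The selected names are returned in their original column order.
    """
    ranked = sorted(range(len(feature_names)), key=lambda i: importances[i], reverse=True)
    keep = set(ranked[:n_features])
    return [name for i, name in enumerate(feature_names) if i in keep]


def transform(rows, selected_names):
    """Hypothetical helper: drop every column that was not selected."""
    return [{name: row[name] for name in selected_names} for row in rows]


names = ["age", "income", "zip_code", "clicks"]
importances = [0.10, 0.45, 0.05, 0.40]
selected = select_top_features(names, importances, n_features=2)
print(selected)  # ['income', 'clicks']
rows = [{"age": 31, "income": 52000, "zip_code": "02139", "clicks": 7}]
print(transform(rows, selected))  # [{'income': 52000, 'clicks': 7}]
```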

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property name(cls)

Returns string name of this component.

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Target data. Ignored.

Returns

Transformed X

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If the feature selector does not have a transform method or a component_obj that implements transform.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.Imputer(categorical_impute_strategy='most_frequent', categorical_fill_value=None, numeric_impute_strategy='mean', numeric_fill_value=None, boolean_impute_strategy='most_frequent', boolean_fill_value=None, random_seed=0, **kwargs)

Imputes missing data according to a specified imputation strategy.

Parameters
  • categorical_impute_strategy (string) – Impute strategy to use for string, object, boolean, categorical dtypes. Valid values include “most_frequent” and “constant”.

  • numeric_impute_strategy (string) – Impute strategy to use for numeric columns. Valid values include “mean”, “median”, “most_frequent”, “constant”, and “knn”.

  • boolean_impute_strategy (string) – Impute strategy to use for boolean columns. Valid values include “most_frequent” and “constant”.

  • categorical_fill_value (string) – When categorical_impute_strategy == “constant”, categorical_fill_value is used to replace missing data. The default value of None will fill with the string “missing_value”.

  • numeric_fill_value (int, float) – When numeric_impute_strategy == “constant”, numeric_fill_value is used to replace missing data. The default value of None will fill with 0.

  • boolean_fill_value (bool) – When boolean_impute_strategy == “constant”, boolean_fill_value is used to replace missing data. The default value of None will fill with True.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
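The per-strategy behaviour can be sketched on a single column held as a plain Python list. impute_column and DEFAULT_FILL are hypothetical names; the default fills (“missing_value”, 0, True) come from the parameter descriptions above, None and NaN are both treated as missing (as described for fit), and the “knn” strategy is left out. This is not EvalML's implementation, which works on DataFrame columns grouped by dtype.

```python
import math
import statistics

# Documented defaults used when strategy is "constant" and fill_value is None.
DEFAULT_FILL = {"categorical": "missing_value", "numeric": 0, "boolean": True}


def is_missing(value):
    # None and float NaN are both treated as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_column(values, strategy, kind, fill_value=None):
    """Hypothetical helper: impute one column given as a plain Python list."""
    observed = [v for v in values if not is_missing(v)]
    if strategy == "mean":
        replacement = statistics.mean(observed)
    elif strategy == "median":
        replacement = statistics.median(observed)
    elif strategy == "most_frequent":
        replacement = statistics.mode(observed)
    elif strategy == "constant":
        replacement = DEFAULT_FILL[kind] if fill_value is None else fill_value
    else:
        raise ValueError(f"Unsupported strategy: {strategy!r}")
    return [replacement if is_missing(v) else v for v in values]


print(impute_column([1.0, None, 3.0, float("nan")], "mean", "numeric"))
# [1.0, 2.0, 3.0, 2.0]
print(impute_column(["red", None, "blue", "red"], "most_frequent", "categorical"))
# ['red', 'red', 'blue', 'red']
print(impute_column([True, None, False], "constant", "boolean"))
# [True, True, False]
```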

Attributes

hyperparameter_ranges

{ “categorical_impute_strategy”: [“most_frequent”], “numeric_impute_strategy”: [“mean”, “median”, “most_frequent”, “knn”], “boolean_impute_strategy”: [“most_frequent”]}

modifies_features

True

modifies_target

False

name

Imputer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits imputer to data. 'None' values are converted to np.nan before imputation and are treated as the same.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data X by imputing missing values.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.

Parameters
  • X (pd.DataFrame, np.ndarray) – The input training data of shape [n_samples, n_features]

  • y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data X by imputing missing values.

Parameters
  • X (pd.DataFrame) – Data to transform

  • y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, random_seed=0, **kwargs)

K-Nearest Neighbors Classifier.

Parameters
  • n_neighbors (int) – Number of neighbors to use by default. Defaults to 5.

  • weights ({‘uniform’, ‘distance’} or callable) –

    Weight function used in prediction. Can be:

    • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.

    • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors that are further away.

    • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

    Defaults to “uniform”.

  • algorithm ({‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}) –

    Algorithm used to compute the nearest neighbors:

    • ‘ball_tree’ will use BallTree

    • ‘kd_tree’ will use KDTree

    • ‘brute’ will use a brute-force search.

    ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to the fit method. Defaults to “auto”. Note: fitting on sparse input will override the setting of this parameter, using brute force.

  • leaf_size (int) – Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30.

  • p (int) – Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Defaults to 2.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
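The weights and p options combine as follows in a brute-force nearest-neighbor vote. This is a hypothetical pure-Python sketch (knn_predict and minkowski are illustrative names), not the tree-based search that algorithm can select; it omits callable weights, and letting an exact match (distance 0) dominate the distance-weighted vote is an assumption of the sketch.

```python
import math
from collections import defaultdict


def minkowski(a, b, p=2):
    # p=1 gives Manhattan (l1) distance; p=2 gives Euclidean (l2) distance.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(X_train, y_train, query, n_neighbors=5, weights="uniform", p=2):
    """Hypothetical brute-force KNN classifier for a single query point."""
    # Distance from the query to every training point, nearest first.
    neighbors = sorted(
        (minkowski(x, query, p), label) for x, label in zip(X_train, y_train)
    )[:n_neighbors]
    votes = defaultdict(float)
    for dist, label in neighbors:
        if weights == "uniform":
            votes[label] += 1.0  # every neighbor counts equally
        elif weights == "distance":
            # Inverse-distance weight; an exact match (dist == 0) dominates.
            votes[label] += math.inf if dist == 0 else 1.0 / dist
        else:
            raise ValueError(f"Unsupported weights: {weights!r}")
    return max(votes, key=votes.get)


X_train = [[0.1], [2.0], [2.1]]
y_train = ["near", "far", "far"]
print(knn_predict(X_train, y_train, [0.0], n_neighbors=3))  # far
print(knn_predict(X_train, y_train, [0.0], n_neighbors=3, weights="distance"))  # near
```

With uniform weights the two distant "far" points outvote the single close "near" point; inverse-distance weighting reverses the outcome.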

Attributes

hyperparameter_ranges

{ “n_neighbors”: Integer(2, 12), “weights”: [“uniform”, “distance”], “algorithm”: [“auto”, “ball_tree”, “kd_tree”, “brute”], “leaf_size”: Integer(10, 30), “p”: Integer(1, 5),}

model_family

ModelFamily.K_NEIGHBORS

modifies_features

True

modifies_target

False

name

KNN Classifier

supported_problem_types

[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns an array of 0's, one per input feature, because feature_importance is not defined for KNN classifiers.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)

Returns an array of 0’s, one per input feature, because feature_importance is not defined for KNN classifiers.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, with keys in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) → pandas.Series

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

predict_proba(self, X: pandas.DataFrame) → pandas.Series

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.LabelEncoder(positive_label=None, random_seed=0, **kwargs)

A transformer that encodes target labels using values between 0 and num_classes - 1.

Parameters
  • positive_label (int, str) – The label for the class that should be treated as positive (1) for binary classification problems. Ignored for multiclass problems. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0. Ignored.
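A minimal sketch of encoding, decoding, and positive_label handling, using the hypothetical helpers fit_label_encoder, encode, and decode. Ordering classes by sorting is an assumption of this sketch, as is applying positive_label only when exactly two classes are present; the ValueError on a missing y mirrors the Raises entries below.

```python
def fit_label_encoder(y, positive_label=None):
    """Hypothetical helper: map each class label to an integer in [0, num_classes - 1]."""
    if y is None:
        raise ValueError("y cannot be None")
    classes = sorted(set(y))  # assumed ordering: sorted class labels
    if positive_label is not None and len(classes) == 2:
        if positive_label not in classes:
            raise ValueError(f"positive_label {positive_label!r} is not in y")
        # Put the positive class last so that it is encoded as 1.
        negative = next(c for c in classes if c != positive_label)
        classes = [negative, positive_label]
    return {label: code for code, label in enumerate(classes)}


def encode(y, mapping):
    return [mapping[label] for label in y]


def decode(y_encoded, mapping):
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[code] for code in y_encoded]


y = ["spam", "ham", "spam"]
mapping = fit_label_encoder(y, positive_label="ham")
print(mapping)  # {'spam': 0, 'ham': 1}
encoded = encode(y, mapping)
print(encoded)  # [0, 1, 0]
print(decode(encoded, mapping))  # ['spam', 'ham', 'spam']
```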

Attributes

hyperparameter_ranges

{}

modifies_features

False

modifies_target

True

name

Label Encoder

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the label encoder.

fit_transform

Fit and transform data using the label encoder.

inverse_transform

Decodes the target data.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transform the target using the fitted label encoder.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)

Fits the label encoder.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features]. Ignored.

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If input y is None.

fit_transform(self, X, y)

Fit and transform data using the label encoder.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

The original features and an encoded version of the target.

Return type

pd.DataFrame, pd.Series

inverse_transform(self, y)

Decodes the target data.

Parameters

y (pd.Series) – Target data.

Returns

The decoded version of the target.

Return type

pd.Series

Raises

ValueError – If input y is None.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transform the target using the fitted label encoder.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features]. Ignored.

  • y (pd.Series) – The target training data of length [n_samples].

Returns

The original features and an encoded version of the target.

Return type

pd.DataFrame, pd.Series

Raises

ValueError – If input y is None.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.LightGBMClassifier(boosting_type='gbdt', learning_rate=0.1, n_estimators=100, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=-1, random_seed=0, **kwargs)

LightGBM Classifier.

Parameters
  • boosting_type (string) –

    Type of boosting to use. Defaults to “gbdt”.

    • “gbdt” uses traditional Gradient Boosting Decision Tree.

    • “dart” uses Dropouts meet Multiple Additive Regression Trees.

    • “goss” uses Gradient-based One-Side Sampling.

    • “rf” uses Random Forest.

  • learning_rate (float) – Boosting learning rate. Defaults to 0.1.

  • n_estimators (int) – Number of boosted trees to fit. Defaults to 100.

  • max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.

  • num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.

  • min_child_samples (int) – Minimum number of samples needed in a child (leaf). Defaults to 20.

  • bagging_fraction (float) – If this is smaller than 1.0, LightGBM will randomly select a subset of the training data (rows), without resampling, for bagging. For example, if set to 0.8, LightGBM will sample 80% of the rows. Bagging only takes effect when bagging_freq is greater than 0. This can be used to speed up training and deal with overfitting. Defaults to 0.9.

  • bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled; k means bagging is performed at every k-th iteration. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100% of the data to use for the next k iterations. Defaults to 0.

  • n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
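The interplay of bagging_fraction and bagging_freq can be sketched as a schedule of row subsets, one per boosting iteration. bagging_schedule is a hypothetical helper that only mirrors the documented rule (draw bagging_fraction of the rows every bagging_freq iterations, reuse them in between, and use all rows when bagging_freq is 0); LightGBM's internal sampling is not reproduced.

```python
import random


def bagging_schedule(n_rows, n_iterations, bagging_fraction=0.9, bagging_freq=0, seed=0):
    """Hypothetical sketch of row subsampling; not LightGBM's internal code.

    Returns, for each boosting iteration, the sorted row indices used to train it.
    """
    rng = random.Random(seed)
    all_rows = list(range(n_rows))
    if bagging_freq <= 0 or bagging_fraction >= 1.0:
        # Bagging disabled: every iteration sees all rows.
        return [all_rows] * n_iterations
    sample_size = max(1, int(bagging_fraction * n_rows))
    schedule = []
    current = all_rows
    for iteration in range(n_iterations):
        if iteration % bagging_freq == 0:
            # Draw a new subset without replacement every bagging_freq iterations.
            current = sorted(rng.sample(all_rows, sample_size))
        schedule.append(current)
    return schedule


schedule = bagging_schedule(n_rows=10, n_iterations=4, bagging_fraction=0.8, bagging_freq=2)
print([len(rows) for rows in schedule])  # [8, 8, 8, 8]
print(schedule[0] == schedule[1], schedule[2] == schedule[3])  # True True
```

With the defaults above (bagging_fraction=0.9, bagging_freq=0) the sketch returns all rows for every iteration, matching the note that bagging is disabled unless bagging_freq is positive.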

Attributes

hyperparameter_ranges

{ “learning_rate”: Real(0.000001, 1), “boosting_type”: [“gbdt”, “dart”, “goss”, “rf”], “n_estimators”: Integer(10, 100), “max_depth”: Integer(0, 10), “num_leaves”: Integer(2, 100), “min_child_samples”: Integer(1, 100), “bagging_fraction”: Real(0.000001, 1), “bagging_freq”: Integer(0, 1),}

model_family

ModelFamily.LIGHTGBM

modifies_features

True

modifies_target

False

name

LightGBM Classifier

SEED_MAX

SEED_BOUNDS.max_bound

SEED_MIN

0

supported_problem_types

[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature.

fit

Fits LightGBM classifier component to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using the fitted LightGBM classifier.

predict_proba

Make prediction probabilities using the fitted LightGBM classifier.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) → pandas.Series

Returns importance associated with each feature.

Returns

Importance associated with each feature.

Return type

np.ndarray

Raises

MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.

fit(self, X, y=None)

Fits LightGBM classifier component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, with keys in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X)

Make predictions using the fitted LightGBM classifier.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.DataFrame

predict_proba(self, X)

Make prediction probabilities using the fitted LightGBM classifier.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted probability values.

Return type

pd.DataFrame

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.LightGBMRegressor(boosting_type='gbdt', learning_rate=0.1, n_estimators=20, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=-1, random_seed=0, **kwargs)

LightGBM Regressor.

Parameters
  • boosting_type (string) –

    Type of boosting to use. Defaults to “gbdt”.

    • “gbdt” uses traditional Gradient Boosting Decision Tree.

    • “dart” uses Dropouts meet Multiple Additive Regression Trees.

    • “goss” uses Gradient-based One-Side Sampling.

    • “rf” uses Random Forest.

  • learning_rate (float) – Boosting learning rate. Defaults to 0.1.

  • n_estimators (int) – Number of boosted trees to fit. Defaults to 20.

  • max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.

  • num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.

  • min_child_samples (int) – Minimum number of samples needed in a child (leaf). Defaults to 20.

  • bagging_fraction (float) – If this is smaller than 1.0, LightGBM will randomly select a subset of the training data (rows), without resampling, for bagging. For example, if set to 0.8, LightGBM will sample 80% of the rows. Bagging only takes effect when bagging_freq is greater than 0. This can be used to speed up training and deal with overfitting. Defaults to 0.9.

  • bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled; k means bagging is performed at every k-th iteration. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100% of the data to use for the next k iterations. Defaults to 0.

  • n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{ “learning_rate”: Real(0.000001, 1), “boosting_type”: [“gbdt”, “dart”, “goss”, “rf”], “n_estimators”: Integer(10, 100), “max_depth”: Integer(0, 10), “num_leaves”: Integer(2, 100), “min_child_samples”: Integer(1, 100), “bagging_fraction”: Real(0.000001, 1), “bagging_freq”: Integer(0, 1),}

model_family

ModelFamily.LIGHTGBM

modifies_features

True

modifies_target

False

name

LightGBM Regressor

SEED_MAX

SEED_BOUNDS.max_bound

SEED_MIN

0

supported_problem_types

[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature.

fit

Fits LightGBM regressor to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using fitted LightGBM regressor.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict
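The convention Component.default_parameters == Component().parameters can be illustrated with a small, self-contained sketch. ToyComponent and the classproperty descriptor below are hypothetical stand-ins written for this example (the convention compares the class attribute without calling it, so the sketch uses a class-level property); they are not EvalML's own classes.

```python
import inspect


class classproperty:
    """Minimal descriptor so the attribute can be read on the class itself."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        return self.func(owner)


class ToyComponent:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, alpha=0.5, n_iter=10, random_seed=0):
        self._parameters = {"alpha": alpha, "n_iter": n_iter}
        self.random_seed = random_seed

    @property
    def parameters(self):
        return dict(self._parameters)

    @classproperty
    def default_parameters(cls):
        # Read defaults from the __init__ signature, skipping self and random_seed.
        sig = inspect.signature(cls.__init__)
        return {
            name: p.default
            for name, p in sig.parameters.items()
            if name not in ("self", "random_seed") and p.default is not inspect.Parameter.empty
        }


print(ToyComponent.default_parameters)  # {'alpha': 0.5, 'n_iter': 10}
```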

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) -> pandas.Series

Returns importance associated with each feature.

Returns

Importance associated with each feature.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.

fit(self, X, y=None)

Fits LightGBM regressor to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates their rolling standard deviation using a window size of 5. Each bound is the prediction plus the rolling standard deviation multiplied by the percent point (quantile) function evaluated at that bound's lower-tail probability, so the lower bound falls below the prediction and the upper bound above it.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of coverage levels, each a float between 0 and 1, for which the lower and upper bounds of the prediction interval are calculated.

  • predictions (pd.Series) – Optional precomputed predictions to use. If None, predictions are generated from X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
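The rolling-standard-deviation interval described above can be sketched in plain Python. This is an illustration of the described procedure rather than EvalML's code. It makes three assumptions: the quantile function is the standard normal one (statistics.NormalDist), the rolling statistic is the sample standard deviation, and the first window - 1 positions, which lack a full window, are back-filled. The function name prediction_intervals is hypothetical.

```python
from statistics import NormalDist, stdev


def prediction_intervals(predictions, coverage=(0.95,), window=5):
    """Rolling-std prediction intervals (hypothetical sketch).

    Assumes a normal quantile function, sample standard deviation, and
    back-filling of the first window - 1 positions.
    """
    n = len(predictions)
    if n < window:
        raise ValueError("need at least `window` predictions")
    rolling = [None] * n
    for i in range(window - 1, n):
        rolling[i] = stdev(predictions[i - window + 1 : i + 1])
    # Back-fill the positions that do not yet have a full window.
    for i in range(window - 1):
        rolling[i] = rolling[window - 1]

    result = {}
    for c in coverage:
        # Quantile at the upper bound's lower-tail probability (1 + c) / 2;
        # by symmetry the lower bound uses the negated value.
        z = NormalDist().inv_cdf((1 + c) / 2)
        result[f"{c}_lower"] = [p - z * s for p, s in zip(predictions, rolling)]
        result[f"{c}_upper"] = [p + z * s for p, s in zip(predictions, rolling)]
    return result


intervals = prediction_intervals([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], coverage=[0.95])
print(sorted(intervals))  # ['0.95_lower', '0.95_upper']
```

The keys follow the documented {coverage}_lower / {coverage}_upper format; a constant prediction series has zero rolling spread, so its bounds collapse onto the predictions.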

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X)

Make predictions using fitted LightGBM regressor.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

predict_proba(self, X: pandas.DataFrame) -> pandas.Series

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.
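The behaviour of clone and update_parameters described above can be sketched with a toy class. ToyComponent is hypothetical and only mirrors the documented contract: clone returns a new object with the same parameters and random seed, and update_parameters clears the fitted flag unless reset_fit=False. Whether EvalML's own clone carries over fitted state is not stated here; this sketch returns an unfitted copy.

```python
class ToyComponent:
    """Hypothetical component illustrating clone() and update_parameters()."""

    def __init__(self, alpha=0.5, random_seed=0):
        self._parameters = {"alpha": alpha}
        self.random_seed = random_seed
        self._is_fitted = False

    @property
    def parameters(self):
        return dict(self._parameters)

    def fit(self, X=None, y=None):
        self._is_fitted = True
        return self

    def clone(self):
        # Same parameters and random seed, built as a fresh, unfitted object.
        return type(self)(**self._parameters, random_seed=self.random_seed)

    def update_parameters(self, update_dict, reset_fit=True):
        self._parameters.update(update_dict)
        if reset_fit:
            # Fitted state is stale once parameters change.
            self._is_fitted = False


c = ToyComponent(alpha=0.1, random_seed=7).fit()
copy = c.clone()
c.update_parameters({"alpha": 0.2})
print(copy.parameters, c.parameters, c._is_fitted)  # {'alpha': 0.1} {'alpha': 0.2} False
```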

class evalml.pipelines.components.LinearDiscriminantAnalysis(n_components=None, random_seed=0, **kwargs)

Reduces the number of features by using Linear Discriminant Analysis.

Parameters
  • n_components (int) – The number of features to maintain after computation. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
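As background on why LDA reduces dimensionality: it projects onto at most n_classes - 1 directions that separate the class means relative to the within-class spread. The sketch below computes the single Fisher discriminant direction for a two-class problem with numpy. The function fisher_lda_direction is hypothetical and does not reproduce the component's implementation; it only shows the underlying idea.

```python
import numpy as np


def fisher_lda_direction(X, y):
    """Two-class Fisher discriminant direction (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    X0, X1 = X[y == classes[0]], X[y == classes[1]]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
w = fisher_lda_direction(X, y)
projected = X @ w  # shape (100,): one LDA component for two classes
```

Three input features become one output feature here, which is the reduction that n_components controls in the multi-class case.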

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Linear Discriminant Analysis Transformer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the LDA component.

fit_transform

Fit and transform data using the LDA component.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transform data using the fitted LDA component.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)

Fits the LDA component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples]. Required, since LDA is a supervised method.

Returns

self

Raises

ValueError – If input data is not all numeric.

fit_transform(self, X, y=None)

Fit and transform data using the LDA component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

Raises

ValueError – If input data is not all numeric.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transform data using the fitted LDA component.

Parameters
  • X (pd.DataFrame) – The input data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

Raises

ValueError – If input data is not all numeric.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.LinearRegressor(fit_intercept=True, n_jobs=-1, random_seed=0, **kwargs)

Linear Regressor.

Parameters
  • fit_intercept (bool) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e., the data is expected to be centered). Defaults to True.

  • n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all threads. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
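The effect of fit_intercept can be seen with ordinary least squares on data generated from y = 2x + 3. This sketch does not call EvalML. It uses numpy's least-squares solver directly, and the helper fit_linear is hypothetical. With fit_intercept=False the line is forced through the origin, so on uncentered data the slope absorbs the missing offset.

```python
import numpy as np


def fit_linear(X, y, fit_intercept=True):
    """Ordinary least squares via numpy; returns (coef, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if fit_intercept:
        # Append a column of ones so the solver also estimates an intercept.
        design = np.column_stack([X, np.ones(len(X))])
        solution, *_ = np.linalg.lstsq(design, y, rcond=None)
        return solution[:-1], solution[-1]
    solution, *_ = np.linalg.lstsq(X, y, rcond=None)
    return solution, 0.0


X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0] + 3.0
coef, intercept = fit_linear(X, y)  # coef close to [2.0], intercept close to 3.0
coef_no, _ = fit_linear(X, y, fit_intercept=False)  # slope biased upward
```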

Attributes

hyperparameter_ranges

{ “fit_intercept”: [True, False],}

model_family

ModelFamily.LINEAR_MODEL

modifies_features

True

modifies_target

False

name

Linear Regressor

supported_problem_types

[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.MULTISERIES_TIME_SERIES_REGRESSION,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Feature importance for fitted linear regressor.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)

Feature importance for fitted linear regressor.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates their rolling standard deviation using a window size of 5. Each bound is the prediction plus the rolling standard deviation multiplied by the percent point (quantile) function evaluated at that bound's lower-tail probability, so the lower bound falls below the prediction and the upper bound above it.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of coverage levels, each a float between 0 and 1, for which the lower and upper bounds of the prediction interval are calculated.

  • predictions (pd.Series) – Optional precomputed predictions to use. If None, predictions are generated from X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) -> pandas.Series

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) -> pandas.Series

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.LogisticRegressionClassifier(penalty='l2', C=1.0, multi_class='auto', solver='lbfgs', n_jobs=-1, random_seed=0, **kwargs)

Logistic Regression Classifier.

Parameters
  • penalty ({"l1", "l2", "elasticnet", "none"}) – The norm used in penalization. Defaults to “l2”.

  • C (float) – Inverse of regularization strength. Must be a positive float. Defaults to 1.0.

  • multi_class ({"auto", "ovr", "multinomial"}) – If the option chosen is “ovr”, then a binary problem is fit for each label. For “multinomial” the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. “multinomial” is unavailable when solver=“liblinear”. “auto” selects “ovr” if the data is binary, or if solver=“liblinear”, and otherwise selects “multinomial”. Defaults to “auto”.

  • solver ({"newton-cg", "lbfgs", "liblinear", "sag", "saga"}) –

    Algorithm to use in the optimization problem. For small datasets, “liblinear” is a good choice, whereas “sag” and “saga” are faster for large ones. For multiclass problems, only “newton-cg”, “sag”, “saga” and “lbfgs” handle multinomial loss; “liblinear” is limited to one-versus-rest schemes.

    • “newton-cg”, “lbfgs”, “sag” and “saga” handle L2 or no penalty

    • “liblinear” and “saga” also handle L1 penalty

    • “saga” also supports “elasticnet” penalty

    • “liblinear” does not support setting penalty=“none”

    Defaults to “lbfgs”.

  • n_jobs (int) – Number of parallel jobs used to fit the model. -1 uses all available processors. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
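The solver, penalty and multi_class rules listed above can be encoded as a small lookup to check a configuration before training. SUPPORTED_PENALTIES, check_solver_penalty and resolve_multi_class are hypothetical helpers transcribed from this documentation only; EvalML and scikit-learn perform their own validation. Recent scikit-learn releases have also changed how penalty “none” and multi_class are specified, so treat these rules as describing this page rather than every library version.

```python
# Compatibility rules transcribed from the solver notes above.
SUPPORTED_PENALTIES = {
    "newton-cg": {"l2", "none"},
    "lbfgs": {"l2", "none"},
    "sag": {"l2", "none"},
    "saga": {"l1", "l2", "elasticnet", "none"},
    "liblinear": {"l1", "l2"},
}


def check_solver_penalty(solver, penalty):
    """Raise ValueError if the documented rules rule out this combination."""
    allowed = SUPPORTED_PENALTIES.get(solver)
    if allowed is None:
        raise ValueError(f"unknown solver: {solver!r}")
    if penalty not in allowed:
        raise ValueError(f"solver {solver!r} does not support penalty {penalty!r}")


def resolve_multi_class(multi_class, solver, n_classes):
    """Apply the documented meaning of multi_class='auto'."""
    if multi_class != "auto":
        if multi_class == "multinomial" and solver == "liblinear":
            raise ValueError("'multinomial' is unavailable with solver='liblinear'")
        return multi_class
    if n_classes <= 2 or solver == "liblinear":
        return "ovr"
    return "multinomial"


check_solver_penalty("saga", "elasticnet")  # passes silently
print(resolve_multi_class("auto", "lbfgs", n_classes=3))  # multinomial
```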

Attributes

hyperparameter_ranges

{ “penalty”: [“l2”], “C”: Real(0.01, 10),}

model_family

ModelFamily.LINEAR_MODEL

modifies_features

True

modifies_target

False

name

Logistic Regression Classifier

supported_problem_types

[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Feature importance for fitted logistic regression classifier.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)

Feature importance for fitted logistic regression classifier.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates their rolling standard deviation using a window size of 5. Each bound is the prediction plus the rolling standard deviation multiplied by the percent point (quantile) function evaluated at that bound's lower-tail probability, so the lower bound falls below the prediction and the upper bound above it.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of coverage levels, each a float between 0 and 1, for which the lower and upper bounds of the prediction interval are calculated.

  • predictions (pd.Series) – Optional precomputed predictions to use. If None, predictions are generated from X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) -> pandas.Series

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) -> pandas.Series

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.LogTransformer(random_seed=0)

Applies a log transformation to the target data.

Attributes

hyperparameter_ranges

{}

modifies_features

False

modifies_target

True

name

Log Transformer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the LogTransformer.

fit_transform

Log transforms the target variable.

inverse_transform

Apply exponential to target data.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Log transforms the target variable.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the LogTransformer.

Parameters
  • X (pd.DataFrame or np.ndarray) – Ignored.

  • y (pd.Series, optional) – Ignored.

Returns

self

fit_transform(self, X, y=None)

Log transforms the target variable.

Parameters
  • X (pd.DataFrame, optional) – Ignored.

  • y (pd.Series) – Target variable to log transform.

Returns

The input features are returned without modification, and the target variable y is log transformed.

Return type

tuple of pd.DataFrame, pd.Series

inverse_transform(self, y)

Apply exponential to target data.

Parameters

y (pd.Series) – Target variable.

Returns

Target with exponential applied.

Return type

pd.Series
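The transform / inverse_transform pair amounts to a natural log followed by the exponential, with the features passed through untouched. The sketch below shows that round trip with numpy. log_transform and inverse_log_transform are hypothetical helpers, not EvalML's code. How the component handles zero or negative targets is not stated in this section, so the sketch simply rejects them.

```python
import numpy as np


def log_transform(X, y):
    """Return X untouched and the natural log of y (assumes y > 0)."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        # Assumption of this sketch: non-positive targets are not handled.
        raise ValueError("this sketch requires a strictly positive target")
    return X, np.log(y)


def inverse_log_transform(y_t):
    """Undo log_transform by applying the exponential."""
    return np.exp(np.asarray(y_t, dtype=float))


X = [[1], [2], [3]]
y = [1.0, 10.0, 100.0]
X_out, y_log = log_transform(X, y)
y_back = inverse_log_transform(y_log)  # close to [1.0, 10.0, 100.0]
```

Predictions made on the log scale must go through the inverse before being compared with the original target.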

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Log transforms the target variable.

Parameters
  • X (pd.DataFrame, optional) – Ignored.

  • y (pd.Series) – Target data to log transform.

Returns

The input features are returned without modification, and the target variable y is log transformed.

Return type

tuple of pd.DataFrame, pd.Series

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.LSA(random_seed=0, **kwargs)

Transformer to calculate the Latent Semantic Analysis values of text input.

Parameters

random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

LSA Transformer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the input data.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data X by applying the LSA pipeline.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the input data.

Parameters
  • X (pd.DataFrame) – The data to fit on.

  • y (pd.Series, optional) – Ignored.

Returns

self

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data X by applying the LSA pipeline.

Parameters
  • X (pd.DataFrame) – The data to transform.

  • y (pd.Series, optional) – Ignored.

Returns

Transformed X. The original column is removed and replaced with two columns of the format LSA(original_column_name)[feature_number], where feature_number is 0 or 1.

Return type

pd.DataFrame
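To show what the two LSA(original_column_name)[feature_number] columns represent, the sketch below builds a term-count matrix for one text column and keeps the top two singular directions from numpy's SVD. This is an assumption-laden stand-in: the component's actual text pipeline (tokenization, weighting, SVD variant) is not described in this section, and lsa_features is a hypothetical helper. Only the output shape and column naming follow the documentation above.

```python
import numpy as np


def lsa_features(texts, column_name, n_components=2):
    """Map each document to n_components latent-semantic values.

    Sketch only: raw term counts and numpy's SVD stand in for the
    component's actual text pipeline.
    """
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    counts = np.zeros((len(texts), len(vocab)))
    for i, doc in enumerate(tokenized):
        for tok in doc:
            counts[i, index[tok]] += 1
    # Truncated SVD: keep the top n_components singular directions.
    U, S, _ = np.linalg.svd(counts, full_matrices=False)
    values = U[:, :n_components] * S[:n_components]
    names = [f"LSA({column_name})[{k}]" for k in range(n_components)]
    return names, values


names, values = lsa_features(
    ["the cat sat", "the dog sat", "stocks fell sharply", "stocks rose"], "review"
)
print(names)  # ['LSA(review)[0]', 'LSA(review)[1]']
```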

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.MultiseriesTimeSeriesBaselineRegressor(gap=1, forecast_horizon=1, random_seed=0, **kwargs)

Multiseries time series regressor that predicts using the naive forecasting approach.

This is useful as a simple baseline estimator for multiseries time series problems.

Parameters
  • gap (int) – Gap between prediction date and target date; must be a non-negative integer. If gap is 0, target date will be shifted ahead by 1 time period. Defaults to 1.

  • forecast_horizon (int) – Number of time steps the model is expected to predict. Defaults to 1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
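Naive forecasting means each series is predicted by its own earlier observed value. The sketch below applies that idea independently to several series with plain Python. Both naive_multiseries_forecast and its delay parameter are hypothetical: exactly how the component derives the lag from gap and forecast_horizon is not stated in this section, so the lag is passed in explicitly here.

```python
def naive_multiseries_forecast(history, delay):
    """Forecast every series with its own value from `delay` steps earlier.

    history maps a series id to its observed target values in time order.
    Positions without enough history get None.
    """
    if delay < 1:
        raise ValueError("delay must be at least 1")
    return {
        series_id: [values[i - delay] if i >= delay else None for i in range(len(values))]
        for series_id, values in history.items()
    }


history = {"store_a": [10, 12, 11, 13], "store_b": [5, 6, 7, 8]}
preds = naive_multiseries_forecast(history, delay=2)
print(preds)  # {'store_a': [None, None, 10, 12], 'store_b': [None, None, 5, 6]}
```

Because the baseline only shifts the target, it makes a useful floor: a trained multiseries model that cannot beat it is not learning anything from the features.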

Attributes

hyperparameter_ranges

{}

model_family

ModelFamily.BASELINE

modifies_features

True

modifies_target

False

name

Multiseries Time Series Baseline Regressor

supported_problem_types

[ ProblemTypes.MULTISERIES_TIME_SERIES_REGRESSION,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature.

fit

Fits multiseries time series baseline regressor to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using fitted multiseries time series baseline regressor.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)

Returns importance associated with each feature.

Since baseline estimators do not use input features to calculate predictions, returns an array of zeroes.

Returns

An array of zeroes.

Return type

np.ndarray (float)

fit(self, X, y=None)

Fits multiseries time series baseline regressor to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features * n_series].

  • y (pd.DataFrame) – The target training data of shape [n_samples, n_series], with one column per series.

Returns

self

Raises

ValueError – If input y is None or if y is not a DataFrame with multiple columns.

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series]#

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
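The interval calculation described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not EvalML's implementation. It assumes a standard normal distribution (via `statistics.NormalDist`), a trailing window of 5 in which windows with fewer than two points get a standard deviation of 0, and that each bound is the prediction plus the quantile times the rolling standard deviation. The helper names `rolling_std` and `prediction_intervals` are hypothetical.

```python
from statistics import NormalDist, stdev


def rolling_std(values, window=5):
    """Trailing rolling standard deviation (assumption: fewer than 2 points -> 0.0)."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        result.append(stdev(chunk) if len(chunk) >= 2 else 0.0)
    return result


def prediction_intervals(predictions, coverage=(0.95,), window=5):
    """Hypothetical helper returning {"<coverage>_lower": [...], "<coverage>_upper": [...]}."""
    std = rolling_std(predictions, window)
    normal = NormalDist()  # standard normal: mean 0, sigma 1
    intervals = {}
    for c in coverage:
        lower_tail = (1 - c) / 2  # lower-tail probability at the lower bound
        upper_tail = 1 - lower_tail  # lower-tail probability at the upper bound
        z_lo, z_hi = normal.inv_cdf(lower_tail), normal.inv_cdf(upper_tail)
        intervals[f"{c}_lower"] = [p + z_lo * s for p, s in zip(predictions, std)]
        intervals[f"{c}_upper"] = [p + z_hi * s for p, s in zip(predictions, std)]
    return intervals


bounds = prediction_intervals([10.0, 12.0, 11.0, 13.0, 12.5, 14.0], coverage=[0.75, 0.95])
print(sorted(bounds))  # ['0.75_lower', '0.75_upper', '0.95_lower', '0.95_upper']
```

The keys follow the {coverage}_lower / {coverage}_upper format listed under Returns; a wider coverage yields a wider band around each prediction.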

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals; keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X)[source]#

Make predictions using fitted multiseries time series baseline regressor.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.DataFrame

Raises

ValueError – If the lagged columns are not present in X.

predict_proba(self, X: pandas.DataFrame) pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.
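The clone, default_parameters, and update_parameters conventions described above can be illustrated with a small stand-in class. This is a minimal sketch: `ToyComponent` and its `window` parameter are hypothetical and only mirror the documented behavior (default_parameters equals the parameters of a default-constructed instance, clone returns an unfitted copy with the same parameters and seed, and reset_fit clears the fitted flag); it is not EvalML's ComponentBase.

```python
import copy


class ToyComponent:
    """Hypothetical stand-in for a component; not an EvalML class."""

    def __init__(self, window=5, random_seed=0):
        self._parameters = {"window": window}
        self.random_seed = random_seed
        self._is_fitted = False

    @classmethod
    def default_parameters(cls):
        # Documented convention: default_parameters == Component().parameters
        return cls().parameters

    @property
    def parameters(self):
        return copy.copy(self._parameters)

    def fit(self, X, y=None):
        self._is_fitted = True
        return self

    def clone(self):
        # Same parameters and random seed, but a fresh, unfitted instance
        return type(self)(random_seed=self.random_seed, **self._parameters)

    def update_parameters(self, update_dict, reset_fit=True):
        self._parameters.update(update_dict)
        if reset_fit:
            self._is_fitted = False  # the component must be refit before predicting
```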

class evalml.pipelines.components.NaturalLanguageFeaturizer(random_seed=0, **kwargs)[source]#

Transformer that can automatically featurize text columns using featuretools’ nlp_primitives.

Because most models cannot handle non-numeric data, text must be broken down into features that provide useful information about it. This component splits each text column into several informative features: Diversity Score, Mean Characters per Word, Polarity Score, LSA (Latent Semantic Analysis), Number of Characters, and Number of Words. Calling transform on this component replaces any text columns in the given dataset with these numeric columns.
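To make the simpler features concrete, here is a plain-Python approximation of four of them. It is a hypothetical sketch, not featuretools' nlp_primitives: it assumes Diversity Score is the number of distinct (lowercased, punctuation-stripped) words divided by the total word count, and that Number of Characters counts spaces. Polarity Score and LSA are omitted because they need a sentiment lexicon and a fitted model. `text_features` and `featurize_column` are invented names.

```python
import string


def text_features(text):
    """Hypothetical approximation of a few per-text features (not nlp_primitives)."""
    words = text.split()
    n_words = len(words)
    # Lowercase and strip surrounding punctuation before counting distinct words
    distinct = {w.lower().strip(string.punctuation) for w in words}
    return {
        "num_characters": len(text),  # assumption: spaces are counted
        "num_words": n_words,
        "mean_characters_per_word": (sum(len(w) for w in words) / n_words) if n_words else 0.0,
        "diversity_score": (len(distinct) / n_words) if n_words else 0.0,
    }


def featurize_column(texts):
    """Replace one text column (a list of strings) with numeric feature columns."""
    rows = [text_features(t) for t in texts]
    return {name: [row[name] for row in rows] for name in rows[0]} if rows else {}
```

For example, text_features("the cat saw the dog") gives 19 characters, 5 words, 3.0 characters per word, and a diversity score of 0.8.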

Parameters

random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Natural Language Featurizer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data X by creating new features using existing text columns.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits component to data.

Parameters
  • X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data X by creating new features using existing text columns.

Parameters
  • X (pd.DataFrame) – The data to transform.

  • y (pd.Series, optional) – Ignored.

Returns

Transformed X.

Return type

pd.DataFrame

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.OneHotEncoder(top_n=10, features_to_encode=None, categories=None, drop='if_binary', handle_unknown='ignore', handle_missing='error', random_seed=0, **kwargs)[source]#

A transformer that encodes categorical features in a one-hot numeric array.

Parameters
  • top_n (int) – Number of categories per column to encode. If None, all categories will be encoded. Otherwise, the n most frequent will be encoded and all others will be dropped. Defaults to 10.

  • features_to_encode (list[str]) – List of columns to encode. All other columns will remain untouched. If None, all appropriate columns will be encoded. Defaults to None.

  • categories (list) – A two dimensional list of categories, where categories[i] is a list of the categories for the column at index i. This can also be None, or “auto” if top_n is not None. Defaults to None.

  • drop (string, list) – Method (“first” or “if_binary”) to use to drop one category per feature. Can also be a list specifying which categories to drop for each feature. Defaults to ‘if_binary’.

  • handle_unknown (string) – Whether to ignore unknown categories or raise an error when they are encountered during fit or transform. If either top_n or categories is used to limit the number of categories per column, this must be “ignore”. Defaults to “ignore”.

  • handle_missing (string) – Options for how to handle missing (NaN) values encountered during fit or transform. If this is set to “as_category” and NaN values are within the n most frequent, “nan” values will be encoded as their own column. If this is set to “error”, any missing values encountered will raise an error. Defaults to “error”.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
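The effect of top_n can be sketched without EvalML. This is a minimal, hypothetical illustration (`one_hot_top_n` is an invented name): it keeps the top_n most frequent categories, creates one indicator column per kept category, and encodes any other category as all zeros, similar to handle_unknown="ignore". It does not model drop="if_binary" or handle_missing="as_category", and its tie-breaking (first occurrence wins, via `Counter.most_common`) is an assumption that may differ from EvalML.

```python
from collections import Counter


def one_hot_top_n(values, column, top_n=10):
    """Hypothetical sketch: one indicator column per most-frequent category."""
    counts = Counter(v for v in values if v is not None)
    # most_common(None) returns every category, matching top_n=None (encode all)
    kept = [cat for cat, _ in counts.most_common(top_n)]
    return {f"{column}_{cat}": [int(v == cat) for v in values] for cat in kept}


encoded = one_hot_top_n(["red", "blue", "red", "green", "red", "blue"], "color", top_n=2)
# "green" is not among the 2 most frequent, so its row is 0 in every indicator column
```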

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

One Hot Encoder

training_only

False

Methods

categories

Returns a list of the unique categories to be encoded for the particular feature, in order.

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the one-hot encoder component.

fit_transform

Fits on X and transforms X.

get_feature_names

Return feature names for the categorical features after fitting.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

One-hot encode the input data.

update_parameters

Updates the parameter dictionary of the component.

categories(self, feature_name)[source]#

Returns a list of the unique categories to be encoded for the particular feature, in order.

Parameters

feature_name (str) – The name of any feature provided to one-hot encoder during fit.

Returns

The unique categories, in the same dtype as they were provided during fit.

Return type

np.ndarray

Raises

ValueError – If feature was not provided to one-hot encoder as a training feature.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the one-hot encoder component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If encoding a column failed.

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

get_feature_names(self)[source]#

Return feature names for the categorical features after fitting.

Feature names are formatted as {column name}_{category name}. In the event of a duplicate name, an integer will be added at the end of the feature name to distinguish it.

For example, consider a dataframe with a column called “A” and category “x_y” and another column called “A_x” with “y”. In this example, the feature names would be “A_x_y” and “A_x_y_1”.
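The naming rule and the duplicate example above can be reproduced with a short sketch. The function `encoded_feature_names` is hypothetical, and the exact suffixing scheme (_1, _2, … appended until the name is unique) is an assumption based on the example given.

```python
def encoded_feature_names(pairs):
    """Hypothetical: build '{column}_{category}' names, suffixing duplicates with _1, _2, ..."""
    seen = set()
    names = []
    for column, category in pairs:
        base = f"{column}_{category}"
        name, suffix = base, 1
        while name in seen:
            name = f"{base}_{suffix}"
            suffix += 1
        seen.add(name)
        names.append(name)
    return names


print(encoded_feature_names([("A", "x_y"), ("A_x", "y")]))  # ['A_x_y', 'A_x_y_1']
```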

Returns

The feature names after encoding, provided in the same order as the input features.

Return type

np.ndarray

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

One-hot encode the input data.

Parameters
  • X (pd.DataFrame) – Features to one-hot encode.

  • y (pd.Series) – Ignored.

Returns

Transformed data, where each categorical feature has been encoded into numerical columns using one-hot encoding.

Return type

pd.DataFrame

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.OrdinalEncoder(features_to_encode=None, categories=None, handle_unknown='error', unknown_value=None, encoded_missing_value=None, random_seed=0, **kwargs)[source]#

A transformer that encodes ordinal features as an array of ordinal integers representing the relative order of categories.

Parameters
  • features_to_encode (list[str]) – List of columns to encode. All other columns will remain untouched. If None, all appropriate columns will be encoded. Defaults to None. The order of columns does not matter.

  • categories (dict[str, list[str]]) – A dictionary mapping column names to their categories in the dataframes passed in at fit and transform. The order of categories specified for a column does not matter. Any category found in the data that is not present in categories will be handled as an unknown value. To avoid raising an error on unknown values, set handle_unknown to “use_encoded_value”. Defaults to None.

  • handle_unknown ("error" or "use_encoded_value") – How to handle unknown categories encountered during fit or transform. When set to “error”, an error will be raised when an unknown category is found. When set to “use_encoded_value”, unknown categories will be encoded as the value given for the parameter unknown_value. Defaults to “error”.

  • unknown_value (int or np.nan) – The value to use for unknown categories seen during fit or transform. Required when handle_unknown is set to “use_encoded_value”. The value must be distinct from the values used to encode any of the categories in fit. Defaults to None.

  • encoded_missing_value (int or np.nan) – The value to use for missing (null) values seen during fit or transform. Defaults to np.nan.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
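The interaction of the category order, categories, handle_unknown, unknown_value, and encoded_missing_value parameters can be sketched for a single column. This is a hypothetical illustration, not EvalML's code: `ordinal_encode` is an invented name, `order` stands in for the column's declared ordinal order, None marks a missing value, and codes are assumed to be positions in that order after restricting to categories (whose own ordering is ignored, as documented).

```python
import math


def ordinal_encode(values, order, categories=None, handle_unknown="error",
                   unknown_value=None, encoded_missing_value=math.nan):
    """Hypothetical sketch of ordinal encoding for one column."""
    # Keep the declared order; `categories` only restricts which values are known
    known = [c for c in order if categories is None or c in categories]
    codes = {cat: i for i, cat in enumerate(known)}
    if handle_unknown == "use_encoded_value":
        if unknown_value is None:
            raise ValueError("unknown_value is required with 'use_encoded_value'")
        if unknown_value in codes.values():
            raise ValueError("unknown_value must differ from every category code")
    encoded = []
    for v in values:
        if v is None:  # None stands in for a missing value here
            encoded.append(encoded_missing_value)
        elif v in codes:
            encoded.append(codes[v])
        elif handle_unknown == "use_encoded_value":
            encoded.append(unknown_value)
        else:
            raise ValueError(f"Unknown category {v!r}")
    return encoded


order = ["low", "medium", "high"]
codes = ordinal_encode(["high", "low", None, "medium"], order)  # [2, 0, nan, 1]
```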

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Ordinal Encoder

training_only

False

Methods

categories

Returns a list of the unique categories to be encoded for the particular feature, in order.

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the ordinal encoder component.

fit_transform

Fits on X and transforms X.

get_feature_names

Return feature names for the ordinal features after fitting.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Ordinally encode the input data.

update_parameters

Updates the parameter dictionary of the component.

categories(self, feature_name)[source]#

Returns a list of the unique categories to be encoded for the particular feature, in order.

Parameters

feature_name (str) – The name of any feature provided to ordinal encoder during fit.

Returns

The unique categories, in the same dtype as they were provided during fit.

Return type

np.ndarray

Raises

ValueError – If feature was not provided to ordinal encoder as a training feature.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the ordinal encoder component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises
  • ValueError – If encoding a column failed.

  • TypeError – If non-Ordinal columns are specified in features_to_encode.

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

get_feature_names(self)[source]#

Return feature names for the ordinal features after fitting.

Feature names are formatted as {column name}_ordinal_encoding.

Returns

The feature names after encoding, provided in the same order as the input features.

Return type

np.ndarray

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Ordinally encode the input data.

Parameters
  • X (pd.DataFrame) – Features to encode.

  • y (pd.Series) – Ignored.

Returns

Transformed data, where each ordinal feature has been encoded into a numerical column where ordinal integers represent the relative order of categories.

Return type

pd.DataFrame

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.Oversampler(sampling_ratio=0.25, sampling_ratio_dict=None, k_neighbors_default=5, n_jobs=-1, random_seed=0, **kwargs)[source]#

SMOTE Oversampler component. Automatically selects SMOTE, SMOTEN, or SMOTENC based on the feature types of the input data.
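The core SMOTE idea, creating synthetic minority samples by interpolating between a sample and one of its k nearest minority neighbors, can be sketched in plain Python. This minimal sketch covers only the all-numeric SMOTE case; the SMOTEN and SMOTENC variants for categorical and mixed data, which are provided by imbalanced-learn, are not shown. `smote_samples` is a hypothetical name, and shrinking k when there are few samples mirrors the k_neighbors_default note below.

```python
import math
import random


def smote_samples(minority, n_new, k_neighbors=5, seed=0):
    """Hypothetical SMOTE sketch for numeric feature vectors of one minority class."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k_neighbors, len(minority) - 1)  # shrink k when there are fewer samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic
```

Because every new point lies on a segment between two existing minority points, the synthetic samples stay inside the region the minority class already occupies.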

Parameters
  • sampling_ratio (float) – The goal ratio of the minority to majority class, in the range (0, 1]. A value of 0.25 means we want a 1:4 ratio of the minority to majority class after oversampling. A sampling dictionary is created from this ratio, with the keys corresponding to the classes and the values corresponding to the number of samples. Defaults to 0.25.

  • sampling_ratio_dict (dict) – A dictionary specifying the desired balanced ratio for each target value. For instance, in a binary case where class 1 is the minority, we could specify: sampling_ratio_dict={0: 0.5, 1: 1}, which means we would undersample class 0 to have twice the number of samples as class 1 (minority:majority ratio = 0.5), and don’t sample class 1. Overrides sampling_ratio if provided. Defaults to None.

  • k_neighbors_default (int) – The number of nearest neighbors used to construct synthetic samples. This is the default value; the actual k_neighbors value might be smaller if there are fewer samples. Defaults to 5.

  • n_jobs (int) – The number of CPU cores to use. A value of -1 uses all available cores. Defaults to -1.

  • random_seed (int) – The seed to use for random sampling. Defaults to 0.
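How a single sampling_ratio might be turned into the per-class sampling dictionary can be sketched as follows. This is an assumption about the rule, not EvalML's code: classes whose count relative to the majority class falls below sampling_ratio are raised to int(majority_count * sampling_ratio), and all other classes keep their current count. `balancing_dictionary` is a hypothetical name.

```python
from collections import Counter


def balancing_dictionary(y, sampling_ratio=0.25):
    """Hypothetical: target sample count per class for a minority:majority goal ratio."""
    counts = Counter(y)
    majority = max(counts.values())
    target = int(majority * sampling_ratio)
    # Classes already at or above the goal ratio keep their current count
    return {cls: (target if n / majority < sampling_ratio else n) for cls, n in counts.items()}


print(balancing_dictionary([0] * 100 + [1] * 10))  # {0: 100, 1: 25}
```

With the default ratio of 0.25, a 100:10 split becomes 100:25, which is the 1:4 minority-to-majority ratio described for sampling_ratio.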

Attributes

hyperparameter_ranges

None

modifies_features

True

modifies_target

True

name

Oversampler

training_only

True

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits oversampler to data.

fit_transform

Fit and transform data using the sampler component.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms the input data by oversampling it.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)[source]#

Fits oversampler to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y)#

Fit and transform data using the sampler component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

(pd.DataFrame, pd.Series)

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms the input data by oversampling it.

Parameters
  • X (pd.DataFrame) – Training features.

  • y (pd.Series) – Target.

Returns

Transformed features and target.

Return type

pd.DataFrame, pd.Series

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.PCA(variance=0.95, n_components=None, random_seed=0, **kwargs)[source]#

Reduces the number of features by using Principal Component Analysis (PCA).

Parameters
  • variance (float) – The fraction of the original data variance (between 0 and 1) that should be preserved when reducing the number of features. Defaults to 0.95.

  • n_components (int) – The number of features to keep after computing SVD. If set, overrides variance. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
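How variance and n_components interact can be sketched given the per-component explained variances (the eigenvalues of the covariance matrix, equivalently the squared singular values up to a constant). This hypothetical helper, `choose_n_components`, returns n_components when it is set, and otherwise the smallest number of leading components whose cumulative share of variance exceeds variance. The strict "exceeds" comparison follows scikit-learn's documented behavior for a fractional n_components and is an assumption about this component.

```python
def choose_n_components(explained_variances, variance=0.95, n_components=None):
    """Hypothetical: smallest k whose top-k components explain more than `variance`."""
    if n_components is not None:
        return n_components  # an explicit n_components overrides variance
    ordered = sorted(explained_variances, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, v in enumerate(ordered, start=1):
        cumulative += v
        if cumulative / total > variance:
            return k
    return len(ordered)


# Shares of variance: 0.5, 0.3, 0.15, 0.05 -> three components exceed 90%
k = choose_n_components([0.5, 3.0, 5.0, 1.5], variance=0.9)
```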

Attributes

hyperparameter_ranges

{“variance”: Real(0.25, 1)}

modifies_features

True

modifies_target

False

name

PCA Transformer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the PCA component.

fit_transform

Fit and transform data using the PCA component.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transform data using fitted PCA component.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the PCA component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If input data is not all numeric.

fit_transform(self, X, y=None)[source]#

Fit and transform data using the PCA component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

Raises

ValueError – If input data is not all numeric.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transform data using fitted PCA component.

Parameters
  • X (pd.DataFrame) – The input data of shape [n_samples, n_features] to transform.

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

Raises

ValueError – If input data is not all numeric.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.PerColumnImputer(impute_strategies=None, random_seed=0, **kwargs)[source]#

Imputes missing data according to a specified imputation strategy per column.

Parameters
  • impute_strategies (dict) – Column and {“impute_strategy”: strategy, “fill_value”: value} pairings. Valid values for impute_strategy include “mean”, “median”, “most_frequent”, and “constant” for numerical data, and “most_frequent” and “constant” for object data types. Defaults to None, which uses “most_frequent” for all columns. When impute_strategy is “constant”, fill_value is used to replace missing data; if fill_value is None, 0 is used for numerical data and “missing_value” for strings or object data types.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
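The per-column strategy lookup described for impute_strategies can be sketched with the standard library. This is a hypothetical illustration, not EvalML's implementation: `impute_columns` is an invented name, columns are plain lists with None marking missing values, columns without an entry fall back to "most_frequent", and a column is treated as numeric if all its observed values are int or float.

```python
from statistics import mean, median, mode


def impute_columns(data, impute_strategies=None):
    """Hypothetical sketch. data: dict of column name -> list of values (None = missing)."""
    impute_strategies = impute_strategies or {}
    result = {}
    for column, values in data.items():
        spec = impute_strategies.get(column, {"impute_strategy": "most_frequent"})
        strategy = spec["impute_strategy"]
        observed = [v for v in values if v is not None]
        if strategy == "mean":
            fill = mean(observed)
        elif strategy == "median":
            fill = median(observed)
        elif strategy == "most_frequent":
            fill = mode(observed)  # first value wins on ties
        elif strategy == "constant":
            fill = spec.get("fill_value")
            if fill is None:
                # Documented defaults: 0 for numerical data, "missing_value" otherwise
                is_numeric = all(isinstance(v, (int, float)) for v in observed)
                fill = 0 if is_numeric else "missing_value"
        else:
            raise ValueError(f"Unknown impute_strategy: {strategy}")
        result[column] = [fill if v is None else v for v in values]
    return result


filled = impute_columns(
    {"age": [20, None, 40], "city": ["NY", None, "NY", "LA"]},
    {"age": {"impute_strategy": "mean"}},  # "city" falls back to most_frequent
)
```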

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Per Column Imputer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits imputers on input data.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms input data by imputing missing values.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits imputers on input data.

Parameters
  • X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features] to fit.

  • y (pd.Series, optional) – The target training data of length [n_samples]. Ignored.

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.
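A minimal sketch of the save/load round trip described above. It uses the standard library's pickle as a stand-in for cloudpickle; cloudpickle can also serialize objects that plain pickle rejects, such as lambdas. ToyComponent, save_component, and load_component are hypothetical names for illustration, not EvalML APIs.

```python
import os
import pickle
import tempfile


class ToyComponent:
    """Hypothetical stand-in for a component; not an EvalML class."""

    def __init__(self, strategy="mean"):
        self.parameters = {"strategy": strategy}


def save_component(component, file_path, pickle_protocol=pickle.HIGHEST_PROTOCOL):
    # Serialize the whole object, including its parameters, to disk.
    with open(file_path, "wb") as f:
        pickle.dump(component, f, protocol=pickle_protocol)


def load_component(file_path):
    # Deserialize an object previously written by save_component.
    with open(file_path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "component.pkl")
    save_component(ToyComponent(strategy="median"), path)
    restored = load_component(path)

print(restored.parameters)  # {'strategy': 'median'}
```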

transform(self, X, y=None)[source]#

Transforms input data by imputing missing values.

Parameters
  • X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features] to transform.

  • y (pd.Series, optional) – The target training data of length [n_samples]. Ignored.

Returns

Transformed X

Return type

pd.DataFrame
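Per-column imputation amounts to learning one fill value per column during fit and substituting it for missing entries during transform. The sketch below is a simplified illustration only. It assumes a plain mapping from column name to a strategy name ("mean", "median", "most_frequent") and represents the data as a dict of lists rather than a DataFrame. The strategy names and helper functions are hypothetical and are not the component's actual parameters.

```python
from statistics import mean, median, mode

# Hypothetical strategy names mapped to simple statistics functions.
STRATEGIES = {"mean": mean, "median": median, "most_frequent": mode}


def fit_per_column(data, strategies):
    """Learn one fill value per listed column from its non-missing entries."""
    fill_values = {}
    for column, strategy in strategies.items():
        observed = [v for v in data[column] if v is not None]
        fill_values[column] = STRATEGIES[strategy](observed)
    return fill_values


def transform_per_column(data, fill_values):
    """Replace None with the learned fill value; unlisted columns pass through."""
    return {
        column: [
            fill_values[column] if v is None and column in fill_values else v
            for v in values
        ]
        for column, values in data.items()
    }


train = {
    "age": [20.0, None, 40.0, 30.0],
    "city": ["NY", None, "LA", "NY"],
    "id": [1, 2, 3, 4],
}
fill_values = fit_per_column(train, {"age": "mean", "city": "most_frequent"})
imputed = transform_per_column(train, fill_values)
print(imputed["age"])   # [20.0, 30.0, 40.0, 30.0]
print(imputed["city"])  # ['NY', 'NY', 'LA', 'NY']
```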

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.
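The parameter-handling conventions shared by these components can be illustrated with a small class:

- default_parameters matches a freshly constructed component's parameters.
- clone returns a new, unfitted instance with identical parameters.
- describe returns {“name”: name, “parameters”: parameters} when return_dict is True.
- update_parameters clears the fitted flag when reset_fit is True.

MiniComponent is hypothetical, not an EvalML class. For simplicity, default_parameters is written here as an ordinary classmethod.

```python
import copy


class MiniComponent:
    """Hypothetical component illustrating the documented conventions."""

    name = "Mini Component"

    def __init__(self, threshold=0.5, random_seed=0):
        self._parameters = {"threshold": threshold}
        self.random_seed = random_seed
        self._is_fitted = False

    @classmethod
    def default_parameters(cls):
        # Convention: default_parameters == cls().parameters
        return cls().parameters

    @property
    def parameters(self):
        # Return a copy so callers cannot mutate internal state.
        return copy.copy(self._parameters)

    def fit(self, X, y=None):
        self._is_fitted = True
        return self

    def clone(self):
        # New, unfitted instance with the same parameters and random seed.
        return self.__class__(**self._parameters, random_seed=self.random_seed)

    def describe(self, print_name=False, return_dict=False):
        if print_name:
            print(self.name)
        if return_dict:
            return {"name": self.name, "parameters": self.parameters}
        return None

    def update_parameters(self, update_dict, reset_fit=True):
        self._parameters.update(update_dict)
        if reset_fit:
            # Changed parameters invalidate any previous fit.
            self._is_fitted = False


component = MiniComponent(threshold=0.8).fit([[1.0], [2.0]])
copy_of_component = component.clone()
component.update_parameters({"threshold": 0.9})
print(component.describe(return_dict=True))
# {'name': 'Mini Component', 'parameters': {'threshold': 0.9}}
print(component._is_fitted, copy_of_component.parameters)
# False {'threshold': 0.8}
```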

class evalml.pipelines.components.PolynomialDecomposer(time_index: str = None, degree: int = 1, period: int = -1, random_seed: int = 0, **kwargs)[source]#

Removes trends and seasonality from time series by fitting a polynomial and moving average to the data.

sktime’s PolynomialTrendForecaster is used to generate the additive trend portion of the target data. A polynomial is fit to the data during fit. That additive polynomial trend is removed during fit so that statsmodels’ seasonal_decompose can determine the additive seasonality of the data by using rolling averages over the series’ inferred periodicity.

For example, for daily time series data, rolling averages are computed over the first week of data and the mean is normalized out. The resulting 7 averages are then repeated as many times as necessary to match the length of the given target data and used as the seasonal signal of the data.
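A minimal sketch of the additive decomposition described above, with simplifying assumptions:

- The trend is a degree-1 polynomial (the default) fit by ordinary least squares.
- The seasonal signal is the average of the detrended values at each position in the cycle, centered on zero and repeated over the series. statsmodels’ seasonal_decompose instead uses a centered moving average.

This is a pure-Python illustration of the technique, not the component's implementation. The function names are hypothetical.

```python
def fit_linear_trend(y):
    """Least-squares fit of y ≈ intercept + slope * t for t = 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return y_mean - slope * t_mean, slope


def seasonal_profile(detrended, period):
    """Average each position of the cycle, then center the profile on zero."""
    sums, counts = [0.0] * period, [0] * period
    for i, v in enumerate(detrended):
        sums[i % period] += v
        counts[i % period] += 1
    averages = [s / c for s, c in zip(sums, counts)]
    center = sum(averages) / period
    return [a - center for a in averages]


def decompose(y, period):
    intercept, slope = fit_linear_trend(y)
    trend = [intercept + slope * t for t in range(len(y))]
    detrended = [v - tr for v, tr in zip(y, trend)]
    profile = seasonal_profile(detrended, period)
    # Repeat the one-period profile to cover the whole series.
    seasonal = [profile[i % period] for i in range(len(y))]
    residual = [v - tr - s for v, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, residual, profile


# Synthetic daily data: linear trend plus a zero-mean weekly pattern.
weekly = [3.0, -1.0, -1.0, -2.0, -1.0, -1.0, 3.0]
y = [10 + 0.5 * t + weekly[t % 7] for t in range(28)]
trend, seasonal, residual, profile = decompose(y, period=7)
print([round(p, 6) for p in profile])  # [3.0, -1.0, -1.0, -2.0, -1.0, -1.0, 3.0]
```

By construction, signal = trend + seasonality + residual at every time step. This is the same additive relationship that the “signal”, “trend”, “seasonality”, and “residual” columns of get_trend_dataframe express.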

Parameters
  • time_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.

  • degree (int) – Degree for the polynomial. If 1, linear model is fit to the data. If 2, quadratic model is fit, etc. Defaults to 1.

  • period (int) – The number of entries in the time series data that corresponds to one period of a cyclic signal. For instance, if the data is daily and is known to possess a weekly seasonal signal, period should be 7. For daily data with a yearly seasonal signal, period should be 365. Defaults to -1, which uses the statsmodels library’s freq_to_period function.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{ “degree”: Integer(1, 3)}

invalid_frequencies

[]

modifies_features

False

modifies_target

True

name

Polynomial Decomposer

needs_fitting

True

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

determine_periodicity

Function that uses autocorrelative methods to determine the most likely significant period of the seasonal signal.

fit

Fits the PolynomialDecomposer and determines the seasonal signal.

fit_transform

Removes fitted trend and seasonality from target variable.

get_trend_dataframe

Return a list of dataframes with 4 columns: signal, trend, seasonality, residual.

inverse_transform

Adds back fitted trend and seasonality to target variable.

is_freq_valid

Determines if the given string represents a valid frequency for this decomposer.

load

Loads component at file path.

parameters

Returns the parameters which were used to initialize the component.

plot_decomposition

Plots the decomposition of the target signal.

save

Saves component at file path.

set_period

Function to set the component's seasonal period based on the target's seasonality.

transform

Transforms the target data by removing the polynomial trend and rolling average seasonality.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

classmethod determine_periodicity(cls, X: pandas.DataFrame, y: pandas.Series, acf_threshold: float = 0.01, rel_max_order: int = 5)#

Function that uses autocorrelative methods to determine the most likely significant period of the seasonal signal.

Parameters
  • X (pandas.DataFrame) – The feature data of the time series problem.

  • y (pandas.Series) – The target data of a time series problem.

  • acf_threshold (float) – The threshold for the autocorrelation function to determine the period. Any values below the threshold are considered to be 0 and will not be considered for the period. Defaults to 0.01.

  • rel_max_order (int) – The order of the relative maximum to determine the period. Defaults to 5.

Returns

The integer number of entries in the time series data over which the seasonal part of the target data repeats. If the time series data is in days, this is the number of days it takes the target’s seasonal signal to repeat. Note: the target data can contain multiple seasonal signals, and this function returns only the strongest one. For example, if the target has both weekly and yearly seasonality, the function may return either “7” or “365”, depending on which seasonality is more strongly autocorrelated. If no period is detected, returns None.

Return type

int or None
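A simplified sketch of autocorrelation-based period detection, following the steps the parameters describe:

1. Compute the autocorrelation function (ACF).
2. Treat values at or below acf_threshold as 0.
3. Keep lags that are strict relative maxima within rel_max_order neighbours on each side.
4. Return the lag with the strongest autocorrelation, or None if no peak is found.

The ACF is computed directly in pure Python. The max_lag default of half the series length is an assumption of this sketch, and the code is not the component's implementation.

```python
import math


def autocorrelation(y, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(y)
    mean = sum(y) / n
    centered = [v - mean for v in y]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def detect_period(y, acf_threshold=0.01, rel_max_order=5, max_lag=None):
    if max_lag is None:
        max_lag = len(y) // 2
    acf = autocorrelation(y, max_lag)
    # Treat weak autocorrelations as zero so they cannot become peaks.
    filtered = [a if a > acf_threshold else 0.0 for a in acf]
    peaks = []
    # Only lags with rel_max_order neighbours on both sides are candidates.
    for lag in range(rel_max_order, len(filtered) - rel_max_order):
        window = range(lag - rel_max_order, lag + rel_max_order + 1)
        if all(filtered[lag] > filtered[j] for j in window if j != lag):
            peaks.append(lag)
    if not peaks:
        return None
    # Keep only the most strongly autocorrelated peak.
    return max(peaks, key=lambda lag: filtered[lag])


# Twenty full cycles of a signal that repeats every 7 steps.
y = [math.sin(2 * math.pi * t / 7) for t in range(140)]
print(detect_period(y))  # 7
```

Multiples of the true period (14, 21, …) are also peaks, but their autocorrelation is weaker, so the shortest strong cycle wins here.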

fit(self, X: pandas.DataFrame, y: pandas.Series = None) PolynomialDecomposer[source]#

Fits the PolynomialDecomposer and determines the seasonal signal.

Currently only fits the polynomial detrender. The seasonality is determined by removing the trend from the signal and using statsmodels’ seasonal_decompose(). Both the trend and seasonality are currently assumed to be additive.

Parameters
  • X (pd.DataFrame, optional) – Conditionally used to build datetime index.

  • y (pd.Series) – Target variable to detrend and deseasonalize.

Returns

self

Raises
  • NotImplementedError – If the input data has a frequency of “month-begin”. This isn’t supported by statsmodels decompose as the freqstr “MS” is misinterpreted as milliseconds.

  • ValueError – If y is None.

  • ValueError – If the target data does not have a DatetimeIndex and the feature data contains no datetime features.

fit_transform(self, X: pandas.DataFrame, y: pandas.Series = None) tuple[pandas.DataFrame, pandas.Series]#

Removes fitted trend and seasonality from target variable.

Parameters
  • X (pd.DataFrame, optional) – Ignored.

  • y (pd.Series) – Target variable to detrend and deseasonalize.

Returns

The first element is the input features, returned without modification.

The second element is the target variable y with the fitted trend and seasonality removed.

Return type

tuple of pd.DataFrame, pd.Series

get_trend_dataframe(self, X: pandas.DataFrame, y: pandas.Series) list[pandas.DataFrame][source]#

Return a list of dataframes with 4 columns: signal, trend, seasonality, residual.

sktime’s PolynomialTrendForecaster is used to generate the trend portion of the target data. statsmodels’ seasonal_decompose is used to generate the seasonality of the data.

Parameters
  • X (pd.DataFrame) – Input data with time series data in index.

  • y (pd.Series or pd.DataFrame) – Target variable data provided as a Series for univariate problems or a DataFrame for multivariate problems.

Returns

Each DataFrame contains the columns “signal”, “trend”, “seasonality”, and “residual”, with the latter three columns being the decomposed elements of the target data. The “signal” column is the input target signal reindexed with a datetime index to match the input features.

Return type

list of pd.DataFrame

Raises
  • TypeError – If X does not have time-series data in the index.

  • ValueError – If time series index of X does not have an inferred frequency.

  • ValueError – If the forecaster associated with the detrender has not been fit yet.

  • TypeError – If y is not provided as a pandas Series or DataFrame.

inverse_transform(self, y_t: pandas.Series) tuple[pandas.DataFrame, pandas.Series][source]#

Adds back fitted trend and seasonality to target variable.

The polynomial trend is added back into the signal by calling the detrender’s inverse_transform(). The seasonality is then projected forward and added back into the signal.

Parameters

y_t (pd.Series) – Target variable.

Returns

The first element is the input features, returned without modification.

The second element is the target variable y with the trend and seasonality added back in.

Return type

tuple of pd.DataFrame, pd.Series

Raises

ValueError – If y_t is None.

classmethod is_freq_valid(cls, freq: str)#

Determines if the given string represents a valid frequency for this decomposer.

Parameters

freq (str) – A frequency to validate. See the pandas docs at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases for options.

Returns

boolean representing whether the frequency is valid or not.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property parameters(self)#

Returns the parameters which were used to initialize the component.

plot_decomposition(self, X: pandas.DataFrame, y: Union[pandas.Series, pandas.DataFrame], show: bool = False) Union[tuple[matplotlib.pyplot.Figure, list], dict[str, tuple[matplotlib.pyplot.Figure]]]#

Plots the decomposition of the target signal.

Parameters
  • X (pd.DataFrame) – Input data with time series data in index.

  • y (pd.Series or pd.DataFrame) – Target variable data provided as a Series for univariate problems or a DataFrame for multivariate problems.

  • show (bool) – Whether to display the plot or not. Defaults to False.

Returns

(Single series) matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes]: The figure and axes that have the decompositions plotted on them.

(Multi series) dict[str, (matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes])]: A dictionary that maps each series id to the figure and axes that have the decompositions plotted on them.

Return type

Union[tuple[matplotlib.pyplot.Figure, list], dict[str, tuple[matplotlib.pyplot.Figure]]]

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

set_period(self, X: pandas.DataFrame, y: pandas.Series, acf_threshold: float = 0.01, rel_max_order: int = 5)#

Function to set the component’s seasonal period based on the target’s seasonality.

Parameters
  • X (pandas.DataFrame) – The feature data of the time series problem.

  • y (pandas.Series) – The target data of a time series problem.

  • acf_threshold (float) – The threshold for the autocorrelation function to determine the period. Any values below the threshold are considered to be 0 and will not be considered for the period. Defaults to 0.01.

  • rel_max_order (int) – The order of the relative maximum to determine the period. Defaults to 5.

transform(self, X: pandas.DataFrame, y: pandas.Series = None) tuple[pandas.DataFrame, pandas.Series][source]#

Transforms the target data by removing the polynomial trend and rolling average seasonality.

Applies the fitted polynomial detrender to the target data, removing the additive polynomial trend. It then uses the first period’s worth of seasonal data determined in fit() to extrapolate the seasonal signal over the data being transformed. This seasonal signal is also assumed to be additive and is removed.

Parameters
  • X (pd.DataFrame, optional) – Conditionally used to build datetime index.

  • y (pd.Series) – Target variable to detrend and deseasonalize.

Returns

The input features are returned without modification. The target variable y is detrended and deseasonalized.

Return type

tuple of pd.DataFrame, pd.Series

Raises

ValueError – If the target data does not have a DatetimeIndex and the feature data contains no datetime features.
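A minimal sketch of how a fitted trend and a one-period seasonal profile can be extrapolated to a later segment, removed (as in transform), and added back (as in inverse_transform).

The sketch makes two illustrative assumptions:

- The segment's position relative to the training data is given by an integer start offset. The real component aligns data using its datetime index.
- The intercept, slope, and profile values are made up for the example.

The helper names are hypothetical.

```python
def remove_components(y_segment, start, intercept, slope, profile):
    """Subtract the fitted trend and the extrapolated seasonal signal."""
    period = len(profile)
    return [
        v - (intercept + slope * t) - profile[t % period]
        for t, v in enumerate(y_segment, start=start)
    ]


def add_components(y_t_segment, start, intercept, slope, profile):
    """Inverse of remove_components: add the trend and seasonality back."""
    period = len(profile)
    return [
        v + (intercept + slope * t) + profile[t % period]
        for t, v in enumerate(y_t_segment, start=start)
    ]


# Values assumed to come from fitting on positions 0..27 of a daily series.
intercept, slope = 10.0, 0.5
profile = [3.0, -1.0, -1.0, -2.0, -1.0, -1.0, 3.0]

# A later segment at positions 30..39 that follows the same pattern.
holdout = [intercept + slope * t + profile[t % 7] for t in range(30, 40)]
detrended = remove_components(holdout, 30, intercept, slope, profile)
restored = add_components(detrended, 30, intercept, slope, profile)
print(detrended[:3])  # [0.0, 0.0, 0.0]
```

Indexing the profile with t % period is what lets the seasonality computed from the first period be projected forward to any later position.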

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.ProphetRegressor(time_index: Optional[Hashable] = None, changepoint_prior_scale: float = 0.05, seasonality_prior_scale: int = 10, holidays_prior_scale: int = 10, seasonality_mode: str = 'additive', stan_backend: str = 'CMDSTANPY', interval_width: float = 0.95, random_seed: Union[int, float] = 0, **kwargs)[source]#

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

More information here: https://facebook.github.io/prophet/

Parameters
  • time_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.

  • changepoint_prior_scale (float) – Determines the strength of the sparse prior for fitting on rate changes. Increasing this value increases the flexibility of the trend. Defaults to 0.05.

  • seasonality_prior_scale (int) – Similar to changepoint_prior_scale. Adjusts the extent to which the seasonality model will fit the data. Defaults to 10.

  • holidays_prior_scale (int) – Similar to changepoint_prior_scale. Adjusts the extent to which holidays will fit the data. Defaults to 10.

  • seasonality_mode (str) – Determines how this component fits the seasonality. Options are “additive” and “multiplicative”. Defaults to “additive”.

  • stan_backend (str) – Determines the backend that should be used to run Prophet. Options are “CMDSTANPY” and “PYSTAN”. Defaults to “CMDSTANPY”.

  • interval_width (float) – Determines the confidence of the prediction interval range when calling get_prediction_intervals. Accepts values in the range (0,1). Defaults to 0.95.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{ “changepoint_prior_scale”: Real(0.001, 0.5), “seasonality_prior_scale”: Real(0.01, 10), “holidays_prior_scale”: Real(0.01, 10), “seasonality_mode”: [“additive”, “multiplicative”],}

model_family

ModelFamily.PROPHET

modifies_features

True

modifies_target

False

name

Prophet Regressor

supported_problem_types

[ProblemTypes.TIME_SERIES_REGRESSION]

training_only

False

Methods

build_prophet_df

Builds the DataFrame that is passed to Prophet’s fit and predict methods.

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns an array containing a single 0, since feature importance is not defined for the Prophet regressor.

fit

Fits Prophet regressor component to data.

get_params

Get parameters for the Prophet regressor.

get_prediction_intervals

Find the prediction intervals using the fitted ProphetRegressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using fitted Prophet regressor.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

static build_prophet_df(X: pandas.DataFrame, y: Optional[pandas.Series] = None, time_index: str = 'ds') pandas.DataFrame[source]#

Builds the DataFrame that is passed to Prophet’s fit and predict methods.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls) dict#

Returns the default parameters for this component.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) numpy.ndarray#

Returns an array containing a single 0, since feature importance is not defined for the Prophet regressor.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)[source]#

Fits Prophet regressor component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

get_params(self) dict[source]#

Get parameters for the Prophet regressor.

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series][source]#

Find the prediction intervals using the fitted ProphetRegressor.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (List[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Not used for Prophet estimator.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) pandas.Series[source]#

Make predictions using fitted Prophet regressor.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

Returns

Predicted values.

Return type

pd.Series

predict_proba(self, X: pandas.DataFrame) pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.RandomForestClassifier(n_estimators=100, max_depth=6, n_jobs=-1, random_seed=0, **kwargs)[source]#

Random Forest Classifier.

Parameters
  • n_estimators (int) – The number of trees in the forest. Defaults to 100.

  • max_depth (int) – Maximum tree depth for base learners. Defaults to 6.

  • n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processors. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 10),}

model_family

ModelFamily.RANDOM_FOREST

modifies_features

True

modifies_target

False

name

Random Forest Classifier

supported_problem_types

[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) pandas.Series#

Returns importance associated with each feature.

Returns

Importance associated with each feature.

Return type

np.ndarray

Raises

MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series]#

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
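A minimal sketch of the rolling-standard-deviation interval described above. For each coverage c, it takes the normal quantile z of the lower-tail probability (1 - c) / 2, which is negative (about -1.96 for 0.95). It then sets lower = prediction + z · std and upper = prediction - z · std.

The sketch makes two assumptions about details the description does not specify:

- It uses the sample standard deviation over a trailing window of up to 5 predictions.
- It assigns 0.0 where fewer than two predictions are available.

The function names are hypothetical.

```python
from statistics import NormalDist, stdev


def rolling_std(values, window=5):
    """Sample standard deviation over a trailing window (0.0 if < 2 points)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(stdev(chunk) if len(chunk) > 1 else 0.0)
    return out


def prediction_intervals(predictions, coverage=(0.95,), window=5):
    stds = rolling_std(predictions, window)
    intervals = {}
    for c in coverage:
        # Quantile (percent point function) of the lower-tail probability.
        z = NormalDist().inv_cdf((1 - c) / 2)
        intervals[f"{c}_lower"] = [p + z * s for p, s in zip(predictions, stds)]
        intervals[f"{c}_upper"] = [p - z * s for p, s in zip(predictions, stds)]
    return intervals


preds = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
intervals = prediction_intervals(preds, coverage=[0.8, 0.95])
print(sorted(intervals))  # ['0.8_lower', '0.8_upper', '0.95_lower', '0.95_upper']
```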

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) pandas.Series#

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.RandomForestRegressor(n_estimators: int = 100, max_depth: int = 6, n_jobs: int = -1, random_seed: Union[int, float] = 0, **kwargs)[source]#

Random Forest Regressor.

Parameters
  • n_estimators (int) – The number of trees in the forest. Defaults to 100.

  • max_depth (int) – Maximum tree depth for base learners. Defaults to 6.

  • n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processors. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 32),}

model_family

ModelFamily.RANDOM_FOREST

modifies_features

True

modifies_target

False

name

Random Forest Regressor

supported_problem_types

[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.MULTISERIES_TIME_SERIES_REGRESSION,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted RandomForestRegressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) pandas.Series#

Returns importance associated with each feature.

Returns

Importance associated with each feature.

Return type

np.ndarray

Raises

MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series][source]#

Find the prediction intervals using the fitted RandomForestRegressor.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Optional.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) pandas.Series#

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.
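The save/load pair follows the usual pickle round-trip pattern. As a sketch, the functions below use the standard-library `pickle` module in place of `cloudpickle` (which, unlike `pickle`, can also serialize objects such as lambdas); the bodies are illustrative and not EvalML's source. Here `pickle_protocol` is passed straight through as the `protocol` argument of `pickle.dump`.

```python
import os
import pickle
import tempfile


def save(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Write `obj` to `file_path` using the given pickle protocol (sketch)."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load(file_path):
    """Read back an object written by `save` (sketch)."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Round-trip a plain dict standing in for a component.
params = {"n_estimators": 10, "max_depth": None}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "component.pkl")
    save(params, path, pickle_protocol=4)
    restored = load(path)
print(restored == params)  # True
```

As with any pickle file, only load components from sources you trust, since unpickling can execute arbitrary code.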

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.ReplaceNullableTypes(random_seed=0, **kwargs)

Transformer that replaces features using the new pandas nullable dtypes with dtypes that are compatible with EvalML.

Attributes

hyperparameter_ranges

None

modifies_features

True

modifies_target

True

name

Replace Nullable Types Transformer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Substitutes non-nullable types for the new pandas nullable types in the data and target data.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data by replacing columns that contain nullable types with the appropriate replacement type.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict
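The convention can be illustrated with a toy class that reads its defaults from the `__init__` signature. `ToyComponent` is hypothetical and uses a plain classmethod; it is not EvalML's `ComponentBase`, and excluding random_seed from `parameters` is an assumption made for the sketch.

```python
import inspect


class ToyComponent:
    """Hypothetical stand-in for an EvalML component, not the real base class."""

    def __init__(self, threshold="median", n_jobs=-1, random_seed=0):
        # random_seed is stored separately, so it is not part of `parameters`.
        self._parameters = {"threshold": threshold, "n_jobs": n_jobs}
        self.random_seed = random_seed

    @property
    def parameters(self):
        """Parameters used to initialize this instance."""
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        """Read defaults from the __init__ signature, skipping self and random_seed."""
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name not in ("self", "random_seed")
        }


print(ToyComponent.default_parameters() == ToyComponent().parameters)  # True
print(ToyComponent.default_parameters())  # {'threshold': 'median', 'n_jobs': -1}
```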

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y=None)

Substitutes non-nullable types for the new pandas nullable types in the data and target data.

Parameters
  • X (pd.DataFrame) – Input features.

  • y (pd.Series, optional) – Target data.

Returns

The input features and target data with the non-nullable types set.

Return type

tuple of pd.DataFrame, pd.Series

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data by replacing columns that contain nullable types with the appropriate replacement type.

The replacement type is “float64” for nullable integers and “category” for nullable booleans.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Target data to transform.

Returns

Transformed X and transformed y.

Return type

tuple of pd.DataFrame, pd.Series
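The dtype substitution that transform describes can be pictured as a simple lookup. This sketch works on dtype names only and is not EvalML's implementation; the sets of nullable dtype names and the helpers `replacement_dtype` and `plan_replacements` are illustrative assumptions.

```python
from typing import Dict

# Hypothetical lists of pandas nullable dtype names (illustrative, not EvalML's source).
NULLABLE_INTEGER_DTYPES = {"Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"}
NULLABLE_BOOLEAN_DTYPES = {"boolean"}


def replacement_dtype(dtype_name: str) -> str:
    """Map a nullable dtype name to its replacement; leave other dtypes unchanged."""
    if dtype_name in NULLABLE_INTEGER_DTYPES:
        return "float64"  # float64 can hold integer values alongside NaN
    if dtype_name in NULLABLE_BOOLEAN_DTYPES:
        return "category"
    return dtype_name


def plan_replacements(dtypes: Dict[str, str]) -> Dict[str, str]:
    """Return {column: new_dtype} for only the columns that would change."""
    return {
        column: replacement_dtype(dtype)
        for column, dtype in dtypes.items()
        if replacement_dtype(dtype) != dtype
    }


print(plan_replacements({"age": "Int64", "is_member": "boolean", "score": "float64"}))
# {'age': 'float64', 'is_member': 'category'}
```

With pandas, the resulting mapping could be passed to `DataFrame.astype`, which accepts a column-to-dtype dictionary. Casting to float64 keeps missing values representable as NaN, at the cost of exactness for integers larger than 2**53.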

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.RFClassifierRFESelector(step=0.2, min_features_to_select=1, cv=None, scoring=None, n_jobs=-1, n_estimators=10, max_depth=None, random_seed=0, **kwargs)

Selects relevant features using recursive feature elimination with a Random Forest Classifier.

Parameters
  • step (int, float) – The number of features to eliminate in each iteration. If an integer is specified, it is the number of features to eliminate per iteration. If a float is specified, it is the fraction of features (e.g. 0.2 for 20%) to eliminate per iteration. The last iteration may drop fewer features in order to satisfy the min_features_to_select constraint. Defaults to 0.2.

  • min_features_to_select (int) – The minimum number of features to return. Defaults to 1.

  • cv (int or None) – Number of folds to use for the cross-validation splitting strategy. Defaults to None, which uses 5 folds.

  • scoring (str, callable or None) – A string or scorer callable object that specifies the scoring method. Defaults to None.

  • n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processors. Defaults to -1.

  • n_estimators (int) – The number of trees in the forest. Defaults to 10.

  • max_depth (int) – Maximum tree depth for base learners. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
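To see how step and min_features_to_select interact, the sketch below computes the feature counts visited during elimination. It assumes the selector wraps scikit-learn's `RFECV` and follows its rule that a float step is a fraction of the initial feature count, rounded down but at least 1; `elimination_schedule` is a hypothetical helper, not part of EvalML.

```python
from typing import List, Union


def elimination_schedule(
    n_features: int, step: Union[int, float], min_features_to_select: int = 1
) -> List[int]:
    """Return the number of features remaining after each elimination round.

    Assumes scikit-learn-style semantics: a float step in (0, 1) is a fraction of
    the *initial* feature count (at least 1), and the final round removes only
    as many features as needed to reach min_features_to_select.
    """
    if 0.0 < step < 1.0:
        per_iteration = int(max(1, step * n_features))
    else:
        per_iteration = int(step)
    if per_iteration <= 0:
        raise ValueError("step must be positive")

    remaining = n_features
    schedule = [remaining]
    while remaining > min_features_to_select:
        # Never drop below the minimum: the last round may remove fewer features.
        remaining -= min(per_iteration, remaining - min_features_to_select)
        schedule.append(remaining)
    return schedule


print(elimination_schedule(10, 0.2))  # [10, 8, 6, 4, 2, 1]
print(elimination_schedule(10, 3))    # [10, 7, 4, 1]
```

RFECV then scores each candidate feature count with cross-validation and keeps the best-scoring one, so a smaller step explores more candidates at the cost of more model fits.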

Attributes

hyperparameter_ranges

{ “step”: Real(0.05, 0.25)}

modifies_features

True

modifies_target

False

name

RFE Selector with RF Classifier

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Fit and transform data using the feature selector.

get_names

Get names of selected features.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)

Fit and transform data using the feature selector.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

get_names(self)

Get names of selected features.

Returns

List of the names of features selected.

Return type

list[str]

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Target data. Ignored.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If feature selector does not have a transform method or a component_obj that implements transform.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.RFClassifierSelectFromModel(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold='median', n_jobs=-1, random_seed=0, **kwargs)

Selects top features based on importance weights using a Random Forest classifier.

Parameters
  • number_features (int) – The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None.

  • n_estimators (int) – The number of trees in the forest. Defaults to 10.

  • max_depth (int) – Maximum tree depth for base learners. Defaults to None.

  • percent_features (float) – Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5.

  • threshold (string or float) – The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept, while the others are discarded. If “median”, the threshold is the median of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. Defaults to “median”.

  • n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processors. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
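The rule for combining number_features and percent_features ("take the greater number of features") can be written out as follows. This is a hedged sketch: `resolve_max_features` is hypothetical, and the rounding of percent_features (round down, but keep at least one feature) is an assumption rather than documented behavior.

```python
from typing import Optional


def resolve_max_features(
    n_columns: int,
    number_features: Optional[int] = None,
    percent_features: Optional[float] = 0.5,
) -> Optional[int]:
    """Combine number_features and percent_features, keeping the larger count.

    Hypothetical helper: the rounding rule (floor, but at least 1) is an assumption.
    """
    from_percent = (
        max(1, int(percent_features * n_columns)) if percent_features else None
    )
    candidates = [n for n in (number_features, from_percent) if n is not None]
    return max(candidates) if candidates else None


print(resolve_max_features(10))                     # 5
print(resolve_max_features(10, number_features=7))  # 7
print(resolve_max_features(10, number_features=2))  # 5
```

The resulting count presumably acts as an upper bound on how many features survive the threshold test, in the spirit of the `max_features` argument of scikit-learn's `SelectFromModel`.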

Attributes

hyperparameter_ranges

{ “percent_features”: Real(0.01, 1), “threshold”: [“mean”, “median”],}

modifies_features

True

modifies_target

False

name

RF Classifier Select From Model

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Fit and transform data using the feature selector.

get_names

Get names of selected features.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)

Fit and transform data using the feature selector.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

get_names(self)

Get names of selected features.

Returns

List of the names of features selected.

Return type

list[str]

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Target data. Ignored.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If feature selector does not have a transform method or a component_obj that implements transform.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.RFRegressorRFESelector(step=0.2, min_features_to_select=1, cv=None, scoring=None, n_jobs=-1, n_estimators=10, max_depth=None, random_seed=0, **kwargs)

Selects relevant features using recursive feature elimination with a Random Forest Regressor.

Parameters
  • step (int, float) – The number of features to eliminate in each iteration. If an integer is specified, it is the number of features to eliminate per iteration. If a float is specified, it is the fraction of features (e.g. 0.2 for 20%) to eliminate per iteration. The last iteration may drop fewer features in order to satisfy the min_features_to_select constraint. Defaults to 0.2.

  • min_features_to_select (int) – The minimum number of features to return. Defaults to 1.

  • cv (int or None) – Number of folds to use for the cross-validation splitting strategy. Defaults to None, which uses 5 folds.

  • scoring (str, callable or None) – A string or scorer callable object that specifies the scoring method. Defaults to None.

  • n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processors. Defaults to -1.

  • n_estimators (int) – The number of trees in the forest. Defaults to 10.

  • max_depth (int) – Maximum tree depth for base learners. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
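With cv=None the selector uses 5 folds. For a regressor, scikit-learn's cross-validation helpers fall back to plain (non-stratified) K-fold splitting, whereas classifiers with binary or multiclass targets get stratified folds. Assuming this selector wraps scikit-learn's `RFECV`, the sketch below shows how 5 unshuffled folds partition the rows; `kfold_sizes` and `kfold_indices` are hypothetical helpers that mirror scikit-learn's `KFold` fold-size rule.

```python
from typing import Iterator, List, Tuple


def kfold_sizes(n_samples: int, n_splits: int = 5) -> List[int]:
    """Test-fold sizes for an unshuffled K-fold split.

    The first n_samples % n_splits folds get one extra sample.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


def kfold_indices(n_samples: int, n_splits: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    start = 0
    for size in kfold_sizes(n_samples, n_splits):
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


print(kfold_sizes(12))  # [3, 3, 2, 2, 2]
```

Each row appears in exactly one test fold, so every candidate feature count in the elimination is scored on all of the data across the 5 folds.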

Attributes

hyperparameter_ranges

{ “step”: Real(0.05, 0.25)}

modifies_features

True

modifies_target

False

name

RFE Selector with RF Regressor

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Fit and transform data using the feature selector.

get_names

Get names of selected features.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)

Fit and transform data using the feature selector.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

get_names(self)

Get names of selected features.

Returns

List of the names of features selected.

Return type

list[str]

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Target data. Ignored.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If feature selector does not have a transform method or a component_obj that implements transform.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.RFRegressorSelectFromModel(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold='median', n_jobs=-1, random_seed=0, **kwargs)

Selects top features based on importance weights using a Random Forest regressor.

Parameters
  • number_features (int) – The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None.

  • n_estimators (int) – The number of trees in the forest. Defaults to 10.

  • max_depth (int) – Maximum tree depth for base learners. Defaults to None.

  • percent_features (float) – Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5.

  • threshold (string or float) – The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept, while the others are discarded. If “median”, the threshold is the median of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. Defaults to “median”.

  • n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processors. Defaults to -1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
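The threshold rules above (a named statistic, optionally scaled, or a plain number) match those of scikit-learn's `SelectFromModel`. The sketch below reimplements them on a plain dictionary of importances to show which features would be kept; `resolve_threshold` and `select_features` are hypothetical helpers, and combining the cutoff with an optional top-max_features cap is an assumption modeled on scikit-learn.

```python
import statistics
from typing import Dict, List, Optional, Union


def resolve_threshold(importances: List[float], threshold: Union[str, float]) -> float:
    """Turn "median", "mean", or "<scale>*<median|mean>" into a numeric cutoff."""
    if isinstance(threshold, (int, float)):
        return float(threshold)
    scale, reference = 1.0, threshold.strip()
    if "*" in reference:
        scale_text, reference = reference.split("*", 1)
        scale, reference = float(scale_text), reference.strip()
    if reference == "median":
        return scale * statistics.median(importances)
    if reference == "mean":
        return scale * statistics.mean(importances)
    raise ValueError(f"unsupported threshold: {threshold!r}")


def select_features(
    importances: Dict[str, float],
    threshold: Union[str, float] = "median",
    max_features: Optional[int] = None,
) -> List[str]:
    """Keep features at or above the cutoff, optionally limited to the top max_features."""
    cutoff = resolve_threshold(list(importances.values()), threshold)
    ranked = sorted(importances, key=importances.get, reverse=True)
    allowed = set(ranked if max_features is None else ranked[:max_features])
    return [name for name, score in importances.items() if name in allowed and score >= cutoff]


importances = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
print(select_features(importances))                  # ['a', 'b']
print(select_features(importances, "1.25*mean"))     # ['a']
print(select_features(importances, max_features=1))  # ['a']
```

Because the median splits the importances in half, the default “median” threshold keeps roughly half of the features before any max_features cap is applied.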

Attributes

hyperparameter_ranges

{ “percent_features”: Real(0.01, 1), “threshold”: [“mean”, “median”],}

modifies_features

True

modifies_target

False

name

RF Regressor Select From Model

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Fit and transform data using the feature selector.

get_names

Get names of selected features.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)

Fit and transform data using the feature selector.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

get_names(self)

Get names of selected features.

Returns

List of the names of features selected.

Return type

list[str]

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Target data. Ignored.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If feature selector does not have a transform method or a component_obj that implements transform.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.SelectByType(column_types=None, exclude=False, random_seed=0, **kwargs)

Selects columns by specified Woodwork logical type or semantic tag in input data.

Parameters
  • column_types (string, ww.LogicalType, list(string), list(ww.LogicalType)) – List of Woodwork types or tags, used to determine which columns to select or exclude.

  • exclude (bool) – If True, exclude the column_types instead of including them. Defaults to False.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
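The include/exclude logic can be sketched on a plain mapping from column names to their Woodwork logical type and semantic tags. This is an illustrative model only: `select_by_type` and the `schema` dictionary are hypothetical, and the real component presumably relies on Woodwork's own column selection.

```python
from typing import Dict, Iterable, List, Set, Union


def select_by_type(
    column_types_map: Dict[str, Set[str]],
    column_types: Union[str, Iterable[str]],
    exclude: bool = False,
) -> List[str]:
    """Return columns whose logical type or tags match (or, with exclude=True, don't).

    column_types_map is a hypothetical {column: {logical type name, tag, ...}} view
    of a Woodwork schema.
    """
    wanted = {column_types} if isinstance(column_types, str) else set(column_types)
    matches = [col for col, types in column_types_map.items() if types & wanted]
    if exclude:
        return [col for col in column_types_map if col not in matches]
    return matches


schema = {
    "age": {"Integer", "numeric"},
    "income": {"Double", "numeric"},
    "city": {"Categorical", "category"},
    "signup": {"Datetime"},
}
print(select_by_type(schema, "numeric"))                # ['age', 'income']
print(select_by_type(schema, "numeric", exclude=True))  # ['city', 'signup']
```

A single string and a list are both accepted, matching the documented column_types types; logical type names and semantic tags can be mixed in one list.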

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Select Columns By Type Transformer

needs_fitting

False

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the transformer by checking if column names are present in the dataset.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data X by selecting columns.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the transformer by checking if column names are present in the dataset.

Parameters
  • X (pd.DataFrame) – Data to check.

  • y (pd.Series, ignored) – Targets.

Returns

self

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data X by selecting columns.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Targets.

Returns

Transformed X.

Return type

pd.DataFrame

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.SelectColumns(columns=None, random_seed=0, **kwargs)

Selects specified columns in input data.

Parameters
  • columns (list(string)) – List of column names, used to determine which columns to select. If columns are not present, they will not be selected.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
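The documented behavior (missing columns are skipped rather than selected) amounts to a filter over the requested names. In this hypothetical sketch, `select_columns` keeps the order given in `columns` and treats `columns=None` as selecting nothing; both are assumptions, not confirmed EvalML behavior.

```python
from typing import List, Optional, Sequence


def select_columns(
    available: Sequence[str], columns: Optional[Sequence[str]] = None
) -> List[str]:
    """Keep requested columns that exist in `available`; silently skip missing ones.

    Order follows `columns` (an assumption for this sketch).
    """
    present = set(available)
    return [col for col in (columns or []) if col in present]


print(select_columns(["a", "b", "c"], ["c", "a", "z"]))  # ['c', 'a']
print(select_columns(["a", "b"]))                        # []
```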

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Select Columns Transformer

needs_fitting

False

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the transformer by checking if column names are present in the dataset.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transform data using fitted column selector component.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the transformer by checking if column names are present in the dataset.

Parameters
  • X (pd.DataFrame) – Data to check.

  • y (pd.Series, optional) – Targets.

Returns

self

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.
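save and load form a serialization round trip of the whole component object. The signature defaults to cloudpickle's protocol; the sketch below substitutes the standard-library pickle module (same dump/load pattern) and a SimpleNamespace stand-in for a component, so it runs without evalml or cloudpickle. The save and load helpers here are hypothetical, not the component methods themselves.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def save(component, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    # Serialize the whole object using the requested pickle protocol.
    with open(file_path, "wb") as f:
        pickle.dump(component, f, protocol=pickle_protocol)


def load(file_path):
    # Restore the object as it was saved.
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Stand-in for a component: just a name and its initialization parameters.
component = SimpleNamespace(name="Select Columns Transformer", parameters={"columns": ["a", "b"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "component.pkl")
    save(component, path)
    restored = load(path)

print(restored.parameters)  # {'columns': ['a', 'b']}
```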

transform(self, X, y=None)

Transform data using fitted column selector component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.SimpleImputer(impute_strategy='most_frequent', fill_value=None, random_seed=0, **kwargs)

Imputes missing data according to a specified imputation strategy. Natural language columns are ignored.

Parameters
  • impute_strategy (string) – Impute strategy to use. Valid values include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types.

  • fill_value (string) – When impute_strategy == “constant”, fill_value is used to replace missing data. Defaults to 0 when imputing numerical data and “missing_value” for strings or object data types.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
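The strategies above can be sketched in plain pandas, split the way fit and transform are: one step learns a fill value per column, the other fills the gaps. This is an illustration, not evalml's implementation (it does not skip natural language columns or check for mixed Boolean and Categorical data); the helper fit_fill_values is hypothetical.

```python
import pandas as pd


def fit_fill_values(X, impute_strategy="most_frequent", fill_value=None):
    """Hypothetical sketch: learn one fill value per column of X."""
    values = {}
    for col in X.columns:
        series = X[col]
        numeric = pd.api.types.is_numeric_dtype(series)
        if impute_strategy == "constant":
            # Documented defaults: 0 for numeric data, "missing_value" otherwise.
            default = 0 if numeric else "missing_value"
            values[col] = default if fill_value is None else fill_value
        elif impute_strategy == "most_frequent":
            # mode() ignores missing values (None and np.nan alike).
            values[col] = series.mode().iloc[0]
        elif impute_strategy in ("mean", "median") and numeric:
            values[col] = getattr(series, impute_strategy)()
        else:
            raise ValueError(f"{impute_strategy!r} is not valid for column {col!r}")
    return values


X_train = pd.DataFrame({"age": [20.0, None, 40.0, 40.0], "city": ["NY", None, "SF", "SF"]})
fill_values = fit_fill_values(X_train, "most_frequent")
X_imputed = X_train.fillna(fill_values)  # "transform": fill each column's gaps
print(X_imputed["city"].tolist())  # ['NY', 'SF', 'SF', 'SF']
```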

Attributes

hyperparameter_ranges

{“impute_strategy”: [“mean”, “median”, “most_frequent”]}

modifies_features

True

modifies_target

False

name

Simple Imputer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits imputer to data. 'None' values are converted to np.nan before imputation and are treated as the same.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms input by imputing missing values. 'None' and np.nan values are treated as the same.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.

Parameters
  • X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If the SimpleImputer receives a dataframe with both Boolean and Categorical data.

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series, optional) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms input by imputing missing values. ‘None’ and np.nan values are treated as the same.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Ignored.

Returns

Transformed X.

Return type

pd.DataFrame

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.StackedEnsembleBase(final_estimator=None, n_jobs=-1, random_seed=0, **kwargs)

Stacked Ensemble Base Class.

Parameters
  • final_estimator (Estimator or subclass) – The estimator used to combine the base estimators.

  • n_jobs (int or None) – Integer describing level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Defaults to -1. Note: multi-process errors may be thrown for values of n_jobs != 1; if this happens, use n_jobs = 1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
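The intended n_jobs convention (all CPUs for -1, and n_cpus + 1 + n_jobs for values below -1, as in joblib) can be sketched as a small helper. resolve_n_jobs is hypothetical; rejecting 0 and clamping to at least one worker are assumptions of this sketch, not documented behavior.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Hypothetical helper: how many workers an n_jobs value implies."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs is None or n_jobs == 1:
        return 1  # None and 1 both mean no parallelism
    if n_jobs == 0:
        raise ValueError("n_jobs=0 is not a valid setting")
    if n_jobs < 0:
        # -1 -> n_cpus, -2 -> n_cpus - 1, ...; clamped to at least 1 (assumption).
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


print([resolve_n_jobs(n, n_cpus=8) for n in (None, 1, 4, -1, -2)])  # [1, 1, 4, 8, 7]
```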

Attributes

model_family

ModelFamily.ENSEMBLE

modifies_features

True

modifies_target

False

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for stacked ensemble classes.

describe

Describe a component and its parameters.

feature_importance

Not implemented for StackedEnsembleClassifier and StackedEnsembleRegressor.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

name

Returns string name of this component.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

supported_problem_types

Problem types this estimator supports.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for stacked ensemble classes.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)

Not implemented for StackedEnsembleClassifier and StackedEnsembleRegressor.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. Each lower and upper bound is then the prediction plus the percent point (quantile) function evaluated at that bound's lower-tail probability, multiplied by the rolling standard deviation.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
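The interval computation described above can be sketched with pandas and the standard library. The helper rolling_std_intervals is hypothetical; using the standard normal quantile function (statistics.NormalDist().inv_cdf), a default coverage of [0.95], and back-filling the first window - 1 undefined rolling values are assumptions of this sketch rather than documented behavior.

```python
from statistics import NormalDist

import pandas as pd


def rolling_std_intervals(predictions, coverage=None, window=5):
    """Hypothetical sketch of rolling-standard-deviation prediction intervals."""
    coverage = [0.95] if coverage is None else coverage
    predictions = pd.Series(predictions, dtype=float)
    # Sample std over `window` consecutive predictions; the first window - 1
    # entries are NaN, so back-fill them from the first full window (assumption).
    rolling_std = predictions.rolling(window).std().bfill()
    intervals = {}
    for cov in coverage:
        # Quantiles at each bound's lower-tail probability (standard normal assumed).
        z_lower = NormalDist().inv_cdf((1 - cov) / 2)  # negative
        z_upper = NormalDist().inv_cdf((1 + cov) / 2)  # positive
        intervals[f"{cov}_lower"] = predictions + z_lower * rolling_std
        intervals[f"{cov}_upper"] = predictions + z_upper * rolling_std
    return intervals


preds = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0])
bounds = rolling_std_intervals(preds, coverage=[0.8, 0.95])
print(sorted(bounds))  # ['0.8_lower', '0.8_upper', '0.95_lower', '0.95_upper']
```

Keys follow the documented {coverage}_lower / {coverage}_upper format, and a higher coverage level always yields a wider interval because its quantiles lie further into the tails.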

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property name(cls)

Returns string name of this component.

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) → pandas.Series

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) → pandas.Series

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

property supported_problem_types(cls)

Problem types this estimator supports.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.StackedEnsembleClassifier(final_estimator=None, n_jobs=-1, random_seed=0, **kwargs)

Stacked Ensemble Classifier.

Parameters
  • final_estimator (Estimator or subclass) – The classifier used to combine the base estimators. If None, uses ElasticNetClassifier.

  • n_jobs (int or None) – Integer describing level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Defaults to -1. Note: multi-process errors may be thrown for values of n_jobs != 1; if this happens, use n_jobs = 1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Example

>>> from evalml.pipelines.component_graph import ComponentGraph
>>> from evalml.pipelines.components.estimators.classifiers.decision_tree_classifier import DecisionTreeClassifier
>>> from evalml.pipelines.components.estimators.classifiers.elasticnet_classifier import ElasticNetClassifier
>>> from evalml.pipelines.components import StackedEnsembleClassifier
...
>>> component_graph = {
...     "Decision Tree": [DecisionTreeClassifier(random_seed=3), "X", "y"],
...     "Decision Tree B": [DecisionTreeClassifier(random_seed=4), "X", "y"],
...     "Stacked Ensemble": [
...         StackedEnsembleClassifier(n_jobs=1, final_estimator=DecisionTreeClassifier()),
...         "Decision Tree.x",
...         "Decision Tree B.x",
...         "y",
...     ],
... }
...
>>> cg = ComponentGraph(component_graph)
>>> assert cg.default_parameters == {
...     'Decision Tree Classifier': {'criterion': 'gini',
...                                  'max_features': 'sqrt',
...                                  'max_depth': 6,
...                                  'min_samples_split': 2,
...                                  'min_weight_fraction_leaf': 0.0},
...     'Stacked Ensemble Classifier': {'final_estimator': ElasticNetClassifier,
...                                     'n_jobs': -1}}

Attributes

hyperparameter_ranges

{}

model_family

ModelFamily.ENSEMBLE

modifies_features

True

modifies_target

False

name

Stacked Ensemble Classifier

supported_problem_types

[ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for stacked ensemble classes.

describe

Describe a component and its parameters.

feature_importance

Not implemented for StackedEnsembleClassifier and StackedEnsembleRegressor.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for stacked ensemble classes.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)

Not implemented for StackedEnsembleClassifier and StackedEnsembleRegressor.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. Each lower and upper bound is then the prediction plus the percent point (quantile) function evaluated at that bound's lower-tail probability, multiplied by the rolling standard deviation.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) → pandas.Series

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) → pandas.Series

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.StackedEnsembleRegressor(final_estimator=None, n_jobs=-1, random_seed=0, **kwargs)

Stacked Ensemble Regressor.

Parameters
  • final_estimator (Estimator or subclass) – The regressor used to combine the base estimators. If None, uses ElasticNetRegressor.

  • n_jobs (int or None) – Integer describing level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Defaults to -1. Note: multi-process errors may be thrown for values of n_jobs != 1; if this happens, use n_jobs = 1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Example

>>> from evalml.pipelines.component_graph import ComponentGraph
>>> from evalml.pipelines.components.estimators.regressors.rf_regressor import RandomForestRegressor
>>> from evalml.pipelines.components.estimators.regressors.elasticnet_regressor import ElasticNetRegressor
>>> from evalml.pipelines.components import StackedEnsembleRegressor
...
>>> component_graph = {
...     "Random Forest": [RandomForestRegressor(random_seed=3), "X", "y"],
...     "Random Forest B": [RandomForestRegressor(random_seed=4), "X", "y"],
...     "Stacked Ensemble": [
...         StackedEnsembleRegressor(n_jobs=1, final_estimator=RandomForestRegressor()),
...         "Random Forest.x",
...         "Random Forest B.x",
...         "y",
...     ],
... }
...
>>> cg = ComponentGraph(component_graph)
>>> assert cg.default_parameters == {
...     'Random Forest Regressor': {'n_estimators': 100,
...                                 'max_depth': 6,
...                                 'n_jobs': -1},
...     'Stacked Ensemble Regressor': {'final_estimator': ElasticNetRegressor,
...                                    'n_jobs': -1}}
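In the graph above, the stacked ensemble is fit on the upstream estimators' predictions ("Random Forest.x", "Random Forest B.x") rather than on the raw features. The idea can be sketched in NumPy. This is a conceptual sketch, not evalml's implementation: the two polynomial fits are hypothetical stand-ins for the random forests, ordinary least squares replaces ElasticNetRegressor as the final estimator, and in-sample base predictions are used for brevity, whereas practical stacking typically uses out-of-fold predictions to limit leakage.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + 0.5 * x + rng.normal(scale=0.1, size=200)

# Two simple base estimators (hypothetical stand-ins for the random forests):
# a straight-line fit and a cubic fit of y on x.
base_preds = np.column_stack([
    np.polyval(np.polyfit(x, y, deg=1), x),  # plays the role of "Random Forest.x"
    np.polyval(np.polyfit(x, y, deg=3), x),  # plays the role of "Random Forest B.x"
])

# The final estimator sees only the base predictions (plus an intercept),
# here fit by ordinary least squares.
design = np.column_stack([np.ones(len(y)), base_preds])
weights, *_ = np.linalg.lstsq(design, y, rcond=None)
stacked = design @ weights


def mse(pred):
    return float(np.mean((pred - y) ** 2))


print([round(mse(p), 4) for p in (base_preds[:, 0], base_preds[:, 1], stacked)])
```

Because the least-squares combiner can reproduce either base model exactly (weight 1 on it, 0 elsewhere), its in-sample error is never worse than the better base model's.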

Attributes

hyperparameter_ranges

{}

model_family

ModelFamily.ENSEMBLE

modifies_features

True

modifies_target

False

name

Stacked Ensemble Regressor

supported_problem_types

[ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for stacked ensemble classes.

describe

Describe a component and its parameters.

feature_importance

Not implemented for StackedEnsembleClassifier and StackedEnsembleRegressor.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for stacked ensemble classes.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)

Not implemented for StackedEnsembleClassifier and StackedEnsembleRegressor.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. Each lower and upper bound is then the prediction plus the percent point (quantile) function evaluated at that bound's lower-tail probability, multiplied by the rolling standard deviation.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) → pandas.Series

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) → pandas.Series

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.StandardScaler(random_seed=0, **kwargs)

A transformer that standardizes input features by removing the mean and scaling to unit variance.

Parameters

random_seed (int) – Seed for the random number generator. Defaults to 0.
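Standardization as described can be sketched with pandas: fitting stores per-column means and standard deviations, and transforming reuses them on any data with the same columns. This sketch assumes the population standard deviation (ddof=0), the convention scikit-learn's StandardScaler uses, and assumes there are no constant columns (a zero standard deviation would divide by zero here). It is not evalml's implementation.

```python
import pandas as pd

X_train = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 10.0, 20.0, 20.0]})
X_new = pd.DataFrame({"a": [5.0], "b": [15.0]})

# fit: learn each column's mean and population standard deviation (ddof=0).
means = X_train.mean()
stds = X_train.std(ddof=0)

# transform: reuse the training statistics on any data with the same columns.
Z_train = (X_train - means) / stds
Z_new = (X_new - means) / stds
print(Z_new.round(4).to_dict("records"))  # [{'a': 2.2361, 'b': 0.0}]
```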

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Standard Scaler

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the standard scaler on the given data.

fit_transform

Fit and transform data using the standard scaler component.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transform data using the fitted standard scaler.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the standard scaler on the given data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y=None)

Fit and transform data using the standard scaler component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transform data using the fitted standard scaler.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.STLDecomposer(time_index: str = None, series_id: str = None, degree: int = 1, period: int = None, periods: dict = None, seasonal_smoother: int = 7, random_seed: int = 0, **kwargs)

Removes trends and seasonality from time series using the STL algorithm.

https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.STL.html

Parameters
  • time_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.

  • series_id (str) – Specifies the name of the column in X that provides the series_id objects for multiseries. Defaults to None.

  • degree (int) – Not currently used. STL has three “degree-like” values (the LOESS degrees of its seasonal, trend, and low-pass smoothers), none of which can be set at this time. Defaults to 1.

  • period (int) – The number of entries in the time series data that corresponds to one period of a cyclic signal. For instance, if data is known to possess a weekly seasonal signal, and if the data is daily data, the period should likely be 7. For daily data with a yearly seasonal signal, the period should likely be 365. If None, statsmodels will infer the period based on the frequency. Defaults to None.

  • seasonal_smoother (int) – The length of the seasonal smoother used by the underlying STL algorithm. For compatibility, it must be odd. If an even number is provided, the next highest odd number will be used. Defaults to 7.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
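The odd-length rule for seasonal_smoother amounts to the hypothetical helper below. The value is presumably passed on as the seasonal argument of statsmodels' STL, which requires an odd integer; that mapping is an assumption, not something this page confirms.

```python
def resolve_seasonal_smoother(seasonal_smoother=7):
    """Hypothetical helper: make the STL seasonal smoother length odd."""
    if seasonal_smoother % 2 == 0:
        # Even lengths are bumped to the next highest odd number.
        return seasonal_smoother + 1
    return seasonal_smoother


print([resolve_seasonal_smoother(n) for n in (7, 8, 12, 13)])  # [7, 9, 13, 13]
```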

Attributes

hyperparameter_ranges

None

invalid_frequencies

[]

modifies_features

False

modifies_target

True

name

STL Decomposer

needs_fitting

True

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

determine_periodicity

Function that uses autocorrelative methods to determine the likely most significant period of the seasonal signal.

fit

Fits the STLDecomposer and determines the seasonal signal.

fit_transform

Removes fitted trend and seasonality from target variable.

get_trend_dataframe

Return a list of dataframes with 4 columns: signal, trend, seasonality, residual.

get_trend_prediction_intervals

Calculate the prediction intervals for the trend data.

inverse_transform

Adds back fitted trend and seasonality to target variable.

is_freq_valid

Determines if the given string represents a valid frequency for this decomposer.

load

Loads component at file path.

parameters

Returns the parameters which were used to initialize the component.

plot_decomposition

Plots the decomposition of the target signal.

save

Saves component at file path.

set_period

Function to set the component's seasonal period based on the target's seasonality.

transform

Transforms the target data by removing the STL trend and seasonality.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict
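The convention Component.default_parameters == Component().parameters can be illustrated with a hypothetical stand-in class whose __init__ mirrors SVMRegressor's signature. Reading the defaults via inspect.signature is an assumption made for this sketch, not a statement about how evalml computes them; excluding random_seed from the parameters is likewise a choice of this toy class.

```python
import inspect


class ToyComponent:
    """Hypothetical component; not an evalml class."""

    def __init__(self, C=1.0, kernel="rbf", gamma="auto", random_seed=0):
        self._parameters = {"C": C, "kernel": kernel, "gamma": gamma}
        self.random_seed = random_seed

    @property
    def parameters(self):
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Collect keyword defaults from __init__, skipping self and random_seed.
        sig = inspect.signature(cls.__init__)
        return {
            name: p.default
            for name, p in sig.parameters.items()
            if name not in ("self", "random_seed") and p.default is not inspect.Parameter.empty
        }


print(ToyComponent.default_parameters())  # {'C': 1.0, 'kernel': 'rbf', 'gamma': 'auto'}
```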

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

classmethod determine_periodicity(cls, X: pandas.DataFrame, y: pandas.Series, acf_threshold: float = 0.01, rel_max_order: int = 5)#

Function that uses autocorrelative methods to determine the most likely significant period of the seasonal signal.

Parameters
  • X (pandas.DataFrame) – The feature data of the time series problem.

  • y (pandas.Series) – The target data of a time series problem.

  • acf_threshold (float) – The threshold for the autocorrelation function to determine the period. Any values below the threshold are considered to be 0 and will not be considered for the period. Defaults to 0.01.

  • rel_max_order (int) – The order of the relative maximum to determine the period. Defaults to 5.

Returns

The integer number of entries in time series data over which the seasonal part of the target data repeats. If the time series data is in days, then this is the number of days that it takes the target’s seasonal signal to repeat. Note: the target data can contain multiple seasonal signals. This function returns only the strongest one. E.g. if the target has both weekly and yearly seasonality, the function may return either “7” or “365”, depending on which seasonality is more strongly autocorrelated. If no period is detected, returns None.

Return type

int or None
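The autocorrelation approach described above can be approximated in plain Python: compute the sample ACF, zero out values below acf_threshold, keep lags that are relative maxima within rel_max_order neighbours, and return the most strongly autocorrelated one. This is a simplified sketch under those assumptions, not evalml's implementation; the function names are hypothetical.

```python
import math
from typing import List, Optional


def autocorrelation(y: List[float], nlags: int) -> List[float]:
    """Biased sample autocorrelation for lags 0..nlags."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return [0.0] * (nlags + 1)  # constant series: no autocorrelation structure
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom for k in range(nlags + 1)]


def estimate_period(y: List[float], acf_threshold: float = 0.01, rel_max_order: int = 5) -> Optional[int]:
    acf = autocorrelation(y, len(y) // 2)
    # Values below the threshold count as 0 and cannot be chosen as the period.
    acf = [v if v >= acf_threshold else 0.0 for v in acf]
    candidates = []
    for k in range(1, len(acf)):
        # Lag k is a relative maximum if it beats every neighbour within rel_max_order lags.
        lo, hi = max(0, k - rel_max_order), min(len(acf), k + rel_max_order + 1)
        neighbours = acf[lo:k] + acf[k + 1:hi]
        if acf[k] > 0 and all(acf[k] > v for v in neighbours):
            candidates.append(k)
    if not candidates:
        return None
    # Keep only the most strongly autocorrelated candidate lag.
    return max(candidates, key=lambda k: acf[k])


weekly = [math.sin(2 * math.pi * t / 7) for t in range(140)]
print(estimate_period(weekly))  # 7
```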

fit(self, X: pandas.DataFrame, y: Union[pandas.Series, pandas.DataFrame] = None) STLDecomposer[source]#

Fits the STLDecomposer and determines the seasonal signal.

Instantiates a statsmodels STL decompose object with the component’s stored parameters and fits it. Since the statsmodels object does not fit the sklearn API, it is not saved during __init__() in _component_obj and will be re-instantiated each time fit is called.

To emulate the sklearn API, when the STL decomposer is fit, the full seasonal component, a single period sample of the seasonal component, the full trend-cycle component and the residual are saved.

y(t) = S(t) + T(t) + R(t)

Parameters
  • X (pd.DataFrame, optional) – Conditionally used to build datetime index.

  • y (pd.Series or pd.DataFrame) – Target variable to detrend and deseasonalize.

Returns

self

Raises
  • ValueError – If y is None.

  • ValueError – If target data doesn’t have DatetimeIndex AND no Datetime features in features data
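The additive model y(t) = S(t) + T(t) + R(t) can be made concrete with a classical moving-average decomposition. Note that this swaps in a simpler technique than STL (no LOESS smoothing, no robustness iterations); it only illustrates how trend, seasonal and residual parts add back up to the signal. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def classical_decompose(y: List[float], period: int) -> Tuple[list, list, list]:
    """Split y into trend T, seasonal S and residual R with y = S + T + R.

    Trend and residual are None at the edges where the centred window does not fit.
    """
    n, half = len(y), period // 2
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: 2 x period moving average, half weight on both end points.
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    # Average the detrended values at each phase of the cycle.
    buckets: List[List[float]] = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(y[t] - trend[t])
    if any(not b for b in buckets):
        raise ValueError("series too short for this period")
    phase_means = [sum(b) / len(b) for b in buckets]
    offset = sum(phase_means) / period  # centre so one cycle of S sums to zero
    seasonal = [phase_means[t % period] - offset for t in range(n)]
    residual = [None if trend[t] is None else y[t] - trend[t] - seasonal[t] for t in range(n)]
    return trend, seasonal, residual


pattern = [1.0, -1.0, 2.0, 0.0, -2.0, 1.0, -1.0]  # one weekly cycle, sums to zero
y = [0.5 * t + pattern[t % 7] for t in range(70)]
T, S, R = classical_decompose(y, period=7)
```

Because the synthetic series is exactly a line plus a repeating pattern, the recovered seasonal part matches the pattern and the residual is numerically zero wherever the trend is defined.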

fit_transform(self, X: pandas.DataFrame, y: pandas.Series = None) tuple[pandas.DataFrame, pandas.Series]#

Removes fitted trend and seasonality from target variable.

Parameters
  • X (pd.DataFrame, optional) – Ignored.

  • y (pd.Series) – Target variable to detrend and deseasonalize.

Returns

The first element is the input features, returned without modification.

The second element is the target variable y with the fitted trend and seasonality removed.

Return type

tuple of pd.DataFrame, pd.Series

get_trend_dataframe(self, X, y)[source]#

Return a list of dataframes with 4 columns: signal, trend, seasonality, residual.

Parameters
  • X (pd.DataFrame) – Input data with time series data in index.

  • y (pd.Series or pd.DataFrame) – Target variable data provided as a Series for univariate problems or a DataFrame for multivariate problems.

Returns

(Single series) list of pd.DataFrame: Each DataFrame contains the columns “signal”, “trend”, “seasonality” and “residual”, with the latter three column values being the decomposed elements of the target data. The “signal” column is simply the input target signal, reindexed with a datetime index to match the input features.

(Multi series) dictionary of lists: Each series id maps to a list of pd.DataFrames that each contain the columns “signal”, “trend”, “seasonality” and “residual”.

Return type

(Single series) list of pd.DataFrame; (Multi series) dict of lists of pd.DataFrame

Raises
  • TypeError – If X does not have time-series data in the index.

  • ValueError – If time series index of X does not have an inferred frequency.

  • ValueError – If the forecaster associated with the detrender has not been fit yet.

  • TypeError – If y is not provided as a pandas Series or DataFrame.

get_trend_prediction_intervals(self, y, coverage=None)[source]#

Calculate the prediction intervals for the trend data.

Parameters
  • y (pd.Series or pd.DataFrame) – Target data.

  • coverage (list[float]) – A list of floats between 0 and 1 for which the upper and lower bounds of the prediction interval should be calculated.

Returns

(Single series) dict of pd.Series: Prediction intervals, with keys in the format {coverage}_lower or {coverage}_upper.

(Multi series) dict of dict of pd.Series: Each series id maps to a dictionary of prediction intervals.

Return type

(Single series) dict of pd.Series; (Multi series) dict of dict of pd.Series

inverse_transform(self, y_t: Union[pandas.Series, pandas.DataFrame]) Union[pandas.Series, pandas.DataFrame][source]#

Adds back fitted trend and seasonality to target variable.

The STL trend is projected to cover the entire requested target range, then added back into the signal. Then, the seasonality is projected forward and added back into the signal.

Parameters

y_t (pd.Series or pd.DataFrame) – Target variable.

Returns

The target variable y with the trend and seasonality added back in.

Return type

pd.Series or pd.DataFrame

Raises

ValueError – If y is None.
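The add-back step can be sketched as a round trip: project the fitted trend over the requested index range, tile one period of seasonality over the same range, and add both to the transformed target. The least-squares line used for the trend projection is a stand-in chosen for this sketch (the component's own trend projection is not a straight-line fit), and the sketch assumes the seasonal sample is aligned with index 0 of the training data. All names are hypothetical.

```python
from typing import Callable, List


def fit_linear_trend(trend: List[float]) -> Callable[[int], float]:
    """Least-squares line through the in-sample trend (stand-in projection)."""
    n = len(trend)
    x_mean = (n - 1) / 2
    y_mean = sum(trend) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(trend)) / sxx
    return lambda t: y_mean + slope * (t - x_mean)


def add_back(y_t: List[float], start: int, trend_fn: Callable[[int], float],
             seasonal_sample: List[float]) -> List[float]:
    """Add projected trend and tiled seasonality to a detrended, deseasonalized target."""
    period = len(seasonal_sample)
    return [v + trend_fn(start + i) + seasonal_sample[(start + i) % period] for i, v in enumerate(y_t)]


trend_fn = fit_linear_trend([2.0 * t for t in range(20)])  # in-sample trend 2t
seasonal_sample = [1.0, -1.0, 0.0]
residuals = [0.1, -0.2, 0.0, 0.3, -0.1, 0.05]  # stands in for transformed y at t = 20..25
y_future = [2.0 * t + seasonal_sample[t % 3] + r for t, r in zip(range(20, 26), residuals)]
restored = add_back(residuals, 20, trend_fn, seasonal_sample)
```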

classmethod is_freq_valid(cls, freq: str)#

Determines if the given string represents a valid frequency for this decomposer.

Parameters

freq (str) – A frequency to validate. See the pandas docs at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases for options.

Returns

A boolean indicating whether the frequency is valid.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property parameters(self)#

Returns the parameters which were used to initialize the component.

plot_decomposition(self, X: pandas.DataFrame, y: Union[pandas.Series, pandas.DataFrame], show: bool = False) Union[tuple[matplotlib.pyplot.Figure, list], dict[str, tuple[matplotlib.pyplot.Figure]]]#

Plots the decomposition of the target signal.

Parameters
  • X (pd.DataFrame) – Input data with time series data in index.

  • y (pd.Series or pd.DataFrame) – Target variable data provided as a Series for univariate problems or a DataFrame for multivariate problems.

  • show (bool) – Whether to display the plot or not. Defaults to False.

Returns

(Single series) matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes]: The figure and axes that have the decompositions plotted on them.

(Multi series) dict[str, (matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes])]: A dictionary that maps the series id to the figure and axes that have the decompositions plotted on them.

Return type

(Single series) matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes]; (Multi series) dict[str, (matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes])]

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.
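A save/load round trip with an explicit pickle_protocol can be sketched with the standard library. The documented default comes from cloudpickle; this sketch substitutes stdlib pickle and a types.SimpleNamespace stand-in for a component, so the helper functions here are hypothetical and not evalml's methods.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def save(component, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    # pickle_protocol selects the pickle data stream format.
    with open(file_path, "wb") as f:
        pickle.dump(component, f, protocol=pickle_protocol)


def load(file_path):
    with open(file_path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "component.pkl")
    save(SimpleNamespace(name="STL Decomposer", parameters={"period": 12}), path)
    restored = load(path)

print(restored.parameters)  # {'period': 12}
```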

set_period(self, X: pandas.DataFrame, y: pandas.Series, acf_threshold: float = 0.01, rel_max_order: int = 5)#

Function to set the component’s seasonal period based on the target’s seasonality.

Parameters
  • X (pandas.DataFrame) – The feature data of the time series problem.

  • y (pandas.Series) – The target data of a time series problem.

  • acf_threshold (float) – The threshold for the autocorrelation function to determine the period. Any values below the threshold are considered to be 0 and will not be considered for the period. Defaults to 0.01.

  • rel_max_order (int) – The order of the relative maximum to determine the period. Defaults to 5.

transform(self, X: pandas.DataFrame, y: Union[pandas.Series, pandas.DataFrame] = None) Union[tuple[pandas.DataFrame, pandas.Series], tuple[pandas.DataFrame, pandas.DataFrame]][source]#

Transforms the target data by removing the STL trend and seasonality.

Uses an ARIMA model to project the additive trend forward and removes it. Then, utilizes the first period’s worth of seasonal data determined in the .fit() function to extrapolate the seasonal signal of the data to be transformed. This seasonal signal is also assumed to be additive and is removed.

Parameters
  • X (pd.DataFrame, optional) – Conditionally used to build datetime index.

  • y (pd.Series or pd.DataFrame) – Target variable to detrend and deseasonalize.

Returns

(Single series) pd.DataFrame, pd.Series: The input features are returned without modification, and the target variable y is detrended and deseasonalized.

(Multi series) pd.DataFrame, pd.DataFrame: The input features are returned without modification, and the target variable y is detrended and deseasonalized.

Return type

(Single series) pd.DataFrame, pd.Series; (Multi series) pd.DataFrame, pd.DataFrame

Raises

ValueError – If target data doesn’t have DatetimeIndex AND no Datetime features in features data
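The seasonal part of this step, extending a single fitted period of seasonality to an arbitrary index range and subtracting it, can be sketched as below. The sketch assumes the stored period sample starts at index 0 of the training data; the helper names are hypothetical and the trend-removal step is omitted.

```python
from typing import List


def extrapolate_seasonal(period_sample: List[float], start: int, length: int) -> List[float]:
    """Repeat one fitted period of seasonality over indices start .. start + length - 1."""
    period = len(period_sample)
    return [period_sample[(start + i) % period] for i in range(length)]


def remove_seasonal(y: List[float], start: int, period_sample: List[float]) -> List[float]:
    seasonal = extrapolate_seasonal(period_sample, start, len(y))
    return [v - s for v, s in zip(y, seasonal)]  # additive seasonality is subtracted


sample = [3.0, 0.0, -3.0]  # one period of fitted seasonality
y_future = [10.0 + s for s in extrapolate_seasonal(sample, 100, 6)]
print(remove_seasonal(y_future, 100, sample))  # [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
```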

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.SVMClassifier(C=1.0, kernel='rbf', gamma='auto', probability=True, random_seed=0, **kwargs)[source]#

Support Vector Machine Classifier.

Parameters
  • C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.

  • kernel ({"poly", "rbf", "sigmoid"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.

  • gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. If “scale”, uses 1 / (n_features * X.var()) as the value of gamma; if “auto”, uses 1 / n_features. Defaults to “auto”.

  • probability (boolean) – Whether to enable probability estimates. Defaults to True.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
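The two string options for gamma reduce to simple arithmetic. The sketch below (hypothetical helper, plain lists instead of arrays) shows that X.var() is the population variance over every entry of X, not a per-column value.

```python
from statistics import pvariance
from typing import List, Union


def svm_gamma(X: List[List[float]], gamma: Union[str, float] = "auto") -> float:
    n_features = len(X[0])
    if gamma == "auto":
        return 1.0 / n_features
    if gamma == "scale":
        # Variance over all entries of X (population variance, as X.var() computes).
        return 1.0 / (n_features * pvariance([v for row in X for v in row]))
    return float(gamma)  # a numeric gamma is used as-is


X = [[1.0, 2.0], [3.0, 4.0]]
print(svm_gamma(X, "auto"))   # 0.5
print(svm_gamma(X, "scale"))  # 0.4
```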

Attributes

hyperparameter_ranges

{ “C”: Real(0, 10), “kernel”: [“poly”, “rbf”, “sigmoid”], “gamma”: [“scale”, “auto”],}

model_family

ModelFamily.SVM

modifies_features

True

modifies_target

False

name

SVM Classifier

supported_problem_types

[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Feature importance only works with linear kernels.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)#

Feature importance only works with linear kernels.

If the kernel isn’t linear, we return a numpy array of zeros.

Returns

Feature importance of fitted SVM classifier or a numpy array of zeros if the kernel is not linear.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series]#

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between 0 and 1 for which the upper and lower bounds of the prediction interval should be calculated.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) pandas.Series#

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.SVMRegressor(C=1.0, kernel='rbf', gamma='auto', random_seed=0, **kwargs)[source]#

Support Vector Machine Regressor.

Parameters
  • C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.

  • kernel ({"poly", "rbf", "sigmoid"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.

  • gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. If “scale”, uses 1 / (n_features * X.var()) as the value of gamma; if “auto”, uses 1 / n_features. Defaults to “auto”.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{ “C”: Real(0, 10), “kernel”: [“poly”, “rbf”, “sigmoid”], “gamma”: [“scale”, “auto”],}

model_family

ModelFamily.SVM

modifies_features

True

modifies_target

False

name

SVM Regressor

supported_problem_types

[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Feature importance of fitted SVM regressor.

fit

Fits estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)#

Feature importance of fitted SVM regressor.

Only works with linear kernels. If the kernel isn’t linear, we return a numpy array of zeros.

Returns

The feature importance of the fitted SVM regressor, or an array of zeros if the kernel is not linear.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)#

Fits estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series]#

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between 0 and 1 for which the upper and lower bounds of the prediction interval should be calculated.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
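The interval construction described above (rolling standard deviation with a window of 5, scaled by the normal quantile of the lower-tail probability) can be sketched with the standard library. Backfilling the first four positions with the first full-window value, and using the sample standard deviation, are assumptions of this sketch; the function name is hypothetical.

```python
from statistics import NormalDist, stdev
from typing import Dict, List


def prediction_intervals(predictions: List[float], coverage: List[float],
                         window: int = 5) -> Dict[str, List[float]]:
    if len(predictions) < window:
        raise ValueError("need at least `window` predictions")
    # Rolling sample std over each full window; earlier positions reuse the first full value.
    full = [stdev(predictions[i - window + 1:i + 1]) for i in range(window - 1, len(predictions))]
    rolling_std = [full[0]] * (window - 1) + full
    intervals = {}
    for c in coverage:
        q = NormalDist().inv_cdf((1 - c) / 2)  # lower-tail quantile, negative for c > 0
        intervals[f"{c}_lower"] = [p + q * s for p, s in zip(predictions, rolling_std)]
        intervals[f"{c}_upper"] = [p - q * s for p, s in zip(predictions, rolling_std)]
    return intervals


preds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
intervals = prediction_intervals(preds, [0.95])
print(sorted(intervals))  # ['0.95_lower', '0.95_upper']
```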

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) pandas.Series#

Make predictions using selected features.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.

predict_proba(self, X: pandas.DataFrame) pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.TargetEncoder(cols=None, smoothing=1, handle_unknown='value', handle_missing='value', random_seed=0, **kwargs)[source]#

A transformer that encodes categorical features into target encodings.

Parameters
  • cols (list) – Columns to encode. If None, all string columns will be encoded; otherwise, only the columns provided will be encoded. Defaults to None.

  • smoothing (float) – The smoothing factor to apply. The larger this value is, the more influence the expected target value has on the resulting target encodings. Must be strictly larger than 0. Defaults to 1.0.

  • handle_unknown (string) – Determines how to handle unknown categories encountered for a feature. Options are ‘value’, ‘error’, and ‘return_nan’. Defaults to ‘value’, which replaces unknown categories with the target mean.

  • handle_missing (string) – Determines how to handle missing values encountered during fit or transform. Options are ‘value’, ‘error’, and ‘return_nan’. Defaults to ‘value’, which replaces missing values with the target mean.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
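The effect of smoothing can be illustrated with a simple smoothed-mean (m-estimate) target encoding: each category mean is blended with the overall target mean, and a larger smoothing value pulls the encoding closer to that mean. This is an illustrative formula chosen for the sketch; the exact smoothing formula the component uses may differ. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fit_target_encoding(categories: List[str], y: List[float],
                        smoothing: float = 1.0) -> Tuple[Dict[str, float], float]:
    """Return (per-category encodings, overall target mean)."""
    if smoothing <= 0:
        raise ValueError("smoothing must be strictly larger than 0")
    prior = sum(y) / len(y)  # expected target value over all rows
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for cat, target in zip(categories, y):
        sums[cat] += target
        counts[cat] += 1
    # Blend each category mean with the prior; larger smoothing pulls harder toward the prior.
    encoding = {cat: (sums[cat] + smoothing * prior) / (counts[cat] + smoothing) for cat in counts}
    return encoding, prior


def apply_target_encoding(categories: List[str], encoding: Dict[str, float], prior: float) -> List[float]:
    # Mirrors handle_unknown="value": unseen categories get the target mean.
    return [encoding.get(cat, prior) for cat in categories]


encoding, prior = fit_target_encoding(["a", "a", "b"], [1.0, 0.0, 1.0], smoothing=1.0)
print(apply_target_encoding(["a", "b", "c"], encoding, prior))
```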

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Target Encoder

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the target encoder.

fit_transform

Fit and transform data using the target encoder.

get_feature_names

Return feature names for the input features after fitting.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transform data using the fitted target encoder.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)[source]#

Fits the target encoder.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y)[source]#

Fit and transform data using the target encoder.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

get_feature_names(self)[source]#

Return feature names for the input features after fitting.

Returns

The feature names after encoding.

Return type

np.array

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transform data using the fitted target encoder.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.TargetImputer(impute_strategy='most_frequent', fill_value=None, random_seed=0, **kwargs)[source]#

Imputes missing target data according to a specified imputation strategy.

Parameters
  • impute_strategy (string) – Impute strategy to use. Valid values include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types. Defaults to “most_frequent”.

  • fill_value (string) – When impute_strategy == “constant”, fill_value is used to replace missing data. Defaults to None which uses 0 when imputing numerical data and “missing_value” for strings or object data types.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
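The imputation strategies and the None/np.nan equivalence can be sketched with the standard library. Breaking "most_frequent" ties by the smallest value and raising TypeError for an all-null target for every strategy are assumptions of this sketch; the function names are hypothetical.

```python
import math
import statistics
from collections import Counter


def _is_missing(v) -> bool:
    # None and NaN are both treated as missing.
    return v is None or (isinstance(v, float) and math.isnan(v))


def impute_target(y: list, impute_strategy: str = "most_frequent", fill_value=None) -> list:
    observed = [v for v in y if not _is_missing(v)]
    if not observed:
        raise TypeError("target is filled with all null values")
    if impute_strategy == "mean":
        fill = statistics.mean(observed)
    elif impute_strategy == "median":
        fill = statistics.median(observed)
    elif impute_strategy == "most_frequent":
        counts = Counter(observed)
        top = max(counts.values())
        fill = min(v for v, c in counts.items() if c == top)  # ties: smallest value
    elif impute_strategy == "constant":
        if fill_value is not None:
            fill = fill_value
        else:
            fill = "missing_value" if isinstance(observed[0], str) else 0
    else:
        raise ValueError(f"unknown impute_strategy: {impute_strategy}")
    return [fill if _is_missing(v) else v for v in y]


print(impute_target([1.0, None, 3.0, float("nan")], "mean"))  # [1.0, 2.0, 3.0, 2.0]
print(impute_target(["a", None, "a", "b"]))  # ['a', 'a', 'a', 'b']
```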

Attributes

hyperparameter_ranges

{ “impute_strategy”: [“mean”, “median”, “most_frequent”]}

modifies_features

False

modifies_target

True

name

Target Imputer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits imputer to target data. 'None' values are converted to np.nan before imputation, so both are treated as missing.

fit_transform

Fits on and transforms the input target data.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms input target data by imputing missing values. 'None' and np.nan values are treated identically.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)[source]#

Fits imputer to target data. ‘None’ values are converted to np.nan before imputation, so both are treated as missing.

Parameters
  • X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]. Ignored.

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

TypeError – If target is filled with all null values.

fit_transform(self, X, y)[source]#

Fits on and transforms the input target data.

Parameters
  • X (pd.DataFrame) – Features. Ignored.

  • y (pd.Series) – Target data to impute.

Returns

The original X, transformed y

Return type

(pd.DataFrame, pd.Series)

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y)[source]#

Transforms input target data by imputing missing values. ‘None’ and np.nan values are treated as the same.

Parameters
  • X (pd.DataFrame) – Features. Ignored.

  • y (pd.Series) – Target data to impute.

Returns

The original X, transformed y

Return type

(pd.DataFrame, pd.Series)

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.TimeSeriesBaselineEstimator(gap=1, forecast_horizon=1, random_seed=0, **kwargs)[source]#

Time series estimator that predicts using the naive forecasting approach.

This is useful as a simple baseline estimator for time series problems.
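The naive (persistence) approach predicts each value from an earlier observed value of the target. The pure-Python sketch below illustrates the idea for a single lag. The `naive_forecast` helper and its `shift` argument are hypothetical, and how the component derives its lag from gap and forecast_horizon is not shown here.

```python
def naive_forecast(y, shift):
    """Persistence forecast: y_hat[t] = y[t - shift], or None when no history exists."""
    if shift < 1:
        raise ValueError("shift must be a positive integer")
    n = len(y)
    # Rows with no observation `shift` steps back are padded with None.
    return [None] * min(shift, n) + list(y[: max(n - shift, 0)])


print(naive_forecast([10, 20, 30, 40], shift=1))
```

Because the prediction ignores every input feature, this also explains why feature_importance returns all zeroes for baseline estimators.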

Parameters
  • gap (int) – Gap between prediction date and target date; must be a non-negative integer. If gap is 0, the target date will be shifted ahead by 1 time period. Defaults to 1.

  • forecast_horizon (int) – Number of time steps the model is expected to predict. Defaults to 1.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges

{}

model_family

ModelFamily.BASELINE

modifies_features

True

modifies_target

False

name

Time Series Baseline Estimator

supported_problem_types

[ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns importance associated with each feature.

fit

Fits time series baseline estimator to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using fitted time series baseline estimator.

predict_proba

Make prediction probabilities using fitted time series baseline estimator.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)#

Returns importance associated with each feature.

Since baseline estimators do not use input features to calculate predictions, returns an array of zeroes.

Returns

An array of zeroes.

Return type

np.ndarray (float)

fit(self, X, y=None)[source]#

Fits time series baseline estimator to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If input y is None.

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) → Dict[str, pandas.Series]#

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
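A minimal pure-Python sketch of this calculation, under several assumptions: the quantile function is the standard normal one (statistics.NormalDist), the rolling standard deviation is the sample standard deviation of each trailing window of 5 predictions, and each bound is prediction + quantile × rolling standard deviation. The helper name `prediction_intervals` is hypothetical, and rows before the first full window are left as None because the component's handling of them is not stated here.

```python
from statistics import NormalDist, stdev


def prediction_intervals(predictions, coverage=(0.95,), window=5):
    """Rolling-std prediction intervals around point predictions (sketch)."""
    # Sample std of each trailing window; None until the window is full.
    rolling_std = [
        stdev(predictions[i - window + 1 : i + 1]) if i >= window - 1 else None
        for i in range(len(predictions))
    ]
    result = {}
    for c in coverage:
        lower_tail = (1 - c) / 2  # e.g. 0.025 for 95% coverage
        z_lower = NormalDist().inv_cdf(lower_tail)      # negative quantile
        z_upper = NormalDist().inv_cdf(1 - lower_tail)  # positive quantile
        result[f"{c}_lower"] = [
            None if s is None else p + z_lower * s
            for p, s in zip(predictions, rolling_std)
        ]
        result[f"{c}_upper"] = [
            None if s is None else p + z_upper * s
            for p, s in zip(predictions, rolling_std)
        ]
    return result


intervals = prediction_intervals([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(sorted(intervals))
```

The returned keys follow the {coverage}_lower / {coverage}_upper format described under Returns.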

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X)[source]#

Make predictions using fitted time series baseline estimator.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

Raises

ValueError – If input y is None.

predict_proba(self, X)[source]#

Make prediction probabilities using fitted time series baseline estimator.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted probability values.

Return type

pd.DataFrame

Raises

ValueError – If input y is None.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.TimeSeriesFeaturizer(time_index=None, max_delay=2, gap=0, forecast_horizon=1, conf_level=0.05, rolling_window_size=0.25, delay_features=True, delay_target=True, random_seed=0, **kwargs)[source]#

Transformer that delays input features and target variable for time series problems.

This component uses an algorithm based on the autocorrelation values of the target variable to determine which lags to select from the set of all possible lags.

The algorithm is based on the idea that the local maxima of the autocorrelation function indicate the lags that have the most impact on the present time.

The algorithm computes the autocorrelation values and finds the local maxima, called “peaks”, that are significant at the given conf_level. Since lags in the range [0, 10] tend to be predictive but not local maxima, the union of the peaks is taken with the significant lags in the range [0, 10]. At the end, only selected lags in the range [0, max_delay] are used.

Parametrizing the algorithm by conf_level lets the AutoMLAlgorithm tune the set of lags chosen, so that the chances of finding a good set of lags are higher.

A conf_level value of 1 selects all possible lags.
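The selection rule above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not EvalML's implementation: the helpers `autocorrelation` and `select_lags` are hypothetical, and significance is approximated by comparing each autocorrelation with the large-sample band ±z/√n, whereas the component may derive its confidence intervals differently.

```python
from statistics import NormalDist, fmean


def autocorrelation(y, nlags):
    """Sample autocorrelation of y for lags 0..nlags."""
    n = len(y)
    mean = fmean(y)
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(nlags + 1)
    ]


def select_lags(y, conf_level, max_delay):
    """Choose lags in [1, max_delay] from significant ACF peaks plus significant lags <= 10."""
    all_lags = range(1, max_delay + 1)
    if conf_level >= 1:
        # A conf_level of 1 selects every possible lag.
        return list(all_lags)
    acf = autocorrelation(y, len(y) - 1)
    # Assumed significance test: |r_k| outside the band +/- z / sqrt(n).
    z = NormalDist().inv_cdf(1 - conf_level / 2)
    band = z / len(y) ** 0.5
    significant = {k for k, r in enumerate(acf) if abs(r) > band}
    # "Peaks" are strict local maxima of the ACF.
    peaks = {k for k in range(1, len(acf) - 1) if acf[k - 1] < acf[k] > acf[k + 1]}
    # Union of significant peaks with the significant lags in [0, 10].
    chosen = (significant & peaks) | {k for k in significant if k <= 10}
    # Keep only lags in [1, max_delay]; a delay of 1 is always computed.
    return sorted((chosen & set(all_lags)) | {1})


print(select_lags([1.0, 0.0, -1.0, 0.0] * 10, conf_level=0.05, max_delay=8))
```

For this period-4 series, lag 4 is kept as a significant peak, while lags 2 and 6 are kept only because they are significant and fall in [0, 10]; they are not local maxima.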

Parameters
  • time_index (str) – Name of the column containing the datetime information used to order the data. Ignored.

  • max_delay (int) – Maximum number of time units to delay each feature. Defaults to 2.

  • forecast_horizon (int) – The number of time periods the pipeline is expected to forecast. Defaults to 1.

  • conf_level (float) – Float in range (0, 1] that determines the confidence interval size used to select which lags to compute from the set of [1, max_delay]. A delay of 1 will always be computed. If 1, selects all possible lags in the set of [1, max_delay], inclusive. Defaults to 0.05.

  • rolling_window_size (float) – Float in range (0, 1] that determines the size of the window used for rolling features. Size is computed as rolling_window_size * max_delay. Defaults to 0.25.

  • delay_features (bool) – Whether to delay the input features. Defaults to True.

  • delay_target (bool) – Whether to delay the target. Defaults to True.

  • gap (int) – The number of time units between when the features are collected and when the target is collected. For example, if you are predicting the next time step’s target, gap=1. This is only needed because when gap=0, we need to be sure to start the lagging of the target variable at 1. Defaults to 0.

  • random_seed (int) – Seed for the random number generator. This transformer performs the same regardless of the random seed provided. Defaults to 0.

Attributes

df_colname_prefix

{}_delay_{}

hyperparameter_ranges

{“conf_level”: Real(0.001, 1.0), “rolling_window_size”: Real(0.001, 1.0)}

modifies_features

True

modifies_target

False

name

Time Series Featurizer

needs_fitting

True

target_colname_prefix

target_delay_{}

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the TimeSeriesFeaturizer.

fit_transform

Fit the component and transform the input data.

load

Loads component at file path.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Computes the delayed values and rolling means for X and y.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the TimeSeriesFeaturizer.

Parameters
  • X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]

  • y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

ValueError – if self.time_index is None

fit_transform(self, X, y=None)[source]#

Fit the component and transform the input data.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series or None) – Target.

Returns

Transformed X.

Return type

pd.DataFrame

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Computes the delayed values and rolling means for X and y.

The chosen delays are determined by the autocorrelation function of the target variable. See the class docstring for more information on how they are chosen. If y is None, all possible lags are chosen.

If y is not None, it will also compute the delayed values for the target variable.

The rolling means for all numeric features in X and y, if y is numeric, are also returned.
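To make the output layout concrete, here is a pure-Python sketch that builds only lagged columns, named with the {}_delay_{} and target_delay_{} patterns listed under Attributes. The `shift` and `delay_columns` helpers are hypothetical, plain lists stand in for pandas objects, and rolling means are omitted.

```python
def shift(values, k):
    """Shift a list down by k positions, padding the front with None."""
    n = len(values)
    return [None] * min(k, n) + list(values[: max(n - k, 0)])


def delay_columns(features, lags, target=None):
    """Return only lagged columns, named '{col}_delay_{k}' and 'target_delay_{k}'."""
    out = {}
    for name, values in features.items():
        for k in lags:
            out[f"{name}_delay_{k}"] = shift(values, k)
    if target is not None:
        for k in lags:
            out[f"target_delay_{k}"] = shift(target, k)
    return out


print(delay_columns({"temp": [1, 2, 3]}, lags=[1, 2], target=[5, 6, 7]))
```

As in the Returns description, none of the original columns appear in the output.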

Parameters
  • X (pd.DataFrame or None) – Data to transform. None is expected when only the target variable is being used.

  • y (pd.Series or None) – Target.

Returns

Transformed X. No original features are returned.

Return type

pd.DataFrame

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.TimeSeriesImputer(categorical_impute_strategy='forwards_fill', numeric_impute_strategy='interpolate', target_impute_strategy='forwards_fill', random_seed=0, **kwargs)[source]#

Imputes missing data according to a specified timeseries-specific imputation strategy.

This Transformer should be used after the TimeSeriesRegularizer in order to impute the missing values that were added to X and y (if passed).

Parameters
  • categorical_impute_strategy (string) – Impute strategy to use for string, object, boolean, categorical dtypes. Valid values include “backwards_fill” and “forwards_fill”. Defaults to “forwards_fill”.

  • numeric_impute_strategy (string) – Impute strategy to use for numeric columns. Valid values include “backwards_fill”, “forwards_fill”, and “interpolate”. Defaults to “interpolate”.

  • target_impute_strategy (string) – Impute strategy to use for the target column. Valid values include “backwards_fill”, “forwards_fill”, and “interpolate”. Defaults to “forwards_fill”.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Raises

ValueError – If categorical_impute_strategy, numeric_impute_strategy, or target_impute_strategy is not one of the valid values.

Attributes

hyperparameter_ranges

{ “categorical_impute_strategy”: [“backwards_fill”, “forwards_fill”], “numeric_impute_strategy”: [“backwards_fill”, “forwards_fill”, “interpolate”], “target_impute_strategy”: [“backwards_fill”, “forwards_fill”, “interpolate”],}

modifies_features

True

modifies_target

True

name

Time Series Imputer

training_only

True

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits imputer to data.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data X by imputing missing values using specified timeseries-specific strategies. 'None' values are converted to np.nan before imputation and are treated as the same.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits imputer to data.

‘None’ values are converted to np.nan before imputation and are treated as the same. If a value is missing at the beginning or end of a column, that value will be imputed using backwards fill or forwards fill as necessary, respectively.
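The three strategies and the edge rule can be sketched with plain lists, treating None and NaN as the same missing marker. This is an assumption-laden illustration, not the component's code: the helper names are hypothetical and interpolation here is linear by position.

```python
import math


def _is_missing(v):
    # None and NaN are treated as the same missing marker.
    return v is None or (isinstance(v, float) and math.isnan(v))


def forwards_fill(values):
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def backwards_fill(values):
    return forwards_fill(values[::-1])[::-1]


def interpolate(values):
    """Linear interpolation by position between the nearest known neighbours."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = values[a] + (values[b] - values[a]) * (i - a) / (b - a)
    return out


STRATEGIES = {
    "forwards_fill": forwards_fill,
    "backwards_fill": backwards_fill,
    "interpolate": interpolate,
}


def impute(values, strategy):
    if strategy not in STRATEGIES:
        raise ValueError(f"Unknown strategy: {strategy}")
    cleaned = [None if _is_missing(v) else v for v in values]
    filled = STRATEGIES[strategy](cleaned)
    # Leading gaps get a backwards fill, then trailing gaps a forwards fill.
    return forwards_fill(backwards_fill(filled))


print(impute([None, 1.0, None, 3.0, None], "interpolate"))
```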

Parameters
  • X (pd.DataFrame, np.ndarray) – The input training data of shape [n_samples, n_features]

  • y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data X by imputing missing values using specified timeseries-specific strategies. ‘None’ values are converted to np.nan before imputation and are treated as the same.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Optionally, target data to transform.

Returns

Transformed X and y

Return type

(pd.DataFrame, pd.Series)

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.TimeSeriesRegularizer(time_index=None, frequency_payload=None, window_length=4, threshold=0.4, random_seed=0, **kwargs)[source]#

Transformer that regularizes an inconsistently spaced datetime column.

If X is passed in to fit/transform, the column time_index will be checked for an inferrable offset frequency. If the time_index column is perfectly inferrable then this Transformer will do nothing and return the original X and y.

If X does not have a perfectly inferrable frequency but one can be estimated, then X and y will be reformatted based on the estimated frequency for time_index. In the original X and y passed:

  • Missing datetime values will be added and will have their corresponding columns in X and y set to None.

  • Duplicate datetime values will be dropped.

  • Extra datetime values will be dropped.

  • If it can be determined that a duplicate or extra value is misaligned, then it will be repositioned to take the place of a missing value.

This Transformer should be used before the TimeSeriesImputer in order to impute the missing values that were added to X and y (if passed).

When used on a multiseries dataset, this component works only on unstacked data.

Parameters
  • time_index (string) – Name of the column containing the datetime information used to order the data, required. Defaults to None.

  • frequency_payload (tuple) – Payload returned from Woodwork’s infer_frequency function where debug is True. Defaults to None.

  • window_length (int) – The size of the rolling window over which inference is conducted to determine the prevalence of uninferrable frequencies. Lower values make this component more sensitive to recognizing numerous faulty datetime values. Defaults to 4.

  • threshold (float) – The minimum percentage of windows that need to have been able to infer a frequency. Lower values make this component more sensitive to recognizing numerous faulty datetime values. Defaults to 0.4.

  • random_seed (int) – Seed for the random number generator. This transformer performs the same regardless of the random seed provided. Defaults to 0.

Raises

ValueError – if the frequency_payload parameter has not been passed a tuple

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

True

name

Time Series Regularizer

training_only

True

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the TimeSeriesRegularizer.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Regularizes a dataframe and target data to an inferrable offset frequency.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the TimeSeriesRegularizer.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises
  • ValueError – if self.time_index is None, if X and y have different lengths, if time_index in X does not have an offset frequency that can be estimated

  • TypeError – if the time_index column is not of type Datetime

  • KeyError – if the time_index column doesn’t exist

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Regularizes a dataframe and target data to an inferrable offset frequency.

A ‘clean’ X and y (if y was passed in) are created based on an inferrable offset frequency and matching datetime values with the original X and y are imputed into the clean X and y. Datetime values identified as misaligned are shifted into their appropriate position.
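A simplified sketch of building the "clean" series, assuming the offset frequency is already known (the real component estimates it, for example from a Woodwork infer_frequency payload) and that the first of any duplicated timestamps is kept, which is an assumption. The `regularize` helper is hypothetical, and repositioning of misaligned values is not modeled.

```python
from datetime import datetime, timedelta


def regularize(rows, freq):
    """Rebuild (timestamp, value) rows on a regular grid with step `freq`."""
    by_time = {}
    for ts, value in rows:
        # Assumed policy: keep the first occurrence of a duplicated timestamp.
        by_time.setdefault(ts, value)
    start, end = min(by_time), max(by_time)
    clean, ts = [], start
    while ts <= end:
        # Off-grid ("extra") timestamps are never looked up, so they are dropped;
        # grid timestamps with no data are filled with None.
        clean.append((ts, by_time.get(ts)))
        ts += freq
    return clean


start = datetime(2021, 1, 1)
day = timedelta(days=1)
rows = [(start, 1.0), (start + day, 2.0), (start + day, 2.5), (start + 3 * day, 4.0)]
for ts, value in regularize(rows, day):
    print(ts.date(), value)
```

The None values left in the output are what a downstream TimeSeriesImputer is meant to fill.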

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Data with an inferrable time_index offset frequency.

Return type

(pd.DataFrame, pd.Series)

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.Transformer(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]#

A component that transforms data and may or may not need fitting. These components are used before an estimator.

To implement a new Transformer, define your own class which is a subclass of Transformer, including a name and a list of acceptable ranges for any parameters to be tuned during the automl search (hyperparameters). Define an __init__ method which sets up any necessary state and objects. Make sure your __init__ only uses standard keyword arguments and calls super().__init__() with a parameters dict. You may also override the fit, transform, fit_transform and other methods in this class if appropriate.

To see some examples, check out the definitions of any Transformer component.

Parameters
  • parameters (dict) – Dictionary of parameters for the component. Defaults to None.

  • component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

modifies_features

True

modifies_target

False

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

name

Returns string name of this component.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data X.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)#

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features]

  • y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)[source]#

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property name(cls)#

Returns string name of this component.

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

abstract transform(self, X, y=None)[source]#

Transforms data X.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.Undersampler(sampling_ratio=0.25, sampling_ratio_dict=None, min_samples=100, min_percentage=0.1, random_seed=0, **kwargs)[source]#

Initializes an undersampling transformer to downsample the majority classes in the dataset.

This component is only run during training and not during predict.

Parameters
  • sampling_ratio (float) – The smallest minority:majority ratio that is accepted as ‘balanced’. For instance, a 1:4 ratio would be represented as 0.25, while a 1:1 ratio is 1.0. Must be in the range (0, 1]. Defaults to 0.25.

  • sampling_ratio_dict (dict) – A dictionary specifying the desired balanced ratio for each target value. For instance, in a binary case where class 1 is the minority, we could specify sampling_ratio_dict={0: 0.5, 1: 1}, which means we would undersample class 0 to have twice the number of samples as class 1 (minority:majority ratio = 0.5) and leave class 1 unsampled. Overrides sampling_ratio if provided. Defaults to None.

  • min_samples (int) – The minimum number of samples that we must have for any class, pre- or post-sampling. If a class must be downsampled, it will not be downsampled past this value. To determine severe imbalance, the minority class must occur less often than this and must have a class ratio below min_percentage. Must be greater than 0. Defaults to 100.

  • min_percentage (float) – The minimum percentage of the minority class relative to the total dataset that we tolerate, as long as the class has at least min_samples samples. If neither min_percentage nor min_samples is met, the data is treated as severely imbalanced and is not resampled. Must be between 0 and 0.5, inclusive. Defaults to 0.1.

  • random_seed (int) – The seed to use for random sampling. Defaults to 0.
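The sizing rules in the parameter descriptions can be sketched as follows. The `undersample_targets` helper is hypothetical and returns per-class row counts rather than the row indices that fit_resample produces. The exact comparisons and the rounding of the goal size are assumptions, and sampling_ratio_dict is not handled.

```python
def undersample_targets(counts, sampling_ratio=0.25, min_samples=100, min_percentage=0.1):
    """Return how many rows to keep per class (sketch of the sizing rules)."""
    total = sum(counts.values())
    minority = min(counts.values())
    # Severe imbalance: minority below min_samples AND below min_percentage,
    # so the data is returned unchanged.
    if minority < min_samples and minority / total < min_percentage:
        return dict(counts)
    # Classes that are too large are cut to minority / sampling_ratio,
    # but never below min_samples.
    goal = max(int(minority / sampling_ratio), min_samples)
    return {
        label: min(n, goal) if minority / n < sampling_ratio else n
        for label, n in counts.items()
    }


print(undersample_targets({0: 1000, 1: 150}))
```

With the defaults, a 1000:150 split is reduced to 600:150, which is exactly the 1:4 minority:majority ratio that sampling_ratio=0.25 accepts as balanced.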

Raises
  • ValueError – If sampling_ratio is not in the range (0, 1].

  • ValueError – If min_samples is not greater than 0.

  • ValueError – If min_percentage is not between 0 and 0.5, inclusive.

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

True

name

Undersampler

training_only

True

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the sampler to the data.

fit_resample

Resampling technique for this sampler.

fit_transform

Fit and transform data using the sampler component.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms the input data by sampling the data.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)#

Fits the sampler to the data.

Parameters
  • X (pd.DataFrame) – Input features.

  • y (pd.Series) – Target.

Returns

self

Raises

ValueError – If y is None.

fit_resample(self, X, y)[source]#

Fits the sampler and returns the indices of the rows to keep.

Parameters
  • X (pd.DataFrame) – Training data to fit and resample.

  • y (pd.Series) – Training data targets to fit and resample.

Returns

Indices to keep for training data.

Return type

list
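Returning indices rather than resampled data means the same selection can be applied to both X and y. The sketch below shows one way to turn per-class target counts into a sorted list of positional indices using seeded sampling without replacement. indices_to_keep is a hypothetical helper, and EvalML may return index labels rather than positions.

```python
import random


def indices_to_keep(y, targets, random_seed=0):
    """Pick row positions so each label keeps targets[label] rows (sketch)."""
    rng = random.Random(random_seed)  # seeded, so results are reproducible
    by_label = {}
    for position, label in enumerate(y):
        by_label.setdefault(label, []).append(position)

    keep = []
    for label, positions in by_label.items():
        n = min(targets.get(label, len(positions)), len(positions))
        # Sample without replacement; a class at or below its target keeps every row.
        keep.extend(rng.sample(positions, n))
    # Sorting preserves the original row order of the data.
    return sorted(keep)


y = [0] * 8 + [1] * 2
print(indices_to_keep(y, {0: 4, 1: 2}, random_seed=0))
```

With pandas, the result could then be applied as X.iloc[keep] and y.iloc[keep].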

fit_transform(self, X, y)#

Fit and transform data using the sampler component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

(pd.DataFrame, pd.Series)

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.
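save and load serialize the whole component object with cloudpickle. The sketch below mimics that round trip with the standard-library pickle module instead, an assumption made so the example has no third-party dependency (cloudpickle can additionally serialize objects such as lambdas that plain pickle cannot). The save and load functions and the SimpleNamespace stand-in are illustrative, not EvalML's methods.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def save(component, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Write the component to file_path using the given pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(component, f, protocol=pickle_protocol)


def load(file_path):
    """Read a component previously written by save()."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for a component: only a name and its parameters.
component = SimpleNamespace(name="Undersampler", parameters={"sampling_ratio": 0.25})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "undersampler.pkl")
    # pickle_protocol picks the on-disk stream format; protocol 4 exists on Python 3.4+.
    save(component, path, pickle_protocol=4)
    restored = load(path)

print(restored.parameters)  # {'sampling_ratio': 0.25}
```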

transform(self, X, y=None)[source]#

Transforms the input data by sampling the data.

Parameters
  • X (pd.DataFrame) – Training features.

  • y (pd.Series) – Target.

Returns

Transformed features and target.

Return type

pd.DataFrame, pd.Series

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.
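The clone, default_parameters, and update_parameters methods documented above follow a few shared conventions. The hypothetical ToyComponent below (not an EvalML class) sketches them: defaults are read from __init__ so that ToyComponent.default_parameters == ToyComponent().parameters, clone builds a fresh unfitted instance with the same parameters and seed, and update_parameters clears the fitted flag unless reset_fit=False. The small classproperty descriptor is an assumption used to make default_parameters readable on the class itself, as the convention above implies.

```python
import inspect


class classproperty:
    """Minimal descriptor: evaluates func(cls) on attribute access."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        return self.func(owner)


class ToyComponent:
    """Hypothetical component; not part of EvalML."""

    def __init__(self, threshold=0.5, random_seed=0):
        self._parameters = {"threshold": threshold}
        self.random_seed = random_seed
        self._is_fitted = False

    @classproperty
    def default_parameters(cls):
        # Read defaults straight from __init__ (skipping self and random_seed)
        # so they cannot drift from what ToyComponent() actually uses.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name not in ("self", "random_seed")
        }

    @property
    def parameters(self):
        return dict(self._parameters)

    def fit(self, X, y=None):
        self._is_fitted = True
        return self

    def clone(self):
        # Same parameters and random seed, but a new, unfitted object.
        return type(self)(**self._parameters, random_seed=self.random_seed)

    def update_parameters(self, update_dict, reset_fit=True):
        self._parameters.update(update_dict)
        if reset_fit:
            self._is_fitted = False


fitted = ToyComponent(threshold=0.7, random_seed=3).fit(X=None)
copy = fitted.clone()
print(copy.parameters, copy._is_fitted)  # {'threshold': 0.7} False
```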

class evalml.pipelines.components.URLFeaturizer(random_seed=0, **kwargs)[source]#

Transformer that can automatically extract features from URLs.

Parameters

random_seed (int) – Seed for the random number generator. Defaults to 0.
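The entry above does not say which URL features are produced. As a rough illustration only, the sketch below splits a URL into a protocol, a host name, and a naive top-level domain with urllib.parse. The feature names and the TLD rule are assumptions, and EvalML's actual output columns may differ.

```python
from urllib.parse import urlsplit


def url_features(url):
    """Split a URL into protocol, domain, and top-level domain (illustrative)."""
    parts = urlsplit(url)
    domain = parts.hostname or ""  # hostname is lowercased and excludes the port
    # Naive TLD: text after the last dot; multi-part suffixes like "co.uk" are not handled.
    tld = domain.rsplit(".", 1)[-1] if "." in domain else ""
    return {"protocol": parts.scheme, "domain": domain, "tld": tld}


for url in ["https://www.example.com/docs?page=2", "http://localhost:8000/"]:
    print(url_features(url))
```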

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

URL Featurizer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits component to data.

fit_transform

Fits on X and transforms X.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transforms data X.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)#

Fits component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters
  • X (pd.DataFrame) – Data to fit and transform.

  • y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)#

Transforms data X.

Parameters
  • X (pd.DataFrame) – Data to transform.

  • y (pd.Series, optional) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.VARMAXRegressor(time_index: Optional[Hashable] = None, p: int = 1, q: int = 0, trend: Optional[str] = 'c', random_seed: Union[int, float] = 0, maxiter: int = 10, use_covariates: bool = False, **kwargs)[source]#

Vector Autoregressive Moving Average with eXogenous regressors model. The two parameters (p, q) are the AR order and the MA order. More information here: https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.varmax.VARMAX.html.

VARMAXRegressor is not currently supported when installing with conda; install it from PyPI instead.

Parameters
  • time_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.

  • p (int) – Maximum Autoregressive order. Defaults to 1.

  • q (int) – Maximum Moving Average order. Defaults to 0.

  • trend (str) – Controls the deterministic trend. Options are [‘n’, ‘c’, ‘t’, ‘ct’] where ‘n’ means no trend, ‘c’ is a constant term, ‘t’ indicates a linear trend, and ‘ct’ is both. Can also be an iterable when defining a polynomial, such as [1, 1, 0, 1]. Defaults to ‘c’.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

  • maxiter (int) – Maximum number of iterations for the solver. Defaults to 10.

  • use_covariates (bool) – If True, will pass exogenous variables in fit/predict methods. If False, forecasts will be based solely on the datetimes and target values. Defaults to False.
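With the defaults q=0 and trend=‘c’, the model reduces to a vector autoregression with a constant term, VAR(p). The sketch below shows only the forecasting recursion for p=1, y_t = c + A y_{t-1}, given coefficients that are assumed to be already estimated (the values are hypothetical). Estimating c and A, which statsmodels' VARMAX does by maximum likelihood with the optimizer capped by maxiter, is not shown.

```python
def var1_forecast(const, coef, last_obs, steps):
    """Iterate y_t = const + coef @ y_{t-1} for `steps` steps (VAR(1), constant trend)."""
    y = list(last_obs)
    path = []
    for _ in range(steps):
        # Each series depends on the previous value of every series via coef.
        y = [const[i] + sum(coef[i][j] * y[j] for j in range(len(y))) for i in range(len(y))]
        path.append(y)
    return path


const = [1.0, 0.5]
coef = [[0.5, 0.25], [0.0, 0.5]]  # series 0 also depends on series 1's last value
print(var1_forecast(const, coef, last_obs=[0.0, 0.0], steps=3))
```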

Attributes

hyperparameter_ranges

{ “p”: Integer(1, 10), “q”: Integer(1, 10), “trend”: Categorical([‘n’, ‘c’, ‘t’, ‘ct’]),}

model_family

ModelFamily.VARMAX

modifies_features

True

modifies_target

False

name

VARMAX Regressor

supported_problem_types

[ProblemTypes.MULTISERIES_TIME_SERIES_REGRESSION]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Returns an array of zeros of length 1, because feature importance is not defined for the VARMAX regressor.

fit

Fits VARMAX regressor to data.

get_prediction_intervals

Find the prediction intervals using the fitted VARMAXRegressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using fitted VARMAX regressor.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) numpy.ndarray#

Returns an array of zeros of length 1, because feature importance is not defined for the VARMAX regressor.

fit(self, X: pandas.DataFrame, y: Optional[pandas.DataFrame] = None)[source]#

Fits VARMAX regressor to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.DataFrame) – The target training data of shape [n_samples, n_series_id_values].

Returns

self

Raises

ValueError – If y was not passed in.

get_prediction_intervals(self, X: pandas.DataFrame, y: pandas.DataFrame = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series][source]#

Find the prediction intervals using the fitted VARMAXRegressor.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.DataFrame) – Target data of shape [n_samples, n_series_id_values]. Optional.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Not used for VARMAX regressor.

Returns

A dict of prediction intervals keyed by series ID. Each value is itself a dict whose keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict[dict]
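The returned object is nested one level deeper than the intervals of single-series estimators. The sketch below walks that structure to compute interval widths per series. The series IDs and numbers are hypothetical, plain lists stand in for pd.Series, and it assumes the keys are built by formatting the coverage float directly (so 0.95 gives "0.95_lower").

```python
def interval_widths(intervals, coverage):
    """Return {series_id: [upper - lower, ...]} for one coverage level."""
    widths = {}
    for series_id, bounds in intervals.items():
        lower = bounds[f"{coverage}_lower"]
        upper = bounds[f"{coverage}_upper"]
        widths[series_id] = [u - l for l, u in zip(lower, upper)]
    return widths


# Hypothetical shape: {series_id: {"{coverage}_lower": ..., "{coverage}_upper": ...}}
intervals = {
    "store_a": {"0.95_lower": [1.0, 2.0], "0.95_upper": [3.0, 5.0]},
    "store_b": {"0.95_lower": [10.0, 9.5], "0.95_upper": [12.0, 12.5]},
}
print(interval_widths(intervals, 0.95))  # {'store_a': [2.0, 3.0], 'store_b': [2.0, 3.0]}
```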

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame, y: Optional[pandas.DataFrame] = None) pandas.Series[source]#

Make predictions using fitted VARMAX regressor.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.DataFrame) – Target data of shape [n_samples, n_series_id_values].

Returns

Predicted values.

Return type

pd.Series

Raises

ValueError – If X was passed to fit but not passed in predict.

predict_proba(self, X: pandas.DataFrame) pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.XGBoostClassifier(eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, random_seed=0, eval_metric='logloss', n_jobs=12, **kwargs)[source]#

XGBoost Classifier.

Parameters
  • eta (float) – Boosting learning rate. Defaults to 0.1.

  • max_depth (int) – Maximum tree depth for base learners. Defaults to 6.

  • min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0.

  • n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

  • n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to 12.
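eta and n_estimators interact: each boosting round adds only eta times the new tree's output, so a smaller eta needs more rounds to reach the same fit. The toy below is not XGBoost (it has no splits, no regularization, and uses a single-leaf "tree" under squared error); it only shows that shrinkage effect. Starting from 0, the prediction after n rounds is mean(y) * (1 - (1 - eta)^n), so with eta=0.1 it reaches about 2.61 of a target mean of 4 after 10 rounds and nearly 4 after 100.

```python
def boost_constant(targets, eta=0.1, n_estimators=100):
    """Toy squared-error boosting where every 'tree' is a single leaf."""
    prediction = 0.0
    for _ in range(n_estimators):
        # For squared error, the best single-leaf value is the mean residual.
        residual_mean = sum(t - prediction for t in targets) / len(targets)
        # Shrink the new tree's contribution by the learning rate.
        prediction += eta * residual_mean
    return prediction


y = [2.0, 4.0, 6.0]  # mean is 4.0
for eta, rounds in [(1.0, 1), (0.1, 10), (0.1, 100)]:
    print(eta, rounds, round(boost_constant(y, eta, rounds), 4))
```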

Attributes

hyperparameter_ranges

{ “eta”: Real(0.000001, 1), “max_depth”: Integer(1, 10), “min_child_weight”: Real(1, 10), “n_estimators”: Integer(1, 1000),}

model_family

ModelFamily.XGBOOST

modifies_features

True

modifies_target

False

name

XGBoost Classifier

SEED_MAX

None

SEED_MIN

None

supported_problem_types

[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Feature importance of fitted XGBoost classifier.

fit

Fits XGBoost classifier component to data.

get_prediction_intervals

Find the prediction intervals using the fitted regressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using the fitted XGBoost classifier.

predict_proba

Make probability estimates using the fitted XGBoost classifier.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self)#

Feature importance of fitted XGBoost classifier.

fit(self, X, y=None)[source]#

Fits XGBoost classifier component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series]#

Find the prediction intervals using the fitted regressor.

This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. Each lower and upper bound is then obtained by offsetting the prediction by the percent point (quantile) function of the lower tail probability for that coverage level, multiplied by the rolling standard deviation.
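The procedure described above can be sketched with the standard library alone. Assumptions in this sketch, not taken from EvalML's code: the rolling statistic is the sample standard deviation over a trailing window (matching pandas' default rolling std), bounds are symmetric around the prediction using the two-sided normal quantile, and positions with fewer than 5 predictions get None.

```python
from statistics import NormalDist, stdev


def rolling_std_intervals(predictions, coverage, window=5):
    """Bounds = prediction -/+ z * trailing-window sample std (sketch)."""
    # Sample std (n - 1 denominator) over the trailing window; None until the window fills.
    stds = [
        stdev(predictions[i - window + 1 : i + 1]) if i >= window - 1 else None
        for i in range(len(predictions))
    ]
    intervals = {}
    for c in coverage:
        # Two-sided normal quantile, about 1.96 for c = 0.95.
        z = NormalDist().inv_cdf(1 - (1 - c) / 2)
        intervals[f"{c}_lower"] = [None if s is None else p - z * s for p, s in zip(predictions, stds)]
        intervals[f"{c}_upper"] = [None if s is None else p + z * s for p, s in zip(predictions, stds)]
    return intervals


preds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(rolling_std_intervals(preds, [0.95])["0.95_lower"])
```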

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

Raises

MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X)[source]#

Make predictions using the fitted XGBoost classifier.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.DataFrame

predict_proba(self, X)[source]#

Make probability estimates using the fitted XGBoost classifier.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted class probabilities.

Return type

pd.DataFrame

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.

class evalml.pipelines.components.XGBoostRegressor(eta: float = 0.1, max_depth: int = 6, min_child_weight: int = 1, n_estimators: int = 100, random_seed: Union[int, float] = 0, n_jobs: int = 12, **kwargs)[source]#

XGBoost Regressor.

Parameters
  • eta (float) – Boosting learning rate. Defaults to 0.1.

  • max_depth (int) – Maximum tree depth for base learners. Defaults to 6.

  • min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0.

  • n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

  • n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to 12.

Attributes

hyperparameter_ranges

{ “eta”: Real(0.000001, 1), “max_depth”: Integer(1, 20), “min_child_weight”: Real(1, 10), “n_estimators”: Integer(1, 1000),}

model_family

ModelFamily.XGBOOST

modifies_features

True

modifies_target

False

name

XGBoost Regressor

SEED_MAX

None

SEED_MIN

None

supported_problem_types

[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.MULTISERIES_TIME_SERIES_REGRESSION,]

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

feature_importance

Feature importance of fitted XGBoost regressor.

fit

Fits XGBoost regressor component to data.

get_prediction_intervals

Find the prediction intervals using the fitted XGBoostRegressor.

load

Loads component at file path.

needs_fitting

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

predict

Make predictions using fitted XGBoost regressor.

predict_proba

Make probability estimates for labels.

save

Saves component at file path.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

property feature_importance(self) pandas.Series#

Feature importance of fitted XGBoost regressor.

fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None)[source]#

Fits XGBoost regressor component to data.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) Dict[str, pandas.Series][source]#

Find the prediction intervals using the fitted XGBoostRegressor.

Parameters
  • X (pd.DataFrame) – Data of shape [n_samples, n_features].

  • y (pd.Series) – Target data. Ignored.

  • coverage (List[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.

  • predictions (pd.Series) – Optional list of predictions to use. If None, will generate predictions using X.

Returns

Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.

Return type

dict

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)#

Returns the parameters which were used to initialize the component.

predict(self, X: pandas.DataFrame) pandas.Series[source]#

Make predictions using fitted XGBoost regressor.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].

Returns

Predicted values.

Return type

pd.Series

predict_proba(self, X: pandas.DataFrame) pandas.Series#

Make probability estimates for labels.

Parameters

X (pd.DataFrame) – Features.

Returns

Probability estimates.

Return type

pd.Series

Raises

MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.