components

Package Contents

Classes Summary

`ARIMARegressor`	Autoregressive Integrated Moving Average Model.
`BaselineClassifier`	Classifier that predicts using the specified strategy.
`BaselineRegressor`	Baseline regressor that uses a simple strategy to make predictions.
`CatBoostClassifier`	CatBoost Classifier, a classifier that uses gradient-boosting on decision trees.
`CatBoostRegressor`	CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.
`ComponentBase`	Base class for all components.
`ComponentBaseMeta`	Metaclass that overrides creating a new component by wrapping methods with validators and setters.
`DateTimeFeaturizer`	Transformer that can automatically extract features from datetime columns.
`DecisionTreeClassifier`	Decision Tree Classifier.
`DecisionTreeRegressor`	Decision Tree Regressor.
`DelayedFeatureTransformer`	Transformer that delays input features and target variable for time series problems.
`DFSTransformer`	Featuretools DFS component that generates features for the input features.
`DropColumns`	Drops specified columns in input data.
`DropNullColumns`	Transformer to drop features whose percentage of NaN values exceeds a specified threshold.
`DropRowsTransformer`	Transformer to drop rows specified by row indices.
`ElasticNetClassifier`	Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator.
`ElasticNetRegressor`	Elastic Net Regressor.
`EmailFeaturizer`	Transformer that can automatically extract features from emails.
`Estimator`	A component that fits and predicts given data.
`ExtraTreesClassifier`	Extra Trees Classifier.
`ExtraTreesRegressor`	Extra Trees Regressor.
`FeatureSelector`	Selects top features based on importance weights.
`Imputer`	Imputes missing data according to a specified imputation strategy.
`KNeighborsClassifier`	K-Nearest Neighbors Classifier.
`LightGBMClassifier`	LightGBM Classifier.
`LightGBMRegressor`	LightGBM Regressor.
`LinearDiscriminantAnalysis`	Reduces the number of features by using Linear Discriminant Analysis.
`LinearRegressor`	Linear Regressor.
`LogisticRegressionClassifier`	Logistic Regression Classifier.
`LogTransformer`	Applies a log transformation to the target data.
`LSA`	Transformer to calculate the Latent Semantic Analysis Values of text input.
`OneHotEncoder`	A transformer that encodes categorical features in a one-hot numeric array.
`Oversampler`	SMOTE Oversampler component. Will automatically select whether to use SMOTE, SMOTEN, or SMOTENC based on inputs to the component.
`PCA`	Reduces the number of features by using Principal Component Analysis (PCA).
`PerColumnImputer`	Imputes missing data according to a specified imputation strategy per column.
`PolynomialDetrender`	Removes trends from time series by fitting a polynomial to the data.
`ProphetRegressor`	Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
`RandomForestClassifier`	Random Forest Classifier.
`RandomForestRegressor`	Random Forest Regressor.
`RFClassifierSelectFromModel`	Selects top features based on importance weights using a Random Forest classifier.
`RFRegressorSelectFromModel`	Selects top features based on importance weights using a Random Forest regressor.
`SelectByType`	Selects columns by specified Woodwork logical type or semantic tag in input data.
`SelectColumns`	Selects specified columns in input data.
`SimpleImputer`	Imputes missing data according to a specified imputation strategy.
`SklearnStackedEnsembleClassifier`	Scikit-learn Stacked Ensemble Classifier.
`SklearnStackedEnsembleRegressor`	Scikit-learn Stacked Ensemble Regressor.
`StandardScaler`	A transformer that standardizes input features by removing the mean and scaling to unit variance.
`SVMClassifier`	Support Vector Machine Classifier.
`SVMRegressor`	Support Vector Machine Regressor.
`TargetEncoder`	A transformer that encodes categorical features into target encodings.
`TargetImputer`	Imputes missing target data according to a specified imputation strategy.
`TextFeaturizer`	Transformer that can automatically featurize text columns using featuretools’ nlp_primitives.
`TimeSeriesBaselineEstimator`	Time series estimator that predicts using the naive forecasting approach.
`Transformer`	A component that may or may not need fitting that transforms data.
`Undersampler`	Initializes an undersampling transformer to downsample the majority classes in the dataset.
`URLFeaturizer`	Transformer that can automatically extract features from URL.
`XGBoostClassifier`	XGBoost Classifier.
`XGBoostRegressor`	XGBoost Regressor.

Contents

class evalml.pipelines.components.ARIMARegressor(date_index=None, trend=None, start_p=2, d=0, start_q=2, max_p=5, max_d=2, max_q=5, seasonal=True, n_jobs=-1, random_seed=0, **kwargs)

Autoregressive Integrated Moving Average Model. The three parameters (p, d, q) are the AR order, the degree of differencing, and the MA order. More information here: https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima_model.ARIMA.html

Currently ARIMARegressor isn’t supported via conda install. It’s recommended that it be installed via PyPI.

Parameters

date_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.
trend (str) – Controls the deterministic trend. Options are [‘n’, ‘c’, ‘t’, ‘ct’] where ‘c’ is a constant term, ‘t’ indicates a linear trend, and ‘ct’ is both. Can also be an iterable when defining a polynomial, such as [1, 1, 0, 1].
start_p (int) – Minimum Autoregressive order. Defaults to 2.
d (int) – Minimum Differencing degree. Defaults to 0.
start_q (int) – Minimum Moving Average order. Defaults to 2.
max_p (int) – Maximum Autoregressive order. Defaults to 5.
max_d (int) – Maximum Differencing degree. Defaults to 2.
max_q (int) – Maximum Moving Average order. Defaults to 5.
seasonal (boolean) – Whether to fit a seasonal model to ARIMA. Defaults to True.
n_jobs (int or None) – Integer describing level of parallelism used for pipelines. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
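
To make the degree of differencing (the d in (p, d, q)) concrete, here is a minimal pure-Python sketch. The helper name `difference` is hypothetical and is not part of EvalML or of the underlying ARIMA implementation; it only illustrates what differencing a series d times means.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (illustrative helper only)."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with consecutive differences,
        # shortening it by one element.
        values = [b - a for a, b in zip(values, values[1:])]
    return values


linear = [3, 5, 7, 9, 11]      # linear trend
quadratic = [1, 4, 9, 16, 25]  # quadratic trend

print(difference(linear, d=1))     # [2, 2, 2, 2]
print(difference(quadratic, d=2))  # [2, 2, 2]
```

A linear trend becomes constant after one pass and a quadratic trend after two, which is why a small d is usually enough to remove a smooth trend.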

Attributes

hyperparameter_ranges	{"start_p": Integer(1, 3), "d": Integer(0, 2), "start_q": Integer(1, 3), "max_p": Integer(3, 10), "max_d": Integer(2, 5), "max_q": Integer(3, 10), "seasonal": [True, False]}
model_family	ModelFamily.ARIMA
modifies_features	True
modifies_target	False
name	ARIMA Regressor
predict_uses_y	False
supported_problem_types	[ProblemTypes.TIME_SERIES_REGRESSION]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns array of 0’s with a length of 1 as feature_importance is not defined for ARIMA regressor.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.
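
The cloning contract described above can be sketched as follows. `ToyImputer` is a hypothetical stand-in class, not an EvalML component, and the way it stores its parameters and seed is an assumption made for illustration.

```python
class ToyImputer:
    """Hypothetical component used only to illustrate clone()."""

    def __init__(self, strategy="mean", random_seed=0):
        self.parameters = {"strategy": strategy}
        self.random_seed = random_seed

    def clone(self):
        # Build a fresh, unfitted instance from the stored parameters and seed.
        return type(self)(**self.parameters, random_seed=self.random_seed)


original = ToyImputer(strategy="median", random_seed=7)
copy_ = original.clone()
print(copy_ is original)                      # False
print(copy_.parameters, copy_.random_seed)    # {'strategy': 'median'} 7
```

The copy shares no state with the original, so fitting one does not affect the other.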

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict
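
A minimal sketch of the convention stated above, using a hypothetical `ToyComponent` class that is not part of EvalML. Here `default_parameters` is written as a plain classmethod and therefore called with parentheses, which is an assumption of this sketch rather than a description of how the library exposes it.

```python
class ToyComponent:
    """Hypothetical component illustrating the default_parameters convention."""

    def __init__(self, strategy="mode", threshold=0.5):
        self._parameters = {"strategy": strategy, "threshold": threshold}

    @property
    def parameters(self):
        # Return a copy so callers cannot mutate the component's state.
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # The defaults are whatever a no-argument instance reports.
        return cls().parameters


print(ToyComponent.default_parameters())
# {'strategy': 'mode', 'threshold': 0.5}
print(ToyComponent.default_parameters() == ToyComponent().parameters)  # True
```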

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict
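
The following sketch shows one way a `describe` method with this signature could behave. `ToyDescribable` is hypothetical, and the exact text it prints is an assumption; only the dictionary format {"name": name, "parameters": parameters} is taken from the description above.

```python
class ToyDescribable:
    """Hypothetical component illustrating describe()."""

    name = "Toy Component"

    def __init__(self, **parameters):
        self.parameters = parameters

    def describe(self, print_name=False, return_dict=False):
        if print_name:
            print(self.name)
        for key, value in self.parameters.items():
            print(f"\t * {key} : {value}")
        if return_dict:
            return {"name": self.name, "parameters": dict(self.parameters)}
        return None


component = ToyDescribable(strategy="mean")
description = component.describe(print_name=True, return_dict=True)
print(description)
# {'name': 'Toy Component', 'parameters': {'strategy': 'mean'}}
```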

property feature_importance(self)¶: Returns array of 0’s with a length of 1 as feature_importance is not defined for ARIMA regressor.

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X, y=None)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.BaselineClassifier(strategy='mode', random_seed=0, **kwargs)[source]¶

Classifier that predicts using the specified strategy.

This is useful as a simple baseline classifier to compare with other classifiers.

Parameters

strategy (str) – Method used to predict. Valid options are “mode”, “random” and “random_weighted”. Defaults to “mode”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
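
The three strategies can be sketched in pure Python as below. The function `baseline_predict` is hypothetical and not EvalML code. It assumes that "random" draws uniformly from the observed classes and that "random_weighted" draws in proportion to class frequency in the training target.

```python
import random
from collections import Counter


def baseline_predict(y_train, n_samples, strategy="mode", random_seed=0):
    """Predict n_samples labels without looking at any features."""
    counts = Counter(y_train)
    classes = sorted(counts)
    rng = random.Random(random_seed)

    if strategy == "mode":
        # Always predict the most frequent training label.
        most_common_label = counts.most_common(1)[0][0]
        return [most_common_label] * n_samples
    if strategy == "random":
        # Assumed: uniform draw over the observed classes.
        return [rng.choice(classes) for _ in range(n_samples)]
    if strategy == "random_weighted":
        # Assumed: draw proportionally to training class frequencies.
        weights = [counts[c] for c in classes]
        return rng.choices(classes, weights=weights, k=n_samples)
    raise ValueError(f"Unknown strategy: {strategy!r}")


y_train = ["a", "a", "a", "b"]
print(baseline_predict(y_train, 3, strategy="mode"))  # ['a', 'a', 'a']
```

Any real classifier should beat these predictions; if it does not, the features are likely carrying little signal.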

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.BASELINE
modifies_features	True
modifies_target	False
name	Baseline Classifier
predict_uses_y	False
supported_problem_types	[ProblemTypes.BINARY, ProblemTypes.MULTICLASS]

Methods

`classes_`	Returns class labels. Will return None before fitting.
`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature. Since baseline classifiers do not use input features to calculate predictions, returns an array of zeroes.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

property classes_(self)¶

Returns class labels. Will return None before fitting.

Returns: Class names
Return type: list[str] or list[float]

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature. Since baseline classifiers do not use input features to calculate predictions, returns an array of zeroes.

Returns: An array of zeroes
Return type: np.ndarray (float)

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)[source]¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.BaselineRegressor(strategy='mean', random_seed=0, **kwargs)[source]¶

Baseline regressor that uses a simple strategy to make predictions. This is useful as a simple baseline regressor to compare with other regressors.

Parameters

strategy (str) – Method used to predict. Valid options are “mean” and “median”. Defaults to “mean”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
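
As a rough illustration of the two strategies, the sketch below predicts a single constant computed from the training target. `baseline_regress` is a hypothetical helper, not the component's implementation.

```python
import statistics


def baseline_regress(y_train, n_samples, strategy="mean"):
    """Predict the same constant for every sample, ignoring all features."""
    if strategy == "mean":
        value = statistics.mean(y_train)
    elif strategy == "median":
        value = statistics.median(y_train)
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return [value] * n_samples


y_train = [1, 2, 3, 10]
print(baseline_regress(y_train, 2, strategy="mean"))    # [4, 4]
print(baseline_regress(y_train, 2, strategy="median"))  # [2.5, 2.5]
```

The outlier 10 pulls the mean up to 4 while the median stays at 2.5, which is the usual reason to prefer the median baseline on skewed targets.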

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.BASELINE
modifies_features	True
modifies_target	False
name	Baseline Regressor
predict_uses_y	False
supported_problem_types	[ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature. Since baseline regressors do not use input features to calculate predictions, returns an array of zeroes.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature. Since baseline regressors do not use input features to calculate predictions, returns an array of zeroes.

Returns: An array of zeroes
Return type: np.ndarray (float)

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.CatBoostClassifier(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=True, allow_writing_files=False, random_seed=0, n_jobs=-1, **kwargs)

CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.

For more information, check out https://catboost.ai/

Parameters

n_estimators (int) – The maximum number of trees to build. Defaults to 10.
eta (float) – The learning rate. Defaults to 0.03.
max_depth (int) – The maximum tree depth for base learners. Defaults to 6.
bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.
silent (boolean) – Whether to use the “silent” logging mode. Defaults to True.
allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{"n_estimators": Integer(4, 100), "eta": Real(0.000001, 1), "max_depth": Integer(4, 10)}
model_family	ModelFamily.CATBOOST
modifies_features	True
modifies_target	False
name	CatBoost Classifier
predict_uses_y	False
supported_problem_types	[ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.CatBoostRegressor(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=False, allow_writing_files=False, random_seed=0, n_jobs=-1, **kwargs)

CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.

For more information, check out https://catboost.ai/

Parameters

n_estimators (int) – The maximum number of trees to build. Defaults to 10.
eta (float) – The learning rate. Defaults to 0.03.
max_depth (int) – The maximum tree depth for base learners. Defaults to 6.
bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.
silent (boolean) – Whether to use the “silent” logging mode. Defaults to False.
allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{"n_estimators": Integer(4, 100), "eta": Real(0.000001, 1), "max_depth": Integer(4, 10)}
model_family	ModelFamily.CATBOOST
modifies_features	True
modifies_target	False
name	CatBoost Regressor
predict_uses_y	False
supported_problem_types	[ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.ComponentBase(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]¶

Base class for all components.

Parameters

parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`load`	Loads component at file path
`model_family`	Returns ModelFamily of this component
`modifies_features`	Returns whether this component modifies (subsets or transforms) the features variable during transform.
`modifies_target`	Returns whether this component modifies (subsets or transforms) the target variable during transform.
`name`	Returns string name of this component
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path

clone(self)[source]¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)[source]¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)[source]¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

property model_family(cls)¶: Returns ModelFamily of this component

property modifies_features(cls)¶: Returns whether this component modifies (subsets or transforms) the features variable during transform. For Estimator objects, this attribute determines if the return value from predict or predict_proba should be used as features or targets.

property modifies_target(cls)¶: Returns whether this component modifies (subsets or transforms) the target variable during transform. For Estimator objects, this attribute determines if the return value from predict or predict_proba should be used as features or targets.

property name(cls)¶: Returns string name of this component

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)[source]¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.ComponentBaseMeta[source]¶

Metaclass that overrides creating a new component by wrapping methods with validators and setters

Attributes

FIT_METHODS	[‘fit’, ‘fit_transform’]
METHODS_TO_CHECK	[‘predict’, ‘predict_proba’, ‘transform’, ‘inverse_transform’]
PROPERTIES_TO_CHECK	[‘feature_importance’]

Methods

`check_for_fit`	check_for_fit wraps a method that validates if self._is_fitted is True.
`register`	Register a virtual subclass of an ABC.
`set_fit`

classmethod check_for_fit(cls, method)[source]¶: check_for_fit wraps a method that validates if self._is_fitted is True. It raises an exception if False and calls and returns the wrapped method if True.

register(cls, subclass)¶

Returns the subclass, to allow usage as a class decorator.

classmethod set_fit(cls, method)¶
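
The wrapping behavior described for this metaclass can be sketched roughly as follows. All names here (`ToyComponentMeta`, `NotFittedError`, `ToyMeanEstimator`) are hypothetical. The sketch assumes that `set_fit` marks the instance as fitted after a fit method returns, which the reference above does not document, and it derives from `type` rather than from an ABC metaclass for brevity.

```python
from functools import wraps


class NotFittedError(Exception):
    """Raised when a checked method is called before fit (hypothetical)."""


class ToyComponentMeta(type):
    FIT_METHODS = ["fit", "fit_transform"]
    METHODS_TO_CHECK = ["predict", "predict_proba", "transform", "inverse_transform"]

    @classmethod
    def set_fit(cls, method):
        @wraps(method)
        def _set_fit(self, *args, **kwargs):
            result = method(self, *args, **kwargs)
            self._is_fitted = True  # assumed behavior of set_fit
            return result
        return _set_fit

    @classmethod
    def check_for_fit(cls, method):
        @wraps(method)
        def _check_for_fit(self, *args, **kwargs):
            if not getattr(self, "_is_fitted", False):
                raise NotFittedError(f"Call fit before {method.__name__}.")
            return method(self, *args, **kwargs)
        return _check_for_fit

    def __new__(mcs, name, bases, namespace):
        # Wrap the methods defined on the class being created.
        for attr in mcs.FIT_METHODS:
            if attr in namespace:
                namespace[attr] = mcs.set_fit(namespace[attr])
        for attr in mcs.METHODS_TO_CHECK:
            if attr in namespace:
                namespace[attr] = mcs.check_for_fit(namespace[attr])
        return super().__new__(mcs, name, bases, namespace)


class ToyMeanEstimator(metaclass=ToyComponentMeta):
    def fit(self, X, y):
        self._mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self._mean] * len(X)


estimator = ToyMeanEstimator()
try:
    estimator.predict([[0]])
except NotFittedError as err:
    print(err)  # Call fit before predict.
print(estimator.fit([[0], [1]], [2.0, 4.0]).predict([[5]]))  # [3.0]
```

Doing the wrapping in the metaclass means individual components never need to repeat the fitted-state check themselves.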

class evalml.pipelines.components.DateTimeFeaturizer(features_to_extract=None, encode_as_categories=False, date_index=None, random_seed=0, **kwargs)[source]¶

Transformer that can automatically extract features from datetime columns.

Parameters

features_to_extract (list) – List of features to extract. Valid options include “year”, “month”, “day_of_week”, “hour”. Defaults to None.
encode_as_categories (bool) – Whether day-of-week and month features should be encoded as pandas “category” dtype. This allows OneHotEncoders to encode these features. Defaults to False.
date_index (str) – Name of the column containing the datetime information used to order the data. This parameter is ignored by the component. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
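
To illustrate what extracting these features from a datetime column involves, here is a small standard-library sketch. `extract_datetime_features` is a hypothetical helper, not the component's implementation, and its Monday=0 encoding for "day_of_week" is an assumption of this sketch that may differ from the encoding the component uses.

```python
from datetime import datetime

# Map each feature name from the parameter list to an extraction function.
EXTRACTORS = {
    "year": lambda ts: ts.year,
    "month": lambda ts: ts.month,
    "day_of_week": lambda ts: ts.weekday(),  # Monday=0 in this sketch
    "hour": lambda ts: ts.hour,
}


def extract_datetime_features(timestamps, features_to_extract=None):
    """Return {feature_name: [value per timestamp]} for the requested features."""
    features = features_to_extract or list(EXTRACTORS)
    return {
        name: [EXTRACTORS[name](ts) for ts in timestamps]
        for name in features
    }


timestamps = [datetime(2021, 3, 15, 14, 30), datetime(2020, 12, 31, 23, 0)]
print(extract_datetime_features(timestamps, ["year", "hour"]))
# {'year': [2021, 2020], 'hour': [14, 23]}
```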

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	DateTime Featurization Component

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`get_feature_names`	Gets the categories of each datetime feature.
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X by creating new features using existing DateTime columns, and then dropping those DateTime columns

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict
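
The convention stated above can be illustrated with a toy class. ToyComponent is hypothetical and not part of evalml, and default_parameters is written here as a classmethod that must be called, whereas the convention above compares it without a call.

```python
class ToyComponent:
    """Hypothetical stand-in for a component; not part of evalml."""

    def __init__(self, max_delay=2, gap=1, random_seed=0):
        self._parameters = {"max_delay": max_delay, "gap": gap}
        self.random_seed = random_seed

    @property
    def parameters(self):
        # Return a copy so callers cannot mutate the stored parameters.
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Instantiate with no arguments and report the resulting parameters.
        return cls().parameters
```

With this layout, ToyComponent.default_parameters() equals ToyComponent().parameters by construction, while an instance created with non-default arguments reports different parameters.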

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

get_feature_names(self)[source]¶

Gets the categories of each datetime feature.

Returns: Dictionary, where each key-value pair is a column name and a dictionary mapping the unique feature values to their integer encoding.

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X by creating new features using existing DateTime columns, and then dropping those DateTime columns

Parameters

X (pd.DataFrame) – Data to transform
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame
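
The kind of features this transformer derives can be sketched with the standard library alone. The function below is a hypothetical illustration, not evalml's implementation. It assumes that features_to_extract=None means all four listed features, and it uses the standard library's day-of-week numbering (Monday is 0), which may differ from the encoding evalml produces.

```python
from datetime import datetime


def extract_datetime_features(values, features_to_extract=None):
    """Return {feature_name: [value per timestamp]} for a list of datetimes."""
    extractors = {
        "year": lambda d: d.year,
        "month": lambda d: d.month,
        "day_of_week": lambda d: d.weekday(),  # Monday == 0 in the standard library
        "hour": lambda d: d.hour,
    }
    if features_to_extract is None:
        # Assumption: no explicit list means every supported feature.
        features_to_extract = list(extractors)
    return {name: [extractors[name](d) for d in values] for name in features_to_extract}


stamps = [datetime(2021, 3, 14, 9, 30), datetime(2021, 12, 25, 18, 0)]
features = extract_datetime_features(stamps)
```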

class evalml.pipelines.components.DecisionTreeClassifier(criterion='gini', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)[source]¶

Decision Tree Classifier.

Parameters

criterion ({"gini", "entropy"}) – The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. Defaults to “gini”.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If “auto”, then max_features=sqrt(n_features).
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
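
The rules listed for max_features and min_samples_split can be written out as two small helpers. This is a sketch that follows the parameter descriptions above; the function names are hypothetical, and the floor of one feature for the "sqrt", "log2" and float cases is an assumption rather than something the descriptions state.

```python
import math


def resolve_max_features(max_features, n_features):
    """Translate a max_features setting into a number of features per split."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        raise ValueError(f"Unknown max_features: {max_features!r}")
    if isinstance(max_features, float):
        # A float is a fraction of the available features.
        return max(1, int(max_features * n_features))
    return int(max_features)


def resolve_min_samples_split(min_samples_split, n_samples):
    """Translate a min_samples_split setting into a sample count."""
    if isinstance(min_samples_split, float):
        # A float is a fraction of the samples, rounded up.
        return math.ceil(min_samples_split * n_samples)
    return int(min_samples_split)
```

For example, with 16 features both "sqrt" and "log2" resolve to 4 features per split, and min_samples_split=0.1 on 25 samples resolves to 3 samples.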

Attributes

hyperparameter_ranges	{ “criterion”: [“gini”, “entropy”], “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family	ModelFamily.DECISION_TREE
modifies_features	True
modifies_target	False
name	Decision Tree Classifier
predict_uses_y	False
supported_problem_types	[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.DecisionTreeRegressor(criterion='mse', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)[source]¶

Decision Tree Regressor.

Parameters

criterion ({"mse", "friedman_mse", "mae", "poisson"}) –
The function to measure the quality of a split. Supported criteria are:
- “mse” for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node,
- “friedman_mse”, which uses mean squared error with Friedman’s improvement score for potential splits,
- “mae” for the mean absolute error, which minimizes the L1 loss using the median of each terminal node,
- “poisson”, which uses reduction in Poisson deviance to find splits.
Defaults to “mse”.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If “auto”, then max_features=n_features.
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “criterion”: [“mse”, “friedman_mse”, “mae”], “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family	ModelFamily.DECISION_TREE
modifies_features	True
modifies_target	False
name	Decision Tree Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.DelayedFeatureTransformer(date_index=None, max_delay=2, delay_features=True, delay_target=True, gap=1, random_seed=0, **kwargs)[source]¶

Transformer that delays input features and target variable for time series problems.

Parameters

date_index (str) – Name of the column containing the datetime information used to order the data. Ignored.
max_delay (int) – Maximum number of time units to delay each feature. Defaults to 2.
delay_features (bool) – Whether to delay the input features. Defaults to True.
delay_target (bool) – Whether to delay the target. Defaults to True.
gap (int) – The number of time units between when the features are collected and when the target is collected. For example, if you are predicting the next time step’s target, gap=1. This is only needed because when gap=0, we need to be sure to start the lagging of the target variable at 1. Defaults to 1.
random_seed (int) – Seed for the random number generator. This transformer performs the same regardless of the random seed provided. Defaults to 0.

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Delayed Feature Transformer
needs_fitting	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits the DelayedFeatureTransformer.
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Computes the delayed features for all features in X and y.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits the DelayedFeatureTransformer.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y)[source]¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Computes the delayed features for all features in X and y.

For each feature in X, it will add a column to the output dataframe for each delay in the (inclusive) range [1, max_delay]. The values of each delayed feature are the original feature values shifted forward in time by the delay amount. For example, with a delay of 3 units, the value at row n is taken from row n-3 of that feature.

If y is not None, it will also compute the delayed values for the target variable.

Parameters

X (pd.DataFrame or None) – Data to transform. None is expected when only the target variable is being used.
y (pd.Series, or None) – Target.

Returns

Transformed X.

Return type

pd.DataFrame
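
The shifting described for transform can be sketched with plain lists. This is an illustration under stated assumptions, not evalml's implementation: the helper names and the "_delay_" column naming are hypothetical, missing leading values are represented as None, and the original column is kept alongside its delayed copies.

```python
def delay(values, n):
    """Shift values forward by n rows, padding the first n rows with None."""
    padded = [None] * n + list(values)
    return padded[: len(values)]


def delayed_features(columns, max_delay=2):
    """Return each column plus its delayed copies for delays 1..max_delay."""
    out = {}
    for name, values in columns.items():
        out[name] = list(values)
        for d in range(1, max_delay + 1):
            # Hypothetical naming scheme for the new columns.
            out[f"{name}_delay_{d}"] = delay(values, d)
    return out


result = delayed_features({"temp": [10, 11, 12, 13]}, max_delay=2)
```

In result, the value of "temp_delay_2" at row 3 is 11, which is the original value from row 1, matching the row n-3 example above with a delay of 2 instead of 3.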

class evalml.pipelines.components.DFSTransformer(index='index', random_seed=0, **kwargs)[source]¶

Featuretools DFS component that generates features for the input features.

Parameters

index (string) – The name of the column that contains the indices. If no column with this name exists, then featuretools.EntitySet() creates a column with this name to serve as the index column. Defaults to ‘index’.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	DFS Transformer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits the DFSTransformer component.
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Computes the feature matrix for the input X using featuretools’ dfs algorithm.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits the DFSTransformer component.

Parameters

X (pd.DataFrame, np.ndarray) – The input data to transform, of shape [n_samples, n_features]
y (pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Computes the feature matrix for the input X using featuretools’ dfs algorithm.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data to transform. Has shape [n_samples, n_features]
y (pd.Series, optional) – Ignored.

Returns

Feature matrix

Return type

pd.DataFrame

class evalml.pipelines.components.DropColumns(columns=None, random_seed=0, **kwargs)[source]¶

Drops specified columns in input data.

Parameters

columns (list(string)) – List of column names, used to determine which columns to drop.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Drop Columns Transformer
needs_fitting	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits the transformer by checking if column names are present in the dataset.
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X by dropping columns.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)¶

Fits the transformer by checking if column names are present in the dataset.

Parameters

X (pd.DataFrame) – Data to check.
y (pd.Series, optional) – Targets.

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X by dropping columns.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Targets.

Returns

Transformed X.

Return type

pd.DataFrame
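
The two steps described for this component, checking that the requested columns exist and then dropping them, can be sketched on a plain dictionary of columns. The function names and the choice of ValueError are hypothetical; the reference does not say which exception evalml raises.

```python
def check_columns(table, columns):
    """Raise if any requested column name is absent from the table."""
    missing = [c for c in (columns or []) if c not in table]
    if missing:
        raise ValueError(f"Columns not found in input data: {missing}")


def drop_columns(table, columns=None):
    """Return a new table without the requested columns."""
    check_columns(table, columns)
    to_drop = set(columns or [])
    return {name: values for name, values in table.items() if name not in to_drop}
```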

class evalml.pipelines.components.DropNullColumns(pct_null_threshold=1.0, random_seed=0, **kwargs)[source]¶

Transformer to drop features whose percentage of NaN values exceeds a specified threshold.

Parameters

pct_null_threshold (float) – The fraction of NaN values in an input feature that causes the feature to be dropped. Must be a value between 0 and 1, inclusive. If equal to 0.0, will drop columns with any null values. If equal to 1.0, will drop columns with all null values. Defaults to 1.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Drop Null Columns Transformer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X by dropping columns that exceed the threshold of null values.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X by dropping columns that exceed the threshold of null values.

Parameters

X (pd.DataFrame) – Data to transform
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame
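
The threshold rule for this component can be sketched on a plain dictionary of columns, with None standing in for NaN. The two boundary cases follow the pct_null_threshold description (0.0 drops columns with any nulls, 1.0 drops columns that are entirely null). Treating intermediate thresholds as "at or above" is an assumption, and the function name is hypothetical.

```python
def columns_to_drop(columns, pct_null_threshold=1.0):
    """Return names of columns whose fraction of missing values meets the threshold."""
    if not 0.0 <= pct_null_threshold <= 1.0:
        raise ValueError("pct_null_threshold must be between 0 and 1, inclusive.")
    dropped = []
    for name, values in columns.items():
        pct_null = sum(v is None for v in values) / len(values)
        if pct_null_threshold == 0.0:
            drop = pct_null > 0.0  # any missing value at all
        else:
            drop = pct_null >= pct_null_threshold  # assumption for intermediate values
        if drop:
            dropped.append(name)
    return dropped


data = {"a": [1, 2, 3, 4], "b": [None, 2, None, 4], "c": [None, None, None, None]}
```

With this data, a threshold of 1.0 selects only "c", while a threshold of 0.0 selects both "b" and "c".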

class evalml.pipelines.components.DropRowsTransformer(indices_to_drop=None, random_seed=0)[source]¶

Transformer to drop rows specified by row indices.

Parameters

indices_to_drop (list) – List of indices to drop in the input data. Defaults to None.
random_seed (int) – Seed for the random number generator. Is not used by this component. Defaults to 0.

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	True
name	Drop Rows Transformer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame
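
Because this component is listed with modifies_target set to True, dropping rows has to keep features and target aligned. The sketch below shows that idea on plain lists. It is hypothetical in several ways: it treats indices_to_drop as row positions rather than index labels, and it returns both X and y, which is an assumption about how the target is handed back.

```python
def drop_rows(X_rows, y=None, indices_to_drop=None):
    """Remove the given row positions from both the features and the target."""
    to_drop = set(indices_to_drop or [])
    kept = [i for i in range(len(X_rows)) if i not in to_drop]
    X_out = [X_rows[i] for i in kept]
    y_out = None if y is None else [y[i] for i in kept]
    return X_out, y_out
```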

class evalml.pipelines.components.ElasticNetClassifier(penalty='elasticnet', C=1.0, l1_ratio=0.15, multi_class='auto', solver='saga', n_jobs=-1, random_seed=0, **kwargs)[source]¶

Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator.

Parameters

penalty ({"l1", "l2", "elasticnet", "none"}) – The norm used in penalization. Defaults to “elasticnet”.
C (float) – Inverse of regularization strength. Must be a positive float. Defaults to 1.0.
l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty=“elasticnet”. Setting l1_ratio=0 is equivalent to using penalty=“l2”, while setting l1_ratio=1 is equivalent to using penalty=“l1”. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. Defaults to 0.15.
multi_class ({"auto", "ovr", "multinomial"}) – If the option chosen is “ovr”, then a binary problem is fit for each label. For “multinomial” the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. “multinomial” is unavailable when solver=“liblinear”. “auto” selects “ovr” if the data is binary, or if solver=“liblinear”, and otherwise selects “multinomial”. Defaults to “auto”.
solver ({"newton-cg", "lbfgs", "liblinear", "sag", "saga"}) –
Algorithm to use in the optimization problem. For small datasets, “liblinear” is a good choice, whereas “sag” and “saga” are faster for large ones. For multiclass problems, only “newton-cg”, “sag”, “saga” and “lbfgs” handle multinomial loss; “liblinear” is limited to one-versus-rest schemes.
- “newton-cg”, “lbfgs”, “sag” and “saga” handle L2 or no penalty
- “liblinear” and “saga” also handle L1 penalty
- “saga” also supports “elasticnet” penalty
- “liblinear” does not support setting penalty=’none’
Defaults to “saga”.
n_jobs (int) – Number of parallel threads used to fit the model. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
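The solver and multi_class notes above can be restated as a small lookup. The following is a minimal sketch in plain Python, not the component's implementation: the names `SOLVER_PENALTIES`, `is_supported`, and `resolve_multi_class` are hypothetical, and the table only encodes the compatibility rules listed in the parameter descriptions.

```python
# Hypothetical helper names; the table restates the solver notes listed above.
SOLVER_PENALTIES = {
    "newton-cg": {"l2", "none"},
    "lbfgs": {"l2", "none"},
    "sag": {"l2", "none"},
    "liblinear": {"l1", "l2"},  # no support for penalty="none"
    "saga": {"l1", "l2", "elasticnet", "none"},
}


def is_supported(solver, penalty):
    """Return True if the solver/penalty pair is allowed by the notes above."""
    return penalty in SOLVER_PENALTIES[solver]


def resolve_multi_class(multi_class, solver, n_classes):
    """Mimic the documented "auto" rule: "ovr" for binary data or liblinear."""
    if multi_class != "auto":
        return multi_class
    if solver == "liblinear" or n_classes <= 2:
        return "ovr"
    return "multinomial"


print(is_supported("saga", "elasticnet"))
print(resolve_multi_class("auto", "saga", n_classes=3))
```

Under these rules the default combination (penalty “elasticnet” with solver “saga”) is the only pairing that supports the elastic net penalty, which is presumably why “saga” is the default solver here.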

Attributes

hyperparameter_ranges	{ “C”: Real(0.01, 10), “l1_ratio”: Real(0, 1)}
model_family	ModelFamily.LINEAR_MODEL
modifies_features	True
modifies_target	False
name	Elastic Net Classifier
predict_uses_y	False
supported_problem_types	[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict
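The convention stated above can be illustrated with a toy class. This is a sketch under assumptions, not EvalML code: `ToyComponent` is hypothetical, `default_parameters` is written as a plain classmethod (so it is called with parentheses here), and excluding `random_seed` from the parameters dict is an assumption made for the example.

```python
import inspect


class ToyComponent:
    """Hypothetical stand-in used only to illustrate the convention."""

    def __init__(self, alpha=0.5, max_iter=100, random_seed=0):
        # Assumption: random_seed is tracked separately from parameters.
        self.parameters = {"alpha": alpha, "max_iter": max_iter}
        self.random_seed = random_seed

    @classmethod
    def default_parameters(cls):
        """Read the defaults straight from the __init__ signature."""
        signature = inspect.signature(cls.__init__)
        skipped = {"self", "random_seed"}
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name not in skipped and param.default is not inspect.Parameter.empty
        }


# The convention: defaults equal the parameters of a default-constructed instance.
print(ToyComponent.default_parameters() == ToyComponent().parameters)
```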

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None
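A save/load round trip of the kind described by `save` and `load` can be sketched with the standard library. This example swaps in the stdlib `pickle` module for `cloudpickle` so that it runs without extra dependencies, and a plain dict stands in for a fitted component; the function names `save_object` and `load_object` are hypothetical.

```python
import os
import pickle
import tempfile


def save_object(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Write obj to file_path using the given pickle protocol."""
    with open(file_path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle_protocol)


def load_object(file_path):
    """Read back whatever was stored at file_path."""
    with open(file_path, "rb") as handle:
        return pickle.load(handle)


# A dict stands in for a component; a real component would be pickled whole.
component_state = {"name": "Elastic Net Classifier", "parameters": {"C": 1.0, "l1_ratio": 0.15}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "component.pkl")
    save_object(component_state, path)
    restored = load_object(path)

print(restored == component_state)
```

As with any pickle-based format, files should only be loaded from trusted sources, since unpickling can execute arbitrary code.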

class evalml.pipelines.components.ElasticNetRegressor(alpha=0.0001, l1_ratio=0.15, max_iter=1000, normalize=False, random_seed=0, **kwargs)[source]¶

Elastic Net Regressor.

Parameters

alpha (float) – Constant that multiplies the penalty terms. Defaults to 0.0001.
l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Setting l1_ratio=0 makes the penalty a pure L2 penalty, while setting l1_ratio=1 makes it a pure L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. Defaults to 0.15.
max_iter (int) – The maximum number of iterations. Defaults to 1000.
normalize (boolean) – If True, the regressors will be normalized before regression by subtracting the mean and dividing by the l2-norm. Defaults to False.
random_seed (int) – Seed for the random number generator. Defaults to 0.
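To make the roles of alpha and l1_ratio concrete, the sketch below computes the penalty term in the form used by scikit-learn's ElasticNet objective, `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`. That this component uses exactly that formulation is an assumption, and `elastic_net_penalty` is a hypothetical helper name.

```python
def elastic_net_penalty(weights, alpha=0.0001, l1_ratio=0.15):
    """Penalty added to the loss for a given coefficient vector."""
    l1_norm = sum(abs(w) for w in weights)
    l2_norm_squared = sum(w * w for w in weights)
    return alpha * l1_ratio * l1_norm + 0.5 * alpha * (1.0 - l1_ratio) * l2_norm_squared


weights = [1.0, -2.0, 0.5]

# l1_ratio=1 leaves only the L1 term, l1_ratio=0 only the L2 term.
pure_l1 = elastic_net_penalty(weights, alpha=1.0, l1_ratio=1.0)
pure_l2 = elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.0)
mixed = elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5)
print(pure_l1, pure_l2, mixed)
```

Because alpha multiplies both terms, the default of 0.0001 applies only very light regularization regardless of how l1_ratio splits it.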

Attributes

hyperparameter_ranges	{ “alpha”: Real(0, 1), “l1_ratio”: Real(0, 1),}
model_family	ModelFamily.LINEAR_MODEL
modifies_features	True
modifies_target	False
name	Elastic Net Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.EmailFeaturizer(random_seed=0, **kwargs)[source]¶

Transformer that can automatically extract features from emails.

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Email Featurizer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)¶

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.Estimator(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]¶

A component that fits and predicts given data.

To implement a new Estimator, define your own class which is a subclass of Estimator, including a name and a list of acceptable ranges for any parameters to be tuned during the automl search (hyperparameters). Define an __init__ method which sets up any necessary state and objects. Make sure your __init__ only uses standard keyword arguments and calls super().__init__() with a parameters dict. You may also override the fit, transform, fit_transform and other methods in this class if appropriate.

To see some examples, check out the definitions of any Estimator component.
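The subclassing steps described above might look roughly like the following. To keep the example runnable without EvalML installed, `EstimatorStandIn` is a hypothetical stand-in for the real base class, `MeanRegressor` and its `shrinkage` parameter are invented for illustration, and the hyperparameter range is written as a plain tuple rather than a search-space object.

```python
class EstimatorStandIn:
    """Hypothetical stand-in for the Estimator base class (not EvalML code)."""

    name = None
    hyperparameter_ranges = {}

    def __init__(self, parameters=None, component_obj=None, random_seed=0, **kwargs):
        self.parameters = dict(parameters or {})
        self._component_obj = component_obj
        self.random_seed = random_seed


class MeanRegressor(EstimatorStandIn):
    """Toy estimator that predicts a (optionally shrunk) mean of the target."""

    name = "Mean Regressor"
    hyperparameter_ranges = {"shrinkage": (0.0, 1.0)}  # placeholder range format

    def __init__(self, shrinkage=0.0, random_seed=0, **kwargs):
        # Only standard keyword arguments, collected into a parameters dict.
        parameters = {"shrinkage": shrinkage}
        parameters.update(kwargs)
        super().__init__(parameters=parameters, component_obj=None, random_seed=random_seed)
        self._mean = None

    def fit(self, X, y=None):
        self._mean = sum(y) / len(y)
        return self

    def predict(self, X):
        value = (1.0 - self.parameters["shrinkage"]) * self._mean
        return [value for _ in X]


estimator = MeanRegressor(shrinkage=0.5).fit([[1], [2], [3]], [2.0, 4.0, 6.0])
predictions = estimator.predict([[0], [0]])
print(predictions)
```

The pattern to note is that `__init__` accepts only keyword arguments with defaults and forwards them as a parameters dict, which is what lets the parameters be reported and tuned uniformly.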

Parameters

parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
predict_uses_y	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`name`	Returns string name of this component
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path
`supported_problem_types`	Problem types this estimator supports

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

property name(cls)¶: Returns string name of this component

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)[source]¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

property supported_problem_types(cls)¶: Problem types this estimator supports

class evalml.pipelines.components.ExtraTreesClassifier(n_estimators=100, max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, n_jobs=-1, random_seed=0, **kwargs)[source]¶

Extra Trees Classifier.

Parameters

n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If “auto”, then max_features=sqrt(n_features).
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
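The rules for max_features and min_samples_split listed above can be written out as two small functions. This is an illustrative sketch, not the underlying library code: the function names are hypothetical, and the `max(1, ...)` guard against a result of zero is an added assumption that the parameter descriptions do not state.

```python
import math


def resolve_max_features(max_features, n_features):
    """Number of features considered per split, following the list above."""
    if max_features is None:
        return n_features
    if max_features in ("auto", "sqrt"):
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    # Otherwise treat the value as a fraction of the feature count.
    return max(1, int(max_features * n_features))


def resolve_min_samples_split(min_samples_split, n_samples):
    """Minimum samples needed to split a node, as an absolute count."""
    if isinstance(min_samples_split, int):
        return min_samples_split
    return math.ceil(min_samples_split * n_samples)


print(resolve_max_features("auto", 16), resolve_max_features(0.5, 16))
print(resolve_min_samples_split(0.1, 25))
```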

Attributes

hyperparameter_ranges	{ “n_estimators”: Integer(10, 1000), “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family	ModelFamily.EXTRA_TREES
modifies_features	True
modifies_target	False
name	Extra Trees Classifier
predict_uses_y	False
supported_problem_types	[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.ExtraTreesRegressor(n_estimators=100, max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, n_jobs=-1, random_seed=0, **kwargs)[source]¶

Extra Trees Regressor.

Parameters

n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If “auto”, then max_features=n_features.
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “n_estimators”: Integer(10, 1000), “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family	ModelFamily.EXTRA_TREES
modifies_features	True
modifies_target	False
name	Extra Trees Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.FeatureSelector(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]¶

Selects top features based on importance weights.

Parameters

parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`get_names`	Get names of selected features.
`load`	Loads component at file path
`name`	Returns string name of this component
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)[source]¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

get_names(self)[source]¶

Get names of selected features.

Returns: List of the names of features selected
Return type: list[str]
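As a rough illustration of selecting top features by importance weights and then reporting their names, the sketch below keeps the k highest-weighted features. It is not the component's implementation: the actual selection rule depends on the wrapped component_obj (it may use a threshold rather than a fixed k), and `select_top_features` is a hypothetical name.

```python
def select_top_features(names, importances, k):
    """Return the names of the k most important features, in original order."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    keep = {name for name, _ in ranked[:k]}
    return [name for name in names if name in keep]


feature_names = ["age", "income", "zip_code", "tenure"]
feature_weights = [0.10, 0.50, 0.05, 0.35]

selected = select_top_features(feature_names, feature_weights, k=2)
print(selected)
```

Returning the names in their original column order, rather than in ranked order, keeps the output aligned with how the selected columns would appear in the transformed data.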

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

property name(cls)¶: Returns string name of this component

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data. Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.Imputer(categorical_impute_strategy='most_frequent', categorical_fill_value=None, numeric_impute_strategy='mean', numeric_fill_value=None, random_seed=0, **kwargs)[source]¶

Imputes missing data according to a specified imputation strategy.

Parameters

categorical_impute_strategy (string) – Impute strategy to use for string, object, boolean, categorical dtypes. Valid values include “most_frequent” and “constant”. Defaults to “most_frequent”.
numeric_impute_strategy (string) – Impute strategy to use for numeric columns. Valid values include “mean”, “median”, “most_frequent”, and “constant”. Defaults to “mean”.
categorical_fill_value (string) – When categorical_impute_strategy == “constant”, categorical_fill_value is used to replace missing data. The default value of None will fill with the string “missing_value”.
numeric_fill_value (int, float) – When numeric_impute_strategy == “constant”, numeric_fill_value is used to replace missing data. The default value of None will fill with 0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
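The four strategies named above can be sketched on a single column using only the standard library. This is a simplified illustration, not the Imputer's implementation: `impute_column` is a hypothetical helper that works on Python lists, and it treats both None and NaN as missing.

```python
import statistics
from collections import Counter


def is_missing(value):
    """None and NaN both count as missing (NaN is the only value != itself)."""
    return value is None or value != value


def impute_column(values, strategy, fill_value=None):
    """Fill missing entries of one column according to the chosen strategy."""
    observed = [v for v in values if not is_missing(v)]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "most_frequent":
        fill = Counter(observed).most_common(1)[0][0]
    elif strategy == "constant":
        fill = fill_value
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return [fill if is_missing(v) else v for v in values]


numeric = impute_column([1.0, None, 3.0], "mean")
categorical = impute_column(["a", None, "b", "a"], "most_frequent")
constant = impute_column(["x", None], "constant", fill_value="missing_value")
print(numeric, categorical, constant)
```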

Attributes

hyperparameter_ranges	{ “categorical_impute_strategy”: [“most_frequent”], “numeric_impute_strategy”: [“mean”, “median”, “most_frequent”],}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Imputer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X by imputing missing values. ‘None’ values are converted to np.nan before imputation and are treated as the same.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.

Parameters

X (pd.DataFrame, np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X by imputing missing values. ‘None’ values are converted to np.nan before imputation and are treated as the same.

Parameters

X (pd.DataFrame) – Data to transform
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame
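
As a rough illustration of the imputation strategies listed for this component, the sketch below fills one column with a single statistic and treats `None` and NaN as the same kind of missing value. It is a standard-library approximation, not the Imputer's implementation: the helper name `impute_column` is hypothetical, and it fits and transforms in one step rather than separately.

```python
import math
from statistics import mean, median, mode


def impute_column(values, strategy="mean"):
    """Fill missing entries of one column with a single computed statistic."""

    def is_missing(v):
        # None and float NaN are treated as the same kind of missing value.
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    statistic = {"mean": mean, "median": median, "most_frequent": mode}[strategy]
    fill = statistic(observed)
    return [fill if is_missing(v) else v for v in values]


numeric = [1.0, None, 3.0, float("nan"), 8.0]
print(impute_column(numeric, "mean"))    # both gaps become 4.0
print(impute_column(numeric, "median"))  # both gaps become 3.0
print(impute_column(["a", None, "b", "a"], "most_frequent"))
```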

class evalml.pipelines.components.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, random_seed=0, **kwargs)[source]¶

K-Nearest Neighbors Classifier.

Parameters

n_neighbors (int) – Number of neighbors to use by default. Defaults to 5.
weights ({‘uniform’, ‘distance’} or callable) –
Weight function used in prediction. Can be:
- ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
- ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
- [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
Defaults to “uniform”.
algorithm ({‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}) –
Algorithm used to compute the nearest neighbors:
- ‘ball_tree’ will use BallTree
- ‘kd_tree’ will use KDTree
- ‘brute’ will use a brute-force search.
‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit method. Defaults to “auto”. Note: fitting on sparse input will override the setting of this parameter, using brute force.
leaf_size (int) – Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30.
p (int) – Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Defaults to 2.
random_seed (int) – Seed for the random number generator. Defaults to 0.
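
To make the `p` and `weights` parameters concrete, here is a minimal brute-force sketch in plain Python. It is not how the component (or the library it wraps) is implemented; `minkowski` and `knn_predict` are hypothetical helpers, and callable weights are left out.

```python
from collections import defaultdict


def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan (l1), p=2 is Euclidean (l2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(X_train, y_train, query, n_neighbors=5, weights="uniform", p=2):
    """Brute-force k-nearest-neighbors vote for a single query point."""
    distances = sorted(
        (minkowski(row, query, p), label) for row, label in zip(X_train, y_train)
    )
    votes = defaultdict(float)
    for dist, label in distances[:n_neighbors]:
        if weights == "uniform":
            votes[label] += 1.0  # every neighbor counts equally
        else:  # "distance": closer neighbors get a larger vote
            if dist == 0:
                return label  # an exact match decides the vote
            votes[label] += 1.0 / dist
    return max(votes, key=votes.get)


X_train = [(0.0, 0.0), (4.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
y_train = ["a", "b", "b", "b"]
query = (1.0, 0.0)

print(knn_predict(X_train, y_train, query, n_neighbors=3, weights="uniform"))
print(knn_predict(X_train, y_train, query, n_neighbors=3, weights="distance"))
```

With three neighbors, the uniform vote is decided by the two more distant "b" points, while inverse-distance weighting lets the single nearby "a" point win.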

Attributes

hyperparameter_ranges	{ “n_neighbors”: Integer(2, 12), “weights”: [“uniform”, “distance”], “algorithm”: [“auto”, “ball_tree”, “kd_tree”, “brute”], “leaf_size”: Integer(10, 30), “p”: Integer(1, 5),}
model_family	ModelFamily.K_NEIGHBORS
modifies_features	True
modifies_target	False
name	KNN Classifier
predict_uses_y	False
supported_problem_types	[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
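
One way to read `hyperparameter_ranges` is as a search space from which a tuner draws candidate parameter sets. The sketch below is an assumption-laden illustration: the `Integer(low, high)` entries are represented by hypothetical `(low, high)` tuples, bounds are assumed inclusive, and `sample_parameters` is not an evalml function.

```python
import random

# Hypothetical plain-Python mirror of the ranges listed above.
KNN_RANGES = {
    "n_neighbors": (2, 12),
    "weights": ["uniform", "distance"],
    "algorithm": ["auto", "ball_tree", "kd_tree", "brute"],
    "leaf_size": (10, 30),
    "p": (1, 5),
}


def sample_parameters(ranges, random_seed=0):
    """Draw one candidate parameter set from the given ranges."""
    rng = random.Random(random_seed)
    sampled = {}
    for name, space in ranges.items():
        if isinstance(space, tuple):
            low, high = space
            sampled[name] = rng.randint(low, high)  # integer range, inclusive
        else:
            sampled[name] = rng.choice(space)  # categorical choice
    return sampled


print(sample_parameters(KNN_RANGES, random_seed=0))
```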

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns array of 0’s matching the input number of features as feature_importance is not defined for KNN classifiers.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict
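
The interaction between `print_name` and `return_dict` can be sketched as follows. This is a hypothetical free function written only to show the documented dictionary format; the real method is bound to the component and its exact printed layout is not specified here.

```python
def describe(name, parameters, print_name=False, return_dict=False):
    """Print a component's parameters and optionally return them as a dict."""
    if print_name:
        print(name)
    for key, value in parameters.items():
        print(f"  {key}: {value}")
    if return_dict:
        # Documented format: {"name": name, "parameters": parameters}
        return {"name": name, "parameters": parameters}
    return None


summary = describe(
    "KNN Classifier",
    {"n_neighbors": 5, "weights": "uniform"},
    print_name=True,
    return_dict=True,
)
```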

property feature_importance(self)¶: Returns array of 0’s matching the input number of features as feature_importance is not defined for KNN classifiers.

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None
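
A save/load round trip with an explicit pickle protocol might look like the sketch below. It swaps in the standard-library `pickle` module for `cloudpickle` and serializes a plain dictionary rather than a real component, so `save_state`, `load_state`, and the `state` contents are all illustrative assumptions.

```python
import os
import pickle
import tempfile


def save_state(state, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Write a picklable object to file_path using the given protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(state, f, protocol=pickle_protocol)


def load_state(file_path):
    """Read back an object written by save_state."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


state = {"name": "KNN Classifier", "parameters": {"n_neighbors": 7, "weights": "distance"}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "component.pkl")
    save_state(state, path)
    restored = load_state(path)

print(restored == state)
```

Because unpickling can execute arbitrary code, files should only be loaded from sources you trust.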

class evalml.pipelines.components.LightGBMClassifier(boosting_type='gbdt', learning_rate=0.1, n_estimators=100, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=-1, random_seed=0, **kwargs)[source]¶

LightGBM Classifier.

Parameters

boosting_type (string) –
Type of boosting to use:
- “gbdt” uses traditional Gradient Boosting Decision Tree
- “dart” uses Dropouts meet Multiple Additive Regression Trees
- “goss” uses Gradient-based One-Side Sampling
- “rf” uses Random Forest
Defaults to “gbdt”.
learning_rate (float) – Boosting learning rate. Defaults to 0.1.
n_estimators (int) – Number of boosted trees to fit. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.
num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.
min_child_samples (int) – Minimum number of data needed in a child (leaf). Defaults to 20.
bagging_fraction (float) – LightGBM will randomly select a subset of the data on each iteration (tree) without resampling if this is smaller than 1.0. For example, if set to 0.8, LightGBM will select 80% of the data before training each tree. This can be used to speed up training and deal with overfitting. Bagging only takes effect when bagging_freq is nonzero. Defaults to 0.9.
bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled. k means perform bagging at every k iteration. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100 % of the data to use for the next k iterations. Defaults to 0.
n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
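
The interplay of `bagging_fraction` and `bagging_freq` described above (a new subset of the data is drawn every k-th iteration and reused for the next k iterations) can be sketched in plain Python. This is an illustration of the sampling schedule only, not LightGBM's implementation; `bagging_schedule` is a hypothetical helper.

```python
import random


def bagging_schedule(n_rows, n_iterations, bagging_fraction=0.9, bagging_freq=0, random_seed=0):
    """Return the row indices that would be used at each boosting iteration."""
    rng = random.Random(random_seed)
    all_rows = list(range(n_rows))
    if bagging_freq <= 0 or bagging_fraction >= 1.0:
        # Bagging disabled: every iteration sees all rows.
        return [all_rows] * n_iterations
    n_selected = max(1, int(n_rows * bagging_fraction))
    schedule, current = [], all_rows
    for iteration in range(n_iterations):
        if iteration % bagging_freq == 0:
            # Draw a fresh subset without replacement every bagging_freq iterations.
            current = sorted(rng.sample(all_rows, n_selected))
        schedule.append(current)
    return schedule


for rows in bagging_schedule(10, 4, bagging_fraction=0.8, bagging_freq=2):
    print(rows)
```

With the defaults shown in the signature (`bagging_freq=0`), the sketch returns the full row set for every iteration, matching the note that 0 disables bagging.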

Attributes

hyperparameter_ranges	{ “learning_rate”: Real(0.000001, 1), “boosting_type”: [“gbdt”, “dart”, “goss”, “rf”], “n_estimators”: Integer(10, 100), “max_depth”: Integer(0, 10), “num_leaves”: Integer(2, 100), “min_child_samples”: Integer(1, 100), “bagging_fraction”: Real(0.000001, 1), “bagging_freq”: Integer(0, 1),}
model_family	ModelFamily.LIGHTGBM
modifies_features	True
modifies_target	False
name	LightGBM Classifier
predict_uses_y	False
SEED_MAX	SEED_BOUNDS.max_bound
SEED_MIN	0
supported_problem_types	[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)[source]¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.LightGBMRegressor(boosting_type='gbdt', learning_rate=0.1, n_estimators=20, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=-1, random_seed=0, **kwargs)[source]¶

LightGBM Regressor.

Parameters

boosting_type (string) –
Type of boosting to use:
- “gbdt” uses traditional Gradient Boosting Decision Tree
- “dart” uses Dropouts meet Multiple Additive Regression Trees
- “goss” uses Gradient-based One-Side Sampling
- “rf” uses Random Forest
Defaults to “gbdt”.
learning_rate (float) – Boosting learning rate. Defaults to 0.1.
n_estimators (int) – Number of boosted trees to fit. Defaults to 20.
max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.
num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.
min_child_samples (int) – Minimum number of data needed in a child (leaf). Defaults to 20.
bagging_fraction (float) – LightGBM will randomly select a subset of the data on each iteration (tree) without resampling if this is smaller than 1.0. For example, if set to 0.8, LightGBM will select 80% of the data before training each tree. This can be used to speed up training and deal with overfitting. Bagging only takes effect when bagging_freq is nonzero. Defaults to 0.9.
bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled. k means perform bagging at every k iteration. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100 % of the data to use for the next k iterations. Defaults to 0.
n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “learning_rate”: Real(0.000001, 1), “boosting_type”: [“gbdt”, “dart”, “goss”, “rf”], “n_estimators”: Integer(10, 100), “max_depth”: Integer(0, 10), “num_leaves”: Integer(2, 100), “min_child_samples”: Integer(1, 100), “bagging_fraction”: Real(0.000001, 1), “bagging_freq”: Integer(0, 1),}
model_family	ModelFamily.LIGHTGBM
modifies_features	True
modifies_target	False
name	LightGBM Regressor
predict_uses_y	False
SEED_MAX	SEED_BOUNDS.max_bound
SEED_MIN	0
supported_problem_types	[ProblemTypes.REGRESSION]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.LinearDiscriminantAnalysis(n_components=None, random_seed=0, **kwargs)[source]¶

Reduces the number of features by using Linear Discriminant Analysis.

Parameters

n_components (int) – The number of features to maintain after computation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
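
When choosing `n_components`, it helps to remember that Linear Discriminant Analysis can produce at most `min(n_classes - 1, n_features)` components. The sketch below assumes the component follows that standard constraint and that `None` means "use the maximum"; `resolve_n_components` is a hypothetical helper, not part of the API.

```python
def resolve_n_components(n_components, n_features, n_classes):
    """Return the number of LDA components that would actually be kept."""
    upper_bound = min(n_classes - 1, n_features)
    if n_components is None:
        return upper_bound  # assumed default: keep as many as allowed
    if n_components > upper_bound:
        raise ValueError(
            f"n_components={n_components} exceeds the maximum of {upper_bound}"
        )
    return n_components


print(resolve_n_components(None, n_features=10, n_classes=3))
print(resolve_n_components(1, n_features=10, n_classes=3))
```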

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Linear Discriminant Analysis Transformer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)[source]¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.LinearRegressor(fit_intercept=True, normalize=False, n_jobs=-1, random_seed=0, **kwargs)[source]¶

Linear Regressor.

Parameters

fit_intercept (boolean) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered). Defaults to True.
normalize (boolean) – If True, the regressors will be normalized before regression by subtracting the mean and dividing by the l2-norm. This parameter is ignored when fit_intercept is set to False. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
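
The `normalize` option above, subtracting the mean and dividing by the l2-norm, can be sketched for a single feature column as follows. This assumes the norm is taken over the centered column; `normalize_column` is a hypothetical helper and not how the underlying estimator is implemented.

```python
import math


def normalize_column(values):
    """Center a feature column, then scale it to unit l2-norm."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        return centered  # constant column: nothing to scale
    return [c / norm for c in centered]


column = [1.0, 2.0, 3.0]
print(normalize_column(column))
```

After this step the column sums to zero and its squared entries sum to one, which puts features with very different scales on a comparable footing.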

Attributes

hyperparameter_ranges	{ “fit_intercept”: [True, False], “normalize”: [True, False]}
model_family	ModelFamily.LINEAR_MODEL
modifies_features	True
modifies_target	False
name	Linear Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.LogisticRegressionClassifier(penalty='l2', C=1.0, multi_class='auto', solver='lbfgs', n_jobs=-1, random_seed=0, **kwargs)[source]¶

Logistic Regression Classifier.

Parameters

penalty ({"l1", "l2", "elasticnet", "none"}) – The norm used in penalization. Defaults to “l2”.
C (float) – Inverse of regularization strength. Must be a positive float. Defaults to 1.0.
multi_class ({"auto", "ovr", "multinomial"}) – If the option chosen is “ovr”, then a binary problem is fit for each label. For “multinomial” the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. “multinomial” is unavailable when solver=”liblinear”. “auto” selects “ovr” if the data is binary, or if solver=”liblinear”, and otherwise selects “multinomial”. Defaults to “auto”.
solver ({"newton-cg", "lbfgs", "liblinear", "sag", "saga"}) –
Algorithm to use in the optimization problem. For small datasets, “liblinear” is a good choice, whereas “sag” and “saga” are faster for large ones. For multiclass problems, only “newton-cg”, “sag”, “saga” and “lbfgs” handle multinomial loss; “liblinear” is limited to one-versus-rest schemes.
- “newton-cg”, “lbfgs”, “sag” and “saga” handle L2 or no penalty
- “liblinear” and “saga” also handle L1 penalty
- “saga” also supports “elasticnet” penalty
- “liblinear” does not support setting penalty=“none”
Defaults to “lbfgs”.
n_jobs (int) – Number of parallel threads used to run the model. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
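
The solver and penalty compatibility rules listed above can be captured in a small lookup table. This sketch only encodes the combinations stated in the parameter description; `SUPPORTED_PENALTIES` and `check_solver_penalty` are hypothetical names, not part of the component's API.

```python
# Penalties each solver can handle, per the parameter description above.
SUPPORTED_PENALTIES = {
    "newton-cg": {"l2", "none"},
    "lbfgs": {"l2", "none"},
    "sag": {"l2", "none"},
    "saga": {"l1", "l2", "elasticnet", "none"},
    "liblinear": {"l1", "l2"},
}


def check_solver_penalty(solver="lbfgs", penalty="l2"):
    """Return True if the solver supports the requested penalty."""
    if solver not in SUPPORTED_PENALTIES:
        raise ValueError(f"unknown solver: {solver!r}")
    return penalty in SUPPORTED_PENALTIES[solver]


print(check_solver_penalty("lbfgs", "l2"))
print(check_solver_penalty("lbfgs", "l1"))
print(check_solver_penalty("saga", "elasticnet"))
```

Checking the combination before constructing the classifier gives a clearer error than waiting for the fit call to fail.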

Attributes

hyperparameter_ranges	{ “penalty”: [“l2”], “C”: Real(0.01, 10),}
model_family	ModelFamily.LINEAR_MODEL
modifies_features	True
modifies_target	False
name	Logistic Regression Classifier
predict_uses_y	False
supported_problem_types	[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.LogTransformer(random_seed=0)[source]¶

Applies a log transformation to the target data.

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	False
modifies_target	True
name	Log Transformer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits the LogTransformer.
`fit_transform`	Log transforms the target variable.
`inverse_transform`	Inverts the transformation done by the transform method.
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Log transforms the target variable.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits the LogTransformer.

Parameters

X (pd.DataFrame or np.ndarray) – Ignored.
y (pd.Series, optional) – Ignored.

Returns

self

fit_transform(self, X, y=None)[source]¶

Log transforms the target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to log transform.

Returns

The input features are returned without modification. The target variable y is log transformed.

Return type

tuple of pd.DataFrame, pd.Series

inverse_transform(self, y)[source]¶

Inverts the transformation done by the transform method.

Parameters: y (pd.Series) – Target transformed by this component.

Returns: Target without the transformation.
Return type: pd.Series

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Log transforms the target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target data to log transform.

Returns

The input features are returned without modification. The target variable y is log transformed.

Return type

tuple of pd.DataFrame, pd.Series
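The relationship between transform and inverse_transform can be sketched in plain Python. This sketch assumes a natural logarithm and strictly positive targets; the reference above does not state the log base or how non-positive values are handled, and the function names are hypothetical.

```python
import math


def log_transform(y):
    """Apply the natural log to each target value (assumes every value > 0)."""
    return [math.log(value) for value in y]


def inverse_log_transform(y_transformed):
    """Undo log_transform by exponentiating each value."""
    return [math.exp(value) for value in y_transformed]


y = [1.0, 10.0, 100.0]
y_log = log_transform(y)
y_restored = inverse_log_transform(y_log)
```

Applying the inverse to the transformed target recovers the original values up to floating-point error.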

class evalml.pipelines.components.LSA(random_seed=0, **kwargs)[source]¶

Transformer to calculate the Latent Semantic Analysis Values of text input.

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	LSA Transformer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X by applying the LSA pipeline.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the component's description and, if return_dict is True, returns it as a dictionary.

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X by applying the LSA pipeline.

Parameters

X (pd.DataFrame) – The data to transform.
y (pd.Series, optional) – Ignored.

Returns

Transformed X. The original column is removed and replaced with two columns of the format LSA(original_column_name)[feature_number], where feature_number is 0 or 1.

Return type

pd.DataFrame
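The column naming described for the LSA output can be sketched as follows. Only the LSA(original_column_name)[feature_number] format comes from the text; the helper names are hypothetical, and appending the new columns after the untouched ones is an assumption about ordering.

```python
def lsa_column_names(original_column_name, n_features=2):
    """Build output column names in the LSA(name)[i] format."""
    return [f"LSA({original_column_name})[{i}]" for i in range(n_features)]


def replace_text_columns(columns, text_columns):
    """Return the column list after each text column is swapped for its LSA columns."""
    kept = [c for c in columns if c not in text_columns]
    # Assumption: new columns are appended after the untouched ones.
    added = [name for c in text_columns for name in lsa_column_names(c)]
    return kept + added


new_columns = replace_text_columns(["id", "review"], ["review"])
```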

class evalml.pipelines.components.OneHotEncoder(top_n=10, features_to_encode=None, categories=None, drop='if_binary', handle_unknown='ignore', handle_missing='error', random_seed=0, **kwargs)[source]¶

A transformer that encodes categorical features in a one-hot numeric array.

Parameters

top_n (int) – Number of categories per column to encode. If None, all categories will be encoded. Otherwise, the n most frequent will be encoded and all others will be dropped. Defaults to 10.
features_to_encode (list[str]) – List of columns to encode. All other columns will remain untouched. If None, all appropriate columns will be encoded. Defaults to None.
categories (list) – A two dimensional list of categories, where categories[i] is a list of the categories for the column at index i. This can also be None, or “auto” if top_n is not None. Defaults to None.
drop (string, list) – Method (“first” or “if_binary”) to use to drop one category per feature. Can also be a list specifying which categories to drop for each feature. Defaults to ‘if_binary’.
handle_unknown (string) – Whether to ignore or error for unknown categories for a feature encountered during fit or transform. If either top_n or categories is used to limit the number of categories per column, this must be “ignore”. Defaults to “ignore”.
handle_missing (string) – Options for how to handle missing (NaN) values encountered during fit or transform. If this is set to “as_category” and NaN values are within the n most frequent, “nan” values will be encoded as their own column. If this is set to “error”, any missing values encountered will raise an error. Defaults to “error”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
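As a rough illustration of the top_n parameter, the sketch below picks the categories that would be kept for a single column. It is an assumption-laden stand-in rather than the encoder's own logic: top_n_categories is a hypothetical helper, and tie-breaking between equally frequent categories is not specified in the text.

```python
from collections import Counter


def top_n_categories(values, top_n=10):
    """Return the categories that would be encoded for one column."""
    counts = Counter(values)
    if top_n is None:
        # No limit: every observed category is encoded.
        return sorted(counts)
    # Otherwise keep only the top_n most frequent categories.
    return [category for category, _ in counts.most_common(top_n)]


colors = ["red", "red", "red", "blue", "blue", "green"]
kept = top_n_categories(colors, top_n=2)
everything = top_n_categories(colors, top_n=None)
```

Here "green" is the least frequent value, so it is dropped when top_n=2 and kept when top_n is None.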

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	One Hot Encoder

Methods

`categories`	Returns a list of the unique categories to be encoded for the particular feature, in order.
`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`get_feature_names`	Return feature names for the categorical features after fitting.
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	One-hot encode the input data.

categories(self, feature_name)[source]¶

Returns a list of the unique categories to be encoded for the particular feature, in order.

Parameters: feature_name (str) – the name of any feature provided to one-hot encoder during fit
Returns: the unique categories, in the same dtype as they were provided during fit
Return type: np.ndarray

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the component's description and, if return_dict is True, returns it as a dictionary.

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

get_feature_names(self)[source]¶

Return feature names for the categorical features after fitting.

Feature names are formatted as {column name}_{category name}. In the event of a duplicate name, an integer will be added at the end of the feature name to distinguish it.

For example, consider a dataframe with a column called “A” and category “x_y” and another column called “A_x” with “y”. In this example, the feature names would be “A_x_y” and “A_x_y_1”.

Returns: The feature names after encoding, provided in the same order as input_features.
Return type: np.ndarray
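The naming rule above, including the integer suffix for duplicates, can be sketched in a few lines. This is an illustration of the described behavior under the assumption that suffixes start at 1 and increase; one_hot_feature_names is a hypothetical helper, not the encoder's method.

```python
def one_hot_feature_names(categories_by_column):
    """Build {column}_{category} names, suffixing duplicates with an integer."""
    names = []
    seen = set()
    for column, categories in categories_by_column.items():
        for category in categories:
            proposed = f"{column}_{category}"
            candidate = proposed
            suffix = 1
            # Append _1, _2, ... until the name is unique.
            while candidate in seen:
                candidate = f"{proposed}_{suffix}"
                suffix += 1
            seen.add(candidate)
            names.append(candidate)
    return names


feature_names = one_hot_feature_names({"A": ["x_y"], "A_x": ["y"]})
```

This reproduces the example in the text: column "A" with category "x_y" and column "A_x" with category "y" both propose "A_x_y", so the second one becomes "A_x_y_1".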

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

One-hot encode the input data.

Parameters

X (pd.DataFrame) – Features to one-hot encode.
y (pd.Series) – Ignored.

Returns

Transformed data, where each categorical feature has been encoded into numerical columns using one-hot encoding.

Return type

pd.DataFrame

class evalml.pipelines.components.Oversampler(sampling_ratio=0.25, sampling_ratio_dict=None, k_neighbors_default=5, n_jobs=-1, random_seed=0, **kwargs)[source]¶

SMOTE Oversampler component. Will automatically select whether to use SMOTE, SMOTEN, or SMOTENC based on inputs to the component.

Parameters

sampling_ratio (float) – This is the goal ratio of the minority to majority class, with range (0, 1]. A value of 0.25 means we want a 1:4 ratio of the minority to majority class after oversampling. We will create a sampling dictionary using this ratio, with the keys corresponding to the class and the values corresponding to the number of samples. Defaults to 0.25.
sampling_ratio_dict (dict) – A dictionary specifying the desired balanced ratio for each target value. For instance, in a binary case where class 1 is the minority, we could specify: sampling_ratio_dict={0: 0.5, 1: 1}, which means we would undersample class 0 to have twice the number of samples as class 1 (minority:majority ratio = 0.5), and don’t sample class 1. Overrides sampling_ratio if provided. Defaults to None.
k_neighbors_default (int) – The number of nearest neighbors used to construct synthetic samples. This is the default value used, but the actual k_neighbors value might be smaller if there are fewer samples. Defaults to 5.
n_jobs (int) – The number of CPU cores to use. Defaults to -1.
random_seed (int) – The seed to use for random sampling. Defaults to 0.
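The arithmetic behind sampling_ratio can be sketched as a function from class counts to a sampling dictionary. This is a sketch under stated assumptions: build_sampling_dict is a hypothetical helper, the target count is rounded down, and classes already at or above the goal are left unchanged.

```python
def build_sampling_dict(class_counts, sampling_ratio=0.25):
    """Map each class to the number of samples it should have after oversampling."""
    majority_count = max(class_counts.values())
    # Goal size for a minority class: sampling_ratio times the majority size.
    target = int(majority_count * sampling_ratio)
    # Classes already at or above the goal keep their current count.
    return {label: max(count, target) for label, count in class_counts.items()}


sampling_dict = build_sampling_dict({0: 1000, 1: 100}, sampling_ratio=0.25)
```

With 1000 majority samples and a ratio of 0.25, the minority class is raised from 100 to 250 samples, giving the 1:4 ratio described above.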

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	True
name	Oversampler

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits the sampler to the data.
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms the input data by sampling the data.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the component's description and, if return_dict is True, returns it as a dictionary.

Return type

None or dict

fit(self, X, y)[source]¶

Fits the sampler to the data.

Parameters

X (pd.DataFrame) – Input features.
y (pd.Series) – Target.

Returns

self

fit_transform(self, X, y)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)¶

Transforms the input data by sampling the data.

Parameters

X (pd.DataFrame) – Training features.
y (pd.Series) – Target.

Returns

Transformed features and target.

Return type

pd.DataFrame, pd.Series

class evalml.pipelines.components.PCA(variance=0.95, n_components=None, random_seed=0, **kwargs)[source]¶

Reduces the number of features by using Principal Component Analysis (PCA).

Parameters

variance (float) – The percentage of the original data variance that should be preserved when reducing the number of features. Defaults to 0.95.
n_components (int) – The number of features to maintain after computing SVD. Defaults to None, but will override variance variable if set.
random_seed (int) – Seed for the random number generator. Defaults to 0.
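How variance and n_components interact can be sketched with a list of explained-variance ratios. This is an illustrative reading of the parameter descriptions, not the component's code: components_to_keep is a hypothetical helper, and the ratios are made-up numbers assumed to be sorted in descending order.

```python
def components_to_keep(explained_variance_ratios, variance=0.95, n_components=None):
    """Return how many leading components to keep."""
    if n_components is not None:
        # An explicit component count overrides the variance threshold.
        return n_components
    cumulative = 0.0
    for count, ratio in enumerate(explained_variance_ratios, start=1):
        cumulative += ratio
        if cumulative >= variance:
            return count
    return len(explained_variance_ratios)


ratios = [0.7, 0.2, 0.06, 0.04]  # illustrative values only
kept_by_variance = components_to_keep(ratios, variance=0.95)
kept_by_count = components_to_keep(ratios, variance=0.95, n_components=2)
```

The first two components explain about 90% of the variance, so a third is needed to reach 95%; setting n_components=2 bypasses the threshold.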

Attributes

hyperparameter_ranges	{“variance”: Real(0.25, 1)}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	PCA Transformer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the component's description and, if return_dict is True, returns it as a dictionary.

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)[source]¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.PerColumnImputer(impute_strategies=None, default_impute_strategy='most_frequent', random_seed=0, **kwargs)[source]¶

Imputes missing data according to a specified imputation strategy per column.

Parameters

impute_strategies (dict) – Column and {“impute_strategy”: strategy, “fill_value”:value} pairings. Valid values for impute strategy include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types. Defaults to None, which uses “most_frequent” for all columns. When impute_strategy == “constant”, fill_value is used to replace missing data. When None, uses 0 when imputing numerical data and “missing_value” for strings or object data types.
default_impute_strategy (str) – Impute strategy to fall back on when none is provided for a certain column. Valid values include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types. Defaults to “most_frequent”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
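The shape of impute_strategies and the fallback to default_impute_strategy can be sketched in plain Python, with None standing in for missing values. This is a simplified stand-in, not the component's implementation: impute_column and impute_per_column are hypothetical helpers operating on a dict of lists rather than a DataFrame.

```python
import statistics


def impute_column(values, impute_strategy="most_frequent", fill_value=None):
    """Fill None entries in one column according to the given strategy."""
    present = [v for v in values if v is not None]
    if impute_strategy == "mean":
        fill = statistics.mean(present)
    elif impute_strategy == "median":
        fill = statistics.median(present)
    elif impute_strategy == "most_frequent":
        fill = statistics.mode(present)
    elif impute_strategy == "constant":
        if fill_value is not None:
            fill = fill_value
        else:
            # Defaults described in the text: 0 for numbers, "missing_value" otherwise.
            numeric = all(isinstance(v, (int, float)) for v in present)
            fill = 0 if numeric else "missing_value"
    else:
        raise ValueError(f"Unknown impute strategy: {impute_strategy}")
    return [fill if v is None else v for v in values]


def impute_per_column(data, impute_strategies=None, default_impute_strategy="most_frequent"):
    """Apply a per-column strategy, falling back to the default strategy."""
    impute_strategies = impute_strategies or {}
    result = {}
    for column, values in data.items():
        options = impute_strategies.get(column, {})
        strategy = options.get("impute_strategy", default_impute_strategy)
        result[column] = impute_column(values, strategy, options.get("fill_value"))
    return result


data = {
    "age": [20, None, 40],
    "city": ["Oslo", None, "Oslo"],
    "score": [1.0, None, 3.0],
}
strategies = {
    "age": {"impute_strategy": "mean"},
    "score": {"impute_strategy": "constant", "fill_value": -1.0},
}
imputed = impute_per_column(data, strategies)
```

The "city" column has no entry in strategies, so it falls back to "most_frequent".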

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Per Column Imputer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits imputers on input data
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms input data by imputing missing values.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the component's description and, if return_dict is True, returns it as a dictionary.

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits imputers on input data

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features] to fit.
y (pd.Series, optional) – The target training data of length [n_samples]. Ignored.

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms input data by imputing missing values.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features] to transform.
y (pd.Series, optional) – The target training data of length [n_samples]. Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.PolynomialDetrender(degree=1, random_seed=0, **kwargs)[source]¶

Removes trends from time series by fitting a polynomial to the data.

Parameters

degree (int) – Degree for the polynomial. If 1, linear model is fit to the data. If 2, quadratic model is fit, etc. Defaults to 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “degree”: Integer(1, 3)}
model_family	ModelFamily.NONE
modifies_features	False
modifies_target	True
name	Polynomial Detrender

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits the PolynomialDetrender.
`fit_transform`	Removes fitted trend from target variable.
`inverse_transform`	Adds back fitted trend to target variable.
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Removes fitted trend from target variable.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the component's description and, if return_dict is True, returns it as a dictionary.

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits the PolynomialDetrender.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend.

Returns

self

fit_transform(self, X, y=None)[source]¶

Removes fitted trend from target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend.

Returns

The first element is the input features, returned without modification. The second element is the target variable y with the fitted trend removed.

Return type

tuple of pd.DataFrame, pd.Series

inverse_transform(self, y)[source]¶

Adds back fitted trend to target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable.

Returns

The first element is the input features, returned without modification. The second element is the target variable y with the trend added back.

Return type

tuple of pd.DataFrame, pd.Series

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Removes fitted trend from target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend.

Returns

The input features are returned without modification. The target variable y is detrended.

Return type

tuple of pd.DataFrame, pd.Series
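For degree=1, the fit, transform, and inverse_transform steps can be sketched with an ordinary least-squares line. This sketch assumes the trend is fitted against the positional index 0, 1, 2, ...; the function names are hypothetical and the component may fit its polynomial differently.

```python
def fit_linear_trend(y):
    """Least-squares fit of y against its positions 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    covariance = sum((t - t_mean) * (value - y_mean) for t, value in enumerate(y))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return slope, intercept


def detrend(y, slope, intercept):
    """Subtract the fitted line from the target."""
    return [value - (slope * t + intercept) for t, value in enumerate(y)]


def add_trend_back(y_detrended, slope, intercept):
    """Add the fitted line back, inverting detrend."""
    return [value + (slope * t + intercept) for t, value in enumerate(y_detrended)]


y = [3.0, 5.5, 6.5, 9.0, 11.0]
slope, intercept = fit_linear_trend(y)
residuals = detrend(y, slope, intercept)
restored = add_trend_back(residuals, slope, intercept)
```

The residuals of a least-squares line sum to zero, and adding the trend back recovers the original target up to floating-point error.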

class evalml.pipelines.components.ProphetRegressor(date_index=None, changepoint_prior_scale=0.05, seasonality_prior_scale=10, holidays_prior_scale=10, seasonality_mode='additive', random_seed=0, stan_backend='CMDSTANPY', **kwargs)[source]¶

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

More information here: https://facebook.github.io/prophet/

Attributes

hyperparameter_ranges	{ “changepoint_prior_scale”: Real(0.001, 0.5), “seasonality_prior_scale”: Real(0.01, 10), “holidays_prior_scale”: Real(0.01, 10), “seasonality_mode”: [“additive”, “multiplicative”],}
model_family	ModelFamily.PROPHET
modifies_features	True
modifies_target	False
name	Prophet Regressor
predict_uses_y	False
supported_problem_types	[ProblemTypes.TIME_SERIES_REGRESSION]

Methods

`build_prophet_df`
`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns an array of zeros of length 1, as feature_importance is not defined for the Prophet regressor.
`fit`	Fits component to data
`get_params`
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

static build_prophet_df(X, y=None, date_column='ds')[source]¶

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the component's description and, if return_dict is True, returns it as a dictionary.

Return type

None or dict

property feature_importance(self)¶: Returns an array of zeros of length 1, as feature_importance is not defined for the Prophet regressor.

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

get_params(self)[source]¶

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X, y=None)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.RandomForestClassifier(n_estimators=100, max_depth=6, n_jobs=-1, random_seed=0, **kwargs)[source]¶

Random Forest Classifier.

Parameters

n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 10),}
model_family	ModelFamily.RANDOM_FOREST
modifies_features	True
modifies_target	False
name	Random Forest Classifier
predict_uses_y	False
supported_problem_types	[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the component's description and, if return_dict is True, returns it as a dictionary.

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.RandomForestRegressor(n_estimators=100, max_depth=6, n_jobs=-1, random_seed=0, **kwargs)[source]¶

Random Forest Regressor.

Parameters

n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 32),}
model_family	ModelFamily.RANDOM_FOREST
modifies_features	True
modifies_target	False
name	Random Forest Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict
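
The convention stated above can be illustrated with a toy class. This is a minimal sketch, not evalml code: `ToyComponent` is a hypothetical stand-in, and it exposes `default_parameters` as a classmethod, whereas the convention as written compares it without calling it.

```python
import copy


class ToyComponent:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, n_estimators=100, max_depth=6, random_seed=0):
        self.parameters = {"n_estimators": n_estimators, "max_depth": max_depth}
        self.random_seed = random_seed

    @classmethod
    def default_parameters(cls):
        # Build an instance with no arguments and report its parameters.
        return cls().parameters

    def clone(self):
        # New, independent instance with the same parameters and seed.
        return self.__class__(
            **copy.deepcopy(self.parameters), random_seed=self.random_seed
        )


defaults = ToyComponent.default_parameters()
tuned = ToyComponent(n_estimators=250, max_depth=4, random_seed=7)
copy_of_tuned = tuned.clone()
```

Here `defaults` matches the parameters of an instance built with no arguments, and `copy_of_tuned` is a separate object that carries the same parameters and seed as `tuned`.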

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.RFClassifierSelectFromModel(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold=-np.inf, n_jobs=-1, random_seed=0, **kwargs)[source]¶

Selects top features based on importance weights using a Random Forest classifier.

Parameters

number_features (int) – The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None.
n_estimators (int) – The number of trees in the forest. Defaults to 10.
max_depth (int) – Maximum tree depth for base learners. Defaults to None.
percent_features (float) – Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5.
threshold (string or float) – The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept while the others are discarded. If “median”, then the threshold value is the median of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. Defaults to -np.inf.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
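
The threshold rule described above can be sketched in plain Python. This is an illustration of the documented behavior under stated assumptions, not the library's implementation: `resolve_threshold`, `select_features`, the feature names, and the importance values are all hypothetical.

```python
import statistics


def resolve_threshold(importances, threshold):
    """Turn a threshold spec into a number (sketch of the documented rule)."""
    if isinstance(threshold, (int, float)):
        return float(threshold)
    spec = threshold.strip()
    scale = 1.0
    if "*" in spec:
        # A scaling factor such as "1.25*mean" multiplies the reference value.
        factor, spec = spec.split("*", 1)
        scale = float(factor)
        spec = spec.strip()
    if spec == "mean":
        reference = statistics.fmean(importances)
    elif spec == "median":
        reference = statistics.median(importances)
    else:
        raise ValueError(f"Unsupported threshold: {threshold!r}")
    return scale * reference


def select_features(names, importances, threshold):
    """Keep features whose importance is greater than or equal to the threshold."""
    cutoff = resolve_threshold(importances, threshold)
    return [name for name, imp in zip(names, importances) if imp >= cutoff]


# Made-up importances for four made-up features.
names = ["age", "income", "zip_code", "tenure"]
importances = [0.50, 0.28, 0.02, 0.20]

keep_all = select_features(names, importances, float("-inf"))
above_mean = select_features(names, importances, "mean")
above_scaled = select_features(names, importances, "1.25*mean")
above_median = select_features(names, importances, "median")
```

With the default of negative infinity every feature passes the threshold, so in that case only number_features and percent_features limit how many features are kept.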

Attributes

hyperparameter_ranges	{ “percent_features”: Real(0.01, 1), “threshold”: [“mean”, -np.inf],}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	RF Classifier Select From Model

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`get_names`	Get names of selected features.
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms input data by selecting features. If the component_obj does not have a transform method, raises a MethodPropertyNotFoundError exception.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

get_names(self)¶

Get names of selected features.

Returns: List of the names of features selected
Return type: list[str]

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)¶

Transforms input data by selecting features. If the component_obj does not have a transform method, raises a MethodPropertyNotFoundError exception.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data. Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.RFRegressorSelectFromModel(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold=-np.inf, n_jobs=-1, random_seed=0, **kwargs)[source]¶

Selects top features based on importance weights using a Random Forest regressor.

Parameters

number_features (int) – The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None.
n_estimators (int) – The number of trees in the forest. Defaults to 10.
max_depth (int) – Maximum tree depth for base learners. Defaults to None.
percent_features (float) – Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5.
threshold (string or float) – The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept while the others are discarded. If “median”, then the threshold value is the median of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. Defaults to -np.inf.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “percent_features”: Real(0.01, 1), “threshold”: [“mean”, -np.inf],}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	RF Regressor Select From Model

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`get_names`	Get names of selected features.
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms input data by selecting features. If the component_obj does not have a transform method, raises a MethodPropertyNotFoundError exception.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

get_names(self)¶

Get names of selected features.

Returns: List of the names of features selected
Return type: list[str]

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)¶

Transforms input data by selecting features. If the component_obj does not have a transform method, raises a MethodPropertyNotFoundError exception.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data. Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.SelectByType(column_types=None, random_seed=0, **kwargs)[source]¶

Selects columns by specified Woodwork logical type or semantic tag in input data.

Parameters

column_types (string, ww.LogicalType, list(string), list(ww.LogicalType)) – List of Woodwork types or tags, used to determine which columns to select.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Select Columns By Type Transformer
needs_fitting	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits the transformer by checking if column names are present in the dataset.
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)¶

Fits the transformer by checking if column names are present in the dataset.

Parameters

X (pd.DataFrame) – Data to check.
y (pd.Series, optional) – Targets.

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.SelectColumns(columns=None, random_seed=0, **kwargs)[source]¶

Selects specified columns in input data.

Parameters

columns (list(string)) – List of column names, used to determine which columns to select.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Select Columns Transformer
needs_fitting	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits the transformer by checking if column names are present in the dataset.
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X by selecting columns.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)¶

Fits the transformer by checking if column names are present in the dataset.

Parameters

X (pd.DataFrame) – Data to check.
y (pd.Series, optional) – Targets.

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X by selecting columns.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Targets.

Returns

Transformed X.

Return type

pd.DataFrame

class evalml.pipelines.components.SimpleImputer(impute_strategy='most_frequent', fill_value=None, random_seed=0, **kwargs)[source]¶

Imputes missing data according to a specified imputation strategy.

Parameters

impute_strategy (string) – Impute strategy to use. Valid values include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types.
fill_value (string) – When impute_strategy == “constant”, fill_value is used to replace missing data. Defaults to 0 when imputing numerical data and “missing_value” for strings or object data types.
random_seed (int) – Seed for the random number generator. Defaults to 0.
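
The strategies listed above can be sketched for a single column in plain Python. This is a hedged illustration of the documented behavior, not the component's implementation: `is_missing`, `impute_column`, and the sample data are hypothetical, and the real component operates on DataFrames.

```python
import math
import statistics


def is_missing(value):
    # None and NaN are treated as the same kind of missing value.
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_column(values, impute_strategy="most_frequent", fill_value=None):
    """Return a copy of `values` with missing entries filled in."""
    observed = [v for v in values if not is_missing(v)]
    if impute_strategy == "mean":
        replacement = statistics.fmean(observed)
    elif impute_strategy == "median":
        replacement = statistics.median(observed)
    elif impute_strategy == "most_frequent":
        replacement = statistics.mode(observed)
    elif impute_strategy == "constant":
        if fill_value is not None:
            replacement = fill_value
        else:
            # Documented defaults: 0 for numbers, "missing_value" otherwise.
            numeric = all(isinstance(v, (int, float)) for v in observed)
            replacement = 0 if numeric else "missing_value"
    else:
        raise ValueError(f"Unknown impute_strategy: {impute_strategy!r}")
    return [replacement if is_missing(v) else v for v in values]


ages = [30, None, 50, float("nan"), 40]
colors = ["red", None, "blue", "red"]

mean_filled = impute_column(ages, "mean")
median_filled = impute_column(ages, "median")
mode_filled = impute_column(colors, "most_frequent")
constant_numeric = impute_column(ages, "constant")
constant_text = impute_column(colors, "constant")
```

Note that "mean" and "median" only make sense for numerical columns, which is why the parameter description restricts object data to "most_frequent" and "constant".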

Attributes

hyperparameter_ranges	{ “impute_strategy”: [“mean”, “median”, “most_frequent”]}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Simple Imputer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms input by imputing missing values. ‘None’ and np.nan values are treated as the same.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.

Parameters

X (pd.DataFrame or np.ndarray) – the input training data of shape [n_samples, n_features]
y (pd.Series, optional) – the target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)[source]¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms input by imputing missing values. ‘None’ and np.nan values are treated as the same.

Parameters

X (pd.DataFrame) – Data to transform
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.SklearnStackedEnsembleClassifier(input_pipelines=None, final_estimator=None, cv=None, n_jobs=-1, random_seed=0, **kwargs)[source]¶

Scikit-learn Stacked Ensemble Classifier.

Parameters

input_pipelines (list(PipelineBase or subclass obj)) – List of pipeline instances to use as the base estimators. This must not be None or an empty list or else EnsembleMissingPipelinesError will be raised.
final_estimator (Estimator or subclass) – The classifier used to combine the base estimators. If None, uses LogisticRegressionClassifier.
cv (int, cross-validation generator or an iterable) – Determines the cross-validation splitting strategy used to train final_estimator. For int/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. Defaults to None. Possible inputs for cv are:
- None: 3-fold cross validation
- int: the number of folds in a (Stratified) KFold
- A scikit-learn cross-validation generator object
- An iterable yielding (train, test) splits
n_jobs (int or None) – Integer describing the level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Defaults to -1. Note: multi-process errors may be thrown for values of n_jobs != 1. If this happens, use n_jobs = 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
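
The last cv option, an iterable yielding (train, test) splits, can be sketched with a small generator. This is an illustrative assumption rather than library code: `k_fold_splits` is a hypothetical helper that produces contiguous, unshuffled folds and does not stratify by class, unlike the StratifiedKFold used for int/None inputs.

```python
def k_fold_splits(n_samples, n_folds=3):
    """Yield (train_indices, test_indices) pairs for contiguous, unshuffled folds."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


splits = list(k_fold_splits(n_samples=7, n_folds=3))
```

Each row index appears in exactly one test fold, so the final estimator is trained on predictions that the base pipelines made for rows they did not see during fitting.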

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.ENSEMBLE
modifies_features	True
modifies_target	False
name	Sklearn Stacked Ensemble Classifier
predict_uses_y	False
supported_problem_types	[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for stacked ensemble classes.
`describe`	Describe a component and its parameters
`feature_importance`	Not implemented for SklearnStackedEnsembleClassifier and SklearnStackedEnsembleRegressor
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for stacked ensemble classes.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶: Not implemented for SklearnStackedEnsembleClassifier and SklearnStackedEnsembleRegressor

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.SklearnStackedEnsembleRegressor(input_pipelines=None, final_estimator=None, cv=None, n_jobs=-1, random_seed=0, **kwargs)[source]¶

Scikit-learn Stacked Ensemble Regressor.

Parameters

input_pipelines (list(PipelineBase or subclass obj)) – List of pipeline instances to use as the base estimators. This must not be None or an empty list or else EnsembleMissingPipelinesError will be raised.
final_estimator (Estimator or subclass) – The regressor used to combine the base estimators. If None, uses LinearRegressor.
cv (int, cross-validation generator or an iterable) – Determines the cross-validation splitting strategy used to train final_estimator. For int/None inputs, KFold is used. Defaults to None. Possible inputs for cv are:
- None: 3-fold cross validation
- int: the number of folds in a KFold
- A scikit-learn cross-validation generator object
- An iterable yielding (train, test) splits
n_jobs (int or None) – Integer describing the level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Defaults to -1. Note: multi-process errors may be thrown for values of n_jobs != 1. If this happens, use n_jobs = 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
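
The n_jobs rule above reduces to a little arithmetic. The following is a minimal sketch of that rule, not the library's code: `effective_n_jobs` is a hypothetical helper, and both the error for n_jobs == 0 and the floor of one worker are assumptions that the parameter description does not state.

```python
import os


def effective_n_jobs(n_jobs, n_cpus=None):
    """Number of workers implied by n_jobs, following the documented rule."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs is None:
        return 1  # None and 1 are equivalent
    if n_jobs == 0:
        # Assumption: zero workers is treated as invalid.
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs; -2 -> all but one; never fewer than one worker.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


workers_all = effective_n_jobs(-1, n_cpus=8)
workers_all_but_one = effective_n_jobs(-2, n_cpus=8)
```

On an 8-CPU machine, -1 maps to 8 workers and -2 maps to 7, which matches the (n_cpus + 1 + n_jobs) formula.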

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.ENSEMBLE
modifies_features	True
modifies_target	False
name	Sklearn Stacked Ensemble Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for stacked ensemble classes.
`describe`	Describe a component and its parameters
`feature_importance`	Not implemented for SklearnStackedEnsembleClassifier and SklearnStackedEnsembleRegressor
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for stacked ensemble classes.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶: Not implemented for SklearnStackedEnsembleClassifier and SklearnStackedEnsembleRegressor

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.StandardScaler(random_seed=0, **kwargs)[source]¶

A transformer that standardizes input features by removing the mean and scaling to unit variance.

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.
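
As an illustration of what "removing the mean and scaling to unit variance" means, here is a minimal pure-Python sketch for a single numeric column. It is not the component's implementation: the helper name `standardize`, the use of the population standard deviation, and the handling of constant columns are all assumptions made for this example.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Return a copy of `column` with zero mean and unit variance.

    Hypothetical helper: uses the population standard deviation (assumption)
    and maps a constant column to all zeros to avoid dividing by zero.
    """
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


scaled = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
print(scaled)
```

After scaling, the column's mean is (numerically) zero and its population standard deviation is one, which is the property the transformer is described as providing for each input feature.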

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Standard Scaler

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the description and, if return_dict is True, also returns it as a dictionary

Return type

None or dict

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)[source]¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.SVMClassifier(C=1.0, kernel='rbf', gamma='auto', probability=True, random_seed=0, **kwargs)[source]¶

Support Vector Machine Classifier.

Parameters

C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.
kernel ({"poly", "rbf", "sigmoid"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.
gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. If “scale”, uses 1 / (n_features * X.var()) as the value of gamma; if “auto”, uses 1 / n_features. Defaults to “auto”.
probability (boolean) – Whether to enable probability estimates. Defaults to True.
random_seed (int) – Seed for the random number generator. Defaults to 0.
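
The two string options for gamma can be made concrete with a small sketch. This is only an illustration of the formulas quoted in the parameter description; `resolve_gamma` is a hypothetical helper, and it assumes X.var() means the population variance taken over all values of X.

```python
from statistics import pvariance


def resolve_gamma(gamma, X):
    """Turn a gamma setting into a number for a row-major matrix X.

    Hypothetical helper following the formulas in the parameter description:
    "auto" -> 1 / n_features, "scale" -> 1 / (n_features * X.var()).
    """
    n_features = len(X[0])
    if gamma == "auto":
        return 1.0 / n_features
    if gamma == "scale":
        # Assumption: variance over every entry of X, population form.
        flat = [value for row in X for value in row]
        return 1.0 / (n_features * pvariance(flat))
    return float(gamma)  # an explicit float is used unchanged


X = [[0.0, 2.0], [2.0, 4.0]]
print(resolve_gamma("auto", X), resolve_gamma("scale", X))
```

With “auto”, gamma depends only on the number of features; with “scale”, it also shrinks as the spread of the data grows.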

Attributes

hyperparameter_ranges	{ “C”: Real(0, 10), “kernel”: [“poly”, “rbf”, “sigmoid”], “gamma”: [“scale”, “auto”],}
model_family	ModelFamily.SVM
modifies_features	True
modifies_target	False
name	SVM Classifier
predict_uses_y	False
supported_problem_types	[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Feature importance only works with linear kernels.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the description and, if return_dict is True, also returns it as a dictionary

Return type

None or dict

property feature_importance(self)¶: Feature importance only works with linear kernels. If the kernel isn’t linear, we return a numpy array of zeros

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.SVMRegressor(C=1.0, kernel='rbf', gamma='auto', random_seed=0, **kwargs)[source]¶

Support Vector Machine Regressor.

Parameters

C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.
kernel ({"poly", "rbf", "sigmoid"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.
gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. If “scale”, uses 1 / (n_features * X.var()) as the value of gamma; if “auto”, uses 1 / n_features. Defaults to “auto”.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “C”: Real(0, 10), “kernel”: [“poly”, “rbf”, “sigmoid”], “gamma”: [“scale”, “auto”],}
model_family	ModelFamily.SVM
modifies_features	True
modifies_target	False
name	SVM Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Feature importance only works with linear kernels.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the description and, if return_dict is True, also returns it as a dictionary

Return type

None or dict

property feature_importance(self)¶: Feature importance only works with linear kernels. If the kernel isn’t linear, we return a numpy array of zeros

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.TargetEncoder(cols=None, smoothing=1.0, handle_unknown='value', handle_missing='value', random_seed=0, **kwargs)[source]¶

A transformer that encodes categorical features into target encodings.

Parameters

cols (list) – Columns to encode. If None, all string columns will be encoded, otherwise only the columns provided will be encoded. Defaults to None.
smoothing (float) – The smoothing factor to apply. The larger this value is, the more influence the expected target value has on the resulting target encodings. Must be strictly larger than 0. Defaults to 1.0.
handle_unknown (string) – Determines how to handle unknown categories for a feature encountered. Options are ‘value’, ‘error’, and ‘return_nan’. Defaults to ‘value’, which replaces with the target mean.
handle_missing (string) – Determines how to handle missing values encountered during fit or transform. Options are ‘value’, ‘error’, and ‘return_nan’. Defaults to ‘value’, which replaces with the target mean.
random_seed (int) – Seed for the random number generator. Defaults to 0.
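
To make the role of smoothing concrete, the sketch below shows one common formulation of target encoding: each category's mean target is blended with the overall target mean, with a sigmoid weight that depends on the category count. The formula, the helper name `target_encode`, and the `min_samples_leaf` parameter are assumptions for illustration and are not taken from this page.

```python
import math
from collections import defaultdict


def target_encode(categories, targets, smoothing=1.0, min_samples_leaf=1):
    """Return ({category: encoding}, overall_target_mean).

    Hypothetical sketch: blends the per-category mean with the overall mean.
    A larger `smoothing` gives the overall mean more influence.
    """
    prior = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    encoding = {}
    for category, n in counts.items():
        # Weight on the category mean; approaches 1 for frequent categories.
        weight = 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))
        encoding[category] = prior * (1.0 - weight) + (sums[category] / n) * weight
    return encoding, prior


cats = ["a", "a", "a", "b"]
y = [1, 1, 1, 0]
weak, prior = target_encode(cats, y, smoothing=1.0)
strong, _ = target_encode(cats, y, smoothing=10.0)
print(weak["a"], strong["a"], prior)

# With handle_unknown='value', an unseen category falls back to the target mean.
print(weak.get("z", prior))
```

In this sketch, raising smoothing from 1.0 to 10.0 pulls the encoding for "a" closer to the overall target mean, which matches the behaviour described for the smoothing parameter.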

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Target Encoder

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`get_feature_names`	Return feature names for the input features after fitting.
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the description and, if return_dict is True, also returns it as a dictionary

Return type

None or dict

fit(self, X, y)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y)[source]¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

get_feature_names(self)[source]¶

Return feature names for the input features after fitting.

Returns: The feature names after encoding
Return type: np.array

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.TargetImputer(impute_strategy='most_frequent', fill_value=None, random_seed=0, **kwargs)[source]¶

Imputes missing target data according to a specified imputation strategy.

Parameters

impute_strategy (string) – Impute strategy to use. Valid values include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types. Defaults to “most_frequent”.
fill_value (string) – When impute_strategy == “constant”, fill_value is used to replace missing data. Defaults to None which uses 0 when imputing numerical data and “missing_value” for strings or object data types.
random_seed (int) – Seed for the random number generator. Defaults to 0.
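
The strategies listed above can be sketched in plain Python as follows. This is an illustration of the documented behaviour rather than the component's code: `impute_target` is a hypothetical helper that works on a list instead of a pd.Series, and it treats None and NaN as the same missing marker.

```python
import math
from collections import Counter
from statistics import fmean, median


def _is_missing(value):
    """None and float NaN are both treated as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_target(y, impute_strategy="most_frequent", fill_value=None):
    """Return a copy of `y` with missing entries filled (hypothetical helper)."""
    observed = [value for value in y if not _is_missing(value)]
    if impute_strategy == "mean":
        fill = fmean(observed)
    elif impute_strategy == "median":
        fill = median(observed)
    elif impute_strategy == "most_frequent":
        fill = Counter(observed).most_common(1)[0][0]
    elif impute_strategy == "constant":
        if fill_value is not None:
            fill = fill_value
        else:
            # Documented defaults: 0 for numerical data, "missing_value" otherwise.
            numeric = all(isinstance(value, (int, float)) for value in observed)
            fill = 0 if numeric else "missing_value"
    else:
        raise ValueError(f"Unknown impute_strategy: {impute_strategy!r}")
    return [fill if _is_missing(value) else value for value in y]


print(impute_target([1.0, None, 3.0, float("nan")], "mean"))
print(impute_target(["a", None, "a", "b"]))
print(impute_target(["x", None], "constant"))
```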

Attributes

hyperparameter_ranges	{ “impute_strategy”: [“mean”, “median”, “most_frequent”]}
model_family	ModelFamily.NONE
modifies_features	False
modifies_target	True
name	Target Imputer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits imputer to target data. ‘None’ values are converted to np.nan before imputation and are treated as the same.
`fit_transform`	Fits on and transforms the input target data.
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms input target data by imputing missing values. ‘None’ and np.nan values are treated as the same.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the description and, if return_dict is True, also returns it as a dictionary

Return type

None or dict

fit(self, X, y)[source]¶

Fits imputer to target data. ‘None’ values are converted to np.nan before imputation and are treated as the same.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]. Ignored.
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y)[source]¶

Fits on and transforms the input target data.

Parameters

X (pd.DataFrame) – Features. Ignored.
y (pd.Series) – Target data to impute.

Returns

The original X, transformed y

Return type

(pd.DataFrame, pd.Series)

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y)[source]¶

Transforms input target data by imputing missing values. ‘None’ and np.nan values are treated as the same.

Parameters

X (pd.DataFrame) – Features. Ignored.
y (pd.Series) – Target data to impute.

Returns

The original X, transformed y

Return type

(pd.DataFrame, pd.Series)

class evalml.pipelines.components.TextFeaturizer(random_seed=0, **kwargs)[source]¶

Transformer that can automatically featurize text columns using featuretools’ nlp_primitives.

Since models cannot handle non-numeric data, any text must be broken down into features that provide useful information about that text. This component splits each text column into several informative features: Diversity Score, Mean Characters per Word, Polarity Score, and LSA (Latent Semantic Analysis). Calling transform on this component will replace any text columns in the given dataset with these numeric columns.

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.
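
Two of the features named above are simple enough to sketch by hand. The definitions used here (whitespace tokenisation, diversity as the share of unique words) are assumptions for illustration; the nlp_primitives implementations may tokenise and normalise text differently.

```python
def mean_characters_per_word(text):
    """Average word length, splitting on whitespace (assumed tokenisation)."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def diversity_score(text):
    """Share of distinct words among all words (assumed definition)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return len(set(words)) / len(words)


print(mean_characters_per_word("the cat sat"))
print(diversity_score("the cat the dog"))
```

Each function turns a free-text value into a single number, which is the general idea behind replacing a text column with numeric columns.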

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	Text Featurization Component

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X by creating new features using existing text columns

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the description and, if return_dict is True, also returns it as a dictionary

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X by creating new features using existing text columns

Parameters

X (pd.DataFrame) – The data to transform.
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.TimeSeriesBaselineEstimator(gap=1, random_seed=0, **kwargs)[source]¶

Time series estimator that predicts using the naive forecasting approach.

This is useful as a simple baseline estimator for time series problems.

Parameters

gap (int) – Gap between the prediction date and the target date. Must be a non-negative integer. If gap is 0, the target date will be shifted ahead by 1 time period. Defaults to 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
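
The naive forecasting approach predicts each period with a previously observed target value. A minimal sketch on a plain list follows; `naive_forecast` and its `shift` argument are hypothetical names for this example, and how `shift` relates to the component's gap parameter is not specified here.

```python
def naive_forecast(y, shift=1):
    """Predict each period with the value observed `shift` periods earlier.

    Hypothetical helper: positions without enough history are filled with None.
    """
    if shift < 1:
        raise ValueError("shift must be at least 1")
    keep = max(len(y) - shift, 0)  # number of values that still have a slot
    return [None] * (len(y) - keep) + list(y[:keep])


print(naive_forecast([10, 12, 15, 11]))
```

Because the forecast only reuses past target values, the input features play no part in it, which is consistent with the estimator reporting feature importances of zero.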

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.BASELINE
modifies_features	True
modifies_target	False
name	Time Series Baseline Estimator
predict_uses_y	True
supported_problem_types	[ ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the description and, if return_dict is True, also returns it as a dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Since baseline estimators do not use input features to calculate predictions, returns an array of zeroes.

Returns: an array of zeroes
Return type: np.ndarray (float)

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X, y=None)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X, y=None)[source]¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.Transformer(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]¶

A component that transforms data and may or may not need fitting. These components are used before an estimator.

To implement a new Transformer, define your own class which is a subclass of Transformer, including a name and a list of acceptable ranges for any parameters to be tuned during the automl search (hyperparameters). Define an __init__ method which sets up any necessary state and objects. Make sure your __init__ only uses standard keyword arguments and calls super().__init__() with a parameters dict. You may also override the fit, transform, fit_transform and other methods in this class if appropriate.

To see some examples, check out the definitions of any Transformer component.

Parameters

parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
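
The subclassing steps described above can be sketched as follows. To keep the example runnable without evalml installed, it defines a tiny stand-in base class that is NOT evalml's Transformer and only mimics the documented constructor signature; the `ClipValues` component, its `upper` parameter, and the list-of-rows data format are likewise hypothetical.

```python
class Transformer:
    """Hypothetical stand-in for the real base class (illustration only)."""

    name = None
    hyperparameter_ranges = {}

    def __init__(self, parameters=None, component_obj=None, random_seed=0, **kwargs):
        self.parameters = parameters or {}
        self._component_obj = component_obj
        self.random_seed = random_seed

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        raise NotImplementedError

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X, y)


class ClipValues(Transformer):
    """Example component: caps every value at `upper`."""

    # A name and the ranges to explore for tunable parameters.
    name = "Clip Values"
    hyperparameter_ranges = {"upper": [1.0, 10.0, 100.0]}

    def __init__(self, upper=10.0, random_seed=0, **kwargs):
        # Only standard keyword arguments; collect them into a parameters dict.
        parameters = {"upper": upper}
        parameters.update(kwargs)
        super().__init__(parameters=parameters, component_obj=None, random_seed=random_seed)

    def transform(self, X, y=None):
        upper = self.parameters["upper"]
        return [[min(value, upper) for value in row] for row in X]


print(ClipValues(upper=5.0).fit_transform([[1.0, 9.0], [7.0, 2.0]]))
```

With evalml installed, the base class would instead be imported from evalml.pipelines.components, and the real component would operate on pd.DataFrame inputs as the method signatures on this page indicate.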

Attributes

model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`name`	Returns string name of this component
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Prints the description and, if return_dict is True, also returns it as a dictionary

Return type

None or dict

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)[source]¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

property name(cls)¶: Returns string name of this component

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)[source]¶

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.Undersampler(sampling_ratio=0.25, sampling_ratio_dict=None, min_samples=100, min_percentage=0.1, random_seed=0, **kwargs)[source]¶

Initializes an undersampling transformer to downsample the majority classes in the dataset.

This component is only run during training and not during predict.

Parameters

sampling_ratio (float) – The smallest minority:majority ratio that is accepted as ‘balanced’. For instance, a 1:4 ratio would be represented as 0.25, while a 1:1 ratio is 1.0. Must be between 0 and 1, inclusive. Defaults to 0.25.
sampling_ratio_dict (dict) – A dictionary specifying the desired balanced ratio for each target value. For instance, in a binary case where class 1 is the minority, we could specify sampling_ratio_dict={0: 0.5, 1: 1}, which undersamples class 0 until it has twice as many samples as class 1 (minority:majority ratio = 0.5) and leaves class 1 unsampled. Overrides sampling_ratio if provided. Defaults to None.
min_samples (int) – The minimum number of samples that any class must have, before or after sampling. A class that must be downsampled is never reduced below this value. The data is treated as severely imbalanced when the minority class has fewer samples than this and its class ratio is below min_percentage. Must be greater than 0. Defaults to 100.
min_percentage (float) – The minimum fraction of the total dataset that the smallest class must make up; a smaller fraction is tolerated as long as that class has at least min_samples samples. If neither min_percentage nor min_samples is met, the data is treated as severely imbalanced and is not resampled. Must be between 0 and 0.5, inclusive. Defaults to 0.1.
random_seed (int) – The seed to use for random sampling. Defaults to 0.
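
The way sampling_ratio, min_samples and min_percentage interact can be sketched in plain Python. This is only one reading of the parameter descriptions above, not EvalML's implementation: the helper name `target_class_counts`, its dictionary input and the rounding are all assumptions made for illustration.

```python
def target_class_counts(class_counts, sampling_ratio=0.25,
                        min_samples=100, min_percentage=0.1):
    """Return how many samples of each class to keep (hypothetical helper)."""
    total = sum(class_counts.values())
    minority = min(class_counts.values())

    # Severe imbalance: the minority class is too small both in absolute
    # terms and as a share of the dataset, so the data is left untouched.
    if minority < min_samples and minority / total < min_percentage:
        return dict(class_counts)

    kept = {}
    for label, count in class_counts.items():
        if minority / count >= sampling_ratio:
            # Already balanced enough relative to the minority class.
            kept[label] = count
        else:
            # Downsample toward the accepted ratio, never below min_samples.
            goal = max(int(minority / sampling_ratio), min_samples)
            kept[label] = min(goal, count)
    return kept


print(target_class_counts({0: 1000, 1: 100}))   # {0: 400, 1: 100}
print(target_class_counts({0: 300, 1: 100}))    # {0: 300, 1: 100}
print(target_class_counts({0: 10000, 1: 50}))   # {0: 10000, 1: 50}
```

In the first call the 1:10 ratio is below 0.25, so class 0 is cut to 400 samples (a 1:4 ratio). The second call is already balanced, and the third is treated as severely imbalanced and returned unchanged.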

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	True
name	Undersampler

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits the sampler to the data.
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms the input data by sampling the data.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict
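
The convention can be illustrated with a toy class. `ToyComponent` below is hypothetical and is not part of EvalML; it also exposes `default_parameters` as a classmethod that must be called, which is an assumption of this sketch rather than a statement about how the library defines it.

```python
import inspect


class ToyComponent:
    """Hypothetical component, used only to illustrate the convention."""

    def __init__(self, sampling_ratio=0.25, min_samples=100):
        # Store exactly the values the component was initialized with.
        self.parameters = {
            "sampling_ratio": sampling_ratio,
            "min_samples": min_samples,
        }

    @classmethod
    def default_parameters(cls):
        # Read the defaults straight from the __init__ signature.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if param.default is not inspect.Parameter.empty
        }


# A component built with no arguments reports the default parameters.
assert ToyComponent.default_parameters() == ToyComponent().parameters
```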

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {"name": name, "parameters": parameters}

Returns

The component description as a dictionary if return_dict is True, otherwise None. The description is also printed.

Return type

None or dict

fit(self, X, y)¶

Fits the sampler to the data.

Parameters

X (pd.DataFrame) – Input features.
y (pd.Series) – Target.

Returns

self

fit_transform(self, X, y)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance. Subclasses can override this to return False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None
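
A save and load round trip at a chosen pickle protocol might look like the sketch below. It uses the standard-library `pickle` module as a stand-in for cloudpickle and a plain dictionary as a stand-in for a component; the helper names `save_object` and `load_object` are hypothetical.

```python
import os
import pickle
import tempfile


def save_object(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Write obj to file_path using the given pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load_object(file_path):
    """Read back whatever was stored at file_path."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# A dictionary standing in for a component's state.
state = {"name": "Undersampler", "parameters": {"sampling_ratio": 0.25}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "component.pkl")
    save_object(state, path, pickle_protocol=4)
    restored = load_object(path)

assert restored == state
```

The protocol only affects how the bytes are written. Loading detects it from the file, which is why the load side takes no protocol argument.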

transform(self, X, y=None)[source]¶

Transforms the input data by sampling the data.

Parameters

X (pd.DataFrame) – Training features.
y (pd.Series) – Target.

Returns

Transformed features and target.

Return type

pd.DataFrame, pd.Series
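
Because sampling drops rows, features and target have to be reduced together so that they stay aligned. The following is a minimal sketch of that idea using plain lists and the standard library; the function `undersample` and its `keep_counts` argument are hypothetical and do not mirror the component's internals.

```python
import random


def undersample(X_rows, y, keep_counts, random_seed=0):
    """Return (X_rows, y) with each class cut down to keep_counts[label]."""
    rng = random.Random(random_seed)

    # Group row indices by their target label.
    indices_by_label = {}
    for index, label in enumerate(y):
        indices_by_label.setdefault(label, []).append(index)

    kept = []
    for label, indices in indices_by_label.items():
        # Labels missing from keep_counts are left at their full size.
        n_keep = min(keep_counts.get(label, len(indices)), len(indices))
        kept.extend(rng.sample(indices, n_keep))
    kept.sort()  # preserve the original row order

    # Apply the same row selection to features and target.
    return [X_rows[i] for i in kept], [y[i] for i in kept]


y = [0] * 8 + [1] * 2
X_rows = [[i] for i in range(10)]
X_small, y_small = undersample(X_rows, y, keep_counts={0: 4})
print(len(X_small), y_small.count(0), y_small.count(1))  # 6 4 2
```

Fixing random_seed makes the selection reproducible, which is the role the random_seed parameter plays in the class signature.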

class evalml.pipelines.components.URLFeaturizer(random_seed=0, **kwargs)[source]¶

Transformer that can automatically extract features from URLs.

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.
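
This reference does not list which features are extracted. As a rough illustration of what features derived from a URL can look like, the sketch below splits a URL with the standard library; the function `url_features` and the feature names it returns are assumptions, not the component's output columns.

```python
from urllib.parse import urlsplit


def url_features(url):
    """Split a URL into a few simple features (illustrative only)."""
    parts = urlsplit(url)
    hostname = parts.hostname or ""
    return {
        "protocol": parts.scheme,
        "domain": hostname,
        # Naive top-level domain: the last dot-separated label of the host.
        "tld": hostname.rsplit(".", 1)[-1] if "." in hostname else "",
    }


print(url_features("https://www.example.com/docs?page=2"))
# {'protocol': 'https', 'domain': 'www.example.com', 'tld': 'com'}
```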

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.NONE
modifies_features	True
modifies_target	False
name	URL Featurizer

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`fit`	Fits component to data
`fit_transform`	Fits on X and transforms X
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`save`	Saves component at file path
`transform`	Transforms data X.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {"name": name, "parameters": parameters}

Returns

The component description as a dictionary if return_dict is True, otherwise None. The description is also printed.

Return type

None or dict

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance. Subclasses can override this to return False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

transform(self, X, y=None)¶

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.XGBoostClassifier(eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, random_seed=0, n_jobs=-1, **kwargs)[source]¶

XGBoost Classifier.

Parameters

eta (float) – Boosting learning rate. Defaults to 0.1.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0.
n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.
random_seed (int) – Seed for the random number generator. Defaults to 0.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.

Attributes

hyperparameter_ranges	{ "eta": Real(0.000001, 1), "max_depth": Integer(1, 10), "min_child_weight": Real(1, 10), "n_estimators": Integer(1, 1000) }
model_family	ModelFamily.XGBOOST
modifies_features	True
modifies_target	False
name	XGBoost Classifier
predict_uses_y	False
SEED_MAX	None
SEED_MIN	None
supported_problem_types	[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS ]
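
The hyperparameter_ranges attribute describes the search space a tuner can draw from. The sketch below draws one candidate configuration from those bounds using only the standard library. The `(kind, low, high)` tuples are stand-ins for the Real and Integer ranges listed above, and uniform sampling is an assumption; an actual tuner may sample differently.

```python
import random

# Stand-ins for the Real(...) and Integer(...) ranges listed above,
# written as (kind, low, high) tuples.
XGB_CLASSIFIER_RANGES = {
    "eta": ("real", 0.000001, 1),
    "max_depth": ("integer", 1, 10),
    "min_child_weight": ("real", 1, 10),
    "n_estimators": ("integer", 1, 1000),
}


def sample_parameters(ranges, random_seed=0):
    """Draw one value per hyperparameter, uniformly within its bounds."""
    rng = random.Random(random_seed)
    sampled = {}
    for name, (kind, low, high) in ranges.items():
        if kind == "integer":
            sampled[name] = rng.randint(low, high)  # both bounds inclusive
        else:
            sampled[name] = rng.uniform(low, high)
    return sampled


candidate = sample_parameters(XGB_CLASSIFIER_RANGES, random_seed=0)
print(sorted(candidate))
# ['eta', 'max_depth', 'min_child_weight', 'n_estimators']
```

The resulting dictionary has the same keys as the constructor's tunable arguments, so a candidate like this could be passed as keyword arguments when constructing the component.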

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {"name": name, "parameters": parameters}

Returns

The component description as a dictionary if return_dict is True, otherwise None. The description is also printed.

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance. Subclasses can override this to return False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)[source]¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series
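
One common way to relate probability estimates to predicted labels is to pick, for each row, the class with the highest estimated probability. The sketch below shows that rule on plain lists; it is an assumption about how the two methods typically relate, not a description of this component's predict method, and the helper name is hypothetical.

```python
def labels_from_probabilities(probabilities, class_labels):
    """Pick, for each row, the label with the highest probability."""
    predictions = []
    for row in probabilities:
        # Index of the largest probability; ties go to the first class.
        best_index = max(range(len(row)), key=row.__getitem__)
        predictions.append(class_labels[best_index])
    return predictions


probabilities = [[0.2, 0.8], [0.9, 0.1], [0.5, 0.5]]
print(labels_from_probabilities(probabilities, ["no", "yes"]))
# ['yes', 'no', 'no']
```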

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.XGBoostRegressor(eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, random_seed=0, n_jobs=-1, **kwargs)[source]¶

XGBoost Regressor.

Parameters

eta (float) – Boosting learning rate. Defaults to 0.1.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0.
n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.
random_seed (int) – Seed for the random number generator. Defaults to 0.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
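
To show what eta and n_estimators control, here is a small from-scratch sketch of gradient boosting for squared error, using one-split trees (depth 1) on a single feature. It is deliberately simplified and is not XGBoost's algorithm, which adds second-order gradient information and regularization. All function names are hypothetical, and the data must contain at least two distinct feature values.

```python
def fit_stump(xs, residuals):
    """Find the one-split tree (depth 1) that best fits the residuals."""
    best = None
    # Every distinct value except the largest is a candidate threshold,
    # so both sides of the split are always non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_output(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, eta=0.1, n_estimators=100):
    """Fit n_estimators stumps, each scaled by the learning rate eta."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):  # one boosting round per tree
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + eta * stump_output(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, stumps


def predict_boosted(model, x, eta=0.1):
    base, stumps = model
    return base + eta * sum(stump_output(s, x) for s in stumps)


def mean_squared_error(model, xs, ys, eta=0.1):
    return sum((predict_boosted(model, x, eta) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)


xs = list(range(10))
ys = [x * x for x in xs]
few_rounds = fit_boosted(xs, ys, eta=0.1, n_estimators=5)
many_rounds = fit_boosted(xs, ys, eta=0.1, n_estimators=100)
# More boosting rounds fit the training data more closely.
assert mean_squared_error(many_rounds, xs, ys) < mean_squared_error(few_rounds, xs, ys)
```

Each round adds only a fraction eta of the new tree's output, so a smaller eta generally needs a larger n_estimators to reach the same training error.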

Attributes

hyperparameter_ranges	{ "eta": Real(0.000001, 1), "max_depth": Integer(1, 20), "min_child_weight": Real(1, 10), "n_estimators": Integer(1, 1000) }
model_family	ModelFamily.XGBOOST
modifies_features	True
modifies_target	False
name	XGBoost Regressor
predict_uses_y	False
SEED_MAX	None
SEED_MIN	None
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION ]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {"name": name, "parameters": parameters}

Returns

The component description as a dictionary if return_dict is True, otherwise None. The description is also printed.

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importance. Subclasses can override this to return False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

Pipelines ensemble