transformers#

Components that transform data.

Package Contents#

Classes Summary#

`DateTimeFeaturizer`	Transformer that can automatically extract features from datetime columns.
`DFSTransformer`	Featuretools DFS component that generates features for the input features.
`DropColumns`	Drops specified columns in input data.
`DropNaNRowsTransformer`	Transformer to drop rows with NaN values.
`DropNullColumns`	Transformer to drop features whose percentage of NaN values exceeds a specified threshold.
`DropRowsTransformer`	Transformer to drop rows specified by row indices.
`EmailFeaturizer`	Transformer that can automatically extract features from emails.
`FeatureSelector`	Selects top features based on importance weights.
`Imputer`	Imputes missing data according to a specified imputation strategy.
`LabelEncoder`	A transformer that encodes target labels using values between 0 and num_classes - 1.
`LinearDiscriminantAnalysis`	Reduces the number of features by using Linear Discriminant Analysis.
`LogTransformer`	Applies a log transformation to the target data.
`LSA`	Transformer to calculate the Latent Semantic Analysis Values of text input.
`NaturalLanguageFeaturizer`	Transformer that can automatically featurize text columns using featuretools' nlp_primitives.
`OneHotEncoder`	A transformer that encodes categorical features in a one-hot numeric array.
`Oversampler`	SMOTE Oversampler component. Will automatically select whether to use SMOTE, SMOTEN, or SMOTENC based on inputs to the component.
`PCA`	Reduces the number of features by using Principal Component Analysis (PCA).
`PerColumnImputer`	Imputes missing data according to a specified imputation strategy per column.
`PolynomialDetrender`	Removes trends from time series by fitting a polynomial to the data.
`ReplaceNullableTypes`	Transformer to replace features that use the new nullable dtypes with a dtype that is compatible with EvalML.
`RFClassifierSelectFromModel`	Selects top features based on importance weights using a Random Forest classifier.
`RFRegressorSelectFromModel`	Selects top features based on importance weights using a Random Forest regressor.
`SelectByType`	Selects columns by specified Woodwork logical type or semantic tag in input data.
`SelectColumns`	Selects specified columns in input data.
`SimpleImputer`	Imputes missing data according to a specified imputation strategy. Natural language columns are ignored.
`StandardScaler`	A transformer that standardizes input features by removing the mean and scaling to unit variance.
`TargetEncoder`	A transformer that encodes categorical features into target encodings.
`TargetImputer`	Imputes missing target data according to a specified imputation strategy.
`TimeSeriesFeaturizer`	Transformer that delays input features and target variable for time series problems.
`TimeSeriesImputer`	Imputes missing data according to a specified timeseries-specific imputation strategy.
`TimeSeriesRegularizer`	Transformer that regularizes an inconsistently spaced datetime column.
`Transformer`	A component that may or may not need fitting that transforms data. These components are used before an estimator.
`Undersampler`	Initializes an undersampling transformer to downsample the majority classes in the dataset.
`URLFeaturizer`	Transformer that can automatically extract features from URLs.

Contents#

class evalml.pipelines.components.transformers.DateTimeFeaturizer(features_to_extract=None, encode_as_categories=False, time_index=None, random_seed=0, **kwargs)[source]#

Transformer that can automatically extract features from datetime columns.

Parameters

features_to_extract (list) – List of features to extract. Valid options include “year”, “month”, “day_of_week”, “hour”. Defaults to None.
encode_as_categories (bool) – Whether day-of-week and month features should be encoded as pandas “category” dtype. This allows OneHotEncoders to encode these features. Defaults to False.
time_index (str) – Name of the column containing the datetime information used to order the data. Ignored.
random_seed (int) – Seed for the random number generator. Defaults to 0.
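As a rough illustration of what the `features_to_extract` options refer to, the sketch below derives the four listed features from plain `datetime` objects using only the standard library. It is not EvalML's implementation: the helper name `extract_datetime_features` is hypothetical, treating `None` as "extract every option" is an assumption, and the day-of-week numbering shown is the standard library's (Monday is 0), which may differ from the component's.

```python
from datetime import datetime


def extract_datetime_features(values, features_to_extract=None):
    """Return one list per requested feature for a list of datetimes."""
    extractors = {
        "year": lambda d: d.year,
        "month": lambda d: d.month,
        # stdlib convention: Monday == 0 (assumption, not EvalML's numbering)
        "day_of_week": lambda d: d.weekday(),
        "hour": lambda d: d.hour,
    }
    if features_to_extract is None:
        # Assumption: None means "use every available option".
        features_to_extract = list(extractors)
    return {
        name: [extractors[name](d) for d in values]
        for name in features_to_extract
    }


timestamps = [datetime(2021, 3, 15, 9, 30), datetime(2022, 12, 25, 18, 0)]
features = extract_datetime_features(timestamps, ["year", "hour"])
all_features = extract_datetime_features(timestamps)
```

Here `features` holds only the `year` and `hour` lists, while `all_features` holds all four.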

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	DateTime Featurizer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fit the datetime featurizer component.
`fit_transform`	Fits on X and transforms X.
`get_feature_names`	Gets the categories of each datetime feature.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X by creating new features using existing DateTime columns, and then dropping those DateTime columns.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.
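The convention can be pictured with a toy class. This is a minimal sketch under stated assumptions: `ToyComponent` and its `threshold` argument are hypothetical, and `default_parameters` is written as a classmethod for simplicity, which may not match how the library exposes it.

```python
class ToyComponent:
    """Hypothetical component, not an EvalML class."""

    def __init__(self, threshold=0.5):
        self._parameters = {"threshold": threshold}

    @property
    def parameters(self):
        # Parameters that were used to initialize this instance.
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Defaults are whatever a no-argument instance reports.
        return cls().parameters


defaults = ToyComponent.default_parameters()
custom = ToyComponent(threshold=0.9).parameters
```

Under this convention the defaults never need to be written down twice: they are read off a freshly constructed instance.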

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fit the datetime featurizer component.

Parameters

X (pd.DataFrame) – Input features.
y (pd.Series, optional) – Target data. Ignored.

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

get_feature_names(self)[source]#

Gets the categories of each datetime feature.

Returns

Dictionary in which each key is a column name and each value is a dictionary mapping the unique feature values to their integer encodings.

Return type

dict
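To make the shape of that return value concrete, here is a small sketch that builds a nested dictionary of the same form. The column name `date_day_of_week`, the helper `encode_categories`, and the rule of numbering values in order of first appearance are all assumptions made for illustration.

```python
def encode_categories(values):
    """Map each unique value to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


# Hypothetical column name -> {feature value: integer encoding}
feature_names = {
    "date_day_of_week": encode_categories(["Monday", "Tuesday", "Monday"]),
}
```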

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.
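The override described above might look like the following sketch. The class names are hypothetical, `needs_fitting` is modeled as a plain class attribute, and `RuntimeError` is a stand-in because the source does not name the exception raised for an unfitted component.

```python
class ToyTransformer:
    """Hypothetical base class, not EvalML's ComponentBase."""

    needs_fitting = True  # default: fit must run before transform

    def __init__(self):
        self._is_fitted = False

    def fit(self, X, y=None):
        self._is_fitted = True
        return self

    def transform(self, X, y=None):
        if self.needs_fitting and not self._is_fitted:
            # Stand-in error type (assumption).
            raise RuntimeError("Component must be fit before calling transform.")
        return X


class ToyStatelessTransformer(ToyTransformer):
    # fit learns nothing here, so the fitted check is skipped
    needs_fitting = False
```

With this layout, `ToyStatelessTransformer().transform(X)` works immediately, while `ToyTransformer().transform(X)` raises until `fit` has been called.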

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
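A save/load round trip can be sketched with the standard library's `pickle` standing in for `cloudpickle`. The class `ToySavable` is hypothetical, and for simplicity it persists only its parameter dictionary rather than the whole object, which is an assumption and not necessarily what the library does.

```python
import os
import pickle
import tempfile


class ToySavable:
    """Hypothetical component that persists only its parameters."""

    def __init__(self, **parameters):
        self.parameters = parameters

    def save(self, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
        # pickle_protocol selects the pickle data stream format.
        with open(file_path, "wb") as f:
            pickle.dump(self.parameters, f, protocol=pickle_protocol)

    @staticmethod
    def load(file_path):
        with open(file_path, "rb") as f:
            return ToySavable(**pickle.load(f))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "component.pkl")
    ToySavable(columns=["a", "b"]).save(path)
    restored = ToySavable.load(path)
```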

transform(self, X, y=None)[source]#

Transforms data X by creating new features using existing DateTime columns, and then dropping those DateTime columns.

Parameters

X (pd.DataFrame) – Input features.
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.DFSTransformer(index='index', features=None, random_seed=0, **kwargs)[source]#

Featuretools DFS component that generates features for the input features.

Parameters

index (string) – The name of the column that contains the indices. If no column with this name exists, then featuretools.EntitySet() creates a column with this name to serve as the index column. Defaults to ‘index’.
random_seed (int) – Seed for the random number generator. Defaults to 0.
features (list) – List of features to run DFS on. Defaults to None. Features will only be computed if the columns used by the feature exist in the input and if the feature itself is not in input.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	DFS Transformer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the DFSTransformer component.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Computes the feature matrix for the input X using featuretools' dfs algorithm.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the DFSTransformer component.

Parameters

X (pd.DataFrame, np.array) – The input data to transform, of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Computes the feature matrix for the input X using featuretools’ dfs algorithm.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data to transform. Has shape [n_samples, n_features]
y (pd.Series, optional) – Ignored.

Returns

Feature matrix

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.DropColumns(columns=None, random_seed=0, **kwargs)[source]#

Drops specified columns in input data.

Parameters

columns (list(string)) – List of column names, used to determine which columns to drop.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Drop Columns Transformer
needs_fitting	False
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the transformer by checking if column names are present in the dataset.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X by dropping columns.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)#

Fits the transformer by checking if column names are present in the dataset.

Parameters

X (pd.DataFrame) – Data to check.
y (pd.Series, optional) – Targets. Ignored.

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data X by dropping columns.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Targets.

Returns

Transformed X.

Return type

pd.DataFrame
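The check-then-drop behavior described for `fit` and `transform` can be sketched on a plain dictionary of columns. This is an illustration only: the helper `drop_columns` is hypothetical, and raising `ValueError` for a missing column is an assumption, since the source says only that `fit` checks whether the column names are present.

```python
def drop_columns(table, columns):
    """table maps column name -> list of values; returns a copy without `columns`."""
    columns = list(columns or [])
    missing = [name for name in columns if name not in table]
    if missing:
        # Assumed error type for columns that are absent from the input.
        raise ValueError(f"Columns {missing} not in input data")
    return {name: values for name, values in table.items() if name not in columns}


table = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
trimmed = drop_columns(table, ["b"])
```

The input `table` is left unchanged; `trimmed` contains only columns `a` and `c`.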

class evalml.pipelines.components.transformers.DropNaNRowsTransformer(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]#

Transformer to drop rows with NaN values.

Parameters: random_seed (int) – Seed for the random number generator. Is not used by this component. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	True
name	Drop NaN Rows Transformer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data using fitted component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data using fitted component.

Parameters

X (pd.DataFrame) – Features.
y (pd.Series, optional) – Target data.

Returns

Data with NaN rows dropped.

Return type

(pd.DataFrame, pd.Series)
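Because this transformer modifies both features and target, a row removed from X must also be removed from y. The sketch below shows that pairing on plain lists. The helper names are hypothetical, and treating a NaN in y as a reason to drop the row is an assumption.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_nan_rows(X, y=None):
    """X is a list of rows; y is an optional list of targets of equal length."""
    keep = [
        i for i, row in enumerate(X)
        if not any(is_missing(v) for v in row)
        and (y is None or not is_missing(y[i]))  # assumption: NaN targets also drop the row
    ]
    X_t = [X[i] for i in keep]
    y_t = None if y is None else [y[i] for i in keep]
    return X_t, y_t


X = [[1.0, 2.0], [float("nan"), 3.0], [4.0, 5.0]]
y = [0, 1, float("nan")]
X_t, y_t = drop_nan_rows(X, y)
```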

class evalml.pipelines.components.transformers.DropNullColumns(pct_null_threshold=1.0, random_seed=0, **kwargs)[source]#

Transformer to drop features whose percentage of NaN values exceeds a specified threshold.

Parameters

pct_null_threshold (float) – Threshold on the fraction of NaN values in an input feature, used to decide whether the feature is dropped. Must be a value in the range [0, 1], inclusive. If equal to 0.0, columns with any null values are dropped. If equal to 1.0, only columns with all null values are dropped. Defaults to 1.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
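The two endpoint cases of `pct_null_threshold` can be worked through on a small table. This is a sketch consistent with the documented endpoints, not the library's code: the helper `null_columns_to_drop` is hypothetical, `None` stands in for NaN, and using "greater than or equal to" for thresholds strictly between 0 and 1 is an assumption.

```python
def null_columns_to_drop(table, pct_null_threshold=1.0):
    """table maps column name -> list of values, with None marking a null."""
    if not 0 <= pct_null_threshold <= 1:
        raise ValueError("pct_null_threshold must be between 0 and 1, inclusive.")
    to_drop = []
    for name, values in table.items():
        pct_null = sum(v is None for v in values) / len(values) if values else 0.0
        if pct_null_threshold == 0.0:
            drop = pct_null > 0  # any null at all
        else:
            drop = pct_null >= pct_null_threshold  # assumed comparison
        if drop:
            to_drop.append(name)
    return to_drop


table = {"a": [1, None, 3, 4], "b": [None] * 4, "c": [1, 2, 3, 4]}
only_fully_null = null_columns_to_drop(table, 1.0)
any_null = null_columns_to_drop(table, 0.0)
```

Column `a` is 25% null, `b` is 100% null, and `c` has no nulls, so the threshold 1.0 selects only `b`, while 0.0 selects both `a` and `b`.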

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Drop Null Columns Transformer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X by dropping columns that exceed the threshold of null values.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data X by dropping columns that exceed the threshold of null values.

Parameters

X (pd.DataFrame) – Data to transform
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.DropRowsTransformer(indices_to_drop=None, random_seed=0)[source]#

Transformer to drop rows specified by row indices.

Parameters

indices_to_drop (list) – List of indices to drop in the input data. Defaults to None.
random_seed (int) – Seed for the random number generator. Is not used by this component. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	True
name	Drop Rows Transformer
training_only	True

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data using fitted component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If indices to drop do not exist in input features or target.

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data using fitted component.

Parameters

X (pd.DataFrame) – Features.
y (pd.Series, optional) – Target data.

Returns

Data with row indices dropped.

Return type

(pd.DataFrame, pd.Series)
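Dropping by row index, including the `ValueError` that `fit` documents for indices that do not exist, can be sketched as follows. The helper `drop_rows` is hypothetical, and X and y are modeled as dictionaries keyed by row index rather than pandas objects.

```python
def drop_rows(X, y=None, indices_to_drop=None):
    """X maps row index -> feature row; y maps row index -> target value."""
    indices = list(indices_to_drop or [])
    missing = [
        i for i in indices
        if i not in X or (y is not None and i not in y)
    ]
    if missing:
        raise ValueError(
            f"Indices {missing} do not exist in input features or target."
        )
    drop = set(indices)
    X_t = {i: row for i, row in X.items() if i not in drop}
    y_t = None if y is None else {i: t for i, t in y.items() if i not in drop}
    return X_t, y_t


X = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
y = {0: "a", 1: "b", 2: "c"}
X_t, y_t = drop_rows(X, y, indices_to_drop=[1])
```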

class evalml.pipelines.components.transformers.EmailFeaturizer(random_seed=0, **kwargs)[source]#

Transformer that can automatically extract features from emails.

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Email Featurizer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)#

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)#

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

class evalml.pipelines.components.transformers.FeatureSelector(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]#

Selects top features based on importance weights.

Parameters

parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

modifies_features	True
modifies_target	False
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fit and transform data using the feature selector.
`get_names`	Get names of selected features.
`load`	Loads component at file path.
`name`	Returns string name of this component.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)#

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)[source]#

Fit and transform data using the feature selector.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

get_names(self)[source]#

Get names of selected features.

Returns: List of the names of features selected.
Return type: list[str]

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

property name(cls)#: Returns string name of this component.

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
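
A save and load round trip might look roughly like the sketch below. It uses the standard-library `pickle` module as a stand-in for cloudpickle, and a plain dictionary as a stand-in for a component. The helper names `save_component` and `load_component` are hypothetical.

```python
import os
import pickle
import tempfile


def save_component(component, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    # Serialize the object to disk with the requested pickle protocol.
    with open(file_path, "wb") as f:
        pickle.dump(component, f, protocol=pickle_protocol)


def load_component(file_path):
    # Read the serialized object back from disk.
    with open(file_path, "rb") as f:
        return pickle.load(f)


# A dictionary stands in for a fitted component in this sketch.
state = {"name": "Imputer", "parameters": {"numeric_impute_strategy": "mean"}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "component.pkl")
    save_component(state, path)
    restored = load_component(path)
```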

transform(self, X, y=None)[source]#

Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data. Ignored.

Returns

Transformed X

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If feature selector does not have a transform method or a component_obj that implements transform.

class evalml.pipelines.components.transformers.Imputer(categorical_impute_strategy='most_frequent', categorical_fill_value=None, numeric_impute_strategy='mean', numeric_fill_value=None, random_seed=0, **kwargs)[source]#

Imputes missing data according to a specified imputation strategy.

Parameters

categorical_impute_strategy (string) – Impute strategy to use for string, object, boolean, categorical dtypes. Valid values include “most_frequent” and “constant”.
numeric_impute_strategy (string) – Impute strategy to use for numeric columns. Valid values include “mean”, “median”, “most_frequent”, and “constant”.
categorical_fill_value (string) – When categorical_impute_strategy == “constant”, categorical_fill_value is used to replace missing data. The default value of None will fill with the string “missing_value”.
numeric_fill_value (int, float) – When numeric_impute_strategy == “constant”, numeric_fill_value is used to replace missing data. The default value of None will fill with 0.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “categorical_impute_strategy”: [“most_frequent”], “numeric_impute_strategy”: [“mean”, “median”, “most_frequent”],}
modifies_features	True
modifies_target	False
name	Imputer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits imputer to data. 'None' values are converted to np.nan before imputation and are treated as the same.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X by imputing missing values. 'None' values are converted to np.nan before imputation and are treated as the same.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.

Parameters

X (pd.DataFrame, np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data X by imputing missing values. ‘None’ values are converted to np.nan before imputation and are treated as the same.

Parameters

X (pd.DataFrame) – Data to transform
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame
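
The imputation strategies accepted by the Imputer can be sketched on plain Python lists. This is a minimal illustration of what each strategy name means, not the component's implementation. The helper `impute_column` is hypothetical, and only `None` is treated as missing here.

```python
from collections import Counter
from statistics import mean, median


def impute_column(values, strategy="mean", fill_value=None):
    """Fill None entries in a single column using the named strategy."""
    present = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    elif strategy == "most_frequent":
        fill = Counter(present).most_common(1)[0][0]
    elif strategy == "constant":
        fill = fill_value
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


numeric = [1.0, None, 3.0, 5.0]
categorical = ["a", "b", None, "a"]

numeric_filled = impute_column(numeric, "mean")  # None -> 3.0
categorical_filled = impute_column(categorical, "most_frequent")  # None -> "a"
constant_filled = impute_column(categorical, "constant", "missing_value")
```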

class evalml.pipelines.components.transformers.LabelEncoder(positive_label=None, random_seed=0, **kwargs)[source]#

A transformer that encodes target labels using values between 0 and num_classes - 1.

Parameters

positive_label (int, str) – The label for the class that should be treated as positive (1) for binary classification problems. Ignored for multiclass problems. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0. Ignored.

Attributes

hyperparameter_ranges	{}
modifies_features	False
modifies_target	True
name	Label Encoder
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the label encoder.
`fit_transform`	Fit and transform data using the label encoder.
`inverse_transform`	Decodes the target data.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transform the target using the fitted label encoder.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)[source]#

Fits the label encoder.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features]. Ignored.
y (pd.Series) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If input y is None.

fit_transform(self, X, y)[source]#

Fit and transform data using the label encoder.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].

Returns

The original features and an encoded version of the target.

Return type

pd.DataFrame, pd.Series

inverse_transform(self, y)[source]#

Decodes the target data.

Parameters: y (pd.Series) – Target data.
Returns: The decoded version of the target.
Return type: pd.Series
Raises: ValueError – If input y is None.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transform the target using the fitted label encoder.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features]. Ignored.
y (pd.Series) – The target training data of length [n_samples].

Returns

The original features and an encoded version of the target.

Return type

pd.DataFrame, pd.Series

Raises

ValueError – If input y is None.
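
The encoding described for the LabelEncoder can be sketched in plain Python. The sketch assumes that classes are numbered in sorted order and that, for a binary target, `positive_label` is mapped to 1. The helpers `fit_mapping`, `encode`, and `decode` are hypothetical and only illustrate the idea.

```python
def fit_mapping(y, positive_label=None):
    classes = sorted(set(y))
    if positive_label is not None and len(classes) == 2:
        # Binary case: the chosen label becomes 1, the other label becomes 0.
        return {label: int(label == positive_label) for label in classes}
    # Otherwise number the classes 0..num_classes - 1 in sorted order.
    return {label: code for code, label in enumerate(classes)}


def encode(y, mapping):
    return [mapping[label] for label in y]


def decode(y_encoded, mapping):
    # Invert the mapping to recover the original labels.
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[code] for code in y_encoded]


y = ["no", "yes", "no", "maybe"]
mapping = fit_mapping(y)  # {"maybe": 0, "no": 1, "yes": 2}
encoded = encode(y, mapping)  # [1, 2, 1, 0]
decoded = decode(encoded, mapping)  # same as y

binary_mapping = fit_mapping(["spam", "ham", "spam"], positive_label="ham")
```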

class evalml.pipelines.components.transformers.LinearDiscriminantAnalysis(n_components=None, random_seed=0, **kwargs)[source]#

Reduces the number of features by using Linear Discriminant Analysis.

Parameters

n_components (int) – The number of features to maintain after computation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Linear Discriminant Analysis Transformer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the LDA component.
`fit_transform`	Fit and transform data using the LDA component.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transform data using the fitted LDA component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)[source]#

Fits the LDA component.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If input data is not all numeric.

fit_transform(self, X, y=None)[source]#

Fit and transform data using the LDA component.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

Raises

ValueError – If input data is not all numeric.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transform data using the fitted LDA component.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

Raises

ValueError – If input data is not all numeric.

class evalml.pipelines.components.transformers.LogTransformer(random_seed=0)[source]#

Applies a log transformation to the target data.

Attributes

hyperparameter_ranges	{}
modifies_features	False
modifies_target	True
name	Log Transformer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the LogTransformer.
`fit_transform`	Log transforms the target variable.
`inverse_transform`	Apply exponential to target data.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Log transforms the target variable.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the LogTransformer.

Parameters

X (pd.DataFrame or np.ndarray) – Ignored.
y (pd.Series, optional) – Ignored.

Returns

self

fit_transform(self, X, y=None)[source]#

Log transforms the target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to log transform.

Returns

The input features are returned without modification. The target variable y is log transformed.

Return type

tuple of pd.DataFrame, pd.Series

inverse_transform(self, y)[source]#

Apply exponential to target data.

Parameters: y (pd.Series) – Target variable.
Returns: Target with exponential applied.
Return type: pd.Series
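
The relationship between transform and inverse_transform can be sketched with the standard `math` module. The sketch assumes a natural logarithm, chosen because the inverse is described as an exponential. It operates on a plain list rather than a pd.Series.

```python
import math

y = [1.0, 10.0, 100.0]

# Forward step: log-transform each target value (values must be positive).
y_log = [math.log(value) for value in y]

# Inverse step: the exponential undoes the log transform.
y_restored = [math.exp(value) for value in y_log]

round_trip_ok = all(math.isclose(a, b) for a, b in zip(y, y_restored))
```

The logarithm is undefined for zero and negative values, so a target containing them would need separate handling before a transform like this.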

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Log transforms the target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target data to log transform.

Returns

The input features are returned without modification. The target variable y is log transformed.

Return type

tuple of pd.DataFrame, pd.Series

class evalml.pipelines.components.transformers.LSA(random_seed=0, **kwargs)[source]#

Transformer to calculate the Latent Semantic Analysis Values of text input.

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	LSA Transformer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the input data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X by applying the LSA pipeline.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the input data.

Parameters

X (pd.DataFrame) – The data to transform.
y (pd.Series, optional) – Ignored.

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data X by applying the LSA pipeline.

Parameters

X (pd.DataFrame) – The data to transform.
y (pd.Series, optional) – Ignored.

Returns

Transformed X. The original column is removed and replaced with two columns of the format LSA(original_column_name)[feature_number], where feature_number is 0 or 1.

Return type

pd.DataFrame
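
The output column naming for the LSA transformer can be spelled out with a short helper. `lsa_column_names` is a hypothetical function that only reproduces the naming format described above.

```python
def lsa_column_names(original_column_name, n_features=2):
    # One output column per LSA feature, numbered from 0.
    return [f"LSA({original_column_name})[{i}]" for i in range(n_features)]


names = lsa_column_names("review")  # ["LSA(review)[0]", "LSA(review)[1]"]
```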

class evalml.pipelines.components.transformers.NaturalLanguageFeaturizer(random_seed=0, **kwargs)[source]#

Transformer that can automatically featurize text columns using featuretools’ nlp_primitives.

Since models cannot handle non-numeric data, any text must be broken down into features that provide useful information about that text. This component splits each text column into several informative features: Diversity Score, Mean Characters per Word, Polarity Score, LSA (Latent Semantic Analysis), Number of Characters, and Number of Words. Calling transform on this component will replace any text columns in the given dataset with these numeric columns.
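
To give a feel for what turning text into numeric features means, the sketch below computes three of the simpler quantities for one string. It uses whitespace tokenization, which is an assumption and may differ from how the underlying primitives split words. The function name and dictionary keys are hypothetical.

```python
def simple_text_features(text):
    # Split on whitespace; the real primitives may tokenize differently.
    words = text.split()
    num_words = len(words)
    mean_chars = sum(len(w) for w in words) / num_words if num_words else 0.0
    return {
        "num_characters": len(text),
        "num_words": num_words,
        "mean_characters_per_word": mean_chars,
    }


features = simple_text_features("the cat sat")
# {"num_characters": 11, "num_words": 3, "mean_characters_per_word": 3.0}
```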

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Natural Language Featurizer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X by creating new features using existing text columns.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits component to data.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data X by creating new features using existing text columns.

Parameters

X (pd.DataFrame) – The data to transform.
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.OneHotEncoder(top_n=10, features_to_encode=None, categories=None, drop='if_binary', handle_unknown='ignore', handle_missing='error', random_seed=0, **kwargs)[source]#

A transformer that encodes categorical features in a one-hot numeric array.

Parameters

top_n (int) – Number of categories per column to encode. If None, all categories will be encoded. Otherwise, the n most frequent will be encoded and all others will be dropped. Defaults to 10.
features_to_encode (list[str]) – List of columns to encode. All other columns will remain untouched. If None, all appropriate columns will be encoded. Defaults to None.
categories (list) – A two dimensional list of categories, where categories[i] is a list of the categories for the column at index i. This can also be None, or “auto” if top_n is not None. Defaults to None.
drop (string, list) – Method (“first” or “if_binary”) to use to drop one category per feature. Can also be a list specifying which categories to drop for each feature. Defaults to ‘if_binary’.
handle_unknown (string) – Whether to ignore or raise an error for unknown categories for a feature encountered during fit or transform. If either top_n or categories is used to limit the number of categories per column, this must be “ignore”. Defaults to “ignore”.
handle_missing (string) – Options for how to handle missing (NaN) values encountered during fit or transform. If this is set to “as_category” and NaN values are within the n most frequent, “nan” values will be encoded as their own column. If this is set to “error”, any missing values encountered will raise an error. Defaults to “error”.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	One Hot Encoder
training_only	False

Methods

`categories`	Returns a list of the unique categories to be encoded for the particular feature, in order.
`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the one-hot encoder component.
`fit_transform`	Fits on X and transforms X.
`get_feature_names`	Return feature names for the categorical features after fitting.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	One-hot encode the input data.

categories(self, feature_name)[source]#

Returns a list of the unique categories to be encoded for the particular feature, in order.

Parameters: feature_name (str) – The name of any feature provided to one-hot encoder during fit.
Returns: The unique categories, in the same dtype as they were provided during fit.
Return type: np.ndarray
Raises: ValueError – If feature was not provided to one-hot encoder as a training feature.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the one-hot encoder component.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If encoding a column failed.

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

get_feature_names(self)[source]#

Return feature names for the categorical features after fitting.

Feature names are formatted as {column name}_{category name}. In the event of a duplicate name, an integer will be added at the end of the feature name to distinguish it.

For example, consider a dataframe with a column called “A” and category “x_y” and another column called “A_x” with “y”. In this example, the feature names would be “A_x_y” and “A_x_y_1”.

Returns: The feature names after encoding, provided in the same order as input_features.
Return type: np.ndarray
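
The naming rule and its duplicate handling can be sketched as follows. `unique_feature_names` is a hypothetical helper written for illustration. It reproduces the example above but is not the encoder's implementation.

```python
def unique_feature_names(column_categories):
    """Build names from (column_name, [categories]) pairs, de-duplicating."""
    seen = set()
    names = []
    for column, categories in column_categories:
        for category in categories:
            base = f"{column}_{category}"
            name = base
            suffix = 1
            # Append an increasing integer until the name is unused.
            while name in seen:
                name = f"{base}_{suffix}"
                suffix += 1
            seen.add(name)
            names.append(name)
    return names


names = unique_feature_names([("A", ["x_y"]), ("A_x", ["y"])])
# ["A_x_y", "A_x_y_1"]
```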

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

One-hot encode the input data.

Parameters

X (pd.DataFrame) – Features to one-hot encode.
y (pd.Series) – Ignored.

Returns

Transformed data, where each categorical feature has been encoded into numerical columns using one-hot encoding.

Return type

pd.DataFrame
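
The effect of `top_n` on one-hot encoding can be sketched on a single column. The sketch keeps the most frequent categories and encodes every other value as all zeros. It ignores the `drop` and `handle_missing` options, and the helper names are hypothetical.

```python
from collections import Counter


def top_n_categories(values, top_n=10):
    counts = Counter(values)
    if top_n is None:
        return sorted(counts)
    # Keep only the n most frequent categories, then sort for a stable order.
    return sorted(category for category, _ in counts.most_common(top_n))


def one_hot(values, categories):
    # One indicator column per kept category; dropped categories give all zeros.
    return [[int(value == category) for category in categories] for value in values]


colors = ["red", "blue", "red", "green", "red", "blue"]
kept = top_n_categories(colors, top_n=2)  # ["blue", "red"]
rows = one_hot(colors, kept)
```

Here "green" is the least frequent category, so its row is `[0, 0]`.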

class evalml.pipelines.components.transformers.Oversampler(sampling_ratio=0.25, sampling_ratio_dict=None, k_neighbors_default=5, n_jobs=-1, random_seed=0, **kwargs)[source]#

SMOTE Oversampler component. Will automatically select whether to use SMOTE, SMOTEN, or SMOTENC based on inputs to the component.

Parameters

sampling_ratio (float) – The goal ratio of the minority to majority class, with range (0, 1]. A value of 0.25 means we want a 1:4 ratio of the minority to majority class after oversampling. We will create a sampling dictionary using this ratio, with the keys corresponding to the class and the values corresponding to the number of samples. Defaults to 0.25.
sampling_ratio_dict (dict) – A dictionary specifying the desired balanced ratio for each target value. For instance, in a binary case where class 1 is the minority, we could specify: sampling_ratio_dict={0: 0.5, 1: 1}, which means we would undersample class 0 to have twice the number of samples as class 1 (minority:majority ratio = 0.5), and don’t sample class 1. Overrides sampling_ratio if provided. Defaults to None.
k_neighbors_default (int) – The number of nearest neighbors used to construct synthetic samples. This is the default value used, but the actual k_neighbors value might be smaller if there are fewer samples. Defaults to 5.
n_jobs (int) – The number of CPU cores to use. Defaults to -1.
random_seed (int) – The seed to use for random sampling. Defaults to 0.
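
The arithmetic behind `sampling_ratio` can be sketched as a small function that turns class counts into target counts. `sampling_dict` is a hypothetical helper, and truncating to an integer is an assumption about rounding. Classes that already meet the ratio are left unchanged.

```python
def sampling_dict(class_counts, sampling_ratio=0.25):
    # The largest class sets the reference size.
    majority = max(class_counts.values())
    target = int(majority * sampling_ratio)
    # Raise any class below the target up to it; never shrink a class.
    return {cls: max(count, target) for cls, count in class_counts.items()}


counts = {0: 1000, 1: 100}
targets = sampling_dict(counts)  # {0: 1000, 1: 250}

multiclass_targets = sampling_dict({0: 1000, 1: 100, 2: 400})
# {0: 1000, 1: 250, 2: 400}
```

With 1000 majority samples and a ratio of 0.25, the minority class is raised from 100 to 250 samples, giving the 1:4 ratio described above.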

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	True
name	Oversampler
training_only	True

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits oversampler to data.
`fit_transform`	Fit and transform data using the sampler component.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms the input data by sampling the data.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)[source]#

Fits oversampler to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y)#

Fit and transform data using the sampler component.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

(pd.DataFrame, pd.Series)

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)#

Transforms the input data by sampling the data.

Parameters

X (pd.DataFrame) – Training features.
y (pd.Series) – Target.

Returns

Transformed features and target.

Return type

pd.DataFrame, pd.Series

class evalml.pipelines.components.transformers.PCA(variance=0.95, n_components=None, random_seed=0, **kwargs)[source]#

Reduces the number of features by using Principal Component Analysis (PCA).

Parameters

variance (float) – The percentage of the original data variance that should be preserved when reducing the number of features. Defaults to 0.95.
n_components (int) – The number of features to maintain after computing SVD. Defaults to None, but will override variance variable if set.
random_seed (int) – Seed for the random number generator. Defaults to 0.
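
How variance and n_components interact can be sketched as follows. This is a simplified illustration, assuming the common rule of keeping the smallest number of components whose cumulative explained variance exceeds the requested fraction; the function name components_to_keep and the example ratios are hypothetical.

```python
def components_to_keep(explained_variance_ratio, variance=0.95, n_components=None):
    """Hypothetical helper: how many principal components survive."""
    # An explicit n_components overrides the variance target.
    if n_components is not None:
        return n_components
    cumulative = 0.0
    for kept, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        # Assumption: stop at the first component that pushes the
        # cumulative explained variance past the requested fraction.
        if cumulative > variance:
            return kept
    return len(explained_variance_ratio)


# Illustrative per-component ratios, sorted from largest to smallest.
ratios = [0.7, 0.2, 0.07, 0.03]
kept_default = components_to_keep(ratios)                      # variance=0.95
kept_override = components_to_keep(ratios, n_components=2)     # explicit count
```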

Attributes

hyperparameter_ranges	{ “variance”: Real(0.25, 1)}
modifies_features	True
modifies_target	False
name	PCA Transformer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the PCA component.
`fit_transform`	Fit and transform data using the PCA component.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transform data using fitted PCA component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the PCA component.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If input data is not all numeric.

fit_transform(self, X, y=None)[source]#

Fit and transform data using the PCA component.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

Raises

ValueError – If input data is not all numeric.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transform data using fitted PCA component.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

Raises

ValueError – If input data is not all numeric.

class evalml.pipelines.components.transformers.PerColumnImputer(impute_strategies=None, random_seed=0, **kwargs)[source]#

Imputes missing data according to a specified imputation strategy per column.

Parameters

impute_strategies (dict) – Column and {“impute_strategy”: strategy, “fill_value”:value} pairings. Valid values for impute strategy include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types. Defaults to None, which uses “most_frequent” for all columns. When impute_strategy == “constant”, fill_value is used to replace missing data. When None, uses 0 when imputing numerical data and “missing_value” for strings or object data types.
random_seed (int) – Seed for the random number generator. Defaults to 0.
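
The shape of impute_strategies and the fallback rules for “constant” can be illustrated with a small standard-library sketch. It is not the component's implementation: the helper impute_column is hypothetical, missing values are represented by None rather than NaN, and columns are plain lists rather than DataFrame columns.

```python
from statistics import mean, median, mode


def impute_column(values, impute_strategy="most_frequent", fill_value=None):
    """Hypothetical helper: fill None entries in one column."""
    present = [v for v in values if v is not None]
    if impute_strategy == "mean":
        fill = mean(present)
    elif impute_strategy == "median":
        fill = median(present)
    elif impute_strategy == "most_frequent":
        fill = mode(present)
    elif impute_strategy == "constant":
        if fill_value is not None:
            fill = fill_value
        else:
            # Documented fallback: 0 for numeric data, "missing_value" otherwise.
            numeric = all(isinstance(v, (int, float)) for v in present)
            fill = 0 if numeric else "missing_value"
    else:
        raise ValueError(f"Unknown impute_strategy: {impute_strategy}")
    return [fill if v is None else v for v in values]


# Column name -> {"impute_strategy": ..., "fill_value": ...} pairings.
impute_strategies = {
    "age": {"impute_strategy": "mean"},
    "city": {"impute_strategy": "constant", "fill_value": None},
}
data = {"age": [20, None, 40], "city": ["Oslo", None, "Lima"]}
imputed = {
    column: impute_column(values, **impute_strategies.get(column, {}))
    for column, values in data.items()
}
```

Columns without an entry in the dictionary fall through to the “most_frequent” default in this sketch, mirroring the behavior described for impute_strategies=None.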

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Per Column Imputer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits imputers on input data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms input data by imputing missing values.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits imputers on input data.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features] to fit.
y (pd.Series, optional) – The target training data of length [n_samples]. Ignored.

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms input data by imputing missing values.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features] to transform.
y (pd.Series, optional) – The target training data of length [n_samples]. Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.PolynomialDetrender(degree=1, random_seed=0, **kwargs)[source]#

Removes trends from time series by fitting a polynomial to the data.

Parameters

degree (int) – Degree for the polynomial. If 1, linear model is fit to the data. If 2, quadratic model is fit, etc. Defaults to 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
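
For degree=1, detrending amounts to fitting a straight line to the target and subtracting it, and the inverse adds the line back. The sketch below shows that idea with ordinary least squares in plain Python. The function names are hypothetical, the time index is assumed to be 0, 1, 2, ..., and the component itself may fit the polynomial differently.

```python
def fit_linear_trend(y):
    """Hypothetical helper: least-squares line through (0, y[0]), (1, y[1]), ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    covariance = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    t_variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / t_variance
    intercept = y_mean - slope * t_mean
    return slope, intercept


def remove_trend(y, slope, intercept):
    """Subtract the fitted line from the target."""
    return [v - (intercept + slope * t) for t, v in enumerate(y)]


def add_trend(y_detrended, slope, intercept):
    """Add the fitted line back, undoing remove_trend."""
    return [v + (intercept + slope * t) for t, v in enumerate(y_detrended)]


y = [3.0, 5.5, 6.5, 9.0]
slope, intercept = fit_linear_trend(y)
detrended = remove_trend(y, slope, intercept)
restored = add_trend(detrended, slope, intercept)
```

The residuals in detrended sum to approximately zero, and restored matches the original target up to floating-point error.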

Attributes

hyperparameter_ranges	{ “degree”: Integer(1, 3)}
modifies_features	False
modifies_target	True
name	Polynomial Detrender
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the PolynomialDetrender.
`fit_transform`	Removes fitted trend from target variable.
`inverse_transform`	Adds back fitted trend to target variable.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Removes fitted trend from target variable.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the PolynomialDetrender.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend.

Returns

self

Raises

ValueError – If y is None.

fit_transform(self, X, y=None)[source]#

Removes fitted trend from target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend.

Returns

The first element is the input features, returned without modification. The second element is the target variable y with the fitted trend removed.

Return type

tuple of pd.DataFrame, pd.Series

inverse_transform(self, y)[source]#

Adds back fitted trend to target variable.

Parameters

y (pd.Series) – Target variable.

Returns

The first element is the input features, returned without modification. The second element is the target variable y with the trend added back.

Return type

tuple of pd.DataFrame, pd.Series

Raises

ValueError – If y is None.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Removes fitted trend from target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend.

Returns

The input features are returned without modification. The target variable y is detrended.

Return type

tuple of pd.DataFrame, pd.Series

class evalml.pipelines.components.transformers.ReplaceNullableTypes(random_seed=0, **kwargs)[source]#

Transformer to replace features with the new nullable dtypes with a dtype that is compatible in EvalML.

Attributes

hyperparameter_ranges	None
modifies_features	True
modifies_target	{}
name	Replace Nullable Types Transformer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Substitutes non-nullable types for the new pandas nullable types in the data and target data.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data by replacing columns that contain nullable types with the appropriate replacement type.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y=None)[source]#

Substitutes non-nullable types for the new pandas nullable types in the data and target data.

Parameters

X (pd.DataFrame, optional) – Input features.
y (pd.Series) – Target data.

Returns

The input features and target data with the non-nullable types set.

Return type

tuple of pd.DataFrame, pd.Series

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data by replacing columns that contain nullable types with the appropriate replacement type.

The replacement type is “float64” for nullable integers and “category” for nullable booleans.

Parameters

X (pd.DataFrame) – Data to transform
y (pd.Series, optional) – Target data to transform

Returns

Transformed X and transformed y.

Return type

pd.DataFrame, pd.Series

class evalml.pipelines.components.transformers.RFClassifierSelectFromModel(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold='median', n_jobs=-1, random_seed=0, **kwargs)[source]#

Selects top features based on importance weights using a Random Forest classifier.

Parameters

number_features (int) – The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None.
n_estimators (int) – The number of trees in the forest. Defaults to 10.
max_depth (int) – Maximum tree depth for base learners. Defaults to None.
percent_features (float) – Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5.
threshold (string or float) – The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept, while the others are discarded. If “median”, then the threshold value is the median of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. Defaults to “median”.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
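
The threshold rule described above can be sketched in plain Python. This is an illustration only: the helpers resolve_threshold and select_features are hypothetical, the importances are made-up numbers rather than output from a fitted random forest, and the interaction with number_features and percent_features is left out.

```python
from statistics import mean, median


def resolve_threshold(importances, threshold="median"):
    """Hypothetical helper: turn a threshold spec into a numeric cutoff."""
    if isinstance(threshold, (int, float)):
        return float(threshold)
    scale, reference = 1.0, threshold.strip()
    if "*" in reference:
        # Scaled form such as "1.25*mean".
        scale_text, reference = reference.split("*", 1)
        scale, reference = float(scale_text), reference.strip()
    if reference == "mean":
        return scale * mean(importances)
    if reference == "median":
        return scale * median(importances)
    raise ValueError(f"Unsupported threshold: {threshold}")


def select_features(names, importances, threshold="median"):
    """Keep features whose importance is >= the resolved cutoff."""
    cutoff = resolve_threshold(importances, threshold)
    return [name for name, value in zip(names, importances) if value >= cutoff]


names = ["a", "b", "c", "d"]
importances = [0.1, 0.4, 0.2, 0.3]  # illustrative values
by_median = select_features(names, importances, "median")
by_scaled_mean = select_features(names, importances, "1.25*mean")
```

Here the median cutoff is 0.25, so features b and d are kept, while the scaled-mean cutoff of 0.3125 keeps only b.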

Attributes

hyperparameter_ranges	{ “percent_features”: Real(0.01, 1), “threshold”: [“mean”, “median”],}
modifies_features	True
modifies_target	False
name	RF Classifier Select From Model
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fit and transform data using the feature selector.
`get_names`	Get names of selected features.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)#

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)#

Fit and transform data using the feature selector.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

get_names(self)#

Get names of selected features.

Returns: List of the names of features selected.
Return type: list[str]

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)#

Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data. Ignored.

Returns

Transformed X

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If feature selector does not have a transform method or a component_obj that implements transform

class evalml.pipelines.components.transformers.RFRegressorSelectFromModel(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold='median', n_jobs=-1, random_seed=0, **kwargs)[source]#

Selects top features based on importance weights using a Random Forest regressor.

Parameters

number_features (int) – The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None.
n_estimators (int) – The number of trees in the forest. Defaults to 10.
max_depth (int) – Maximum tree depth for base learners. Defaults to None.
percent_features (float) – Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5.
threshold (string or float) – The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept, while the others are discarded. If “median”, then the threshold value is the median of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. Defaults to “median”.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “percent_features”: Real(0.01, 1), “threshold”: [“mean”, “median”],}
modifies_features	True
modifies_target	False
name	RF Regressor Select From Model
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fit and transform data using the feature selector.
`get_names`	Get names of selected features.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)#

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)#

Fit and transform data using the feature selector.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

get_names(self)#

Get names of selected features.

Returns: List of the names of features selected.
Return type: list[str]

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)#

Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data. Ignored.

Returns

Transformed X

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If feature selector does not have a transform method or a component_obj that implements transform

class evalml.pipelines.components.transformers.SelectByType(column_types=None, exclude=False, random_seed=0, **kwargs)[source]#

Selects columns by specified Woodwork logical type or semantic tag in input data.

Parameters

column_types (string, ww.LogicalType, list(string), list(ww.LogicalType)) – List of Woodwork types or tags, used to determine which columns to select or exclude.
exclude (bool) – If True, exclude the column_types instead of including them. Defaults to False.
random_seed (int) – Seed for the random number generator. Defaults to 0.
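
The include/exclude behavior of column_types and exclude can be pictured with a small sketch. It does not use Woodwork: the schema is a hypothetical plain dictionary mapping column names to a single type label, and select_by_type is an illustrative helper, not part of the library.

```python
def select_by_type(schema, column_types, exclude=False):
    """Hypothetical helper: pick column names by their type label."""
    # Accept a single label or a list of labels, as the parameter does.
    wanted = {column_types} if isinstance(column_types, str) else set(column_types)
    # With exclude=True the match is inverted: matching columns are dropped.
    return [name for name, label in schema.items() if (label in wanted) != exclude]


# Hypothetical schema: column name -> type label.
schema = {"age": "Integer", "name": "Categorical", "income": "Double"}
numeric_columns = select_by_type(schema, ["Integer", "Double"])
non_numeric_columns = select_by_type(schema, ["Integer", "Double"], exclude=True)
```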

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Select Columns By Type Transformer
needs_fitting	False
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the transformer by checking if column names are present in the dataset.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X by selecting columns.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the transformer by checking if column names are present in the dataset.

Parameters

X (pd.DataFrame) – Data to check.
y (pd.Series, optional) – Targets. Ignored.

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data X by selecting columns.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Targets.

Returns

Transformed X.

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.SelectColumns(columns=None, random_seed=0, **kwargs)[source]#

Selects specified columns in input data.

Parameters

columns (list(string)) – List of column names, used to determine which columns to select. If columns are not present, they will not be selected.
random_seed (int) – Seed for the random number generator. Defaults to 0.
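
As a rough illustration of the selection rule described above (requested columns that are absent are skipped), here is a minimal sketch on a plain dict of lists. The function select_columns is hypothetical and is not the component's implementation, which operates on pd.DataFrame input.

```python
def select_columns(table, columns):
    """Keep only the requested columns that exist in `table`, in request order."""
    return {name: table[name] for name in columns if name in table}


table = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
# "not_there" is absent from the table, so it is silently skipped.
selected = select_columns(table, ["a", "c", "not_there"])
```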

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Select Columns Transformer
needs_fitting	False
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the transformer by checking if column names are present in the dataset.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transform data using fitted column selector component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the transformer by checking if column names are present in the dataset.

Parameters

X (pd.DataFrame) – Data to check.
y (pd.Series, optional) – Targets.

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)#

Transform data using fitted column selector component.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.SimpleImputer(impute_strategy='most_frequent', fill_value=None, random_seed=0, **kwargs)[source]#

Imputes missing data according to a specified imputation strategy. Natural language columns are ignored.

Parameters

impute_strategy (string) – Impute strategy to use. Valid values include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types.
fill_value (string) – When impute_strategy == “constant”, fill_value is used to replace missing data. Defaults to 0 when imputing numerical data and “missing_value” for strings or object data types.
random_seed (int) – Seed for the random number generator. Defaults to 0.
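
The strategies listed above can be sketched for a single column in plain Python. This is an illustrative assumption about the behaviour, not EvalML's implementation; the helper names is_missing and impute are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, median


def is_missing(value):
    # None and NaN are treated as the same kind of missing value.
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute(values, impute_strategy="most_frequent", fill_value=None):
    present = [v for v in values if not is_missing(v)]
    if impute_strategy == "mean":
        fill = mean(present)
    elif impute_strategy == "median":
        fill = median(present)
    elif impute_strategy == "most_frequent":
        fill = Counter(present).most_common(1)[0][0]
    elif impute_strategy == "constant":
        if fill_value is not None:
            fill = fill_value
        else:
            # Documented defaults: 0 for numeric data, "missing_value" otherwise.
            numeric = all(isinstance(v, (int, float)) for v in present)
            fill = 0 if numeric else "missing_value"
    else:
        raise ValueError(f"Unknown impute_strategy: {impute_strategy!r}")
    return [fill if is_missing(v) else v for v in values]


numeric_filled = impute([1.0, None, 3.0], impute_strategy="mean")
text_filled = impute(["a", None, "b", "a"])
```

Note that "mean" and "median" only make sense for numeric columns, which is why the documentation restricts object data to "most_frequent" and "constant".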

Attributes

hyperparameter_ranges	{ “impute_strategy”: [“mean”, “median”, “most_frequent”]}
modifies_features	True
modifies_target	False
name	Simple Imputer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits imputer to data. 'None' values are converted to np.nan before imputation and are treated as the same.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms input by imputing missing values. 'None' and np.nan values are treated as the same.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.

Parameters

X (pd.DataFrame or np.ndarray) – the input training data of shape [n_samples, n_features]
y (pd.Series, optional) – the target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)[source]#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms input by imputing missing values. ‘None’ and np.nan values are treated as the same.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.StandardScaler(random_seed=0, **kwargs)[source]#

A transformer that standardizes input features by removing the mean and scaling to unit variance.

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.
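
Standardization as described above amounts to a per-column z-score. The sketch below assumes the population standard deviation and leaves constant columns unscaled; both choices are assumptions made for illustration, and standardize is a hypothetical helper rather than the component's code.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Remove the mean and scale to unit variance (population variance)."""
    mu = fmean(column)
    sigma = pstdev(column)
    # Assumption: a constant column is centered but not rescaled.
    scale = sigma if sigma > 0 else 1.0
    return [(value - mu) / scale for value in column]


z = standardize([1.0, 2.0, 3.0])
```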

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Standard Scaler
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fit and transform data using the standard scaler component.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transform data using the fitted standard scaler.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)#

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)[source]#

Fit and transform data using the standard scaler component.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transform data using the fitted standard scaler.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.TargetEncoder(cols=None, smoothing=1, handle_unknown='value', handle_missing='value', random_seed=0, **kwargs)[source]#

A transformer that encodes categorical features into target encodings.

Parameters

cols (list) – Columns to encode. If None, all string columns will be encoded, otherwise only the columns provided will be encoded. Defaults to None
smoothing (float) – The smoothing factor to apply. The larger this value is, the more influence the expected target value has on the resulting target encodings. Must be strictly larger than 0. Defaults to 1.0
handle_unknown (string) – Determines how to handle unknown categories for a feature encountered. Options are ‘value’, ‘error’, and ‘return_nan’. Defaults to ‘value’, which replaces with the target mean
handle_missing (string) – Determines how to handle missing values encountered during fit or transform. Options are ‘value’, ‘error’, and ‘return_nan’. Defaults to ‘value’, which replaces with the target mean
random_seed (int) – Seed for the random number generator. Defaults to 0.
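
To make the role of smoothing concrete, the sketch below uses one common sigmoid-weighted formulation of target encoding. Whether EvalML uses exactly this formula is an assumption; min_samples_leaf and both function names are hypothetical and do not appear in the signature above.

```python
import math
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=1.0, min_samples_leaf=1):
    """Blend each category's target mean with the overall target mean."""
    prior = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    encoding = {}
    for category, n in counts.items():
        # Larger smoothing pushes the weight toward 0.5, so the prior matters more.
        weight = 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))
        encoding[category] = prior * (1.0 - weight) + (sums[category] / n) * weight
    return encoding, prior


def encode(category, encoding, prior):
    # Mirrors handle_unknown='value': unseen categories get the target mean.
    return encoding.get(category, prior)


cats, y = ["a", "a", "a", "b"], [1, 1, 1, 0]
weak, prior = fit_target_encoding(cats, y, smoothing=1.0)
strong, _ = fit_target_encoding(cats, y, smoothing=100.0)
```

In this sketch, raising smoothing from 1.0 to 100.0 moves the encoding of "a" closer to the overall target mean, which is the behaviour the smoothing parameter description refers to.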

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Target Encoder
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the target encoder.
`fit_transform`	Fit and transform data using the target encoder.
`get_feature_names`	Return feature names for the input features after fitting.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transform data using the fitted target encoder.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)[source]#

Fits the target encoder.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y)[source]#

Fit and transform data using the target encoder.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

get_feature_names(self)[source]#

Return feature names for the input features after fitting.

Returns: The feature names after encoding.
Return type: np.array

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transform data using the fitted target encoder.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.TargetImputer(impute_strategy='most_frequent', fill_value=None, random_seed=0, **kwargs)[source]#

Imputes missing target data according to a specified imputation strategy.

Parameters

impute_strategy (string) – Impute strategy to use. Valid values include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types. Defaults to “most_frequent”.
fill_value (string) – When impute_strategy == “constant”, fill_value is used to replace missing data. Defaults to None which uses 0 when imputing numerical data and “missing_value” for strings or object data types.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “impute_strategy”: [“mean”, “median”, “most_frequent”]}
modifies_features	False
modifies_target	True
name	Target Imputer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits imputer to target data. 'None' values are converted to np.nan before imputation and are treated as the same.
`fit_transform`	Fits on and transforms the input target data.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms input target data by imputing missing values. 'None' and np.nan values are treated as the same.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)[source]#

Fits imputer to target data. ‘None’ values are converted to np.nan before imputation and are treated as the same.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]. Ignored.
y (pd.Series) – The target training data of length [n_samples].

Returns

self

Raises

TypeError – If target is filled with all null values.

fit_transform(self, X, y)[source]#

Fits on and transforms the input target data.

Parameters

X (pd.DataFrame) – Features. Ignored.
y (pd.Series) – Target data to impute.

Returns

The original X, transformed y

Return type

(pd.DataFrame, pd.Series)
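
The return contract differs from feature transformers: X is passed through unchanged and only y is imputed. A minimal sketch of that contract, using the "most_frequent" strategy on plain lists, follows; fit_transform_target is a hypothetical function and the error message is illustrative.

```python
from collections import Counter


def fit_transform_target(X, y):
    """Return X untouched together with an imputed copy of y."""
    present = [value for value in y if value is not None]
    if not present:
        # Mirrors the documented TypeError for an all-null target.
        raise TypeError("Target contains only null values.")
    fill = Counter(present).most_common(1)[0][0]
    return X, [fill if value is None else value for value in y]


X = {"feature": [0.1, 0.2, 0.3, 0.4]}
X_out, y_out = fit_transform_target(X, ["yes", None, "no", "yes"])
```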

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y)[source]#

Transforms input target data by imputing missing values. ‘None’ and np.nan values are treated as the same.

Parameters

X (pd.DataFrame) – Features. Ignored.
y (pd.Series) – Target data to impute.

Returns

The original X, transformed y

Return type

(pd.DataFrame, pd.Series)

class evalml.pipelines.components.transformers.TimeSeriesFeaturizer(time_index=None, max_delay=2, gap=0, forecast_horizon=1, conf_level=0.05, rolling_window_size=0.25, delay_features=True, delay_target=True, random_seed=0, **kwargs)[source]#

Transformer that delays input features and target variable for time series problems.

This component uses an algorithm based on the autocorrelation values of the target variable to determine which lags to select from the set of all possible lags.

The algorithm is based on the idea that the local maxima of the autocorrelation function indicate the lags that have the most impact on the present time.

The algorithm computes the autocorrelation values and finds the local maxima, called “peaks”, that are significant at the given conf_level. Since lags in the range [0, 10] tend to be predictive but not local maxima, the union of the peaks is taken with the significant lags in the range [0, 10]. At the end, only selected lags in the range [0, max_delay] are used.

Parametrizing the algorithm by conf_level lets the AutoMLAlgorithm tune the set of lags chosen so that the chances of finding a good set of lags are higher.

Using conf_level value of 1 selects all possible lags.
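
The lag selection described above can be sketched in plain Python. This is a simplified illustration, not the component's implementation: it assumes a white-noise significance bound of z / sqrt(n) instead of the confidence intervals a statistics library would compute, and the names autocorrelation and select_lags are hypothetical.

```python
from statistics import NormalDist, fmean


def autocorrelation(y, lag):
    mu = fmean(y)
    denominator = sum((value - mu) ** 2 for value in y)
    numerator = sum((y[t] - mu) * (y[t - lag] - mu) for t in range(lag, len(y)))
    return numerator / denominator


def select_lags(y, max_delay=2, conf_level=0.05):
    n = len(y)
    acf = [autocorrelation(y, lag) for lag in range(n)]
    # Assumed bound: conf_level=1 gives a bound of 0, so every lag is significant.
    bound = NormalDist().inv_cdf(1 - conf_level / 2) / n ** 0.5
    significant = {lag for lag, r in enumerate(acf) if abs(r) >= bound}
    # Peaks are local maxima of the autocorrelation function.
    peaks = {lag for lag in range(1, n - 1) if acf[lag - 1] < acf[lag] > acf[lag + 1]}
    # Small lags tend to be predictive without being peaks, so keep them if significant.
    early = {lag for lag in significant if lag <= 10}
    chosen = (significant & peaks) | early
    return sorted(lag for lag in chosen if lag <= max_delay)


series = [0, 1, 0, -1] * 6  # period-4 signal
lags = select_lags(series, max_delay=4, conf_level=0.05)
all_lags = select_lags(series, max_delay=4, conf_level=1.0)
```

For this period-4 series the odd lags have zero autocorrelation and are dropped at conf_level=0.05, while conf_level=1.0 keeps every lag up to max_delay.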

Parameters

time_index (str) – Name of the column containing the datetime information used to order the data. Ignored.
max_delay (int) – Maximum number of time units to delay each feature. Defaults to 2.
forecast_horizon (int) – The number of time periods the pipeline is expected to forecast.
conf_level (float) – Float in range (0, 1] that determines the confidence interval size used to select which lags to compute from the set of [1, max_delay]. A delay of 1 will always be computed. If 1, selects all possible lags in the set of [1, max_delay], inclusive.
rolling_window_size (float) – Float in range (0, 1] that determines the size of the window used for rolling features. Size is computed as rolling_window_size * max_delay.
delay_features (bool) – Whether to delay the input features. Defaults to True.
delay_target (bool) – Whether to delay the target. Defaults to True.
gap (int) – The number of time units between when the features are collected and when the target is collected. For example, if you are predicting the next time step’s target, gap=1. This is only needed because when gap=0, we need to be sure to start the lagging of the target variable at 1. Defaults to 0.
random_seed (int) – Seed for the random number generator. This transformer performs the same regardless of the random seed provided.

Attributes

hyperparameter_ranges	{“conf_level”: Real(0.001, 1.0), “rolling_window_size”: Real(0.001, 1.0)}
modifies_features	True
modifies_target	False
name	Time Series Featurizer
needs_fitting	True
target_colname_prefix	target_delay_{}
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the TimeSeriesFeaturizer.
`fit_transform`	Fit the component and transform the input data.
`load`	Loads component at file path.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Computes the delayed values and rolling means for X and y.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the TimeSeriesFeaturizer.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

ValueError – if self.time_index is None

fit_transform(self, X, y=None)[source]#

Fit the component and transform the input data.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, or None) – Target.

Returns

Transformed X.

Return type

pd.DataFrame

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Computes the delayed values and rolling means for X and y.

The chosen delays are determined by the autocorrelation function of the target variable. See the class docstring for more information on how they are chosen. If y is None, all possible lags are chosen.

If y is not None, it will also compute the delayed values for the target variable.

The rolling means for all numeric features in X and y, if y is numeric, are also returned.

Parameters

X (pd.DataFrame or None) – Data to transform. None is expected when only the target variable is being used.
y (pd.Series, or None) – Target.

Returns

Transformed X. No original features are returned.

Return type

pd.DataFrame
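
Delaying a column means shifting it down by the chosen lag and padding the start with missing values. The sketch below applies this to the target and names the results with the target_colname_prefix listed in the attributes above. The helpers delay and delayed_target_columns are hypothetical, and the rolling-mean features are omitted.

```python
def delay(column, lag):
    """Shift `column` down by `lag` positions, padding the start with None."""
    shift = min(lag, len(column))
    return [None] * shift + list(column[: len(column) - shift])


def delayed_target_columns(y, lags, prefix="target_delay_{}"):
    # One output column per selected lag.
    return {prefix.format(lag): delay(y, lag) for lag in lags}


target = [10, 20, 30, 40]
delayed = delayed_target_columns(target, lags=[1, 2])
```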

class evalml.pipelines.components.transformers.TimeSeriesImputer(categorical_impute_strategy='forwards_fill', numeric_impute_strategy='interpolate', target_impute_strategy='forwards_fill', random_seed=0, **kwargs)[source]#

Imputes missing data according to a specified timeseries-specific imputation strategy.

This Transformer should be used after the TimeSeriesRegularizer in order to impute the missing values that were added to X and y (if passed).

Parameters

categorical_impute_strategy (string) – Impute strategy to use for string, object, boolean, categorical dtypes. Valid values include “backwards_fill” and “forwards_fill”. Defaults to “forwards_fill”.
numeric_impute_strategy (string) – Impute strategy to use for numeric columns. Valid values include “backwards_fill”, “forwards_fill”, and “interpolate”. Defaults to “interpolate”.
target_impute_strategy (string) – Impute strategy to use for the target column. Valid values include “backwards_fill”, “forwards_fill”, and “interpolate”. Defaults to “forwards_fill”.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Raises

ValueError – If categorical_impute_strategy, numeric_impute_strategy, or target_impute_strategy is not one of the valid values.
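
The three strategies can be sketched on a single column as follows. This is a simplified illustration on plain lists, not the component's implementation; the function names are hypothetical. It also shows the edge handling described for fit, where gaps at the start or end of a column fall back to backwards or forwards fill.

```python
def forwards_fill(values):
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def backwards_fill(values):
    return forwards_fill(values[::-1])[::-1]


def interpolate(values):
    """Linearly interpolate between known neighbours; edges are left untouched."""
    filled = list(values)
    known = [i for i, value in enumerate(values) if value is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


STRATEGIES = {
    "forwards_fill": forwards_fill,
    "backwards_fill": backwards_fill,
    "interpolate": interpolate,
}


def impute_series(values, strategy="interpolate"):
    if strategy not in STRATEGIES:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    filled = STRATEGIES[strategy](list(values))
    # Any gap left at the start is backwards filled; any gap at the end is forwards filled.
    return forwards_fill(backwards_fill(filled))


numeric = impute_series([None, 1.0, None, 3.0, None], "interpolate")
categorical = impute_series([None, "a", None, "b"], "forwards_fill")
```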

Attributes

hyperparameter_ranges	{ “categorical_impute_strategy”: [“backwards_fill”, “forwards_fill”], “numeric_impute_strategy”: [“backwards_fill”, “forwards_fill”, “interpolate”], “target_impute_strategy”: [“backwards_fill”, “forwards_fill”, “interpolate”],}
modifies_features	True
modifies_target	True
name	Time Series Imputer
training_only	True

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits imputer to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X by imputing missing values using specified timeseries-specific strategies. 'None' values are converted to np.nan before imputation and are treated as the same.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits imputer to data.

‘None’ values are converted to np.nan before imputation and are treated as the same. If a value is missing at the beginning or end of a column, that value will be imputed using backwards fill or forwards fill as necessary, respectively.

Parameters

X (pd.DataFrame, np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms data X by imputing missing values using specified timeseries-specific strategies. ‘None’ values are converted to np.nan before imputation and are treated as the same.

Parameters

X (pd.DataFrame) – Data to transform
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.TimeSeriesRegularizer(time_index=None, frequency_payload=None, window_length=5, threshold=0.8, random_seed=0, **kwargs)[source]#

Transformer that regularizes an inconsistently spaced datetime column.

If X is passed in to fit/transform, the column time_index will be checked for an inferrable offset frequency. If the time_index column is perfectly inferrable then this Transformer will do nothing and return the original X and y.

If X does not have a perfectly inferrable frequency but one can be estimated, then X and y will be reformatted based on the estimated frequency for time_index. In the original X and y passed:

- Missing datetime values will be added and will have their corresponding columns in X and y set to None.
- Duplicate datetime values will be dropped.
- Extra datetime values will be dropped.
- If it can be determined that a duplicate or extra value is misaligned, then it will be repositioned to take the place of a missing value.

This Transformer should be used before the TimeSeriesImputer in order to impute the missing values that were added to X and y (if passed).
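
A simplified sketch of the regularization step is shown below. It assumes the frequency is already known and passed in as a timedelta, whereas the component estimates it from the data. It also omits the repositioning of misaligned values. The function regularize is hypothetical.

```python
from datetime import datetime, timedelta


def regularize(timestamps, values, freq):
    """Rebuild a series on an evenly spaced datetime grid."""
    first_seen = {}
    for stamp, value in zip(timestamps, values):
        # Duplicate datetimes are dropped: only the first occurrence is kept.
        first_seen.setdefault(stamp, value)
    grid, current, end = [], min(timestamps), max(timestamps)
    while current <= end:
        grid.append(current)
        current += freq
    # Missing datetimes get None; off-grid "extra" datetimes are dropped.
    return grid, [first_seen.get(stamp) for stamp in grid]


day = timedelta(days=1)
start = datetime(2022, 1, 1)
stamps = [start, start + day, start + day, start + day + timedelta(hours=12), start + 3 * day]
grid, regular_values = regularize(stamps, [1, 2, 99, 50, 4], day)
```

The None placed at the missing third day is the kind of value the TimeSeriesImputer is then expected to fill.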

Parameters

time_index (string) – Name of the column containing the datetime information used to order the data, required. Defaults to None.
frequency_payload (tuple) – Payload returned from Woodwork’s infer_frequency function where debug is True. Defaults to None.
window_length (int) – The size of the rolling window over which inference is conducted to determine the prevalence of uninferrable frequencies. Lower values make this component more sensitive to recognizing numerous faulty datetime values. Defaults to 5.
threshold (float) – The minimum percentage of windows that need to have been able to infer a frequency. Lower values make this component more sensitive to recognizing numerous faulty datetime values. Defaults to 0.8.
random_seed (int) – Seed for the random number generator. This transformer performs the same regardless of the random seed provided. Defaults to 0.
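One way to picture how window_length and threshold interact is the sketch below. It is an assumption-laden illustration, not Woodwork's inference logic: the `share_of_inferrable_windows` helper is hypothetical, and "able to infer a frequency" is simplified to "all gaps inside the window are equal".

```python
from datetime import datetime, timedelta


def share_of_inferrable_windows(timestamps, window_length=5):
    """Fraction of rolling windows whose consecutive gaps are all equal.

    Hypothetical simplification of per-window frequency inference.
    """
    windows = [
        timestamps[i:i + window_length]
        for i in range(len(timestamps) - window_length + 1)
    ]

    def evenly_spaced(window):
        gaps = {later - earlier for earlier, later in zip(window, window[1:])}
        return len(gaps) == 1

    return sum(evenly_spaced(w) for w in windows) / len(windows)


# Thirty daily timestamps with one day (index 15) missing.
start = datetime(2022, 1, 1)
timestamps = [start + timedelta(days=d) for d in range(30) if d != 15]

share = share_of_inferrable_windows(timestamps, window_length=5)
meets_threshold = share >= 0.8  # compare against the default threshold
```

Here 4 of the 25 windows straddle the gap, so the share is 21/25 and the default threshold of 0.8 is still met.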

Raises

ValueError – if the frequency_payload parameter has not been passed a tuple

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	True
name	Time Series Regularizer
training_only	True

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the TimeSeriesRegularizer.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Regularizes a dataframe and target data to an inferrable offset frequency.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]#

Fits the TimeSeriesRegularizer.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – if self.time_index is None, if X and y have different lengths, or if time_index in X does not have an offset frequency that can be estimated
TypeError – if the time_index column is not of type Datetime
KeyError – if the time_index column doesn’t exist

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Regularizes a dataframe and target data to an inferrable offset frequency.

A ‘clean’ X and y (if y was passed in) are created based on an inferrable offset frequency, and the rows of the original X and y whose datetime values match are filled into the clean X and y. Datetime values identified as misaligned are shifted into their appropriate position.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Data with an inferrable time_index offset frequency.

Return type

(pd.DataFrame, pd.Series)

class evalml.pipelines.components.transformers.Transformer(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]#

A component that transforms data and may or may not need fitting. These components are used before an estimator.

To implement a new Transformer, define your own class which is a subclass of Transformer, including a name and a list of acceptable ranges for any parameters to be tuned during the automl search (hyperparameters). Define an __init__ method which sets up any necessary state and objects. Make sure your __init__ only uses standard keyword arguments and calls super().__init__() with a parameters dict. You may also override the fit, transform, fit_transform and other methods in this class if appropriate.

To see some examples, check out the definitions of any Transformer component.
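The subclassing convention described above can be sketched without EvalML installed. In the block below, `StandInTransformer` is a hypothetical stand-in for the real base class (it only mimics the constructor signature documented here), and `ClipValues` is an invented example component; neither is EvalML code. With the real library, the subclass would inherit from Transformer instead.

```python
class StandInTransformer:
    """Hypothetical stand-in for the Transformer base class (not EvalML code)."""

    name = None
    hyperparameter_ranges = {}

    def __init__(self, parameters=None, component_obj=None, random_seed=0, **kwargs):
        self.parameters = dict(parameters or {})
        self.component_obj = component_obj
        self.random_seed = random_seed

    def fit(self, X, y=None):
        # Nothing to learn in this sketch; return self as fit is documented to do.
        return self

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X, y)


class ClipValues(StandInTransformer):
    """Invented example: clips every value to the range [lower, upper]."""

    name = "Clip Values"
    hyperparameter_ranges = {}  # nothing to tune in this sketch

    def __init__(self, lower=0.0, upper=1.0, random_seed=0, **kwargs):
        # Standard keyword arguments only, collected into a parameters dict.
        parameters = {"lower": lower, "upper": upper}
        parameters.update(kwargs)
        super().__init__(
            parameters=parameters, component_obj=None, random_seed=random_seed
        )

    def transform(self, X, y=None):
        lo, hi = self.parameters["lower"], self.parameters["upper"]
        return [[min(max(value, lo), hi) for value in row] for row in X]


clipped = ClipValues(upper=5.0).fit_transform([[-1.0, 3.0], [7.0, 5.0]])
print(clipped)  # [[0.0, 3.0], [5.0, 5.0]]
```

Keeping every constructor argument in the parameters dict is what makes conventions such as Component.default_parameters == Component().parameters possible.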

Parameters

parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

modifies_features	True
modifies_target	False
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`name`	Returns string name of this component.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)#

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)[source]#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

property name(cls)#: Returns string name of this component.

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

abstract transform(self, X, y=None)[source]#

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

class evalml.pipelines.components.transformers.Undersampler(sampling_ratio=0.25, sampling_ratio_dict=None, min_samples=100, min_percentage=0.1, random_seed=0, **kwargs)[source]#

Initializes an undersampling transformer to downsample the majority classes in the dataset.

This component is only run during training and not during predict.

Parameters

sampling_ratio (float) – The smallest minority:majority ratio that is accepted as ‘balanced’. For instance, a 1:4 ratio would be represented as 0.25, while a 1:1 ratio is 1.0. Must be between 0 and 1, inclusive. Defaults to 0.25.
sampling_ratio_dict (dict) – A dictionary specifying the desired balanced ratio for each target value. For instance, in a binary case where class 1 is the minority, we could specify: sampling_ratio_dict={0: 0.5, 1: 1}, which means we would undersample class 0 to have twice the number of samples as class 1 (minority:majority ratio = 0.5), and don’t sample class 1. Overrides sampling_ratio if provided. Defaults to None.
min_samples (int) – The minimum number of samples that we must have for any class, pre or post sampling. If a class must be downsampled, it will not be downsampled past this value. To determine severe imbalance, the minority class must occur less often than this and must have a class ratio below min_percentage. Must be greater than 0. Defaults to 100.
min_percentage (float) – The smallest share of the total dataset that the minority class may make up, as long as its sample count is above min_samples. If neither min_percentage nor min_samples is met, the data is treated as severely imbalanced and is not resampled. Must be between 0 and 0.5, inclusive. Defaults to 0.1.
random_seed (int) – The seed to use for random sampling. Defaults to 0.

Raises

ValueError – If sampling_ratio is not in the range (0, 1].
ValueError – If min_samples is not greater than 0.
ValueError – If min_percentage is not between 0 and 0.5, inclusive.
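The arithmetic behind sampling_ratio, min_samples, and min_percentage can be sketched as follows. This is one reading of the parameter descriptions above, not EvalML's implementation; the `undersample_targets` helper and the sample class counts are hypothetical.

```python
from collections import Counter


def undersample_targets(y, sampling_ratio=0.25, min_samples=100, min_percentage=0.1):
    """Return {class: number of samples to keep}, or {} if nothing is resampled.

    Hypothetical helper based on the documented parameter semantics.
    """
    counts = Counter(y)
    minority = min(counts.values())
    # Severe imbalance: too few minority samples AND too small a share of the
    # dataset. In that case the data is left untouched.
    if minority < min_samples and minority / len(y) < min_percentage:
        return {}
    targets = {}
    for label, count in counts.items():
        if minority / count >= sampling_ratio:
            continue  # already balanced enough relative to the minority class
        # Shrink so that minority:class equals sampling_ratio, never going
        # below min_samples and never above the class's current size.
        targets[label] = min(count, max(int(minority / sampling_ratio), min_samples))
    return targets


# 150 minority samples at a 1:4 ratio allow 600 majority samples.
targets = undersample_targets([0] * 1000 + [1] * 150)
print(targets)  # {0: 600}

# 10 minority samples out of 1000 is severely imbalanced: no resampling.
severe = undersample_targets([0] * 990 + [1] * 10)
print(severe)  # {}
```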

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	True
name	Undersampler
training_only	True

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the sampler to the data.
`fit_resample`	Resampling technique for this sampler.
`fit_transform`	Fit and transform data using the sampler component.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms the input data by sampling the data.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)#

Fits the sampler to the data.

Parameters

X (pd.DataFrame) – Input features.
y (pd.Series) – Target.

Returns

self

Raises

ValueError – If y is None.

fit_resample(self, X, y)[source]#

Resampling technique for this sampler.

Parameters

X (pd.DataFrame) – Training data to fit and resample.
y (pd.Series) – Training data targets to fit and resample.

Returns

Indices to keep for training data.

Return type

list
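Because fit_resample returns indices and not resampled data, the caller is the one who selects the rows. The sketch below shows that pattern with the standard library only; the `indices_to_keep` helper and its `keep_per_class` argument are hypothetical and do not reproduce EvalML's sampling logic.

```python
import random


def indices_to_keep(y, keep_per_class, random_seed=0):
    """Pick row positions to keep; classes not in keep_per_class are kept whole.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(random_seed)  # seeded, so the selection is repeatable
    kept = []
    for label in sorted(set(y)):
        positions = [i for i, value in enumerate(y) if value == label]
        quota = keep_per_class.get(label, len(positions))
        kept.extend(rng.sample(positions, min(quota, len(positions))))
    return sorted(kept)


y = [0] * 8 + [1] * 2
kept = indices_to_keep(y, {0: 4})

# The caller applies the indices to both the features and the target.
y_resampled = [y[i] for i in kept]
```

Returning indices keeps features and target aligned, since the same positions are applied to both.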

fit_transform(self, X, y)#

Fit and transform data using the sampler component.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

(pd.DataFrame, pd.Series)

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]#

Transforms the input data by sampling the data.

Parameters

X (pd.DataFrame) – Training features.
y (pd.Series) – Target.

Returns

Transformed features and target.

Return type

(pd.DataFrame, pd.Series)

class evalml.pipelines.components.transformers.URLFeaturizer(random_seed=0, **kwargs)[source]#

Transformer that can automatically extract features from URLs.

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	URL Featurizer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)#

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)#

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)#

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)#

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)#: Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)#

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.
