preprocessing

Preprocessing transformer components.

Submodules

Package Contents

Classes Summary

`DateTimeFeaturizer`	Transformer that can automatically extract features from datetime columns.
`DelayedFeatureTransformer`	Transformer that delays input features and target variable for time series problems.
`DFSTransformer`	Featuretools DFS component that generates features for the input features.
`DropNullColumns`	Transformer to drop features whose percentage of NaN values exceeds a specified threshold.
`DropRowsTransformer`	Transformer to drop rows specified by row indices.
`EmailFeaturizer`	Transformer that can automatically extract features from emails.
`LogTransformer`	Applies a log transformation to the target data.
`LSA`	Transformer to calculate the Latent Semantic Analysis values of text input.
`PolynomialDetrender`	Removes trends from time series by fitting a polynomial to the data.
`TextFeaturizer`	Transformer that can automatically featurize text columns using featuretools’ nlp_primitives.
`TextTransformer`	Base class for all transformers working with text features.
`URLFeaturizer`	Transformer that can automatically extract features from URLs.

Contents

class evalml.pipelines.components.transformers.preprocessing.DateTimeFeaturizer(features_to_extract=None, encode_as_categories=False, date_index=None, random_seed=0, **kwargs)

Transformer that can automatically extract features from datetime columns.

Parameters

features_to_extract (list) – List of features to extract. Valid options are “year”, “month”, “day_of_week”, and “hour”. Defaults to None, which extracts all of these features.
encode_as_categories (bool) – Whether day-of-week and month features should be encoded as pandas “category” dtype. This allows OneHotEncoders to encode these features. Defaults to False.
date_index (str) – Name of the column containing the datetime information used to order the data. Ignored.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	DateTime Featurization Component
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fit the datetime featurizer component.
`fit_transform`	Fits on X and transforms X.
`get_feature_names`	Gets the categories of each datetime feature.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X by creating new features using existing DateTime columns, and then dropping those DateTime columns.

clone(self)

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict
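The convention above implies that default parameter values can be read directly from a component's constructor signature. The sketch below illustrates this with a hypothetical `ToyComponent` class and Python's `inspect.signature`; it is not evalml's implementation. For simplicity `default_parameters` is a plain classmethod here, and excluding `random_seed` and `**kwargs` from the parameters is an assumption of the sketch.

```python
import inspect


class ToyComponent:
    """Hypothetical component, used only to illustrate the convention."""

    def __init__(self, degree=1, threshold=0.5, random_seed=0, **kwargs):
        # User-facing parameters; random_seed is stored separately (assumption).
        self._parameters = {"degree": degree, "threshold": threshold}
        self._parameters.update(kwargs)
        self.random_seed = random_seed

    @property
    def parameters(self):
        # Return a copy so callers cannot mutate internal state.
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Read defaults from the __init__ signature, skipping self,
        # random_seed, and the **kwargs catch-all.
        sig = inspect.signature(cls.__init__)
        return {
            name: p.default
            for name, p in sig.parameters.items()
            if name not in ("self", "random_seed")
            and p.kind is not inspect.Parameter.VAR_KEYWORD
        }


# The convention: defaults equal the parameters of a default-constructed instance.
assert ToyComponent.default_parameters() == ToyComponent().parameters
```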

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fit the datetime featurizer component.

Parameters

X (pd.DataFrame) – Input features.
y (pd.Series, optional) – Target data. Ignored.

Returns

self

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

get_feature_names(self)

Gets the categories of each datetime feature.

Returns

Dictionary in which each key is a column name and each value is a dictionary mapping the unique feature values to their integer encoding.

Return type

dict

static load(file_path)

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data X by creating new features using existing DateTime columns, and then dropping those DateTime columns.

Parameters

X (pd.DataFrame) – Input features.
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame
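The pandas sketch below approximates what `transform` does: for each datetime column it derives year, month, day-of-week, and hour columns, optionally casts month and day-of-week to the pandas “category” dtype, and drops the original column. The helper `featurize_datetimes`, the `<column>_<feature>` output names, and the numeric encodings (for example, Monday as 0) are assumptions; the component's actual column names and encodings may differ.

```python
import pandas as pd


def featurize_datetimes(X, features=("year", "month", "day_of_week", "hour"),
                        encode_as_categories=False):
    """Hypothetical sketch: expand datetime columns into simple features."""
    X = X.copy()
    datetime_cols = X.select_dtypes(include="datetime").columns
    for col in datetime_cols:
        values = X[col]
        derived = {
            "year": values.dt.year,
            "month": values.dt.month,
            "day_of_week": values.dt.dayofweek,  # Monday=0 ... Sunday=6
            "hour": values.dt.hour,
        }
        for name in features:
            new_col = derived[name]
            if encode_as_categories and name in ("month", "day_of_week"):
                # Category dtype lets a downstream one-hot encoder treat these as categorical.
                new_col = new_col.astype("category")
            X[f"{col}_{name}"] = new_col
    # Drop the original datetime columns once their features are extracted.
    return X.drop(columns=datetime_cols)


df = pd.DataFrame({"when": pd.to_datetime(["2021-01-04 10:00", "2021-02-06 23:30"]),
                   "x": [1, 2]})
print(featurize_datetimes(df))
```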

class evalml.pipelines.components.transformers.preprocessing.DelayedFeatureTransformer(date_index=None, max_delay=2, gap=0, forecast_horizon=1, delay_features=True, delay_target=True, random_seed=0, **kwargs)

Transformer that delays input features and target variable for time series problems.

Parameters

date_index (str) – Name of the column containing the datetime information used to order the data. Ignored.
max_delay (int) – Maximum number of time units to delay each feature. Defaults to 2.
forecast_horizon (int) – The number of time periods the pipeline is expected to forecast. Defaults to 1.
delay_features (bool) – Whether to delay the input features. Defaults to True.
delay_target (bool) – Whether to delay the target. Defaults to True.
gap (int) – The number of time units between when the features are collected and when the target is collected. For example, if you are predicting the next time step’s target, gap=1. This parameter is only used to ensure that, when gap=0, lagging of the target variable starts at 1. Defaults to 0.
random_seed (int) – Seed for the random number generator. This transformer behaves the same regardless of the random seed provided. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Delayed Feature Transformer
needs_fitting	False
target_colname_prefix	target_delay_{}
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the DelayedFeatureTransformer.
`fit_transform`	Fit the component and transform the input data.
`load`	Loads component at file path.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Computes the delayed features for all features in X and y.

clone(self)

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the DelayedFeatureTransformer.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y)

Fit the component and transform the input data.

Parameters

X (pd.DataFrame or None) – Data to transform. None is expected when only the target variable is being used.
y (pd.Series or None) – Target.

Returns

Transformed X.

Return type

pd.DataFrame

static load(file_path)

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Computes the delayed features for all features in X and y.

For each feature in X, it adds a column to the output DataFrame for each delay in the inclusive range [1, max_delay]. Each delayed feature is the original feature shifted forward in time by the delay amount. For example, with a delay of 3, the delayed feature's value at row n is taken from row n - 3 of the original feature.

If y is not None, it will also compute the delayed values for the target variable.

Parameters

X (pd.DataFrame or None) – Data to transform. None is expected when only the target variable is being used.
y (pd.Series or None) – Target.

Returns

Transformed X.

Return type

pd.DataFrame
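The lagging can be sketched with `pandas.Series.shift`. In this hypothetical helper, feature columns are named `<feature>_delay_<t>` and target columns follow the `target_delay_{}` pattern from the `target_colname_prefix` attribute. Target lags start at 1 when gap=0, as the `gap` description states; starting at lag 0 for other gaps, the feature column naming, and returning only the delayed columns are assumptions of the sketch.

```python
import pandas as pd


def delay_features(X, y=None, max_delay=2, gap=0):
    """Hypothetical sketch of lagging features and target for time series."""
    index = y.index if X is None else X.index
    delayed = {}
    if X is not None:
        for col in X.columns:
            for t in range(1, max_delay + 1):
                # Row n of the new column holds the value from row n - t.
                delayed[f"{col}_delay_{t}"] = X[col].shift(t)
    if y is not None:
        # With gap=0 the current target value must not leak in, so start at lag 1.
        start = 1 if gap == 0 else 0
        for t in range(start, max_delay + 1):
            delayed[f"target_delay_{t}"] = y.shift(t)
    return pd.DataFrame(delayed, index=index)


X = pd.DataFrame({"temp": [10, 11, 12, 13]})
y = pd.Series([1, 2, 3, 4])
print(delay_features(X, y, max_delay=2, gap=0))
```

The first `max_delay` rows contain NaN because there is no earlier observation to shift in.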

class evalml.pipelines.components.transformers.preprocessing.DFSTransformer(index='index', random_seed=0, **kwargs)

Featuretools DFS component that generates features for the input features.

Parameters

index (str) – The name of the column that contains the indices. If no column with this name exists, featuretools.EntitySet() creates a column with this name to serve as the index column. Defaults to ‘index’.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	DFS Transformer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the DFSTransformer component.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Computes the feature matrix for the input X using featuretools’ dfs algorithm.

clone(self)

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the DFSTransformer component.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Computes the feature matrix for the input X using featuretools’ dfs algorithm.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data to transform. Has shape [n_samples, n_features]
y (pd.Series, optional) – Ignored.

Returns

Feature matrix

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.preprocessing.DropNullColumns(pct_null_threshold=1.0, random_seed=0, **kwargs)

Transformer to drop features whose percentage of NaN values exceeds a specified threshold.

Parameters

pct_null_threshold (float) – The proportion of NaN values in an input feature that triggers dropping that feature. Must be a value in [0, 1] inclusive. If equal to 0.0, drops columns with any null values. If equal to 1.0, drops columns with all null values. Defaults to 1.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Drop Null Columns Transformer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X by dropping columns that exceed the threshold of null values.

clone(self)

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data X by dropping columns that exceed the threshold of null values.

Parameters

X (pd.DataFrame) – Data to transform
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame
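A pandas sketch of the threshold rule for DropNullColumns. The special case for 0.0 and the all-null behavior at 1.0 follow the parameter description; dropping columns whose null fraction is greater than or equal to the threshold for values in between is an assumption, chosen because it is consistent with the 1.0 case. The function name is hypothetical.

```python
import pandas as pd


def drop_null_columns(X, pct_null_threshold=1.0):
    """Hypothetical sketch of threshold-based column dropping."""
    if not 0 <= pct_null_threshold <= 1:
        raise ValueError("pct_null_threshold must be between 0 and 1, inclusive.")
    pct_null = X.isnull().mean()  # fraction of missing values per column
    if pct_null_threshold == 0.0:
        # 0.0: drop any column containing at least one null.
        to_drop = pct_null[pct_null > 0].index
    else:
        # Otherwise drop columns at or above the threshold; 1.0 drops all-null columns.
        to_drop = pct_null[pct_null >= pct_null_threshold].index
    return X.drop(columns=to_drop)


df = pd.DataFrame({"full": [1, 2, 3, 4],
                   "half": [1, None, 3, None],
                   "empty": [None] * 4})
print(drop_null_columns(df).columns.tolist())
```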

class evalml.pipelines.components.transformers.preprocessing.DropRowsTransformer(indices_to_drop=None, random_seed=0)

Transformer to drop rows specified by row indices.

Parameters

indices_to_drop (list) – List of indices to drop in the input data. Defaults to None.
random_seed (int) – Seed for the random number generator. Is not used by this component. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	True
name	Drop Rows Transformer
training_only	True

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data using fitted component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

Raises

ValueError – If indices to drop do not exist in input features or target.

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data using fitted component.

Parameters

X (pd.DataFrame) – Features.
y (pd.Series, optional) – Target data.

Returns

Data with row indices dropped.

Return type

(pd.DataFrame, pd.Series)
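A sketch of the row-dropping behavior, treating `indices_to_drop` as index labels (an assumption) and raising `ValueError` when a label is missing from X or y. The helper `drop_rows` is hypothetical; the component performs its existence check in `fit` and drops rows in `transform`, while the sketch does both in one call.

```python
import pandas as pd


def drop_rows(X, y=None, indices_to_drop=None):
    """Hypothetical sketch: drop the given index labels from X and, if given, y."""
    if not indices_to_drop:
        return X, y
    # Check that every label exists before dropping anything.
    missing = set(indices_to_drop) - set(X.index)
    if y is not None:
        missing |= set(indices_to_drop) - set(y.index)
    if missing:
        raise ValueError(f"Indices {sorted(missing)} do not exist in input features or target")
    X_t = X.drop(index=indices_to_drop)
    y_t = y.drop(index=indices_to_drop) if y is not None else None
    return X_t, y_t


X = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 11, 12])
y = pd.Series([0, 1, 0], index=[10, 11, 12])
X_t, y_t = drop_rows(X, y, [11])
```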

class evalml.pipelines.components.transformers.preprocessing.EmailFeaturizer(random_seed=0, **kwargs)

Transformer that can automatically extract features from emails.

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Email Featurizer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X.

clone(self)

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

class evalml.pipelines.components.transformers.preprocessing.LogTransformer(random_seed=0)

Applies a log transformation to the target data.

Attributes

hyperparameter_ranges	{}
modifies_features	False
modifies_target	True
name	Log Transformer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the LogTransformer.
`fit_transform`	Log transforms the target variable.
`inverse_transform`	Apply exponential to target data.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Log transforms the target variable.

clone(self)

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the LogTransformer.

Parameters

X (pd.DataFrame or np.ndarray) – Ignored.
y (pd.Series, optional) – Ignored.

Returns

self

fit_transform(self, X, y=None)

Log transforms the target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to log transform.

Returns

The input features are returned without modification. The target variable y is log transformed.

Return type

tuple of pd.DataFrame, pd.Series

inverse_transform(self, y)

Apply exponential to target data.

Parameters: y (pd.Series) – Target variable.
Returns: Target with exponential applied.
Return type: pd.Series

static load(file_path)

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Log transforms the target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target data to log transform.

Returns

The input features are returned without modification. The target variable y is log transformed.

Return type

tuple of pd.DataFrame, pd.Series
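The log transform and its inverse amount to applying the natural logarithm and the exponential. This sketch assumes a strictly positive target and the natural log; how the component handles zero or negative targets is not described here, so the sketch simply rejects them. The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def log_transform(X, y):
    """Hypothetical sketch: return X untouched and the natural log of y."""
    if (y <= 0).any():
        # This sketch does not handle non-positive targets.
        raise ValueError("log_transform sketch requires strictly positive targets")
    return X, np.log(y)


def inverse_log_transform(y_t):
    """Undo log_transform by exponentiating."""
    return np.exp(y_t)


X = pd.DataFrame({"f": [1, 2, 3]})
y = pd.Series([1.0, np.e, 10.0])
X_out, y_log = log_transform(X, y)
```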

class evalml.pipelines.components.transformers.preprocessing.LSA(random_seed=0, **kwargs)

Transformer to calculate the Latent Semantic Analysis values of text input.

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	LSA Transformer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the input data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X by applying the LSA pipeline.

clone(self)

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the input data.

Parameters

X (pd.DataFrame) – The data to fit.
y (pd.Series, optional) – Ignored.

Returns

self

fit_transform(self, X, y=None)

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms data X by applying the LSA pipeline.

Parameters

X (pd.DataFrame) – The data to transform.
y (pd.Series, optional) – Ignored.

Returns

Transformed X. The original column is removed and replaced with two columns of the format LSA(original_column_name)[feature_number], where feature_number is 0 or 1.

Return type

pd.DataFrame
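Latent Semantic Analysis is commonly computed as a TF-IDF matrix followed by a truncated SVD. The sketch below applies that recipe with scikit-learn's `TfidfVectorizer` and `TruncatedSVD`, keeping two components per text column and naming them in the LSA(original_column_name)[feature_number] format described above. The helper `lsa_features` is hypothetical and fits and transforms in one step, whereas the component separates `fit` from `transform`; whether the component uses exactly these settings is an assumption.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline


def lsa_features(X, text_columns, random_seed=0):
    """Hypothetical sketch: replace each text column with two LSA components."""
    X = X.copy()
    for col in text_columns:
        # TF-IDF weighting followed by a rank-2 truncated SVD.
        lsa = make_pipeline(TfidfVectorizer(),
                            TruncatedSVD(n_components=2, random_state=random_seed))
        components = lsa.fit_transform(X[col])
        for i in range(2):
            X[f"LSA({col})[{i}]"] = components[:, i]
    # Remove the original text columns, keeping only their LSA components.
    return X.drop(columns=text_columns)


docs = pd.DataFrame({
    "text": ["the cat sat on the mat", "dogs chase cats in the park",
             "stock prices rose sharply today", "markets fell as prices dropped"],
    "id": [1, 2, 3, 4],
})
out = lsa_features(docs, ["text"])
```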

class evalml.pipelines.components.transformers.preprocessing.PolynomialDetrender(degree=1, random_seed=0, **kwargs)

Removes trends from time series by fitting a polynomial to the data.

Parameters

degree (int) – Degree of the polynomial. If 1, a linear model is fit to the data; if 2, a quadratic model is fit; and so on. Defaults to 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “degree”: Integer(1, 3)}
modifies_features	False
modifies_target	True
name	Polynomial Detrender
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits the PolynomialDetrender.
`fit_transform`	Removes fitted trend from target variable.
`inverse_transform`	Adds back fitted trend to target variable.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Removes fitted trend from target variable.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict
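The clone and default_parameters conventions above can be sketched as follows. DetrenderSketch is a hypothetical class. It reads defaults from its constructor signature, so default_parameters() == DetrenderSketch().parameters holds as long as __init__ stores every non-seed argument in its parameters. evalml exposes default_parameters as a class-level property; this sketch uses a classmethod instead.

```python
import inspect


class DetrenderSketch:
    """Hypothetical component illustrating the clone/default_parameters conventions."""

    def __init__(self, degree=1, random_seed=0):
        self._parameters = {"degree": degree}
        self.random_seed = random_seed

    @property
    def parameters(self):
        # Return a copy so callers cannot mutate internal state.
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Read defaults from the constructor signature, excluding self and random_seed.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name not in ("self", "random_seed")
            and param.default is not inspect.Parameter.empty
        }

    def clone(self):
        # Same parameters and random seed, but a fresh, unfitted object.
        return type(self)(**self.parameters, random_seed=self.random_seed)
```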

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component. Defaults to False.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}. Defaults to False.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict
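A stand-alone sketch of the describe contract follows. It optionally prints the name, prints each parameter, and returns a dictionary only when return_dict is True. The function and its printed layout are hypothetical; evalml's exact output format may differ.

```python
def describe(name, parameters, print_name=False, return_dict=False):
    """Hypothetical stand-alone version of a component's describe method."""
    if print_name:
        print(name)
    # One line per parameter; the exact layout here is illustrative only.
    for key, value in parameters.items():
        print(f"  * {key} : {value}")
    if return_dict:
        return {"name": name, "parameters": parameters}
    return None
```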

fit(self, X, y=None)[source]¶

Fits the PolynomialDetrender.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend.

Returns

self

Raises

ValueError – If y is None.

fit_transform(self, X, y=None)[source]¶

Removes fitted trend from target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend.

Returns

A tuple whose first element is the input features, returned without modification, and whose second element is the target variable y with the fitted trend removed.

Return type

tuple of pd.DataFrame, pd.Series
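A minimal sketch of polynomial detrending follows. It fits a least-squares polynomial of the target against an assumed equally spaced time index with numpy.polyfit, then subtracts it. It mirrors the documented (X, y) interface and its ValueError for a missing y. This is not evalml's implementation, which delegates to a third-party detrender. PolyDetrendSketch is hypothetical, and it re-indexes time from zero on every call, so it only suits transforming the same series that was fit.

```python
import numpy as np


class PolyDetrendSketch:
    """Hypothetical polynomial detrender; not evalml's implementation."""

    def __init__(self, degree=1):
        self.degree = degree
        self.coefficients_ = None

    def fit(self, X=None, y=None):
        if y is None:
            raise ValueError("y cannot be None")
        y = np.asarray(y, dtype=float)
        t = np.arange(len(y))  # assumed: observations are equally spaced in time
        # Least-squares polynomial fit of the target against its time index.
        self.coefficients_ = np.polyfit(t, y, self.degree)
        return self

    def transform(self, X=None, y=None):
        y = np.asarray(y, dtype=float)
        t = np.arange(len(y))
        # Features pass through unchanged; the fitted trend is subtracted from y.
        return X, y - np.polyval(self.coefficients_, t)

    def fit_transform(self, X=None, y=None):
        return self.fit(X, y).transform(X, y)
```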

inverse_transform(self, y)[source]¶

Adds back fitted trend to target variable.

Parameters

y (pd.Series) – Target variable.

Returns

A tuple whose first element is the input features, returned without modification, and whose second element is the target variable y with the trend added back.

Return type

tuple of pd.DataFrame, pd.Series

Raises

ValueError – If y is None.
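Adding the trend back is the exact inverse of removing it: the same fitted polynomial is evaluated at each time step and added instead of subtracted. The hypothetical helpers below (fit_trend, remove_trend, add_trend) assume equally spaced observations and use numpy.polyfit rather than evalml's internals.

```python
import numpy as np


def fit_trend(y, degree=1):
    """Hypothetical helper: polynomial coefficients of y against its time index."""
    t = np.arange(len(y))
    return np.polyfit(t, np.asarray(y, dtype=float), degree)


def remove_trend(y, coefficients):
    # Subtract the fitted trend evaluated at each time step.
    t = np.arange(len(y))
    return np.asarray(y, dtype=float) - np.polyval(coefficients, t)


def add_trend(y_detrended, coefficients):
    # Inverse of remove_trend: evaluate the same polynomial and add it back.
    t = np.arange(len(y_detrended))
    return np.asarray(y_detrended, dtype=float) + np.polyval(coefficients, t)
```

This round trip is what lets values produced on the detrended scale be mapped back to the original scale of the target.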

static load(file_path)¶

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)¶

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)¶

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]¶

Removes fitted trend from target variable.

Parameters

X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend.

Returns

A tuple of the input features, returned without modification, and the target variable y with the fitted trend removed.

Return type

tuple of pd.DataFrame, pd.Series

class evalml.pipelines.components.transformers.preprocessing.TextFeaturizer(random_seed=0, **kwargs)[source]¶

Transformer that can automatically featurize text columns using featuretools’ nlp_primitives.

Since most models cannot handle non-numeric data, text must be broken down into features that carry useful information about it. This component derives several informative features from each text column: Diversity Score, Mean Characters per Word, Polarity Score, and LSA (Latent Semantic Analysis). Calling transform on this component replaces every text column in the given dataset with these numeric columns.
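Two of the listed features are simple enough to approximate in a few lines. The sketch below assumes that Diversity Score is the ratio of distinct words to total words and that words are whitespace-delimited. nlp_primitives' actual tokenization and definitions may differ. Polarity Score and LSA are omitted. The function names and output column names are hypothetical.

```python
def text_features(text):
    """Hypothetical approximation of two text features; not nlp_primitives' code."""
    words = text.split()
    if not words:
        nan = float("nan")
        return {"diversity_score": nan, "mean_characters_per_word": nan}
    # Share of distinct (lower-cased) words among all words.
    diversity = len({w.lower() for w in words}) / len(words)
    # Average length of whitespace-delimited words, punctuation included.
    mean_chars = sum(len(w) for w in words) / len(words)
    return {"diversity_score": diversity, "mean_characters_per_word": mean_chars}


def featurize(rows, text_columns):
    """Replace each text column in a list of row dicts with numeric feature columns."""
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in text_columns}
        for col in text_columns:
            for feature, value in text_features(row[col]).items():
                new_row[f"{feature}({col})"] = value
        out.append(new_row)
    return out
```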

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	Text Featurization Component
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X by creating new features using existing text columns.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component. Defaults to False.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}. Defaults to False.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)[source]¶

Fits component to data.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y=None)¶

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)¶

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)¶

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)¶

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)[source]¶

Transforms data X by creating new features using existing text columns.

Parameters

X (pd.DataFrame) – The data to transform.
y (pd.Series, optional) – Ignored.

Returns

Transformed X

Return type

pd.DataFrame

class evalml.pipelines.components.transformers.preprocessing.TextTransformer(component_obj=None, random_seed=0, **kwargs)[source]¶

Base class for all transformers working with text features.

Parameters

component_obj (obj) – Third-party object used in the component's implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

modifies_features	True
modifies_target	False
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`name`	Returns string name of this component.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component. Defaults to False.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}. Defaults to False.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)¶

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)¶

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)¶

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

property name(cls)¶

Returns string name of this component.

needs_fitting(self)¶

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)¶

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

abstract transform(self, X, y=None)¶

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.
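The abstract-base-class pattern above can be sketched with Python's abc module. TextTransformerSketch and WordCountSketch are hypothetical and are not evalml's classes. The base class records which columns hold strings during fit and leaves transform abstract, so it cannot be instantiated directly. X is assumed to be a dict mapping column names to lists to keep the example dependency-free.

```python
from abc import ABC, abstractmethod


class TextTransformerSketch(ABC):
    """Hypothetical base class for text transformers; not evalml's implementation."""

    def __init__(self, random_seed=0):
        self.random_seed = random_seed
        self.text_columns = []

    def fit(self, X, y=None):
        # X is assumed to be a dict mapping column name -> list of values.
        self.text_columns = [
            name
            for name, values in X.items()
            if len(values) > 0 and all(isinstance(v, str) for v in values)
        ]
        return self

    @abstractmethod
    def transform(self, X, y=None):
        """Return a transformed copy of X; every subclass must implement this."""

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X, y)


class WordCountSketch(TextTransformerSketch):
    """Hypothetical subclass that replaces each text column with its word counts."""

    def transform(self, X, y=None):
        out = {name: values for name, values in X.items() if name not in self.text_columns}
        for name in self.text_columns:
            out[f"WORD_COUNT({name})"] = [len(text.split()) for text in X[name]]
        return out
```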

class evalml.pipelines.components.transformers.preprocessing.URLFeaturizer(random_seed=0, **kwargs)[source]¶

Transformer that can automatically extract features from URLs.

Parameters: random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	False
name	URL Featurizer
training_only	False

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits component to data.
`fit_transform`	Fits on X and transforms X.
`load`	Loads component at file path.
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms data X.

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component. Defaults to False.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}. Defaults to False.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)¶

Fits component to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

Raises

MethodPropertyNotFoundError – If component does not have a fit method or a component_obj that implements fit.

fit_transform(self, X, y=None)¶

Fits on X and transforms X.

Parameters

X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.

Returns

Transformed X.

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.

static load(file_path)¶

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)¶

Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters(self)¶

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)¶

Transforms data X.

Parameters

X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.

Returns

Transformed X

Return type

pd.DataFrame

Raises

MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.
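As an illustration of URL featurization, the sketch below splits each URL into a protocol, a domain, and a top-level domain using the standard library's urllib.parse.urlsplit. The library's own featurization is not reproduced here. The functions, output names, and the choice to drop a leading "www." are hypothetical. URLs without a scheme yield no hostname, and multi-part suffixes such as "co.uk" are reduced to their last label.

```python
from urllib.parse import urlsplit


def url_features(url):
    """Hypothetical per-URL features; not evalml's implementation."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lower-cased; None when no scheme/netloc
    # Drop a leading "www." so www.example.com and example.com share a domain.
    domain = host[4:] if host.startswith("www.") else host
    tld = domain.rsplit(".", 1)[-1] if "." in domain else ""
    return {"protocol": parts.scheme, "domain": domain, "tld": tld}


def featurize_url_column(name, urls):
    """Build one list-valued output column per feature for a column of URLs."""
    columns = {}
    for url in urls:
        for feature, value in url_features(url).items():
            columns.setdefault(f"{feature}({name})", []).append(value)
    return columns
```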
