transformers
Subpackages
Submodules
Package Contents
Classes Summary
Transformer that can automatically extract features from datetime columns. |
|
Transformer that delays input features and target variable for time series problems. |
|
Featuretools DFS component that generates features for the input features. |
|
Drops specified columns in input data. |
|
Transformer to drop features whose percentage of NaN values exceeds a specified threshold. |
|
Transformer that can automatically extract features from emails. |
|
Selects top features based on importance weights. |
|
Imputes missing data according to a specified imputation strategy. |
|
Reduces the number of features by using Linear Discriminant Analysis. |
|
Applies a log transformation to the target data. |
|
Transformer to calculate the Latent Semantic Analysis Values of text input. |
|
A transformer that encodes categorical features in a one-hot numeric array. |
|
Reduces the number of features by using Principal Component Analysis (PCA). |
|
Imputes missing data according to a specified imputation strategy per column. |
|
Removes trends from time series by fitting a polynomial to the data. |
|
Selects top features based on importance weights using a Random Forest classifier. |
|
Selects top features based on importance weights using a Random Forest regressor. |
|
Selects columns by specified Woodwork logical type or semantic tag in input data. |
|
Selects specified columns in input data. |
|
Imputes missing data according to a specified imputation strategy. |
|
SMOTENC Oversampler component. Uses SMOTENC to generate synthetic samples. Works on a mix of numerical and categorical columns. |
|
SMOTEN Oversampler component. Uses SMOTEN to generate synthetic samples. Works for purely categorical datasets. |
|
SMOTE Oversampler component. Works on numerical datasets only. This component is only run during training and not during predict. |
|
A transformer that standardizes input features by removing the mean and scaling to unit variance. |
|
A transformer that encodes categorical features into target encodings. |
|
Imputes missing target data according to a specified imputation strategy. |
|
Transformer that can automatically featurize text columns using featuretools’ nlp_primitives. |
|
A component that may or may not need fitting that transforms data. |
|
Initializes an undersampling transformer to downsample the majority classes in the dataset. |
|
Transformer that can automatically extract features from URLs.
Contents
-
class
evalml.pipelines.components.transformers.
DateTimeFeaturizer
(features_to_extract=None, encode_as_categories=False, date_index=None, random_seed=0, **kwargs)[source]¶ Transformer that can automatically extract features from datetime columns.
- Parameters
features_to_extract (list) – List of features to extract. Valid options include “year”, “month”, “day_of_week”, “hour”. Defaults to None.
encode_as_categories (bool) – Whether day-of-week and month features should be encoded as pandas “category” dtype. This allows OneHotEncoders to encode these features. Defaults to False.
date_index (str) – Name of the column containing the datetime information used to order the data. Ignored.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
DateTime Featurization Component
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Gets the categories of each datetime feature.
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X by creating new features using existing DateTime columns, and then dropping those DateTime columns
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
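The convention above can be illustrated with a small sketch. ToyComponent is a hypothetical class, not part of evalml, and it exposes default_parameters as a classmethod (called with parentheses) for simplicity, whereas the documented convention reads it like an attribute. The parameter names are borrowed from this page purely for illustration.

```python
import inspect


class ToyComponent:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, max_delay=2, gap=1):
        self.parameters = {"max_delay": max_delay, "gap": gap}

    @classmethod
    def default_parameters(cls):
        # Read the defaults straight from the constructor signature.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if param.default is not inspect.Parameter.empty
        }

    def clone(self):
        # A clone is a fresh instance built from the same parameters.
        return type(self)(**self.parameters)


# The defaults match the parameters of a component built with no arguments.
print(ToyComponent.default_parameters() == ToyComponent().parameters)
```

Deriving the defaults from the constructor signature keeps the two sides of the equality from drifting apart, which is presumably the point of the convention.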
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
The description as a dictionary if return_dict is True; otherwise None.
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
get_feature_names
(self)[source]¶ Gets the categories of each datetime feature.
- Returns
Dictionary, where each key-value pair is a column name and a dictionary mapping the unique feature values to their integer encoding.
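As a rough illustration of the extraction this component performs, the sketch below derives the four documented features from a datetime column using plain pandas rather than evalml itself. The extract_datetime_features helper and the "<column>_<feature>" output names are assumptions made for this example, not confirmed behavior of the component.

```python
import pandas as pd


def extract_datetime_features(X, features=("year", "month", "day_of_week", "hour")):
    """Replace each datetime column with one integer column per requested feature."""
    # Hypothetical helper: maps the documented feature names to pandas accessors.
    accessors = {
        "year": lambda s: s.dt.year,
        "month": lambda s: s.dt.month,
        "day_of_week": lambda s: s.dt.dayofweek,  # Monday == 0
        "hour": lambda s: s.dt.hour,
    }
    out = X.copy()
    for col in X.select_dtypes(include="datetime").columns:
        for feature in features:
            out[f"{col}_{feature}"] = accessors[feature](X[col])
        # The original datetime column is dropped once its features exist.
        out = out.drop(columns=col)
    return out


X = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-04 09:30", "2021-06-15 18:00"]),
    "value": [1, 2],
})
features = extract_datetime_features(X)
print(features)
```

Non-datetime columns such as "value" pass through unchanged, which mirrors the documented behavior of creating new features and then dropping the datetime columns.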
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
DelayedFeatureTransformer
(date_index=None, max_delay=2, delay_features=True, delay_target=True, gap=1, random_seed=0, **kwargs)[source]¶ Transformer that delays input features and target variable for time series problems.
- Parameters
date_index (str) – Name of the column containing the datetime information used to order the data. Ignored.
max_delay (int) – Maximum number of time units to delay each feature. Defaults to 2.
delay_features (bool) – Whether to delay the input features. Defaults to True.
delay_target (bool) – Whether to delay the target. Defaults to True.
gap (int) – The number of time units between when the features are collected and when the target is collected. For example, if you are predicting the next time step’s target, gap=1. This is only needed because when gap=0, we need to be sure to start the lagging of the target variable at 1. Defaults to 1.
random_seed (int) – Seed for the random number generator. This transformer performs the same regardless of the random seed provided.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Delayed Feature Transformer
needs_fitting
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the DelayedFeatureTransformer.
Fits on X and transforms X
Loads component at file path
Returns the parameters which were used to initialize the component
Saves component at file path
Computes the delayed features for all features in X and y.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
The description as a dictionary if return_dict is True; otherwise None.
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits the DelayedFeatureTransformer.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)[source]¶ Computes the delayed features for all features in X and y.
For each feature in X, it will add a column to the output dataframe for each delay in the (inclusive) range [1, max_delay]. The values of each delayed feature are simply the original feature shifted forward in time by the delay amount. For example, a delay of 3 units means that the feature value at row n will be taken from row n-3 of that feature.
If y is not None, it will also compute the delayed values for the target variable.
- Parameters
X (pd.DataFrame or None) – Data to transform. None is expected when only the target variable is being used.
y (pd.Series, or None) – Target.
- Returns
Transformed X.
- Return type
pd.DataFrame
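The shifting described for transform can be sketched with pandas alone. This is a minimal illustration, not the component's implementation: the delay_features helper and the "<name>_delay_<t>" column names are assumptions, and the sketch ignores gap, the delay_features and delay_target flags, and any handling of the original columns.

```python
import pandas as pd


def delay_features(X, y=None, max_delay=2):
    """Build lagged copies of every column in X (and of y, if given)."""
    # Hypothetical helper; the "<name>_delay_<t>" column names are an assumption.
    delayed = pd.DataFrame(index=X.index)
    for col in X.columns:
        for t in range(1, max_delay + 1):
            # shift(t) moves values down by t rows, so row n holds row n - t.
            delayed[f"{col}_delay_{t}"] = X[col].shift(t)
    if y is not None:
        for t in range(1, max_delay + 1):
            delayed[f"target_delay_{t}"] = y.shift(t)
    return delayed


X = pd.DataFrame({"temp": [10, 11, 12, 13]})
y = pd.Series([1, 2, 3, 4])
lagged = delay_features(X, y, max_delay=2)
print(lagged)
```

The first t rows of each delayed column are missing because there is no earlier row to take a value from.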
-
class
evalml.pipelines.components.transformers.
DFSTransformer
(index='index', random_seed=0, **kwargs)[source]¶ Featuretools DFS component that generates features for the input features.
- Parameters
index (string) – The name of the column that contains the indices. If no column with this name exists, then featuretools.EntitySet() creates a column with this name to serve as the index column. Defaults to ‘index’.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
DFS Transformer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the DFSTransformer component.
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Computes the feature matrix for the input X using featuretools’ dfs algorithm.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
The description as a dictionary if return_dict is True; otherwise None.
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits the DFSTransformer component.
- Parameters
X (pd.DataFrame, np.array) – The input data to transform, of shape [n_samples, n_features]
y (pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)[source]¶ Computes the feature matrix for the input X using featuretools’ dfs algorithm.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data to transform. Has shape [n_samples, n_features]
y (pd.Series, optional) – Ignored.
- Returns
Feature matrix
- Return type
pd.DataFrame
-
class
evalml.pipelines.components.transformers.
DropColumns
(columns=None, random_seed=0, **kwargs)[source]¶ Drops specified columns in input data.
- Parameters
columns (list(string)) – List of column names, used to determine which columns to drop.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Drop Columns Transformer
needs_fitting
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the transformer by checking if column names are present in the dataset.
Fits on X and transforms X
Loads component at file path
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X by dropping columns.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
The description as a dictionary if return_dict is True; otherwise None.
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits the transformer by checking if column names are present in the dataset.
- Parameters
X (pd.DataFrame) – Data to check.
y (pd.Series, optional) – Targets.
- Returns
self
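The check-then-drop behavior can be sketched as follows. The drop_columns helper is hypothetical, and raising ValueError for unknown column names is an assumption; this page does not say which exception the component raises.

```python
import pandas as pd


def drop_columns(X, columns):
    """Validate that every requested column exists, then drop them."""
    # Hypothetical helper; raising ValueError for unknown names is an assumption.
    missing = [col for col in columns if col not in X.columns]
    if missing:
        raise ValueError(f"Columns not found in input data: {missing}")
    return X.drop(columns=columns)


X = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
trimmed = drop_columns(X, ["b"])
print(list(trimmed.columns))
```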
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
DropNullColumns
(pct_null_threshold=1.0, random_seed=0, **kwargs)[source]¶ Transformer to drop features whose percentage of NaN values exceeds a specified threshold.
- Parameters
pct_null_threshold (float) – The percentage of NaN values in an input feature at which the feature is dropped, expressed as a fraction. Must be a value in the range [0, 1], inclusive. If equal to 0.0, will drop columns with any null values. If equal to 1.0, will drop columns with all null values. Defaults to 1.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
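To make the two documented edge cases of pct_null_threshold concrete, here is a minimal pandas sketch. The drop_null_columns helper is hypothetical, and the use of >= for thresholds other than 0.0 is an assumption chosen so that 1.0 drops only columns that are entirely null, as documented.

```python
import pandas as pd


def drop_null_columns(X, pct_null_threshold=1.0):
    """Drop columns whose fraction of missing values reaches the threshold."""
    if not 0.0 <= pct_null_threshold <= 1.0:
        raise ValueError("pct_null_threshold must be within [0, 1]")
    null_fraction = X.isnull().mean()
    if pct_null_threshold == 0.0:
        # Documented edge case: a threshold of 0.0 drops columns with any null.
        to_drop = null_fraction[null_fraction > 0].index
    else:
        # Assumption: other thresholds are compared with >=, so 1.0 matches
        # only columns that are entirely null.
        to_drop = null_fraction[null_fraction >= pct_null_threshold].index
    return X.drop(columns=to_drop)


X = pd.DataFrame({
    "full": [1.0, 2.0, 3.0, 4.0],
    "half": [1.0, None, 3.0, None],
    "empty": [None, None, None, None],
})
print(list(drop_null_columns(X, 1.0).columns))
print(list(drop_null_columns(X, 0.0).columns))
```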
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Drop Null Columns Transformer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X by dropping columns that exceed the threshold of null values.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
The description as a dictionary if return_dict is True; otherwise None.
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
EmailFeaturizer
(random_seed=0, **kwargs)[source]¶ Transformer that can automatically extract features from emails.
- Parameters
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Email Featurizer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
The description as a dictionary if return_dict is True; otherwise None.
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)¶ Transforms data X.
- Parameters
X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.
- Returns
Transformed X
- Return type
pd.DataFrame
-
class
evalml.pipelines.components.transformers.
FeatureSelector
(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]¶ Selects top features based on importance weights.
- Parameters
parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Get names of selected features.
Loads component at file path
Returns string name of this component
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
The description as a dictionary if return_dict is True; otherwise None.
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
get_names
(self)[source]¶ Get names of selected features.
- Returns
List of the names of features selected
- Return type
list[str]
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
property
name
(cls)¶ Returns string name of this component
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)[source]¶ Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.
- Parameters
X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data. Ignored.
- Returns
Transformed X
- Return type
pd.DataFrame
-
class
evalml.pipelines.components.transformers.
Imputer
(categorical_impute_strategy='most_frequent', categorical_fill_value=None, numeric_impute_strategy='mean', numeric_fill_value=None, random_seed=0, **kwargs)[source]¶ Imputes missing data according to a specified imputation strategy.
- Parameters
categorical_impute_strategy (string) – Impute strategy to use for string, object, boolean, categorical dtypes. Valid values include “most_frequent” and “constant”.
numeric_impute_strategy (string) – Impute strategy to use for numeric columns. Valid values include “mean”, “median”, “most_frequent”, and “constant”.
categorical_fill_value (string) – When categorical_impute_strategy == “constant”, categorical_fill_value is used to replace missing data. The default value of None will fill with the string “missing_value”.
numeric_fill_value (int, float) – When numeric_impute_strategy == “constant”, numeric_fill_value is used to replace missing data. The default value of None will fill with 0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
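The strategies listed above can be sketched per column with pandas. This is an illustrative simplification, not the component's implementation: the impute helper is hypothetical, it treats every non-numeric dtype as categorical, and it does not handle columns that are entirely missing.

```python
import pandas as pd


def impute(X, numeric_strategy="mean", categorical_strategy="most_frequent",
           numeric_fill_value=None, categorical_fill_value=None):
    """Fill missing values column by column, choosing the strategy by dtype."""
    out = X.copy()
    for col in out.columns:
        is_numeric = pd.api.types.is_numeric_dtype(out[col])
        strategy = numeric_strategy if is_numeric else categorical_strategy
        if strategy == "mean":
            fill = out[col].mean()
        elif strategy == "median":
            fill = out[col].median()
        elif strategy == "most_frequent":
            fill = out[col].mode().iloc[0]
        elif strategy == "constant":
            # Documented defaults: 0 for numeric, "missing_value" for categorical.
            if is_numeric:
                fill = 0 if numeric_fill_value is None else numeric_fill_value
            else:
                fill = ("missing_value" if categorical_fill_value is None
                        else categorical_fill_value)
        else:
            raise ValueError(f"Unknown strategy: {strategy}")
        out[col] = out[col].fillna(fill)
    return out


X = pd.DataFrame({
    "age": [20.0, None, 40.0],
    "color": ["red", None, "red"],
})
filled = impute(X)
constant = impute(X, numeric_strategy="constant", categorical_strategy="constant")
print(filled)
print(constant)
```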
Attributes
hyperparameter_ranges
{"categorical_impute_strategy": ["most_frequent"], "numeric_impute_strategy": ["mean", "median", "most_frequent"]}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Imputer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X by imputing missing values. ‘None’ values are converted to np.nan before imputation and are treated as the same.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
The description as a dictionary if return_dict is True; otherwise None.
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.
- Parameters
X (pd.DataFrame, np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
LinearDiscriminantAnalysis
(n_components=None, random_seed=0, **kwargs)[source]¶ Reduces the number of features by using Linear Discriminant Analysis.
- Parameters
n_components (int) – The number of features to maintain after computation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Linear Discriminant Analysis Transformer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
The description as a dictionary if return_dict is True; otherwise None.
- Return type
None or dict
-
fit
(self, X, y)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
LogTransformer
(random_seed=0)[source]¶ Applies a log transformation to the target data.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
False
modifies_target
True
name
Log Transformer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the LogTransformer.
Log transforms the target variable.
Inverts the transformation done by the transform method.
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Saves component at file path
Log transforms the target variable.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits the LogTransformer.
- Parameters
X (pd.DataFrame or np.ndarray) – Ignored.
y (pd.Series, optional) – Ignored.
- Returns
self
-
fit_transform
(self, X, y=None)[source]¶ Log transforms the target variable.
- Parameters
X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to log transform.
- Returns
- The input features are returned without modification. The target
variable y is log transformed.
- Return type
tuple of pd.DataFrame, pd.Series
-
inverse_transform
(self, y)[source]¶ Inverts the transformation done by the transform method.
- Parameters
y (pd.Series) – Target transformed by this component.
- Returns
Target without the transformation.
- Return type
pd.Series
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)[source]¶ Log transforms the target variable.
- Parameters
X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target data to log transform.
- Returns
- The input features are returned without modification. The target
variable y is log transformed.
- Return type
tuple of pd.DataFrame, pd.Series
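The transform and inverse_transform pair described for LogTransformer can be sketched in plain Python. This is an illustration under assumptions, not the library's code: it assumes the natural logarithm, strictly positive targets, and plain lists instead of pandas objects. The helper names `log_transform` and `inverse_log_transform` are hypothetical.

```python
import math


def log_transform(X, y):
    """Return X unchanged and the natural log of each target value."""
    return X, [math.log(v) for v in y]


def inverse_log_transform(y_t):
    """Undo log_transform by exponentiating each value."""
    return [math.exp(v) for v in y_t]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 10.0, 100.0]  # targets must be strictly positive for log to be defined

X_out, y_t = log_transform(X, y)
y_back = inverse_log_transform(y_t)
```

The features pass through untouched, and applying the inverse to the transformed target recovers the original values up to floating-point error.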
-
class
evalml.pipelines.components.transformers.
LSA
(random_seed=0, **kwargs)[source]¶ Transformer to calculate the Latent Semantic Analysis Values of text input.
- Parameters
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
LSA Transformer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X by applying the LSA pipeline.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)[source]¶ Transforms data X by applying the LSA pipeline.
- Parameters
X (pd.DataFrame) – The data to transform.
y (pd.Series, optional) – Ignored.
- Returns
- Transformed X. The original column is removed and replaced with two columns of the
format LSA(original_column_name)[feature_number], where feature_number is 0 or 1.
- Return type
pd.DataFrame
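The output naming described above can be sketched as follows. The helpers `lsa_column_names` and `replace_text_columns` are hypothetical, and the position of the new columns relative to the untouched ones is an assumption; only the `LSA(original_column_name)[feature_number]` format comes from the description.

```python
def lsa_column_names(original_column_name, n_features=2):
    """Build output names of the form LSA(name)[i] for i in 0..n_features-1."""
    return [f"LSA({original_column_name})[{i}]" for i in range(n_features)]


def replace_text_columns(columns, text_columns):
    """Replace each text column name with its two LSA output names."""
    out = []
    for col in columns:
        if col in text_columns:
            out.extend(lsa_column_names(col))
        else:
            out.append(col)
    return out


names = replace_text_columns(["age", "review"], text_columns={"review"})
```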
-
class
evalml.pipelines.components.transformers.
OneHotEncoder
(top_n=10, features_to_encode=None, categories=None, drop='if_binary', handle_unknown='ignore', handle_missing='error', random_seed=0, **kwargs)[source]¶ A transformer that encodes categorical features in a one-hot numeric array.
- Parameters
top_n (int) – Number of categories per column to encode. If None, all categories will be encoded. Otherwise, the n most frequent will be encoded and all others will be dropped. Defaults to 10.
features_to_encode (list[str]) – List of columns to encode. All other columns will remain untouched. If None, all appropriate columns will be encoded. Defaults to None.
categories (list) – A two dimensional list of categories, where categories[i] is a list of the categories for the column at index i. This can also be None, or “auto” if top_n is not None. Defaults to None.
drop (string, list) – Method (“first” or “if_binary”) to use to drop one category per feature. Can also be a list specifying which categories to drop for each feature. Defaults to ‘if_binary’.
handle_unknown (string) – Whether to ignore or error for unknown categories for a feature encountered during fit or transform. If either top_n or categories is used to limit the number of categories per column, this must be “ignore”. Defaults to “ignore”.
handle_missing (string) – Options for how to handle missing (NaN) values encountered during fit or transform. If this is set to “as_category” and NaN values are within the n most frequent, “nan” values will be encoded as their own column. If this is set to “error”, any missing values encountered will raise an error. Defaults to “error”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
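A minimal sketch of how top_n limits the encoded categories, written in plain Python rather than with the library. The helper names, the alphabetical ordering of the kept categories, and the tie-breaking behavior of `Counter.most_common` are assumptions for illustration; the description above only states that the n most frequent categories are encoded and the rest are dropped.

```python
from collections import Counter


def top_n_categories(values, top_n=10):
    """Return the top_n most frequent categories, or all of them if top_n is None."""
    counts = Counter(values)
    if top_n is None:
        return sorted(counts)
    return sorted(cat for cat, _ in counts.most_common(top_n))


def one_hot(values, categories):
    """Encode each value as a 0/1 row; categories that were not kept become all zeros."""
    return [[1 if v == cat else 0 for cat in categories] for v in values]


colors = ["red", "red", "red", "blue", "blue", "green"]
kept = top_n_categories(colors, top_n=2)
encoded = one_hot(colors, kept)
```

With top_n=2 the least frequent category, "green", gets no column of its own, which matches the requirement that unknown categories be ignored when top_n limits the encoding.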
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
One Hot Encoder
Methods
Returns a list of the unique categories to be encoded for the particular feature, in order.
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Return feature names for the categorical features after fitting.
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Saves component at file path
One-hot encode the input data.
-
categories
(self, feature_name)[source]¶ Returns a list of the unique categories to be encoded for the particular feature, in order.
- Parameters
feature_name (str) – the name of any feature provided to one-hot encoder during fit
- Returns
the unique categories, in the same dtype as they were provided during fit
- Return type
np.ndarray
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
get_feature_names
(self)[source]¶ Return feature names for the categorical features after fitting.
Feature names are formatted as {column name}_{category name}. In the event of a duplicate name, an integer will be added at the end of the feature name to distinguish it.
For example, consider a dataframe with a column called “A” and category “x_y” and another column called “A_x” with “y”. In this example, the feature names would be “A_x_y” and “A_x_y_1”.
- Returns
The feature names after encoding, provided in the same order as input_features.
- Return type
np.ndarray
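The naming rule and the duplicate example above can be reproduced with a short sketch. The function `encoded_feature_names` is hypothetical and takes a plain mapping of column names to categories; the exact suffix numbering beyond the documented "A_x_y_1" case is an assumption.

```python
def encoded_feature_names(columns_to_categories):
    """Build {column}_{category} names, suffixing duplicates with _1, _2, ..."""
    names = []
    seen = set()
    for column, categories in columns_to_categories.items():
        for category in categories:
            base = f"{column}_{category}"
            name = base
            suffix = 1
            # Append an increasing integer until the name is unique.
            while name in seen:
                name = f"{base}_{suffix}"
                suffix += 1
            seen.add(name)
            names.append(name)
    return names


names = encoded_feature_names({"A": ["x_y"], "A_x": ["y"]})
```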
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
PCA
(variance=0.95, n_components=None, random_seed=0, **kwargs)[source]¶ Reduces the number of features by using Principal Component Analysis (PCA).
- Parameters
variance (float) – The percentage of the original data variance that should be preserved when reducing the number of features. Defaults to 0.95.
n_components (int) – The number of features to maintain after computing SVD. Defaults to None. If set, it overrides the variance parameter.
random_seed (int) – Seed for the random number generator. Defaults to 0.
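How variance and n_components interact can be sketched without any library. The function `components_to_keep` is hypothetical, the ratios are made-up numbers, and the choice of "smallest count whose cumulative ratio reaches the target" is an assumption about the selection rule; only the precedence of n_components over variance comes from the parameter description.

```python
def components_to_keep(explained_variance_ratio, variance=0.95, n_components=None):
    """Pick how many leading components to keep."""
    if n_components is not None:
        # An explicit count takes precedence over the variance target.
        return n_components
    cumulative = 0.0
    for count, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        if cumulative >= variance:
            return count
    return len(explained_variance_ratio)


ratios = [0.5, 0.3, 0.15, 0.05]  # made-up values, sorted in descending order
by_variance = components_to_keep(ratios, variance=0.9)
by_count = components_to_keep(ratios, variance=0.9, n_components=2)
```

In this example, two components explain only 80% of the variance, so a 0.9 target needs three, while an explicit n_components=2 keeps two regardless.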
Attributes
hyperparameter_ranges
{ “variance”: Real(0.25, 1)}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
PCA Transformer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
PerColumnImputer
(impute_strategies=None, default_impute_strategy='most_frequent', random_seed=0, **kwargs)[source]¶ Imputes missing data according to a specified imputation strategy per column.
- Parameters
impute_strategies (dict) – Column and {“impute_strategy”: strategy, “fill_value”:value} pairings. Valid values for impute strategy include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types. Defaults to None, which uses “most_frequent” for all columns. When impute_strategy == “constant”, fill_value is used to replace missing data. When None, uses 0 when imputing numerical data and “missing_value” for strings or object data types.
default_impute_strategy (str) – Impute strategy to fall back on when none is provided for a certain column. Valid values include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types. Defaults to “most_frequent”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
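The shape of impute_strategies and the fallback to default_impute_strategy can be sketched in plain Python. This is an illustration under assumptions: columns are dicts of lists with None marking missing values, and `fill_value_for` and `impute_per_column` are hypothetical helpers, not library functions.

```python
from collections import Counter
from statistics import mean, median


def fill_value_for(values, impute_strategy, fill_value=None):
    """Compute the replacement for missing (None) entries in one column."""
    present = [v for v in values if v is not None]
    if impute_strategy == "mean":
        return mean(present)
    if impute_strategy == "median":
        return median(present)
    if impute_strategy == "most_frequent":
        return Counter(present).most_common(1)[0][0]
    if impute_strategy == "constant":
        if fill_value is not None:
            return fill_value
        # Documented fallbacks: 0 for numeric data, "missing_value" otherwise.
        numeric = all(isinstance(v, (int, float)) for v in present)
        return 0 if numeric else "missing_value"
    raise ValueError(f"Unknown impute_strategy: {impute_strategy}")


def impute_per_column(data, impute_strategies=None, default_impute_strategy="most_frequent"):
    """Impute each column with its own strategy, falling back to the default."""
    impute_strategies = impute_strategies or {}
    result = {}
    for column, values in data.items():
        spec = impute_strategies.get(column, {"impute_strategy": default_impute_strategy})
        fill = fill_value_for(values, spec["impute_strategy"], spec.get("fill_value"))
        result[column] = [fill if v is None else v for v in values]
    return result


data = {
    "age": [20, None, 40],
    "city": ["Oslo", "Oslo", None],
    "score": [None, 1.5, 2.5],
}
imputed = impute_per_column(
    data,
    impute_strategies={
        "age": {"impute_strategy": "mean"},
        "score": {"impute_strategy": "constant", "fill_value": -1.0},
    },
)
```

The "city" column has no entry in impute_strategies, so it falls back to "most_frequent".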
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Per Column Imputer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits imputers on input data
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms input data by imputing missing values.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits imputers on input data
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features] to fit.
y (pd.Series, optional) – The target training data of length [n_samples]. Ignored.
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)[source]¶ Transforms input data by imputing missing values.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features] to transform.
y (pd.Series, optional) – The target training data of length [n_samples]. Ignored.
- Returns
Transformed X
- Return type
pd.DataFrame
-
class
evalml.pipelines.components.transformers.
PolynomialDetrender
(degree=1, random_seed=0, **kwargs)[source]¶ Removes trends from time series by fitting a polynomial to the data.
- Parameters
degree (int) – Degree for the polynomial. If 1, linear model is fit to the data. If 2, quadratic model is fit, etc. Defaults to 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “degree”: Integer(1, 3)}
model_family
ModelFamily.NONE
modifies_features
False
modifies_target
True
name
Polynomial Detrender
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the PolynomialDetrender.
Removes fitted trend from target variable.
Adds back fitted trend to target variable.
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Saves component at file path
Removes fitted trend from target variable.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits the PolynomialDetrender.
- Parameters
X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend.
- Returns
self
-
fit_transform
(self, X, y=None)[source]¶ Removes fitted trend from target variable.
- Parameters
X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend.
- Returns
- The first element is the input features, returned without modification.
The second element is the target variable y with the fitted trend removed.
- Return type
tuple of pd.DataFrame, pd.Series
-
inverse_transform
(self, y)[source]¶ Adds back fitted trend to target variable.
- Parameters
y (pd.Series) – Target variable.
- Returns
The target variable y with the fitted trend added back.
- Return type
pd.Series
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)[source]¶ Removes fitted trend from target variable.
- Parameters
X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend.
- Returns
- The input features are returned without modification. The target
variable y is detrended.
- Return type
tuple of pd.DataFrame, pd.Series
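The fit, transform, and inverse_transform steps can be sketched for the degree=1 case with ordinary least squares. This is a sketch under assumptions, not the library's implementation: the trend is fit against the positions 0, 1, ..., n-1, plain lists stand in for pd.Series, and `fit_linear_trend`, `detrend`, and `retrend` are hypothetical names. Higher degrees would need a general polynomial fit, which is omitted here.

```python
def fit_linear_trend(y):
    """Least-squares fit of y ~ intercept + slope * t, with t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def detrend(y, intercept, slope):
    """Subtract the fitted trend from y."""
    return [v - (intercept + slope * t) for t, v in enumerate(y)]


def retrend(y_t, intercept, slope):
    """Add the fitted trend back, inverting detrend."""
    return [v + (intercept + slope * t) for t, v in enumerate(y_t)]


y = [1.0, 3.5, 5.0, 7.5, 9.0]
intercept, slope = fit_linear_trend(y)
y_detrended = detrend(y, intercept, slope)
y_restored = retrend(y_detrended, intercept, slope)
```

The detrended values are the residuals around the fitted line, and adding the trend back recovers the original series up to floating-point error.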
-
class
evalml.pipelines.components.transformers.
RFClassifierSelectFromModel
(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold=-np.inf, n_jobs=-1, random_seed=0, **kwargs)[source]¶ Selects top features based on importance weights using a Random Forest classifier.
- Parameters
number_features (int) – The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None.
n_estimators (int) – The number of trees in the forest. Defaults to 10.
max_depth (int) – Maximum tree depth for base learners. Defaults to None.
percent_features (float) – Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5.
threshold (string or float) – The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If “median”, then the threshold value is the median of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. Defaults to -np.inf.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
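The interaction of number_features, percent_features, and threshold can be sketched on a plain dictionary of importances. This follows the parameter descriptions above as written and is not the library's code: `resolve_threshold` and `select_features` are hypothetical helpers, the importances are made-up numbers, and the rounding of percent_features (truncate, but keep at least one feature) is an assumption.

```python
from statistics import mean, median


def resolve_threshold(threshold, importances):
    """Turn a float, "mean", "median", or scaled form like "1.25*mean" into a number."""
    if isinstance(threshold, str):
        scale, _, stat = threshold.rpartition("*")
        factor = float(scale) if scale else 1.0
        base = {"mean": mean, "median": median}[stat.strip()](importances)
        return factor * base
    return threshold


def select_features(importances, number_features=None, percent_features=0.5,
                    threshold=float("-inf")):
    """Return names of the selected features, most important first."""
    # Assumed rounding: truncate, but always allow at least one feature.
    max_features = max(1, int(percent_features * len(importances)))
    if number_features is not None:
        # When both limits are given, the greater one applies.
        max_features = max(max_features, number_features)
    cutoff = resolve_threshold(threshold, list(importances.values()))
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    passing = [name for name, score in ranked if score >= cutoff]
    return passing[:max_features]


importances = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}  # made-up values
top_half = select_features(importances)
above_mean = select_features(importances, number_features=3, threshold="mean")
```

With the default threshold of negative infinity every feature passes the cutoff, so the count limit alone decides the result. With threshold="mean" only two features reach the mean importance, so fewer than the allowed three are kept.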
Attributes
hyperparameter_ranges
{ “percent_features”: Real(0.01, 1), “threshold”: [“mean”, -np.inf],}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
RF Classifier Select From Model
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Get names of selected features.
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
get_names
(self)¶ Get names of selected features.
- Returns
List of the names of features selected
- Return type
list[str]
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)¶ Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.
- Parameters
X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data. Ignored.
- Returns
Transformed X
- Return type
pd.DataFrame
-
class
evalml.pipelines.components.transformers.
RFRegressorSelectFromModel
(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold=-np.inf, n_jobs=-1, random_seed=0, **kwargs)[source]¶ Selects top features based on importance weights using a Random Forest regressor.
- Parameters
number_features (int) – The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None.
n_estimators (int) – The number of trees in the forest. Defaults to 10.
max_depth (int) – Maximum tree depth for base learners. Defaults to None.
percent_features (float) – Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5.
threshold (string or float) – The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If “median”, then the threshold value is the median of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. Defaults to -np.inf.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “percent_features”: Real(0.01, 1), “threshold”: [“mean”, -np.inf],}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
RF Regressor Select From Model
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Get names of selected features.
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
get_names
(self)¶ Get names of selected features.
- Returns
List of the names of features selected
- Return type
list[str]
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
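A minimal save/load round trip, sketched with the standard pickle module rather than cloudpickle. ToyComponent is a hypothetical stand-in, and for simplicity it writes only its parameter dictionary to disk instead of the whole object.

```python
import os
import pickle
import tempfile


class ToyComponent:
    """Hypothetical stand-in; stores only a parameter dictionary."""

    def __init__(self, parameters):
        self.parameters = parameters

    def save(self, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
        # Write the parameters using the requested pickle protocol.
        with open(file_path, "wb") as f:
            pickle.dump(self.parameters, f, protocol=pickle_protocol)

    @staticmethod
    def load(file_path):
        # Rebuild a component from the stored parameters.
        with open(file_path, "rb") as f:
            return ToyComponent(pickle.load(f))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "component.pkl")
    ToyComponent({"columns": ["a", "b"]}).save(path)
    restored = ToyComponent.load(path)
```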
-
transform
(self, X, y=None)¶ Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.
- Parameters
X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data. Ignored.
- Returns
Transformed X
- Return type
pd.DataFrame
-
class
evalml.pipelines.components.transformers.
SelectByType
(column_types=None, random_seed=0, **kwargs)[source]¶ Selects columns by specified Woodwork logical type or semantic tag in input data.
- Parameters
column_types (string, ww.LogicalType, list(string), list(ww.LogicalType)) – List of Woodwork types or tags, used to determine which columns to select.
random_seed (int) – Seed for the random number generator. Defaults to 0.
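The idea of selecting by type can be sketched without Woodwork. Here a plain dict of lists stands in for the DataFrame and a second dict stands in for the type schema; select_by_type and both dicts are hypothetical and only illustrate the filtering step.

```python
def select_by_type(table, column_types, selected_types):
    """Keep the columns whose declared type is in selected_types.

    table: dict of column name -> list of values (stand-in for a DataFrame)
    column_types: dict of column name -> type or tag name (stand-in for a schema)
    selected_types: a single type name or a list of type names
    """
    if isinstance(selected_types, str):
        selected_types = [selected_types]
    wanted = set(selected_types)
    return {
        name: values
        for name, values in table.items()
        if column_types[name] in wanted
    }


table = {"age": [31, 45], "city": ["Oslo", "Lima"], "income": [52.0, 61.5]}
types = {"age": "Integer", "city": "Categorical", "income": "Double"}
numeric_only = select_by_type(table, types, ["Integer", "Double"])
```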
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Select Columns By Type Transformer
needs_fitting
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the transformer by checking if column names are present in the dataset.
Fits on X and transforms X
Loads component at file path
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits the transformer by checking if column names are present in the dataset.
- Parameters
X (pd.DataFrame) – Data to check.
y (pd.Series, optional) – Targets.
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
SelectColumns
(columns=None, random_seed=0, **kwargs)[source]¶ Selects specified columns in input data.
- Parameters
columns (list(string)) – List of column names, used to determine which columns to select.
random_seed (int) – Seed for the random number generator. Defaults to 0.
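A small sketch of the fit-then-select behavior, using a dict of lists in place of a DataFrame. ToySelectColumns is hypothetical, and raising ValueError for a missing column is an assumption made for this example.

```python
class ToySelectColumns:
    """Hypothetical stand-in operating on a dict of column name -> values."""

    def __init__(self, columns=None):
        self.columns = list(columns or [])

    def fit(self, table, y=None):
        # Fitting only verifies that every requested column exists.
        missing = [c for c in self.columns if c not in table]
        if missing:
            raise ValueError(f"Columns not found in input data: {missing}")
        return self

    def transform(self, table, y=None):
        # Keep the requested columns, in the requested order.
        return {c: table[c] for c in self.columns}


table = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
selected = ToySelectColumns(columns=["c", "a"]).fit(table).transform(table)
```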
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Select Columns Transformer
needs_fitting
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the transformer by checking if column names are present in the dataset.
Fits on X and transforms X
Loads component at file path
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X by selecting columns.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits the transformer by checking if column names are present in the dataset.
- Parameters
X (pd.DataFrame) – Data to check.
y (pd.Series, optional) – Targets.
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
SimpleImputer
(impute_strategy='most_frequent', fill_value=None, random_seed=0, **kwargs)[source]¶ Imputes missing data according to a specified imputation strategy.
- Parameters
impute_strategy (string) – Impute strategy to use. Valid values include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types.
fill_value (string) – When impute_strategy == “constant”, fill_value is used to replace missing data. Defaults to 0 when imputing numerical data and “missing_value” for strings or object data types.
random_seed (int) – Seed for the random number generator. Defaults to 0.
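The strategies listed above can be sketched for a single column with the standard library. impute_column and is_missing are hypothetical helpers; they mirror the documented defaults (0 for numerical data, "missing_value" otherwise) but are not the library's implementation.

```python
import math
import statistics


def is_missing(value):
    # None and NaN are treated as the same kind of missing value.
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_column(values, impute_strategy="most_frequent", fill_value=None):
    observed = [v for v in values if not is_missing(v)]
    if impute_strategy == "mean":
        replacement = statistics.mean(observed)
    elif impute_strategy == "median":
        replacement = statistics.median(observed)
    elif impute_strategy == "most_frequent":
        replacement = statistics.mode(observed)
    elif impute_strategy == "constant":
        if fill_value is not None:
            replacement = fill_value
        else:
            # Documented defaults: 0 for numbers, "missing_value" otherwise.
            numeric = all(isinstance(v, (int, float)) for v in observed)
            replacement = 0 if numeric else "missing_value"
    else:
        raise ValueError(f"Unknown impute_strategy: {impute_strategy}")
    return [replacement if is_missing(v) else v for v in values]


filled = impute_column([1.0, None, 3.0], impute_strategy="mean")
```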
Attributes
hyperparameter_ranges
{ “impute_strategy”: [“mean”, “median”, “most_frequent”]}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Simple Imputer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms input by imputing missing values. ‘None’ and np.nan values are treated as the same.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.
- Parameters
X (pd.DataFrame or np.ndarray) – the input training data of shape [n_samples, n_features]
y (pd.Series, optional) – the target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series, optional) – Target data.
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
SMOTENCSampler
(sampling_ratio=0.25, k_neighbors_default=5, n_jobs=-1, random_seed=0, **kwargs)[source]¶ SMOTENC Oversampler component. Uses SMOTENC to generate synthetic samples. Works on a mix of numerical and categorical columns. Input data must be Woodwork type, and this component is only run during training and not during predict.
- Parameters
sampling_ratio (float) – The goal ratio of the minority to majority class, with range (0, 1]. A value of 0.25 means we want a 1:4 ratio of the minority to majority class after oversampling. A sampling dictionary is created from this ratio, with the keys corresponding to the classes and the values corresponding to the number of samples. Defaults to 0.25.
k_neighbors_default (int) – The number of nearest neighbors used to construct synthetic samples. This is the default value used, but the actual k_neighbors value might be smaller if there are fewer samples. Defaults to 5.
n_jobs (int) – The number of CPU cores to use. Defaults to -1.
random_seed (int) – The seed to use for random sampling. Defaults to 0.
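One plausible reading of how sampling_ratio turns into a sampling dictionary is sketched below. The helper sampling_dictionary is hypothetical, and the rounding and the rule that already large classes keep their size are assumptions, not confirmed library behavior.

```python
from collections import Counter


def sampling_dictionary(y, sampling_ratio=0.25):
    """Map each class label to its target sample count after oversampling."""
    counts = Counter(y)
    majority_count = max(counts.values())
    # Number of samples a minority class needs to reach the goal ratio.
    goal = int(majority_count * sampling_ratio)
    # Classes already at or above the goal keep their current size.
    return {label: max(count, goal) for label, count in counts.items()}


y = ["neg"] * 100 + ["pos"] * 10
targets = sampling_dictionary(y, sampling_ratio=0.25)
```

With 100 majority and 10 minority samples, a ratio of 0.25 asks for 25 minority samples, which is the 1:4 ratio described above.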
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
True
name
SMOTENC Oversampler
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the Oversampler to the data.
Fit and transform the data using the data sampler. Used during training of the pipeline
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
No transformation needs to be done here.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y)[source]¶ Fits the Oversampler to the data.
- Parameters
X (pd.DataFrame) – Training features
y (pd.Series) – Target features
- Returns
self
-
fit_transform
(self, X, y)¶ Fit and transform the data using the data sampler. Used during training of the pipeline
- Parameters
X (pd.DataFrame) – Training features
y (pd.Series) – Target features
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)¶ No transformation needs to be done here.
- Parameters
X (pd.DataFrame) – Training features. Ignored.
y (pd.Series) – Target features. Ignored.
- Returns
X and y data that was passed in.
- Return type
pd.DataFrame, pd.Series
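The split between training-time resampling and a pass-through transform can be sketched as follows. ToyOversampler is hypothetical and simply duplicates minority rows until the classes are balanced; it does not generate synthetic samples the way SMOTENC does.

```python
from collections import Counter


class ToyOversampler:
    """Hypothetical stand-in: duplicates minority rows instead of running SMOTE."""

    def fit_transform(self, X, y):
        # Training path: grow every class to the size of the largest one.
        counts = Counter(y)
        majority = max(counts.values())
        X_out, y_out = list(X), list(y)
        for label, count in counts.items():
            rows = [row for row, target in zip(X, y) if target == label]
            for i in range(majority - count):
                X_out.append(rows[i % len(rows)])
                y_out.append(label)
        return X_out, y_out

    def transform(self, X, y=None):
        # Prediction path: data passes through unchanged.
        return X, y


X = [[0], [1], [2], [3]]
y = [0, 0, 0, 1]
X_train, y_train = ToyOversampler().fit_transform(X, y)
```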
-
class
evalml.pipelines.components.transformers.
SMOTENSampler
(sampling_ratio=0.25, k_neighbors_default=5, n_jobs=-1, random_seed=0, **kwargs)[source]¶ SMOTEN Oversampler component. Uses SMOTEN to generate synthetic samples. Works for purely categorical datasets. This component is only run during training and not during predict.
- Parameters
sampling_ratio (float) – The goal ratio of the minority to majority class, with range (0, 1]. A value of 0.25 means we want a 1:4 ratio of the minority to majority class after oversampling. A sampling dictionary is created from this ratio, with the keys corresponding to the classes and the values corresponding to the number of samples. Defaults to 0.25.
k_neighbors_default (int) – The number of nearest neighbors used to construct synthetic samples. This is the default value used, but the actual k_neighbors value might be smaller if there are fewer samples. Defaults to 5.
n_jobs (int) – The number of CPU cores to use. Defaults to -1.
random_seed (int) – The seed to use for random sampling. Defaults to 0.
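Why the effective k_neighbors can end up below k_neighbors_default is sketched here. The helper effective_k_neighbors and the exact capping rule (smallest class size minus one, never below 1) are assumptions for illustration.

```python
from collections import Counter


def effective_k_neighbors(y, k_neighbors_default=5):
    """Cap the neighbor count by what the smallest class can provide."""
    smallest_class = min(Counter(y).values())
    # A sample cannot be its own neighbor, so at most smallest_class - 1
    # neighbors are available; keep at least one neighbor.
    return max(1, min(k_neighbors_default, smallest_class - 1))


k_small = effective_k_neighbors(["a"] * 50 + ["b"] * 4)
k_large = effective_k_neighbors(["a"] * 50 + ["b"] * 20)
```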
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
True
name
SMOTEN Oversampler
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the Oversampler to the data.
Fit and transform the data using the data sampler. Used during training of the pipeline
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
No transformation needs to be done here.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y)¶ Fits the Oversampler to the data.
- Parameters
X (pd.DataFrame) – Training features
y (pd.Series) – Target features
- Returns
self
-
fit_transform
(self, X, y)¶ Fit and transform the data using the data sampler. Used during training of the pipeline
- Parameters
X (pd.DataFrame) – Training features
y (pd.Series) – Target features
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)¶ No transformation needs to be done here.
- Parameters
X (pd.DataFrame) – Training features. Ignored.
y (pd.Series) – Target features. Ignored.
- Returns
X and y data that was passed in.
- Return type
pd.DataFrame, pd.Series
-
class
evalml.pipelines.components.transformers.
SMOTESampler
(sampling_ratio=0.25, k_neighbors_default=5, n_jobs=-1, random_seed=0, **kwargs)[source]¶ SMOTE Oversampler component. Works on numerical datasets only. This component is only run during training and not during predict.
- Parameters
sampling_ratio (float) – The goal ratio of the minority to majority class, with range (0, 1]. A value of 0.25 means we want a 1:4 ratio of the minority to majority class after oversampling. A sampling dictionary is created from this ratio, with the keys corresponding to the classes and the values corresponding to the number of samples. Defaults to 0.25.
k_neighbors_default (int) – The number of nearest neighbors used to construct synthetic samples. This is the default value used, but the actual k_neighbors value might be smaller if there are fewer samples. Defaults to 5.
n_jobs (int) – The number of CPU cores to use. Defaults to -1.
random_seed (int) – The seed to use for random sampling. Defaults to 0.
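The core SMOTE step, placing a synthetic point on the segment between a minority sample and one of its nearest neighbors, can be sketched in a few lines. It also shows why the method needs numerical features: the interpolation relies on arithmetic. smote_point is a hypothetical helper, and neighbor search is omitted.

```python
import random


def smote_point(sample, neighbor, rng):
    """Interpolate a synthetic point between a sample and one neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]


rng = random.Random(0)
synthetic = smote_point([1.0, 2.0], [3.0, 6.0], rng)
```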
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
True
name
SMOTE Oversampler
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the Oversampler to the data.
Fit and transform the data using the data sampler. Used during training of the pipeline
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
No transformation needs to be done here.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y)¶ Fits the Oversampler to the data.
- Parameters
X (pd.DataFrame) – Training features
y (pd.Series) – Target features
- Returns
self
-
fit_transform
(self, X, y)¶ Fit and transform the data using the data sampler. Used during training of the pipeline
- Parameters
X (pd.DataFrame) – Training features
y (pd.Series) – Target features
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)¶ No transformation needs to be done here.
- Parameters
X (pd.DataFrame) – Training features. Ignored.
y (pd.Series) – Target features. Ignored.
- Returns
X and y data that was passed in.
- Return type
pd.DataFrame, pd.Series
-
class
evalml.pipelines.components.transformers.
StandardScaler
(random_seed=0, **kwargs)[source]¶ A transformer that standardizes input features by removing the mean and scaling to unit variance.
- Parameters
random_seed (int) – Seed for the random number generator. Defaults to 0.
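Standardization of one column can be sketched with the standard library. The helper standardize is hypothetical; it uses the population standard deviation and maps a constant column to zeros, both of which are assumptions about the intended convention.

```python
import statistics


def standardize(column):
    """Remove the mean and scale a numeric column to unit variance."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)  # population standard deviation
    if std == 0:
        # A constant column has no spread; map every value to 0.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


scaled = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

For this input the mean is 5 and the standard deviation is 2, so the value 9.0 maps to 2.0.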
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Standard Scaler
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
TargetEncoder
(cols=None, smoothing=1.0, handle_unknown='value', handle_missing='value', random_seed=0, **kwargs)[source]¶ A transformer that encodes categorical features into target encodings.
- Parameters
cols (list) – Columns to encode. If None, all string columns will be encoded, otherwise only the columns provided will be encoded. Defaults to None
smoothing (float) – The smoothing factor to apply. The larger this value is, the more influence the expected target value has on the resulting target encodings. Must be strictly larger than 0. Defaults to 1.0
handle_unknown (string) – Determines how to handle unknown categories for a feature encountered. Options are ‘value’, ‘error’, and ‘return_nan’. Defaults to ‘value’, which replaces with the target mean
handle_missing (string) – Determines how to handle missing values encountered during fit or transform. Options are ‘value’, ‘error’, and ‘return_nan’. Defaults to ‘value’, which replaces with the target mean
random_seed (int) – Seed for the random number generator. Defaults to 0.
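The effect of smoothing can be sketched with one common sigmoid-weighted formulation of target encoding. Both helper functions and the min_samples_leaf argument are assumptions made for this example and are not documented parameters of this component.

```python
import math
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=1.0, min_samples_leaf=1):
    """Blend each category's target mean with the overall target mean."""
    prior = sum(targets) / len(targets)  # expected target value
    sums, counts = defaultdict(float), defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    encoding = {}
    for category, n in counts.items():
        # Larger smoothing pushes the weight toward 0.5, so the prior
        # keeps more influence on the encoded value.
        weight = 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))
        encoding[category] = prior * (1.0 - weight) + (sums[category] / n) * weight
    return encoding, prior


def transform_target_encoding(categories, encoding, prior):
    # Unknown categories fall back to the target mean ('value' behavior).
    return [encoding.get(category, prior) for category in categories]


cats = ["a", "a", "a", "b"]
targets = [1, 1, 0, 0]
encoding, prior = fit_target_encoding(cats, targets, smoothing=1.0)
```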
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Target Encoder
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Return feature names for the input features after fitting.
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
get_feature_names
(self)[source]¶ Return feature names for the input features after fitting.
- Returns
The feature names after encoding
- Return type
np.array
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
TargetImputer
(impute_strategy='most_frequent', fill_value=None, random_seed=0, **kwargs)[source]¶ Imputes missing target data according to a specified imputation strategy.
- Parameters
impute_strategy (string) – Impute strategy to use. Valid values include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types. Defaults to “most_frequent”.
fill_value (string) – When impute_strategy == “constant”, fill_value is used to replace missing data. Defaults to None which uses 0 when imputing numerical data and “missing_value” for strings or object data types.
random_seed (int) – Seed for the random number generator. Defaults to 0.
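Unlike a feature imputer, this component leaves X alone and returns it alongside the imputed target. A rough sketch of that contract follows; impute_target is a hypothetical function and plain lists stand in for pandas objects.

```python
import math
import statistics


def impute_target(X, y, impute_strategy="most_frequent", fill_value=None):
    """Return X untouched together with y where missing entries are filled."""

    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    observed = [value for value in y if not is_missing(value)]
    if impute_strategy == "constant":
        # Documented defaults: 0 for numbers, "missing_value" otherwise.
        numeric = all(isinstance(value, (int, float)) for value in observed)
        default_fill = 0 if numeric else "missing_value"
        replacement = default_fill if fill_value is None else fill_value
    else:
        statistic = {
            "mean": statistics.mean,
            "median": statistics.median,
            "most_frequent": statistics.mode,
        }[impute_strategy]
        replacement = statistic(observed)
    return X, [replacement if is_missing(value) else value for value in y]


features = [[1, 2], [3, 4], [5, 6]]
same_features, filled_target = impute_target(features, ["yes", None, "yes"])
```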
Attributes
hyperparameter_ranges
{ “impute_strategy”: [“mean”, “median”, “most_frequent”]}
model_family
ModelFamily.NONE
modifies_features
False
modifies_target
True
name
Target Imputer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits imputer to target data. ‘None’ values are converted to np.nan before imputation and are treated as the same.
Fits on and transforms the input target data.
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms input target data by imputing missing values. ‘None’ and np.nan values are treated as the same.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y)[source]¶ Fits imputer to target data. ‘None’ values are converted to np.nan before imputation and are treated as the same.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]. Ignored.
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
-
fit_transform
(self, X, y)[source]¶ Fits on and transforms the input target data.
- Parameters
X (pd.DataFrame) – Features. Ignored.
y (pd.Series) – Target data to impute.
- Returns
The original X, transformed y
- Return type
(pd.DataFrame, pd.Series)
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y)[source]¶ Transforms input target data by imputing missing values. ‘None’ and np.nan values are treated as the same.
- Parameters
X (pd.DataFrame) – Features. Ignored.
y (pd.Series) – Target data to impute.
- Returns
The original X, transformed y
- Return type
(pd.DataFrame, pd.Series)
-
class
evalml.pipelines.components.transformers.
TextFeaturizer
(random_seed=0, **kwargs)[source]¶ Transformer that can automatically featurize text columns using featuretools’ nlp_primitives.
Since models cannot handle non-numeric data, any text must be broken down into features that provide useful information about that text. This component splits each text column into several informative features: Diversity Score, Mean Characters per Word, Polarity Score, and LSA (Latent Semantic Analysis). Calling transform on this component will replace any text columns in the given dataset with these numeric columns.
- Parameters
random_seed (int) – Seed for the random number generator. Defaults to 0.
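Two of the features named above can be approximated in plain Python to show how a text column becomes numeric columns. These definitions, and the generated column names, are simplified assumptions; the nlp_primitives versions may tokenize and normalize differently.

```python
def mean_characters_per_word(text):
    """Average word length, splitting on whitespace."""
    words = text.split()
    return sum(len(word) for word in words) / len(words) if words else 0.0


def diversity_score(text):
    """Share of distinct words among all words (case-insensitive)."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


def featurize_text_column(name, texts):
    # Replace one text column with numeric feature columns.
    return {
        f"MEAN_CHARACTERS_PER_WORD({name})": [mean_characters_per_word(t) for t in texts],
        f"DIVERSITY_SCORE({name})": [diversity_score(t) for t in texts],
    }


features = featurize_text_column("review", ["red green", "the cat the hat"])
```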
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Text Featurization Component
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X by creating new features using existing text columns
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
Transformer
(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]¶ A component that may or may not need fitting that transforms data. These components are used before an estimator.
To implement a new Transformer, define your own class which is a subclass of Transformer, including a name and a list of acceptable ranges for any parameters to be tuned during the automl search (hyperparameters). Define an __init__ method which sets up any necessary state and objects. Make sure your __init__ only uses standard keyword arguments and calls super().__init__() with a parameters dict. You may also override the fit, transform, fit_transform and other methods in this class if appropriate.
To see some examples, check out the definitions of any Transformer component.
- Parameters
parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
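The subclassing recipe described above might look roughly like the following. To keep the sketch runnable without evalml installed, it defines a simplified stand-in `Transformer` base class that only mirrors the documented constructor signature; it is not evalml's class, and with evalml available the real base class would be imported instead. `ClipValues` and its `lower`/`upper` parameters are hypothetical, and plain lists stand in for the pd.DataFrame that the real methods work with.

```python
class Transformer:
    """Simplified stand-in for the documented base class (NOT evalml's implementation)."""

    name = None
    hyperparameter_ranges = {}

    def __init__(self, parameters=None, component_obj=None, random_seed=0, **kwargs):
        self._parameters = dict(parameters or {})
        self._component_obj = component_obj
        self.random_seed = random_seed

    @property
    def parameters(self):
        return dict(self._parameters)

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        raise NotImplementedError

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X, y)


class ClipValues(Transformer):
    """Hypothetical transformer that clips every value to [lower, upper]."""

    name = "Clip Values"
    hyperparameter_ranges = {}  # nothing to tune in this sketch

    def __init__(self, lower=0.0, upper=1.0, random_seed=0, **kwargs):
        # Only standard keyword arguments, collected into a parameters dict.
        parameters = {"lower": lower, "upper": upper}
        parameters.update(kwargs)
        super().__init__(parameters=parameters, component_obj=None, random_seed=random_seed)

    def transform(self, X, y=None):
        lower = self.parameters["lower"]
        upper = self.parameters["upper"]
        return [[min(max(value, lower), upper) for value in row] for row in X]


clipper = ClipValues(lower=0.0, upper=10.0)
result = clipper.fit_transform([[-5.0, 3.0], [12.0, 7.5]])
print(result)
```

Passing every constructor argument through the parameters dict is what lets the parameters property report exactly how the component was configured.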
Attributes
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Loads component at file path
Returns string name of this component
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
property
name
(cls)¶ Returns string name of this component
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.transformers.
Undersampler
(sampling_ratio=0.25, sampling_ratio_dict=None, min_samples=100, min_percentage=0.1, random_seed=0, **kwargs)[source]¶ Initializes an undersampling transformer to downsample the majority classes in the dataset.
This component is only run during training and not during predict.
- Parameters
sampling_ratio (float) – The smallest minority:majority ratio that is accepted as ‘balanced’. For instance, a 1:4 ratio would be represented as 0.25, while a 1:1 ratio is 1.0. Must be between 0 and 1, inclusive. Defaults to 0.25.
sampling_ratio_dict (dict) – A dictionary specifying the desired balanced ratio for each target value. For instance, in a binary case where class 1 is the minority, we could specify: sampling_ratio_dict={0: 0.5, 1: 1}, which means we would undersample class 0 to have twice the number of samples as class 1 (minority:majority ratio = 0.5), and leave class 1 unchanged. Overrides sampling_ratio if provided. Defaults to None.
min_samples (int) – The minimum number of samples required for any class, before or after sampling. A class that is downsampled will not be reduced below this value. The data is treated as severely imbalanced when the minority class has fewer samples than this and its share of the dataset is below min_percentage. Must be greater than 0. Defaults to 100.
min_percentage (float) – The minimum fraction of the total dataset that the minority class may represent; a smaller fraction is still tolerated when the class has at least min_samples samples. If the minority class falls below both min_percentage and min_samples, the data is treated as severely imbalanced and is not resampled. Must be between 0 and 0.5, inclusive. Defaults to 0.1.
random_seed (int) – The seed to use for random sampling. Defaults to 0.
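The interaction between sampling_ratio, min_samples and min_percentage is easier to follow with numbers. The sketch below is an interpretation of the parameter descriptions above, not evalml's implementation: it works on class counts only, and the function name `target_class_counts` is hypothetical.

```python
def target_class_counts(counts, sampling_ratio=0.25, min_samples=100, min_percentage=0.1):
    """Hypothetical helper: class counts after undersampling, per the documented rules."""
    total = sum(counts.values())
    minority = min(counts.values())

    # Already balanced: every minority:class ratio meets the accepted ratio.
    if all(minority / n >= sampling_ratio for n in counts.values()):
        return dict(counts)

    # Severely imbalanced: minority is below both thresholds, so do not resample.
    if minority < min_samples and minority / total < min_percentage:
        return dict(counts)

    # Largest class size that still satisfies the ratio, never below min_samples.
    goal = max(int(minority / sampling_ratio), min_samples)
    return {label: min(n, goal) for label, n in counts.items()}


print(target_class_counts({0: 1000, 1: 100}))  # majority class is reduced
print(target_class_counts({0: 1000, 1: 50}))   # severely imbalanced, unchanged
print(target_class_counts({0: 300, 1: 100}))   # already balanced, unchanged
```

With the defaults, a 1000:100 split is reduced to 400:100, because 100 / 0.25 = 400 majority samples gives exactly the 1:4 ratio that sampling_ratio=0.25 accepts as balanced.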
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
True
name
Undersampler
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the undersampler. Since the sampler does not need to be fit, this method does nothing.
Fit and transform the data using the undersampler. Used during training of the pipeline
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
No transformation needs to be done here.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y)¶ Fits the undersampler. Since the sampler does not need to be fit, this method does nothing.
- Parameters
X (pd.DataFrame) – Training features
y (pd.Series) – Target features
- Returns
self
-
fit_transform
(self, X, y)[source]¶ Fit and transform the data using the undersampler. Used during training of the pipeline
- Parameters
X (pd.DataFrame) – Training features
y (pd.Series) – Target features
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)¶ No transformation needs to be done here.
- Parameters
X (pd.DataFrame) – Training features. Ignored.
y (pd.Series) – Target features. Ignored.
- Returns
X and y data that was passed in.
- Return type
pd.DataFrame, pd.Series
-
class
evalml.pipelines.components.transformers.
URLFeaturizer
(random_seed=0, **kwargs)[source]¶ Transformer that can automatically extract features from URLs.
- Parameters
random_seed (int) – Seed for the random number generator. Defaults to 0.
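This section does not list which features are extracted from each URL. As an illustration only, the sketch below pulls out three plausible ones (protocol, domain, and top-level domain) with Python's standard urllib.parse; the function name `url_features` and the choice of features are assumptions, not evalml's documented behavior.

```python
from urllib.parse import urlparse


def url_features(url):
    """Hypothetical helper: split a URL into a few simple categorical features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Naive top-level domain: the text after the last dot in the host name.
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    return {"protocol": parsed.scheme, "domain": host, "tld": tld}


features = url_features("https://www.example.com/docs?page=2")
print(features)
```

The naive top-level-domain rule does not handle multi-part suffixes such as co.uk; a real featurizer would need a more careful approach.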
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
URL Featurizer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)¶ Transforms data X.
- Parameters
X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data.
- Returns
Transformed X
- Return type
pd.DataFrame