onehot_encoder
A transformer that encodes categorical features in a one-hot numeric array.
Module Contents
Classes Summary
OneHotEncoder | A transformer that encodes categorical features in a one-hot numeric array.
OneHotEncoderMeta | A version of the ComponentBaseMeta class which includes validation on an additional one-hot-encoder-specific method categories.
Contents
- class evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder(top_n=10, features_to_encode=None, categories=None, drop='if_binary', handle_unknown='ignore', handle_missing='error', random_seed=0, **kwargs)
A transformer that encodes categorical features in a one-hot numeric array.
- Parameters
top_n (int) – Number of categories per column to encode. If None, all categories will be encoded. Otherwise, the n most frequent will be encoded and all others will be dropped. Defaults to 10.
features_to_encode (list[str]) – List of columns to encode. All other columns will remain untouched. If None, all appropriate columns will be encoded. Defaults to None.
categories (list) – A two dimensional list of categories, where categories[i] is a list of the categories for the column at index i. This can also be None, or “auto” if top_n is not None. Defaults to None.
drop (string, list) – Method (“first” or “if_binary”) to use to drop one category per feature. Can also be a list specifying which categories to drop for each feature. Defaults to ‘if_binary’.
handle_unknown (string) – Whether to ignore unknown categories encountered during fit or transform, or to raise an error for them. If either top_n or categories is used to limit the number of categories per column, this must be “ignore”. Defaults to “ignore”.
handle_missing (string) – Options for how to handle missing (NaN) values encountered during fit or transform. If this is set to “as_category” and NaN values are within the n most frequent, “nan” values will be encoded as their own column. If this is set to “error”, any missing values encountered will raise an error. Defaults to “error”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
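The top_n behaviour described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names top_n_categories and one_hot are hypothetical, tie-breaking among equally frequent categories is left to Counter.most_common, and sorting the kept categories is an assumption made for readability.

```python
from collections import Counter


def top_n_categories(values, top_n=10):
    """Return the categories to encode (hypothetical helper, not evalml API)."""
    counts = Counter(values)
    if top_n is None or len(counts) <= top_n:
        # No limit applies: every observed category is encoded.
        return sorted(counts)
    # Keep only the top_n most frequent categories; the rest are dropped.
    return sorted(category for category, _ in counts.most_common(top_n))


def one_hot(values, categories):
    """Encode each value as a 0/1 row with one position per kept category."""
    return [[1 if value == category else 0 for category in categories]
            for value in values]


colors = ["red", "blue", "red", "green", "blue", "red"]
kept = top_n_categories(colors, top_n=2)
encoded = one_hot(colors, kept)
```

In this sketch a value whose category was dropped (here "green") becomes an all-zero row, which mirrors why handle_unknown must be “ignore” when the number of categories is limited.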
Attributes
hyperparameter_ranges | {}
modifies_features | True
modifies_target | False
name | One Hot Encoder
training_only | False
Methods
categories | Returns a list of the unique categories to be encoded for the particular feature, in order.
clone | Constructs a new component with the same parameters and random state.
default_parameters | Returns the default parameters for this component.
describe | Describe a component and its parameters.
fit | Fits the one-hot encoder component.
fit_transform | Fits on X and transforms X.
get_feature_names | Return feature names for the categorical features after fitting.
load | Loads component at file path.
needs_fitting | Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
parameters | Returns the parameters which were used to initialize the component.
save | Saves component at file path.
transform | One-hot encode the input data.
update_parameters | Updates the parameter dictionary of the component.
- categories(self, feature_name)
Returns a list of the unique categories to be encoded for the particular feature, in order.
- Parameters
feature_name (str) – The name of any feature provided to one-hot encoder during fit.
- Returns
The unique categories, in the same dtype as they were provided during fit.
- Return type
np.ndarray
- Raises
ValueError – If feature was not provided to one-hot encoder as a training feature.
- clone(self)
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
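One way to picture clone is as a fresh instance rebuilt from the stored parameters and seed. The sketch below assumes a hypothetical ToyComponent class and is not the library's code; that the copy starts unfitted is a property of this sketch only.

```python
class ToyComponent:
    """Hypothetical stand-in for a component; not part of evalml."""

    def __init__(self, top_n=10, random_seed=0):
        self.parameters = {"top_n": top_n}
        self.random_seed = random_seed
        self._is_fitted = False

    def clone(self):
        # Rebuild from the constructor arguments so that no fitted state
        # is carried over into the new instance.
        return self.__class__(**self.parameters, random_seed=self.random_seed)


original = ToyComponent(top_n=3, random_seed=42)
original._is_fitted = True  # pretend the original has been fitted
copy_of_original = original.clone()
```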
- default_parameters(cls)
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
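The convention stated above (default parameters equal the parameters of a default-constructed instance) can be sketched with a hypothetical ToyComponent. The sketch exposes default_parameters as a classmethod that must be called, which is an assumption made for simplicity; the convention as written suggests property-style access in the library.

```python
class ToyComponent:
    """Hypothetical stand-in for a component; not part of evalml."""

    def __init__(self, top_n=10, drop="if_binary"):
        self.parameters = {"top_n": top_n, "drop": drop}

    @classmethod
    def default_parameters(cls):
        # Defaults are whatever a no-argument instance ends up storing.
        return cls().parameters


defaults = ToyComponent.default_parameters()
```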
- describe(self, print_name=False, return_dict=False)
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}.
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- fit(self, X, y=None)
Fits the one-hot encoder component.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
- Raises
ValueError – If encoding a column failed.
- fit_transform(self, X, y=None)
Fits on X and transforms X.
- Parameters
X (pd.DataFrame) – Data to fit and transform.
y (pd.Series) – Target data.
- Returns
Transformed X.
- Return type
pd.DataFrame
- Raises
MethodPropertyNotFoundError – If transformer does not have a transform method or a component_obj that implements transform.
- get_feature_names(self)
Return feature names for the categorical features after fitting.
Feature names are formatted as {column name}_{category name}. In the event of a duplicate name, an integer will be added at the end of the feature name to distinguish it.
For example, consider a dataframe with a column called “A” and category “x_y” and another column called “A_x” with “y”. In this example, the feature names would be “A_x_y” and “A_x_y_1”.
- Returns
The feature names after encoding, provided in the same order as input_features.
- Return type
np.ndarray
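The naming rule for get_feature_names, including the duplicate case from the example above, can be sketched as follows. The function make_feature_names is hypothetical and only reproduces the documented {column name}_{category name} format with an integer suffix on collisions; the library's actual implementation may differ in its details.

```python
def make_feature_names(categories_by_column):
    """Build '{column}_{category}' names, suffixing an integer on duplicates.

    Hypothetical helper for illustration; not part of evalml.
    """
    names = []
    seen = set()
    for column, categories in categories_by_column.items():
        for category in categories:
            base = f"{column}_{category}"
            name = base
            suffix = 1
            # Append _1, _2, ... until the name no longer collides.
            while name in seen:
                name = f"{base}_{suffix}"
                suffix += 1
            seen.add(name)
            names.append(name)
    return names


# Column "A" with category "x_y", and column "A_x" with category "y".
names = make_feature_names({"A": ["x_y"], "A_x": ["y"]})
```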
- static load(file_path)
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- needs_fitting(self)
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
- property parameters(self)
Returns the parameters which were used to initialize the component.
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- transform(self, X, y=None)
One-hot encode the input data.
- Parameters
X (pd.DataFrame) – Features to one-hot encode.
y (pd.Series) – Ignored.
- Returns
Transformed data, where each categorical feature has been encoded into numerical columns using one-hot encoding.
- Return type
pd.DataFrame
- update_parameters(self, update_dict, reset_fit=True)
Updates the parameter dictionary of the component.
- Parameters
update_dict (dict) – A dict of parameters to update.
reset_fit (bool, optional) – If True, will set _is_fitted to False.
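The effect of reset_fit can be sketched with a hypothetical ToyComponent that tracks an _is_fitted flag. This is an illustration of the documented behaviour under assumed internals (a _parameters dict and a no-argument fit), not the library's code.

```python
class ToyComponent:
    """Hypothetical stand-in for a component; not part of evalml."""

    def __init__(self, **parameters):
        self._parameters = dict(parameters)
        self._is_fitted = False

    def fit(self):
        self._is_fitted = True
        return self

    def update_parameters(self, update_dict, reset_fit=True):
        # Merge the new values into the stored parameters.
        self._parameters.update(update_dict)
        if reset_fit:
            # Changed parameters invalidate any previous fit.
            self._is_fitted = False


component = ToyComponent(top_n=10).fit()
component.update_parameters({"top_n": 5})
fitted_after_reset = component._is_fitted

component.fit()
component.update_parameters({"top_n": 3}, reset_fit=False)
fitted_without_reset = component._is_fitted
```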
- class evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoderMeta
A version of the ComponentBaseMeta class which includes validation on an additional one-hot-encoder-specific method categories.
Attributes
FIT_METHODS | [‘fit’, ‘fit_transform’]
METHODS_TO_CHECK | None
PROPERTIES_TO_CHECK | [‘feature_importance’]
Methods
check_for_fit | check_for_fit wraps a method that validates if self._is_fitted is True.
register | Register a virtual subclass of an ABC.
set_fit | Wrapper for the fit method.
- classmethod check_for_fit(cls, method)
check_for_fit wraps a method so that it first validates that self._is_fitted is True.
If self._is_fitted is False, an exception is raised; otherwise the wrapped method is called and its result is returned.
- Parameters
method (callable) – Method to wrap.
- Returns
The wrapped method.
- Raises
ComponentNotYetFittedError – If component is not yet fitted.
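The wrapping behaviour of check_for_fit can be sketched as a plain decorator. The sketch below is an assumption-laden illustration rather than the metaclass implementation: it uses a free function instead of a classmethod on a metaclass, and both ComponentNotYetFittedError and ToyEncoder are defined locally as hypothetical stand-ins.

```python
import functools


class ComponentNotYetFittedError(Exception):
    """Local stand-in for the library's exception of the same name."""


def check_for_fit(method):
    """Wrap a method so it raises unless self._is_fitted is True."""

    @functools.wraps(method)
    def _check_for_fit(self, *args, **kwargs):
        if not self._is_fitted:
            raise ComponentNotYetFittedError(
                f"Call fit before calling {method.__name__}."
            )
        return method(self, *args, **kwargs)

    return _check_for_fit


class ToyEncoder:
    """Hypothetical component used only to exercise the decorator."""

    def __init__(self):
        self._is_fitted = False

    def fit(self):
        self._is_fitted = True
        return self

    @check_for_fit
    def transform(self, values):
        return [value.upper() for value in values]


encoder = ToyEncoder()
try:
    encoder.transform(["a"])
    raised_before_fit = False
except ComponentNotYetFittedError:
    raised_before_fit = True

result_after_fit = encoder.fit().transform(["a", "b"])
```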
- register(cls, subclass)
Register a virtual subclass of an ABC.
Returns the subclass, to allow usage as a class decorator.
- classmethod set_fit(cls, method)
Wrapper for the fit method.