onehot_encoder
==========================================================================

.. py:module:: evalml.pipelines.components.transformers.encoders.onehot_encoder

.. autoapi-nested-parse::

   A transformer that encodes categorical features in a one-hot numeric array.


Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder
   evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoderMeta


Contents
~~~~~~~~~~~~~~~~~~~

.. py:class:: OneHotEncoder(top_n=10, features_to_encode=None, categories=None, drop='if_binary', handle_unknown='ignore', handle_missing='error', random_seed=0, **kwargs)

   A transformer that encodes categorical features in a one-hot numeric array.

   :param top_n: Number of categories per column to encode. If None, all categories will be encoded.
                 Otherwise, the `n` most frequent will be encoded and all others will be dropped. Defaults to 10.
   :type top_n: int
   :param features_to_encode: List of columns to encode. All other columns will remain untouched.
                              If None, all appropriate columns will be encoded. Defaults to None.
   :type features_to_encode: list[str]
   :param categories: A two-dimensional list of categories, where `categories[i]` is a list of the categories
                      for the column at index `i`. This can also be `None`, or `"auto"` if `top_n` is not None. Defaults to None.
   :type categories: list
   :param drop: Method ("first" or "if_binary") to use to drop one category per feature. Can also be
                a list specifying which categories to drop for each feature. Defaults to 'if_binary'.
   :type drop: string, list
   :param handle_unknown: Whether to ignore unknown categories for a feature encountered during `fit` or `transform`,
                          or to raise an error. If either `top_n` or `categories` is used to limit the number of
                          categories per column, this must be "ignore". Defaults to "ignore".
   :type handle_unknown: string
   :param handle_missing: Options for how to handle missing (NaN) values encountered during
                          `fit` or `transform`. If this is set to "as_category" and NaN values are within the `n` most frequent,
                          "nan" values will be encoded as their own column. If this is set to "error", any missing
                          values encountered will raise an error. Defaults to "error".
   :type handle_missing: string
   :param random_seed: Seed for the random number generator. Defaults to 0.
   :type random_seed: int

   **Attributes**

   .. list-table::
      :widths: 15 85
      :header-rows: 0

      * - **hyperparameter_ranges**
        - {}
      * - **modifies_features**
        - True
      * - **modifies_target**
        - False
      * - **name**
        - One Hot Encoder
      * - **training_only**
        - False

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder.categories
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder.clone
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder.default_parameters
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder.describe
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder.fit
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder.fit_transform
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder.get_feature_names
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder.load
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder.needs_fitting
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder.parameters
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder.save
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoder.transform

   .. py:method:: categories(self, feature_name)

      Returns a list of the unique categories to be encoded for the particular feature, in order.

      :param feature_name: The name of any feature provided to the one-hot encoder during fit.
      :type feature_name: str

      :returns: The unique categories, in the same dtype as they were provided during fit.
      :rtype: np.ndarray

      :raises ValueError: If the feature was not provided to the one-hot encoder as a training feature.

   .. py:method:: clone(self)

      Constructs a new component with the same parameters and random state.

      :returns: A new instance of this component with identical parameters and random state.

   .. py:method:: default_parameters(cls)

      Returns the default parameters for this component.

      Our convention is that Component.default_parameters == Component().parameters.

      :returns: Default parameters for this component.
      :rtype: dict

   .. py:method:: describe(self, print_name=False, return_dict=False)

      Describe a component and its parameters.

      :param print_name: Whether to print the name of the component.
      :type print_name: bool, optional
      :param return_dict: Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.
      :type return_dict: bool, optional

      :returns: A dictionary if return_dict is True, else None.
      :rtype: None or dict

   .. py:method:: fit(self, X, y=None)

      Fits the one-hot encoder component.

      :param X: The input training data of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param y: The target training data of length [n_samples].
      :type y: pd.Series, optional

      :returns: self

      :raises ValueError: If encoding a column failed.

   .. py:method:: fit_transform(self, X, y=None)

      Fits on X and transforms X.

      :param X: Data to fit and transform.
      :type X: pd.DataFrame
      :param y: Target data.
      :type y: pd.Series

      :returns: Transformed X.
      :rtype: pd.DataFrame

      :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform.

   .. py:method:: get_feature_names(self)

      Return feature names for the categorical features after fitting.

      Feature names are formatted as {column name}_{category name}.
      In the event of a duplicate name, an integer will be added at the end of the feature name to distinguish it.

      For example, consider a dataframe with a column called "A" and category "x_y" and another column called "A_x" with "y".
      In this example, the feature names would be "A_x_y" and "A_x_y_1".

      :returns: The feature names after encoding, provided in the same order as input_features.
      :rtype: np.ndarray

   .. py:method:: load(file_path)
      :staticmethod:

      Loads component at file path.

      :param file_path: Location to load file.
      :type file_path: str

      :returns: ComponentBase object

   .. py:method:: needs_fitting(self)

      Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.

      This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

      :returns: True.

   .. py:method:: parameters(self)
      :property:

      Returns the parameters which were used to initialize the component.

   .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

      Saves component at file path.

      :param file_path: Location to save file.
      :type file_path: str
      :param pickle_protocol: The pickle data stream format.
      :type pickle_protocol: int

   .. py:method:: transform(self, X, y=None)

      One-hot encode the input data.

      :param X: Features to one-hot encode.
      :type X: pd.DataFrame
      :param y: Ignored.
      :type y: pd.Series

      :returns: Transformed data, where each categorical feature has been encoded into numerical columns using one-hot encoding.
      :rtype: pd.DataFrame


.. py:class:: OneHotEncoderMeta

   A version of the ComponentBaseMeta class which includes validation on an additional one-hot-encoder-specific method `categories`.

   **Attributes**

   .. list-table::
      :widths: 15 85
      :header-rows: 0

      * - **FIT_METHODS**
        - ['fit', 'fit_transform']
      * - **METHODS_TO_CHECK**
        - None
      * - **PROPERTIES_TO_CHECK**
        - ['feature_importance']

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoderMeta.check_for_fit
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoderMeta.register
      evalml.pipelines.components.transformers.encoders.onehot_encoder.OneHotEncoderMeta.set_fit

   .. py:method:: check_for_fit(cls, method)
      :classmethod:

      `check_for_fit` wraps a method that validates if `self._is_fitted` is `True`.

      It raises an exception if `False` and calls and returns the wrapped method if `True`.

      :param method: Method to wrap.
      :type method: callable

      :returns: The wrapped method.

      :raises ComponentNotYetFittedError: If component is not yet fitted.

   .. py:method:: register(cls, subclass)

      Register a virtual subclass of an ABC.

      Returns the subclass, to allow usage as a class decorator.

   .. py:method:: set_fit(cls, method)
      :classmethod:

      Wrapper for the fit method.
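
The naming rule documented for ``OneHotEncoder.get_feature_names`` ("{column name}_{category name}", with an integer appended on a duplicate) can be illustrated with a minimal, library-independent sketch. The helper ``make_feature_names`` is hypothetical and is not part of evalml; how evalml chooses the integer beyond the documented "A_x_y" / "A_x_y_1" example is an assumption here.

.. code-block:: python

   def make_feature_names(columns_to_categories):
       """Build "{column}_{category}" names, suffixing duplicates with an integer.

       Hypothetical helper for illustration only; it mirrors the documented
       example, not evalml's actual implementation.
       """
       names = []
       seen = set()
       for column, categories in columns_to_categories.items():
           for category in categories:
               base = f"{column}_{category}"
               candidate = base
               suffix = 1
               # Assumption: try "_1", "_2", ... until the name is unused.
               while candidate in seen:
                   candidate = f"{base}_{suffix}"
                   suffix += 1
               seen.add(candidate)
               names.append(candidate)
       return names


   # Column "A" with category "x_y", and column "A_x" with category "y",
   # both produce the base name "A_x_y".
   names = make_feature_names({"A": ["x_y"], "A_x": ["y"]})
   print(names)  # ['A_x_y', 'A_x_y_1']

The sketch keeps the first occurrence of a name unchanged and only renames later collisions, which matches the order given in the documented example.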