evalml.pipelines.components.OneHotEncoder.__init__

OneHotEncoder.__init__(top_n=10, features_to_encode=None, categories=None, drop='if_binary', handle_unknown='ignore', handle_missing='error', random_seed=0, **kwargs)[source]

Initalizes an transformer that encodes categorical features in a one-hot numeric array.”

Parameters
  • top_n (int) – Number of categories per column to encode. If None, all categories will be encoded. Otherwise, the n most frequent will be encoded and all others will be dropped. Defaults to 10.

  • features_to_encode (list[str]) – List of columns to encode. All other columns will remain untouched. If None, all appropriate columns will be encoded. Defaults to None.

  • categories (list) – A two dimensional list of categories, where categories[i] is a list of the categories for the column at index i. This can also be None, or “auto” if top_n is not None. Defaults to None.

  • drop (string, list) – Method (“first” or “if_binary”) to use to drop one category per feature. Can also be a list specifying which categories to drop for each feature. Defaults to ‘if_binary’.

  • handle_unknown (string) – Whether to ignore or error for unknown categories for a feature encountered during fit or transform. If either top_n or categories is used to limit the number of categories per column, this must be “ignore”. Defaults to “ignore”.

  • handle_missing (string) – Options for how to handle missing (NaN) values encountered during fit or transform. If this is set to “as_category” and NaN values are within the n most frequent, “nan” values will be encoded as their own column. If this is set to “error”, any missing values encountered will raise an error. Defaults to “error”.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.