evalml.pipelines.components.PerColumnImputer.__init__

PerColumnImputer.__init__(impute_strategies=None, default_impute_strategy='most_frequent', random_seed=0, **kwargs)

Initializes a transformer that imputes missing data in each column according to the imputation strategy specified for that column.

Parameters
  • impute_strategies (dict) –

    Mapping from column name to a {“impute_strategy”: strategy, “fill_value”: value} dictionary. Valid values for impute_strategy are “mean”, “median”, “most_frequent”, and “constant” for numerical data, and “most_frequent” and “constant” for object data types. Defaults to None, in which case default_impute_strategy (“most_frequent” unless overridden) is applied to all columns.

    When impute_strategy == “constant”, fill_value is used to replace missing data. fill_value defaults to 0 when imputing numerical data and to “missing_value” for strings or object data types.

  • default_impute_strategy (str) – Impute strategy to fall back on when none is provided for a given column. Valid values are “mean”, “median”, “most_frequent”, and “constant” for numerical data, and “most_frequent” and “constant” for object data types. Defaults to “most_frequent”.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
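
The shape of impute_strategies and the fallback to default_impute_strategy can be illustrated with a small standard-library sketch. This is not evalml's implementation: the helpers resolve_strategy and impute_column, the column names, and the sample data are hypothetical, and missing values are represented as None in plain lists rather than in a DataFrame. Only the dictionary layout and the documented defaults come from the parameter descriptions above.

```python
from statistics import mean, median, mode

# Shape of the impute_strategies argument: column name -> settings dict.
impute_strategies = {
    "age": {"impute_strategy": "median"},
    "income": {"impute_strategy": "constant", "fill_value": -1},
}
default_impute_strategy = "most_frequent"


def resolve_strategy(column, strategies, default):
    """Hypothetical helper: return (strategy, fill_value) for one column."""
    settings = (strategies or {}).get(column, {})
    strategy = settings.get("impute_strategy", default)
    return strategy, settings.get("fill_value")


def impute_column(values, strategy, fill_value=None):
    """Hypothetical helper: replace None entries using the named strategy."""
    observed = [v for v in values if v is not None]
    if strategy == "constant":
        if fill_value is None:
            # Documented defaults: 0 for numeric data, "missing_value" otherwise.
            numeric = all(isinstance(v, (int, float)) for v in observed)
            fill_value = 0 if numeric else "missing_value"
        replacement = fill_value
    else:
        compute = {"mean": mean, "median": median, "most_frequent": mode}[strategy]
        replacement = compute(observed)
    return [replacement if v is None else v for v in values]


# Sample data (made up for illustration); "city" has no entry in
# impute_strategies, so it falls back to default_impute_strategy.
data = {
    "age": [20, None, 40, 30],
    "income": [50000, None, 52000, 50000],
    "city": ["Boston", "Austin", None, "Boston"],
}

imputed = {
    name: impute_column(
        values,
        *resolve_strategy(name, impute_strategies, default_impute_strategy),
    )
    for name, values in data.items()
}
```

Going by the signature above, the same dictionary would be passed to the component as PerColumnImputer(impute_strategies=impute_strategies, default_impute_strategy="most_frequent").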