default_data_checks

Module Contents

Classes Summary

DefaultDataChecks – A collection of basic data checks that AutoML uses by default.

Contents

class evalml.data_checks.default_data_checks.DefaultDataChecks(problem_type, objective, n_splits=3)

A collection of basic data checks that AutoML uses by default. It includes:

  • HighlyNullDataCheck

  • HighlyNullRowsDataCheck

  • IDColumnsDataCheck

  • TargetLeakageDataCheck

  • InvalidTargetDataCheck

  • NoVarianceDataCheck

  • ClassImbalanceDataCheck (for classification problem types)

  • DateTimeNaNDataCheck

  • NaturalLanguageNaNDataCheck

Parameters
  • problem_type (str) – The problem type that is being validated. Can be regression, binary, or multiclass.

  • objective (str or ObjectiveBase) – Name or instance of the objective class.

  • n_splits (int) – The number of splits as determined by the data splitter being used. Defaults to 3.

Methods

validate – Inspects and validates the input data against the data checks and returns any resulting warnings and errors.

validate(self, X, y=None)

Inspects and validates the input data against the data checks and returns any resulting warnings and errors.

Parameters
  • X (pd.DataFrame, np.ndarray) – The input data of shape [n_samples, n_features].

  • y (pd.Series, np.ndarray) – The target data of length [n_samples]. Defaults to None.

Returns

Dictionary containing DataCheckMessage objects

Return type

dict
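
Example

The following is a minimal sketch of how the class and its validate method documented above might be used. It is not taken from the EvalML documentation. It assumes that evalml and pandas are installed, that DefaultDataChecks can be imported from evalml.data_checks, and that "Log Loss Binary" is an accepted objective name. The helper run_default_checks and the toy data are hypothetical and exist only for illustration.

```python
# Sketch only: assumes evalml and pandas are installed. The objective name
# "Log Loss Binary" and the toy data are illustrative assumptions.
try:
    import pandas as pd
    from evalml.data_checks import DefaultDataChecks
except ImportError:
    # Without the libraries the sketch does nothing instead of failing.
    pd = None
    DefaultDataChecks = None


def run_default_checks(X, y, problem_type="binary", objective="Log Loss Binary"):
    """Build the default data checks and validate X and y (hypothetical helper).

    Returns whatever validate() returns, or None if evalml is unavailable.
    """
    if DefaultDataChecks is None:
        return None
    checks = DefaultDataChecks(
        problem_type=problem_type,
        objective=objective,
        n_splits=3,  # matches the documented default
    )
    return checks.validate(X, y)


results = None
if pd is not None and DefaultDataChecks is not None:
    # Hypothetical toy data: an ID-like column plus one numeric feature.
    X = pd.DataFrame(
        {
            "id": list(range(10)),
            "feature": [0.5, 1.2, 3.3, 0.7, 2.1, 1.9, 0.2, 4.4, 2.8, 1.1],
        }
    )
    y = pd.Series([0, 1] * 5)
    results = run_default_checks(X, y)
    # Print the raw result rather than assuming a particular layout.
    print(results)
```

The sketch prints the returned object as-is. The exact layout of the result may differ between EvalML versions, so it is worth inspecting the output once before writing code that depends on specific keys.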