default_data_checks
Module Contents
Classes Summary
DefaultDataChecks | A collection of basic data checks that is used by AutoML by default.
Contents
class evalml.data_checks.default_data_checks.DefaultDataChecks(problem_type, objective, n_splits=3)
A collection of basic data checks that is used by AutoML by default. Includes:
HighlyNullDataCheck
HighlyNullRowsDataCheck
IDColumnsDataCheck
TargetLeakageDataCheck
InvalidTargetDataCheck
NoVarianceDataCheck
ClassImbalanceDataCheck (for classification problem types)
DateTimeNaNDataCheck
NaturalLanguageNaNDataCheck
- Parameters
problem_type (str) – The problem type that is being validated. Can be regression, binary, or multiclass.
objective (str or ObjectiveBase) – Name or instance of the objective class.
n_splits (int) – The number of splits as determined by the data splitter being used. Defaults to 3.
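As a rough illustration of the list above, the following sketch shows how the set of included checks could depend on problem_type. It is plain Python and does not call evalml: the helper default_check_names and both constants are hypothetical names introduced here, and the actual selection logic inside DefaultDataChecks may differ.

```python
# Hypothetical illustration only: this mirrors the documented list of checks,
# not evalml's internal implementation.
CHECKS_FOR_ALL_PROBLEM_TYPES = [
    "HighlyNullDataCheck",
    "HighlyNullRowsDataCheck",
    "IDColumnsDataCheck",
    "TargetLeakageDataCheck",
    "InvalidTargetDataCheck",
    "NoVarianceDataCheck",
    "DateTimeNaNDataCheck",
    "NaturalLanguageNaNDataCheck",
]

# The documented problem types that count as classification.
CLASSIFICATION_PROBLEM_TYPES = {"binary", "multiclass"}


def default_check_names(problem_type):
    """Return the names of the checks documented for the given problem type."""
    problem_type = problem_type.lower()
    known_types = CLASSIFICATION_PROBLEM_TYPES | {"regression"}
    if problem_type not in known_types:
        raise ValueError(f"Unsupported problem type: {problem_type!r}")

    names = list(CHECKS_FOR_ALL_PROBLEM_TYPES)
    # ClassImbalanceDataCheck is documented as classification-only.
    if problem_type in CLASSIFICATION_PROBLEM_TYPES:
        names.append("ClassImbalanceDataCheck")
    return names
```

Under this sketch, default_check_names("regression") yields the eight shared checks, while "binary" and "multiclass" also include ClassImbalanceDataCheck.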
Methods
validate | Inspects and validates the input data against data checks and returns any warnings and errors found.
validate(self, X, y=None)
Inspects and validates the input data against data checks and returns any warnings and errors found.
- Parameters
X (pd.DataFrame, np.ndarray) – The input data of shape [n_samples, n_features]
y (pd.Series, np.ndarray) – The target data of length [n_samples]. Defaults to None.
- Returns
Dictionary containing DataCheckMessage objects
- Return type
dict
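The returned dictionary can be turned into a short report before deciding whether to proceed. The sketch below assumes the dictionary has "warnings" and "errors" keys, each holding a list of entries that are either dicts with a "message" key or objects with a message attribute. That layout is an assumption and may vary between evalml versions. The helpers summarize_check_results and has_errors are hypothetical, and the sample data is made up for illustration.

```python
def summarize_check_results(results):
    """Return one report line per message, errors first, then warnings.

    Assumes `results` looks like {"warnings": [...], "errors": [...]};
    missing keys are treated as empty lists.
    """
    lines = []
    for level in ("errors", "warnings"):
        for entry in results.get(level, []):
            if isinstance(entry, dict):
                text = entry.get("message", "")
            else:
                # Fall back to a `message` attribute, or the entry's string form.
                text = getattr(entry, "message", str(entry))
            # "errors" -> "ERROR", "warnings" -> "WARNING"
            lines.append(f"{level[:-1].upper()}: {text}")
    return lines


def has_errors(results):
    """True if at least one error-level message was reported."""
    return len(results.get("errors", [])) > 0


# Made-up sample in the assumed layout, not real evalml output.
sample_results = {
    "warnings": [{"message": "Column 'id' may be an ID column"}],
    "errors": [{"message": "Target has missing values"}],
}
report = summarize_check_results(sample_results)
```

With evalml installed, results would typically come from a call such as DefaultDataChecks(problem_type="binary", objective="Log Loss Binary").validate(X, y). A caller might stop early when has_errors(results) is true and only log the warnings otherwise.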