default_data_checks
A default set of data checks that can be used for a variety of datasets.
Module Contents
Classes Summary
DefaultDataChecks | A collection of basic data checks that is used by AutoML by default.
Contents
class evalml.data_checks.default_data_checks.DefaultDataChecks(problem_type, objective, n_splits=3, datetime_column=None)
A collection of basic data checks that is used by AutoML by default.
Includes:
HighlyNullDataCheck
HighlyNullRowsDataCheck
IDColumnsDataCheck
TargetLeakageDataCheck
InvalidTargetDataCheck
NoVarianceDataCheck
ClassImbalanceDataCheck (for classification problem types)
DateTimeNaNDataCheck
NaturalLanguageNaNDataCheck
TargetDistributionDataCheck (for regression problem types)
DateTimeFormatDataCheck (for time series problem types)
- Parameters
problem_type (str) – The problem type that is being validated. Can be regression, binary, or multiclass.
objective (str or ObjectiveBase) – Name or instance of the objective class.
n_splits (int) – The number of splits as determined by the data splitter being used. Defaults to 3.
datetime_column (str) – The name of the column containing datetime information to be used for time series problems. The documented default is "index", indicating that the datetime information is in the index of X or y (the signature lists None).
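The conditional entries in the Includes list can be sketched as a small selection function. This is only an illustration of the documented behaviour, not evalml's implementation: `select_default_checks` and `ALWAYS_INCLUDED` are hypothetical names, and the string tests used to classify a problem type are assumptions.

```python
# Hypothetical helper that mirrors the "Includes" list; not evalml code.
ALWAYS_INCLUDED = [
    "HighlyNullDataCheck",
    "HighlyNullRowsDataCheck",
    "IDColumnsDataCheck",
    "TargetLeakageDataCheck",
    "InvalidTargetDataCheck",
    "NoVarianceDataCheck",
    "DateTimeNaNDataCheck",
    "NaturalLanguageNaNDataCheck",
]


def select_default_checks(problem_type):
    """Return the names of the checks that would apply to a problem type."""
    name = problem_type.strip().lower()
    checks = list(ALWAYS_INCLUDED)
    # Assumption: classification problem types end in "binary" or "multiclass".
    if name.endswith(("binary", "multiclass")):
        checks.append("ClassImbalanceDataCheck")
    # Assumption: regression problem types end in "regression".
    elif name.endswith("regression"):
        checks.append("TargetDistributionDataCheck")
    # Assumption: time series problem types start with "time series".
    if name.startswith("time series"):
        checks.append("DateTimeFormatDataCheck")
    return checks


for problem_type in ("binary", "regression", "time series regression"):
    print(problem_type, "->", select_default_checks(problem_type)[len(ALWAYS_INCLUDED):])
```

The loop prints only the checks added on top of the always-included ones, which makes the effect of the problem type easy to see.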
Methods
validate | Inspects and validates the input data against the data checks and returns any warnings and errors found.
validate(self, X, y=None)
Inspects and validates the input data against the data checks and returns any warnings and errors found.
- Parameters
X (pd.DataFrame, np.ndarray) – The input data of shape [n_samples, n_features]
y (pd.Series, np.ndarray) – The target data of length [n_samples]
- Returns
Dictionary containing DataCheckMessage objects
- Return type
dict
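To make the idea of validating against a collection of checks concrete, here is a minimal plain-Python sketch of the aggregation pattern. It does not use evalml: the two checks, the 0.95 threshold, the "warnings" and "errors" keys, and the use of plain strings in place of DataCheckMessage objects are all assumptions made for illustration.

```python
def highly_null_check(X, y=None, threshold=0.95):
    """Hypothetical check: warn about columns that are mostly missing."""
    warnings = []
    for column, values in X.items():
        if not values:
            continue  # nothing to measure in an empty column
        null_fraction = sum(v is None for v in values) / len(values)
        if null_fraction >= threshold:
            warnings.append(f"Column '{column}' is {null_fraction:.0%} null")
    return {"warnings": warnings, "errors": []}


def target_present_check(X, y=None):
    """Hypothetical check: report an error if the target is missing values."""
    errors = []
    if y is None:
        errors.append("Target is None")
    elif any(v is None for v in y):
        errors.append("Target contains missing values")
    return {"warnings": [], "errors": errors}


def validate(checks, X, y=None):
    """Run every check and merge their messages into one dictionary."""
    results = {"warnings": [], "errors": []}
    for check in checks:
        messages = check(X, y)
        results["warnings"].extend(messages["warnings"])
        results["errors"].extend(messages["errors"])
    return results


# Toy data: X maps column names to values, y is the target.
X = {"age": [31, 45, 27, 52], "notes": [None, None, None, None]}
y = [0, 1, None, 1]
results = validate([highly_null_check, target_present_check], X, y)
print(results)
```

Keeping warnings and errors in separate lists lets a caller decide to continue on warnings but stop on errors.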