ClassImbalanceDataCheck.validate(X, y)
Ignores NaN values in target labels if they appear.
Parameters
X (ww.DataTable, pd.DataFrame, np.ndarray) – Features. Ignored.
y (ww.DataColumn, pd.Series, np.ndarray) – Target labels to check for imbalanced data.
Returns
Dictionary with DataCheckWarnings if any target label falls below the threshold, and DataCheckErrors if the number of values for any target label is below 2 * num_cv_folds.
Return type
dict
Example
>>> import pandas as pd
>>> X = pd.DataFrame()
>>> y = pd.Series([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
>>> target_check = ClassImbalanceDataCheck(threshold=0.10)
>>> assert target_check.validate(X, y) == {
...     "errors": [{"message": "The number of instances of these targets is less than 2 * the number of cross folds = 6 instances: [0]",
...                 "data_check_name": "ClassImbalanceDataCheck",
...                 "level": "error",
...                 "code": "CLASS_IMBALANCE_BELOW_FOLDS",
...                 "details": {"target_values": [0]}}],
...     "warnings": [{"message": "The following labels fall below 10% of the target: [0]",
...                   "data_check_name": "ClassImbalanceDataCheck",
...                   "level": "warning",
...                   "code": "CLASS_IMBALANCE_BELOW_THRESHOLD",
...                   "details": {"target_values": [0]}}]}
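
The two rules applied in the example can be illustrated with a minimal pandas sketch. This is not the library's implementation: the function name `sketch_class_imbalance` and its return keys are hypothetical, and the default of 3 cross-validation folds is an assumption inferred from the "2 * the number of cross folds = 6" message above. NaN labels are dropped first, then each label's count is compared against 2 * num_cv_folds and its share of the target against the threshold.

```python
import pandas as pd


def sketch_class_imbalance(y, threshold=0.10, num_cv_folds=3):
    """Hypothetical sketch of the imbalance rules, not the library's code.

    num_cv_folds=3 is an assumption inferred from the example message.
    """
    # NaN values in the target are ignored.
    labels = pd.Series(y).dropna()

    # Absolute count and relative share of each label.
    counts = labels.value_counts()
    fractions = counts / counts.sum()

    # Labels with fewer than 2 * num_cv_folds instances (reported as errors).
    below_folds = sorted(counts[counts < 2 * num_cv_folds].index.tolist())

    # Labels whose share is below the threshold (reported as warnings).
    below_threshold = sorted(fractions[fractions < threshold].index.tolist())

    return {"below_folds": below_folds, "below_threshold": below_threshold}


result = sketch_class_imbalance([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
print(result)  # {'below_folds': [0], 'below_threshold': [0]}
```

With the target from the example, label 0 appears once out of 11 values (about 9.1%), so it falls below both the 10% threshold and the minimum of 6 instances, which matches the warning and error shown above.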