ClassImbalanceDataCheck.validate
Checks whether any target labels are imbalanced beyond a threshold, for binary and multiclass problems. NaN values in the target labels are ignored if they appear.
X (pd.DataFrame, pd.Series, np.array, list) – Features. Ignored.
y – Target labels to check for imbalanced data.
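Returns a list with DataCheckWarnings if any label's share of the target falls below the threshold, and DataCheckErrors if the number of values for any target label is below 2 * num_cv_folds.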
list (DataCheckWarning, DataCheckError)
Example
>>> import pandas as pd
>>> from evalml.data_checks import ClassImbalanceDataCheck, DataCheckError, DataCheckWarning
>>> X = pd.DataFrame({})
>>> y = pd.Series([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
>>> target_check = ClassImbalanceDataCheck(threshold=0.10)
>>> assert target_check.validate(X, y) == [
...     DataCheckError("The number of instances of these targets is less than 2 * the number of cross folds = 6 instances: [0]", "ClassImbalanceDataCheck"),
...     DataCheckWarning("The following labels fall below 10% of the target: [0]", "ClassImbalanceDataCheck"),
... ]
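To make the two rules concrete, here is a minimal sketch in plain pandas of the logic the description implies. It is not evalml's implementation: the function name `sketch_class_imbalance` and the returned dictionary keys are hypothetical, and the default `num_cv_folds=3` is inferred from the "= 6 instances" text in the example message. Strict "less than" comparisons are assumed from the word "below".

```python
import pandas as pd


def sketch_class_imbalance(y, threshold=0.10, num_cv_folds=3):
    """Hypothetical helper (not part of evalml) mirroring the two documented rules."""
    # NaN labels are dropped, matching the note that NaN values are ignored.
    labels = pd.Series(y).dropna()
    counts = labels.value_counts()
    shares = counts / counts.sum()

    # Error rule: a label has fewer than 2 * num_cv_folds instances.
    too_few = sorted(counts[counts < 2 * num_cv_folds].index.tolist())
    # Warning rule: a label's share of the target falls below the threshold.
    below_threshold = sorted(shares[shares < threshold].index.tolist())

    return {"too_few": too_few, "below_threshold": below_threshold}


# Same target as the example above: one 0 and ten 1s.
result = sketch_class_imbalance([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
balanced = sketch_class_imbalance([0] * 6 + [1] * 6 + [float("nan")])
```

With the example target, label 0 appears once out of 11 values (about 9%), so it trips both rules under these assumptions, which matches the error and warning pair the example expects.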