evalml.data_checks.ClassImbalanceDataCheck.validate

ClassImbalanceDataCheck.validate(X, y)

Checks whether any target labels are imbalanced beyond a threshold for binary and multiclass problems. NaN values in the target labels, if present, are ignored.
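The comparison described above can be sketched in plain Python. This is a minimal sketch, assuming the check computes each label's fraction of the non-NaN target values and flags labels whose fraction is strictly below `threshold`. `find_imbalanced_labels` and `warning_message` are hypothetical helpers, not part of evalml's API, and the message text only mirrors the example output shown for this method.

```python
import math
from collections import Counter


def find_imbalanced_labels(y, threshold=0.10):
    """Hypothetical helper (not evalml's implementation): return labels whose
    share of the non-NaN target values is strictly below `threshold`."""
    # Drop NaN labels, matching the documented behaviour of ignoring them.
    labels = [v for v in y if not (isinstance(v, float) and math.isnan(v))]
    if not labels:
        return []
    counts = Counter(labels)
    total = len(labels)
    # Flag a label when its fraction of the target is below the threshold.
    # Sorting assumes the labels are mutually comparable (e.g. all ints).
    return sorted(label for label, n in counts.items() if n / total < threshold)


def warning_message(labels, threshold=0.10):
    """Hypothetical helper: build a message shaped like the documented example."""
    # The ".0%" format multiplies by 100 and appends "%", so 0.10 -> "10%".
    return f"The following labels fall below {threshold:.0%} of the target: {labels}"


# Same data as the documented example: label 0 is 1/11 (about 9.1%) of the target.
y = [0] + [1] * 10
print(warning_message(find_imbalanced_labels(y, 0.10), 0.10))
```

Because NaN values are dropped before counting, they do not dilute the share of any real label: in `[0] + [1] * 9 + [float("nan")] * 5`, label 0 makes up exactly 10% of the remaining values and is therefore not flagged by this sketch.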

Parameters
  • X (pd.DataFrame, pd.Series, np.array, list) – Features. Ignored.

  • y – Target labels to check for imbalanced data.

Returns

A list containing a DataCheckWarning if any class's share of the target labels falls below the threshold.

Return type

list (DataCheckWarning)

Example

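>>> import pandas as pd
>>> from evalml.data_checks import ClassImbalanceDataCheck, DataCheckWarning
>>> X = pd.DataFrame({})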
>>> y = pd.Series([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
>>> target_check = ClassImbalanceDataCheck(threshold=0.10)
>>> assert target_check.validate(X, y) == [DataCheckWarning("The following labels fall below 10% of the target: [0]", "ClassImbalanceDataCheck")]