evalml.data_checks.ClassImbalanceDataCheck.__init__

ClassImbalanceDataCheck.__init__(threshold=0.1, min_samples=100, num_cv_folds=3)
Check whether any of the target labels are imbalanced, or whether any target class has fewer samples than twice the number of cross-validation folds.
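The two conditions above can be illustrated with a small pure-Python sketch. It is not evalml's implementation: the function name find_imbalanced_classes is hypothetical, and it assumes that a class's share is its count divided by the total number of samples, so a perfectly balanced target gives every class a share of 1/n_classes.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def find_imbalanced_classes(
    y: Iterable[Hashable], threshold: float = 0.1, num_cv_folds: int = 3
) -> Dict[str, List[Hashable]]:
    """Hypothetical sketch of the two conditions, not evalml's code.

    Assumption: a class's share is count / total number of samples.
    """
    counts = Counter(y)
    total = sum(counts.values())

    # Classes with fewer than 2 * num_cv_folds samples.
    # With num_cv_folds=0 the bound is 0, so no class is ever flagged.
    min_per_class = 2 * num_cv_folds
    too_few_for_cv = sorted(
        (label for label, n in counts.items() if n < min_per_class), key=str
    )

    # Classes whose share of the target falls below the threshold.
    below_threshold = sorted(
        (label for label, n in counts.items() if n / total < threshold), key=str
    )
    return {"too_few_for_cv": too_few_for_cv, "below_threshold": below_threshold}


# 5 of 100 samples: a 5% share (< 0.10) and fewer than 2 * 3 = 6 samples.
print(find_imbalanced_classes([0] * 95 + [1] * 5))
# {'too_few_for_cv': [1], 'below_threshold': [1]}
```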

Parameters
  • threshold (float) – The minimum threshold allowed for class imbalance before a warning is raised. A perfectly balanced dataset would have a threshold of 1/n_classes, i.e., 0.50 for binary classes. Defaults to 0.10.

  • min_samples (int) – The minimum number of samples per accepted class. If the minority class is below both the threshold and min_samples, the target is considered severely imbalanced. Must be greater than 0. Defaults to 100.

  • num_cv_folds (int) – The number of cross-validation folds. Must be non-negative; set it to 0 to disable the cross-validation fold check. Defaults to 3.
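How the parameters interact can be sketched as follows. The helpers validate_params and severely_imbalanced are hypothetical and not part of evalml. The validation enforces only the constraints listed above, and the severity test assumes the same count/total share used in the threshold description.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def validate_params(min_samples: int, num_cv_folds: int) -> None:
    """Hypothetical check of the documented parameter constraints."""
    if min_samples <= 0:
        raise ValueError("min_samples must be greater than 0")
    if num_cv_folds < 0:
        # 0 is allowed: it disables the cross-validation fold check.
        raise ValueError("num_cv_folds must be non-negative")


def severely_imbalanced(
    y: Iterable[Hashable], threshold: float = 0.1, min_samples: int = 100
) -> List[Hashable]:
    """Return classes that fall below BOTH the share threshold and min_samples.

    Assumption: a class's share is count / total number of samples.
    """
    counts = Counter(y)
    total = sum(counts.values())
    return sorted(
        (label for label, n in counts.items()
         if n / total < threshold and n < min_samples),
        key=str,
    )


validate_params(min_samples=100, num_cv_folds=3)
print(severely_imbalanced([0] * 950 + [1] * 50))    # [1]: 5% share, only 50 samples
print(severely_imbalanced([0] * 9500 + [1] * 500))  # []: 5% share, but 500 >= 100 samples
```

The second call shows why both conditions are required: a minority class can sit below the threshold yet still have enough absolute samples not to be flagged as severe.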