evalml.data_checks.ClassImbalanceDataCheck.__init__
ClassImbalanceDataCheck.__init__(threshold=0.1, min_samples=100, num_cv_folds=3)
Check if any of the target labels are imbalanced, or if the number of values for any target label is below 2 times the number of CV folds.
Parameters
threshold (float) – The minimum threshold allowed for class imbalance before a warning is raised. The value compared against this threshold is the ratio of the number of samples in a class to the sum of the samples in that class and the majority class. For example, a multiclass case with [900, 900, 100] samples in classes 0, 1, and 2, respectively, gives class 2 a ratio of 0.10 (100 / (900 + 100)). Defaults to 0.10.
min_samples (int) – The minimum number of samples per accepted class. If the minority class falls below both the threshold and min_samples, it is considered severely imbalanced. Must be greater than 0. Defaults to 100.
num_cv_folds (int) – The number of cross-validation folds. Must be non-negative. Set to 0 to skip the check on the number of values per target label. Defaults to 3.
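The three parameters can be illustrated with a minimal, library-independent sketch. It is not evalml's implementation: the helper name imbalance_report and its return format are hypothetical, and it assumes that a class is flagged when its ratio is strictly below the threshold and its count is strictly below min_samples or 2 * num_cv_folds.

```python
from collections import Counter


def imbalance_report(labels, threshold=0.1, min_samples=100, num_cv_folds=3):
    """Hypothetical helper: summarize per-class imbalance as described above."""
    counts = Counter(labels)
    majority = max(counts.values())
    report = {}
    for label, n in counts.items():
        # Ratio of this class to (this class + majority class).
        ratio = n / (n + majority)
        # Assumption: strict comparison against the threshold.
        below_threshold = ratio < threshold
        report[label] = {
            "ratio": ratio,
            "below_threshold": below_threshold,
            # Severe only if below the threshold AND below min_samples.
            "severe": below_threshold and n < min_samples,
            # A value of 0 folds disables the cross-validation check.
            "too_few_for_cv": num_cv_folds > 0 and n < 2 * num_cv_folds,
        }
    return report


# The [900, 900, 100] example from the threshold description.
labels = [0] * 900 + [1] * 900 + [2] * 100
report = imbalance_report(labels, threshold=0.2)

# A small dataset where the minority class trips every check.
small = imbalance_report([0] * 50 + [1] * 4)
```

In the first call, class 2 has a ratio of 100 / (900 + 100) and falls below the 0.2 threshold, but it is not severe because it has exactly min_samples samples. In the second call, class 1 has only 4 samples, which is below the threshold, below min_samples, and below 2 * num_cv_folds.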