evalml.data_checks.ClassImbalanceDataCheck.__init__
ClassImbalanceDataCheck.__init__(threshold=0.1, min_samples=100, num_cv_folds=3)
Check whether any of the target labels are imbalanced, or whether the number of values for any target label is below 2 times the number of cross-validation folds.
Parameters
threshold (float) – The minimum threshold allowed for class imbalance before a warning is raised. In a perfectly balanced dataset, each class makes up 1/n_classes of the samples, i.e., 0.50 for binary classes. Defaults to 0.10.
min_samples (int) – The minimum number of samples per accepted class. If the minority class is below both the threshold and min_samples, it is considered severely imbalanced. Must be greater than 0. Defaults to 100.
num_cv_folds (int) – The number of cross-validation folds. Must be non-negative. Set to 0 to skip the check on the number of values per target label. Defaults to 3.
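
The way the three parameters interact can be illustrated with a minimal standalone sketch. This is not evalml's implementation: `sketch_class_imbalance` is a hypothetical helper written only from the descriptions above, and it assumes that a class's proportion is its share of all samples and that "below" means strictly less than.

```python
from collections import Counter


def sketch_class_imbalance(y, threshold=0.1, min_samples=100, num_cv_folds=3):
    """Hypothetical helper mirroring the documented checks (not evalml code)."""
    if min_samples <= 0:
        raise ValueError("min_samples must be greater than 0")
    if num_cv_folds < 0:
        raise ValueError("num_cv_folds must be non-negative")

    counts = Counter(y)
    total = len(y)

    # Classes with fewer than 2 * num_cv_folds values; skipped when num_cv_folds is 0.
    below_folds = []
    if num_cv_folds > 0:
        below_folds = [c for c, n in counts.items() if n < 2 * num_cv_folds]

    # Assumption: imbalance is measured as each class's share of all samples.
    below_threshold = [c for c, n in counts.items() if n / total < threshold]

    # "Severe" means below the threshold and also below min_samples.
    severe = [c for c in below_threshold if counts[c] < min_samples]

    return {
        "below_folds": below_folds,
        "below_threshold": below_threshold,
        "severe": severe,
    }


# 95 samples of class "a" and 5 of class "b": "b" makes up 5% of the data.
y = ["a"] * 95 + ["b"] * 5
result = sketch_class_imbalance(y)
print(result)
```

With the default arguments, class "b" is flagged by all three conditions in this sketch: its share (0.05) is below the threshold of 0.1, its count (5) is below min_samples, and its count is also below 2 × 3 = 6. Passing `num_cv_folds=0` leaves `below_folds` empty while the other two results are unchanged.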