evalml.preprocessing.BalancedClassificationDataCVSplit.__init__

BalancedClassificationDataCVSplit.__init__(balanced_ratio=4, min_samples=100, min_percentage=0.1, n_splits=3, shuffle=True, random_seed=0)

Create a balanced classification data CV splitter.

Parameters
  • balanced_ratio (float) – The largest majority:minority ratio that is accepted as ‘balanced’. For instance, a 4:1 ratio would be represented as 4, while a 6:5 ratio is 1.2. Must be greater than or equal to 1 (or 1:1). Defaults to 4.

  • min_samples (int) – The minimum number of samples required for any class, before or after sampling. A class that must be downsampled will not be downsampled below this value. The data is treated as severely imbalanced only if the minority class has fewer samples than this value and its share of the dataset is below min_percentage. Must be greater than 0. Defaults to 100.

  • min_percentage (float) – The minimum fraction of the total dataset that the minority class may make up and still be tolerated, as long as the minority class count is above min_samples. If neither min_percentage nor min_samples is met, the data is treated as severely imbalanced and is not resampled. Must be between 0 and 0.5, inclusive. Defaults to 0.1.

  • n_splits (int) – The number of splits to use for cross validation. Defaults to 3.

  • shuffle (bool) – Whether or not to shuffle the data before splitting. Defaults to True.

  • random_seed (int) – The seed to use for random sampling. Defaults to 0.
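
The thresholds described above can be sketched in plain Python. This is a minimal illustration of the documented rules, not the library's implementation: the helper names `class_ratio`, `is_balanced`, and `is_severely_imbalanced` are hypothetical, and treating a ratio exactly equal to `balanced_ratio` as balanced is an assumption based on the phrase "largest ratio that is accepted".

```python
from collections import Counter


def class_ratio(labels):
    """Return the majority:minority count ratio as a float (4:1 -> 4.0)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def is_balanced(labels, balanced_ratio=4):
    """Assumed rule: balanced if the ratio does not exceed balanced_ratio."""
    return class_ratio(labels) <= balanced_ratio


def is_severely_imbalanced(labels, min_samples=100, min_percentage=0.1):
    """Severe only if the minority class misses BOTH thresholds."""
    counts = Counter(labels)
    minority = min(counts.values())
    total = sum(counts.values())
    return minority < min_samples and minority / total < min_percentage


# Hypothetical label sets used only for illustration.
mild = [0] * 600 + [1] * 500      # 6:5 ratio, i.e. 1.2
skewed = [0] * 2000 + [1] * 150   # imbalanced, but minority has >= 100 samples
severe = [0] * 900 + [1] * 50     # minority below both thresholds

for name, labels in [("mild", mild), ("skewed", skewed), ("severe", severe)]:
    print(name, round(class_ratio(labels), 2),
          is_balanced(labels), is_severely_imbalanced(labels))
```

Under these assumptions, only the `skewed` case is a candidate for resampling: it exceeds `balanced_ratio` but is not severely imbalanced.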