oversampler

SMOTE Oversampler component. Will automatically select whether to use SMOTE, SMOTEN, or SMOTENC based on inputs to the component.

Module Contents

Classes Summary

Oversampler

SMOTE Oversampler component. Will automatically select whether to use SMOTE, SMOTEN, or SMOTENC based on inputs to the component.

Contents

class evalml.pipelines.components.transformers.samplers.oversampler.Oversampler(sampling_ratio=0.25, sampling_ratio_dict=None, k_neighbors_default=5, n_jobs=-1, random_seed=0, **kwargs)

SMOTE Oversampler component. Will automatically select whether to use SMOTE, SMOTEN, or SMOTENC based on inputs to the component.
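The selection between the three variants can be pictured with a small sketch. It assumes the choice depends only on which feature columns are categorical, following the use cases imbalanced-learn documents for each sampler (SMOTE for numeric features, SMOTEN for categorical features, SMOTENC for a mix). `choose_smote_variant` and its `categorical_mask` argument are hypothetical names for illustration and are not part of the component's API; the component's actual checks may differ.

```python
def choose_smote_variant(categorical_mask):
    """Hypothetical helper: pick a SMOTE variant name from a per-column mask.

    categorical_mask holds one boolean per feature column,
    True where the column is categorical.
    """
    if not categorical_mask:
        raise ValueError("at least one feature column is required")
    if all(categorical_mask):
        return "SMOTEN"   # every feature is categorical
    if any(categorical_mask):
        return "SMOTENC"  # mix of categorical and numeric features
    return "SMOTE"        # every feature is numeric


print(choose_smote_variant([False, False, False]))  # SMOTE
print(choose_smote_variant([True, False, True]))    # SMOTENC
print(choose_smote_variant([True, True]))           # SMOTEN
```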

Parameters

sampling_ratio (float) – The goal ratio of the minority to majority class, in the range (0, 1]. A value of 0.25 means we want a 1:4 ratio of the minority to majority class after oversampling. A sampling dictionary is created from this ratio, with the keys corresponding to the classes and the values corresponding to the number of samples. Defaults to 0.25.
sampling_ratio_dict (dict) – A dictionary specifying the desired balanced ratio for each target value. For instance, in a binary case where class 1 is the minority, we could specify: sampling_ratio_dict={0: 0.5, 1: 1}, which means we would undersample class 0 to have twice the number of samples as class 1 (minority:majority ratio = 0.5), and don’t sample class 1. Overrides sampling_ratio if provided. Defaults to None.
k_neighbors_default (int) – The number of nearest neighbors used to construct synthetic samples. This is the default value; the actual k_neighbors value may be smaller if there are fewer samples. Defaults to 5.
n_jobs (int) – The number of CPU cores to use. Defaults to -1.
random_seed (int) – The seed to use for random sampling. Defaults to 0.
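As a rough illustration of the `sampling_ratio` and `k_neighbors_default` descriptions above, the sketch below turns a goal ratio into per-class sample counts and caps the neighbor count for a small minority class. Both helpers are hypothetical and written in plain Python. They assume that the goal count is the majority count times the ratio, that oversampling never shrinks a class, and that the neighbor count is capped at one less than the minority class size. The component's own computation may differ in detail.

```python
from collections import Counter


def ratio_to_sampling_dict(y, sampling_ratio=0.25):
    """Hypothetical helper: map a minority:majority goal ratio to sample counts."""
    if not 0 < sampling_ratio <= 1:
        raise ValueError("sampling_ratio must be in the range (0, 1]")
    counts = Counter(y)
    majority_count = max(counts.values())
    goal = int(majority_count * sampling_ratio)
    # Assumption: oversampling only adds rows, so no class drops below its size.
    return {label: max(count, goal) for label, count in counts.items()}


def effective_k_neighbors(minority_count, k_neighbors_default=5):
    """Hypothetical helper: cap k when the minority class is very small."""
    if minority_count < 2:
        raise ValueError("need at least two minority samples to find a neighbor")
    # Assumption: a sample is not its own neighbor, so k is at most count - 1.
    return min(k_neighbors_default, minority_count - 1)


y = [0] * 1000 + [1] * 50
sampling_dict = ratio_to_sampling_dict(y, sampling_ratio=0.25)
print(sampling_dict)             # {0: 1000, 1: 250}
print(effective_k_neighbors(50))  # 5
print(effective_k_neighbors(4))   # 3
```

With 1000 majority and 50 minority samples, a ratio of 0.25 asks for 250 minority samples, which is the 1:4 ratio described for `sampling_ratio`.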

Attributes

hyperparameter_ranges	{}
modifies_features	True
modifies_target	True
name	Oversampler
training_only	True

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters.
`fit`	Fits oversampler to data.
`fit_transform`	Fit and transform data using the sampler component.
`load`	Loads component at file path.
`needs_fitting`	Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component.
`save`	Saves component at file path.
`transform`	Transforms the input data by sampling the data.

clone(self)

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: Default parameters for this component.
Return type: dict
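The convention can be illustrated with a toy class. This is a sketch only: `ToyComponent` is a hypothetical stand-in, not the library's base class, and it exposes `default_parameters` as a classmethod (so it is called with parentheses), whereas the convention above is written as attribute access. The defaults are read from the constructor signature with `inspect.signature`, which is one possible way to keep the two in sync.

```python
import inspect


class ToyComponent:
    """Toy stand-in for a component; not the library's base class."""

    def __init__(self, sampling_ratio=0.25, k_neighbors_default=5, random_seed=0):
        self._parameters = {
            "sampling_ratio": sampling_ratio,
            "k_neighbors_default": k_neighbors_default,
            "random_seed": random_seed,
        }

    @property
    def parameters(self):
        # Return a copy so callers cannot mutate the stored parameters.
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Read the defaults straight from the constructor signature.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name != "self" and param.default is not inspect.Parameter.empty
        }


# The defaults match the parameters of an instance built with no arguments.
print(ToyComponent.default_parameters() == ToyComponent().parameters)  # True
```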

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters

print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y)

Fits oversampler to data.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

self

fit_transform(self, X, y)

Fit and transform data using the sampler component.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].

Returns

Transformed data.

Return type

(pd.DataFrame, pd.Series)

static load(file_path)

Loads component at file path.

Parameters: file_path (str) – Location to load file.
Returns: ComponentBase object

needs_fitting(self)

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns: True.

property parameters

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters

file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Transforms the input data by sampling the data.

Parameters

X (pd.DataFrame) – Training features.
y (pd.Series) – Target.

Returns

Transformed features and target.

Return type

pd.DataFrame, pd.Series
