oversampler ======================================================================= .. py:module:: evalml.pipelines.components.transformers.samplers.oversampler .. autoapi-nested-parse:: SMOTE Oversampler component. Will automatically select whether to use SMOTE, SMOTEN, or SMOTENC based on inputs to the component. Module Contents --------------- Classes Summary ~~~~~~~~~~~~~~~ .. autoapisummary:: evalml.pipelines.components.transformers.samplers.oversampler.Oversampler Contents ~~~~~~~~~~~~~~~~~~~ .. py:class:: Oversampler(sampling_ratio=0.25, sampling_ratio_dict=None, k_neighbors_default=5, n_jobs=-1, random_seed=0, **kwargs) SMOTE Oversampler component. Will automatically select whether to use SMOTE, SMOTEN, or SMOTENC based on inputs to the component. :param sampling_ratio: This is the goal ratio of the minority to majority class, with range (0, 1]. A value of 0.25 means we want a 1:4 ratio of the minority to majority class after oversampling. We will create the a sampling dictionary using this ratio, with the keys corresponding to the class and the values responding to the number of samples. Defaults to 0.25. :type sampling_ratio: float :param sampling_ratio_dict: A dictionary specifying the desired balanced ratio for each target value. For instance, in a binary case where class 1 is the minority, we could specify: `sampling_ratio_dict={0: 0.5, 1: 1}`, which means we would undersample class 0 to have twice the number of samples as class 1 (minority:majority ratio = 0.5), and don't sample class 1. Overrides sampling_ratio if provided. Defaults to None. :type sampling_ratio_dict: dict :param k_neighbors_default: The number of nearest neighbors used to construct synthetic samples. This is the default value used, but the actual k_neighbors value might be smaller if there are less samples. Defaults to 5. :type k_neighbors_default: int :param n_jobs: The number of CPU cores to use. Defaults to -1. :type n_jobs: int :param random_seed: The seed to use for random sampling. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - None * - **modifies_features** - True * - **modifies_target** - True * - **name** - Oversampler * - **training_only** - True **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.samplers.oversampler.Oversampler.clone evalml.pipelines.components.transformers.samplers.oversampler.Oversampler.default_parameters evalml.pipelines.components.transformers.samplers.oversampler.Oversampler.describe evalml.pipelines.components.transformers.samplers.oversampler.Oversampler.fit evalml.pipelines.components.transformers.samplers.oversampler.Oversampler.fit_transform evalml.pipelines.components.transformers.samplers.oversampler.Oversampler.load evalml.pipelines.components.transformers.samplers.oversampler.Oversampler.needs_fitting evalml.pipelines.components.transformers.samplers.oversampler.Oversampler.parameters evalml.pipelines.components.transformers.samplers.oversampler.Oversampler.save evalml.pipelines.components.transformers.samplers.oversampler.Oversampler.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y) Fits oversampler to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y) Fit and transform data using the sampler component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: (pd.DataFrame, pd.Series) .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms the input data by sampling the data. :param X: Training features. :type X: pd.DataFrame :param y: Target. :type y: pd.Series :returns: Transformed features and target. :rtype: pd.DataFrame, pd.Series