utils
=====

.. py:module:: evalml.pipelines.components.utils

.. autoapi-nested-parse::

   Utility methods for EvalML components.


Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.pipelines.components.utils.WrappedSKClassifier
   evalml.pipelines.components.utils.WrappedSKRegressor


Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   evalml.pipelines.components.utils.all_components
   evalml.pipelines.components.utils.allowed_model_families
   evalml.pipelines.components.utils.drop_natural_language_columns
   evalml.pipelines.components.utils.estimator_unable_to_handle_nans
   evalml.pipelines.components.utils.generate_component_code
   evalml.pipelines.components.utils.get_estimators
   evalml.pipelines.components.utils.handle_component_class
   evalml.pipelines.components.utils.make_balancing_dictionary
   evalml.pipelines.components.utils.scikit_learn_wrapped_estimator
   evalml.pipelines.components.utils.set_boolean_columns_to_categorical


Contents
~~~~~~~~

.. py:function:: all_components()

   Get all available components.


.. py:function:: allowed_model_families(problem_type)

   List the model families allowed for a particular problem type.

   :param problem_type: ProblemTypes enum or string.
   :type problem_type: ProblemTypes or str

   :returns: A list of model families.
   :rtype: list[ModelFamily]


.. py:function:: drop_natural_language_columns(X)

   Drops natural language columns from dataframes for the imputers.

   :param X: The dataframe that we want to impute on.
   :type X: pd.DataFrame

   :returns: The dataframe with any natural language columns dropped.
             list: List of all the columns that are considered natural language.
   :rtype: pd.DataFrame


.. py:function:: estimator_unable_to_handle_nans(estimator_class)

   Returns True if the provided estimator class is unable to handle NaN values as an input.

   :param estimator_class: Estimator class.
   :type estimator_class: Estimator

   :raises ValueError: If estimator is not a valid estimator class.
   :returns: True if estimator class is unable to process NaN values, False otherwise.
   :rtype: bool


.. py:function:: generate_component_code(element)

   Creates and returns a string that contains the Python imports and code required for running the EvalML component.

   :param element: The instance of the component to generate string Python code for.
   :type element: component instance

   :returns: String representation of Python code that can be run separately in order to recreate the component instance. Does not include code for custom component implementation.

   :raises ValueError: If the input element is not a component instance.

   .. rubric:: Examples

   >>> from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor
   >>> assert generate_component_code(DecisionTreeRegressor()) == "from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor\n\ndecisionTreeRegressor = DecisionTreeRegressor(**{'criterion': 'mse', 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0})"
   ...
   >>> from evalml.pipelines.components.transformers.imputers.simple_imputer import SimpleImputer
   >>> assert generate_component_code(SimpleImputer()) == "from evalml.pipelines.components.transformers.imputers.simple_imputer import SimpleImputer\n\nsimpleImputer = SimpleImputer(**{'impute_strategy': 'most_frequent', 'fill_value': None})"


.. py:function:: get_estimators(problem_type, model_families=None)

   Returns the estimators allowed for a particular problem type.

   Can also optionally filter by a list of model families.

   :param problem_type: Problem type to filter for.
   :type problem_type: ProblemTypes or str
   :param model_families: Model families to filter for.
   :type model_families: list[ModelFamily] or list[str]

   :returns: A list of estimator subclasses.
   :rtype: list[class]

   :raises TypeError: If the model_families parameter is not a list.
   :raises RuntimeError: If a model family is not valid for the problem type.


.. py:function:: handle_component_class(component_class)

   Standardizes input from a string name to a ComponentBase subclass if necessary.

   If a str is provided, will attempt to look up a ComponentBase class by that name and return a new instance. Otherwise, if a ComponentBase subclass or Component instance is provided, will return that without modification.

   :param component_class: Input to be standardized.
   :type component_class: str, ComponentBase

   :returns: ComponentBase

   :raises ValueError: If input is not a valid component class.
   :raises MissingComponentError: If the component cannot be found.

   .. rubric:: Examples

   >>> from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor
   >>> handle_component_class(DecisionTreeRegressor)
   >>> handle_component_class("Random Forest Regressor")


.. py:function:: make_balancing_dictionary(y, sampling_ratio)

   Makes dictionary for oversampler components.

   Finds the ratio of each class to the majority. If the ratio is smaller than the sampling_ratio, we want to oversample; otherwise, we don't want to sample at all, and we leave the data as is.

   :param y: Target data.
   :type y: pd.Series
   :param sampling_ratio: The balanced ratio we want the samples to meet.
   :type sampling_ratio: float

   :returns: Dictionary where keys are the classes, and the corresponding values are the counts of samples for each class that will satisfy sampling_ratio.
   :rtype: dict

   :raises ValueError: If sampling ratio is not in the range (0, 1] or the target is empty.

   .. rubric:: Examples

   >>> import pandas as pd
   >>> y = pd.Series([1] * 4 + [2] * 8 + [3])
   >>> assert make_balancing_dictionary(y, 0.5) == {2: 8, 1: 4, 3: 4}
   >>> assert make_balancing_dictionary(y, 0.9) == {2: 8, 1: 7, 3: 7}
   >>> assert make_balancing_dictionary(y, 0.1) == {2: 8, 1: 4, 3: 1}
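The doctest values above follow a simple rule: each class is topped up to ``int(majority_count * sampling_ratio)`` samples, and any class already at or above that target keeps its original count. Below is a minimal pure-Python sketch of that rule; ``balancing_sketch`` is a hypothetical helper, not the library implementation, and the ``int()`` truncation is an assumption inferred from the doctests rather than from the source.

```python
from collections import Counter


def balancing_sketch(y, sampling_ratio):
    """Hypothetical re-statement of the make_balancing_dictionary rule."""
    if not 0 < sampling_ratio <= 1 or len(y) == 0:
        raise ValueError("sampling_ratio must be in (0, 1] and y must be non-empty")
    counts = Counter(y)
    # Target count for every minority class, relative to the majority class.
    # int() truncation here is inferred from the documented examples.
    target = int(max(counts.values()) * sampling_ratio)
    # Classes already at or above the target are left as-is.
    return {cls: max(cnt, target) for cls, cnt in counts.items()}


y = [1] * 4 + [2] * 8 + [3]
assert balancing_sketch(y, 0.9) == {2: 8, 1: 7, 3: 7}  # int(8 * 0.9) == 7
```

Note that the real function takes a ``pd.Series``; a plain list is used here only to keep the sketch dependency-free.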
.. py:function:: scikit_learn_wrapped_estimator(evalml_obj)

   Wraps an EvalML object as a scikit-learn estimator.


.. py:function:: set_boolean_columns_to_categorical(X)

   Sets boolean columns to categorical for the imputer.

   :param X: The dataframe that we want to impute on.
   :type X: pd.DataFrame

   :returns: The dataframe with any of its ww columns that are boolean set to categorical.
   :rtype: pd.DataFrame


.. py:class:: WrappedSKClassifier(pipeline)

   Scikit-learn classifier wrapper class.

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.components.utils.WrappedSKClassifier.fit
      evalml.pipelines.components.utils.WrappedSKClassifier.get_params
      evalml.pipelines.components.utils.WrappedSKClassifier.predict
      evalml.pipelines.components.utils.WrappedSKClassifier.predict_proba
      evalml.pipelines.components.utils.WrappedSKClassifier.score
      evalml.pipelines.components.utils.WrappedSKClassifier.set_params

   .. py:method:: fit(self, X, y)

      Fits component to data.

      :param X: The input training data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: The target training data of length [n_samples].
      :type y: pd.Series, optional

      :returns: self

   .. py:method:: get_params(self, deep=True)

      Get parameters for this estimator.

      :param deep: If True, will return the parameters for this estimator and contained subobjects that are estimators.
      :type deep: bool, default=True

      :returns: **params** -- Parameter names mapped to their values.
      :rtype: dict

   .. py:method:: predict(self, X)

      Make predictions using selected features.

      :param X: Features.
      :type X: pd.DataFrame

      :returns: Predicted values.
      :rtype: np.ndarray

   .. py:method:: predict_proba(self, X)

      Make probability estimates for labels.

      :param X: Features.
      :type X: pd.DataFrame

      :returns: Probability estimates.
      :rtype: np.ndarray

   .. py:method:: score(self, X, y, sample_weight=None)

      Return the mean accuracy on the given test data and labels.
      In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for each sample.

      :param X: Test samples.
      :type X: array-like of shape (n_samples, n_features)
      :param y: True labels for `X`.
      :type y: array-like of shape (n_samples,) or (n_samples, n_outputs)
      :param sample_weight: Sample weights.
      :type sample_weight: array-like of shape (n_samples,), default=None

      :returns: **score** -- Mean accuracy of ``self.predict(X)`` w.r.t. `y`.
      :rtype: float

   .. py:method:: set_params(self, **params)

      Set the parameters of this estimator.

      The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

      :param \*\*params: Estimator parameters.
      :type \*\*params: dict

      :returns: **self** -- Estimator instance.
      :rtype: estimator instance


.. py:class:: WrappedSKRegressor(pipeline)

   Scikit-learn regressor wrapper class.

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.components.utils.WrappedSKRegressor.fit
      evalml.pipelines.components.utils.WrappedSKRegressor.get_params
      evalml.pipelines.components.utils.WrappedSKRegressor.predict
      evalml.pipelines.components.utils.WrappedSKRegressor.score
      evalml.pipelines.components.utils.WrappedSKRegressor.set_params

   .. py:method:: fit(self, X, y)

      Fits component to data.

      :param X: The input training data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: The target training data of length [n_samples].
      :type y: pd.Series, optional

      :returns: self

   .. py:method:: get_params(self, deep=True)

      Get parameters for this estimator.

      :param deep: If True, will return the parameters for this estimator and contained subobjects that are estimators.
      :type deep: bool, default=True

      :returns: **params** -- Parameter names mapped to their values.
      :rtype: dict
   .. py:method:: predict(self, X)

      Make predictions using selected features.

      :param X: Features.
      :type X: pd.DataFrame

      :returns: Predicted values.
      :rtype: np.ndarray

   .. py:method:: score(self, X, y, sample_weight=None)

      Return the coefficient of determination of the prediction.

      The coefficient of determination :math:`R^2` is defined as :math:`(1 - \frac{u}{v})`, where :math:`u` is the residual sum of squares ``((y_true - y_pred) ** 2).sum()`` and :math:`v` is the total sum of squares ``((y_true - y_true.mean()) ** 2).sum()``. The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of `y`, disregarding the input features, would get a :math:`R^2` score of 0.0.

      :param X: Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape ``(n_samples, n_samples_fitted)``, where ``n_samples_fitted`` is the number of samples used in the fitting for the estimator.
      :type X: array-like of shape (n_samples, n_features)
      :param y: True values for `X`.
      :type y: array-like of shape (n_samples,) or (n_samples, n_outputs)
      :param sample_weight: Sample weights.
      :type sample_weight: array-like of shape (n_samples,), default=None

      :returns: **score** -- :math:`R^2` of ``self.predict(X)`` w.r.t. `y`.
      :rtype: float

      .. rubric:: Notes

      The :math:`R^2` score used when calling ``score`` on a regressor uses ``multioutput='uniform_average'`` from version 0.23 to keep consistent with the default value of :func:`~sklearn.metrics.r2_score`. This influences the ``score`` method of all the multioutput regressors (except for :class:`~sklearn.multioutput.MultiOutputRegressor`).

   .. py:method:: set_params(self, **params)

      Set the parameters of this estimator.

      The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`).
      The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

      :param \*\*params: Estimator parameters.
      :type \*\*params: dict

      :returns: **self** -- Estimator instance.
      :rtype: estimator instance