utils
===========================================

.. py:module:: evalml.pipelines.components.utils

.. autoapi-nested-parse::

   Utility methods for EvalML components.


Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.pipelines.components.utils.WrappedSKClassifier
   evalml.pipelines.components.utils.WrappedSKRegressor


Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   evalml.pipelines.components.utils.all_components
   evalml.pipelines.components.utils.allowed_model_families
   evalml.pipelines.components.utils.convert_bool_to_double
   evalml.pipelines.components.utils.estimator_unable_to_handle_nans
   evalml.pipelines.components.utils.generate_component_code
   evalml.pipelines.components.utils.get_estimators
   evalml.pipelines.components.utils.get_prediction_intevals_for_tree_regressors
   evalml.pipelines.components.utils.handle_component_class
   evalml.pipelines.components.utils.handle_float_categories_for_catboost
   evalml.pipelines.components.utils.make_balancing_dictionary
   evalml.pipelines.components.utils.match_indices
   evalml.pipelines.components.utils.scikit_learn_wrapped_estimator


Contents
~~~~~~~~~~~~~~~~~~~
.. py:function:: all_components()

   Get all available components.


.. py:function:: allowed_model_families(problem_type)

   List the model families allowed for a particular problem type.

   :param problem_type: ProblemTypes enum or string.
   :type problem_type: ProblemTypes or str

   :returns: A list of model families.
   :rtype: list[ModelFamily]


.. py:function:: convert_bool_to_double(data: pandas.DataFrame, include_ints: bool = False) -> pandas.DataFrame

   Converts all boolean columns in dataframe to doubles. If include_ints, converts all integer columns to doubles as well.

   :param data: Input dataframe.
   :type data: pd.DataFrame
   :param include_ints: If True, converts all integer columns to doubles as well. Defaults to False.
   :type include_ints: bool

   :returns: Input dataframe with all boolean-valued columns converted to doubles.
   :rtype: pd.DataFrame


.. py:function:: estimator_unable_to_handle_nans(estimator_class)

   Determines whether the provided estimator class is unable to handle NaN values as input.

   :param estimator_class: Estimator class
   :type estimator_class: Estimator

   :raises ValueError: If estimator is not a valid estimator class.

   :returns: True if estimator class is unable to process NaN values, False otherwise.
   :rtype: bool


.. py:function:: generate_component_code(element)

   Creates and returns a string that contains the Python imports and code required for running the EvalML component.

   :param element: The instance of the component to generate string Python code for.
   :type element: component instance

   :returns: String representation of Python code that can be run separately in order to recreate the component instance.
             Does not include code for custom component implementation.

   :raises ValueError: If the input element is not a component instance.

   .. rubric:: Examples

   >>> from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor
   >>> assert generate_component_code(DecisionTreeRegressor()) == "from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor\n\ndecisionTreeRegressor = DecisionTreeRegressor(**{'criterion': 'squared_error', 'max_features': 'sqrt', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0})"
   ...
   >>> from evalml.pipelines.components.transformers.imputers.simple_imputer import SimpleImputer
   >>> assert generate_component_code(SimpleImputer()) == "from evalml.pipelines.components.transformers.imputers.simple_imputer import SimpleImputer\n\nsimpleImputer = SimpleImputer(**{'impute_strategy': 'most_frequent', 'fill_value': None})"
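
   The strings in the examples above follow a regular layout: an import line, a blank line, and an assignment whose variable name is the class name with its first letter lowercased. A minimal sketch of that layout, assuming a hypothetical helper ``sketch_component_code`` (not part of EvalML) that receives the module path, class name, and parameters explicitly rather than reading them from a component instance:

   .. code-block:: python

      def sketch_component_code(module_path, class_name, parameters):
          """Build a code string in the layout shown in the examples above.

          Hypothetical helper for illustration only; EvalML derives these
          values from the component instance itself.
          """
          # Variable name: class name with a lowercased first letter.
          variable_name = class_name[0].lower() + class_name[1:]
          return (
              f"from {module_path} import {class_name}\n\n"
              f"{variable_name} = {class_name}(**{parameters!r})"
          )


      code = sketch_component_code(
          "evalml.pipelines.components.transformers.imputers.simple_imputer",
          "SimpleImputer",
          {"impute_strategy": "most_frequent", "fill_value": None},
      )
      print(code)

   For these inputs the sketch reproduces the ``SimpleImputer`` string from the example above.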


.. py:function:: get_estimators(problem_type, model_families=None, excluded_model_families=None)

   Returns the estimators allowed for a particular problem type.

   Can optionally filter by a list of model families.

   :param problem_type: Problem type to filter for.
   :type problem_type: ProblemTypes or str
   :param model_families: Model families to filter for.
   :type model_families: list(str, ModelFamily)
   :param excluded_model_families: A list of model families to exclude from the results.
   :type excluded_model_families: list(str, ModelFamily)

   :returns: A list of estimator subclasses.
   :rtype: list[class]

   :raises TypeError: If the model_families parameter is not a list.
   :raises RuntimeError: If a model family is not valid for the problem type.


.. py:function:: get_prediction_intevals_for_tree_regressors(X: pandas.DataFrame, predictions: pandas.Series, coverage: List[float], estimators: List[evalml.pipelines.components.estimators.estimator.Estimator]) -> Dict[str, pandas.Series]

   Find the prediction intervals for tree-based regressors.

   :param X: Data of shape [n_samples, n_features].
   :type X: pd.DataFrame
   :param predictions: Predictions from the regressor.
   :type predictions: pd.Series
   :param coverage: A list of floats between 0 and 1 giving the coverage levels for which the upper
                    and lower bounds of the prediction interval should be calculated.
   :type coverage: list[float]
   :param estimators: Collection of fitted sub-estimators.
   :type estimators: list

   :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
   :rtype: dict
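
   One plausible way to derive such intervals is to measure how much the fitted sub-estimators disagree on each sample and to scale that spread by a normal quantile. The sketch below illustrates the idea in plain Python. The helper name ``sketch_tree_prediction_intervals``, its list-based inputs, and the normal-quantile assumption are illustrative choices, not a description of EvalML's implementation:

   .. code-block:: python

      from statistics import NormalDist, pstdev


      def sketch_tree_prediction_intervals(tree_predictions, predictions, coverage):
          """Illustrative only: intervals from the spread of per-tree predictions.

          tree_predictions: one list of per-sample predictions per sub-estimator.
          predictions: the ensemble prediction for each sample.
          coverage: coverage levels strictly between 0 and 1.
          """
          # Standard deviation across sub-estimators, computed per sample.
          spreads = [pstdev(sample) for sample in zip(*tree_predictions)]
          intervals = {}
          for level in coverage:
              # Two-sided normal quantile for this coverage level (assumption).
              z = NormalDist().inv_cdf((1 + level) / 2)
              intervals[f"{level}_lower"] = [
                  p - z * s for p, s in zip(predictions, spreads)
              ]
              intervals[f"{level}_upper"] = [
                  p + z * s for p, s in zip(predictions, spreads)
              ]
          return intervals


      # Three hypothetical trees, each predicting two samples.
      tree_predictions = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
      predictions = [2.0, 6.0]
      intervals = sketch_tree_prediction_intervals(tree_predictions, predictions, [0.95])

   The resulting dictionary uses the ``{coverage}_lower`` and ``{coverage}_upper`` key format described above, here ``0.95_lower`` and ``0.95_upper``.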


.. py:function:: handle_component_class(component_class)

   Standardizes input from a string name to a ComponentBase subclass if necessary.

   If a str is provided, will attempt to look up a ComponentBase class by that name and
   return that class. Otherwise, if a ComponentBase subclass or Component instance is provided,
   will return that without modification.

   :param component_class: Input to be standardized.
   :type component_class: str, ComponentBase

   :returns: ComponentBase

   :raises ValueError: If input is not a valid component class.
   :raises MissingComponentError: If the component cannot be found.

   .. rubric:: Examples

   >>> from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor
   >>> handle_component_class(DecisionTreeRegressor)
   <class 'evalml.pipelines.components.estimators.regressors.decision_tree_regressor.DecisionTreeRegressor'>
   >>> handle_component_class("Random Forest Regressor")
   <class 'evalml.pipelines.components.estimators.regressors.rf_regressor.RandomForestRegressor'>


.. py:function:: handle_float_categories_for_catboost(X)

   Updates input data to be compatible with CatBoost estimators.

   CatBoost cannot handle data in X that is the Categorical Woodwork logical type with floating point categories.
   This utility determines if the floating point categories can be converted to integers
   without truncating any data, and if they can be, converts them to int64 categories.
   Will not attempt to convert values that are truly floating point.

   :param X: Input data to CatBoost that has Woodwork initialized
   :type X: pd.DataFrame

   :returns: Input data with exact same Woodwork typing info as the original but with any float categories
             converted to be int64 when possible.
   :rtype: DataFrame

   :raises ValueError: If the numeric categories are actual floats that cannot be converted to integers
       without truncating data.
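
   The "convert only when nothing is truncated" check described above can be illustrated on a plain list of category values. This is a minimal sketch with a hypothetical helper, ``sketch_float_categories_to_int``. It does not use Woodwork, pandas, or CatBoost, and it is not EvalML's implementation:

   .. code-block:: python

      def sketch_float_categories_to_int(categories):
          """Convert float category values to ints when no data would be lost.

          Plain-Python illustration only.
          """
          # A float such as 2.0 is integer-valued; a float such as 2.5 is not.
          if not all(float(value).is_integer() for value in categories):
              raise ValueError(
                  "Categories contain true floating point values and cannot be "
                  "converted to integers without truncating data."
              )
          return [int(value) for value in categories]


      converted = sketch_float_categories_to_int([1.0, 2.0, 3.0])
      print(converted)

   Passing a list such as ``[1.5, 2.0]`` to the sketch raises ``ValueError`` instead of silently truncating ``1.5``.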


.. py:function:: make_balancing_dictionary(y, sampling_ratio)

   Makes dictionary for oversampler components.

   Finds the ratio of each class to the majority class. If the ratio is smaller than sampling_ratio, the class is oversampled; otherwise, the class is not sampled at all and its data is left as is.

   :param y: Target data.
   :type y: pd.Series
   :param sampling_ratio: The balanced ratio we want the samples to meet.
   :type sampling_ratio: float

   :returns: Dictionary where keys are the classes, and the corresponding values are the counts of samples
             for each class that will satisfy sampling_ratio.
   :rtype: dict

   :raises ValueError: If sampling ratio is not in the range (0, 1] or the target is empty.

   .. rubric:: Examples

   >>> import pandas as pd
   >>> y = pd.Series([1] * 4 + [2] * 8 + [3])
   >>> assert make_balancing_dictionary(y, 0.5) == {2: 8, 1: 4, 3: 4}
   >>> assert make_balancing_dictionary(y, 0.9) == {2: 8, 1: 7, 3: 7}
   >>> assert make_balancing_dictionary(y, 0.1) == {2: 8, 1: 4, 3: 1}
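
   The rule behind these results can be sketched in plain Python. The helper ``sketch_balancing_dictionary`` below is hypothetical and works on a list rather than a ``pd.Series``. Rounding the oversampling target down is an assumption inferred from the ``0.9`` example above (``8 * 0.9 = 7.2`` becomes ``7``):

   .. code-block:: python

      from collections import Counter


      def sketch_balancing_dictionary(y, sampling_ratio):
          """Illustrative re-statement of the rule described above."""
          if len(y) == 0:
              raise ValueError("Target data must not be empty.")
          if not 0 < sampling_ratio <= 1:
              raise ValueError("sampling_ratio must be in the range (0, 1].")
          counts = Counter(y)
          majority_count = max(counts.values())
          balanced = {}
          for label, count in counts.items():
              if count / majority_count < sampling_ratio:
                  # Oversample up to the requested ratio of the majority class.
                  # Rounding down is an assumption based on the examples.
                  balanced[label] = int(majority_count * sampling_ratio)
              else:
                  # Already at or above the ratio: leave the class unchanged.
                  balanced[label] = count
          return balanced


      y = [1] * 4 + [2] * 8 + [3]
      result = sketch_balancing_dictionary(y, 0.9)
      print(result)

   With ``sampling_ratio=0.5``, class ``1`` sits exactly at the ratio (``4 / 8``) and is left unchanged, while class ``3`` (``1 / 8``) is raised to 4 samples.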


.. py:function:: match_indices(X: pandas.DataFrame, y: pandas.Series) -> Tuple[pandas.DataFrame, Union[pandas.Series, pandas.DataFrame]]

   Matches index from the passed dataframe to the passed series.

   :param X: Dataframe to match index from.
   :type X: pd.DataFrame
   :param y: Series to match the index to.
   :type y: pd.Series

   :returns: DataFrame and Series with matching indices.
   :rtype: Tuple(pd.DataFrame, pd.Series)


.. py:function:: scikit_learn_wrapped_estimator(evalml_obj)

   Wraps an EvalML object as a scikit-learn estimator.


.. py:class:: WrappedSKClassifier(pipeline)


   Scikit-learn classifier wrapper class.


   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.components.utils.WrappedSKClassifier.fit
      evalml.pipelines.components.utils.WrappedSKClassifier.get_metadata_routing
      evalml.pipelines.components.utils.WrappedSKClassifier.get_params
      evalml.pipelines.components.utils.WrappedSKClassifier.predict
      evalml.pipelines.components.utils.WrappedSKClassifier.predict_proba
      evalml.pipelines.components.utils.WrappedSKClassifier.score
      evalml.pipelines.components.utils.WrappedSKClassifier.set_params

   .. py:method:: fit(self, X, y)

      Fits component to data.

      :param X: The input training data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: The target training data of length [n_samples].
      :type y: pd.Series, optional

      :returns: self


   .. py:method:: get_metadata_routing(self)

      Get metadata routing of this object.

      Please check :ref:`User Guide <metadata_routing>` on how the routing
      mechanism works.

      :returns: **routing** -- A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating
                routing information.
      :rtype: MetadataRequest


   .. py:method:: get_params(self, deep=True)

      Get parameters for this estimator.

      :param deep: If True, will return the parameters for this estimator and
                   contained subobjects that are estimators.
      :type deep: bool, default=True

      :returns: **params** -- Parameter names mapped to their values.
      :rtype: dict


   .. py:method:: predict(self, X)

      Make predictions using selected features.

      :param X: Features.
      :type X: pd.DataFrame

      :returns: Predicted values.
      :rtype: np.ndarray


   .. py:method:: predict_proba(self, X)

      Make probability estimates for labels.

      :param X: Features.
      :type X: pd.DataFrame

      :returns: Probability estimates.
      :rtype: np.ndarray


   .. py:method:: score(self, X, y, sample_weight=None)

      Return the mean accuracy on the given test data and labels.

      In multi-label classification, this is the subset accuracy
      which is a harsh metric since you require for each sample that
      each label set be correctly predicted.

      :param X: Test samples.
      :type X: array-like of shape (n_samples, n_features)
      :param y: True labels for `X`.
      :type y: array-like of shape (n_samples,) or (n_samples, n_outputs)
      :param sample_weight: Sample weights.
      :type sample_weight: array-like of shape (n_samples,), default=None

      :returns: **score** -- Mean accuracy of ``self.predict(X)`` w.r.t. `y`.
      :rtype: float
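
      To make the difference between plain accuracy and subset accuracy concrete, the sketch below computes both with one hypothetical helper, ``sketch_mean_accuracy``. It is a plain-Python illustration of the metric described above, not the scikit-learn implementation, and it ignores ``sample_weight``:

      .. code-block:: python

         def sketch_mean_accuracy(y_true, y_pred):
             """Fraction of samples whose prediction equals the true label exactly."""
             matches = [truth == pred for truth, pred in zip(y_true, y_pred)]
             return sum(matches) / len(matches)


         # Single-label case: three of four predictions are correct.
         single_label = sketch_mean_accuracy([0, 1, 1, 0], [0, 1, 0, 0])

         # Multi-label case: each sample is a tuple of labels, and the whole
         # tuple must match for the sample to count as correct.
         multi_label = sketch_mean_accuracy(
             [(1, 0), (0, 1), (1, 1)],
             [(1, 0), (0, 0), (1, 1)],
         )

      In the multi-label case the second sample has one of its two labels right, yet it contributes nothing to the score, which is why subset accuracy is described as a harsh metric.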


   .. py:method:: set_params(self, **params)

      Set the parameters of this estimator.

      The method works on simple estimators as well as on nested objects
      (such as :class:`~sklearn.pipeline.Pipeline`). The latter have
      parameters of the form ``<component>__<parameter>`` so that it's
      possible to update each component of a nested object.

      :param \*\*params: Estimator parameters.
      :type \*\*params: dict

      :returns: **self** -- Estimator instance.
      :rtype: estimator instance
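
      The ``<component>__<parameter>`` naming convention can be illustrated by splitting each key on its first double underscore. The helper ``sketch_group_nested_params`` and the parameter names below are hypothetical; this is a sketch of the convention, not the scikit-learn implementation:

      .. code-block:: python

         def sketch_group_nested_params(params):
             """Separate top-level parameters from ``<component>__<parameter>`` ones."""
             top_level, nested = {}, {}
             for key, value in params.items():
                 # Split on the first "__" only; the remainder stays intact so
                 # deeper nesting could be handled by the sub-component.
                 name, delimiter, sub_key = key.partition("__")
                 if delimiter:
                     nested.setdefault(name, {})[sub_key] = value
                 else:
                     top_level[name] = value
             return top_level, nested


         top_level, nested = sketch_group_nested_params(
             {"verbose": True, "imputer__strategy": "mean", "model__max_depth": 6}
         )

      Here ``verbose`` is treated as a parameter of the outer object, while ``strategy`` and ``max_depth`` are grouped under the ``imputer`` and ``model`` components respectively.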


.. py:class:: WrappedSKRegressor(pipeline)


   Scikit-learn regressor wrapper class.


   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.components.utils.WrappedSKRegressor.fit
      evalml.pipelines.components.utils.WrappedSKRegressor.get_metadata_routing
      evalml.pipelines.components.utils.WrappedSKRegressor.get_params
      evalml.pipelines.components.utils.WrappedSKRegressor.predict
      evalml.pipelines.components.utils.WrappedSKRegressor.score
      evalml.pipelines.components.utils.WrappedSKRegressor.set_params

   .. py:method:: fit(self, X, y)

      Fits component to data.

      :param X: The input training data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: The target training data of length [n_samples].
      :type y: pd.Series, optional

      :returns: self


   .. py:method:: get_metadata_routing(self)

      Get metadata routing of this object.

      Please check :ref:`User Guide <metadata_routing>` on how the routing
      mechanism works.

      :returns: **routing** -- A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating
                routing information.
      :rtype: MetadataRequest


   .. py:method:: get_params(self, deep=True)

      Get parameters for this estimator.

      :param deep: If True, will return the parameters for this estimator and
                   contained subobjects that are estimators.
      :type deep: bool, default=True

      :returns: **params** -- Parameter names mapped to their values.
      :rtype: dict


   .. py:method:: predict(self, X)

      Make predictions using selected features.

      :param X: Features.
      :type X: pd.DataFrame

      :returns: Predicted values.
      :rtype: np.ndarray


   .. py:method:: score(self, X, y, sample_weight=None)

      Return the coefficient of determination of the prediction.

      The coefficient of determination :math:`R^2` is defined as
      :math:`(1 - \frac{u}{v})`, where :math:`u` is the residual
      sum of squares ``((y_true - y_pred) ** 2).sum()`` and :math:`v`
      is the total sum of squares ``((y_true - y_true.mean()) ** 2).sum()``.
      The best possible score is 1.0 and it can be negative (because the
      model can be arbitrarily worse). A constant model that always predicts
      the expected value of `y`, disregarding the input features, would get
      a :math:`R^2` score of 0.0.

      :param X: Test samples. For some estimators this may be a precomputed
                kernel matrix or a list of generic objects instead with shape
                ``(n_samples, n_samples_fitted)``, where ``n_samples_fitted``
                is the number of samples used in the fitting for the estimator.
      :type X: array-like of shape (n_samples, n_features)
      :param y: True values for `X`.
      :type y: array-like of shape (n_samples,) or (n_samples, n_outputs)
      :param sample_weight: Sample weights.
      :type sample_weight: array-like of shape (n_samples,), default=None

      :returns: **score** -- :math:`R^2` of ``self.predict(X)`` w.r.t. `y`.
      :rtype: float
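
      The definition of :math:`R^2` given above can be checked by hand on a small example. The sketch below is a plain-Python restatement of the formula for a single output without sample weights. The helper name ``sketch_r2_score`` is hypothetical:

      .. code-block:: python

         def sketch_r2_score(y_true, y_pred):
             """Compute 1 - u / v as defined above (single output, unweighted)."""
             mean_true = sum(y_true) / len(y_true)
             # u: residual sum of squares.
             u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
             # v: total sum of squares around the mean of y_true.
             v = sum((t - mean_true) ** 2 for t in y_true)
             return 1 - u / v


         y_true = [1.0, 2.0, 3.0, 4.0]
         perfect = sketch_r2_score(y_true, y_true)                # 1.0
         constant = sketch_r2_score(y_true, [2.5] * 4)            # 0.0
         worse = sketch_r2_score(y_true, [4.0, 3.0, 2.0, 1.0])    # -3.0

      The three cases match the description above: a perfect model scores 1.0, a constant model that predicts the mean of ``y_true`` scores 0.0, and a model that is worse than the constant one scores below zero.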

      .. rubric:: Notes

      The :math:`R^2` score used when calling ``score`` on a regressor uses
      ``multioutput='uniform_average'`` from version 0.23 to keep consistent
      with default value of :func:`~sklearn.metrics.r2_score`.
      This influences the ``score`` method of all the multioutput
      regressors (except for
      :class:`~sklearn.multioutput.MultiOutputRegressor`).


   .. py:method:: set_params(self, **params)

      Set the parameters of this estimator.

      The method works on simple estimators as well as on nested objects
      (such as :class:`~sklearn.pipeline.Pipeline`). The latter have
      parameters of the form ``<component>__<parameter>`` so that it's
      possible to update each component of a nested object.

      :param \*\*params: Estimator parameters.
      :type \*\*params: dict

      :returns: **self** -- Estimator instance.
      :rtype: estimator instance