utils
=============================

.. py:module:: evalml.automl.utils

.. autoapi-nested-parse::

   Utilities useful in AutoML.


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   evalml.automl.utils.check_all_pipeline_names_unique
   evalml.automl.utils.get_best_sampler_for_data
   evalml.automl.utils.get_default_primary_search_objective
   evalml.automl.utils.get_pipelines_from_component_graphs
   evalml.automl.utils.make_data_splitter
   evalml.automl.utils.tune_binary_threshold


Attributes Summary
~~~~~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.automl.utils.AutoMLConfig


Contents
~~~~~~~~~~~~~~~~~~~

.. py:data:: AutoMLConfig

.. py:function:: check_all_pipeline_names_unique(pipelines)

   Checks whether all the pipeline names are unique.

   :param pipelines: List of pipelines to check if all names are unique.
   :type pipelines: list[PipelineBase]

   :raises ValueError: If any pipeline names are duplicated.


.. py:function:: get_best_sampler_for_data(X, y, sampler_method, sampler_balanced_ratio)

   Returns the name of the sampler component to use for AutoMLSearch.

   :param X: The input feature data.
   :type X: pd.DataFrame
   :param y: The input target data.
   :type y: pd.Series
   :param sampler_method: The sampler_method argument passed to AutoMLSearch.
   :type sampler_method: str
   :param sampler_balanced_ratio: The minority:majority class ratio that is considered balanced,
                                  or to which the classes should be rebalanced.
   :type sampler_balanced_ratio: float

   :returns: The string name of the sampling component to use, or None if no sampler is necessary.
   :rtype: str, None
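
The role of ``sampler_balanced_ratio`` can be illustrated with a small balance check. This stdlib sketch assumes the ratio is the smallest class count divided by the largest class count and that no sampler is needed once that ratio reaches the threshold; it does not model how evalml uses ``sampler_method`` to choose between samplers. ``minority_majority_ratio`` and ``needs_sampler`` are hypothetical helpers.

```python
from collections import Counter


def minority_majority_ratio(y):
    """Return (smallest class count) / (largest class count) for a sequence of labels."""
    counts = Counter(y)
    return min(counts.values()) / max(counts.values())


def needs_sampler(y, sampler_balanced_ratio):
    """Assumed rule: sample only when the class ratio falls below the balanced ratio."""
    return minority_majority_ratio(y) < sampler_balanced_ratio


y = [0] * 90 + [1] * 10  # 10 minority samples vs. 90 majority samples
print(minority_majority_ratio(y))  # 10 / 90, about 0.111
print(needs_sampler(y, 0.25))      # True: 0.111 < 0.25
print(needs_sampler(y, 0.1))       # False: 0.111 >= 0.1
```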


.. py:function:: get_default_primary_search_objective(problem_type)

   Get the default primary search objective for a problem type.

   :param problem_type: Problem type of interest.
   :type problem_type: str or ProblemType

   :returns: Primary objective instance for the problem type.
   :rtype: ObjectiveBase


.. py:function:: get_pipelines_from_component_graphs(component_graphs_dict, problem_type, parameters=None, random_seed=0)

   Creates pipelines from the given component graphs for the specified problem type.

   :param component_graphs_dict: The dict of component graphs.
   :type component_graphs_dict: dict
   :param problem_type: The problem type for which pipelines will be created.
   :type problem_type: str or ProblemType
   :param parameters: Pipeline-level parameters that should be passed to the proposed pipelines. Defaults to None.
   :type parameters: dict
   :param random_seed: Random seed. Defaults to 0.
   :type random_seed: int

   :returns: List of pipelines made from the passed component graphs.
   :rtype: list


.. py:function:: make_data_splitter(X, y, problem_type, problem_configuration=None, n_splits=3, shuffle=True, random_seed=0)

   Given the training data and ML problem parameters, compute a data splitting method to use during AutoML search.

   :param X: The input training data of shape [n_samples, n_features].
   :type X: pd.DataFrame
   :param y: The target training data of length [n_samples].
   :type y: pd.Series
   :param problem_type: The type of machine learning problem.
   :type problem_type: ProblemType
   :param problem_configuration: Additional parameters needed to configure the search. For example,
                                 in time series problems, values should be passed in for the time_index, gap, and max_delay variables. Defaults to None.
   :type problem_configuration: dict, None
   :param n_splits: The number of CV splits, if applicable. Defaults to 3.
   :type n_splits: int, None
   :param shuffle: Whether or not to shuffle the data before splitting, if applicable. Defaults to True.
   :type shuffle: bool
   :param random_seed: Seed for the random number generator. Defaults to 0.
   :type random_seed: int

   :returns: Data splitting method.
   :rtype: sklearn.model_selection.BaseCrossValidator

   :raises ValueError: If problem_configuration is not given for a time-series problem.
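
The parameters ``n_splits``, ``shuffle``, and ``random_seed`` have their usual cross-validation meaning. The stdlib sketch below illustrates them with plain shuffled k-fold splitting. ``kfold_indices`` is a hypothetical helper; it does not reproduce evalml's choice of splitter for a given problem type (which may, for example, be stratified or time-series aware) or its random number stream.

```python
import random


def kfold_indices(n_samples, n_splits=3, shuffle=True, random_seed=0):
    """Yield (train_indices, validation_indices) pairs for plain k-fold CV."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    if shuffle:
        # A seeded generator makes the split reproducible across runs.
        random.Random(random_seed).shuffle(indices)
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in kfold_indices(10, n_splits=3, random_seed=0):
    print(len(train), len(validation))  # 6 4, then 7 3, then 7 3
```

Every sample lands in exactly one validation fold, and reusing the same ``random_seed`` reproduces the same folds.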


.. py:function:: tune_binary_threshold(pipeline, objective, problem_type, X_threshold_tuning, y_threshold_tuning, X=None, y=None)

   Tunes the threshold of a binary pipeline to the X and y thresholding data.

   :param pipeline: Pipeline instance to threshold.
   :type pipeline: Pipeline
   :param objective: The objective to tune with. If not tunable and best_pipeline is True, will use F1.
   :type objective: ObjectiveBase
   :param problem_type: The problem type of the pipeline.
   :type problem_type: ProblemType
   :param X_threshold_tuning: Features on which the pipeline's threshold will be tuned.
   :type X_threshold_tuning: pd.DataFrame
   :param y_threshold_tuning: Target data on which the pipeline's threshold will be tuned.
   :type y_threshold_tuning: pd.Series
   :param X: Features on which the pipeline will be trained (used for time series binary). Defaults to None.
   :type X: pd.DataFrame
   :param y: Target on which the pipeline will be trained (used for time series binary). Defaults to None.
   :type y: pd.Series
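
Conceptually, threshold tuning picks the cutoff on predicted positive-class probabilities that scores best on held-out tuning data. The sketch below assumes a simple grid search that maximizes F1. It illustrates the idea and is not evalml's optimizer. ``f1_score`` and ``tune_threshold`` are hypothetical helpers; ``f1_score`` is a local re-implementation, not scikit-learn's function of the same name.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels in {0, 1}; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(probabilities, y_true, candidates=None):
    """Return (threshold, score) for the candidate threshold that maximizes F1."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01, 0.02, ..., 0.99
    best_threshold, best_score = 0.5, -1.0
    for threshold in candidates:
        # Predict the positive class when the probability reaches the threshold.
        y_pred = [1 if p >= threshold else 0 for p in probabilities]
        score = f1_score(y_true, y_pred)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Predicted positive-class probabilities on a held-out tuning set.
probs = [0.1, 0.3, 0.35, 0.4, 0.7, 0.8]
labels = [0, 0, 1, 1, 1, 1]
print(tune_threshold(probs, labels))
```

A grid search keeps the sketch dependency-free; a bounded scalar optimizer could search the same interval more finely.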