utils
=============================

.. py:module:: evalml.automl.utils

.. autoapi-nested-parse::

   Utilities useful in AutoML.


Module Contents
---------------

Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   evalml.automl.utils.check_all_pipeline_names_unique
   evalml.automl.utils.get_best_sampler_for_data
   evalml.automl.utils.get_default_primary_search_objective
   evalml.automl.utils.get_pipelines_from_component_graphs
   evalml.automl.utils.make_data_splitter
   evalml.automl.utils.tune_binary_threshold


Attributes Summary
~~~~~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.automl.utils.AutoMLConfig


Contents
~~~~~~~~~~~~~~~~~~~

.. py:data:: AutoMLConfig


.. py:function:: check_all_pipeline_names_unique(pipelines)

   Checks whether all the pipeline names are unique.

   :param pipelines: List of pipelines to check if all names are unique.
   :type pipelines: list[PipelineBase]

   :raises ValueError: If any pipeline names are duplicated.


.. py:function:: get_best_sampler_for_data(X, y, sampler_method, sampler_balanced_ratio)

   Returns the name of the sampler component to use for AutoMLSearch.

   :param X: The input feature data.
   :type X: pd.DataFrame
   :param y: The input target data.
   :type y: pd.Series
   :param sampler_method: The sampler_method argument passed to AutoMLSearch.
   :type sampler_method: str
   :param sampler_balanced_ratio: The minority:majority target ratio that is considered balanced, and to which the classes should be rebalanced.
   :type sampler_balanced_ratio: float

   :returns: The string name of the sampling component to use, or None if no sampler is necessary.
   :rtype: str, None


.. py:function:: get_default_primary_search_objective(problem_type)

   Get the default primary search objective for a problem type.

   :param problem_type: Problem type of interest.
   :type problem_type: str or ProblemType

   :returns: Primary objective instance for the problem type.
   :rtype: ObjectiveBase


.. py:function:: get_pipelines_from_component_graphs(component_graphs_dict, problem_type, parameters=None, random_seed=0)

   Creates pipelines from the given component graphs for the specified problem type.

   :param component_graphs_dict: Dictionary of component graphs.
   :type component_graphs_dict: dict
   :param problem_type: The problem type for which pipelines will be created.
   :type problem_type: str or ProblemType
   :param parameters: Pipeline-level parameters that should be passed to the proposed pipelines. Defaults to None.
   :type parameters: dict
   :param random_seed: Random seed. Defaults to 0.
   :type random_seed: int

   :returns: List of pipelines created from the given component graphs.
   :rtype: list


.. py:function:: make_data_splitter(X, y, problem_type, problem_configuration=None, n_splits=3, shuffle=True, random_seed=0)

   Given the training data and ML problem parameters, compute a data splitting method to use during AutoML search.

   :param X: The input training data of shape [n_samples, n_features].
   :type X: pd.DataFrame
   :param y: The target training data of length [n_samples].
   :type y: pd.Series
   :param problem_type: The type of machine learning problem.
   :type problem_type: ProblemType
   :param problem_configuration: Additional parameters needed to configure the search. For example, in time series problems, values should be passed in for the time_index, gap, and max_delay variables. Defaults to None.
   :type problem_configuration: dict, None
   :param n_splits: The number of CV splits, if applicable. Defaults to 3.
   :type n_splits: int, None
   :param shuffle: Whether or not to shuffle the data before splitting, if applicable. Defaults to True.
   :type shuffle: bool
   :param random_seed: Seed for the random number generator. Defaults to 0.
   :type random_seed: int

   :returns: Data splitting method.
   :rtype: sklearn.model_selection.BaseCrossValidator

   :raises ValueError: If problem_configuration is not given for a time-series problem.


.. py:function:: tune_binary_threshold(pipeline, objective, problem_type, X_threshold_tuning, y_threshold_tuning, X=None, y=None)

   Tunes the decision threshold of a binary pipeline on the X and y threshold-tuning data.

   :param pipeline: Pipeline instance to threshold.
   :type pipeline: Pipeline
   :param objective: The objective to tune with. If it is not tunable and best_pipeline is True, F1 is used instead.
   :type objective: ObjectiveBase
   :param problem_type: The problem type of the pipeline.
   :type problem_type: ProblemType
   :param X_threshold_tuning: Features on which the pipeline's threshold will be tuned.
   :type X_threshold_tuning: pd.DataFrame
   :param y_threshold_tuning: Target data on which the pipeline's threshold will be tuned.
   :type y_threshold_tuning: pd.Series
   :param X: Features on which the pipeline will be trained (used for time series binary problems). Defaults to None.
   :type X: pd.DataFrame
   :param y: Target on which the pipeline will be trained (used for time series binary problems). Defaults to None.
   :type y: pd.Series
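
   The search that ``tune_binary_threshold`` performs internally is not described on this page. As a rough illustration only, the standalone sketch below (plain Python, no evalml imports) scans the predicted positive-class probabilities on the tuning data and keeps the cut-off that maximizes F1, the fallback objective mentioned above. The helper names ``f1_binary`` and ``scan_thresholds`` are hypothetical, and the exhaustive scan is a swapped-in technique, not necessarily how evalml optimizes thresholds.

   .. code-block:: python

      from typing import List, Sequence, Tuple


      def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
          """F1 score for 0/1 labels, treating 1 as the positive class."""
          pairs = list(zip(y_true, y_pred))
          tp = sum(1 for t, p in pairs if t == 1 and p == 1)
          fp = sum(1 for t, p in pairs if t == 0 and p == 1)
          fn = sum(1 for t, p in pairs if t == 1 and p == 0)
          if tp == 0:
              return 0.0
          return 2 * tp / (2 * tp + fp + fn)


      def scan_thresholds(
          y_true: Sequence[int], proba: Sequence[float]
      ) -> Tuple[float, float]:
          """Hypothetical helper: try each observed probability as the cut-off.

          A sample is labeled positive when its probability is >= the threshold.
          Returns (best_threshold, best_f1); ties keep the lowest threshold.
          """
          best_threshold, best_score = 0.5, -1.0
          for threshold in sorted(set(proba)):
              y_pred: List[int] = [1 if p >= threshold else 0 for p in proba]
              score = f1_binary(y_true, y_pred)
              if score > best_score:
                  best_threshold, best_score = threshold, score
          return best_threshold, best_score


      # Tuning data: true labels and the pipeline's positive-class probabilities.
      y_tune = [0, 0, 1, 1, 1]
      proba_tune = [0.1, 0.4, 0.35, 0.8, 0.9]
      threshold, score = scan_thresholds(y_tune, proba_tune)
      print(threshold, round(score, 3))  # 0.35 0.857

   Scanning only the observed probabilities is sufficient for a fixed tuning set: the predicted labels change only when the threshold crosses one of those values, and the one labeling the scan skips (everything negative) always scores an F1 of 0.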