classification_pipeline
==================================================

.. py:module:: evalml.pipelines.classification_pipeline

.. autoapi-nested-parse::

   Pipeline subclass for all classification pipelines.


Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.pipelines.classification_pipeline.ClassificationPipeline


Contents
~~~~~~~~~~~~~~~~~~~

.. py:class:: ClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)

   Pipeline subclass for all classification pipelines.

   :param component_graph: List of components in order. Accepts strings or ComponentBase subclasses in the list.
                           Note that when duplicate components are specified in a list, the duplicate component names will be modified with the
                           component's index in the list. For example, the component graph
                           [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names
                           ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"]
   :type component_graph: list or dict
   :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values.
                      An empty dictionary or None implies using all default values for component parameters. Defaults to None.
   :type parameters: dict
   :param custom_name: Custom name for the pipeline. Defaults to None.
   :type custom_name: str
   :param random_seed: Seed for the random number generator. Defaults to 0.
   :type random_seed: int

   **Attributes**

   .. list-table::
      :widths: 15 85
      :header-rows: 0

      * - **problem_type**
        - None

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.classification_pipeline.ClassificationPipeline.can_tune_threshold_with_objective
      evalml.pipelines.classification_pipeline.ClassificationPipeline.classes_
      evalml.pipelines.classification_pipeline.ClassificationPipeline.clone
      evalml.pipelines.classification_pipeline.ClassificationPipeline.create_objectives
      evalml.pipelines.classification_pipeline.ClassificationPipeline.custom_name
      evalml.pipelines.classification_pipeline.ClassificationPipeline.describe
      evalml.pipelines.classification_pipeline.ClassificationPipeline.feature_importance
      evalml.pipelines.classification_pipeline.ClassificationPipeline.fit
      evalml.pipelines.classification_pipeline.ClassificationPipeline.fit_transform
      evalml.pipelines.classification_pipeline.ClassificationPipeline.get_component
      evalml.pipelines.classification_pipeline.ClassificationPipeline.get_hyperparameter_ranges
      evalml.pipelines.classification_pipeline.ClassificationPipeline.graph
      evalml.pipelines.classification_pipeline.ClassificationPipeline.graph_dict
      evalml.pipelines.classification_pipeline.ClassificationPipeline.graph_feature_importance
      evalml.pipelines.classification_pipeline.ClassificationPipeline.inverse_transform
      evalml.pipelines.classification_pipeline.ClassificationPipeline.load
      evalml.pipelines.classification_pipeline.ClassificationPipeline.model_family
      evalml.pipelines.classification_pipeline.ClassificationPipeline.name
      evalml.pipelines.classification_pipeline.ClassificationPipeline.new
      evalml.pipelines.classification_pipeline.ClassificationPipeline.parameters
      evalml.pipelines.classification_pipeline.ClassificationPipeline.predict
      evalml.pipelines.classification_pipeline.ClassificationPipeline.predict_proba
      evalml.pipelines.classification_pipeline.ClassificationPipeline.save
      evalml.pipelines.classification_pipeline.ClassificationPipeline.score
      evalml.pipelines.classification_pipeline.ClassificationPipeline.summary
      evalml.pipelines.classification_pipeline.ClassificationPipeline.transform
      evalml.pipelines.classification_pipeline.ClassificationPipeline.transform_all_but_final


   .. py:method:: can_tune_threshold_with_objective(self, objective)

      Determine whether the threshold of a binary classification pipeline can be tuned.

      :param objective: Primary AutoMLSearch objective.
      :type objective: ObjectiveBase

      :returns: True if the pipeline threshold can be tuned.
      :rtype: bool


   .. py:method:: classes_(self)
      :property:

      Gets the class names for the pipeline. Will return None before pipeline is fit.


   .. py:method:: clone(self)

      Constructs a new pipeline with the same components, parameters, and random seed.

      :returns: A new instance of this pipeline with identical components, parameters, and random seed.


   .. py:method:: create_objectives(objectives)
      :staticmethod:

      Create objective instances from a list of strings or objective classes.


   .. py:method:: custom_name(self)
      :property:

      Custom name of the pipeline.


   .. py:method:: describe(self, return_dict=False)

      Outputs pipeline details including component parameters.

      :param return_dict: If True, return dictionary of information about pipeline. Defaults to False.
      :type return_dict: bool

      :returns: Dictionary of all component parameters if return_dict is True, else None.
      :rtype: dict


   .. py:method:: feature_importance(self)
      :property:

      Importance associated with each feature. Features dropped by the feature selection are excluded.

      :returns: Feature names and their corresponding importance
      :rtype: pd.DataFrame


   .. py:method:: fit(self, X, y)

      Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then are mapped to values between 0 and n_classes-1.

      :param X: The input training data of shape [n_samples, n_features]
      :type X: pd.DataFrame or np.ndarray
      :param y: The target training labels of length [n_samples]
      :type y: pd.Series or np.ndarray

      :returns: self

      :raises ValueError: If the number of unique classes in y is not appropriate for the type of pipeline.
      :raises TypeError: If the dtype is boolean but pd.NA exists in the series.
      :raises Exception: For all other exceptions.


   .. py:method:: fit_transform(self, X, y)

      Fit and transform all components in the component graph, if all components are Transformers.

      :param X: Input features of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param y: The target data of length [n_samples].
      :type y: pd.Series

      :returns: Transformed output.
      :rtype: pd.DataFrame

      :raises ValueError: If final component is an Estimator.


   .. py:method:: get_component(self, name)

      Returns component by name.

      :param name: Name of component.
      :type name: str

      :returns: The component with the given name.
      :rtype: Component


   .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters)

      Returns hyperparameter ranges from all components as a dictionary.

      :param custom_hyperparameters: Custom hyperparameters for the pipeline.
      :type custom_hyperparameters: dict

      :returns: Dictionary of hyperparameter ranges for each component in the pipeline.
      :rtype: dict


   .. py:method:: graph(self, filepath=None)

      Generate an image representing the pipeline graph.

      :param filepath: Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
      :type filepath: str, optional

      :returns: Graph object that can be directly displayed in Jupyter notebooks.
      :rtype: graphviz.Digraph

      :raises RuntimeError: If graphviz is not installed.
      :raises ValueError: If path is not writeable.


   .. py:method:: graph_dict(self)

      Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases.
      x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed.
      This can be used to build graphs across a variety of visualization tools.

      Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]}

      :returns: A dictionary representing the DAG structure.
      :rtype: dict


   .. py:method:: graph_feature_importance(self, importance_threshold=0)

      Generate a bar graph of the pipeline's feature importance.

      :param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero.
      :type importance_threshold: float, optional

      :returns: A bar graph showing features and their corresponding importance.
      :rtype: plotly.Figure

      :raises ValueError: If importance threshold is not valid.


   .. py:method:: inverse_transform(self, y)

      Apply component inverse_transform methods to estimator predictions in reverse order.

      Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd).

      :param y: Final component features.
      :type y: pd.Series

      :returns: The inverse transform of the target.
      :rtype: pd.Series


   .. py:method:: load(file_path: Union[str, io.BytesIO])
      :staticmethod:

      Loads pipeline at file path.

      :param file_path: Path to load the pipeline from, or a BytesIO object.
      :type file_path: str|BytesIO

      :returns: PipelineBase object


   .. py:method:: model_family(self)
      :property:

      Returns model family of this pipeline.


   .. py:method:: name(self)
      :property:

      Name of the pipeline.


   .. py:method:: new(self, parameters, random_seed=0)

      Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
      Not to be confused with Python's __new__ method.

      :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values.
                         An empty dictionary or None implies using all default values for component parameters.
      :type parameters: dict
      :param random_seed: Seed for the random number generator. Defaults to 0.
      :type random_seed: int

      :returns: A new instance of this pipeline with identical components.


   .. py:method:: parameters(self)
      :property:

      Parameter dictionary for this pipeline.

      :returns: Dictionary of all component parameters.
      :rtype: dict


   .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None)

      Make predictions using selected features.

      Note: we cast y as ints first to address boolean values that may be returned from calculating predictions which we would not be able to otherwise transform if we originally had integer targets.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param objective: The objective to use to make predictions.
      :type objective: Object or string
      :param X_train: Training data. Ignored. Only used for time series.
      :type X_train: pd.DataFrame
      :param y_train: Training labels. Ignored. Only used for time series.
      :type y_train: pd.Series

      :returns: Estimated labels.
      :rtype: pd.Series


   .. py:method:: predict_proba(self, X, X_train=None, y_train=None)

      Make probability estimates for labels.

      :param X: Data of shape [n_samples, n_features]
      :type X: pd.DataFrame or np.ndarray
      :param X_train: Training data. Ignored. Only used for time series.
      :type X_train: pd.DataFrame or np.ndarray or None
      :param y_train: Training labels. Ignored. Only used for time series.
      :type y_train: pd.Series or None

      :returns: Probability estimates
      :rtype: pd.DataFrame

      :raises ValueError: If final component is not an estimator.


   .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

      Saves pipeline at file path.

      :param file_path: Location to save file.
      :type file_path: str
      :param pickle_protocol: The pickle data stream format.
      :type pickle_protocol: int


   .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None)

      Evaluate model performance on objectives.

      :param X: Data of shape [n_samples, n_features]
      :type X: pd.DataFrame
      :param y: True labels of length [n_samples]
      :type y: pd.Series
      :param objectives: List of objectives to score
      :type objectives: list
      :param X_train: Training data. Ignored. Only used for time series.
      :type X_train: pd.DataFrame
      :param y_train: Training labels. Ignored. Only used for time series.
      :type y_train: pd.Series

      :returns: Ordered dictionary of objective scores.
      :rtype: dict


   .. py:method:: summary(self)
      :property:

      A short summary of the pipeline structure, describing the list of components used.

      Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder

      :returns: A string describing the pipeline structure.


   .. py:method:: transform(self, X, y=None)

      Transform the input.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: The target data of length [n_samples]. Defaults to None.
      :type y: pd.Series

      :returns: Transformed output.
      :rtype: pd.DataFrame


   .. py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None)

      Transforms the data by applying all pre-processing components.

      :param X: Input data to the pipeline to transform.
      :type X: pd.DataFrame
      :param y: Targets corresponding to X. Optional.
      :type y: pd.Series or None
      :param X_train: Training data. Only used for time series.
      :type X_train: pd.DataFrame or np.ndarray or None
      :param y_train: Training labels. Only used for time series.
      :type y_train: pd.Series or None

      :returns: New transformed features.
      :rtype: pd.DataFrame
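
The target encoding described under ``fit`` (classes ordered by ``sorted(set(y))``, then mapped to integers from 0 to n_classes-1) can be sketched in plain Python. This only illustrates the stated rule and is not evalml's implementation; ``encode_targets`` is a hypothetical helper that does not exist in the library.

.. code-block:: python

   # Hypothetical helper, not part of evalml: mirrors the rule stated for ``fit``.
   def encode_targets(y):
       """Return the sorted class labels and the integer-encoded targets."""
       # Classes are ordered with sorted(set(y)), as the fit documentation states.
       classes = sorted(set(y))
       # Each class is mapped to its position in that ordering: 0 .. n_classes-1.
       mapping = {label: index for index, label in enumerate(classes)}
       return classes, [mapping[label] for label in y]


   classes, encoded = encode_targets(["spam", "ham", "spam", "eggs"])
   print(classes)  # ['eggs', 'ham', 'spam']
   print(encoded)  # [2, 1, 2, 0]

Under this rule the integer assigned to a label depends only on the sorted order of the distinct labels, not on the order in which they first appear in ``y``.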