classification_pipeline
==================================================

.. py:module:: evalml.pipelines.classification_pipeline

.. autoapi-nested-parse::

   Pipeline subclass for all classification pipelines.


Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.pipelines.classification_pipeline.ClassificationPipeline


Contents
~~~~~~~~~~~~~~~~~~~
.. py:class:: ClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)


   Pipeline subclass for all classification pipelines.

   :param component_graph: List of components in order. Accepts strings or ComponentBase subclasses in the list.
                           When duplicate components are specified in a list, the names of the duplicates are suffixed with the
                           component's index in the list. For example, the component graph
                           [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have the names
                           ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"].
   :type component_graph: list or dict
   :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values.
                      An empty dictionary or None implies using all default values for component parameters. Defaults to None.
   :type parameters: dict
   :param custom_name: Custom name for the pipeline. Defaults to None.
   :type custom_name: str
   :param random_seed: Seed for the random number generator. Defaults to 0.
   :type random_seed: int
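
   The renaming rule described for ``component_graph`` can be sketched in plain Python. This is only an illustration, not evalml's implementation: ``unique_component_names`` is a hypothetical helper, and it assumes the suffix is the zero-based position of the duplicate in the list, which is what the documented example suggests.

   .. code-block:: python

      def unique_component_names(names):
          """Suffix repeated names with their zero-based list index (assumed rule)."""
          seen = set()
          unique = []
          for index, name in enumerate(names):
              if name in seen:
                  # A repeat: disambiguate it with its position in the list.
                  unique.append(f"{name}_{index}")
              else:
                  seen.add(name)
                  unique.append(name)
          return unique


      graph = ["Imputer", "One Hot Encoder", "Imputer", "Logistic Regression Classifier"]
      names = unique_component_names(graph)

   Knowing the final names matters when building the ``parameters`` dictionary, because its keys are component names.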


   **Attributes**

   .. list-table::
      :widths: 15 85
      :header-rows: 0

      * - **problem_type**
        - None


   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.classification_pipeline.ClassificationPipeline.can_tune_threshold_with_objective
      evalml.pipelines.classification_pipeline.ClassificationPipeline.classes_
      evalml.pipelines.classification_pipeline.ClassificationPipeline.clone
      evalml.pipelines.classification_pipeline.ClassificationPipeline.create_objectives
      evalml.pipelines.classification_pipeline.ClassificationPipeline.custom_name
      evalml.pipelines.classification_pipeline.ClassificationPipeline.describe
      evalml.pipelines.classification_pipeline.ClassificationPipeline.feature_importance
      evalml.pipelines.classification_pipeline.ClassificationPipeline.fit
      evalml.pipelines.classification_pipeline.ClassificationPipeline.fit_transform
      evalml.pipelines.classification_pipeline.ClassificationPipeline.get_component
      evalml.pipelines.classification_pipeline.ClassificationPipeline.get_hyperparameter_ranges
      evalml.pipelines.classification_pipeline.ClassificationPipeline.graph
      evalml.pipelines.classification_pipeline.ClassificationPipeline.graph_dict
      evalml.pipelines.classification_pipeline.ClassificationPipeline.graph_feature_importance
      evalml.pipelines.classification_pipeline.ClassificationPipeline.inverse_transform
      evalml.pipelines.classification_pipeline.ClassificationPipeline.load
      evalml.pipelines.classification_pipeline.ClassificationPipeline.model_family
      evalml.pipelines.classification_pipeline.ClassificationPipeline.name
      evalml.pipelines.classification_pipeline.ClassificationPipeline.new
      evalml.pipelines.classification_pipeline.ClassificationPipeline.parameters
      evalml.pipelines.classification_pipeline.ClassificationPipeline.predict
      evalml.pipelines.classification_pipeline.ClassificationPipeline.predict_proba
      evalml.pipelines.classification_pipeline.ClassificationPipeline.save
      evalml.pipelines.classification_pipeline.ClassificationPipeline.score
      evalml.pipelines.classification_pipeline.ClassificationPipeline.summary
      evalml.pipelines.classification_pipeline.ClassificationPipeline.transform
      evalml.pipelines.classification_pipeline.ClassificationPipeline.transform_all_but_final

   .. py:method:: can_tune_threshold_with_objective(self, objective)

      Determine whether the threshold of a binary classification pipeline can be tuned.

      :param objective: Primary AutoMLSearch objective.
      :type objective: ObjectiveBase

      :returns: True if the pipeline threshold can be tuned.
      :rtype: bool
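
      For context, the threshold of a binary classifier is the cutoff applied to the positive-class probability to produce a label. The sketch below only illustrates that idea. ``apply_threshold`` is a hypothetical helper, and the strict comparison and the 0.5 default are assumptions; it does not show how evalml tunes or stores the threshold.

      .. code-block:: python

         def apply_threshold(positive_class_probabilities, threshold=0.5):
             """Return True where the positive-class probability exceeds the threshold."""
             return [p > threshold for p in positive_class_probabilities]


         probabilities = [0.10, 0.45, 0.55, 0.90]
         default_labels = apply_threshold(probabilities)
         strict_labels = apply_threshold(probabilities, threshold=0.8)

      Raising the threshold trades recall for precision, which is why the choice depends on the objective being optimized.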


   .. py:method:: classes_(self)
      :property:

      Gets the class names for the pipeline. Returns None before the pipeline is fit.


   .. py:method:: clone(self)

      Constructs a new pipeline with the same components, parameters, and random seed.

      :returns: A new instance of this pipeline with identical components, parameters, and random seed.


   .. py:method:: create_objectives(objectives)
      :staticmethod:

      Create objective instances from a list of strings or objective classes.


   .. py:method:: custom_name(self)
      :property:

      Custom name of the pipeline.


   .. py:method:: describe(self, return_dict=False)

      Outputs pipeline details including component parameters.

      :param return_dict: If True, return dictionary of information about pipeline. Defaults to False.
      :type return_dict: bool

      :returns: Dictionary of all component parameters if return_dict is True, else None.
      :rtype: dict


   .. py:method:: feature_importance(self)
      :property:

      Importance associated with each feature. Features dropped by feature selection are excluded.

      :returns: Feature names and their corresponding importance.
      :rtype: pd.DataFrame


   .. py:method:: fit(self, X, y)

      Build a classification model. For string and categorical targets, classes are ordered with ``sorted(set(y))`` and then mapped to integer values from 0 to n_classes-1.

      :param X: The input training data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: The target training labels of length [n_samples].
      :type y: pd.Series or np.ndarray

      :returns: self

      :raises ValueError: If the number of unique classes in y is not appropriate for the type of pipeline.
      :raises TypeError: If the dtype is boolean but pd.NA exists in the series.
      :raises Exception: For all other exceptions.
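
      The label mapping described for string and categorical targets can be reproduced with the standard library. This is a minimal sketch of the ``sorted(set(y))`` rule only; the variable names are illustrative and the code is not taken from evalml.

      .. code-block:: python

         y = ["spam", "ham", "spam", "eggs"]

         # Sort the distinct labels, then number them 0 .. n_classes-1.
         classes = sorted(set(y))
         label_to_int = {label: position for position, label in enumerate(classes)}

         # Encode the original target with the mapping.
         encoded = [label_to_int[label] for label in y]

      Because the order comes from sorting, the integer assigned to a label does not depend on where that label first appears in ``y``.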


   .. py:method:: fit_transform(self, X, y)

      Fit and transform all components in the component graph, if all components are Transformers.

      :param X: Input features of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param y: The target data of length [n_samples].
      :type y: pd.Series

      :returns: Transformed output.
      :rtype: pd.DataFrame

      :raises ValueError: If final component is an Estimator.


   .. py:method:: get_component(self, name)

      Returns component by name.

      :param name: Name of component.
      :type name: str

      :returns: The component with the given name.
      :rtype: Component


   .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters)

      Returns hyperparameter ranges from all components as a dictionary.

      :param custom_hyperparameters: Custom hyperparameters for the pipeline.
      :type custom_hyperparameters: dict

      :returns: Dictionary of hyperparameter ranges for each component in the pipeline.
      :rtype: dict


   .. py:method:: graph(self, filepath=None)

      Generate an image representing the pipeline graph.

      :param filepath: Path where the graph should be saved. If None (the default), the graph is not saved.
      :type filepath: str, optional

      :returns: Graph object that can be directly displayed in Jupyter notebooks.
      :rtype: graphviz.Digraph

      :raises RuntimeError: If graphviz is not installed.
      :raises ValueError: If path is not writeable.


   .. py:method:: graph_dict(self)

      Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases.

      x_edges specifies from which component feature data is being passed.
      y_edges specifies from which component target data is being passed.
      This can be used to build graphs across a variety of visualization tools.

      Template::

         {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...},
          "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...],
          "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]}

      :returns: A dictionary representing the DAG structure.
      :rtype: dict
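
      A dictionary that follows this template can be serialized and traversed with the standard library. In the sketch below, the component names and parameter values are made up for illustration and are not output produced by evalml.

      .. code-block:: python

         import json

         # Hand-written stand-in that follows the documented template.
         dag_dict = {
             "Nodes": {
                 "Imputer": {"Name": "Imputer", "Parameters": {}},
                 "One Hot Encoder": {"Name": "One Hot Encoder", "Parameters": {}},
                 "Logistic Regression Classifier": {
                     "Name": "Logistic Regression Classifier",
                     "Parameters": {},
                 },
             },
             "x_edges": [
                 ["Imputer", "One Hot Encoder"],
                 ["One Hot Encoder", "Logistic Regression Classifier"],
             ],
             "y_edges": [],
         }

         # Round trip through JSON, as a visualization tool might require.
         restored = json.loads(json.dumps(dag_dict))

         # Collect, per component, the components that receive its feature data.
         successors = {}
         for source, target in restored["x_edges"]:
             successors.setdefault(source, []).append(target)

      An adjacency mapping like ``successors`` is a common input format for graph-drawing libraries.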


   .. py:method:: graph_feature_importance(self, importance_threshold=0)

      Generate a bar graph of the pipeline's feature importance.

      :param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero.
      :type importance_threshold: float, optional

      :returns: A bar graph showing features and their corresponding importance.
      :rtype: plotly.Figure

      :raises ValueError: If importance threshold is not valid.
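
      The effect of ``importance_threshold`` can be sketched as a filter on absolute importance. ``filter_by_importance`` is a hypothetical helper, the strict comparison follows the wording of the parameter description, and treating negative thresholds as invalid is an assumption rather than confirmed library behavior.

      .. code-block:: python

         def filter_by_importance(feature_importance, importance_threshold=0):
             """Keep (name, importance) pairs whose absolute importance exceeds the threshold."""
             if importance_threshold < 0:
                 # Assumed validity rule: a negative cutoff on absolute values is meaningless.
                 raise ValueError("importance_threshold must be greater than or equal to 0")
             return [
                 (name, value)
                 for name, value in feature_importance
                 if abs(value) > importance_threshold
             ]


         importances = [("age", 0.42), ("income", -0.30), ("zip_code", 0.01)]
         kept = filter_by_importance(importances, importance_threshold=0.05)

      Comparing absolute values keeps features with large negative importance, such as ``income`` in this made-up data.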


   .. py:method:: inverse_transform(self, y)

      Apply component inverse_transform methods to estimator predictions in reverse order.

      Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd).

      :param y: Final component features.
      :type y: pd.Series

      :returns: The inverse transform of the target.
      :rtype: pd.Series


   .. py:method:: load(file_path: Union[str, io.BytesIO])
      :staticmethod:

      Loads pipeline at file path.

      :param file_path: Path of the file to load, or a BytesIO object.
      :type file_path: str or BytesIO

      :returns: PipelineBase object
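
      Accepting either a path or an in-memory buffer can be sketched with the standard library. The example uses ``pickle`` as a stand-in for the ``cloudpickle`` module referenced in the ``save`` signature, and ``load_object`` is a hypothetical helper, not the body of this method.

      .. code-block:: python

         import io
         import os
         import pickle
         import tempfile
         from typing import Union


         def load_object(file_path: Union[str, io.BytesIO]):
             """Unpickle from an in-memory buffer or from a path on disk."""
             if isinstance(file_path, io.BytesIO):
                 return pickle.load(file_path)
             with open(file_path, "rb") as handle:
                 return pickle.load(handle)


         payload = {"name": "demo pipeline", "random_seed": 0}  # stand-in for a pipeline

         # Round trip through an in-memory buffer.
         buffer = io.BytesIO()
         pickle.dump(payload, buffer)
         buffer.seek(0)  # rewind so reading starts at the first byte
         from_buffer = load_object(buffer)

         # Round trip through a file path.
         with tempfile.TemporaryDirectory() as directory:
             path = os.path.join(directory, "pipeline.pkl")
             with open(path, "wb") as handle:
                 pickle.dump(payload, handle)
             from_path = load_object(path)

      As with any pickle-based format, only load files that come from a trusted source.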


   .. py:method:: model_family(self)
      :property:

      Returns model family of this pipeline.


   .. py:method:: name(self)
      :property:

      Name of the pipeline.


   .. py:method:: new(self, parameters, random_seed=0)

      Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python's ``__new__`` method.

      :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values.
                         An empty dictionary or None implies using all default values for component parameters.
      :type parameters: dict
      :param random_seed: Seed for the random number generator. Defaults to 0.
      :type random_seed: int

      :returns: A new instance of this pipeline with identical components.


   .. py:method:: parameters(self)
      :property:

      Parameter dictionary for this pipeline.

      :returns: Dictionary of all component parameters.
      :rtype: dict


   .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None)

      Make predictions using selected features.

      Note: predictions are cast to integers first. Calculating predictions may return boolean values,
      which could not otherwise be transformed back if the original targets were integers.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param objective: The objective to use to make predictions.
      :type objective: Object or string
      :param X_train: Training data. Ignored. Only used for time series.
      :type X_train: pd.DataFrame
      :param y_train: Training labels. Ignored. Only used for time series.
      :type y_train: pd.Series

      :returns: Estimated labels.
      :rtype: pd.Series


   .. py:method:: predict_proba(self, X, X_train=None, y_train=None)

      Make probability estimates for labels.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param X_train: Training data. Ignored. Only used for time series.
      :type X_train: pd.DataFrame or np.ndarray or None
      :param y_train: Training labels. Ignored. Only used for time series.
      :type y_train: pd.Series or None

      :returns: Probability estimates.
      :rtype: pd.DataFrame

      :raises ValueError: If final component is not an estimator.


   .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

      Saves pipeline at file path.

      :param file_path: Location to save file.
      :type file_path: str
      :param pickle_protocol: The pickle data stream format.
      :type pickle_protocol: int


   .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None)

      Evaluate model performance on objectives.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param y: True labels of length [n_samples].
      :type y: pd.Series
      :param objectives: List of objectives to score.
      :type objectives: list
      :param X_train: Training data. Ignored. Only used for time series.
      :type X_train: pd.DataFrame
      :param y_train: Training labels. Ignored. Only used for time series.
      :type y_train: pd.Series

      :returns: Ordered dictionary of objective scores.
      :rtype: dict
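
      The shape of the returned value can be illustrated with two hand-written metrics. Everything in this sketch is hypothetical (``accuracy``, ``error_rate``, ``score_predictions``, and the use of name and function pairs as objectives); it shows only how scores keyed by objective name keep the order in which the objectives were given.

      .. code-block:: python

         from collections import OrderedDict


         def accuracy(y_true, y_pred):
             """Fraction of predictions that match the true labels."""
             matches = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
             return matches / len(y_true)


         def error_rate(y_true, y_pred):
             """Fraction of predictions that do not match the true labels."""
             return 1 - accuracy(y_true, y_pred)


         def score_predictions(y_true, y_pred, objectives):
             """Evaluate each (name, function) pair and keep the given order."""
             scores = OrderedDict()
             for name, function in objectives:
                 scores[name] = function(y_true, y_pred)
             return scores


         y_true = [1, 0, 1, 1]
         y_pred = [1, 0, 0, 1]
         scores = score_predictions(
             y_true, y_pred, [("accuracy", accuracy), ("error rate", error_rate)]
         )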


   .. py:method:: summary(self)
      :property:

      A short summary of the pipeline structure, describing the list of components used.

      Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder

      :returns: A string describing the pipeline structure.
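
      The format of the example string can be reproduced from an ordered list of component names. ``summarize`` is a hypothetical helper that assumes the last component is the estimator and that the list is not empty; it is inferred from the example above, not copied from evalml.

      .. code-block:: python

         def summarize(component_names):
             """Build '<final> w/ <first> + <second> + ...' from ordered component names."""
             *preprocessing, final = component_names
             if not preprocessing:
                 # A single component has nothing to list after "w/".
                 return final
             return f"{final} w/ {' + '.join(preprocessing)}"


         summary = summarize(
             ["Simple Imputer", "One Hot Encoder", "Logistic Regression Classifier"]
         )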


   .. py:method:: transform(self, X, y=None)

      Transform the input.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: The target data of length [n_samples]. Defaults to None.
      :type y: pd.Series

      :returns: Transformed output.
      :rtype: pd.DataFrame


   .. py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None)

      Transforms the data by applying all pre-processing components.

      :param X: Input data to the pipeline to transform.
      :type X: pd.DataFrame
      :param y: Targets corresponding to X. Optional.
      :type y: pd.Series or None
      :param X_train: Training data. Only used for time series.
      :type X_train: pd.DataFrame or np.ndarray or None
      :param y_train: Training labels. Only used for time series.
      :type y_train: pd.Series or None

      :returns: New transformed features.
      :rtype: pd.DataFrame