visualizations
===================================================

.. py:module:: evalml.model_understanding.visualizations

.. autoapi-nested-parse::

   Visualization functions for model understanding.


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   evalml.model_understanding.visualizations.binary_objective_vs_threshold
   evalml.model_understanding.visualizations.decision_tree_data_from_estimator
   evalml.model_understanding.visualizations.decision_tree_data_from_pipeline
   evalml.model_understanding.visualizations.get_linear_coefficients
   evalml.model_understanding.visualizations.get_prediction_vs_actual_data
   evalml.model_understanding.visualizations.get_prediction_vs_actual_over_time_data
   evalml.model_understanding.visualizations.graph_binary_objective_vs_threshold
   evalml.model_understanding.visualizations.graph_prediction_vs_actual
   evalml.model_understanding.visualizations.graph_prediction_vs_actual_over_time
   evalml.model_understanding.visualizations.graph_t_sne
   evalml.model_understanding.visualizations.t_sne
   evalml.model_understanding.visualizations.visualize_decision_tree


Contents
~~~~~~~~

.. py:function:: binary_objective_vs_threshold(pipeline, X, y, objective, steps=100)

   Compute the objective score at a range of candidate decision thresholds for a fitted binary classification pipeline.

   :param pipeline: Fitted binary classification pipeline.
   :type pipeline: BinaryClassificationPipeline obj
   :param X: The input data used to compute objective score.
   :type X: pd.DataFrame
   :param y: The target labels.
   :type y: pd.Series
   :param objective: Objective used to score.
   :type objective: ObjectiveBase obj, str
   :param steps: Number of equal intervals into which the range of possible decision thresholds is divided; the objective is scored at each resulting threshold. Defaults to 100.
   :type steps: int

   :returns: DataFrame with thresholds and the corresponding objective score calculated at each threshold.
   :rtype: pd.DataFrame

   :raises ValueError: If objective is not a binary classification objective.
   :raises ValueError: If the objective's ``score_needs_proba`` is not False.
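
The computation amounts to a sweep over evenly spaced thresholds. The following is a minimal sketch, not evalml's implementation: it assumes the positive-class probabilities are already available, that a sample is labeled positive when its probability exceeds the threshold, and that the objective is a plain function of true and predicted labels. ``objective_vs_threshold`` and ``accuracy`` are hypothetical helpers, and the sketch returns a list of ``(threshold, score)`` pairs rather than a pd.DataFrame.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly (a stand-in objective)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def objective_vs_threshold(proba, y_true, score_fn, steps=100):
    """Score score_fn at steps + 1 evenly spaced thresholds from 0 to 1.

    proba holds the predicted probability of the positive class per sample.
    Returns a list of (threshold, score) pairs.
    """
    results = []
    for i in range(steps + 1):
        threshold = i / steps
        # Assumption: a sample is positive when its probability exceeds the threshold.
        y_pred = [int(p > threshold) for p in proba]
        results.append((threshold, score_fn(y_true, y_pred)))
    return results


proba = [0.1, 0.4, 0.35, 0.8]
y_true = [0, 0, 1, 1]
for threshold, score in objective_vs_threshold(proba, y_true, accuracy, steps=4):
    print(f"{threshold:.2f} {score:.2f}")
```

A larger ``steps`` gives a finer curve at the cost of one extra scoring pass per threshold.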


.. py:function:: decision_tree_data_from_estimator(estimator)

   Return the structure of a fitted decision tree estimator as nested ``OrderedDict`` objects.

   :param estimator: A fitted DecisionTree-based estimator.
   :type estimator: ComponentBase

   :returns: An OrderedDict of OrderedDicts describing a tree structure.
   :rtype: OrderedDict

   :raises ValueError: If estimator is not a decision tree-based estimator.
   :raises NotFittedError: If estimator is not yet fitted.
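
The nesting can be illustrated with a minimal, self-contained sketch that walks the parallel node arrays scikit-learn exposes on a fitted tree's ``tree_`` attribute (``children_left``, ``children_right``, ``feature``, ``threshold``, ``value``) and builds one ``OrderedDict`` per node. The arrays are hard-coded so no model is needed. The ``tree_to_dict`` helper, the feature name, the simplified values, and the key names (``Feature``, ``Threshold``, ``Value``, ``Left_Child``, ``Right_Child``) are assumptions for illustration and may not match evalml's exact output.

```python
from collections import OrderedDict

# Hypothetical parallel arrays for a depth-1 tree, in the layout scikit-learn
# uses for a fitted estimator's ``tree_`` attribute: node 0 is the root and a
# leaf has -1 for both children. Values are simplified to per-class counts.
children_left = [1, -1, -1]
children_right = [2, -1, -1]
feature = [0, -2, -2]
threshold = [2.5, -2.0, -2.0]
value = [[5, 5], [4, 1], [1, 4]]
feature_names = ["petal_length"]


def tree_to_dict(node=0):
    """Recursively convert the node at index ``node`` into nested OrderedDicts."""
    if children_left[node] == children_right[node]:
        # Leaf: only the node's value is recorded.
        return OrderedDict([("Value", value[node])])
    return OrderedDict(
        [
            ("Feature", feature_names[feature[node]]),
            ("Threshold", threshold[node]),
            ("Value", value[node]),
            ("Left_Child", tree_to_dict(children_left[node])),
            ("Right_Child", tree_to_dict(children_right[node])),
        ]
    )


tree = tree_to_dict()
print(tree["Feature"], tree["Threshold"])  # petal_length 2.5
print(tree["Left_Child"]["Value"])  # [4, 1]
```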


.. py:function:: decision_tree_data_from_pipeline(pipeline_)

   Return the structure of the decision tree in a fitted pipeline as nested ``OrderedDict`` objects.

   :param pipeline_: A fitted pipeline with a DecisionTree-based estimator.
   :type pipeline_: PipelineBase

   :returns: An OrderedDict of OrderedDicts describing a tree structure.
   :rtype: OrderedDict

   :raises ValueError: If the pipeline's estimator is not a decision tree-based estimator.
   :raises NotFittedError: If the pipeline's estimator is not yet fitted.


.. py:function:: get_linear_coefficients(estimator, features=None)

   Returns a dataframe showing the features with the greatest predictive power for a linear model.

   :param estimator: Fitted linear model family estimator.
   :type estimator: Estimator
   :param features: List of feature names associated with the underlying data.
   :type features: list[str]

   :returns: A DataFrame of the model's features ordered by importance.
   :rtype: pd.DataFrame

   :raises ValueError: If the model is not a linear model.
   :raises NotFittedError: If the model is not yet fitted.
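
One common way to rank the features of a linear model is by the magnitude of their coefficients, which is only meaningful when the features are on comparable scales. The sketch below illustrates that idea in plain Python; the ``rank_linear_coefficients`` helper and the sample coefficients are hypothetical, and the ordering and layout evalml itself uses may differ.

```python
def rank_linear_coefficients(coefficients, features):
    """Pair each feature with its coefficient, largest magnitude first."""
    if len(coefficients) != len(features):
        raise ValueError("Need exactly one feature name per coefficient.")
    pairs = list(zip(features, coefficients))
    # Sort by absolute value so strong negative effects rank as high as positive ones.
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


coefs = [0.2, -1.5, 0.9]
names = ["age", "income", "tenure"]
for name, coef in rank_linear_coefficients(coefs, names):
    print(f"{name:>8} {coef:+.2f}")
```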


.. py:function:: get_prediction_vs_actual_data(y_true, y_pred, outlier_threshold=None)

   Combine ``y_true`` and ``y_pred`` into a single DataFrame and add a column marking outliers. Used by ``graph_prediction_vs_actual``.

   :param y_true: The real target values of the data.
   :type y_true: pd.Series or np.ndarray
   :param y_pred: The predicted values outputted by the regression model.
   :type y_pred: pd.Series or np.ndarray
   :param outlier_threshold: A positive threshold for what is considered an outlier value. This value is compared to the absolute difference
                             between each value of y_true and y_pred. Values within this threshold will be blue, otherwise they will be yellow.
                             Defaults to None.
   :type outlier_threshold: int, float

   :returns: A DataFrame with the following columns:

             * ``prediction``: Predicted values from the regression model.
             * ``actual``: Real target values.
             * ``outlier``: Color of each point: blue if the absolute difference between the actual and predicted values is within ``outlier_threshold``, yellow otherwise.
   :rtype: pd.DataFrame

   :raises ValueError: If threshold is not positive.
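
The outlier column follows directly from the ``outlier_threshold`` description. This minimal sketch uses a list of dicts in place of the DataFrame and plain color names in place of whatever color values the real function writes. Two behaviors are assumptions of the sketch: a difference exactly equal to the threshold counts as within it, and with no threshold nothing is flagged. ``prediction_vs_actual`` is a hypothetical helper.

```python
def prediction_vs_actual(y_true, y_pred, outlier_threshold=None):
    """Return one dict per sample with its prediction, actual value, and color."""
    if outlier_threshold is not None and outlier_threshold <= 0:
        raise ValueError(f"Threshold must be positive, got {outlier_threshold}.")
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length.")
    rows = []
    for actual, prediction in zip(y_true, y_pred):
        # Assumption: without a threshold, no point is flagged as an outlier.
        is_outlier = (
            outlier_threshold is not None
            and abs(actual - prediction) > outlier_threshold
        )
        rows.append(
            {
                "prediction": prediction,
                "actual": actual,
                "outlier": "yellow" if is_outlier else "blue",
            }
        )
    return rows


rows = prediction_vs_actual([3.0, 5.0, 10.0], [2.5, 5.0, 6.0], outlier_threshold=1.0)
print([row["outlier"] for row in rows])  # ['blue', 'blue', 'yellow']
```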


.. py:function:: get_prediction_vs_actual_over_time_data(pipeline, X, y, X_train, y_train, dates)

   Get the data needed for the ``graph_prediction_vs_actual_over_time`` plot.

   :param pipeline: Fitted time series regression pipeline.
   :type pipeline: TimeSeriesRegressionPipeline
   :param X: Features used to generate new predictions.
   :type X: pd.DataFrame
   :param y: Target values to compare predictions against.
   :type y: pd.Series
   :param X_train: Data the pipeline was trained on.
   :type X_train: pd.DataFrame
   :param y_train: Target values for training data.
   :type y_train: pd.Series
   :param dates: Dates corresponding to target values and predictions.
   :type dates: pd.Series

   :returns: Dates, target values, and predictions for plotting predictions against time.
   :rtype: pd.DataFrame


.. py:function:: graph_binary_objective_vs_threshold(pipeline, X, y, objective, steps=100)

   Generate a plot of objective score vs. decision threshold for a fitted binary classification pipeline.

   :param pipeline: Fitted binary classification pipeline.
   :type pipeline: PipelineBase or subclass
   :param X: The input data used to compute objective scores.
   :type X: pd.DataFrame
   :param y: The target labels.
   :type y: pd.Series
   :param objective: Objective used to score, shown on the y-axis of the graph.
   :type objective: ObjectiveBase obj, str
   :param steps: Number of equal intervals into which the range of possible decision thresholds is divided; the objective is scored at each resulting threshold. Defaults to 100.
   :type steps: int

   :returns: Figure representing the objective score vs. threshold graph.
   :rtype: plotly.Figure


.. py:function:: graph_prediction_vs_actual(y_true, y_pred, outlier_threshold=None)

   Generate a scatter plot comparing the true and predicted values. Used for regression plotting.

   :param y_true: The real target values of the data.
   :type y_true: pd.Series
   :param y_pred: The predicted values outputted by the regression model.
   :type y_pred: pd.Series
   :param outlier_threshold: A positive threshold for what is considered an outlier value. This value is compared to the absolute difference
                             between each value of y_true and y_pred. Values within this threshold will be blue, otherwise they will be yellow.
                             Defaults to None.
   :type outlier_threshold: int, float

   :returns: Figure representing the predicted vs. actual values graph.
   :rtype: plotly.Figure

   :raises ValueError: If threshold is not positive.


.. py:function:: graph_prediction_vs_actual_over_time(pipeline, X, y, X_train, y_train, dates, single_series=None)

   Plot the target values and predictions against time on the x-axis.

   :param pipeline: Fitted time series regression pipeline.
   :type pipeline: TimeSeriesRegressionPipeline
   :param X: Features used to generate new predictions. If the problem is multiseries, ``X`` should be stacked.
   :type X: pd.DataFrame
   :param y: Target values to compare predictions against. If the problem is multiseries, ``y`` should be stacked.
   :type y: pd.Series
   :param X_train: Data the pipeline was trained on.
   :type X_train: pd.DataFrame
   :param y_train: Target values for training data.
   :type y_train: pd.Series
   :param dates: Dates corresponding to target values and predictions.
   :type dates: pd.Series
   :param single_series: ID of a single series to plot from a multiseries dataset. Defaults to None.
   :type single_series: str

   :returns: Figure showing the predictions vs. actual values over time.
   :rtype: plotly.Figure

   :raises ValueError: If the pipeline is not a time-series regression pipeline.
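
Assuming "stacked" means long format, where each row holds one observation of one series and a series ID column says which series it belongs to, plotting a single series amounts to filtering on that column. This is a sketch under that assumption; the ``series_id`` column name, the sample rows, and the ``select_series`` helper are hypothetical and not part of evalml's API. In this sketch, ``single_series=None`` returns every row.

```python
def select_series(stacked_rows, single_series=None, series_id_column="series_id"):
    """Return the rows of one series from stacked (long-format) multiseries data.

    Each row is a dict holding one observation of one series. When
    single_series is None, every row is returned unchanged.
    """
    if single_series is None:
        return list(stacked_rows)
    return [row for row in stacked_rows if row[series_id_column] == single_series]


stacked = [
    {"date": "2024-01-01", "series_id": "store_a", "target": 10},
    {"date": "2024-01-01", "series_id": "store_b", "target": 7},
    {"date": "2024-01-02", "series_id": "store_a", "target": 12},
    {"date": "2024-01-02", "series_id": "store_b", "target": 8},
]
store_a = select_series(stacked, single_series="store_a")
print([(row["date"], row["target"]) for row in store_a])
```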


.. py:function:: graph_t_sne(X, n_components=2, perplexity=30.0, learning_rate=200.0, metric='euclidean', marker_line_width=2, marker_size=7, **kwargs)

   Plot high-dimensional data in a lower-dimensional space using t-SNE.

   :param X: Data to be transformed. Must be numeric.
   :type X: np.ndarray, pd.DataFrame
   :param n_components: Dimension of the embedded space. Defaults to 2.
   :type n_components: int
   :param perplexity: Related to the number of nearest neighbors that are used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Defaults to 30.
   :type perplexity: float
   :param learning_rate: Usually in the range [10.0, 1000.0]. If the cost function gets stuck in a bad local minimum, increasing the learning rate may help. Must be positive. Defaults to 200.
   :type learning_rate: float
   :param metric: The metric to use when calculating distance between instances in a feature array. The default is ``"euclidean"``, which is interpreted as the squared Euclidean distance.
   :type metric: str
   :param marker_line_width: Determines the line width of the marker boundary. Defaults to 2.
   :type marker_line_width: int
   :param marker_size: Determines the size of the marker. Defaults to 7.
   :type marker_size: int
   :param kwargs: Arbitrary keyword arguments.

   :returns: Figure representing the transformed data.
   :rtype: plotly.Figure

   :raises ValueError: If marker_line_width or marker_size are not valid values.


.. py:function:: t_sne(X, n_components=2, perplexity=30.0, learning_rate=200.0, metric='euclidean', **kwargs)

   Fit t-SNE to ``X`` and return the transformed output in the embedded space.

   :param X: Data to be transformed. Must be numeric.
   :type X: np.ndarray, pd.DataFrame
   :param n_components: Dimension of the embedded space. Defaults to 2.
   :type n_components: int, optional
   :param perplexity: Related to the number of nearest neighbors that are used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Defaults to 30.
   :type perplexity: float, optional
   :param learning_rate: Usually in the range [10.0, 1000.0]. If the cost function gets stuck in a bad local minimum, increasing the learning rate may help. Defaults to 200.
   :type learning_rate: float, optional
   :param metric: The metric to use when calculating distance between instances in a feature array. Defaults to ``"euclidean"``.
   :type metric: str, optional
   :param kwargs: Arbitrary keyword arguments.

   :returns: The t-SNE embedding of ``X``.
   :rtype: np.ndarray (n_samples, n_components)

   :raises ValueError: If specified parameters are not valid values.
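
The parameter constraints implied by the descriptions above can be written as explicit checks run before fitting. This is a sketch, not evalml's validation code: the ``validate_t_sne_params`` helper is hypothetical, and requiring ``perplexity`` to be strictly positive is an assumption.

```python
def validate_t_sne_params(n_components=2, perplexity=30.0, learning_rate=200.0):
    """Raise ValueError if a t-SNE parameter is outside its allowed range."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n_components, bool) or not isinstance(n_components, int) or n_components <= 0:
        raise ValueError(f"n_components must be a positive integer, got {n_components!r}.")
    if not isinstance(perplexity, (int, float)) or perplexity <= 0:
        raise ValueError(f"perplexity must be positive, got {perplexity!r}.")
    if not isinstance(learning_rate, (int, float)) or learning_rate <= 0:
        raise ValueError(f"learning_rate must be positive, got {learning_rate!r}.")


validate_t_sne_params()  # the documented defaults pass
try:
    validate_t_sne_params(learning_rate=-5.0)
except ValueError as err:
    print(err)
```

Failing fast on these values gives a clearer error than letting an invalid setting surface deep inside the optimizer.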


.. py:function:: visualize_decision_tree(estimator, max_depth=None, rotate=False, filled=False, filepath=None)

   Generate an image visualizing the decision tree.

   :param estimator: A fitted DecisionTree-based estimator.
   :type estimator: ComponentBase
   :param max_depth: The depth to which the tree should be displayed. If None (the default), the full tree is displayed.
   :type max_depth: int, optional
   :param rotate: Orient tree left to right rather than top-down.
   :type rotate: bool, optional
   :param filled: Paint nodes to indicate majority class for classification, extremity of values for regression, or purity of node for multi-output.
   :type filled: bool, optional
   :param filepath: Path to where the graph should be saved. If None (the default), the graph is not saved.
   :type filepath: str, optional

   :returns: DOT object that can be directly displayed in Jupyter notebooks.
   :rtype: graphviz.Source

   :raises ValueError: If estimator is not a decision tree-based estimator.
   :raises NotFittedError: If estimator is not yet fitted.
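
How ``max_depth`` and ``rotate`` shape the output can be illustrated by emitting DOT source by hand. This sketch is not evalml's implementation and returns a plain string rather than a ``graphviz.Source``. The ``tree_to_dot`` helper and the nested-dict layout it reads (internal nodes with ``Feature``, ``Threshold``, ``Left_Child``, and ``Right_Child`` keys, leaves with only a ``Value`` key) are assumptions; ``rankdir=LR`` is the standard Graphviz attribute for a left-to-right layout.

```python
def tree_to_dot(tree, max_depth=None, rotate=False):
    """Render a nested tree dict as Graphviz DOT source.

    Internal nodes carry "Feature", "Threshold", "Left_Child", and "Right_Child";
    leaves carry only "Value". Children of nodes at max_depth are not drawn.
    """
    lines = ["digraph Tree {"]
    if rotate:
        lines.append("  rankdir=LR;")  # lay the tree out left to right
    counter = 0

    def visit(node, depth):
        nonlocal counter
        node_id = counter
        counter += 1
        if "Feature" in node:
            label = f'{node["Feature"]} <= {node["Threshold"]}'
        else:
            label = f'value = {node["Value"]}'
        lines.append(f'  {node_id} [label="{label}"];')
        # Only expand internal nodes that are still above the depth limit.
        if "Feature" in node and (max_depth is None or depth < max_depth):
            for child_key in ("Left_Child", "Right_Child"):
                child_id = visit(node[child_key], depth + 1)
                lines.append(f"  {node_id} -> {child_id};")
        return node_id

    visit(tree, 0)
    lines.append("}")
    return "\n".join(lines)


tree = {"Feature": "x", "Threshold": 2.5, "Left_Child": {"Value": [4, 1]}, "Right_Child": {"Value": [1, 4]}}
print(tree_to_dot(tree, rotate=True))
```

Wrapping the returned string in ``graphviz.Source`` would make it displayable in a Jupyter notebook.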