visualizations
===================================================

.. py:module:: evalml.model_understanding.visualizations

.. autoapi-nested-parse::

   Visualization functions for model understanding.



Module Contents
---------------

Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   evalml.model_understanding.visualizations.binary_objective_vs_threshold
   evalml.model_understanding.visualizations.decision_tree_data_from_estimator
   evalml.model_understanding.visualizations.decision_tree_data_from_pipeline
   evalml.model_understanding.visualizations.get_linear_coefficients
   evalml.model_understanding.visualizations.get_prediction_vs_actual_data
   evalml.model_understanding.visualizations.get_prediction_vs_actual_over_time_data
   evalml.model_understanding.visualizations.graph_binary_objective_vs_threshold
   evalml.model_understanding.visualizations.graph_prediction_vs_actual
   evalml.model_understanding.visualizations.graph_prediction_vs_actual_over_time
   evalml.model_understanding.visualizations.graph_t_sne
   evalml.model_understanding.visualizations.t_sne
   evalml.model_understanding.visualizations.visualize_decision_tree


Contents
~~~~~~~~~~~~~~~~~~~

.. py:function:: binary_objective_vs_threshold(pipeline, X, y, objective, steps=100)

   Computes objective score as a function of potential binary classification decision thresholds for a fitted binary classification pipeline.

   :param pipeline: Fitted binary classification pipeline.
   :type pipeline: BinaryClassificationPipeline obj
   :param X: The input data used to compute objective score.
   :type X: pd.DataFrame
   :param y: The target labels.
   :type y: pd.Series
   :param objective: Objective used to score.
   :type objective: ObjectiveBase obj, str
   :param steps: Number of intervals to divide the threshold range into when calculating the objective score.
   :type steps: int

   :returns: DataFrame with thresholds and the corresponding objective score calculated at each threshold.
   :rtype: pd.DataFrame

   :raises ValueError: If objective is not a binary classification objective.
   :raises ValueError: If objective's `score_needs_proba` is not False.


.. py:function:: decision_tree_data_from_estimator(estimator)

   Return data for a fitted tree in a restructured format.

   :param estimator: A fitted DecisionTree-based estimator.
   :type estimator: ComponentBase

   :returns: An OrderedDict of OrderedDicts describing a tree structure.
   :rtype: OrderedDict

   :raises ValueError: If estimator is not a decision tree-based estimator.
   :raises NotFittedError: If estimator is not yet fitted.


.. py:function:: decision_tree_data_from_pipeline(pipeline_)

   Return data for a fitted pipeline in a restructured format.

   :param pipeline_: A pipeline with a DecisionTree-based estimator.
   :type pipeline_: PipelineBase

   :returns: An OrderedDict of OrderedDicts describing a tree structure.
   :rtype: OrderedDict

   :raises ValueError: If estimator is not a decision tree-based estimator.
   :raises NotFittedError: If estimator is not yet fitted.


.. py:function:: get_linear_coefficients(estimator, features=None)

   Returns a dataframe showing the features with the greatest predictive power for a linear model.

   :param estimator: Fitted linear model family estimator.
   :type estimator: Estimator
   :param features: List of feature names associated with the underlying data.
   :type features: list[str]

   :returns: DataFrame listing the features by importance.
   :rtype: pd.DataFrame

   :raises ValueError: If the model is not a linear model.
   :raises NotFittedError: If the model is not yet fitted.


.. py:function:: get_prediction_vs_actual_data(y_true, y_pred, outlier_threshold=None)

   Combines y_true and y_pred into a single dataframe and adds a column for outliers. Used in `graph_prediction_vs_actual()`.

   :param y_true: The real target values of the data.
   :type y_true: pd.Series or np.ndarray
   :param y_pred: The predicted values outputted by the regression model.
   :type y_pred: pd.Series or np.ndarray
   :param outlier_threshold: A positive threshold for what is considered an outlier value. This value is compared to the absolute difference between each value of y_true and y_pred. Values within this threshold will be blue, otherwise they will be yellow. Defaults to None.
   :type outlier_threshold: int, float

   :returns: DataFrame with the following columns:

             * `prediction`: Predicted values from regression model.
             * `actual`: Real target values.
             * `outlier`: Colors indicating whether each value falls within the outlier threshold.
   :rtype: pd.DataFrame

   :raises ValueError: If threshold is not positive.


.. py:function:: get_prediction_vs_actual_over_time_data(pipeline, X, y, X_train, y_train, dates)

   Get the data needed for the prediction_vs_actual_over_time plot.

   :param pipeline: Fitted time series regression pipeline.
   :type pipeline: TimeSeriesRegressionPipeline
   :param X: Features used to generate new predictions.
   :type X: pd.DataFrame
   :param y: Target values to compare predictions against.
   :type y: pd.Series
   :param X_train: Data the pipeline was trained on.
   :type X_train: pd.DataFrame
   :param y_train: Target values for training data.
   :type y_train: pd.Series
   :param dates: Dates corresponding to target values and predictions.
   :type dates: pd.Series

   :returns: Predictions vs. time.
   :rtype: pd.DataFrame


.. py:function:: graph_binary_objective_vs_threshold(pipeline, X, y, objective, steps=100)

   Generates a plot graphing objective score vs. decision thresholds for a fitted binary classification pipeline.

   :param pipeline: Fitted pipeline.
   :type pipeline: PipelineBase or subclass
   :param X: The input data used to compute scores.
   :type X: pd.DataFrame
   :param y: The target labels.
   :type y: pd.Series
   :param objective: Objective used to score, shown on the y-axis of the graph.
   :type objective: ObjectiveBase obj, str
   :param steps: Number of intervals to divide the threshold range into when calculating the objective score.
   :type steps: int

   :returns: Figure representing the objective score vs. threshold graph.
   :rtype: plotly.Figure


.. py:function:: graph_prediction_vs_actual(y_true, y_pred, outlier_threshold=None)

   Generate a scatter plot comparing the true and predicted values. Used for regression plotting.

   :param y_true: The real target values of the data.
   :type y_true: pd.Series
   :param y_pred: The predicted values outputted by the regression model.
   :type y_pred: pd.Series
   :param outlier_threshold: A positive threshold for what is considered an outlier value. This value is compared to the absolute difference between each value of y_true and y_pred. Values within this threshold will be blue, otherwise they will be yellow. Defaults to None.
   :type outlier_threshold: int, float

   :returns: Figure representing the predicted vs. actual values graph.
   :rtype: plotly.Figure

   :raises ValueError: If threshold is not positive.


.. py:function:: graph_prediction_vs_actual_over_time(pipeline, X, y, X_train, y_train, dates, single_series=None)

   Plot the target values and predictions against time on the x-axis.

   :param pipeline: Fitted time series regression pipeline.
   :type pipeline: TimeSeriesRegressionPipeline
   :param X: Features used to generate new predictions. If problem is multiseries, X should be stacked.
   :type X: pd.DataFrame
   :param y: Target values to compare predictions against. If problem is multiseries, y should be stacked.
   :type y: pd.Series
   :param X_train: Data the pipeline was trained on.
   :type X_train: pd.DataFrame
   :param y_train: Target values for training data.
   :type y_train: pd.Series
   :param dates: Dates corresponding to target values and predictions.
   :type dates: pd.Series
   :param single_series: A single series id value to plot just one series in a multiseries dataset. Defaults to None.
   :type single_series: str

   :returns: Figure showing the predictions vs. actual values over time.
   :rtype: plotly.Figure

   :raises ValueError: If the pipeline is not a time-series regression pipeline.


.. py:function:: graph_t_sne(X, n_components=2, perplexity=30.0, learning_rate=200.0, metric='euclidean', marker_line_width=2, marker_size=7, **kwargs)

   Plot high-dimensional data in a lower-dimensional space using t-SNE.

   :param X: Data to be transformed. Must be numeric.
   :type X: np.ndarray, pd.DataFrame
   :param n_components: Dimension of the embedded space. Defaults to 2.
   :type n_components: int
   :param perplexity: Related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Defaults to 30.
   :type perplexity: float
   :param learning_rate: Usually in the range [10.0, 1000.0]. If the cost function gets stuck in a bad local minimum, increasing the learning rate may help. Must be positive. Defaults to 200.
   :type learning_rate: float
   :param metric: The metric to use when calculating distance between instances in a feature array. The default is "euclidean" which is interpreted as the squared euclidean distance.
   :type metric: str
   :param marker_line_width: Determines the line width of the marker boundary. Defaults to 2.
   :type marker_line_width: int
   :param marker_size: Determines the size of the marker. Defaults to 7.
   :type marker_size: int
   :param kwargs: Arbitrary keyword arguments.

   :returns: Figure representing the transformed data.
   :rtype: plotly.Figure

   :raises ValueError: If marker_line_width or marker_size are not valid values.


.. py:function:: t_sne(X, n_components=2, perplexity=30.0, learning_rate=200.0, metric='euclidean', **kwargs)

   Get the transformed output after fitting X to the embedded space using t-SNE.

   :param X: Data to be transformed. Must be numeric.
   :type X: np.ndarray, pd.DataFrame
   :param n_components: Dimension of the embedded space.
   :type n_components: int, optional
   :param perplexity: Related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50.
   :type perplexity: float, optional
   :param learning_rate: Usually in the range [10.0, 1000.0]. If the cost function gets stuck in a bad local minimum, increasing the learning rate may help.
   :type learning_rate: float, optional
   :param metric: The metric to use when calculating distance between instances in a feature array.
   :type metric: str, optional
   :param kwargs: Arbitrary keyword arguments.

   :returns: TSNE output.
   :rtype: np.ndarray (n_samples, n_components)

   :raises ValueError: If specified parameters are not valid values.


.. py:function:: visualize_decision_tree(estimator, max_depth=None, rotate=False, filled=False, filepath=None)

   Generate an image visualizing the decision tree.

   :param estimator: A fitted DecisionTree-based estimator.
   :type estimator: ComponentBase
   :param max_depth: The depth to which the tree should be displayed. If set to None (as by default), tree is fully generated.
   :type max_depth: int, optional
   :param rotate: Orient tree left to right rather than top-down.
   :type rotate: bool, optional
   :param filled: Paint nodes to indicate majority class for classification, extremity of values for regression, or purity of node for multi-output.
   :type filled: bool, optional
   :param filepath: Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
   :type filepath: str, optional

   :returns: DOT object that can be directly displayed in Jupyter notebooks.
   :rtype: graphviz.Source

   :raises ValueError: If estimator is not a decision tree-based estimator.
   :raises NotFittedError: If estimator is not yet fitted.
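
The outlier rule documented for ``get_prediction_vs_actual_data`` (compare the absolute difference between each actual and predicted value with ``outlier_threshold``, then color the point blue or yellow) can be illustrated with a small standalone sketch. This is not the library's implementation: the helper name ``flag_outliers``, the plain color names, the treatment of a difference exactly equal to the threshold as an outlier, and the choice to color every point blue when no threshold is given are all assumptions made for illustration.

.. code-block:: python

   def flag_outliers(y_true, y_pred, outlier_threshold=None):
       """Hypothetical helper: pair predictions with actuals and color each pair.

       Returns a list of dicts with the keys "prediction", "actual" and "outlier".
       """
       # Mirror the documented ValueError for a non-positive threshold.
       if outlier_threshold is not None and outlier_threshold <= 0:
           raise ValueError("outlier_threshold must be positive")
       if len(y_true) != len(y_pred):
           raise ValueError("y_true and y_pred must have the same length")

       rows = []
       for actual, prediction in zip(y_true, y_pred):
           if outlier_threshold is None:
               # Assumption: with no threshold, nothing is flagged.
               color = "blue"
           elif abs(actual - prediction) < outlier_threshold:
               # Within the threshold: not an outlier.
               color = "blue"
           else:
               # Assumption: a difference equal to the threshold counts as an outlier.
               color = "yellow"
           rows.append({"prediction": prediction, "actual": actual, "outlier": color})
       return rows


   rows = flag_outliers([10.0, 12.0, 15.0], [10.5, 15.0, 15.0], outlier_threshold=1.0)
   colors = [row["outlier"] for row in rows]

With a threshold of 1.0, only the second pair (absolute difference 3.0) is colored yellow; the other two differences (0.5 and 0.0) fall within the threshold.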