Model Understanding

Package Contents

Functions

binary_objective_vs_threshold

Computes objective score as a function of potential binary classification decision thresholds for a fitted binary classification pipeline.

calculate_permutation_importance

Calculates permutation importance for features.

calculate_permutation_importance_one_column

Calculates permutation importance for one column in the original dataframe.

confusion_matrix

Confusion matrix for binary and multiclass classification.

explain_predictions

Creates a report summarizing the top contributing features for each data point in the input features.

explain_predictions_best_worst

Creates a report summarizing the top contributing features for the best and worst points in the dataset as measured by error to true labels.

get_linear_coefficients

Returns a dataframe showing the features with the greatest predictive power for a linear model.

get_prediction_vs_actual_data

Combines y_true and y_pred into a single dataframe and adds a column for outliers. Used in graph_prediction_vs_actual().

get_prediction_vs_actual_over_time_data

Get the data needed for the prediction_vs_actual_over_time plot.

graph_binary_objective_vs_threshold

Generates a plot graphing objective score vs. decision thresholds for a fitted binary classification pipeline.

graph_confusion_matrix

Generate and display a confusion matrix plot.

graph_partial_dependence

Create a one-way or two-way partial dependence plot.

graph_permutation_importance

Generate a bar graph of the pipeline’s permutation importance.

graph_precision_recall_curve

Generate and display a precision-recall plot.

graph_prediction_vs_actual

Generate a scatter plot comparing the true and predicted values. Used for regression plotting.

graph_prediction_vs_actual_over_time

Plot the target values and predictions against time on the x-axis.

graph_roc_curve

Generate and display a Receiver Operating Characteristic (ROC) plot for binary and multiclass classification problems.

graph_t_sne

Plot high dimensional data into lower dimensional space using t-SNE.

normalize_confusion_matrix

Normalizes a confusion matrix.

partial_dependence

Calculates one or two-way partial dependence.

precision_recall_curve

Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve.

roc_curve

Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve. Works with binary or multiclass problems.

t_sne

Get the transformed output after fitting X to the embedded space using t-SNE.

Contents

evalml.model_understanding.binary_objective_vs_threshold(pipeline, X, y, objective, steps=100)[source]

Computes objective score as a function of potential binary classification decision thresholds for a fitted binary classification pipeline.

Parameters
  • pipeline (BinaryClassificationPipeline obj) – Fitted binary classification pipeline

  • X (pd.DataFrame) – The input data used to compute objective score

  • y (pd.Series) – The target labels

  • objective (ObjectiveBase obj, str) – Objective used to score

  • steps (int) – Number of intervals to divide and calculate objective score at

Returns

DataFrame with thresholds and the corresponding objective score calculated at each threshold

Return type

pd.DataFrame
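The threshold sweep described above can be illustrated with a minimal pure-Python sketch. It is not EvalML's implementation: `objective_vs_threshold` and `accuracy` are hypothetical names, the model is replaced by a list of predicted probabilities, and the sketch assumes that `steps` intervals yield `steps + 1` evenly spaced thresholds in [0, 1].

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def objective_vs_threshold(y_true, y_proba, score_fn, steps=100):
    """Score hard predictions at steps + 1 evenly spaced thresholds (assumed)."""
    rows = []
    for i in range(steps + 1):
        threshold = i / steps
        # Predict the positive class when the probability exceeds the threshold.
        y_pred = [1 if p > threshold else 0 for p in y_proba]
        rows.append((threshold, score_fn(y_true, y_pred)))
    return rows


y_true = [0, 0, 1, 1]
y_proba = [0.1, 0.4, 0.6, 0.9]
rows = objective_vs_threshold(y_true, y_proba, accuracy, steps=4)
for threshold, score in rows:
    print(threshold, score)
```

Each row pairs a threshold with the score obtained at it, which mirrors the two columns of the returned DataFrame.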

evalml.model_understanding.calculate_permutation_importance(pipeline, X, y, objective, n_repeats=5, n_jobs=None, random_seed=0)[source]

Calculates permutation importance for features.

Parameters
  • pipeline (PipelineBase or subclass) – Fitted pipeline.

  • X (pd.DataFrame) – The input data used to score and compute permutation importance.

  • y (pd.Series) – The target data.

  • objective (str, ObjectiveBase) – Objective to score on.

  • n_repeats (int) – Number of times to permute a feature. Defaults to 5.

  • n_jobs (int or None) – Non-negative integer describing level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Returns

Mean feature importance scores over a number of shuffles.

Return type

pd.DataFrame
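As a rough illustration of the technique (not EvalML's implementation), permutation importance can be sketched in pure Python: shuffle one column, rescore, and average the drop in score over `n_repeats` shuffles. The names `permutation_importance`, `predict`, and `accuracy` are hypothetical, the data is a dict of column lists rather than a DataFrame, and the sketch assumes a score where greater is better.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, score_fn, n_repeats=5, random_seed=0):
    """Return {column: mean drop in score after shuffling that column}."""
    rng = random.Random(random_seed)
    baseline = score_fn(y, predict(X))
    importances = {}
    for col in X:
        drops = []
        for _ in range(n_repeats):
            shuffled = list(X[col])
            rng.shuffle(shuffled)
            # Replace only the permuted column; all others stay intact.
            X_permuted = {**X, col: shuffled}
            drops.append(baseline - score_fn(y, predict(X_permuted)))
        importances[col] = sum(drops) / n_repeats
    return importances


# Toy "model" that only looks at the "signal" column.
def predict(X):
    return [1 if v > 0.5 else 0 for v in X["signal"]]


X = {"signal": [0.1, 0.9, 0.2, 0.8, 0.3, 0.7], "noise": [5, 3, 8, 1, 9, 2]}
y = [0, 1, 0, 1, 0, 1]
importances = permutation_importance(predict, X, y, accuracy)
print(importances)
```

A column the model ignores, such as `noise` here, gets an importance of zero because shuffling it never changes the predictions.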

evalml.model_understanding.calculate_permutation_importance_one_column(pipeline, X, y, col_name, objective, n_repeats=5, fast=True, precomputed_features=None, random_seed=0)[source]

Calculates permutation importance for one column in the original dataframe.

Parameters
  • pipeline (PipelineBase or subclass) – Fitted pipeline.

  • X (pd.DataFrame) – The input data used to score and compute permutation importance.

  • y (pd.Series) – The target data.

  • col_name (str, int) – The column in X to calculate permutation importance for.

  • objective (str, ObjectiveBase) – Objective to score on.

  • n_repeats (int) – Number of times to permute a feature. Defaults to 5.

  • fast (bool) – Whether to use the fast method of calculating the permutation importance or not. Defaults to True.

  • precomputed_features (pd.DataFrame) – Precomputed features necessary to calculate permutation importance using the fast method. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Returns

Mean feature importance scores over a number of shuffles.

Return type

float

evalml.model_understanding.confusion_matrix(y_true, y_predicted, normalize_method='true')[source]

Confusion matrix for binary and multiclass classification.

Parameters
  • y_true (pd.Series or np.ndarray) – True binary labels.

  • y_predicted (pd.Series or np.ndarray) – Predictions from a binary classifier.

  • normalize_method ({'true', 'pred', 'all', None}) – Normalization method to use, if not None. Supported options are: ‘true’ to normalize by row, ‘pred’ to normalize by column, or ‘all’ to normalize by all values. Defaults to ‘true’.

Returns

Confusion matrix. The column header represents the predicted labels while row header represents the actual labels.

Return type

pd.DataFrame
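The layout of the result (rows for actual labels, columns for predicted labels) can be seen in a small pure-Python sketch. `confusion_counts` is a hypothetical helper that returns raw counts as nested lists rather than a DataFrame, and it skips normalization.

```python
def confusion_counts(y_true, y_pred):
    """Count (actual, predicted) pairs; rows are actual, columns are predicted."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return labels, matrix


y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
labels, matrix = confusion_counts(y_true, y_pred)
print(labels)
for row in matrix:
    print(row)
```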

evalml.model_understanding.explain_predictions(pipeline, input_features, y, indices_to_explain, top_k_features=3, include_shap_values=False, include_expected_value=False, output_format='text')[source]

Creates a report summarizing the top contributing features for each data point in the input features.

XGBoost and Stacked Ensemble models, as well as CatBoost multiclass classifiers, are not currently supported.

Parameters
  • pipeline (PipelineBase) – Fitted pipeline whose predictions we want to explain with SHAP.

  • input_features (pd.DataFrame) – Dataframe of input data to evaluate the pipeline on.

  • y (pd.Series) – Labels for the input data.

  • indices_to_explain (list(int)) – List of integer indices to explain.

  • top_k_features (int) – How many of the highest/lowest contributing features to include in the table for each data point. Default is 3.

  • include_shap_values (bool) – Whether SHAP values should be included in the table. Default is False.

  • include_expected_value (bool) – Whether the expected value should be included in the table. Default is False.

  • output_format (str) – Either “text”, “dict”, or “dataframe”. Default is “text”.

Returns

str, dict, or pd.DataFrame - A report explaining the top contributing features to each prediction for each row of input_features.

The report will include the feature names, prediction contribution, and SHAP Value (optional).

Raises
  • ValueError – if input_features is empty.

  • ValueError – if an output_format outside of “text”, “dict” or “dataframe” is provided.

  • ValueError – if the requested index falls outside the boundaries of input_features.
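To make the role of top_k_features concrete, here is a minimal sketch that picks the highest and lowest contributing features from a set of per-feature contributions. It is an assumption-laden illustration, not EvalML's report logic: `top_contributors` is a hypothetical helper, the contribution values are made up for the example, and returning every feature when there are too few to split is a choice of this sketch.

```python
def top_contributors(contributions, top_k_features=3):
    """Return the k highest and k lowest contributing (feature, value) pairs."""
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= 2 * top_k_features:
        # Too few features to split into distinct top and bottom groups.
        return ranked
    return ranked[:top_k_features] + ranked[-top_k_features:]


# Illustrative contribution values for a single data point.
contributions = {"age": 0.4, "income": -0.3, "tenure": 0.05, "region": -0.01, "plan": 0.2}
selected = top_contributors(contributions, top_k_features=1)
print(selected)
```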

evalml.model_understanding.explain_predictions_best_worst(pipeline, input_features, y_true, num_to_explain=5, top_k_features=3, include_shap_values=False, metric=None, output_format='text', callback=None)[source]

Creates a report summarizing the top contributing features for the best and worst points in the dataset as measured by error to true labels.

XGBoost and Stacked Ensemble models, as well as CatBoost multiclass classifiers, are not currently supported.

Parameters
  • pipeline (PipelineBase) – Fitted pipeline whose predictions we want to explain with SHAP.

  • input_features (pd.DataFrame) – Input data to evaluate the pipeline on.

  • y_true (pd.Series) – True labels for the input data.

  • num_to_explain (int) – How many of the best, worst, random data points to explain.

  • top_k_features (int) – How many of the highest/lowest contributing features to include in the table for each data point.

  • include_shap_values (bool) – Whether SHAP values should be included in the table. Default is False.

  • metric (callable) – The metric used to identify the best and worst points in the dataset. Function must accept the true labels and predicted value or probabilities as the only arguments and lower values must be better. By default, this will be the absolute error for regression problems and cross entropy loss for classification problems.

  • output_format (str) – Either “text”, “dict”, or “dataframe”. Default is “text”.

  • callback (callable) – Function to be called with incremental updates. Has the following parameters:

      • progress_stage: stage of computation

      • time_elapsed: total time in seconds that has elapsed since start of call

Returns

str, dict, or pd.DataFrame - A report explaining the top contributing features for the best/worst predictions in the input_features.

For each of the best/worst rows of input_features, the predicted values, true labels, metric value, feature names, prediction contribution, and SHAP Value (optional) will be listed.

Raises
  • ValueError – if input_features does not have more than twice the requested features to explain.

  • ValueError – if y_true and input_features have mismatched lengths.

  • ValueError – if an output_format outside of “text”, “dict” or “dataframe” is provided.
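The metric contract (takes true labels and predictions, lower is better) and the best/worst selection can be sketched as follows. This is a hedged illustration rather than EvalML's code: `absolute_error` and `best_and_worst` are hypothetical names, the sketch assumes the metric returns one value per row, and the size guard is this sketch's own rule to keep the two groups from overlapping.

```python
def absolute_error(y_true, y_pred):
    """Per-row absolute error; lower values are better."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


def best_and_worst(y_true, y_pred, metric, num_to_explain=5):
    """Return (best_indices, worst_indices) ranked by the per-row metric."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length.")
    errors = metric(y_true, y_pred)
    if len(errors) < 2 * num_to_explain:
        # Guard chosen for this sketch so best and worst rows stay distinct.
        raise ValueError("Not enough rows to pick distinct best and worst points.")
    order = sorted(range(len(errors)), key=errors.__getitem__)
    best = order[:num_to_explain]
    worst = order[-num_to_explain:][::-1]  # worst row first
    return best, worst


y_true = [10, 20, 30, 40]
y_pred = [11, 25, 30, 30]
best, worst = best_and_worst(y_true, y_pred, absolute_error, num_to_explain=1)
print(best, worst)
```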

evalml.model_understanding.get_linear_coefficients(estimator, features=None)[source]

Returns a dataframe showing the features with the greatest predictive power for a linear model.

Parameters
  • estimator (Estimator) – Fitted linear model family estimator.

  • features (list[str]) – List of feature names associated with the underlying data.

Returns

Dataframe displaying the features by importance.

Return type

pd.DataFrame

evalml.model_understanding.get_prediction_vs_actual_data(y_true, y_pred, outlier_threshold=None)[source]

Combines y_true and y_pred into a single dataframe and adds a column for outliers. Used in graph_prediction_vs_actual().

Parameters
  • y_true (pd.Series, or np.ndarray) – The real target values of the data

  • y_pred (pd.Series, or np.ndarray) – The predicted values outputted by the regression model.

  • outlier_threshold (int, float) – A positive threshold for what is considered an outlier value. This value is compared to the absolute difference between each value of y_true and y_pred. Values within this threshold will be blue, otherwise they will be yellow. Defaults to None.

Returns

pd.DataFrame with the following columns:

  • prediction: Predicted values from regression model.

  • actual: Real target values.

  • outlier: Colors indicating which values are in the threshold for what is considered an outlier value.

Return type

pd.DataFrame
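The outlier column can be pictured with a small sketch that compares the absolute difference between each actual and predicted value against the threshold. This is an illustration under assumptions, not EvalML's code: `prediction_vs_actual_rows` is a hypothetical helper returning a list of dicts instead of a DataFrame, the hex color strings are placeholders for "blue" and "yellow", and treating every point as within range when the threshold is None is an assumption of the sketch.

```python
BLUE = "#0000ff"    # placeholder color for values within the threshold
YELLOW = "#ffff00"  # placeholder color for outliers


def prediction_vs_actual_rows(y_true, y_pred, outlier_threshold=None):
    """Pair each prediction with its actual value and an outlier color."""
    if outlier_threshold is not None and outlier_threshold <= 0:
        raise ValueError("outlier_threshold must be positive.")
    rows = []
    for actual, prediction in zip(y_true, y_pred):
        # Assumption: with no threshold, no point is flagged as an outlier.
        within = (
            outlier_threshold is None
            or abs(actual - prediction) <= outlier_threshold
        )
        rows.append(
            {
                "prediction": prediction,
                "actual": actual,
                "outlier": BLUE if within else YELLOW,
            }
        )
    return rows


rows = prediction_vs_actual_rows([1.0, 2.0, 3.0], [1.1, 2.0, 5.0], outlier_threshold=0.5)
for row in rows:
    print(row)
```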

evalml.model_understanding.get_prediction_vs_actual_over_time_data(pipeline, X, y, dates)[source]

Get the data needed for the prediction_vs_actual_over_time plot.

Parameters
  • pipeline (TimeSeriesRegressionPipeline) – Fitted time series regression pipeline.

  • X (pd.DataFrame) – Features used to generate new predictions.

  • y (pd.Series) – Target values to compare predictions against.

  • dates (pd.Series) – Dates corresponding to target values and predictions.

Returns

pd.DataFrame

evalml.model_understanding.graph_binary_objective_vs_threshold(pipeline, X, y, objective, steps=100)[source]

Generates a plot graphing objective score vs. decision thresholds for a fitted binary classification pipeline.

Parameters
  • pipeline (PipelineBase or subclass) – Fitted pipeline

  • X (pd.DataFrame) – The input data used to score and compute scores

  • y (pd.Series) – The target labels

  • objective (ObjectiveBase obj, str) – Objective used to score, shown on the y-axis of the graph

  • steps (int) – Number of intervals to divide and calculate objective score at

Returns

plotly.Figure representing the objective score vs. threshold graph generated

evalml.model_understanding.graph_confusion_matrix(y_true, y_pred, normalize_method='true', title_addition=None)[source]

Generate and display a confusion matrix plot.

If normalize_method is set, hover text will show the raw count; otherwise, hover text will show the count normalized with method ‘true’.

Parameters
  • y_true (pd.Series or np.ndarray) – True binary labels.

  • y_pred (pd.Series or np.ndarray) – Predictions from a binary classifier.

  • normalize_method ({'true', 'pred', 'all', None}) – Normalization method to use, if not None. Supported options are: ‘true’ to normalize by row, ‘pred’ to normalize by column, or ‘all’ to normalize by all values. Defaults to ‘true’.

  • title_addition (str or None) – If not None, append to plot title. Defaults to None.

Returns

plotly.Figure representing the confusion matrix plot generated

evalml.model_understanding.graph_partial_dependence(pipeline, X, features, class_label=None, grid_resolution=100, kind='average')[source]

Create a one-way or two-way partial dependence plot. Passing a single integer or string as features will create a one-way partial dependence plot with the feature values plotted against the partial dependence. Passing a tuple of ints/strings as features will create a two-way partial dependence plot with a contour of features[0] on the y-axis, features[1] on the x-axis and the partial dependence on the z-axis.

Parameters
  • pipeline (PipelineBase or subclass) – Fitted pipeline

  • X (pd.DataFrame, np.ndarray) – The input data used to generate a grid of feature values at which partial dependence will be calculated.

  • features (int, string, tuple[int or string]) – The target feature(s) for which to create the partial dependence plot. If features is an int, it must be the index of the feature to use. If features is a string, it must be a valid column name in X. If features is a tuple of ints/strings, it must contain valid column indices/names in X.

  • class_label (string, optional) – Name of class to plot for multiclass problems. If None, will plot the partial dependence for each class. This argument does not change behavior for regression or binary classification pipelines. For binary classification, the partial dependence for the positive label will always be displayed. Defaults to None.

  • grid_resolution (int) – Number of samples of feature(s) for partial dependence plot

  • kind ({'average', 'individual', 'both'}) – Type of partial dependence to plot. ‘average’ creates a regular partial dependence (PD) graph, ‘individual’ creates an individual conditional expectation (ICE) plot, and ‘both’ creates a single-figure PD and ICE plot. ICE plots can only be shown for one-way partial dependence plots.

Returns

figure object containing the partial dependence data for plotting

Return type

plotly.graph_objects.Figure

Raises
  • PartialDependenceError – if a graph is requested for a class name that isn’t present in the pipeline.

  • PartialDependenceError – if an ICE plot is requested for a two-way partial dependence.

evalml.model_understanding.graph_permutation_importance(pipeline, X, y, objective, importance_threshold=0)[source]

Generate a bar graph of the pipeline’s permutation importance.

Parameters
  • pipeline (PipelineBase or subclass) – Fitted pipeline

  • X (pd.DataFrame) – The input data used to score and compute permutation importance

  • y (pd.Series) – The target data

  • objective (str, ObjectiveBase) – Objective to score on

  • importance_threshold (float, optional) – If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero.

Returns

plotly.Figure, a bar graph showing features and their respective permutation importance.
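The effect of importance_threshold can be shown with a short sketch that keeps only features whose absolute importance is larger than the threshold, following the parameter description above. `filter_by_importance` is a hypothetical helper and the importance values are invented for the example; the plotting itself is left out.

```python
def filter_by_importance(importances, importance_threshold=0):
    """Keep features whose absolute importance is larger than the threshold."""
    return {
        name: value
        for name, value in importances.items()
        if abs(value) > importance_threshold
    }


# Illustrative permutation importance values.
importances = {"age": 0.3, "income": -0.05, "region": 0.0}
kept = filter_by_importance(importances, importance_threshold=0.1)
print(kept)
```

Because the comparison uses the absolute value, a strongly negative importance passes the filter just like a strongly positive one.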

evalml.model_understanding.graph_precision_recall_curve(y_true, y_pred_proba, title_addition=None)[source]

Generate and display a precision-recall plot.

Parameters
  • y_true (pd.Series or np.ndarray) – True binary labels.

  • y_pred_proba (pd.Series or np.ndarray) – Predictions from a binary classifier, before thresholding has been applied. Note this should be the predicted probability for the “true” label.

  • title_addition (str or None) – If not None, append to plot title. Default None.

Returns

plotly.Figure representing the precision-recall plot generated

evalml.model_understanding.graph_prediction_vs_actual(y_true, y_pred, outlier_threshold=None)[source]

Generate a scatter plot comparing the true and predicted values. Used for regression plotting.

Parameters
  • y_true (pd.Series) – The real target values of the data

  • y_pred (pd.Series) – The predicted values outputted by the regression model.

  • outlier_threshold (int, float) – A positive threshold for what is considered an outlier value. This value is compared to the absolute difference between each value of y_true and y_pred. Values within this threshold will be blue, otherwise they will be yellow. Defaults to None.

Returns

plotly.Figure representing the predicted vs. actual values graph

evalml.model_understanding.graph_prediction_vs_actual_over_time(pipeline, X, y, dates)[source]

Plot the target values and predictions against time on the x-axis.

Parameters
  • pipeline (TimeSeriesRegressionPipeline) – Fitted time series regression pipeline.

  • X (pd.DataFrame) – Features used to generate new predictions.

  • y (pd.Series) – Target values to compare predictions against.

  • dates (pd.Series) – Dates corresponding to target values and predictions.

Returns

Figure showing the prediction vs. actual values over time.

Return type

plotly.Figure

evalml.model_understanding.graph_roc_curve(y_true, y_pred_proba, custom_class_names=None, title_addition=None)[source]

Generate and display a Receiver Operating Characteristic (ROC) plot for binary and multiclass classification problems.

Parameters
  • y_true (pd.Series or np.ndarray) – True labels.

  • y_pred_proba (pd.Series or np.ndarray) – Predictions from a classifier, before thresholding has been applied. Note this should be a one-dimensional array with the predicted probability for the “true” label in the binary case.

  • custom_class_names (list or None) – If not None, custom labels for classes. Default None.

  • title_addition (str or None) – If not None, append to plot title. Default None.

Returns

plotly.Figure representing the ROC plot generated

evalml.model_understanding.graph_t_sne(X, n_components=2, perplexity=30.0, learning_rate=200.0, metric='euclidean', marker_line_width=2, marker_size=7, **kwargs)[source]

Plot high dimensional data into lower dimensional space using t-SNE.

Parameters
  • X (np.ndarray, pd.DataFrame) – Data to be transformed. Must be numeric.

  • n_components (int, optional) – Dimension of the embedded space.

  • perplexity (float, optional) – Related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50.

  • learning_rate (float, optional) – Usually in the range [10.0, 1000.0]. If the cost function gets stuck in a bad local minimum, increasing the learning rate may help.

  • metric (str, optional) – The metric to use when calculating distance between instances in a feature array.

  • marker_line_width (int, optional) – Determines the line width of the marker boundary.

  • marker_size (int, optional) – Determines the size of the marker.

Returns

plotly.Figure representing the transformed data

evalml.model_understanding.normalize_confusion_matrix(conf_mat, normalize_method='true')[source]

Normalizes a confusion matrix.

Parameters
  • conf_mat (pd.DataFrame or np.ndarray) – Confusion matrix to normalize.

  • normalize_method ({'true', 'pred', 'all'}) – Normalization method. Supported options are: ‘true’ to normalize by row, ‘pred’ to normalize by column, or ‘all’ to normalize by all values. Defaults to ‘true’.

Returns

Normalized version of the input confusion matrix. The column header represents the predicted labels while row header represents the actual labels.

Return type

pd.DataFrame
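The three normalization methods differ only in the denominator: row sums for ‘true’, column sums for ‘pred’, and the grand total for ‘all’. A minimal sketch on nested lists, assuming every row and column contains at least one count (`normalize` is a hypothetical helper, not the library function):

```python
def normalize(matrix, method="true"):
    """Normalize a confusion matrix given as a list of rows (actual labels)."""
    if method == "true":
        # Each row (actual label) sums to 1.
        return [[v / sum(row) for v in row] for row in matrix]
    if method == "pred":
        # Each column (predicted label) sums to 1.
        col_sums = [sum(col) for col in zip(*matrix)]
        return [[v / s for v, s in zip(row, col_sums)] for row in matrix]
    if method == "all":
        # All cells together sum to 1.
        total = sum(sum(row) for row in matrix)
        return [[v / total for v in row] for row in matrix]
    raise ValueError("method must be 'true', 'pred', or 'all'")


counts = [[1, 1], [1, 3]]
by_row = normalize(counts, "true")
by_col = normalize(counts, "pred")
by_all = normalize(counts, "all")
print(by_row, by_col, by_all)
```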

evalml.model_understanding.partial_dependence(pipeline, X, features, percentiles=(0.05, 0.95), grid_resolution=100, kind='average')[source]

Calculates one or two-way partial dependence. If a single integer or string is given for features, one-way partial dependence is calculated. If a tuple of two integers or strings is given, two-way partial dependence is calculated with the first feature in the y-axis and second feature in the x-axis.

Parameters
  • pipeline (PipelineBase or subclass) – Fitted pipeline

  • X (pd.DataFrame, np.ndarray) – The input data used to generate a grid of feature values at which partial dependence will be calculated.

  • features (int, string, tuple[int or string]) – The target feature(s) for which to calculate the partial dependence. If features is an int, it must be the index of the feature to use. If features is a string, it must be a valid column name in X. If features is a tuple of ints/strings, it must contain valid column indices/names in X.

  • percentiles (tuple[float]) – The lower and upper percentile used to create the extreme values for the grid. Must be in [0, 1]. Defaults to (0.05, 0.95).

  • grid_resolution (int) – Number of samples of feature(s) for partial dependence plot. If this value is less than the maximum number of categories present in categorical data within X, it will be set to the max number of categories + 1. Defaults to 100.

  • kind ({'average', 'individual', 'both'}) – The type of predictions to return. ‘individual’ will return the predictions for all of the points in the grid for each sample in X. ‘average’ will return the predictions for all of the points in the grid but averaged over all of the samples in X.

Returns

When kind=’average’: DataFrame with averaged predictions for all points in the grid averaged over all samples of X and the values used to calculate those predictions.

When kind=’individual’: DataFrame with individual predictions for all points in the grid for each sample of X and the values used to calculate those predictions. If a two-way partial dependence is calculated, then the result is a list of DataFrames with each DataFrame representing one sample’s predictions.

When kind=’both’: A tuple consisting of the averaged predictions (in a DataFrame) over all samples of X and the individual predictions (in a list of DataFrames) for each sample of X.

In the one-way case: The dataframe will contain two columns, “feature_values” (grid points at which the partial dependence was calculated) and “partial_dependence” (the partial dependence at that feature value). For classification problems, there will be a third column called “class_label” (the class label for which the partial dependence was calculated). For binary classification, the partial dependence is only calculated for the “positive” class.

In the two-way case: The data frame will contain grid_resolution number of columns and rows where the index and column headers are the sampled values of the first and second features, respectively, used to make the partial dependence contour. The values of the data frame contain the partial dependence data for each feature value pair.

Return type

pd.DataFrame, list(pd.DataFrame), or tuple(pd.DataFrame, list(pd.DataFrame))

Raises
  • PartialDependenceError – if the user provides a tuple of not exactly two features.

  • PartialDependenceError – if the provided pipeline isn’t fitted.

  • PartialDependenceError – if the provided pipeline is a Baseline pipeline.

  • PartialDependenceError – if any of the features passed in are completely NaN.

  • PartialDependenceError – if any of the features are low-variance, defined as having one value occurring more often than the upper percentile passed by the user (95% by default).
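The one-way, kind='average' case can be sketched in a few lines: for each grid value, overwrite the feature in every sample, predict, and average. This is a simplified illustration rather than EvalML's implementation. `one_way_partial_dependence` and the toy `predict` function are hypothetical, samples are plain dicts, and the grid is passed in explicitly instead of being derived from percentiles and grid_resolution.

```python
def one_way_partial_dependence(predict, rows, feature, grid):
    """Return (feature_value, mean prediction) pairs for each grid value."""
    result = []
    for value in grid:
        # Force the feature to the grid value in every sample.
        modified = [{**row, feature: value} for row in rows]
        predictions = predict(modified)
        result.append((value, sum(predictions) / len(predictions)))
    return result


# Toy "model": a fixed linear function of two features.
def predict(rows):
    return [2 * row["x"] + row["z"] for row in rows]


rows = [{"x": 1, "z": 0}, {"x": 2, "z": 10}]
pd_values = one_way_partial_dependence(predict, rows, "x", grid=[0, 1, 2])
print(pd_values)
```

Keeping the per-sample predictions instead of averaging them would correspond to the kind='individual' output.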

evalml.model_understanding.precision_recall_curve(y_true, y_pred_proba, pos_label_idx=-1)[source]

Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve.

Parameters
  • y_true (pd.Series or np.ndarray) – True binary labels.

  • y_pred_proba (pd.Series or np.ndarray) – Predictions from a binary classifier, before thresholding has been applied. Note this should be the predicted probability for the “true” label.

  • pos_label_idx (int) – the column index corresponding to the positive class. If predicted probabilities are two-dimensional, this will be used to access the probabilities for the positive class.

Returns

Dictionary containing metrics used to generate a precision-recall plot, with the following keys:

  • precision: Precision values.

  • recall: Recall values.

  • thresholds: Threshold values used to produce the precision and recall.

  • auc_score: The area under the precision-recall curve.

Return type

dict
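Each point on a precision-recall curve comes from thresholding the predicted probabilities and counting true positives, false positives, and false negatives. The sketch below computes one such point. `precision_recall_at` is a hypothetical helper, and reporting a precision of 1.0 when nothing is predicted positive is a convention chosen for this sketch.

```python
def precision_recall_at(y_true, y_proba, threshold):
    """Precision and recall when predicting positive for proba >= threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_proba]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Convention for this sketch: precision is 1.0 with no positive predictions.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [0, 0, 1, 1]
y_proba = [0.1, 0.4, 0.35, 0.8]
for threshold in (0.35, 0.4, 0.8):
    print(threshold, precision_recall_at(y_true, y_proba, threshold))
```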

evalml.model_understanding.roc_curve(y_true, y_pred_proba)[source]

Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve. Works with binary or multiclass problems.

Parameters
  • y_true (pd.Series or np.ndarray) – True labels.

  • y_pred_proba (pd.Series or np.ndarray) – Predictions from a classifier, before thresholding has been applied.

Returns

A list of dictionaries (with one for each class) is returned. Binary classification problems return a list with one dictionary.
Each dictionary contains metrics used to generate an ROC plot with the following keys:
  • fpr_rate: False positive rate.

  • tpr_rate: True positive rate.

  • threshold: Threshold values used to produce each pair of true/false positive rates.

  • auc_score: The area under the ROC curve.

Return type

list(dict)
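For the binary case, the ROC data can be sketched by sweeping the distinct predicted probabilities as thresholds and integrating with the trapezoidal rule. This is an illustration, not EvalML's implementation: `roc_points` and `auc` are hypothetical helpers, and the sketch assumes both classes appear in y_true.

```python
def roc_points(y_true, y_proba):
    """Return (false positive rate, true positive rate) points, from (0, 0)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step so more samples are predicted positive.
    for threshold in sorted(set(y_proba), reverse=True):
        tp = sum(1 for t, p in zip(y_true, y_proba) if p >= threshold and t == 1)
        fp = sum(1 for t, p in zip(y_true, y_proba) if p >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


y_true = [0, 0, 1, 1]
y_proba = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, y_proba)
print(points, auc(points))
```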

evalml.model_understanding.t_sne(X, n_components=2, perplexity=30.0, learning_rate=200.0, metric='euclidean', **kwargs)[source]

Get the transformed output after fitting X to the embedded space using t-SNE.

Parameters
  • X (np.ndarray, pd.DataFrame) – Data to be transformed. Must be numeric.

  • n_components (int, optional) – Dimension of the embedded space.

  • perplexity (float, optional) – Related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50.

  • learning_rate (float, optional) – Usually in the range [10.0, 1000.0]. If the cost function gets stuck in a bad local minimum, increasing the learning rate may help.

  • metric (str, optional) – The metric to use when calculating distance between instances in a feature array.

Returns

np.ndarray (n_samples, n_components)