evalml.model_understanding.prediction_explanations.explain_predictions_best_worst

evalml.model_understanding.prediction_explanations.explain_predictions_best_worst(pipeline, input_features, y_true, num_to_explain=5, top_k_features=3, include_shap_values=False, metric=None)

Creates a report summarizing the top contributing features for the best and worst points in the dataset, as measured by the error between the predictions and the true labels.

XGBoost models and CatBoost multiclass classifiers are not currently supported.

Parameters
  • pipeline (PipelineBase) – Fitted pipeline whose predictions we want to explain with SHAP.

  • input_features (pd.DataFrame) – Dataframe of input data to evaluate the pipeline on.

  • y_true (pd.Series) – True labels for the input data.

  • num_to_explain (int) – How many of the best and worst data points to explain. Default is 5.

  • top_k_features (int) – How many of the highest/lowest contributing features to include in the table for each data point. Default is 3.

  • include_shap_values (bool) – Whether SHAP values should be included in the table. Default is False.

  • metric (callable) – The metric used to identify the best and worst points in the dataset. The function must accept the true labels and the predicted values or probabilities as its only arguments, and lower values must be better. By default, this will be the absolute error for regression problems and cross entropy loss for classification problems.
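
The metric contract described above (true labels and predictions in, lower is better) can be illustrated with a small standalone sketch. This is not EvalML's implementation: the helper names `absolute_error` and `best_and_worst_indices` are hypothetical, and the sketch assumes the metric yields one score per data point so that rows can be ranked. Plain Python lists are used to keep it self-contained; the container type EvalML itself expects from a custom metric is not stated here and should be checked against the library.

```python
# Illustrative sketch only; not EvalML's implementation.
# `absolute_error` and `best_and_worst_indices` are hypothetical names.

def absolute_error(y_true, y_pred):
    """Per-row metric: one score per data point, lower is better."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


def best_and_worst_indices(scores, num_to_explain):
    """Return (best, worst) row positions ranked by score."""
    # Row positions sorted from smallest score (best) to largest (worst).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    best = order[:num_to_explain]
    worst = order[::-1][:num_to_explain]  # largest error first
    return best, worst


y_true = [10.0, 12.0, 9.0, 20.0, 15.0]
y_pred = [10.5, 12.0, 13.0, 11.0, 14.0]

scores = absolute_error(y_true, y_pred)
best, worst = best_and_worst_indices(scores, num_to_explain=2)
print(scores)  # [0.5, 0.0, 4.0, 9.0, 1.0]
print(best)    # [1, 0]
print(worst)   # [3, 2]
```

With num_to_explain=2, the two rows with the smallest error and the two with the largest error are selected; these are the rows the report would then explain.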

Returns

str - A report with the pipeline name and parameters. For each of the best/worst rows of input_features, the predicted values, true labels, and metric value will be listed along with a table. The table will have the following columns: Feature Name, Contribution to Prediction, and SHAP Value (optional). Each row of the table will correspond to a feature.