decision_boundary

Model understanding for decision boundaries in binary classification problems.

Module Contents

Functions

find_confusion_matrix_per_thresholds


Contents

evalml.model_understanding.decision_boundary.find_confusion_matrix_per_thresholds(pipeline, X, y, n_bins=None, top_k=5, to_json=False)

Gets the confusion matrix and histogram bins for each threshold as well as the best threshold per objective. Only works with Binary Classification Pipelines.

Parameters
  • pipeline (PipelineBase) – A fitted binary classification pipeline used to compute the confusion matrices.

  • X (pd.DataFrame) – The input features.

  • y (pd.Series) – The input target.

  • n_bins (int) – The number of bins used to calculate the threshold values. Defaults to None, in which case the number of bins is chosen with the Freedman-Diaconis rule.

  • top_k (int) – The maximum number of row indices per bin to include as samples. Set to -1 to include all row indices that fall within each bin. Defaults to 5.

  • to_json (bool) – Whether to return the output as JSON. If False, returns a (DataFrame, dict) tuple; otherwise returns JSON. Defaults to False.
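When n_bins is None, the bin count follows the Freedman-Diaconis rule, which sets the bin width to h = 2 * IQR / n ** (1/3) and divides the data range by h. The sketch below computes a bin count that way using only the standard library. The helper name `freedman_diaconis_n_bins` is hypothetical, and evalml's own implementation may compute quartiles or round the result differently, so treat this as an illustration of the rule rather than the library's code.

```python
import math
import statistics


def freedman_diaconis_n_bins(values):
    """Hypothetical helper: bin count for `values` via the Freedman-Diaconis rule.

    Bin width h = 2 * IQR / n ** (1/3); the bin count is the data range
    divided by h, rounded up. Degenerate input falls back to a single bin.
    """
    values = list(values)
    n = len(values)
    if n < 2:
        return 1
    # "inclusive" interpolates linearly between order statistics
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    data_range = max(values) - min(values)
    if iqr == 0 or data_range == 0:
        return 1
    width = 2 * iqr / n ** (1 / 3)
    return max(1, math.ceil(data_range / width))
```

Because the width shrinks with n ** (1/3), larger datasets get more (narrower) bins, while the IQR keeps a few extreme probabilities from stretching the bins.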

Returns

The DataFrame contains the actual positive histogram, the actual negative histogram, the confusion matrix, and a sample of rows that fall in the bin, all for each threshold value. Each threshold value, given by the DataFrame index, is the cutoff threshold at that value. The dictionary contains the ideal threshold and score per objective, keyed by objective name. If to_json is True, the information for both the DataFrame and the dictionary is returned as JSON.
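The relationship between the histograms, the per-threshold confusion matrices, and the best threshold per objective can be pictured with a small sketch. It assumes equal-width bins over the positive-class probabilities on [0, 1] and treats each bin's upper edge as a cutoff: rows in that bin or any lower bin count as predicted negative. Under that assumption every confusion matrix follows from running sums over the two histograms, and `top_k` caps how many row indices are kept per bin. `best_thresholds` then picks, for each objective, the cutoff with the highest score, assuming every objective is greater-is-better. All names here (`threshold_table`, `best_thresholds`, `f1`, `accuracy`) are hypothetical, and evalml's actual binning, tie-breaking, and objective handling may differ.

```python
def threshold_table(probas, labels, n_bins, top_k=5):
    """Hypothetical per-threshold table for binary predictions.

    Rows are binned by positive-class probability into `n_bins` equal-width
    bins on [0, 1]. For the cutoff at each bin's upper edge, rows in that bin
    or any lower bin count as predicted negative; the rest as predicted positive.
    """
    pos_hist = [0] * n_bins
    neg_hist = [0] * n_bins
    samples = [[] for _ in range(n_bins)]
    for row, (p, y) in enumerate(zip(probas, labels)):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the last bin
        if y == 1:
            pos_hist[b] += 1
        else:
            neg_hist[b] += 1
        # top_k == -1 keeps every row index in the bin
        if top_k == -1 or len(samples[b]) < top_k:
            samples[b].append(row)

    total_pos, total_neg = sum(pos_hist), sum(neg_hist)
    table, fn, tn = [], 0, 0
    for b in range(n_bins):
        # Running sums: actual positives/negatives at or below this cutoff
        fn += pos_hist[b]
        tn += neg_hist[b]
        table.append({
            "threshold": (b + 1) / n_bins,
            "pos_hist": pos_hist[b],
            "neg_hist": neg_hist[b],
            "tp": total_pos - fn,
            "fp": total_neg - tn,
            "tn": tn,
            "fn": fn,
            "samples": samples[b],
        })
    return table


def f1(tp, fp, tn, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def best_thresholds(table, objectives):
    """Map each objective name to the cutoff with the highest score."""
    result = {}
    for name, score_fn in objectives.items():
        scores = [score_fn(r["tp"], r["fp"], r["tn"], r["fn"]) for r in table]
        i = max(range(len(table)), key=scores.__getitem__)  # first max wins ties
        result[name] = {"threshold": table[i]["threshold"], "score": scores[i]}
    return result
```

Passing the table and the resulting dictionary to `json.dumps` gives a rough analogue of the to_json=True output, though the real JSON layout may differ.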

Return type

(tuple(pd.DataFrame, dict), json)

Raises

ValueError – If the pipeline isn’t a binary classification pipeline or hasn’t been fitted yet.