permutation_importance

Permutation importance methods.

Module Contents

Functions

calculate_permutation_importance

Calculates permutation importance for features.

calculate_permutation_importance_one_column

Calculates permutation importance for one column in the original dataframe.

graph_permutation_importance

Generate a bar graph of the pipeline's permutation importance.

Contents

evalml.model_understanding.permutation_importance.calculate_permutation_importance(pipeline, X, y, objective, n_repeats=5, n_jobs=None, random_seed=0)

Calculates permutation importance for features.

Parameters
  • pipeline (PipelineBase or subclass) – Fitted pipeline.

  • X (pd.DataFrame) – The input data used to score and compute permutation importance.

  • y (pd.Series) – The target data.

  • objective (str, ObjectiveBase) – Objective to score on.

  • n_repeats (int) – Number of times to permute a feature. Defaults to 5.

  • n_jobs (int or None) – Integer describing the level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) CPUs are used. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Returns

Mean feature importance scores over a number of shuffles.

Return type

pd.DataFrame

Raises

ValueError – If objective cannot be used with the given pipeline.
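The n_jobs convention described in the parameters above can be sketched in a few lines. This is an illustration only, not evalml's code: resolve_n_jobs is a hypothetical helper, and both the rejection of 0 and the floor of one worker are assumptions borrowed from the usual joblib-style convention.

```python
def resolve_n_jobs(n_jobs, n_cpus):
    """Translate an n_jobs setting into a worker count (hypothetical helper)."""
    if n_jobs is None:
        # None and 1 are equivalent: run with a single worker.
        return 1
    if n_jobs == 0:
        # Assumption: 0 is treated as invalid.
        raise ValueError("n_jobs must not be 0")
    if n_jobs < 0:
        # -1 gives n_cpus, -2 gives n_cpus - 1, and so on.
        # Assumption: never drop below one worker.
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


workers_all = resolve_n_jobs(-1, n_cpus=8)       # all 8 CPUs
workers_all_but_one = resolve_n_jobs(-2, n_cpus=8)  # 8 + 1 - 2 = 7 CPUs
```

On an 8-CPU machine, n_jobs=-2 therefore leaves one CPU free.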

evalml.model_understanding.permutation_importance.calculate_permutation_importance_one_column(pipeline, X, y, col_name, objective, n_repeats=5, fast=True, precomputed_features=None, random_seed=0)

Calculates permutation importance for one column in the original dataframe.

Parameters
  • pipeline (PipelineBase or subclass) – Fitted pipeline.

  • X (pd.DataFrame) – The input data used to score and compute permutation importance.

  • y (pd.Series) – The target data.

  • col_name (str, int) – The column in X to calculate permutation importance for.

  • objective (str, ObjectiveBase) – Objective to score on.

  • n_repeats (int) – Number of times to permute a feature. Defaults to 5.

  • fast (bool) – Whether to use the fast method of calculating permutation importance. Defaults to True.

  • precomputed_features (pd.DataFrame) – Precomputed features necessary to calculate permutation importance using the fast method. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Returns

Mean feature importance score for the column, averaged over a number of shuffles.

Return type

float

Raises
  • ValueError – If pipeline does not support fast permutation importance calculation.

  • ValueError – If precomputed_features is None.
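To make the idea behind a single-column calculation concrete, here is a minimal pure-Python sketch of generic permutation importance: score the model once, shuffle one column n_repeats times, and average the drop in score. It does not reproduce evalml's implementation (there is no fast path or precomputed features here). The names predict, neg_mse, and permutation_importance_one_column are hypothetical, X is a plain dict of lists standing in for a DataFrame, and the score is assumed to be higher-is-better.

```python
import random


def permutation_importance_one_column(predict, score, X, y, col_name,
                                      n_repeats=5, random_seed=0):
    """Mean drop in score when col_name is shuffled (illustrative sketch)."""
    rng = random.Random(random_seed)
    baseline = score(y, predict(X))
    permuted_scores = []
    for _ in range(n_repeats):
        shuffled = list(X[col_name])
        rng.shuffle(shuffled)
        # Copy X with only the chosen column replaced by its shuffled version.
        X_permuted = {**X, col_name: shuffled}
        permuted_scores.append(score(y, predict(X_permuted)))
    # Assumption: higher scores are better, so importance = baseline - permuted mean.
    return baseline - sum(permuted_scores) / n_repeats


def predict(X):
    # Toy "model" that uses column "a" only and ignores column "b".
    return list(X["a"])


def neg_mse(y_true, y_pred):
    # Negative mean squared error, so that higher is better.
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


X = {"a": [1, 2, 3, 4, 5, 6, 7, 8], "b": [8, 1, 6, 3, 5, 2, 7, 4]}
y = [1, 2, 3, 4, 5, 6, 7, 8]

imp_a = permutation_importance_one_column(predict, neg_mse, X, y, "a")
imp_b = permutation_importance_one_column(predict, neg_mse, X, y, "b")
```

Because the toy model ignores column "b", shuffling it leaves the score unchanged and its importance is zero, while shuffling "a" degrades the score and yields a positive importance.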

evalml.model_understanding.permutation_importance.graph_permutation_importance(pipeline, X, y, objective, importance_threshold=0)

Generate a bar graph of the pipeline’s permutation importance.

Parameters
  • pipeline (PipelineBase or subclass) – Fitted pipeline.

  • X (pd.DataFrame) – The input data used to score and compute permutation importance.

  • y (pd.Series) – The target data.

  • objective (str, ObjectiveBase) – Objective to score on.

  • importance_threshold (float, optional) – If provided, only features whose permutation importance has an absolute value larger than importance_threshold are graphed. Defaults to 0.

Returns

A bar graph showing features and their respective permutation importance.

Return type

plotly.Figure

Raises

ValueError – If importance_threshold is negative.
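The effect of importance_threshold can be sketched as a simple filter on absolute values. This is a hypothetical helper written for illustration, not the library's plotting code. It reads "larger than" literally as a strict comparison, which is an assumption, and it uses a plain dict of feature names to scores in place of a DataFrame.

```python
def filter_by_importance(importances, importance_threshold=0):
    """Keep features whose |importance| exceeds the threshold (illustrative)."""
    if importance_threshold < 0:
        raise ValueError("importance_threshold must be greater than or equal to 0")
    # Assumption: strict comparison, following the wording "larger than".
    return {
        name: value
        for name, value in importances.items()
        if abs(value) > importance_threshold
    }


scores = {"a": 0.5, "b": -0.3, "c": 0.0, "d": 0.1}
kept = filter_by_importance(scores, importance_threshold=0.2)
```

Note that a feature with a strongly negative importance, such as "b" here, is kept, because the comparison is made on the absolute value.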