partial_dependence_functions
Top-level functions for running partial dependence.
Module Contents
Functions
graph_partial_dependence – Create a one-way or two-way partial dependence plot.
partial_dependence – Calculate one-way or two-way partial dependence.
Contents
 evalml.model_understanding.partial_dependence_functions.graph_partial_dependence(pipeline, X, features, class_label=None, grid_resolution=100, kind='average')
Create a one-way or two-way partial dependence plot.
Passing a single integer or string as features creates a one-way partial dependence plot, with the feature values plotted against the partial dependence. Passing a tuple of two integers or strings as features creates a two-way partial dependence plot: a contour with features[0] on the y-axis, features[1] on the x-axis, and the partial dependence on the z-axis.
 Parameters
pipeline (PipelineBase or subclass) – Fitted pipeline.
X (pd.DataFrame, np.ndarray) – The input data used to generate the grid of feature values at which partial dependence is calculated.
features (int, string, tuple[int or string]) – The target feature(s) for which to create the partial dependence plot. If features is an int, it must be the index of the feature to use. If features is a string, it must be a valid column name in X. If features is a tuple of ints/strings, it must contain valid column indices/names in X.
class_label (string, optional) – Name of class to plot for multiclass problems. If None, will plot the partial dependence for each class. This argument does not change behavior for regression or binary classification pipelines. For binary classification, the partial dependence for the positive label will always be displayed. Defaults to None.
grid_resolution (int) – Number of samples of feature(s) for partial dependence plot. Defaults to 100.
kind ({'average', 'individual', 'both'}) – Type of partial dependence to plot. ‘average’ creates a regular partial dependence (PD) graph, ‘individual’ creates an individual conditional expectation (ICE) plot, and ‘both’ creates a single-figure PD and ICE plot. ICE plots can only be shown for one-way partial dependence plots. Defaults to ‘average’.
 Returns
figure object containing the partial dependence data for plotting
 Return type
plotly.graph_objects.Figure
 Raises
PartialDependenceError – if a graph is requested for a class name that isn’t present in the pipeline.
PartialDependenceError – if an ICE plot is requested for a two-way partial dependence.
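The argument rules described above can be summarized in a short sketch. It is a stand-alone illustration rather than EvalML's implementation: `describe_request` and the local `PartialDependenceError` class are hypothetical names introduced here, and the real function performs additional checks (for example, on class_label).

```python
class PartialDependenceError(ValueError):
    """Hypothetical stand-in for EvalML's exception of the same name."""


def describe_request(features, kind="average"):
    """Classify a features argument as 'one-way' or 'two-way' (illustrative only)."""
    if kind not in {"average", "individual", "both"}:
        raise ValueError(f"Unknown kind: {kind!r}")
    # A single int (column index) or string (column name) means one-way.
    if isinstance(features, (int, str)):
        return "one-way"
    # A tuple of exactly two ints/strings means two-way.
    if isinstance(features, tuple) and len(features) == 2:
        # ICE plots are documented as available for one-way plots only.
        if kind != "average":
            raise PartialDependenceError("ICE plots require a single feature.")
        return "two-way"
    raise PartialDependenceError(
        "features must be an int, a string, or a tuple of two ints/strings."
    )


print(describe_request("age"))
print(describe_request((0, "income")))
```

Under these assumptions, requesting kind='individual' or kind='both' together with a two-feature tuple raises the error instead of returning a label.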
 evalml.model_understanding.partial_dependence_functions.partial_dependence(pipeline, X, features, percentiles=(0.05, 0.95), grid_resolution=100, kind='average')
Calculates one-way or two-way partial dependence.
If a single integer or string is given for features, one-way partial dependence is calculated. If a tuple of two integers or strings is given, two-way partial dependence is calculated with the first feature on the y-axis and the second feature on the x-axis.
 Parameters
pipeline (PipelineBase or subclass) – Fitted pipeline.
X (pd.DataFrame, np.ndarray) – The input data used to generate the grid of feature values at which partial dependence is calculated.
features (int, string, tuple[int or string]) – The target feature(s) for which to calculate partial dependence. If features is an int, it must be the index of the feature to use. If features is a string, it must be a valid column name in X. If features is a tuple of ints/strings, it must contain valid column indices/names in X.
percentiles (tuple[float]) – The lower and upper percentile used to create the extreme values for the grid. Must be in [0, 1]. Defaults to (0.05, 0.95).
grid_resolution (int) – Number of samples of feature(s) for partial dependence plot. If this value is less than the maximum number of categories present in categorical data within X, it will be set to the max number of categories + 1. Defaults to 100.
kind ({'average', 'individual', 'both'}) – The type of predictions to return. ‘individual’ returns the predictions at every grid point for each sample in X. ‘average’ returns the predictions at every grid point, averaged over all samples in X. ‘both’ returns both results. Defaults to ‘average’.
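To make the roles of percentiles, grid_resolution, and kind concrete, here is a minimal pure-Python sketch of the textbook one-way calculation. It is not EvalML's code. The helper names (`percentile`, `ice_and_average`), the linear-interpolation percentile rule, the evenly spaced grid, and the toy `predict` function are all assumptions made for illustration.

```python
def percentile(sorted_values, q):
    """Linearly interpolated quantile q (0 to 1) of an already sorted list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def ice_and_average(predict, rows, feature, percentiles=(0.05, 0.95), grid_resolution=100):
    """Return (grid, ICE curves, averaged curve) for one feature.

    Assumes grid_resolution >= 2 and a numeric feature.
    """
    column = sorted(row[feature] for row in rows)
    low, high = (percentile(column, q) for q in percentiles)
    step = (high - low) / (grid_resolution - 1)
    grid = [low + i * step for i in range(grid_resolution)]
    # 'individual': one curve per sample, with the feature overwritten by each grid value.
    ice = [[predict({**row, feature: value}) for value in grid] for row in rows]
    # 'average': the mean of the individual curves at each grid point.
    average = [sum(curve[i] for curve in ice) / len(ice) for i in range(len(grid))]
    return grid, ice, average


def predict(row):
    """Toy model standing in for a fitted pipeline's predictions."""
    return 2.0 * row["x"] + row["z"]


rows = [{"x": float(i), "z": float(i % 3)} for i in range(11)]
grid, ice, average = ice_and_average(predict, rows, "x", grid_resolution=5)
for value, dependence in zip(grid, average):
    print(value, dependence)
```

In this sketch, kind='both' corresponds to keeping both `average` and `ice`, whereas kind='average' discards the per-sample curves after averaging.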
 Returns
When kind='average': DataFrame with the predictions at every grid point, averaged over all samples of X, along with the values used to calculate those predictions.
When kind='individual': DataFrame with the individual predictions at every grid point for each sample of X, along with the values used to calculate those predictions. If a two-way partial dependence is calculated, the result is a list of DataFrames, each representing one sample’s predictions.
When kind='both': A tuple consisting of the averaged predictions (a DataFrame) over all samples of X and the individual predictions (a list of DataFrames) for each sample of X.
In the one-way case: The DataFrame will contain two columns, “feature_values” (grid points at which the partial dependence was calculated) and “partial_dependence” (the partial dependence at that feature value). For classification problems, there will be a third column called “class_label” (the class label for which the partial dependence was calculated). For binary classification, the partial dependence is only calculated for the “positive” class.
In the two-way case: The DataFrame will contain grid_resolution rows and grid_resolution columns, where the index and column headers are the sampled values of the first and second features, respectively, used to make the partial dependence contour. The values of the DataFrame are the partial dependence for each pair of feature values.
 Return type
pd.DataFrame, list(pd.DataFrame), or tuple(pd.DataFrame, list(pd.DataFrame))
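The two-way layout described above (first feature along the index, second feature along the column headers) can be sketched with nested dictionaries. This is an illustrative stand-in for the returned DataFrame, not EvalML's implementation: `two_way_table`, the explicitly supplied grids, and the toy `predict` function are hypothetical.

```python
def two_way_table(predict, rows, first, second, first_grid, second_grid):
    """Return {first_value: {second_value: averaged prediction}}.

    Outer keys play the role of the DataFrame index (first feature);
    inner keys play the role of the column headers (second feature).
    """
    table = {}
    for a in first_grid:
        table[a] = {}
        for b in second_grid:
            # Overwrite both features in every sample, then average the predictions.
            predictions = [predict({**row, first: a, second: b}) for row in rows]
            table[a][b] = sum(predictions) / len(predictions)
    return table


def predict(row):
    """Toy model standing in for a fitted pipeline's predictions."""
    return row["x"] * row["y"] + row["z"]


rows = [{"x": 0.0, "y": 0.0, "z": 1.0}, {"x": 5.0, "y": 5.0, "z": 3.0}]
table = two_way_table(predict, rows, "x", "y", [1.0, 2.0], [10.0, 20.0, 30.0])
for first_value, row_of_values in table.items():
    print(first_value, row_of_values)
```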
 Raises
ValueError – Error during call to scikit-learn’s partial dependence method.
Exception – All other errors during calculation.
PartialDependenceError – if the user provides a tuple of not exactly two features.
PartialDependenceError – if the provided pipeline isn’t fitted.
PartialDependenceError – if the provided pipeline is a Baseline pipeline.
PartialDependenceError – if any of the features passed in are completely NaN.
PartialDependenceError – if any of the features are low-variance. A feature is low-variance if a single value accounts for more than the upper percentile passed by the user (95% by default).
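The low-variance rule can be read as a check on the share of the most frequent value. The sketch below follows that reading. `is_low_variance` is a hypothetical helper, and the strict "greater than" comparison is an assumption based on the wording above, not a confirmed detail of EvalML's check.

```python
from collections import Counter


def is_low_variance(values, upper_percentile=0.95):
    """Return True if one value makes up more than upper_percentile of the data."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values) > upper_percentile


mostly_zero = [0] * 97 + [1, 2, 3]   # the value 0 is 97% of the column
borderline = [0] * 95 + [1] * 5      # the value 0 is exactly 95% of the column

print(is_low_variance(mostly_zero))
print(is_low_variance(borderline))
```

Under this reading, raising the upper percentile passed in percentiles makes the check more permissive.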