binary_classification_pipeline

Module Contents

Classes Summary

BinaryClassificationPipeline

Pipeline subclass for all binary classification pipelines.

Contents

class evalml.pipelines.binary_classification_pipeline.BinaryClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)

Pipeline subclass for all binary classification pipelines.

Parameters

component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. When a component appears more than once in a list, the name of each duplicate is modified with that component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”].
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None.
custom_name (str) – Custom name for the pipeline. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
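The duplicate-naming rule for list-style component graphs can be illustrated with a small standalone sketch. The helper `name_components` is hypothetical and not part of EvalML; it only reproduces the names shown in the example above, assuming the suffix is the component's zero-based position in the list. The `parameters` dictionary at the end shows the documented shape (component name mapped to that component's parameters); the specific key and value are illustrative.

```python
def name_components(component_names):
    """Return unique names, suffixing repeated names with their list index."""
    seen = set()
    unique_names = []
    for index, name in enumerate(component_names):
        if name in seen:
            # Assumption: a repeated component gets its zero-based position appended.
            unique_names.append(f"{name}_{index}")
        else:
            seen.add(name)
            unique_names.append(name)
    return unique_names


graph = ["Imputer", "One Hot Encoder", "Imputer", "Logistic Regression Classifier"]
names = name_components(graph)

# Parameters are keyed by component name. Components missing from the
# dictionary would use their default values. The entry below is illustrative.
parameters = {"Logistic Regression Classifier": {"C": 1.0}}
```

Because the second imputer is addressed as "Imputer_2", its parameters would be supplied under that key rather than under "Imputer".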

Attributes

problem_type

ProblemTypes.BINARY

Methods

`can_tune_threshold_with_objective`	Determine whether the threshold of a binary classification pipeline can be tuned.
`classes_`	Gets the class names for the problem.
`clone`	Constructs a new pipeline with the same components, parameters, and random state.
`compute_estimator_features`	Transforms the data by applying all pre-processing components.
`create_objectives`
`custom_name`	Custom name of the pipeline.
`describe`	Outputs pipeline details, including component parameters.
`feature_importance`	Importance associated with each feature. Features dropped by the feature selection are excluded.
`fit`	Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then mapped to values between 0 and n_classes-1.
`get_component`	Returns the component with the given name.
`get_hyperparameter_ranges`	Returns hyperparameter ranges from all components as a dictionary.
`graph`	Generate an image representing the pipeline graph.
`graph_feature_importance`	Generate a bar graph of the pipeline’s feature importance.
`inverse_transform`	Apply component inverse_transform methods to estimator predictions in reverse order.
`load`	Loads a pipeline from the given file path.
`model_family`	Returns model family of this pipeline.
`name`	Name of the pipeline.
`new`	Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
`optimize_threshold`	Optimize the pipeline threshold given the objective to use. Only used for binary problems with objectives whose thresholds can be tuned.
`parameters`	Parameter dictionary for this pipeline.
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels. Assumes that the column at index 1 represents the positive label case.
`save`	Saves the pipeline to the given file path.
`score`	Evaluate model performance on the given objectives.
`summary`	A short summary of the pipeline structure, describing the list of components used.
`threshold`	Threshold used to make a prediction. Defaults to None.
`transform`	Transform the input.

can_tune_threshold_with_objective(self, objective)

Determine whether the threshold of a binary classification pipeline can be tuned.

Parameters

objective – Primary AutoMLSearch objective.

property classes_(self): Gets the class names for the problem.

clone(self)

Constructs a new pipeline with the same components, parameters, and random state.

Returns: A new instance of this pipeline with identical components, parameters, and random state.

compute_estimator_features(self, X, y=None)

Transforms the data by applying all pre-processing components.

Parameters: X (pd.DataFrame) – Input data to the pipeline to transform.
Returns: New transformed features.
Return type: pd.DataFrame

static create_objectives(objectives)

property custom_name(self): Custom name of the pipeline.

describe(self, return_dict=False)

Outputs pipeline details, including component parameters.

Parameters: return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.
Returns: Dictionary of all component parameters if return_dict is True, else None
Return type: dict

property feature_importance(self)

Importance associated with each feature. Features dropped by the feature selection are excluded.

Returns: pd.DataFrame including feature names and their corresponding importance

fit(self, X, y)

Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then mapped to values between 0 and n_classes-1.

Parameters

X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series or np.ndarray) – The target training labels of length [n_samples]

Returns

self
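The target encoding described for fit can be sketched in plain Python. This is a standalone illustration of the `sorted(set(y))` rule stated above, not EvalML's implementation; the function name `encode_targets` is hypothetical.

```python
def encode_targets(y):
    """Map labels to integers 0..n_classes-1 using the order of sorted(set(y))."""
    classes = sorted(set(y))
    label_to_index = {label: index for index, label in enumerate(classes)}
    encoded = [label_to_index[label] for label in y]
    return encoded, classes


encoded, classes = encode_targets(["spam", "ham", "spam", "ham", "ham"])
# classes lists the labels in sorted order; for a binary problem the label at
# index 1 is the one treated as the positive case.
```

With this ordering, "ham" sorts before "spam", so "spam" is encoded as 1.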

get_component(self, name)

Returns the component with the given name.

Parameters: name (str) – Name of the component.
Returns: The component with the given name.
Return type: Component

get_hyperparameter_ranges(self, custom_hyperparameters)

Returns hyperparameter ranges from all components as a dictionary.

Parameters: custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.
Returns: Dictionary of hyperparameter ranges for each component in the pipeline.
Return type: dict

graph(self, filepath=None)

Generate an image representing the pipeline graph.

Parameters: filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
Returns: Graph object that can be directly displayed in Jupyter notebooks.
Return type: graphviz.Digraph

graph_feature_importance(self, importance_threshold=0)

Generate a bar graph of the pipeline’s feature importance.

Parameters: importance_threshold (float, optional) – If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero.
Returns: plotly.Figure, a bar graph showing features and their corresponding importance

inverse_transform(self, y)

Apply component inverse_transform methods to estimator predictions in reverse order.

Components that implement inverse_transform are PolynomialDetrender, LabelEncoder (tbd).

Parameters: y (pd.Series) – Final component features
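Why the inverse transforms run in reverse order can be seen in a small standalone sketch. The classes `AddOffset`, `Scale`, and `PassThrough` are hypothetical stand-ins rather than EvalML components, and skipping components that lack an `inverse_transform` method is an assumption made for this illustration.

```python
class AddOffset:
    def __init__(self, offset):
        self.offset = offset

    def transform(self, y):
        return [value + self.offset for value in y]

    def inverse_transform(self, y):
        return [value - self.offset for value in y]


class Scale:
    def __init__(self, factor):
        self.factor = factor

    def transform(self, y):
        return [value * self.factor for value in y]

    def inverse_transform(self, y):
        return [value / self.factor for value in y]


class PassThrough:
    """A component with no inverse_transform method."""

    def transform(self, y):
        return y


def inverse_transform_all(components, y):
    # Undo the most recently applied transform first.
    for component in reversed(components):
        if hasattr(component, "inverse_transform"):
            y = component.inverse_transform(y)
    return y


components = [AddOffset(10), PassThrough(), Scale(2)]
transformed = [1.0, 2.0, 3.0]
for component in components:
    transformed = component.transform(transformed)

restored = inverse_transform_all(components, transformed)
```

Applying the inverses in forward order instead would subtract the offset before undoing the scaling and would not recover the original values.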

static load(file_path)

Loads a pipeline from the given file path.

Parameters: file_path (str) – Location of the file to load.
Returns: PipelineBase object

property model_family(self): Returns model family of this pipeline.

property name(self): Name of the pipeline.

new(self, parameters, random_seed=0)

Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python’s __new__ method.

Parameters

parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Returns

A new instance of this pipeline with identical components.

optimize_threshold(self, X, y, y_pred_proba, objective)

Optimize the pipeline threshold given the objective to use. Only used for binary problems with objectives whose thresholds can be tuned.

Parameters

X (pd.DataFrame) – Input features
y (pd.Series) – Input target values
y_pred_proba (pd.Series) – The predicted probabilities of the target outputted by the pipeline
objective (ObjectiveBase) – The objective to threshold with. Must have a tunable threshold.
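As a rough picture of what tuning a threshold against an objective means, the sketch below scans candidate thresholds and keeps the one with the best score. It is a standalone illustration: a plain grid search and a hand-written F1 function are swapped in here, and EvalML's own optimizer and objective classes may work differently. The names `f1_score` and `grid_search_threshold` are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels encoded as 0 (negative) and 1 (positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search_threshold(y_true, y_pred_proba, objective_fn, steps=101):
    """Return the (threshold, score) pair that maximizes objective_fn."""
    best_threshold, best_score = None, float("-inf")
    for i in range(steps):
        threshold = i / (steps - 1)
        # Assumption: a sample is positive when its probability exceeds the threshold.
        y_pred = [1 if p > threshold else 0 for p in y_pred_proba]
        score = objective_fn(y_true, y_pred)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


y_true = [0, 0, 1, 1]
y_pred_proba = [0.1, 0.4, 0.35, 0.8]  # probability of the positive class
best_threshold, best_score = grid_search_threshold(y_true, y_pred_proba, f1_score)
```

For an objective where lower is better, the comparison would be flipped or the score negated.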

property parameters(self)

Parameter dictionary for this pipeline.

Returns: Dictionary of all component parameters.
Return type: dict

predict(self, X, objective=None)

Make predictions using selected features.

Parameters

X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]
objective (Object or string) – The objective to use to make predictions

Returns

Estimated labels

Return type

pd.Series

predict_proba(self, X)

Make probability estimates for labels. Assumes that the column at index 1 represents the positive label case.

Parameters: X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]
Returns: Probability estimates
Return type: pd.Series
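The convention that index 1 holds the positive class can be made concrete with a standalone sketch that turns probability rows into labels. The function `predict_with_threshold` is hypothetical; the 0.5 default and the strict greater-than comparison are assumptions for illustration, not documented EvalML behavior.

```python
def predict_with_threshold(proba_rows, classes, threshold=0.5):
    """Convert [p_negative, p_positive] rows into class labels."""
    negative_label, positive_label = classes
    return [
        # Column 1 is read as the probability of the positive label.
        positive_label if row[1] > threshold else negative_label
        for row in proba_rows
    ]


proba_rows = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
classes = ["ham", "spam"]  # index 1 ("spam") is the positive label

default_labels = predict_with_threshold(proba_rows, classes)
lenient_labels = predict_with_threshold(proba_rows, classes, threshold=0.2)
```

Lowering the threshold moves borderline rows such as [0.7, 0.3] into the positive class.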

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves the pipeline to the given file path.

Parameters

file_path (str) – Location to save the file.
pickle_protocol (int) – The pickle data stream format.

Returns

None
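The save and load pair follows the usual pickle round-trip pattern. The sketch below shows that pattern with the standard-library `pickle` module standing in for cloudpickle and a plain dictionary standing in for a pipeline; `save_object` and `load_object` are hypothetical helpers, not EvalML functions.

```python
import os
import pickle
import tempfile


def save_object(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize obj to file_path using the given pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load_object(file_path):
    """Deserialize and return the object stored at file_path."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Stand-in for a pipeline: its component names and parameters.
state = {
    "components": ["Imputer", "Logistic Regression Classifier"],
    "parameters": {"Logistic Regression Classifier": {"C": 1.0}},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.pkl")
    save_object(state, path)
    restored = load_object(path)
```

As with any pickle-based format, a saved file should only be loaded when it comes from a trusted source.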

score(self, X, y, objectives)

Evaluate model performance on the given objectives.

Parameters

X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]
y (pd.Series or np.ndarray) – True labels of length [n_samples]
objectives (list) – List of objectives to score

Returns

Ordered dictionary of objective scores

Return type

dict
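The shape of the returned value, an ordered mapping from objective to score, can be sketched without EvalML. The metric functions and `score_predictions` below are hypothetical helpers that operate on already-computed predictions. Keying the result by objective name is an assumption of this sketch.

```python
from collections import OrderedDict


def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def score_predictions(y_true, y_pred, objectives):
    """Evaluate each (name, function) pair, preserving the given order."""
    scores = OrderedDict()
    for name, objective_fn in objectives:
        scores[name] = objective_fn(y_true, y_pred)
    return scores


y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
scores = score_predictions(y_true, y_pred, [("Accuracy", accuracy), ("Recall", recall)])
```

The scores come back in the same order in which the objectives were listed.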

property summary(self): A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder

property threshold(self): Threshold used to make a prediction. Defaults to None.

transform(self, X, y=None)

Transform the input.

Parameters

X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples]. Defaults to None.

Returns

Transformed output.

Return type

pd.DataFrame
