binary_classification_pipeline
Module Contents
Classes Summary
BinaryClassificationPipeline – Pipeline subclass for all binary classification pipelines.
Contents
-
class evalml.pipelines.binary_classification_pipeline.BinaryClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)
Pipeline subclass for all binary classification pipelines.
- Parameters
component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. When the same component appears more than once in a list, each duplicate's name is suffixed with that component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”].
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None.
custom_name (str) – Custom name for the pipeline. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
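The duplicate-naming rule described for component_graph can be sketched in plain Python. This only illustrates the rule as worded above and is not evalml's implementation; the helper name name_components is hypothetical, and the parameter values are placeholders.

```python
def name_components(component_names):
    """Return unique names, suffixing repeats with their list index.

    Hypothetical helper that mirrors the rule stated in the docs;
    it is not part of evalml.
    """
    seen = set()
    unique_names = []
    for index, name in enumerate(component_names):
        # The first occurrence keeps its name; later ones get "_<index>".
        unique_names.append(f"{name}_{index}" if name in seen else name)
        seen.add(name)
    return unique_names


graph = ["Imputer", "One Hot Encoder", "Imputer", "Logistic Regression Classifier"]
names = name_components(graph)

# The parameters dictionary is keyed by the resulting component names.
# The inner keys and values are placeholders chosen for illustration.
parameters = {
    "Imputer": {"numeric_impute_strategy": "mean"},
    "Imputer_2": {"numeric_impute_strategy": "median"},
}
```

Keying parameters by the de-duplicated names is what lets two instances of the same component receive different settings.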
Attributes
problem_type – ProblemTypes.BINARY
Methods
can_tune_threshold_with_objective – Determine whether the threshold of a binary classification pipeline can be tuned.
classes_ – Gets the class names for the problem.
clone – Constructs a new pipeline with the same components, parameters, and random state.
compute_estimator_features – Transforms the data by applying all pre-processing components.
custom_name – Custom name of the pipeline.
describe – Outputs pipeline details including component parameters.
feature_importance – Importance associated with each feature. Features dropped by the feature selection are excluded.
fit – Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then mapped to values between 0 and n_classes-1.
get_component – Returns component by name.
get_hyperparameter_ranges – Returns hyperparameter ranges from all components as a dictionary.
graph – Generate an image representing the pipeline graph.
graph_feature_importance – Generate a bar graph of the pipeline’s feature importance.
inverse_transform – Apply component inverse_transform methods to estimator predictions in reverse order.
load – Loads pipeline at file path.
model_family – Returns model family of this pipeline.
name – Name of the pipeline.
new – Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
optimize_threshold – Optimize the pipeline threshold given the objective to use. Only used for binary problems with objectives whose thresholds can be tuned.
parameters – Parameter dictionary for this pipeline.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels. Assumes that the column at index 1 represents the positive label case.
save – Saves pipeline at file path.
score – Evaluate model performance on objectives.
summary – A short summary of the pipeline structure, describing the list of components used.
threshold – Threshold used to make a prediction. Defaults to None.
transform – Transform the input.
-
can_tune_threshold_with_objective(self, objective)
Determine whether the threshold of a binary classification pipeline can be tuned.
- Parameters
objective – Primary AutoMLSearch objective.
-
property classes_
Gets the class names for the problem.
-
clone(self)
Constructs a new pipeline with the same components, parameters, and random state.
- Returns
A new instance of this pipeline with identical components, parameters, and random state.
-
compute_estimator_features(self, X, y=None)
Transforms the data by applying all pre-processing components.
- Parameters
X (pd.DataFrame) – Input data to the pipeline to transform.
- Returns
New transformed features.
- Return type
pd.DataFrame
-
static create_objectives(objectives)
-
property custom_name
Custom name of the pipeline.
-
describe(self, return_dict=False)
Outputs pipeline details including component parameters.
- Parameters
return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.
- Returns
Dictionary of all component parameters if return_dict is True, else None
- Return type
dict
-
property feature_importance
Importance associated with each feature. Features dropped by the feature selection are excluded.
- Returns
pd.DataFrame including feature names and their corresponding importance
-
fit(self, X, y)
Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then mapped to values between 0 and n_classes-1.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series or np.ndarray) – The target training labels of length [n_samples]
- Returns
self
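The target encoding that fit describes, sorting classes with sorted(set(y)) and mapping them to 0 through n_classes-1, can be sketched as follows. The function name encode_targets is hypothetical; this shows the stated rule on a plain list and is not the library's code.

```python
def encode_targets(y):
    """Map class labels to integers using the sorted(set(y)) ordering.

    Hypothetical helper illustrating the rule in the fit docstring.
    """
    classes = sorted(set(y))
    # Position in the sorted class list becomes the encoded value.
    mapping = {label: index for index, label in enumerate(classes)}
    return classes, [mapping[label] for label in y]


y = ["spam", "ham", "spam", "ham", "ham"]
classes, encoded = encode_targets(y)
```

For a binary problem this ordering also fixes which label sits at index 1, the position that predict_proba treats as the positive case.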
-
get_component(self, name)
Returns component by name.
- Parameters
name (str) – Name of component
- Returns
Component to return
- Return type
Component
-
get_hyperparameter_ranges(self, custom_hyperparameters)
Returns hyperparameter ranges from all components as a dictionary.
- Parameters
custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.
- Returns
Dictionary of hyperparameter ranges for each component in the pipeline.
- Return type
dict
-
graph(self, filepath=None)
Generate an image representing the pipeline graph.
- Parameters
filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
- Returns
Graph object that can be directly displayed in Jupyter notebooks.
- Return type
graphviz.Digraph
-
graph_feature_importance(self, importance_threshold=0)
Generate a bar graph of the pipeline’s feature importance.
- Parameters
importance_threshold (float, optional) – If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero.
- Returns
plotly.Figure, a bar graph showing features and their corresponding importance
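The effect of importance_threshold can be sketched as a filter on absolute importance. This follows the wording above ("larger than", so a strict comparison); the library's behavior exactly at the boundary may differ. The function name select_features and the sample values are made up for illustration.

```python
def select_features(importances, importance_threshold=0):
    """Keep features whose absolute importance exceeds the threshold.

    Hypothetical helper; returns (name, importance) pairs ordered from
    most to least important by absolute value.
    """
    kept = [
        (name, value)
        for name, value in importances.items()
        if abs(value) > importance_threshold
    ]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Made-up importances, including a negative one.
importances = {"zip_code": 0.01, "income": -0.30, "age": 0.42}
shown = select_features(importances, importance_threshold=0.05)
```

Filtering on the absolute value keeps strongly negative importances in the chart instead of discarding them.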
-
inverse_transform(self, y)
Apply component inverse_transform methods to estimator predictions in reverse order.
Components that implement inverse_transform are PolynomialDetrender, LabelEncoder (tbd).
- Parameters
y (pd.Series) – Final component features
-
static load(file_path)
Loads pipeline at file path.
- Parameters
file_path (str) – location to load file
- Returns
PipelineBase object
-
property model_family
Returns model family of this pipeline.
-
property name
Name of the pipeline.
-
new(self, parameters, random_seed=0)
Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python’s __new__ method.
- Parameters
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.
random_seed (int) – Seed for the random number generator. Defaults to 0.
- Returns
A new instance of this pipeline with identical components.
-
optimize_threshold(self, X, y, y_pred_proba, objective)
Optimize the pipeline threshold given the objective to use. Only used for binary problems with objectives whose thresholds can be tuned.
- Parameters
X (pd.DataFrame) – Input features
y (pd.Series) – Input target values
y_pred_proba (pd.Series) – The predicted probabilities of the target output by the pipeline
objective (ObjectiveBase) – The objective to threshold with. Must have a tunable threshold.
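The idea behind threshold optimization can be sketched as a search over candidate thresholds, keeping the one that gives the best objective score. The sketch below uses a simple grid search and F1 as a stand-in objective; both are assumptions made for illustration, and the library's actual optimizer and objectives may work differently. The names f1 and best_threshold are hypothetical.

```python
def f1(y_true, y_pred):
    """F1 score for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, y_pred_proba, objective_fn, steps=101):
    """Grid-search thresholds in [0, 1] and keep the highest-scoring one.

    Assumes a higher objective value is better; ties keep the lowest
    threshold. Hypothetical helper, not an evalml function.
    """
    best_t, best_score = None, float("-inf")
    for i in range(steps):
        threshold = i / (steps - 1)
        y_pred = [1 if p > threshold else 0 for p in y_pred_proba]
        score = objective_fn(y_true, y_pred)
        if score > best_score:
            best_t, best_score = threshold, score
    return best_t, best_score


y_true = [0, 0, 1, 1]
y_pred_proba = [0.1, 0.4, 0.35, 0.8]  # positive-class probabilities
threshold, score = best_threshold(y_true, y_pred_proba, f1)
```

An objective where lower is better would need its score negated before being passed to a maximizing search like this one.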
-
property parameters
Parameter dictionary for this pipeline.
- Returns
Dictionary of all component parameters.
- Return type
dict
-
predict(self, X, objective=None)
Make predictions using selected features.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]
objective (Object or string) – The objective to use to make predictions. Defaults to None.
- Returns
Estimated labels
- Return type
pd.Series
-
predict_proba(self, X)
Make probability estimates for labels. Assumes that the column at index 1 represents the positive label case.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Probability estimates
- Return type
pd.Series
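How a threshold turns positive-class probabilities into labels can be sketched with plain lists. The sketch assumes each row holds one probability per class with the positive class at index 1, as stated for predict_proba, and that the comparison is strict. The 0.5 default is only a stand-in: when the pipeline's threshold is None, the library decides how labels are produced. The function name labels_from_proba is hypothetical.

```python
def labels_from_proba(proba_rows, classes, threshold=0.5):
    """Label a row positive when its index-1 probability exceeds threshold.

    classes is [negative_label, positive_label]. Hypothetical helper,
    not part of evalml.
    """
    labels = []
    for row in proba_rows:
        positive_proba = row[1]  # column at index 1 is the positive case
        labels.append(classes[1] if positive_proba > threshold else classes[0])
    return labels


proba_rows = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
classes = ["ham", "spam"]

default_labels = labels_from_proba(proba_rows, classes)
lenient_labels = labels_from_proba(proba_rows, classes, threshold=0.4)
```

Lowering the threshold flips the third row to the positive label, which is the trade-off a tuned threshold controls.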
-
save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)
Saves pipeline at file path.
- Parameters
file_path (str) – location to save file
pickle_protocol (int) – the pickle data stream format.
- Returns
None
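The save and load pair amounts to a pickle round trip. The sketch below uses the standard library's pickle module in place of cloudpickle, with a plain dictionary standing in for a pipeline; the helper names save_object and load_object and the dictionary contents are hypothetical.

```python
import os
import pickle
import tempfile


def save_object(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize obj to file_path using the given pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load_object(file_path):
    """Deserialize and return the object stored at file_path."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# A dictionary standing in for pipeline state (illustrative values).
state = {"name": "demo pipeline", "parameters": {"Imputer": {}}, "threshold": 0.42}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.pkl")
    save_object(state, path)
    restored = load_object(path)
```

Unpickling can execute arbitrary code, so pickled files should only be loaded from trusted sources.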
-
score(self, X, y, objectives)
Evaluate model performance on objectives.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]
y (pd.Series or np.ndarray) – True labels of length [n_samples]
objectives (list) – List of objectives to score
- Returns
Ordered dictionary of objective scores
- Return type
dict
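The shape of the score result, an ordered dictionary with one entry per objective, can be sketched like this. The metric functions and the (name, function) pairing are assumptions made for illustration; evalml objectives are objects with their own interface.

```python
from collections import OrderedDict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision(y_true, y_pred):
    """True positives divided by all predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_positive = sum(1 for p in y_pred if p == 1)
    return tp / predicted_positive if predicted_positive else 0.0


def score_predictions(y_true, y_pred, objectives):
    """Return an OrderedDict of scores, in the order objectives were given.

    objectives is a list of (name, function) pairs. Hypothetical helper.
    """
    return OrderedDict((name, fn(y_true, y_pred)) for name, fn in objectives)


y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
scores = score_predictions(
    y_true, y_pred, [("Accuracy", accuracy), ("Precision", precision)]
)
```

Preserving the input order makes it easy to line scores up with the list of objectives that was passed in.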
-
property summary
A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
-
property threshold
Threshold used to make a prediction. Defaults to None.
-
transform(self, X, y=None)
Transform the input.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples]. Defaults to None.
- Returns
Transformed output.
- Return type
pd.DataFrame