multiclass_classification_pipeline

Module Contents

Classes Summary

MulticlassClassificationPipeline

Pipeline subclass for all multiclass classification pipelines.

Contents

class evalml.pipelines.multiclass_classification_pipeline.MulticlassClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)

Pipeline subclass for all multiclass classification pipelines.

Parameters
  • component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. When the same component appears more than once in a list, the name of each duplicate is suffixed with that component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”].

  • parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None.

  • custom_name (str) – Custom name for the pipeline. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
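
The duplicate-naming rule for component_graph can be illustrated with a short standalone sketch. This is not EvalML's implementation: name_components is a hypothetical helper written only to reproduce the example above, and it assumes the suffix is the zero-based position of the duplicate in the list.

```python
def name_components(component_graph):
    """Return one unique name per component.

    Hypothetical helper: the first occurrence keeps its name, and each
    later duplicate is suffixed with its zero-based index in the list.
    """
    seen = set()
    names = []
    for index, component in enumerate(component_graph):
        if component in seen:
            names.append(f"{component}_{index}")
        else:
            names.append(component)
            seen.add(component)
    return names


graph = ["Imputer", "One Hot Encoder", "Imputer", "Logistic Regression Classifier"]
names = name_components(graph)
print(names)
# ['Imputer', 'One Hot Encoder', 'Imputer_2', 'Logistic Regression Classifier']
```

These derived names are what the keys of the parameters dictionary refer to, so parameters for the second imputer in this graph would be keyed by "Imputer_2".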

Attributes

problem_type

ProblemTypes.MULTICLASS

Methods

can_tune_threshold_with_objective

Determine whether the threshold of a binary classification pipeline can be tuned.

classes_

Gets the class names for the problem.

clone

Constructs a new pipeline with the same components, parameters, and random state.

compute_estimator_features

Transforms the data by applying all pre-processing components.

create_objectives

custom_name

Custom name of the pipeline.

describe

Outputs pipeline details including component parameters

feature_importance

Importance associated with each feature. Features dropped by the feature selection are excluded.

fit

Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then mapped to values between 0 and n_classes-1.

get_component

Returns component by name

graph

Generate an image representing the pipeline graph

graph_feature_importance

Generate a bar graph of the pipeline’s feature importance

inverse_transform

Apply component inverse_transform methods to estimator predictions in reverse order.

linearized_component_graph

A component graph in list form. Note that this is not guaranteed to be in proper component computation order

load

Loads pipeline at file path

model_family

Returns model family of this pipeline template

name

Name of the pipeline.

new

Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.

parameters

Parameter dictionary for this pipeline

predict

Make predictions using selected features.

predict_proba

Make probability estimates for labels.

save

Saves pipeline at file path

score

Evaluate model performance on objectives

summary

A short summary of the pipeline structure, describing the list of components used.

can_tune_threshold_with_objective(self, objective)

Determine whether the threshold of a binary classification pipeline can be tuned.

Parameters

objective – Primary AutoMLSearch objective.

property classes_(self)

Gets the class names for the problem.

clone(self)

Constructs a new pipeline with the same components, parameters, and random state.

Returns

A new instance of this pipeline with identical components, parameters, and random state.

compute_estimator_features(self, X, y=None)

Transforms the data by applying all pre-processing components.

Parameters

X (pd.DataFrame) – Input data to the pipeline to transform.

Returns

New transformed features.

Return type

pd.DataFrame

static create_objectives(objectives)

property custom_name(self)

Custom name of the pipeline.

describe(self, return_dict=False)

Outputs pipeline details including component parameters

Parameters

return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.

Returns

Dictionary of all component parameters if return_dict is True, else None

Return type

dict

property feature_importance(self)

Importance associated with each feature. Features dropped by the feature selection are excluded.

Returns

pd.DataFrame including feature names and their corresponding importance

fit(self, X, y)

Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then mapped to values between 0 and n_classes-1.

Parameters
  • X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]

  • y (pd.Series or np.ndarray) – The target training labels of length [n_samples]

Returns

self
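
The target encoding that fit describes can be sketched in plain Python. This is an illustration of the sorted(set(y)) rule only, not the library's code; encode_targets is a hypothetical helper and the sample labels are made up.

```python
def encode_targets(y):
    """Map each label to an integer in 0..n_classes-1, ordered by sorted(set(y))."""
    classes = sorted(set(y))
    mapping = {label: code for code, label in enumerate(classes)}
    return classes, [mapping[label] for label in y]


classes, encoded = encode_targets(["dog", "cat", "bird", "cat"])
print(classes)  # ['bird', 'cat', 'dog']
print(encoded)  # [2, 1, 0, 1]
```

Because the order comes from sorting the distinct labels, the integer assigned to a class does not depend on the order in which labels appear in y.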

get_component(self, name)

Returns component by name

Parameters

name (str) – Name of component

Returns

Component to return

Return type

Component

graph(self, filepath=None)

Generate an image representing the pipeline graph

Parameters

filepath (str, optional) – Path where the graph should be saved. If None (the default), the graph is not saved.

Returns

Graph object that can be directly displayed in Jupyter notebooks.

Return type

graphviz.Digraph

graph_feature_importance(self, importance_threshold=0)

Generate a bar graph of the pipeline’s feature importance

Parameters

importance_threshold (float, optional) – If provided, graph only the features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to 0.

Returns

plotly.Figure, a bar graph showing features and their corresponding importance
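
The effect of importance_threshold can be sketched without plotting. The helper below is hypothetical and the importance values are made up; it reads "larger than" literally as a strict comparison on the absolute value, which is an assumption about how the library treats values exactly equal to the threshold.

```python
def filter_by_importance(importances, importance_threshold=0):
    """Keep features whose absolute importance exceeds the threshold."""
    return {
        feature: value
        for feature, value in importances.items()
        if abs(value) > importance_threshold
    }


# Made-up importances for illustration only.
importances = {"age": 0.42, "income": -0.30, "zip_code": 0.01}
shown = filter_by_importance(importances, importance_threshold=0.05)
print(shown)  # {'age': 0.42, 'income': -0.3}
```

Negative importances are kept when their magnitude is large enough, since the comparison uses the absolute value.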

inverse_transform(self, y)

Apply component inverse_transform methods to estimator predictions in reverse order.

Components that implement inverse_transform are PolynomialDetrender, LabelEncoder (tbd).

Parameters

y (pd.Series) – Final component features

property linearized_component_graph(self)

A component graph in list form. Note that this is not guaranteed to be in proper component computation order

static load(file_path)

Loads pipeline at file path

Parameters

file_path (str) – Location to load the file from.

Returns

PipelineBase object

property model_family(self)

Returns model family of this pipeline template

property name(self)

Name of the pipeline.

new(self, parameters, random_seed=0)

Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.

Not to be confused with python’s __new__ method.

Parameters
  • parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Returns

A new instance of this pipeline with identical components.

property parameters(self)

Parameter dictionary for this pipeline

Returns

Dictionary of all component parameters

Return type

dict

predict(self, X, objective=None)

Make predictions using selected features.

Parameters
  • X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]

  • objective (Object or string) – The objective to use to make predictions.

Returns

Estimated labels

Return type

pd.Series

predict_proba(self, X)

Make probability estimates for labels.

Parameters

X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]

Returns

Probability estimates

Return type

pd.DataFrame

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves pipeline at file path

Parameters
  • file_path (str) – Location to save the file to.

  • pickle_protocol (int) – The pickle data stream format.

Returns

None
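
The save and load round trip can be sketched with the standard library. This is not the library's implementation: it uses the standard pickle module as a stand-in for cloudpickle, the save and load functions below are hypothetical free functions rather than pipeline methods, and a plain dictionary stands in for a pipeline object.

```python
import os
import pickle
import tempfile


def save(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize obj to file_path using the given pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load(file_path):
    """Deserialize and return the object stored at file_path."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for a pipeline in this sketch.
state = {"name": "demo pipeline", "parameters": {"Imputer": {}}, "random_seed": 0}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.pkl")
    save(state, path)
    restored = load(path)

print(restored == state)  # True
```

As with any pickle-based format, files should only be loaded from trusted sources, because unpickling can execute arbitrary code.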

score(self, X, y, objectives)

Evaluate model performance on objectives

Parameters
  • X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]

  • y (pd.Series or np.ndarray) – True labels of length [n_samples]

  • objectives (list) – List of objectives to score

Returns

Ordered dictionary of objective scores

Return type

dict
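
The shape of the returned value can be sketched as follows. This is a simplified illustration under stated assumptions: the real method takes X and produces predictions itself, whereas this hypothetical score_predictions helper takes predictions directly, and objectives are represented here as made-up (name, function) pairs rather than library objective objects.

```python
from collections import OrderedDict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of predictions that do not match the true labels."""
    return 1 - accuracy(y_true, y_pred)


def score_predictions(y_true, y_pred, objectives):
    """Return an ordered mapping of objective name to score, in list order."""
    return OrderedDict((name, fn(y_true, y_pred)) for name, fn in objectives)


y_true = ["a", "b", "c", "a"]
y_pred = ["a", "b", "a", "a"]
scores = score_predictions(
    y_true, y_pred, [("accuracy", accuracy), ("error_rate", error_rate)]
)
print(dict(scores))  # {'accuracy': 0.75, 'error_rate': 0.25}
```

The result preserves the order in which the objectives were listed, which matches the "ordered dictionary" described above.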

property summary(self)

A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
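
The format of the example summary string can be reproduced with a small sketch. The summarize helper is hypothetical and assumes the estimator is the last component in the list, with every earlier component treated as pre-processing; the library may build the string differently.

```python
def summarize(component_names):
    """Build '<estimator> w/ <component> + <component>' from an ordered list."""
    *preprocessing, estimator = component_names
    if not preprocessing:
        return estimator
    return f"{estimator} w/ {' + '.join(preprocessing)}"


summary = summarize(
    ["Simple Imputer", "One Hot Encoder", "Logistic Regression Classifier"]
)
print(summary)
# Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
```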