time_series_classification_pipelines#

Pipeline base class for time-series classification problems.

Module Contents#

Classes Summary#

TimeSeriesBinaryClassificationPipeline

Pipeline base class for time series binary classification problems.

TimeSeriesClassificationPipeline

Pipeline base class for time series classification problems.

TimeSeriesMulticlassClassificationPipeline

Pipeline base class for time series multiclass classification problems.

Contents#

class evalml.pipelines.time_series_classification_pipelines.TimeSeriesBinaryClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)[source]#

Pipeline base class for time series binary classification problems.

Parameters
  • component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”]

  • parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary {} implies using all default values for component parameters. Pipeline-level parameters such as time_index, gap, and max_delay must be specified with the “pipeline” key. For example: Pipeline(parameters={“pipeline”: {“time_index”: “Date”, “max_delay”: 4, “gap”: 2}}).

  • custom_name (str) – Custom name for the pipeline. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
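The duplicate-naming rule for list-style component graphs can be sketched in plain Python. This only illustrates the rule as described in the component_graph parameter and is not evalml's implementation; the helper name `name_components` is hypothetical.

```python
def name_components(components):
    """Return unique names, suffixing repeated names with their list index."""
    seen = set()
    names = []
    for index, name in enumerate(components):
        if name in seen:
            # A repeated component gets its position in the list appended.
            names.append(f"{name}_{index}")
        else:
            names.append(name)
            seen.add(name)
    return names


graph = ["Imputer", "One Hot Encoder", "Imputer", "Logistic Regression Classifier"]
print(name_components(graph))
# ['Imputer', 'One Hot Encoder', 'Imputer_2', 'Logistic Regression Classifier']
```

The suffixed name is the key to use in the parameters dictionary when configuring the second copy of a component.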

Example

>>> pipeline = TimeSeriesBinaryClassificationPipeline(component_graph=["Simple Imputer", "Logistic Regression Classifier"],
...                                                   parameters={"Logistic Regression Classifier": {"penalty": "elasticnet",
...                                                                                                  "solver": "liblinear"},
...                                                               "pipeline": {"gap": 1, "max_delay": 1, "forecast_horizon": 1, "time_index": "date"}},
...                                                   custom_name="My TimeSeriesBinary Pipeline")
...
>>> assert pipeline.custom_name == "My TimeSeriesBinary Pipeline"
>>> assert pipeline.component_graph.component_dict.keys() == {'Simple Imputer', 'Logistic Regression Classifier'}
...
>>> assert pipeline.parameters == {
...     'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None},
...     'Logistic Regression Classifier': {'penalty': 'elasticnet',
...                                         'C': 1.0,
...                                         'n_jobs': -1,
...                                         'multi_class': 'auto',
...                                         'solver': 'liblinear'},
...     'pipeline': {'gap': 1, 'max_delay': 1, 'forecast_horizon': 1, 'time_index': "date"}}
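The last assertion shows explicitly supplied values being combined with each component's defaults. A minimal sketch of that merge in plain Python follows. It is an illustration, not evalml's code: `resolve_parameters` is a hypothetical helper, and the default `penalty` and `solver` values below are placeholders assumed for the example.

```python
def resolve_parameters(defaults, overrides):
    """Overlay user-supplied parameters on top of per-component defaults."""
    resolved = {}
    for name, default_values in defaults.items():
        # User values win over defaults for the same key.
        resolved[name] = {**default_values, **overrides.get(name, {})}
    # Keys with no component defaults, such as "pipeline", pass through as given.
    for name, values in overrides.items():
        resolved.setdefault(name, dict(values))
    return resolved


# Assumed defaults; "penalty" and "solver" here are placeholders.
component_defaults = {
    "Simple Imputer": {"impute_strategy": "most_frequent", "fill_value": None},
    "Logistic Regression Classifier": {
        "penalty": "l2",
        "C": 1.0,
        "n_jobs": -1,
        "multi_class": "auto",
        "solver": "lbfgs",
    },
}

user_parameters = {
    "Logistic Regression Classifier": {"penalty": "elasticnet", "solver": "liblinear"},
    "pipeline": {"gap": 1, "max_delay": 1, "forecast_horizon": 1, "time_index": "date"},
}

resolved = resolve_parameters(component_defaults, user_parameters)
print(resolved["Logistic Regression Classifier"])
```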

Attributes

problem_type

None

Methods

can_tune_threshold_with_objective

Determine whether the threshold of a binary classification pipeline can be tuned.

classes_

Gets the class names for the pipeline. Will return None before pipeline is fit.

clone

Constructs a new pipeline with the same components, parameters, and random seed.

create_objectives

Create objective instances from a list of strings or objective classes.

custom_name

Custom name of the pipeline.

dates_needed_for_prediction

Return dates needed to forecast the given date in the future.

dates_needed_for_prediction_range

Return dates needed to forecast the given date range in the future.

describe

Outputs pipeline details including component parameters.

feature_importance

Importance associated with each feature. Features dropped by the feature selection are excluded.

fit

Fit a time series classification model.

fit_transform

Fit and transform all components in the component graph, if all components are Transformers.

get_component

Returns component by name.

get_hyperparameter_ranges

Returns hyperparameter ranges from all components as a dictionary.

graph

Generate an image representing the pipeline graph.

graph_dict

Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases.

graph_feature_importance

Generate a bar graph of the pipeline's feature importance.

inverse_transform

Apply component inverse_transform methods to estimator predictions in reverse order.

load

Loads pipeline at file path.

model_family

Returns model family of this pipeline.

name

Name of the pipeline.

new

Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python's __new__ method.

optimize_threshold

Optimize the pipeline threshold given the objective to use. Only used for binary problems with objectives whose thresholds can be tuned.

parameters

Parameter dictionary for this pipeline.

predict

Predict on future data where target is not known.

predict_in_sample

Predict on future data where the target is known, e.g. cross validation.

predict_proba

Predict on future data where the target is unknown.

predict_proba_in_sample

Predict on future data where the target is known, e.g. cross validation.

save

Saves pipeline at file path.

score

Evaluate model performance on current and additional objectives.

summary

A short summary of the pipeline structure, describing the list of components used.

threshold

Threshold used to make a prediction. Defaults to None.

transform

Transform the input.

transform_all_but_final

Transforms the data by applying all pre-processing components.

can_tune_threshold_with_objective(self, objective)#

Determine whether the threshold of a binary classification pipeline can be tuned.

Parameters

objective (ObjectiveBase) – Primary AutoMLSearch objective.

Returns

True if the pipeline threshold can be tuned.

Return type

bool

property classes_(self)#

Gets the class names for the pipeline. Will return None before pipeline is fit.

clone(self)#

Constructs a new pipeline with the same components, parameters, and random seed.

Returns

A new instance of this pipeline with identical components, parameters, and random seed.

static create_objectives(objectives)#

Create objective instances from a list of strings or objective classes.

property custom_name(self)#

Custom name of the pipeline.

dates_needed_for_prediction(self, date)#

Return dates needed to forecast the given date in the future.

Parameters

date (pd.Timestamp) – Date to forecast in the future.

Returns

Range of dates needed to forecast the given date.

Return type

dates_needed (tuple(pd.Timestamp))

dates_needed_for_prediction_range(self, start_date, end_date)#

Return dates needed to forecast the given date range in the future.

Parameters
  • start_date (pd.Timestamp) – Start date of range to forecast in the future.

  • end_date (pd.Timestamp) – End date of range to forecast in the future.

Returns

Range of dates needed to forecast the given date range.

Return type

dates_needed (tuple(pd.Timestamp))

Raises

ValueError – If start_date doesn’t come before end_date

describe(self, return_dict=False)#

Outputs pipeline details including component parameters.

Parameters

return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.

Returns

Dictionary of all component parameters if return_dict is True, else None.

Return type

dict

property feature_importance(self)#

Importance associated with each feature. Features dropped by the feature selection are excluded.

Returns

Feature names and their corresponding importance

Return type

pd.DataFrame

fit(self, X, y)#

Fit a time series classification model.

Parameters
  • X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]

  • y (pd.Series, np.ndarray) – The target training labels of length [n_samples]

Returns

self

Raises

ValueError – If the number of unique classes in y is not appropriate for the type of pipeline.

fit_transform(self, X, y)#

Fit and transform all components in the component graph, if all components are Transformers.

Parameters
  • X (pd.DataFrame) – Input features of shape [n_samples, n_features].

  • y (pd.Series) – The target data of length [n_samples].

Returns

Transformed output.

Return type

pd.DataFrame

Raises

ValueError – If final component is an Estimator.

get_component(self, name)#

Returns component by name.

Parameters

name (str) – Name of component.

Returns

Component to return

Return type

Component

get_hyperparameter_ranges(self, custom_hyperparameters)#

Returns hyperparameter ranges from all components as a dictionary.

Parameters

custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.

Returns

Dictionary of hyperparameter ranges for each component in the pipeline.

Return type

dict

graph(self, filepath=None)#

Generate an image representing the pipeline graph.

Parameters

filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.

Returns

Graph object that can be directly displayed in Jupyter notebooks.

Return type

graphviz.Digraph

Raises
  • RuntimeError – If graphviz is not installed.

  • ValueError – If path is not writeable.

graph_dict(self)#

Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases.

x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {“Nodes”: {“component_name”: {“Name”: class_name, “Parameters”: parameters_attributes}, …}, “x_edges”: [[from_component_name, to_component_name], [from_component_name, to_component_name], …], “y_edges”: [[from_component_name, to_component_name], [from_component_name, to_component_name], …]}

Returns

A dictionary representing the DAG structure.

Return type

dag_dict (dict)
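As a rough illustration of the template, the sketch below builds a dictionary of that shape by hand and round-trips it through JSON. The node parameters and edge lists are hypothetical values chosen for the example, not output captured from evalml.

```python
import json

# Hypothetical graph dictionary for a two-component pipeline.
dag_dict = {
    "Nodes": {
        "Simple Imputer": {
            "Name": "Simple Imputer",
            "Parameters": {"impute_strategy": "most_frequent", "fill_value": None},
        },
        "Logistic Regression Classifier": {
            "Name": "Logistic Regression Classifier",
            "Parameters": {"penalty": "elasticnet", "solver": "liblinear"},
        },
    },
    # Feature data flows from the imputer into the classifier.
    "x_edges": [["Simple Imputer", "Logistic Regression Classifier"]],
    "y_edges": [],
}

serialized = json.dumps(dag_dict)
restored = json.loads(serialized)

# Walk the feature edges the way a visualization tool might.
for source, target in restored["x_edges"]:
    print(f"{source} -> {target}")
```

Parameter values that are not JSON-native (for example, class objects) would need converting first, which is presumably why the dictionary is described as serializable only "in most cases".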

graph_feature_importance(self, importance_threshold=0)#

Generate a bar graph of the pipeline’s feature importance.

Parameters

importance_threshold (float, optional) – If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero.

Returns

A bar graph showing features and their corresponding importance.

Return type

plotly.Figure

Raises

ValueError – If importance threshold is not valid.
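The filtering implied by importance_threshold can be sketched without any plotting. This is a hedged illustration: `filter_by_importance` is a hypothetical helper, the importances are made up, and treating a negative threshold as the invalid case is an assumption.

```python
def filter_by_importance(importances, importance_threshold=0):
    """Keep (feature, importance) pairs whose absolute importance exceeds the threshold."""
    if importance_threshold < 0:
        # Assumption: a negative threshold is what counts as "not valid".
        raise ValueError("importance_threshold must be greater than or equal to 0")
    return [
        (feature, value)
        for feature, value in importances
        if abs(value) > importance_threshold
    ]


# Made-up importances for illustration only.
importances = [("lag_1", 0.42), ("day_of_week", -0.15), ("noise", 0.01)]
kept = filter_by_importance(importances, importance_threshold=0.1)
print(kept)
# [('lag_1', 0.42), ('day_of_week', -0.15)]
```

Because the comparison uses the absolute value, strongly negative importances are kept alongside strongly positive ones.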

inverse_transform(self, y)#

Apply component inverse_transform methods to estimator predictions in reverse order.

Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd).

Parameters

y (pd.Series) – Final component features.

Returns

The inverse transform of the target.

Return type

pd.Series

static load(file_path: Union[str, io.BytesIO])#

Loads pipeline at file path.

Parameters

file_path (str|BytesIO) – File path to load from, or a BytesIO object.

Returns

PipelineBase object

property model_family(self)#

Returns model family of this pipeline.

property name(self)#

Name of the pipeline.

new(self, parameters, random_seed=0)#

Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python’s __new__ method.

Parameters
  • parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Returns

A new instance of this pipeline with identical components.

optimize_threshold(self, X, y, y_pred_proba, objective)#

Optimize the pipeline threshold given the objective to use. Only used for binary problems with objectives whose thresholds can be tuned.

Parameters
  • X (pd.DataFrame) – Input features.

  • y (pd.Series) – Input target values.

  • y_pred_proba (pd.Series) – The predicted probabilities of the target outputted by the pipeline.

  • objective (ObjectiveBase) – The objective to threshold with. Must have a tunable threshold.

Raises

ValueError – If objective is not optimizable.
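The idea behind threshold tuning can be illustrated with a plain-Python grid search. This is a sketch under stated assumptions, not evalml's optimizer: `f1_score`, `best_threshold`, the candidate grid, and the rule that a probability strictly above the threshold maps to the positive class are all hypothetical choices made for the example.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where truthy values mark the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, y_pred_proba, objective, candidates):
    """Return the candidate threshold with the highest objective score."""
    def score_at(threshold):
        y_pred = [proba > threshold for proba in y_pred_proba]
        return objective(y_true, y_pred)

    return max(candidates, key=score_at)


# Made-up labels and positive-class probabilities.
y_true = [0, 0, 1, 1, 1]
y_pred_proba = [0.2, 0.45, 0.4, 0.6, 0.9]
candidates = [0.1, 0.3, 0.5, 0.7]

threshold = best_threshold(y_true, y_pred_proba, f1_score, candidates)
print(threshold)
# 0.3
```

On this toy data the default-looking cutoff of 0.5 is not the best choice for F1, which is the kind of situation threshold tuning is meant to address.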

property parameters(self)#

Parameter dictionary for this pipeline.

Returns

Dictionary of all component parameters.

Return type

dict

predict(self, X, objective=None, X_train=None, y_train=None)#

Predict on future data where target is not known.

Parameters
  • X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].

  • objective (ObjectiveBase or str) – The objective to use to make predictions.

  • X_train (pd.DataFrame or np.ndarray or None) – Training data.

  • y_train (pd.Series or None) – Training labels.

Raises

ValueError – If X_train and/or y_train are None or if final component is not an Estimator.

Returns

Predictions.

predict_in_sample(self, X, y, X_train, y_train, objective=None)[source]#

Predict on future data where the target is known, e.g. cross validation.

Parameters
  • X (pd.DataFrame) – Future data of shape [n_samples, n_features].

  • y (pd.Series) – Future target of shape [n_samples].

  • X_train (pd.DataFrame) – Data the pipeline was trained on of shape [n_samples_train, n_features].

  • y_train (pd.Series) – Targets used to train the pipeline of shape [n_samples_train].

  • objective (ObjectiveBase, str) – Objective used to threshold predicted probabilities, optional. Defaults to None.

Returns

Estimated labels.

Return type

pd.Series

Raises

ValueError – If objective is not defined for time-series binary classification problems.

predict_proba(self, X, X_train=None, y_train=None)#

Predict on future data where the target is unknown.

Parameters
  • X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].

  • X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].

  • y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].

Returns

Estimated probabilities.

Return type

pd.Series

Raises

ValueError – If final component is not an Estimator.

predict_proba_in_sample(self, X_holdout, y_holdout, X_train, y_train)#

Predict on future data where the target is known, e.g. cross validation.

Parameters
  • X_holdout (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].

  • y_holdout (pd.Series, np.ndarray) – Future target of shape [n_samples].

  • X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].

  • y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].

Returns

Estimated probabilities.

Return type

pd.Series

Raises

ValueError – If the final component is not an Estimator.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves pipeline at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.
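The save and load round trip, including the option to load from a BytesIO object, can be sketched with the standard library. Note the substitution: evalml's signature refers to cloudpickle, while this sketch uses the stdlib `pickle` module so that it stays self-contained. The `save` and `load` functions and the `state` dictionary are hypothetical stand-ins for a real pipeline.

```python
import io
import os
import pickle
import tempfile


def save(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize obj to file_path using the given pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load(file_path):
    """Load from a path on disk or from an in-memory BytesIO buffer."""
    if isinstance(file_path, io.BytesIO):
        return pickle.load(file_path)
    with open(file_path, "rb") as f:
        return pickle.load(f)


# A plain dictionary standing in for a fitted pipeline.
state = {
    "name": "My TimeSeriesBinary Pipeline",
    "parameters": {"pipeline": {"gap": 1, "max_delay": 1, "forecast_horizon": 1, "time_index": "date"}},
}

# Round trip through a file on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.pkl")
    save(state, path)
    from_path = load(path)

# Round trip through an in-memory buffer.
buffer = io.BytesIO(pickle.dumps(state))
from_buffer = load(buffer)
```

As with any pickle-based format, files should only be loaded from trusted sources, since unpickling can execute arbitrary code.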

score(self, X, y, objectives, X_train=None, y_train=None)#

Evaluate model performance on current and additional objectives.

Parameters
  • X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].

  • y (pd.Series) – True labels of length [n_samples].

  • objectives (list) – Non-empty list of objectives to score on.

  • X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].

  • y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].

Returns

Ordered dictionary of objective scores.

Return type

dict

property summary(self)#

A short summary of the pipeline structure, describing the list of components used.

Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder

Returns

A string describing the pipeline structure.

property threshold(self)#

Threshold used to make a prediction. Defaults to None.

transform(self, X, y=None)#

Transform the input.

Parameters
  • X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].

  • y (pd.Series) – The target data of length [n_samples]. Defaults to None.

Returns

Transformed output.

Return type

pd.DataFrame

transform_all_but_final(self, X, y=None, X_train=None, y_train=None, calculating_residuals=False)#

Transforms the data by applying all pre-processing components.

Parameters
  • X (pd.DataFrame) – Input data to the pipeline to transform.

  • y (pd.Series) – Targets corresponding to the pipeline targets.

  • X_train (pd.DataFrame) – Training data used to generate features from past observations.

  • y_train (pd.Series) – Training targets used to generate features from past observations.

  • calculating_residuals (bool) – Whether we’re calling predict_in_sample to calculate the residuals. This means the X and y arguments are not future data, but actually the train data.

Returns

New transformed features.

Return type

pd.DataFrame

class evalml.pipelines.time_series_classification_pipelines.TimeSeriesClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)[source]#

Pipeline base class for time series classification problems.

Parameters
  • component_graph (ComponentGraph, list, dict) – ComponentGraph instance, list of components in order, or dictionary of components. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”]

  • parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary {} implies using all default values for component parameters. Pipeline-level parameters such as time_index, gap, and max_delay must be specified with the “pipeline” key. For example: Pipeline(parameters={“pipeline”: {“time_index”: “Date”, “max_delay”: 4, “gap”: 2}}).

  • custom_name (str) – Custom name for the pipeline. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

problem_type

None

Methods

can_tune_threshold_with_objective

Determine whether the threshold of a binary classification pipeline can be tuned.

classes_

Gets the class names for the pipeline. Will return None before pipeline is fit.

clone

Constructs a new pipeline with the same components, parameters, and random seed.

create_objectives

Create objective instances from a list of strings or objective classes.

custom_name

Custom name of the pipeline.

dates_needed_for_prediction

Return dates needed to forecast the given date in the future.

dates_needed_for_prediction_range

Return dates needed to forecast the given date range in the future.

describe

Outputs pipeline details including component parameters.

feature_importance

Importance associated with each feature. Features dropped by the feature selection are excluded.

fit

Fit a time series classification model.

fit_transform

Fit and transform all components in the component graph, if all components are Transformers.

get_component

Returns component by name.

get_hyperparameter_ranges

Returns hyperparameter ranges from all components as a dictionary.

graph

Generate an image representing the pipeline graph.

graph_dict

Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases.

graph_feature_importance

Generate a bar graph of the pipeline's feature importance.

inverse_transform

Apply component inverse_transform methods to estimator predictions in reverse order.

load

Loads pipeline at file path.

model_family

Returns model family of this pipeline.

name

Name of the pipeline.

new

Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python's __new__ method.

parameters

Parameter dictionary for this pipeline.

predict

Predict on future data where target is not known.

predict_in_sample

Predict on future data where the target is known, e.g. cross validation.

predict_proba

Predict on future data where the target is unknown.

predict_proba_in_sample

Predict on future data where the target is known, e.g. cross validation.

save

Saves pipeline at file path.

score

Evaluate model performance on current and additional objectives.

summary

A short summary of the pipeline structure, describing the list of components used.

transform

Transform the input.

transform_all_but_final

Transforms the data by applying all pre-processing components.

can_tune_threshold_with_objective(self, objective)#

Determine whether the threshold of a binary classification pipeline can be tuned.

Parameters

objective (ObjectiveBase) – Primary AutoMLSearch objective.

Returns

True if the pipeline threshold can be tuned.

Return type

bool

property classes_(self)#

Gets the class names for the pipeline. Will return None before pipeline is fit.

clone(self)#

Constructs a new pipeline with the same components, parameters, and random seed.

Returns

A new instance of this pipeline with identical components, parameters, and random seed.

static create_objectives(objectives)#

Create objective instances from a list of strings or objective classes.

property custom_name(self)#

Custom name of the pipeline.

dates_needed_for_prediction(self, date)#

Return dates needed to forecast the given date in the future.

Parameters

date (pd.Timestamp) – Date to forecast in the future.

Returns

Range of dates needed to forecast the given date.

Return type

dates_needed (tuple(pd.Timestamp))

dates_needed_for_prediction_range(self, start_date, end_date)#

Return dates needed to forecast the given date range in the future.

Parameters
  • start_date (pd.Timestamp) – Start date of range to forecast in the future.

  • end_date (pd.Timestamp) – End date of range to forecast in the future.

Returns

Range of dates needed to forecast the given date range.

Return type

dates_needed (tuple(pd.Timestamp))

Raises

ValueError – If start_date doesn’t come before end_date

describe(self, return_dict=False)#

Outputs pipeline details including component parameters.

Parameters

return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.

Returns

Dictionary of all component parameters if return_dict is True, else None.

Return type

dict

property feature_importance(self)#

Importance associated with each feature. Features dropped by the feature selection are excluded.

Returns

Feature names and their corresponding importance

Return type

pd.DataFrame

fit(self, X, y)[source]#

Fit a time series classification model.

Parameters
  • X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]

  • y (pd.Series, np.ndarray) – The target training labels of length [n_samples]

Returns

self

Raises

ValueError – If the number of unique classes in y is not appropriate for the type of pipeline.

fit_transform(self, X, y)#

Fit and transform all components in the component graph, if all components are Transformers.

Parameters
  • X (pd.DataFrame) – Input features of shape [n_samples, n_features].

  • y (pd.Series) – The target data of length [n_samples].

Returns

Transformed output.

Return type

pd.DataFrame

Raises

ValueError – If final component is an Estimator.

get_component(self, name)#

Returns component by name.

Parameters

name (str) – Name of component.

Returns

Component to return

Return type

Component

get_hyperparameter_ranges(self, custom_hyperparameters)#

Returns hyperparameter ranges from all components as a dictionary.

Parameters

custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.

Returns

Dictionary of hyperparameter ranges for each component in the pipeline.

Return type

dict

graph(self, filepath=None)#

Generate an image representing the pipeline graph.

Parameters

filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.

Returns

Graph object that can be directly displayed in Jupyter notebooks.

Return type

graphviz.Digraph

Raises
  • RuntimeError – If graphviz is not installed.

  • ValueError – If path is not writeable.

graph_dict(self)#

Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases.

x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {“Nodes”: {“component_name”: {“Name”: class_name, “Parameters”: parameters_attributes}, …}, “x_edges”: [[from_component_name, to_component_name], [from_component_name, to_component_name], …], “y_edges”: [[from_component_name, to_component_name], [from_component_name, to_component_name], …]}

Returns

A dictionary representing the DAG structure.

Return type

dag_dict (dict)

graph_feature_importance(self, importance_threshold=0)#

Generate a bar graph of the pipeline’s feature importance.

Parameters

importance_threshold (float, optional) – If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero.

Returns

A bar graph showing features and their corresponding importance.

Return type

plotly.Figure

Raises

ValueError – If importance threshold is not valid.

inverse_transform(self, y)#

Apply component inverse_transform methods to estimator predictions in reverse order.

Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd).

Parameters

y (pd.Series) – Final component features.

Returns

The inverse transform of the target.

Return type

pd.Series

static load(file_path: Union[str, io.BytesIO])#

Loads pipeline at file path.

Parameters

file_path (str|BytesIO) – File path to load from, or a BytesIO object.

Returns

PipelineBase object

property model_family(self)#

Returns model family of this pipeline.

property name(self)#

Name of the pipeline.

new(self, parameters, random_seed=0)#

Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python’s __new__ method.

Parameters
  • parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Returns

A new instance of this pipeline with identical components.

property parameters(self)#

Parameter dictionary for this pipeline.

Returns

Dictionary of all component parameters.

Return type

dict

predict(self, X, objective=None, X_train=None, y_train=None)#

Predict on future data where target is not known.

Parameters
  • X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].

  • objective (ObjectiveBase or str) – The objective to use to make predictions.

  • X_train (pd.DataFrame or np.ndarray or None) – Training data.

  • y_train (pd.Series or None) – Training labels.

Raises

ValueError – If X_train and/or y_train are None or if final component is not an Estimator.

Returns

Predictions.

predict_in_sample(self, X, y, X_train, y_train, objective=None)[source]#

Predict on future data where the target is known, e.g. cross validation.

Note: we cast y as ints first to address boolean values that may be returned from calculating predictions which we would not be able to otherwise transform if we originally had integer targets.

Parameters
  • X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].

  • y (pd.Series, np.ndarray) – Future target of shape [n_samples].

  • X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].

  • y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].

  • objective (ObjectiveBase, str, None) – Objective used to threshold predicted probabilities, optional.

Returns

Estimated labels.

Return type

pd.Series

Raises

ValueError – If final component is not an Estimator.

predict_proba(self, X, X_train=None, y_train=None)[source]#

Predict on future data where the target is unknown.

Parameters
  • X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].

  • X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].

  • y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].

Returns

Estimated probabilities.

Return type

pd.Series

Raises

ValueError – If final component is not an Estimator.

predict_proba_in_sample(self, X_holdout, y_holdout, X_train, y_train)[source]#

Predict on future data where the target is known, e.g. cross validation.

Parameters
  • X_holdout (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].

  • y_holdout (pd.Series, np.ndarray) – Future target of shape [n_samples].

  • X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].

  • y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].

Returns

Estimated probabilities.

Return type

pd.Series

Raises

ValueError – If the final component is not an Estimator.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves pipeline at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

score(self, X, y, objectives, X_train=None, y_train=None)[source]#

Evaluate model performance on current and additional objectives.

Parameters
  • X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].

  • y (pd.Series) – True labels of length [n_samples].

  • objectives (list) – Non-empty list of objectives to score on.

  • X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].

  • y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].

Returns

Ordered dictionary of objective scores.

Return type

dict

property summary(self)#

A short summary of the pipeline structure, describing the list of components used.

Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder

Returns

A string describing the pipeline structure.

transform(self, X, y=None)#

Transform the input.

Parameters
  • X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].

  • y (pd.Series) – The target data of length [n_samples]. Defaults to None.

Returns

Transformed output.

Return type

pd.DataFrame

transform_all_but_final(self, X, y=None, X_train=None, y_train=None, calculating_residuals=False)#

Transforms the data by applying all pre-processing components.

Parameters
  • X (pd.DataFrame) – Input data to the pipeline to transform.

  • y (pd.Series) – Targets corresponding to the pipeline targets.

  • X_train (pd.DataFrame) – Training data used to generate features from past observations.

  • y_train (pd.Series) – Training targets used to generate features from past observations.

  • calculating_residuals (bool) – Whether we’re calling predict_in_sample to calculate the residuals. This means the X and y arguments are not future data, but actually the train data.

Returns

New transformed features.

Return type

pd.DataFrame

class evalml.pipelines.time_series_classification_pipelines.TimeSeriesMulticlassClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)[source]#

Pipeline base class for time series multiclass classification problems.

Parameters
  • component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”]

  • parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary {} implies using all default values for component parameters. Pipeline-level parameters such as time_index, gap, and max_delay must be specified with the “pipeline” key. For example: Pipeline(parameters={“pipeline”: {“time_index”: “Date”, “max_delay”: 4, “gap”: 2}}).

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Example

>>> pipeline = TimeSeriesMulticlassClassificationPipeline(component_graph=["Simple Imputer", "Logistic Regression Classifier"],
...                                                       parameters={"Logistic Regression Classifier": {"penalty": "elasticnet",
...                                                                                                      "solver": "liblinear"},
...                                                                   "pipeline": {"gap": 1, "max_delay": 1, "forecast_horizon": 1, "time_index": "date"}},
...                                                       custom_name="My TimeSeriesMulticlass Pipeline")
>>> assert pipeline.custom_name == "My TimeSeriesMulticlass Pipeline"
>>> assert pipeline.component_graph.component_dict.keys() == {'Simple Imputer', 'Logistic Regression Classifier'}
>>> assert pipeline.parameters == {
...  'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None},
...  'Logistic Regression Classifier': {'penalty': 'elasticnet',
...                                     'C': 1.0,
...                                     'n_jobs': -1,
...                                     'multi_class': 'auto',
...                                     'solver': 'liblinear'},
...     'pipeline': {'gap': 1, 'max_delay': 1, 'forecast_horizon': 1, 'time_index': "date"}}
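
The duplicate-naming rule described for component_graph can be sketched in plain Python. This is only an illustration of the documented behavior (a repeated name gets its 0-based list index appended), not evalml's implementation; the helper name_components is hypothetical.

```python
def name_components(component_names):
    """Return unique names, suffixing repeated names with their list index."""
    seen = set()
    unique_names = []
    for index, name in enumerate(component_names):
        if name in seen:
            # A repeated name gets its 0-based position in the list appended.
            unique_names.append(f"{name}_{index}")
        else:
            unique_names.append(name)
        seen.add(name)
    return unique_names


names = name_components(
    ["Imputer", "One Hot Encoder", "Imputer", "Logistic Regression Classifier"]
)
print(names)
```

The second Imputer sits at index 2, so its name becomes Imputer_2, and that is the key to use for it in the parameters dictionary.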

Attributes

problem_type

ProblemTypes.TIME_SERIES_MULTICLASS

Methods

can_tune_threshold_with_objective

Determine whether the threshold of a binary classification pipeline can be tuned.

classes_

Gets the class names for the pipeline. Will return None before pipeline is fit.

clone

Constructs a new pipeline with the same components, parameters, and random seed.

create_objectives

Create objective instances from a list of strings or objective classes.

custom_name

Custom name of the pipeline.

dates_needed_for_prediction

Return dates needed to forecast the given date in the future.

dates_needed_for_prediction_range

Return dates needed to forecast the given date range in the future.

describe

Outputs pipeline details including component parameters.

feature_importance

Importance associated with each feature. Features dropped by the feature selection are excluded.

fit

Fit a time series classification model.

fit_transform

Fit and transform all components in the component graph, if all components are Transformers.

get_component

Returns component by name.

get_hyperparameter_ranges

Returns hyperparameter ranges from all components as a dictionary.

graph

Generate an image representing the pipeline graph.

graph_dict

Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases.

graph_feature_importance

Generate a bar graph of the pipeline's feature importance.

inverse_transform

Apply component inverse_transform methods to estimator predictions in reverse order.

load

Loads pipeline at file path.

model_family

Returns model family of this pipeline.

name

Name of the pipeline.

new

Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python's __new__ method.

parameters

Parameter dictionary for this pipeline.

predict

Predict on future data where the target is not known.

predict_in_sample

Predict on future data where the target is known, e.g. cross validation.

predict_proba

Predict on future data where the target is unknown.

predict_proba_in_sample

Predict on future data where the target is known, e.g. cross validation.

save

Saves pipeline at file path.

score

Evaluate model performance on current and additional objectives.

summary

A short summary of the pipeline structure, describing the list of components used.

transform

Transform the input.

transform_all_but_final

Transforms the data by applying all pre-processing components.

can_tune_threshold_with_objective(self, objective)#

Determine whether the threshold of a binary classification pipeline can be tuned.

Parameters

objective (ObjectiveBase) – Primary AutoMLSearch objective.

Returns

True if the pipeline threshold can be tuned.

Return type

bool

property classes_(self)#

Gets the class names for the pipeline. Will return None before pipeline is fit.

clone(self)#

Constructs a new pipeline with the same components, parameters, and random seed.

Returns

A new instance of this pipeline with identical components, parameters, and random seed.

static create_objectives(objectives)#

Create objective instances from a list of strings or objective classes.

property custom_name(self)#

Custom name of the pipeline.

dates_needed_for_prediction(self, date)#

Return dates needed to forecast the given date in the future.

Parameters

date (pd.Timestamp) – Date to forecast in the future.

Returns

Range of dates needed to forecast the given date.

Return type

dates_needed (tuple(pd.Timestamp))

dates_needed_for_prediction_range(self, start_date, end_date)#

Return dates needed to forecast the given date range in the future.

Parameters
  • start_date (pd.Timestamp) – Start date of range to forecast in the future.

  • end_date (pd.Timestamp) – End date of range to forecast in the future.

Returns

Range of dates needed to forecast the given date range.

Return type

dates_needed (tuple(pd.Timestamp))

Raises

ValueError – If start_date doesn’t come before end_date.

describe(self, return_dict=False)#

Outputs pipeline details including component parameters.

Parameters

return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.

Returns

Dictionary of all component parameters if return_dict is True, else None.

Return type

dict

property feature_importance(self)#

Importance associated with each feature. Features dropped by the feature selection are excluded.

Returns

Feature names and their corresponding importance

Return type

pd.DataFrame

fit(self, X, y)#

Fit a time series classification model.

Parameters
  • X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, np.ndarray) – The target training labels of length [n_samples].

Returns

self

Raises

ValueError – If the number of unique classes in y is not appropriate for the type of pipeline.
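
The class-count check behind this ValueError can be sketched as follows. The exact limits used here (exactly 2 classes for binary, 3 or more for multiclass) and the helper check_class_count are assumptions made for illustration, not evalml's code.

```python
def check_class_count(y, problem_type):
    """Raise ValueError when y has the wrong number of unique classes."""
    n_classes = len(set(y))
    # Assumed rule: binary pipelines need exactly two classes.
    if problem_type == "binary" and n_classes != 2:
        raise ValueError(f"Binary pipelines need 2 unique classes, got {n_classes}")
    # Assumed rule: multiclass pipelines need at least three classes.
    if problem_type == "multiclass" and n_classes < 3:
        raise ValueError(f"Multiclass pipelines need 3+ unique classes, got {n_classes}")
    return n_classes


n_classes = check_class_count([0, 1, 2, 1, 0], "multiclass")
print(n_classes)
```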

fit_transform(self, X, y)#

Fit and transform all components in the component graph, if all components are Transformers.

Parameters
  • X (pd.DataFrame) – Input features of shape [n_samples, n_features].

  • y (pd.Series) – The target data of length [n_samples].

Returns

Transformed output.

Return type

pd.DataFrame

Raises

ValueError – If final component is an Estimator.

get_component(self, name)#

Returns component by name.

Parameters

name (str) – Name of component.

Returns

Component to return

Return type

Component

get_hyperparameter_ranges(self, custom_hyperparameters)#

Returns hyperparameter ranges from all components as a dictionary.

Parameters

custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.

Returns

Dictionary of hyperparameter ranges for each component in the pipeline.

Return type

dict

graph(self, filepath=None)#

Generate an image representing the pipeline graph.

Parameters

filepath (str, optional) – Path where the graph should be saved. If None (the default), the graph is not saved.

Returns

Graph object that can be directly displayed in Jupyter notebooks.

Return type

graphviz.Digraph

Raises
  • RuntimeError – If graphviz is not installed.

  • ValueError – If path is not writeable.

graph_dict(self)#

Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases.

x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {“Nodes”: {“component_name”: {“Name”: class_name, “Parameters”: parameters_attributes}, …}, “x_edges”: [[from_component_name, to_component_name], [from_component_name, to_component_name], …], “y_edges”: [[from_component_name, to_component_name], [from_component_name, to_component_name], …]}

Returns

A dictionary representing the DAG structure.

Return type

dag_dict (dict)
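
To make the template concrete, here is a hand-written dictionary in the documented shape, serialized with the standard json module and queried for a component's inputs. The component names, parameter values, and the helper inputs_of are hypothetical; real output from graph_dict may include additional nodes or edges.

```python
import json

# Hypothetical graph following the documented template; names and
# parameter values are made up for illustration.
dag_dict = {
    "Nodes": {
        "Imputer": {"Name": "Imputer", "Parameters": {"numeric_impute_strategy": "mean"}},
        "Label Encoder": {"Name": "Label Encoder", "Parameters": {}},
        "Logistic Regression Classifier": {
            "Name": "Logistic Regression Classifier",
            "Parameters": {"C": 1.0},
        },
    },
    "x_edges": [["Imputer", "Logistic Regression Classifier"]],
    "y_edges": [["Label Encoder", "Logistic Regression Classifier"]],
}


def inputs_of(dag, component_name):
    """Return the components that pass features and targets to a component."""
    return {
        "features_from": [src for src, dst in dag["x_edges"] if dst == component_name],
        "target_from": [src for src, dst in dag["y_edges"] if dst == component_name],
    }


# Round-trip through JSON, as a visualization tool would receive it.
serialized = json.dumps(dag_dict)
restored = json.loads(serialized)
estimator_inputs = inputs_of(restored, "Logistic Regression Classifier")
print(estimator_inputs)
```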

graph_feature_importance(self, importance_threshold=0)#

Generate a bar graph of the pipeline’s feature importance.

Parameters

importance_threshold (float, optional) – If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero.

Returns

A bar graph showing features and their corresponding importance.

Return type

plotly.Figure

Raises

ValueError – If importance threshold is not valid.
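
The effect of importance_threshold can be sketched without plotting. The helper select_features, the sample values, and the choice to treat a negative threshold as invalid are assumptions for illustration; only the "absolute value larger than the threshold" rule comes from the description above.

```python
def select_features(importances, importance_threshold=0):
    """Keep (name, importance) pairs whose absolute importance exceeds the threshold."""
    if importance_threshold < 0:
        # Assumed validity rule: the threshold must be non-negative.
        raise ValueError("importance_threshold must be greater than or equal to 0")
    kept = [(name, value) for name, value in importances if abs(value) > importance_threshold]
    # Largest absolute importance first, as a bar graph would typically show it.
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


importances = [("lag_1", 0.62), ("day_of_week", -0.25), ("noise", 0.01)]
selected = select_features(importances, importance_threshold=0.1)
print(selected)
```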

inverse_transform(self, y)#

Apply component inverse_transform methods to estimator predictions in reverse order.

Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd).

Parameters

y (pd.Series) – Final component features.

Returns

The inverse transform of the target.

Return type

pd.Series

static load(file_path: Union[str, io.BytesIO])#

Loads pipeline at file path.

Parameters

file_path (str|BytesIO) – File path to load from, or a BytesIO object.

Returns

PipelineBase object

property model_family(self)#

Returns model family of this pipeline.

property name(self)#

Name of the pipeline.

new(self, parameters, random_seed=0)#

Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python’s __new__ method.

Parameters
  • parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.

Returns

A new instance of this pipeline with identical components.

property parameters(self)#

Parameter dictionary for this pipeline.

Returns

Dictionary of all component parameters.

Return type

dict

predict(self, X, objective=None, X_train=None, y_train=None)#

Predict on future data where the target is not known.

Parameters
  • X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].

  • objective (Object or string) – The objective to use to make predictions.

  • X_train (pd.DataFrame or np.ndarray or None) – Training data.

  • y_train (pd.Series or None) – Training labels.

Raises

ValueError – If X_train and/or y_train are None or if final component is not an Estimator.

Returns

Predictions.

predict_in_sample(self, X, y, X_train, y_train, objective=None)#

Predict on future data where the target is known, e.g. cross validation.

Note: y is cast to int first. Calculating predictions can return boolean values, which could not otherwise be transformed if the original targets were integers.

Parameters
  • X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].

  • y (pd.Series, np.ndarray) – Future target of shape [n_samples].

  • X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].

  • y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].

  • objective (ObjectiveBase, str, None) – Objective used to threshold predicted probabilities, optional.

Returns

Estimated labels.

Return type

pd.Series

Raises

ValueError – If final component is not an Estimator.

predict_proba(self, X, X_train=None, y_train=None)#

Predict on future data where the target is unknown.

Parameters
  • X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].

  • X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].

  • y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].

Returns

Estimated probabilities.

Return type

pd.Series

Raises

ValueError – If final component is not an Estimator.

predict_proba_in_sample(self, X_holdout, y_holdout, X_train, y_train)#

Predict on future data where the target is known, e.g. cross validation.

Parameters
  • X_holdout (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].

  • y_holdout (pd.Series, np.ndarray) – Future target of shape [n_samples].

  • X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].

  • y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].

Returns

Estimated probabilities.

Return type

pd.Series

Raises

ValueError – If the final component is not an Estimator.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves pipeline at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.
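
The save and load signatures follow the usual pickle round-trip pattern, where load accepts either a path or an in-memory buffer. The sketch below uses the standard pickle module and a plain dict as a stand-in for a pipeline; save_object and load_object are hypothetical helpers, and evalml itself uses cloudpickle.

```python
import io
import os
import pickle
import tempfile


def save_object(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Write obj to file_path using the given pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load_object(file_path):
    """Load an object from a file path or from a BytesIO buffer."""
    if isinstance(file_path, io.BytesIO):
        file_path.seek(0)  # read from the start of the buffer
        return pickle.load(file_path)
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Stand-in for pipeline state; a real pipeline object would go here.
state = {"name": "My TimeSeriesMulticlass Pipeline", "random_seed": 0}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.pkl")
    save_object(state, path)
    from_path = load_object(path)

from_buffer = load_object(io.BytesIO(pickle.dumps(state)))
```

As with any pickle-based format, only load files from sources you trust.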

score(self, X, y, objectives, X_train=None, y_train=None)#

Evaluate model performance on current and additional objectives.

Parameters
  • X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].

  • y (pd.Series) – True labels of length [n_samples].

  • objectives (list) – Non-empty list of objectives to score on.

  • X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].

  • y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].

Returns

Ordered dictionary of objective scores.

Return type

dict

property summary(self)#

A short summary of the pipeline structure, describing the list of components used.

Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder

Returns

A string describing the pipeline structure.

transform(self, X, y=None)#

Transform the input.

Parameters
  • X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].

  • y (pd.Series) – The target data of length [n_samples]. Defaults to None.

Returns

Transformed output.

Return type

pd.DataFrame

transform_all_but_final(self, X, y=None, X_train=None, y_train=None, calculating_residuals=False)#

Transforms the data by applying all pre-processing components.

Parameters
  • X (pd.DataFrame) – Input data to the pipeline to transform.

  • y (pd.Series) – Targets corresponding to the pipeline targets.

  • X_train (pd.DataFrame) – Training data used to generate features from past observations.

  • y_train (pd.Series) – Training targets used to generate features from past observations.

  • calculating_residuals (bool) – Whether we’re calling predict_in_sample to calculate the residuals. This means the X and y arguments are not future data, but actually the train data.

Returns

New transformed features.

Return type

pd.DataFrame
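
Why X_train and y_train matter here can be shown with a toy lag feature. This is a conceptual sketch in plain Python, not evalml's feature engineering; lag_feature and the sample values are hypothetical.

```python
def lag_feature(train_values, holdout_values, lag):
    """For each holdout row, return the value observed `lag` rows earlier."""
    history = list(train_values) + list(holdout_values)
    first_holdout = len(train_values)
    return [
        # Rows with no observation `lag` steps back get None.
        history[i - lag] if i - lag >= 0 else None
        for i in range(first_holdout, len(history))
    ]


with_training = lag_feature([10, 20, 30], [40, 50], lag=2)
without_training = lag_feature([], [40, 50], lag=2)
print(with_training, without_training)
```

With the training rows available, every holdout row has a lagged value. Without them, the first rows of the holdout set have nothing to look back on.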