time_series_regression_pipeline#
Pipeline base class for time series regression problems.
Module Contents#
Classes Summary#
TimeSeriesRegressionPipeline – Pipeline base class for time series regression problems.
Contents#
- class evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)[source]#
Pipeline base class for time series regression problems.
- Parameters
component_graph (ComponentGraph, list, dict) – ComponentGraph instance, list of components in order, or dictionary of components. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”]
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary {} implies using all default values for component parameters. Pipeline-level parameters such as time_index, gap, and max_delay must be specified with the “pipeline” key. For example: Pipeline(parameters={“pipeline”: {“time_index”: “Date”, “max_delay”: 4, “gap”: 2}}).
random_seed (int) – Seed for the random number generator. Defaults to 0.
Example
>>> pipeline = TimeSeriesRegressionPipeline(component_graph=["Simple Imputer", "Linear Regressor"],
...                                         parameters={"Simple Imputer": {"impute_strategy": "mean"},
...                                                     "pipeline": {"gap": 1, "max_delay": 1, "forecast_horizon": 1, "time_index": "date"}},
...                                         custom_name="My TimeSeriesRegression Pipeline")
...
>>> assert pipeline.custom_name == "My TimeSeriesRegression Pipeline"
>>> assert pipeline.component_graph.component_dict.keys() == {'Simple Imputer', 'Linear Regressor'}
The pipeline parameters will be chosen from the default parameters for every component, unless specific parameters were passed in as they were above.
>>> assert pipeline.parameters == {
...     'Simple Imputer': {'impute_strategy': 'mean', 'fill_value': None},
...     'Linear Regressor': {'fit_intercept': True, 'n_jobs': -1},
...     'pipeline': {'gap': 1, 'max_delay': 1, 'forecast_horizon': 1, 'time_index': "date"}}
Attributes
NO_PREDS_PI_ESTIMATORS – ProblemTypes.TIME_SERIES_REGRESSION
problem_type – None
Methods
- can_tune_threshold_with_objective – Determine whether the threshold of a binary classification pipeline can be tuned.
- clone – Constructs a new pipeline with the same components, parameters, and random seed.
- create_objectives – Create objective instances from a list of strings or objective classes.
- custom_name – Custom name of the pipeline.
- dates_needed_for_prediction – Return dates needed to forecast the given date in the future.
- dates_needed_for_prediction_range – Return dates needed to forecast the given date range in the future.
- describe – Outputs pipeline details including component parameters.
- feature_importance – Importance associated with each feature. Features dropped by the feature selection are excluded.
- fit – Fit a time series pipeline.
- fit_transform – Fit and transform all components in the component graph, if all components are Transformers.
- get_component – Returns component by name.
- get_forecast_period – Generates all possible forecasting time points based on the latest data point in X.
- get_forecast_predictions – Generates all possible forecasting predictions based on the last period of X.
- get_hyperparameter_ranges – Returns hyperparameter ranges from all components as a dictionary.
- get_prediction_intervals – Find the prediction intervals using the fitted regressor.
- graph – Generate an image representing the pipeline graph.
- graph_dict – Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases.
- graph_feature_importance – Generate a bar graph of the pipeline's feature importance.
- inverse_transform – Apply component inverse_transform methods to estimator predictions in reverse order.
- load – Loads pipeline at file path.
- model_family – Returns model family of this pipeline.
- name – Name of the pipeline.
- new – Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with python's __new__ method.
- parameters – Parameter dictionary for this pipeline.
- predict – Predict on future data where target is not known.
- predict_in_sample – Predict on future data where the target is known, e.g. cross validation.
- save – Saves pipeline at file path.
- score – Evaluate model performance on current and additional objectives.
- summary – A short summary of the pipeline structure, describing the list of components used.
- transform – Transform the input.
- transform_all_but_final – Transforms the data by applying all pre-processing components.
- can_tune_threshold_with_objective(self, objective)#
Determine whether the threshold of a binary classification pipeline can be tuned.
- Parameters
objective (ObjectiveBase) – Primary AutoMLSearch objective.
- Returns
True if the pipeline threshold can be tuned.
- Return type
bool
- clone(self)#
Constructs a new pipeline with the same components, parameters, and random seed.
- Returns
A new instance of this pipeline with identical components, parameters, and random seed.
- static create_objectives(objectives)#
Create objective instances from a list of strings or objective classes.
- property custom_name(self)#
Custom name of the pipeline.
- dates_needed_for_prediction(self, date)#
Return dates needed to forecast the given date in the future.
- Parameters
date (pd.Timestamp) – Date to forecast in the future.
- Returns
Range of dates needed to forecast the given date.
- Return type
dates_needed (tuple(pd.Timestamp))
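A minimal usage sketch, assuming a fitted pipeline like the one in the class example above (with time_index "date") and assuming the returned tuple unpacks as (start, end); the chosen date is illustrative:

>>> import pandas as pd
>>> beginning, end = pipeline.dates_needed_for_prediction(pd.Timestamp("2022-01-14"))
>>> needed = X[(X["date"] >= beginning) & (X["date"] <= end)]  # rows required to build features for that forecast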
- dates_needed_for_prediction_range(self, start_date, end_date)#
Return dates needed to forecast the given date range in the future.
- Parameters
start_date (pd.Timestamp) – Start date of range to forecast in the future.
end_date (pd.Timestamp) – End date of range to forecast in the future.
- Returns
Range of dates needed to forecast the given date range.
- Return type
dates_needed (tuple(pd.Timestamp))
- Raises
ValueError – If start_date doesn’t come before end_date
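The same idea sketched for a range of forecast dates; the dates are illustrative and the pipeline is assumed to be fitted as in the class example above:

>>> start_date = pd.Timestamp("2022-01-14")
>>> end_date = pd.Timestamp("2022-01-16")
>>> beginning, end = pipeline.dates_needed_for_prediction_range(start_date, end_date)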
- describe(self, return_dict=False)#
Outputs pipeline details including component parameters.
- Parameters
return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.
- Returns
Dictionary of all component parameters if return_dict is True, else None.
- Return type
dict
- property feature_importance(self)#
Importance associated with each feature. Features dropped by the feature selection are excluded.
- Returns
Feature names and their corresponding importance
- Return type
pd.DataFrame
- fit(self, X, y)[source]#
Fit a time series pipeline.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features].
y (pd.Series, np.ndarray) – The target training data of length [n_samples].
- Returns
self
- Raises
ValueError – If the target is not numeric.
- fit_transform(self, X, y)#
Fit and transform all components in the component graph, if all components are Transformers.
- Parameters
X (pd.DataFrame) – Input features of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples].
- Returns
Transformed output.
- Return type
pd.DataFrame
- Raises
ValueError – If final component is an Estimator.
- get_component(self, name)#
Returns component by name.
- Parameters
name (str) – Name of component.
- Returns
Component to return
- Return type
Component
- get_forecast_period(self, X)[source]#
Generates all possible forecasting time points based on the latest data point in X.
- Parameters
X (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
- Raises
ValueError – If pipeline is not trained.
- Returns
Datetime periods from gap to forecast_horizon + gap.
- Return type
pd.Series
Example
>>> X = pd.DataFrame({'date': pd.date_range(start='1-1-2022', periods=10, freq='D'), 'feature': range(10, 20)})
>>> y = pd.Series(range(0, 10), name='target')
>>> gap = 1
>>> forecast_horizon = 2
>>> pipeline = TimeSeriesRegressionPipeline(component_graph=["Linear Regressor"],
...                                         parameters={"Simple Imputer": {"impute_strategy": "mean"},
...                                                     "pipeline": {"gap": gap, "max_delay": 1, "forecast_horizon": forecast_horizon, "time_index": "date"}},
...                                         )
>>> pipeline.fit(X, y)
pipeline = TimeSeriesRegressionPipeline(component_graph={'Linear Regressor': ['Linear Regressor', 'X', 'y']}, parameters={'Linear Regressor':{'fit_intercept': True, 'n_jobs': -1}, 'pipeline':{'gap': 1, 'max_delay': 1, 'forecast_horizon': 2, 'time_index': 'date'}}, random_seed=0)
>>> dates = pipeline.get_forecast_period(X)
>>> expected = pd.Series(pd.date_range(start='2022-01-11', periods=forecast_horizon, freq='D').shift(gap), name='date', index=[10, 11])
>>> assert dates.equals(expected)
- get_forecast_predictions(self, X, y)[source]#
Generates all possible forecasting predictions based on last period of X.
- Parameters
X (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
- Returns
Predictions from gap periods out to forecast_horizon + gap periods.
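A brief sketch, assuming the fitted pipeline, X, and y from the get_forecast_period example above; the result covers the same gap to forecast_horizon + gap window:

>>> forecast = pipeline.get_forecast_predictions(X, y)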
- get_hyperparameter_ranges(self, custom_hyperparameters)#
Returns hyperparameter ranges from all components as a dictionary.
- Parameters
custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.
- Returns
Dictionary of hyperparameter ranges for each component in the pipeline.
- Return type
dict
- get_prediction_intervals(self, X, y=None, X_train=None, y_train=None, coverage=None)[source]#
Find the prediction intervals using the fitted regressor.
This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.
Certain estimators (Extra Trees Estimator, XGBoost Estimator, Prophet Estimator, ARIMA, and Exponential Smoothing estimator) utilize a different methodology to calculate prediction intervals. See the docs for these estimators to learn more.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – Target data.
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
coverage (list[float]) – A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for.
- Returns
Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
- Return type
dict
- Raises
MethodPropertyNotFoundError – If the estimator does not support Time Series Regression as a problem type.
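A hedged usage sketch; X_holdout, y_holdout, X_train, and y_train are hypothetical splits of a time-ordered dataset, with the holdout rows immediately following the training rows:

>>> intervals = pipeline.get_prediction_intervals(X_holdout, y_holdout,
...                                               X_train=X_train, y_train=y_train,
...                                               coverage=[0.95])
>>> lower, upper = intervals["0.95_lower"], intervals["0.95_upper"]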
- graph(self, filepath=None)#
Generate an image representing the pipeline graph.
- Parameters
filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
- Returns
Graph object that can be directly displayed in Jupyter notebooks.
- Return type
graphviz.Digraph
- Raises
RuntimeError – If graphviz is not installed.
ValueError – If path is not writeable.
- graph_dict(self)#
Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases.
x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, …}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], …], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], …]}
- Returns
A dictionary representing the DAG structure.
- Return type
dag_dict (dict)
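For illustration, a small sketch of reading the returned structure on a fitted pipeline (key names follow the template above):

>>> dag = pipeline.graph_dict()
>>> nodes = dag["Nodes"]      # {component_name: {"Name": ..., "Parameters": ...}}
>>> x_edges = dag["x_edges"]  # feature-data flow between components
>>> y_edges = dag["y_edges"]  # target-data flow between components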
- graph_feature_importance(self, importance_threshold=0)#
Generate a bar graph of the pipeline’s feature importance.
- Parameters
importance_threshold (float, optional) – If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero.
- Returns
A bar graph showing features and their corresponding importance.
- Return type
plotly.Figure
- Raises
ValueError – If importance threshold is not valid.
- inverse_transform(self, y)#
Apply component inverse_transform methods to estimator predictions in reverse order.
Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd).
- Parameters
y (pd.Series) – Final component features.
- Returns
The inverse transform of the target.
- Return type
pd.Series
- static load(file_path: Union[str, io.BytesIO])#
Loads pipeline at file path.
- Parameters
file_path (str|BytesIO) – load filepath or a BytesIO object.
- Returns
PipelineBase object
- property model_family(self)#
Returns model family of this pipeline.
- property name(self)#
Name of the pipeline.
- new(self, parameters, random_seed=0)#
Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with python’s __new__ method.
- Parameters
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
- Returns
A new instance of this pipeline with identical components.
- property parameters(self)#
Parameter dictionary for this pipeline.
- Returns
Dictionary of all component parameters.
- Return type
dict
- predict(self, X, objective=None, X_train=None, y_train=None)#
Predict on future data where target is not known.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
objective (Object or string) – The objective to use to make predictions.
X_train (pd.DataFrame or np.ndarray or None) – Training data.
y_train (pd.Series or None) – Training labels.
- Raises
ValueError – If X_train and/or y_train are None or if final component is not an Estimator.
- Returns
Predictions.
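A hedged sketch, assuming X_train and y_train are the data the pipeline was fitted on and X_holdout contains the next forecast_horizon rows of future feature values (hypothetical names):

>>> preds = pipeline.predict(X_holdout, X_train=X_train, y_train=y_train)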
- predict_in_sample(self, X, y, X_train, y_train, objective=None, calculating_residuals=False)#
Predict on future data where the target is known, e.g. cross validation.
- Parameters
X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].
y (pd.Series, np.ndarray) – Future target of shape [n_samples].
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
objective (ObjectiveBase, str, None) – Objective used to threshold predicted probabilities, optional.
calculating_residuals (bool) – Whether we’re calling predict_in_sample to calculate the residuals. This means the X and y arguments are not future data, but rather the training data.
- Returns
Estimated labels.
- Return type
pd.Series
- Raises
ValueError – If final component is not an Estimator.
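A hedged sketch of in-sample prediction; X_validation and y_validation are hypothetical rows whose targets are already known (for example, a cross-validation split that directly follows X_train):

>>> in_sample_preds = pipeline.predict_in_sample(X_validation, y_validation,
...                                              X_train=X_train, y_train=y_train)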
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves pipeline at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
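A save/load round trip, sketched with a hypothetical file name:

>>> pipeline.save("ts_regression_pipeline.pkl")
>>> loaded = TimeSeriesRegressionPipeline.load("ts_regression_pipeline.pkl")
>>> assert loaded.parameters == pipeline.parameters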
- score(self, X, y, objectives, X_train=None, y_train=None)[source]#
Evaluate model performance on current and additional objectives.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – True labels of length [n_samples].
objectives (list) – Non-empty list of objectives to score on.
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
- Returns
Ordered dictionary of objective scores.
- Return type
dict
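A hedged sketch with hypothetical holdout data that immediately follows the training data; "R2" and "MAE" are standard regression objective names:

>>> scores = pipeline.score(X_holdout, y_holdout,
...                         objectives=["R2", "MAE"],
...                         X_train=X_train, y_train=y_train)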
- property summary(self)#
A short summary of the pipeline structure, describing the list of components used.
Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
- Returns
A string describing the pipeline structure.
- transform(self, X, y=None)#
Transform the input.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples]. Defaults to None.
- Returns
Transformed output.
- Return type
pd.DataFrame
- transform_all_but_final(self, X, y=None, X_train=None, y_train=None, calculating_residuals=False)#
Transforms the data by applying all pre-processing components.
- Parameters
X (pd.DataFrame) – Input data to the pipeline to transform.
y (pd.Series) – Targets corresponding to the pipeline targets.
X_train (pd.DataFrame) – Training data used to generate features from past observations.
y_train (pd.Series) – Training targets used to generate features from past observations.
calculating_residuals (bool) – Whether we’re calling predict_in_sample to calculate the residuals. This means the X and y arguments are not future data, but rather the training data.
- Returns
New transformed features.
- Return type
pd.DataFrame
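A hedged sketch, using the same hypothetical holdout/training split as above, to inspect the feature matrix the final estimator would receive:

>>> features = pipeline.transform_all_but_final(X_holdout, y_holdout,
...                                             X_train=X_train, y_train=y_train)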