delayed_feature_transformer

Transformer that delays input features and target variable for time series problems.

Module Contents

Classes Summary

DelayedFeatureTransformer

Transformer that delays input features and target variable for time series problems.

Contents

class evalml.pipelines.components.transformers.preprocessing.delayed_feature_transformer.DelayedFeatureTransformer(date_index=None, max_delay=2, gap=0, forecast_horizon=1, conf_level=0.05, delay_features=True, delay_target=True, random_seed=0, **kwargs)

Transformer that delays input features and target variable for time series problems.

This component uses an algorithm based on the autocorrelation values of the target variable to determine which lags to select from the set of all possible lags.

The algorithm is based on the idea that the local maxima of the autocorrelation function indicate the lags that have the most impact on the present time.

The algorithm computes the autocorrelation values and finds the local maxima, called “peaks”, that are significant at the given conf_level. Since lags in the range [0, 10] tend to be predictive but not local maxima, the union of the peaks is taken with the significant lags in the range [0, 10]. At the end, only selected lags in the range [0, max_delay] are used.

Parametrizing the algorithm by conf_level lets the AutoMLAlgorithm tune the set of lags chosen, which raises the chance of finding a good set of lags.

Using a conf_level value of 1 selects all possible lags.
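The selection procedure described above can be sketched in plain Python. This is an illustration only, not evalml's implementation: the helper names autocorrelation and select_lags, the always_keep argument, and the white-noise significance band of z / sqrt(n) are all assumptions made for the example. Lag 0 is skipped because its autocorrelation is always 1.

```python
from math import sqrt
from statistics import NormalDist, fmean


def autocorrelation(y, max_lag):
    """Sample autocorrelation of y for lags 0..max_lag (biased estimator)."""
    n = len(y)
    mean = fmean(y)
    centered = [v - mean for v in y]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom
        for lag in range(max_lag + 1)
    ]


def select_lags(y, max_delay, conf_level=0.05, always_keep=10):
    """Hypothetical helper: choose lags in [1, max_delay] from significant ACF peaks."""
    n = len(y)
    if conf_level >= 1:
        # A conf_level of 1 selects every lag in [1, max_delay].
        return list(range(1, max_delay + 1))

    # Compute one lag beyond the range of interest so the last lag can be a peak.
    n_lags = min(n - 1, max(max_delay, always_keep) + 1)
    acf = autocorrelation(y, n_lags)

    # Assumption: a lag is "significant" when it falls outside the
    # white-noise band +/- z / sqrt(n) for the given conf_level.
    z = NormalDist().inv_cdf(1 - conf_level / 2)
    threshold = z / sqrt(n)
    significant = {lag for lag in range(1, n_lags + 1) if abs(acf[lag]) > threshold}

    # Local maxima ("peaks") of the autocorrelation function.
    peaks = {
        lag
        for lag in range(1, n_lags)
        if acf[lag] > acf[lag - 1] and acf[lag] > acf[lag + 1]
    }

    # Union of significant peaks with the significant short lags.
    chosen = (peaks & significant) | {lag for lag in significant if lag <= always_keep}
    chosen.add(1)  # a delay of 1 is always computed
    return sorted(lag for lag in chosen if lag <= max_delay)


# A series with period 4: the peaks fall at lags 4 and 8.
series = [0, 1, 0, -1] * 10
print(select_lags(series, max_delay=8))  # [1, 2, 4, 6, 8]
```

In this sketch, lags 2 and 6 are kept even though they are not peaks, because they are significant and fall within the first ten lags. Raising conf_level narrows the band and admits more lags, up to every lag in [1, max_delay] at conf_level=1.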

Parameters
  • date_index (str) – Name of the column containing the datetime information used to order the data. Ignored by this component. Defaults to None.

  • max_delay (int) – Maximum number of time units to delay each feature. Defaults to 2.

  • forecast_horizon (int) – The number of time periods the pipeline is expected to forecast. Defaults to 1.

  • conf_level (float) – Float in range (0, 1] that determines the confidence interval size used to select which lags to compute from the set of [1, max_delay]. A delay of 1 will always be computed. If 1, selects all possible lags in the set of [1, max_delay], inclusive. Defaults to 0.05.

  • delay_features (bool) – Whether to delay the input features. Defaults to True.

  • delay_target (bool) – Whether to delay the target. Defaults to True.

  • gap (int) – The number of time units between when the features are collected and when the target is collected. For example, if you are predicting the next time step’s target, gap=1. This is only needed because when gap=0, we need to be sure to start the lagging of the target variable at 1. Defaults to 0.

  • random_seed (int) – Seed for the random number generator. This transformer performs the same regardless of the random seed provided.

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Delayed Feature Transformer

needs_fitting

True

target_colname_prefix

target_delay_{}

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the DelayedFeatureTransformer.

fit_transform

Fit the component and transform the input data.

load

Loads component at file path.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Computes the delayed features for all features in X and y.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the DelayedFeatureTransformer.

Parameters
  • X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]

  • y (pd.Series, optional) – The target training data of length [n_samples]

Returns

self

fit_transform(self, X, y)

Fit the component and transform the input data.

Parameters
  • X (pd.DataFrame or None) – Data to transform. None is expected when only the target variable is being used.

  • y (pd.Series or None) – Target.

Returns

Transformed X.

Return type

pd.DataFrame

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

transform(self, X, y=None)

Computes the delayed features for all features in X and y.

For each feature in X, it will add a column to the output dataframe for each delay in the (inclusive) range [1, max_delay]. The values of each delayed feature are simply the original feature shifted forward in time by the delay amount. For example, a delay of 3 units means that the feature value at row n will be taken from row n-3 of that feature.

If y is not None, it will also compute the delayed values for the target variable.
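The shifting described above can be illustrated with pandas. This is a minimal sketch, not the component's code: delay_columns is a hypothetical helper, the {column}_delay_{lag} naming for features and the choice to keep the original columns are assumptions, and only the target_delay_{} prefix comes from the attributes listed on this page.

```python
import pandas as pd


def delay_columns(X, y=None, lags=(1, 2)):
    """Hypothetical helper: add shifted copies of each column of X (and of y)."""
    out = X.copy()
    for col in X.columns:
        for lag in lags:
            # shift(lag) moves values down, so row n holds the value from row n - lag.
            out[f"{col}_delay_{lag}"] = X[col].shift(lag)
    if y is not None:
        for lag in lags:
            out[f"target_delay_{lag}"] = y.shift(lag)
    return out


X = pd.DataFrame({"feature": [10, 20, 30, 40, 50]})
y = pd.Series([1, 2, 3, 4, 5])
delayed = delay_columns(X, y, lags=(1, 3))
print(delayed)
```

With a delay of 3, row 3 of feature_delay_3 holds the value from row 0 of feature, and the first three rows are NaN because no earlier observations exist.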

Parameters
  • X (pd.DataFrame or None) – Data to transform. None is expected when only the target variable is being used.

  • y (pd.Series or None) – Target.

Returns

Transformed X.

Return type

pd.DataFrame