stl_decomposer#
Component that removes trends and seasonality from time series using STL.
Module Contents#
Classes Summary#
STLDecomposer | Removes trends and seasonality from time series using the STL algorithm.
Contents#
- class evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer(time_index: str = None, degree: int = 1, period: int = None, seasonal_smoother: int = 7, random_seed: int = 0, **kwargs)[source]#
Removes trends and seasonality from time series using the STL algorithm.
https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.STL.html
- Parameters
time_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.
degree (int) – Not currently used. STL has three “degree-like” values, none of which can be set at this time. Defaults to 1.
period (int) – The number of entries in the time series data that corresponds to one period of a cyclic signal. For instance, if data is known to possess a weekly seasonal signal, and if the data is daily data, the period should likely be 7. For daily data with a yearly seasonal signal, the period should likely be 365. If None, statsmodels will infer the period based on the frequency. Defaults to None.
seasonal_smoother (int) – The length of the seasonal smoother used by the underlying STL algorithm. For compatibility, must be odd. If an even number is provided, the next-highest odd number will be used. Defaults to 7.
random_seed (int) – Seed for the random number generator. Defaults to 0.
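The rounding rule described for seasonal_smoother can be sketched as follows. The helper name next_odd is hypothetical and is not part of evalml; it only illustrates the documented behavior for even inputs.

```python
def next_odd(seasonal_smoother: int) -> int:
    """Return the value unchanged if it is odd, otherwise the next-highest odd number."""
    # Hypothetical helper: mirrors the documented rule, not evalml's actual code.
    if seasonal_smoother % 2 == 0:
        return seasonal_smoother + 1
    return seasonal_smoother


print(next_odd(7))  # 7
print(next_odd(8))  # 9
```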
Attributes
hyperparameter_ranges
None
invalid_frequencies
[‘SM’, ‘BM’, ‘SMS’, ‘BMS’, ‘BQ’, ‘BQS’, ‘T’, ‘S’, ‘L’, ‘U’, ‘N’, ‘A’, ‘BA’, ‘AS’, ‘BAS’, ‘BH’]
modifies_features
False
modifies_target
True
name
STL Decomposer
needs_fitting
True
training_only
False
Methods
clone | Constructs a new component with the same parameters and random state.
default_parameters | Returns the default parameters for this component.
describe | Describe a component and its parameters.
determine_periodicity | Function that uses autocorrelative methods to determine the first significant period of the seasonal signal.
fit | Fits the STLDecomposer and determines the seasonal signal.
fit_transform | Removes fitted trend and seasonality from target variable.
get_trend_dataframe | Return a list of dataframes with 4 columns: signal, trend, seasonality, residual.
inverse_transform | Adds back fitted trend and seasonality to target variable.
is_freq_valid | Determines if the given string represents a valid frequency for this decomposer.
load | Loads component at file path.
parameters | Returns the parameters which were used to initialize the component.
plot_decomposition | Plots the decomposition of the target signal.
save | Saves component at file path.
set_period | Function to set the component's seasonal period based on the target's seasonality.
transform | Transforms the target data by removing the STL trend and seasonality.
- clone(self)#
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
- default_parameters(cls)#
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
- describe(self, print_name=False, return_dict=False)#
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
- determine_periodicity(self, X: pandas.DataFrame, y: pandas.Series, method: str = 'autocorrelation')#
Function that uses autocorrelative methods to determine the first significant period of the seasonal signal.
- Parameters
X (pandas.DataFrame) – The feature data of the time series problem.
y (pandas.Series) – The target data of a time series problem.
method (str) – Either “autocorrelation” or “partial-autocorrelation”. The method by which to determine the first period of the seasonal part of the target signal. “partial-autocorrelation” should currently not be used. Defaults to “autocorrelation”.
- Returns
- The integer number of entries in the time series data over which the seasonal part of the target data repeats. If the time series data is in days, then this is the number of days that it takes the target’s seasonal signal to repeat. Note: the target data can contain multiple seasonal signals. This function only returns the first, and thus shortest, period. E.g., if the target has both weekly and yearly seasonality, the function returns “7” and not “365”. If no period is detected, returns [None].
- Return type
(list[int])
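As an illustration of the autocorrelation idea, the sketch below finds the first lag at which the sample autocorrelation has a local peak above a threshold. It is a simplified stand-in, not evalml's implementation: the function names, the biased estimator, and the 0.3 threshold are all assumptions made for this example.

```python
import math


def autocorrelation(y, lag):
    """Biased sample autocorrelation of y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    if denom == 0:
        # A constant series has no variance, so report no correlation.
        return 0.0
    num = sum((y[t] - mean) * (y[t + lag] - mean) for t in range(n - lag))
    return num / denom


def first_significant_period(y, threshold=0.3):
    """Return the lag of the first autocorrelation peak above threshold, or None."""
    max_lag = len(y) // 2
    acf = [autocorrelation(y, lag) for lag in range(max_lag + 1)]
    for lag in range(2, max_lag):
        is_peak = acf[lag] > acf[lag - 1] and acf[lag] > acf[lag + 1]
        if is_peak and acf[lag] > threshold:
            return lag
    return None


# Ten weeks of daily data with a purely weekly signal.
weekly = [math.sin(2 * math.pi * t / 7) for t in range(70)]
print(first_significant_period(weekly))  # 7
```

Because the search stops at the first qualifying peak, a series with both weekly and yearly seasonality would yield only the shorter period, which matches the behavior described in the note above.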
- fit(self, X: pandas.DataFrame, y: pandas.Series = None) STLDecomposer [source]#
Fits the STLDecomposer and determines the seasonal signal.
Instantiates a statsmodels STL object with the component’s stored parameters and fits it. Since the statsmodels object does not follow the sklearn API, it is not saved during __init__() in _component_obj and is re-instantiated each time fit is called.
To emulate the sklearn API, when the STL decomposer is fit, the full seasonal component, a single period sample of the seasonal component, the full trend-cycle component and the residual are saved.
y(t) = S(t) + T(t) + R(t)
- Parameters
X (pd.DataFrame, optional) – Conditionally used to build datetime index.
y (pd.Series) – Target variable to detrend and deseasonalize.
- Returns
self
- Raises
ValueError – If y is None.
ValueError – If the target data doesn’t have a DatetimeIndex and the features data contains no datetime features.
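To make the additive model y(t) = S(t) + T(t) + R(t) and the four stored pieces concrete, here is a minimal sketch of a classical additive decomposition. It deliberately uses a centered moving average instead of the LOESS smoothing that STL performs, so it is not the STL algorithm; the function name and the odd-period restriction are assumptions made for this example.

```python
import math


def naive_additive_decompose(y, period):
    """Split y into trend, seasonal and residual lists that sum back to y.

    Assumes an odd period so the moving-average window is centered.
    """
    n, half = len(y), period // 2
    # Trend: centered moving average; near the edges the nearest full window is reused.
    trend = []
    for t in range(n):
        center = min(max(t, half), n - 1 - half)
        window = y[center - half : center + half + 1]
        trend.append(sum(window) / len(window))
    # Single-period seasonal sample: mean detrended value per phase, centered on zero.
    detrended = [y[t] - trend[t] for t in range(n)]
    sample = []
    for phase in range(period):
        values = detrended[phase::period]
        sample.append(sum(values) / len(values))
    offset = sum(sample) / period
    sample = [s - offset for s in sample]
    # Full seasonal component: the sample repeated over the whole series.
    seasonal = [sample[t % period] for t in range(n)]
    # Residual: whatever the trend and seasonal parts do not explain.
    residual = [y[t] - trend[t] - seasonal[t] for t in range(n)]
    return trend, seasonal, sample, residual


# Linear trend plus a weekly sinusoid, sampled daily for ten weeks.
y = [0.5 * t + 3.0 * math.sin(2 * math.pi * t / 7) for t in range(70)]
trend, seasonal, sample, residual = naive_additive_decompose(y, period=7)
```

The returned values correspond to the full trend, the full seasonal component, a single-period seasonal sample, and the residual. By construction, the three components add back up to the input at every time step.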
- fit_transform(self, X: pandas.DataFrame, y: pandas.Series = None) tuple[pandas.DataFrame, pandas.Series] #
Removes fitted trend and seasonality from target variable.
- Parameters
X (pd.DataFrame, optional) – Ignored.
y (pd.Series) – Target variable to detrend and deseasonalize.
- Returns
- The first element is the input features, returned without modification.
The second element is the target variable y with the fitted trend and seasonality removed.
- Return type
tuple of pd.DataFrame, pd.Series
- get_trend_dataframe(self, X, y)[source]#
Return a list of dataframes with 4 columns: signal, trend, seasonality, residual.
- Parameters
X (pd.DataFrame) – Input data with time series data in index.
y (pd.Series or pd.DataFrame) – Target variable data provided as a Series for univariate problems or a DataFrame for multivariate problems.
- Returns
- Each DataFrame contains the columns “signal”, “trend”, “seasonality” and “residual,”
with the latter 3 column values being the decomposed elements of the target data. The “signal” column is simply the input target signal but reindexed with a datetime index to match the input features.
- Return type
list of pd.DataFrame
- Raises
TypeError – If X does not have time-series data in the index.
ValueError – If time series index of X does not have an inferred frequency.
ValueError – If the forecaster associated with the detrender has not been fit yet.
TypeError – If y is not provided as a pandas Series or DataFrame.
- inverse_transform(self, y_t: pandas.Series) tuple[pandas.DataFrame, pandas.Series] [source]#
Adds back fitted trend and seasonality to target variable.
The STL trend is projected to cover the entire requested target range, then added back into the signal. Then, the seasonality is projected forward and added back into the signal.
- Parameters
y_t (pd.Series) – Target variable.
- Returns
- The first element is the input features, returned without modification.
The second element is the target variable y with the trend and seasonality added back in.
- Return type
tuple of pd.DataFrame, pd.Series
- Raises
ValueError – If y is None.
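Under the additive assumption, the inverse step is simply the forward step with the signs flipped. The sketch below shows that round trip with hand-written component values; the function names and the numbers are hypothetical, and the projection of trend and seasonality onto the requested range is assumed to have already happened.

```python
def remove_components(y, trend, seasonal):
    """Forward step: subtract additive trend and seasonality from the target."""
    return [v - t - s for v, t, s in zip(y, trend, seasonal)]


def add_components(y_t, trend, seasonal):
    """Inverse step: add the projected trend and seasonality back."""
    return [v + t + s for v, t, s in zip(y_t, trend, seasonal)]


# Illustrative values only: a linear trend and a period-3 seasonal pattern.
y = [10.0, 12.5, 11.0, 13.0, 15.5, 14.0]
trend = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
seasonal = [0.0, 1.5, -1.0, 0.0, 1.5, -1.0]

y_t = remove_components(y, trend, seasonal)
restored = add_components(y_t, trend, seasonal)
```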
- classmethod is_freq_valid(self, freq: str)#
Determines if the given string represents a valid frequency for this decomposer.
- Parameters
freq (str) – A frequency to validate. See the pandas docs at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases for options.
- Returns
Boolean indicating whether the frequency is valid.
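A minimal sketch of the check, assuming it amounts to a membership test against the invalid_frequencies attribute listed above. The function name is hypothetical, and the sketch only handles plain aliases: it does not attempt to parse multiples such as "2T" or anchored offsets, which the real method may treat differently.

```python
# Copied from the invalid_frequencies attribute documented for this class.
INVALID_FREQUENCIES = [
    "SM", "BM", "SMS", "BMS", "BQ", "BQS", "T", "S",
    "L", "U", "N", "A", "BA", "AS", "BAS", "BH",
]


def is_plain_alias_valid(freq: str) -> bool:
    """Return True if a plain pandas offset alias is not in the invalid list."""
    return freq not in INVALID_FREQUENCIES


print(is_plain_alias_valid("D"))  # True
print(is_plain_alias_valid("T"))  # False
```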
- static load(file_path)#
Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
- property parameters(self)#
Returns the parameters which were used to initialize the component.
- plot_decomposition(self, X: pandas.DataFrame, y: pandas.Series, show: bool = False) tuple[matplotlib.pyplot.Figure, list] #
Plots the decomposition of the target signal.
- Parameters
X (pd.DataFrame) – Input data with time series data in index.
y (pd.Series or pd.DataFrame) – Target variable data provided as a Series for univariate problems or a DataFrame for multivariate problems.
show (bool) – Whether to display the plot or not. Defaults to False.
- Returns
- The figure and axes that have the decompositions
plotted on them
- Return type
matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes]
- save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#
Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- set_period(self, X: pandas.DataFrame, y: pandas.Series)#
Function to set the component’s seasonal period based on the target’s seasonality.
- Parameters
X (pandas.DataFrame) – The feature data of the time series problem.
y (pandas.Series) – The target data of a time series problem.
- transform(self, X: pandas.DataFrame, y: pandas.Series = None) tuple[pandas.DataFrame, pandas.Series] [source]#
Transforms the target data by removing the STL trend and seasonality.
Uses an ARIMA model to project forward the additive trend and removes it. Then, utilizes the first period’s worth of seasonal data determined in the .fit() function to extrapolate the seasonal signal of the data to be transformed. This seasonal signal is also assumed to be additive and is removed.
- Parameters
X (pd.DataFrame, optional) – Conditionally used to build datetime index.
y (pd.Series) – Target variable to detrend and deseasonalize.
- Returns
- The input features are returned without modification. The target
variable y is detrended and deseasonalized.
- Return type
tuple of pd.DataFrame, pd.Series
- Raises
ValueError – If the target data doesn’t have a DatetimeIndex and the features data contains no datetime features.
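The seasonal extrapolation described for transform can be pictured as tiling the single-period sample over the new range while preserving its phase. The sketch below assumes positions are counted as integer offsets from the start of the fitted series; the function name is hypothetical, and the ARIMA-based trend projection is not shown.

```python
def project_seasonal(sample, start, length):
    """Tile a single-period seasonal sample over `length` steps from position `start`."""
    period = len(sample)
    # The modulo keeps the projected values in phase with the fitted series.
    return [sample[(start + i) % period] for i in range(length)]


sample = [1.0, 2.0, 3.0]
print(project_seasonal(sample, start=4, length=5))  # [2.0, 3.0, 1.0, 2.0, 3.0]
```

Subtracting the projected values from the corresponding target entries then removes the seasonal part, in line with the additive assumption stated above.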