stl_decomposer
===============================================================================
.. py:module:: evalml.pipelines.components.transformers.preprocessing.stl_decomposer
.. autoapi-nested-parse::
Component that removes trends and seasonality from time series using STL.
Module Contents
---------------
Classes Summary
~~~~~~~~~~~~~~~
.. autoapisummary::
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer
Contents
~~~~~~~~~~~~~~~~~~~
.. py:class:: STLDecomposer(time_index: str = None, series_id: str = None, degree: int = 1, period: int = None, periods: dict = None, seasonal_smoother: int = 7, random_seed: int = 0, **kwargs)
Removes trends and seasonality from time series using the STL algorithm.
https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.STL.html
:param time_index: Specifies the name of the column in X that provides the datetime objects. Defaults to None.
:type time_index: str
:param series_id: Specifies the name of the column in X that provides the series_id objects for multiseries. Defaults to None.
:type series_id: str
:param degree: Currently unused. STL has three "degree-like" values, none of which can be set
at this time. Defaults to 1.
:type degree: int
:param period: The number of entries in the time series data that corresponds to one period of a
cyclic signal. For instance, if data is known to possess a weekly seasonal signal, and if the data
is daily data, the period should likely be 7. For daily data with a yearly seasonal signal, the period
should likely be 365. If None, statsmodels will infer the period based on the frequency. Defaults to None.
:type period: int
:param seasonal_smoother: The length of the seasonal smoother used by the underlying STL algorithm. For compatibility,
must be odd. If an even number is provided, the next highest odd number will be used. Defaults to 7.
:type seasonal_smoother: int
:param random_seed: Seed for the random number generator. Defaults to 0.
:type random_seed: int
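The odd-length rule for ``seasonal_smoother`` can be sketched in plain Python. The helper name ``next_odd`` is hypothetical and is not part of evalml; it only illustrates how an even value would be bumped to the next odd number as described above.

```python
def next_odd(seasonal_smoother: int) -> int:
    """Return the value unchanged if it is odd, otherwise the next higher odd number."""
    # Hypothetical helper for illustration only; not evalml's own code.
    if seasonal_smoother % 2 == 1:
        return seasonal_smoother
    return seasonal_smoother + 1


print(next_odd(7))  # odd input is kept as-is
print(next_odd(8))  # even input is bumped up by one
```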
**Attributes**
.. list-table::
:widths: 15 85
:header-rows: 0
* - **hyperparameter_ranges**
- None
* - **invalid_frequencies**
- []
* - **modifies_features**
- False
* - **modifies_target**
- True
* - **name**
- STL Decomposer
* - **needs_fitting**
- True
* - **training_only**
- False
**Methods**
.. autoapisummary::
:nosignatures:
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.clone
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.default_parameters
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.describe
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.determine_periodicity
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.fit
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.fit_transform
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.get_trend_dataframe
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.get_trend_prediction_intervals
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.inverse_transform
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.is_freq_valid
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.load
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.parameters
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.plot_decomposition
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.save
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.set_period
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.transform
evalml.pipelines.components.transformers.preprocessing.stl_decomposer.STLDecomposer.update_parameters
.. py:method:: clone(self)
Constructs a new component with the same parameters and random state.
:returns: A new instance of this component with identical parameters and random state.
.. py:method:: default_parameters(cls)
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
:returns: Default parameters for this component.
:rtype: dict
.. py:method:: describe(self, print_name=False, return_dict=False)
Describe a component and its parameters.
:param print_name: whether to print name of component
:type print_name: bool, optional
:param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters}
:type return_dict: bool, optional
:returns: Returns dictionary if return_dict is True, else None.
:rtype: None or dict
.. py:method:: determine_periodicity(cls, X: pandas.DataFrame, y: pandas.Series, acf_threshold: float = 0.01, rel_max_order: int = 5)
:classmethod:
Function that uses autocorrelative methods to determine the most likely significant period of the seasonal signal.
:param X: The feature data of the time series problem.
:type X: pandas.DataFrame
:param y: The target data of a time series problem.
:type y: pandas.Series
:param acf_threshold: The threshold for the autocorrelation function to determine the period. Any values below
the threshold are considered to be 0 and will not be considered for the period. Defaults to 0.01.
:type acf_threshold: float
:param rel_max_order: The order of the relative maximum to determine the period. Defaults to 5.
:type rel_max_order: int
:returns:
The integer number of entries in time series data over which the seasonal part of the target data
repeats. If the time series data is in days, then this is the number of days that it takes the target's
seasonal signal to repeat. Note: the target data can contain multiple seasonal signals. This function
will only return the strongest. For example, if the target has both weekly and yearly seasonality, the function
may return either "7" or "365", depending on which seasonality is more strongly autocorrelated. If no
period is detected, returns None.
:rtype: int or None
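The idea of picking a period from the autocorrelation function can be illustrated with a small, dependency-free sketch. This is a simplified illustration, not evalml's implementation: the function names ``autocorrelation`` and ``guess_period`` are hypothetical, peaks are found by comparing each lag with its immediate neighbors rather than with a configurable ``rel_max_order``, and only ``acf_threshold`` mirrors a parameter documented above.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at a positive integer lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        return 0.0  # a constant series carries no seasonal information
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return cov / var


def guess_period(values, acf_threshold=0.01, max_lag=None):
    """Return the lag of the strongest local ACF peak, or None if no peak exists."""
    max_lag = max_lag or len(values) // 2
    acf = [autocorrelation(values, lag) for lag in range(1, max_lag + 1)]
    best_lag, best_value = None, acf_threshold
    for i in range(1, len(acf) - 1):
        # A peak is a lag whose ACF exceeds both of its neighbors.
        is_peak = acf[i] > acf[i - 1] and acf[i] > acf[i + 1]
        if is_peak and acf[i] > best_value:
            best_lag, best_value = i + 1, acf[i]  # acf[i] belongs to lag i + 1
    return best_lag


# Ten full cycles of a signal that repeats every 7 entries.
signal = [math.sin(2 * math.pi * t / 7) for t in range(70)]
print(guess_period(signal))
```

Values at or below the threshold are ignored, so a series without a clear repeating pattern yields ``None`` in this sketch.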
.. py:method:: fit(self, X: pandas.DataFrame, y: Union[pandas.Series, pandas.DataFrame] = None) -> STLDecomposer
Fits the STLDecomposer and determines the seasonal signal.
Instantiates a statsmodels STL decomposition object with the component's stored
parameters and fits it. Since the statsmodels object does not fit the sklearn
API, it is not saved during __init__() in _component_obj and will be re-instantiated
each time fit is called.
To emulate the sklearn API, when the STL decomposer is fit, the full seasonal
component, a single period sample of the seasonal component, the full
trend-cycle component and the residual are saved.
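The decomposition is additive, where S is the seasonal component, T is the trend-cycle component and R is the residual:
y(t) = S(t) + T(t) + R(t)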
:param X: Conditionally used to build datetime index.
:type X: pd.DataFrame, optional
:param y: Target variable to detrend and deseasonalize.
:type y: pd.Series or pd.DataFrame
:returns: self
:raises ValueError: If y is None.
:raises ValueError: If the target data does not have a DatetimeIndex and the feature data contains no datetime features.
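The additive identity and the four saved pieces (full seasonal component, single-period seasonal sample, trend, residual) can be illustrated with a toy decomposition. Note that this sketch deliberately swaps in a classical moving-average decomposition instead of STL's LOESS smoothing, and the function name ``naive_additive_decompose`` is hypothetical; it is meant only to show the shape of the result, not how statsmodels or evalml compute it.

```python
def naive_additive_decompose(y, period):
    """Toy additive decomposition (moving average, not STL) of a list of floats."""
    n = len(y)
    half = period // 2
    # Trend: centered moving average, with the window truncated at the edges.
    trend = []
    for t in range(n):
        window = y[max(0, t - half): min(n, t + half + 1)]
        trend.append(sum(window) / len(window))
    detrended = [y[t] - trend[t] for t in range(n)]
    # Single-period seasonal sample: mean detrended value for each phase.
    seasonal_sample = []
    for phase in range(period):
        phase_values = detrended[phase::period]
        seasonal_sample.append(sum(phase_values) / len(phase_values))
    # Full seasonal component: the sample repeated along the series.
    seasonal = [seasonal_sample[t % period] for t in range(n)]
    # Residual: whatever is left, so that y = seasonal + trend + residual.
    residual = [y[t] - trend[t] - seasonal[t] for t in range(n)]
    return trend, seasonal, seasonal_sample, residual


weekly_pattern = [3.0, -1.0, 0.0, 2.0, -2.0, -3.0, 1.0]
y = [0.5 * t + weekly_pattern[t % 7] for t in range(28)]
trend, seasonal, seasonal_sample, residual = naive_additive_decompose(y, period=7)
```

By construction, adding the three components back together reproduces the input series.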
.. py:method:: fit_transform(self, X: pandas.DataFrame, y: pandas.Series = None) -> tuple[pandas.DataFrame, pandas.Series]
Removes fitted trend and seasonality from target variable.
:param X: Ignored.
:type X: pd.DataFrame, optional
:param y: Target variable to detrend and deseasonalize.
:type y: pd.Series
:returns:
The first element is the input features, returned without modification.
The second element is the target variable y with the fitted trend and seasonality removed.
:rtype: tuple of pd.DataFrame, pd.Series
.. py:method:: get_trend_dataframe(self, X, y)
Return a list of dataframes with 4 columns: signal, trend, seasonality, residual.
:param X: Input data with time series data in index.
:type X: pd.DataFrame
:param y: Target variable data provided as a Series for univariate problems or
a DataFrame for multivariate problems.
:type y: pd.Series or pd.DataFrame
:returns:
Each DataFrame contains the columns "signal", "trend", "seasonality" and "residual",
where the last three columns hold the decomposed elements of the target data. The "signal" column
is simply the input target signal, reindexed with a datetime index to match the input features.
(Multi series) dictionary of lists: Series id maps to a list of pd.DataFrames that each contain the columns "signal", "trend", "seasonality" and "residual"
:rtype: (Single series) list of pd.DataFrame
:raises TypeError: If X does not have time-series data in the index.
:raises ValueError: If time series index of X does not have an inferred frequency.
:raises ValueError: If the forecaster associated with the detrender has not been fit yet.
:raises TypeError: If y is not provided as a pandas Series or DataFrame.
.. py:method:: get_trend_prediction_intervals(self, y, coverage=None)
Calculate the prediction intervals for the trend data.
:param y: Target data.
:type y: pd.Series or pd.DataFrame
:param coverage: A list of floats between 0 and 1 for which the upper and lower bounds of the
prediction interval should be calculated.
:type coverage: list[float]
:returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
(Multi series) dict of dict of pd.Series: Each series id maps to a dictionary of prediction intervals
:rtype: (Single series) dict of pd.Series
.. py:method:: inverse_transform(self, y_t: Union[pandas.Series, pandas.DataFrame]) -> Union[pandas.Series, pandas.DataFrame]
Adds back fitted trend and seasonality to target variable.
The STL trend is projected to cover the entire requested target range, then added back into the signal. Then,
the seasonality is projected forward and added back into the signal.
:param y_t: Target variable.
:type y_t: pd.Series or pd.DataFrame
:returns: The target variable y with the trend and seasonality added back in.
:rtype: pd.Series or pd.DataFrame
:raises ValueError: If y_t is None.
.. py:method:: is_freq_valid(cls, freq: str)
:classmethod:
Determines if the given string represents a valid frequency for this decomposer.
:param freq: A frequency to validate. See the pandas docs at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases for options.
:type freq: str
:returns: boolean representing whether the frequency is valid or not.
.. py:method:: load(file_path)
:staticmethod:
Loads component at file path.
:param file_path: Location to load file.
:type file_path: str
:returns: ComponentBase object
.. py:method:: parameters(self)
:property:
Returns the parameters which were used to initialize the component.
.. py:method:: plot_decomposition(self, X: pandas.DataFrame, y: Union[pandas.Series, pandas.DataFrame], show: bool = False) -> Union[tuple[matplotlib.pyplot.Figure, list], dict[str, tuple[matplotlib.pyplot.Figure]]]
Plots the decomposition of the target signal.
:param X: Input data with time series data in index.
:type X: pd.DataFrame
:param y: Target variable data provided as a Series for univariate problems or
a DataFrame for multivariate problems.
:type y: pd.Series or pd.DataFrame
:param show: Whether to display the plot or not. Defaults to False.
:type show: bool
:returns:
The figure and axes that have the decompositions
plotted on them
(Multi series) dict[str, (matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes])]: A dictionary that maps the series id
to the figure and axes that have the decompositions plotted on them
:rtype: (Single series) matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes]
.. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)
Saves component at file path.
:param file_path: Location to save file.
:type file_path: str
:param pickle_protocol: The pickle data stream format.
:type pickle_protocol: int
.. py:method:: set_period(self, X: pandas.DataFrame, y: pandas.Series, acf_threshold: float = 0.01, rel_max_order: int = 5)
Function to set the component's seasonal period based on the target's seasonality.
:param X: The feature data of the time series problem.
:type X: pandas.DataFrame
:param y: The target data of a time series problem.
:type y: pandas.Series
:param acf_threshold: The threshold for the autocorrelation function to determine the period. Any values below
the threshold are considered to be 0 and will not be considered for the period. Defaults to 0.01.
:type acf_threshold: float
:param rel_max_order: The order of the relative maximum to determine the period. Defaults to 5.
:type rel_max_order: int
.. py:method:: transform(self, X: pandas.DataFrame, y: Union[pandas.Series, pandas.DataFrame] = None) -> Union[tuple[pandas.DataFrame, pandas.Series], tuple[pandas.DataFrame, pandas.DataFrame]]
Transforms the target data by removing the STL trend and seasonality.
Uses an ARIMA model to project forward the additive trend and removes it. Then, utilizes the first period's
worth of seasonal data determined in the .fit() function to extrapolate the seasonal signal of the data to be
transformed. This seasonal signal is also assumed to be additive and is removed.
:param X: Conditionally used to build datetime index.
:type X: pd.DataFrame, optional
:param y: Target variable to detrend and deseasonalize.
:type y: pd.Series or pd.DataFrame
:returns:
The input features are returned without modification. The target
variable y is detrended and deseasonalized.
(Multi series) pd.DataFrame, pd.DataFrame: The input features are returned without modification. The target
variable y is detrended and deseasonalized.
:rtype: (Single series) pd.DataFrame, pd.Series
:raises ValueError: If the target data does not have a DatetimeIndex and the feature data contains no datetime features.
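The seasonal extrapolation step described above can be sketched as follows. This is an illustration under stated assumptions rather than evalml's code: the helper names ``project_seasonality`` and ``remove_components`` are hypothetical, and the projected trend is supplied directly as a list instead of being forecast with ARIMA.

```python
def project_seasonality(seasonal_sample, start_offset, length):
    """Cycle a single-period seasonal sample to cover `length` time steps.

    `start_offset` is the number of steps between the start of the sample
    and the first step to project, so the phase lines up with the fitted data.
    """
    period = len(seasonal_sample)
    return [seasonal_sample[(start_offset + i) % period] for i in range(length)]


def remove_components(y, trend, seasonal):
    """Subtract additive trend and seasonal values from the target."""
    return [value - t - s for value, t, s in zip(y, trend, seasonal)]


seasonal_sample = [3.0, -1.0, 0.0, 2.0, -2.0, -3.0, 1.0]
# Ten new observations that begin 30 steps after the start of the fitted sample.
projected = project_seasonality(seasonal_sample, start_offset=30, length=10)
# Assumed, already-projected trend values for the same ten steps.
trend = [15.0 + 0.5 * i for i in range(10)]
y_new = [t + s for t, s in zip(trend, projected)]
residual = remove_components(y_new, trend, projected)
```

Because both components are treated as additive, adding the same projected trend and seasonal values back to the result would recover the original target.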
.. py:method:: update_parameters(self, update_dict, reset_fit=True)
Updates the parameter dictionary of the component.
:param update_dict: A dict of parameters to update.
:type update_dict: dict
:param reset_fit: If True, will set `_is_fitted` to False.
:type reset_fit: bool, optional