polynomial_decomposer#

Component that removes trends from time series by fitting a polynomial to the data.

Module Contents#

Classes Summary#

PolynomialDecomposer

Removes trends and seasonality from time series by fitting a polynomial and moving average to the data.

Contents#

class evalml.pipelines.components.transformers.preprocessing.polynomial_decomposer.PolynomialDecomposer(time_index: str = None, degree: int = 1, period: int = -1, random_seed: int = 0, **kwargs)[source]#

Removes trends and seasonality from time series by fitting a polynomial and moving average to the data.

sktime’s PolynomialTrendForecaster is used to generate the additive trend portion of the target data. A polynomial will be fit to the data during fit. That additive polynomial trend will be removed during fit so that statsmodels’ seasonal_decompose can determine the additive seasonality of the data by using rolling averages over the series’ inferred periodicity.

For example, daily time series data will generate rolling averages over the first week of data, normalize out the mean and return those 7 averages repeated over the entire length of the given series. Those seven averages, repeated as many times as necessary to match the length of the given target data, will be used as the seasonal signal of the data.
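The additive decomposition described above can be sketched in plain Python. This is an illustration of the idea only, not evalml's implementation: it assumes a degree-1 trend, and it averages each position in the cycle across all cycles rather than reproducing statsmodels' moving-average procedure. The function names are hypothetical.

```python
from statistics import fmean


def fit_linear_trend(y):
    """Least-squares line through (0, y[0]), (1, y[1]), ...

    Returns (intercept, slope).
    """
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = fmean(y)
    cov = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return y_mean - slope * t_mean, slope


def seasonal_pattern(detrended, period):
    """Mean detrended value at each position in the cycle, centered on zero."""
    means = [fmean(detrended[pos::period]) for pos in range(period)]
    offset = fmean(means)
    return [m - offset for m in means]


def decompose(y, period):
    """Split y into additive trend, seasonal, and residual parts."""
    intercept, slope = fit_linear_trend(y)
    trend = [intercept + slope * t for t in range(len(y))]
    detrended = [v - tr for v, tr in zip(y, trend)]
    pattern = seasonal_pattern(detrended, period)
    # Repeat the one-period pattern to cover the whole series.
    seasonal = [pattern[t % period] for t in range(len(y))]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual
```

By construction, trend + seasonal + residual reproduces the input at every position, which is what makes the transform invertible.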

Parameters
  • time_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.

  • degree (int) – Degree for the polynomial. If 1, linear model is fit to the data. If 2, quadratic model is fit, etc. Defaults to 1.

  • period (int) – The number of entries in the time series data that corresponds to one period of a cyclic signal. For instance, if data is known to possess a weekly seasonal signal, and if the data is daily data, period should be 7. For daily data with a yearly seasonal signal, period should be 365. Defaults to -1, which uses the statsmodels library’s freq_to_period function.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
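As a rough guide to choosing period by hand, the value is the length of one seasonal cycle divided by the sampling interval. The helper below is a hypothetical sketch built on the standard library; it is not part of evalml or statsmodels.

```python
from datetime import timedelta


def entries_per_cycle(sample_interval: timedelta, cycle_length: timedelta) -> int:
    """Number of observations that make up one seasonal cycle."""
    ratio = cycle_length / sample_interval  # timedelta / timedelta gives a float
    if ratio < 2 or ratio != int(ratio):
        raise ValueError("cycle must span a whole number (>= 2) of samples")
    return int(ratio)


weekly_in_daily_data = entries_per_cycle(timedelta(days=1), timedelta(weeks=1))
yearly_in_daily_data = entries_per_cycle(timedelta(days=1), timedelta(days=365))
daily_in_hourly_data = entries_per_cycle(timedelta(hours=1), timedelta(days=1))
```

The first two values match the weekly and yearly cases given for period above.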

Attributes

hyperparameter_ranges

{"degree": Integer(1, 3)}

invalid_frequencies

[]

modifies_features

False

modifies_target

True

name

Polynomial Decomposer

needs_fitting

True

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

determine_periodicity

Function that uses autocorrelative methods to determine the most likely significant period of the seasonal signal.

fit

Fits the PolynomialDecomposer and determines the seasonal signal.

fit_transform

Removes fitted trend and seasonality from target variable.

get_trend_dataframe

Return a list of dataframes with 4 columns: signal, trend, seasonality, residual.

inverse_transform

Adds back fitted trend and seasonality to target variable.

is_freq_valid

Determines if the given string represents a valid frequency for this decomposer.

load

Loads component at file path.

parameters

Returns the parameters which were used to initialize the component.

plot_decomposition

Plots the decomposition of the target signal.

save

Saves component at file path.

set_period

Function to set the component's seasonal period based on the target's seasonality.

transform

Transforms the target data by removing the polynomial trend and rolling average seasonality.

update_parameters

Updates the parameter dictionary of the component.

clone(self)#

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)#

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)#

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – whether to print name of component

  • return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

Returns dictionary if return_dict is True, else None.

Return type

None or dict

classmethod determine_periodicity(cls, X: pandas.DataFrame, y: pandas.Series, acf_threshold: float = 0.01, rel_max_order: int = 5)#

Function that uses autocorrelative methods to determine the most likely significant period of the seasonal signal.

Parameters
  • X (pandas.DataFrame) – The feature data of the time series problem.

  • y (pandas.Series) – The target data of a time series problem.

  • acf_threshold (float) – The threshold for the autocorrelation function to determine the period. Any values below the threshold are considered to be 0 and will not be considered for the period. Defaults to 0.01.

  • rel_max_order (int) – The order of the relative maximum to determine the period. Defaults to 5.

Returns

The integer number of entries in the time series data over which the seasonal part of the target data repeats. If the time series data is in days, then this is the number of days that it takes the target’s seasonal signal to repeat. Note: the target data can contain multiple seasonal signals. This function will only return the strongest one. E.g. if the target has both weekly and yearly seasonality, the function may return either “7” or “365”, depending on which seasonality is more strongly autocorrelated. If no period is detected, returns None.

Return type

int or None
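To make the idea concrete, here is a simplified autocorrelation-based period guess in plain Python. It is a sketch under stated assumptions, not the algorithm used by determine_periodicity: it ignores rel_max_order, treats any local maximum as a candidate, and the max_lag argument is an invented convenience.

```python
def autocorrelation(y, lag):
    """Sample autocorrelation of y at the given lag (0.0 for a constant series)."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    if denom == 0:
        return 0.0
    num = sum((y[t] - mean) * (y[t + lag] - mean) for t in range(n - lag))
    return num / denom


def guess_period(y, acf_threshold=0.01, max_lag=None):
    """Return the lag of the strongest autocorrelation peak, or None."""
    max_lag = max_lag or len(y) // 2
    acf = [autocorrelation(y, lag) for lag in range(max_lag + 1)]
    # Values below the threshold are treated as zero, as the docstring describes.
    acf = [a if a >= acf_threshold else 0.0 for a in acf]
    peaks = [
        lag
        for lag in range(1, max_lag)
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
    ]
    if not peaks:
        return None
    return max(peaks, key=lambda lag: acf[lag])


# Ten weeks of daily data with a spike on the first day of each week.
weekly = [6.0 if t % 7 == 0 else -1.0 for t in range(70)]
detected = guess_period(weekly)
```

When a series contains several seasonalities, a procedure like this returns only the lag with the strongest autocorrelation, which matches the note above about weekly versus yearly signals.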

fit(self, X: pandas.DataFrame, y: pandas.Series = None) → PolynomialDecomposer[source]#

Fits the PolynomialDecomposer and determines the seasonal signal.

Currently only fits the polynomial detrender. The seasonality is determined by removing the trend from the signal and using statsmodels’ seasonal_decompose(). Both the trend and seasonality are currently assumed to be additive.

Parameters
  • X (pd.DataFrame, optional) – Conditionally used to build datetime index.

  • y (pd.Series) – Target variable to detrend and deseasonalize.

Returns

self

Raises
  • NotImplementedError – If the input data has a frequency of “month-begin”. This isn’t supported by statsmodels decompose as the freqstr “MS” is misinterpreted as milliseconds.

  • ValueError – If y is None.

  • ValueError – If the target data doesn’t have a DatetimeIndex and the features data contains no Datetime features.

fit_transform(self, X: pandas.DataFrame, y: pandas.Series = None) → tuple[pandas.DataFrame, pandas.Series]#

Removes fitted trend and seasonality from target variable.

Parameters
  • X (pd.DataFrame, optional) – Ignored.

  • y (pd.Series) – Target variable to detrend and deseasonalize.

Returns

The first element is the input features returned without modification.

The second element is the target variable y with the fitted trend and seasonality removed.

Return type

tuple of pd.DataFrame, pd.Series

get_trend_dataframe(self, X: pandas.DataFrame, y: pandas.Series) → list[pandas.DataFrame][source]#

Return a list of dataframes with 4 columns: signal, trend, seasonality, residual.

sktime’s PolynomialTrendForecaster is used to generate the trend portion of the target data. statsmodels’ seasonal_decompose is used to generate the seasonality of the data.

Parameters
  • X (pd.DataFrame) – Input data with time series data in index.

  • y (pd.Series or pd.DataFrame) – Target variable data provided as a Series for univariate problems or a DataFrame for multivariate problems.

Returns

Each DataFrame contains the columns “signal”, “trend”, “seasonality” and “residual”, with the latter 3 column values being the decomposed elements of the target data. The “signal” column is simply the input target signal but reindexed with a datetime index to match the input features.

Return type

list of pd.DataFrame

Raises
  • TypeError – If X does not have time-series data in the index.

  • ValueError – If time series index of X does not have an inferred frequency.

  • ValueError – If the forecaster associated with the detrender has not been fit yet.

  • TypeError – If y is not provided as a pandas Series or DataFrame.

inverse_transform(self, y_t: pandas.Series) → tuple[pandas.DataFrame, pandas.Series][source]#

Adds back fitted trend and seasonality to target variable.

The polynomial trend is added back into the signal by calling the detrender’s inverse_transform(). Then, the seasonality is projected forward and added back into the signal.

Parameters

y_t (pd.Series) – Target variable.

Returns

The first element is the input features returned without modification.

The second element is the target variable y with the trend and seasonality added back in.

Return type

tuple of pd.DataFrame, pd.Series

Raises

ValueError – If y is None.
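The inverse step can be pictured as follows. This is a minimal sketch assuming a linear trend and a one-period seasonal pattern that were learned earlier; the function name and the start argument are hypothetical and do not mirror evalml's internals.

```python
def add_back(residual, start, intercept, slope, pattern):
    """Re-add a linear trend and a repeating seasonal pattern.

    `start` is the integer position of residual[0] relative to the first
    training observation, so later data continues the same line and cycle.
    """
    period = len(pattern)
    restored = []
    for i, r in enumerate(residual):
        t = start + i
        # Trend is evaluated at t; the seasonal phase is t modulo the period.
        restored.append(r + intercept + slope * t + pattern[t % period])
    return restored


# Four detrended, deseasonalized values that follow 12 training points.
restored = add_back([0.0, 0.0, 0.0, 0.0], start=12,
                    intercept=5.0, slope=2.0, pattern=[1.0, -2.0, 1.0])
```

Projecting the seasonality forward amounts to continuing the modulo arithmetic past the end of the training data.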

classmethod is_freq_valid(cls, freq: str)#

Determines if the given string represents a valid frequency for this decomposer.

Parameters

freq (str) – A frequency to validate. See the pandas docs at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases for options.

Returns

Boolean representing whether the frequency is valid or not.

static load(file_path)#

Loads component at file path.

Parameters

file_path (str) – Location to load file.

Returns

ComponentBase object

property parameters(self)#

Returns the parameters which were used to initialize the component.

plot_decomposition(self, X: pandas.DataFrame, y: Union[pandas.Series, pandas.DataFrame], show: bool = False) → Union[tuple[matplotlib.pyplot.Figure, list], dict[str, tuple[matplotlib.pyplot.Figure]]]#

Plots the decomposition of the target signal.

Parameters
  • X (pd.DataFrame) – Input data with time series data in index.

  • y (pd.Series or pd.DataFrame) – Target variable data provided as a Series for univariate problems or a DataFrame for multivariate problems.

  • show (bool) – Whether to display the plot or not. Defaults to False.

Returns

(Single series) matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes]: The figure and axes that have the decompositions plotted on them.

(Multi series) dict[str, (matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes])]: A dictionary that maps the series id to the figure and axes that have the decompositions plotted on them.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)#

Saves component at file path.

Parameters
  • file_path (str) – Location to save file.

  • pickle_protocol (int) – The pickle data stream format.

set_period(self, X: pandas.DataFrame, y: pandas.Series, acf_threshold: float = 0.01, rel_max_order: int = 5)#

Function to set the component’s seasonal period based on the target’s seasonality.

Parameters
  • X (pandas.DataFrame) – The feature data of the time series problem.

  • y (pandas.Series) – The target data of a time series problem.

  • acf_threshold (float) – The threshold for the autocorrelation function to determine the period. Any values below the threshold are considered to be 0 and will not be considered for the period. Defaults to 0.01.

  • rel_max_order (int) – The order of the relative maximum to determine the period. Defaults to 5.

transform(self, X: pandas.DataFrame, y: pandas.Series = None) → tuple[pandas.DataFrame, pandas.Series][source]#

Transforms the target data by removing the polynomial trend and rolling average seasonality.

Applies the fit polynomial detrender to the target data, removing the additive polynomial trend. Then, utilizes the first period’s worth of seasonal data determined in the .fit() function to extrapolate the seasonal signal of the data to be transformed. This seasonal signal is also assumed to be additive and is removed.

Parameters
  • X (pd.DataFrame, optional) – Conditionally used to build datetime index.

  • y (pd.Series) – Target variable to detrend and deseasonalize.

Returns

The input features are returned without modification. The target variable y is detrended and deseasonalized.

Return type

tuple of pd.DataFrame, pd.Series

Raises

ValueError – If the target data doesn’t have a DatetimeIndex and the features data contains no Datetime features.
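The role of the datetime index in this step can be illustrated with a short sketch: each new observation's date determines both where it falls on the trend line and which phase of the seasonal cycle applies. The code assumes daily data, a linear trend, and a 7-entry pattern learned during fitting. All names are hypothetical and the logic is a simplification of what the component does.

```python
from datetime import date, timedelta


def detrend_and_deseasonalize(dates, values, train_start, intercept, slope, pattern):
    """Subtract an additive linear trend and a repeating seasonal pattern."""
    period = len(pattern)
    cleaned = []
    for day, value in zip(dates, values):
        # Position of this observation relative to the start of training.
        t = (day - train_start).days
        trend = intercept + slope * t
        cleaned.append(value - trend - pattern[t % period])
    return cleaned


train_start = date(2023, 1, 2)
pattern = [3.0, 1.0, 0.0, -1.0, -1.0, -1.0, -1.0]  # one week, sums to zero
new_dates = [date(2023, 1, 30) + timedelta(days=i) for i in range(3)]
new_values = [27.0, 25.5, 25.0]
cleaned = detrend_and_deseasonalize(
    new_dates, new_values, train_start, intercept=10.0, slope=0.5, pattern=pattern
)
```

Without dates, there would be no way to tell which entry of the stored pattern lines up with the first row of the data being transformed.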

update_parameters(self, update_dict, reset_fit=True)#

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, will set _is_fitted to False.