time_series_regression_pipeline
==========================================================

.. py:module:: evalml.pipelines.time_series_regression_pipeline

.. autoapi-nested-parse::

   Pipeline base class for time series regression problems.


Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline


Contents
~~~~~~~~~~~~~~~~~~~

.. py:class:: TimeSeriesRegressionPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)

   Pipeline base class for time series regression problems.

   :param component_graph: ComponentGraph instance, list of components in order, or dictionary of components. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"].
   :type component_graph: ComponentGraph, list, dict
   :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary {} implies using all default values for component parameters. Pipeline-level parameters such as time_index, gap, and max_delay must be specified with the "pipeline" key. For example: Pipeline(parameters={"pipeline": {"time_index": "Date", "max_delay": 4, "gap": 2}}).
   :type parameters: dict
   :param random_seed: Seed for the random number generator. Defaults to 0.
   :type random_seed: int

   .. rubric:: Example

   >>> from evalml.pipelines import TimeSeriesRegressionPipeline
   >>> pipeline = TimeSeriesRegressionPipeline(component_graph=["Simple Imputer", "Linear Regressor"],
   ...                                         parameters={"Simple Imputer": {"impute_strategy": "mean"},
   ...                                                     "pipeline": {"gap": 1, "max_delay": 1, "forecast_horizon": 1, "time_index": "date"}},
   ...                                         custom_name="My TimeSeriesRegression Pipeline")
   ...
   >>> assert pipeline.custom_name == "My TimeSeriesRegression Pipeline"
   >>> assert pipeline.component_graph.component_dict.keys() == {'Simple Imputer', 'Linear Regressor'}

   The pipeline parameters will be chosen from the default parameters for every component, unless specific parameters
   were passed in as they were above.

   >>> assert pipeline.parameters == {
   ...     'Simple Imputer': {'impute_strategy': 'mean', 'fill_value': None},
   ...     'Linear Regressor': {'fit_intercept': True, 'n_jobs': -1},
   ...     'pipeline': {'gap': 1, 'max_delay': 1, 'forecast_horizon': 1, 'time_index': "date"}}

   **Attributes**

   .. list-table::
      :widths: 15 85
      :header-rows: 0

      * - **NO_PREDS_PI_ESTIMATORS**
        - ProblemTypes.TIME_SERIES_REGRESSION
      * - **problem_type**
        - None

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.can_tune_threshold_with_objective
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.clone
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.create_objectives
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.custom_name
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.dates_needed_for_prediction
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.dates_needed_for_prediction_range
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.describe
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.feature_importance
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.fit
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.fit_transform
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.get_component
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.get_forecast_period
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.get_forecast_predictions
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.get_hyperparameter_ranges
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.get_prediction_intervals
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.graph
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.graph_dict
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.graph_feature_importance
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.inverse_transform
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.load
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.model_family
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.name
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.new
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.parameters
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.predict
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.predict_in_sample
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.save
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.score
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.summary
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.transform
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.transform_all_but_final

   .. py:method:: can_tune_threshold_with_objective(self, objective)

      Determine whether the threshold of a binary classification pipeline can be tuned.

      :param objective: Primary AutoMLSearch objective.
      :type objective: ObjectiveBase

      :returns: True if the pipeline threshold can be tuned.
      :rtype: bool

   .. py:method:: clone(self)

      Constructs a new pipeline with the same components, parameters, and random seed.

      :returns: A new instance of this pipeline with identical components, parameters, and random seed.

   .. py:method:: create_objectives(objectives)
      :staticmethod:

      Create objective instances from a list of strings or objective classes.

   .. py:method:: custom_name(self)
      :property:

      Custom name of the pipeline.

   .. py:method:: dates_needed_for_prediction(self, date)

      Return dates needed to forecast the given date in the future.

      :param date: Date to forecast in the future.
      :type date: pd.Timestamp

      :returns: Range of dates needed to forecast the given date.
      :rtype: dates_needed (tuple(pd.Timestamp))

   .. py:method:: dates_needed_for_prediction_range(self, start_date, end_date)

      Return dates needed to forecast the given date range in the future.

      :param start_date: Start date of range to forecast in the future.
      :type start_date: pd.Timestamp
      :param end_date: End date of range to forecast in the future.
      :type end_date: pd.Timestamp

      :returns: Range of dates needed to forecast the given date range.
      :rtype: dates_needed (tuple(pd.Timestamp))

      :raises ValueError: If start_date doesn't come before end_date.

   .. py:method:: describe(self, return_dict=False)

      Outputs pipeline details including component parameters.

      :param return_dict: If True, return dictionary of information about pipeline. Defaults to False.
      :type return_dict: bool

      :returns: Dictionary of all component parameters if return_dict is True, else None.
      :rtype: dict

   .. py:method:: feature_importance(self)
      :property:

      Importance associated with each feature. Features dropped by the feature selection are excluded.

      :returns: Feature names and their corresponding importance.
      :rtype: pd.DataFrame

   .. py:method:: fit(self, X, y)

      Fit a time series pipeline.

      :param X: The input training data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: The training targets of length [n_samples].
      :type y: pd.Series, np.ndarray

      :returns: self

      :raises ValueError: If the target is not numeric.

   .. py:method:: fit_transform(self, X, y)

      Fit and transform all components in the component graph, if all components are Transformers.

      :param X: Input features of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param y: The target data of length [n_samples].
      :type y: pd.Series

      :returns: Transformed output.
      :rtype: pd.DataFrame

      :raises ValueError: If final component is an Estimator.

   .. py:method:: get_component(self, name)

      Returns component by name.

      :param name: Name of component.
      :type name: str

      :returns: Component to return.
      :rtype: Component

   .. py:method:: get_forecast_period(self, X)

      Generates all possible forecasting time points based on latest data point in X.

      :param X: Data the pipeline was trained on of shape [n_samples_train, n_features].
      :type X: pd.DataFrame, np.ndarray

      :raises ValueError: If pipeline is not trained.

      :returns: Datetime periods from `gap` to `forecast_horizon + gap`.
      :rtype: pd.Series

      .. rubric:: Example

      >>> import pandas as pd
      >>> from evalml.pipelines import TimeSeriesRegressionPipeline
      >>> X = pd.DataFrame({'date': pd.date_range(start='1-1-2022', periods=10, freq='D'), 'feature': range(10, 20)})
      >>> y = pd.Series(range(0, 10), name='target')
      >>> gap = 1
      >>> forecast_horizon = 2
      >>> pipeline = TimeSeriesRegressionPipeline(component_graph=["Linear Regressor"],
      ...                                         parameters={"Simple Imputer": {"impute_strategy": "mean"},
      ...                                                     "pipeline": {"gap": gap, "max_delay": 1, "forecast_horizon": forecast_horizon, "time_index": "date"}},
      ...                                        )
      >>> pipeline.fit(X, y)
      pipeline = TimeSeriesRegressionPipeline(component_graph={'Linear Regressor': ['Linear Regressor', 'X', 'y']}, parameters={'Linear Regressor':{'fit_intercept': True, 'n_jobs': -1}, 'pipeline':{'gap': 1, 'max_delay': 1, 'forecast_horizon': 2, 'time_index': 'date'}}, random_seed=0)
      >>> dates = pipeline.get_forecast_period(X)
      >>> expected = pd.Series(pd.date_range(start='2022-01-11', periods=forecast_horizon, freq='D').shift(gap), name='date', index=[10, 11])
      >>> assert dates.equals(expected)

   .. py:method:: get_forecast_predictions(self, X, y)

      Generates all possible forecasting predictions based on last period of X.

      :param X: Data the pipeline was trained on of shape [n_samples_train, n_features].
      :type X: pd.DataFrame, np.ndarray
      :param y: Targets used to train the pipeline of shape [n_samples_train].
      :type y: pd.Series, np.ndarray

      :returns: Predictions from `gap` periods out to `forecast_horizon + gap` periods.

   .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters)

      Returns hyperparameter ranges from all components as a dictionary.

      :param custom_hyperparameters: Custom hyperparameters for the pipeline.
      :type custom_hyperparameters: dict

      :returns: Dictionary of hyperparameter ranges for each component in the pipeline.
      :rtype: dict

   .. py:method:: get_prediction_intervals(self, X, y=None, X_train=None, y_train=None, coverage=None)

      Find the prediction intervals using the fitted regressor.

      This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across
      all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent
      point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation.

      Certain estimators (Extra Trees Estimator, XGBoost Estimator, Prophet Estimator, ARIMA, and
      Exponential Smoothing estimator) utilize a different methodology to calculate prediction intervals.
      See the docs for these estimators to learn more.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param y: Target data.
      :type y: pd.Series
      :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features].
      :type X_train: pd.DataFrame, np.ndarray
      :param y_train: Targets used to train the pipeline of shape [n_samples_train].
      :type y_train: pd.Series, np.ndarray
      :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the
                       prediction interval should be calculated for.
      :type coverage: list[float]

      :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
      :rtype: dict

      :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type.

   .. py:method:: graph(self, filepath=None)

      Generate an image representing the pipeline graph.

      :param filepath: Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
      :type filepath: str, optional

      :returns: Graph object that can be directly displayed in Jupyter notebooks.
      :rtype: graphviz.Digraph

      :raises RuntimeError: If graphviz is not installed.
      :raises ValueError: If path is not writeable.

   .. py:method:: graph_dict(self)

      Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases.

      x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed.
      This can be used to build graphs across a variety of visualization tools.
      Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]}

      :returns: A dictionary representing the DAG structure.
      :rtype: dag_dict (dict)

   .. py:method:: graph_feature_importance(self, importance_threshold=0)

      Generate a bar graph of the pipeline's feature importance.

      :param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero.
      :type importance_threshold: float, optional

      :returns: A bar graph showing features and their corresponding importance.
      :rtype: plotly.Figure

      :raises ValueError: If importance threshold is not valid.

   .. py:method:: inverse_transform(self, y)

      Apply component inverse_transform methods to estimator predictions in reverse order.

      Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd).

      :param y: Final component features.
      :type y: pd.Series

      :returns: The inverse transform of the target.
      :rtype: pd.Series

   .. py:method:: load(file_path: Union[str, io.BytesIO])
      :staticmethod:

      Loads pipeline at file path.

      :param file_path: Filepath to load from, or a BytesIO object.
      :type file_path: str|BytesIO

      :returns: PipelineBase object

   .. py:method:: model_family(self)
      :property:

      Returns model family of this pipeline.

   .. py:method:: name(self)
      :property:

      Name of the pipeline.

   .. py:method:: new(self, parameters, random_seed=0)

      Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with python's __new__ method.

      :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values.
                         An empty dictionary or None implies using all default values for component parameters. Defaults to None.
      :type parameters: dict
      :param random_seed: Seed for the random number generator. Defaults to 0.
      :type random_seed: int

      :returns: A new instance of this pipeline with identical components.

   .. py:method:: parameters(self)
      :property:

      Parameter dictionary for this pipeline.

      :returns: Dictionary of all component parameters.
      :rtype: dict

   .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None)

      Predict on future data where target is not known.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param objective: The objective to use to make predictions.
      :type objective: Object or string
      :param X_train: Training data.
      :type X_train: pd.DataFrame or np.ndarray or None
      :param y_train: Training labels.
      :type y_train: pd.Series or None

      :raises ValueError: If X_train and/or y_train are None or if final component is not an Estimator.

      :returns: Predictions.

   .. py:method:: predict_in_sample(self, X, y, X_train, y_train, objective=None, calculating_residuals=False)

      Predict on future data where the target is known, e.g. cross validation.

      :param X: Future data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: Future target of shape [n_samples].
      :type y: pd.Series, np.ndarray
      :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features].
      :type X_train: pd.DataFrame, np.ndarray
      :param y_train: Targets used to train the pipeline of shape [n_samples_train].
      :type y_train: pd.Series, np.ndarray
      :param objective: Objective used to threshold predicted probabilities, optional.
      :type objective: ObjectiveBase, str, None
      :param calculating_residuals: Whether we're calling predict_in_sample to calculate the residuals. This means
                                    the X and y arguments are not future data, but actually the train data.
      :type calculating_residuals: bool

      :returns: Estimated labels.
      :rtype: pd.Series

      :raises ValueError: If final component is not an Estimator.

   .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

      Saves pipeline at file path.

      :param file_path: Location to save file.
      :type file_path: str
      :param pickle_protocol: The pickle data stream format.
      :type pickle_protocol: int

   .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None)

      Evaluate model performance on current and additional objectives.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: True labels of length [n_samples].
      :type y: pd.Series
      :param objectives: Non-empty list of objectives to score on.
      :type objectives: list
      :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features].
      :type X_train: pd.DataFrame, np.ndarray
      :param y_train: Targets used to train the pipeline of shape [n_samples_train].
      :type y_train: pd.Series, np.ndarray

      :returns: Ordered dictionary of objective scores.
      :rtype: dict

   .. py:method:: summary(self)
      :property:

      A short summary of the pipeline structure, describing the list of components used.

      Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder

      :returns: A string describing the pipeline structure.

   .. py:method:: transform(self, X, y=None)

      Transform the input.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: The target data of length [n_samples]. Defaults to None.
      :type y: pd.Series

      :returns: Transformed output.
      :rtype: pd.DataFrame

   .. py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None, calculating_residuals=False)

      Transforms the data by applying all pre-processing components.

      :param X: Input data to the pipeline to transform.
      :type X: pd.DataFrame
      :param y: Targets corresponding to the pipeline targets.
      :type y: pd.Series
      :param X_train: Training data used to generate features from past observations.
      :type X_train: pd.DataFrame
      :param y_train: Training targets used to generate features from past observations.
      :type y_train: pd.Series
      :param calculating_residuals: Whether we're calling predict_in_sample to calculate the residuals. This means
                                    the X and y arguments are not future data, but actually the train data.
      :type calculating_residuals: bool

      :returns: New transformed features.
      :rtype: pd.DataFrame
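
The forecast window documented for ``get_forecast_period`` (periods from ``gap`` to ``forecast_horizon + gap`` after the last observation) can be illustrated without evalml. The following is a minimal sketch using only the Python standard library; ``forecast_dates`` is a hypothetical helper, not part of evalml, and it assumes evenly spaced data with a daily step by default. It reproduces the dates expected in the ``get_forecast_period`` example above, but it is not how the library computes them.

```python
from datetime import date, timedelta


def forecast_dates(last_observed, gap, forecast_horizon, step=timedelta(days=1)):
    """Hypothetical helper: dates covered by a forecast made after ``last_observed``.

    Assumes a fixed step between observations (one day by default). The first
    forecast date lies ``gap + 1`` steps after the last observation, and
    ``forecast_horizon`` dates are returned in total.
    """
    if gap < 0 or forecast_horizon < 1:
        raise ValueError("gap must be >= 0 and forecast_horizon must be >= 1")
    return [last_observed + (gap + i) * step for i in range(1, forecast_horizon + 1)]


# Same setup as the get_forecast_period example: ten daily rows ending 2022-01-10,
# with gap=1 and forecast_horizon=2.
dates = forecast_dates(date(2022, 1, 10), gap=1, forecast_horizon=2)
print(dates)  # [datetime.date(2022, 1, 12), datetime.date(2022, 1, 13)]
```

With ``gap=0`` the window would start on the day immediately after the last observation; a larger ``gap`` shifts the whole window later without changing its length.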