time_series_regression_pipeline
==========================================================

.. py:module:: evalml.pipelines.time_series_regression_pipeline

.. autoapi-nested-parse::

   Pipeline base class for time series regression problems.



Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline


Contents
~~~~~~~~~~~~~~~~~~~

.. py:class:: TimeSeriesRegressionPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)

   Pipeline base class for time series regression problems.

   :param component_graph: ComponentGraph instance, list of components in order, or dictionary of components. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"]
   :type component_graph: ComponentGraph, list, dict
   :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary {} implies using all default values for component parameters. Pipeline-level parameters such as time_index, gap, and max_delay must be specified with the "pipeline" key. For example: Pipeline(parameters={"pipeline": {"time_index": "Date", "max_delay": 4, "gap": 2}}).
   :type parameters: dict
   :param random_seed: Seed for the random number generator. Defaults to 0.
   :type random_seed: int

   .. rubric:: Example

   >>> pipeline = TimeSeriesRegressionPipeline(component_graph=["Simple Imputer", "Linear Regressor"],
   ...                                         parameters={"Linear Regressor": {"normalize": True},
   ...                                                     "pipeline": {"gap": 1, "max_delay": 1, "forecast_horizon": 1, "time_index": "date"}},
   ...                                         custom_name="My TimeSeriesRegression Pipeline")
   ...
   >>> assert pipeline.custom_name == "My TimeSeriesRegression Pipeline"
   >>> assert pipeline.component_graph.component_dict.keys() == {'Simple Imputer', 'Linear Regressor'}

   The pipeline parameters will be chosen from the default parameters for every component, unless specific parameters were passed in as they were above.

   >>> assert pipeline.parameters == {
   ...     'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None},
   ...     'Linear Regressor': {'fit_intercept': True, 'normalize': True, 'n_jobs': -1},
   ...     'pipeline': {'gap': 1, 'max_delay': 1, 'forecast_horizon': 1, 'time_index': "date"}}
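   Once constructed, a time series pipeline is fit on a training window and evaluated on a later holdout window, with the training data passed back in at scoring time so delayed features can be computed. The sketch below is illustrative only and not part of the generated docstring: the component graph mirrors the example above, while the toy data, split point, pipeline parameters, and the "R2" objective name are assumptions.

   .. code-block:: python

      import pandas as pd
      from evalml.pipelines import TimeSeriesRegressionPipeline

      # Toy daily series with one numeric feature (illustrative only).
      X = pd.DataFrame({"date": pd.date_range("2022-01-01", periods=40, freq="D"),
                        "feature": range(40)})
      y = pd.Series(range(40), name="target", dtype="float64")

      # Train on the first 35 rows, hold out the last 5.
      X_train, y_train = X.iloc[:35], y.iloc[:35]
      X_holdout, y_holdout = X.iloc[35:], y.iloc[35:]

      pipeline = TimeSeriesRegressionPipeline(
          component_graph=["Simple Imputer", "Linear Regressor"],
          parameters={"pipeline": {"gap": 0, "max_delay": 2, "forecast_horizon": 5, "time_index": "date"}},
      )
      pipeline.fit(X_train, y_train)

      # Training data is required again at scoring time to build delayed features.
      scores = pipeline.score(X_holdout, y_holdout, objectives=["R2"],
                              X_train=X_train, y_train=y_train)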
   **Attributes**

   .. list-table::
      :widths: 15 85
      :header-rows: 0

      * - **problem_type**
        - ProblemTypes.TIME_SERIES_REGRESSION

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.can_tune_threshold_with_objective
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.clone
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.create_objectives
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.custom_name
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.describe
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.feature_importance
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.fit
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.fit_transform
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.get_component
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.get_forecast_period
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.get_forecast_predictions
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.get_hyperparameter_ranges
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.graph
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.graph_dict
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.graph_feature_importance
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.inverse_transform
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.load
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.model_family
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.name
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.new
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.parameters
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.predict
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.predict_in_sample
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.save
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.score
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.summary
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.transform
      evalml.pipelines.time_series_regression_pipeline.TimeSeriesRegressionPipeline.transform_all_but_final

   .. py:method:: can_tune_threshold_with_objective(self, objective)

      Determine whether the threshold of a binary classification pipeline can be tuned.

      :param objective: Primary AutoMLSearch objective.
      :type objective: ObjectiveBase

      :returns: True if the pipeline threshold can be tuned.
      :rtype: bool


   .. py:method:: clone(self)

      Constructs a new pipeline with the same components, parameters, and random seed.

      :returns: A new instance of this pipeline with identical components, parameters, and random seed.


   .. py:method:: create_objectives(objectives)
      :staticmethod:

      Create objective instances from a list of strings or objective classes.


   .. py:method:: custom_name(self)
      :property:

      Custom name of the pipeline.


   .. py:method:: describe(self, return_dict=False)

      Outputs pipeline details including component parameters.

      :param return_dict: If True, return dictionary of information about pipeline. Defaults to False.
      :type return_dict: bool

      :returns: Dictionary of all component parameters if return_dict is True, else None.
      :rtype: dict


   .. py:method:: feature_importance(self)
      :property:

      Importance associated with each feature. Features dropped by the feature selection are excluded.

      :returns: Feature names and their corresponding importance
      :rtype: pd.DataFrame
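      As a quick illustrative sketch (an assumption, not generated documentation), the importances of a fitted pipeline could be inspected as follows, reusing the ``pipeline`` name from the class-level sketch near the top of this page.

      .. code-block:: python

         # Assumes `pipeline` has already been fit, as in the sketch above.
         importances = pipeline.feature_importance

         # A pd.DataFrame mapping feature names to their importance values;
         # features dropped by feature selection do not appear.
         print(importances.head())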
   .. py:method:: fit(self, X, y)

      Fit a time series pipeline.

      :param X: The input training data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: The target training data of length [n_samples].
      :type y: pd.Series, np.ndarray

      :returns: self

      :raises ValueError: If the target is not numeric.


   .. py:method:: fit_transform(self, X, y)

      Fit and transform all components in the component graph, if all components are Transformers.

      :param X: Input features of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param y: The target data of length [n_samples].
      :type y: pd.Series

      :returns: Transformed output.
      :rtype: pd.DataFrame

      :raises ValueError: If final component is an Estimator.


   .. py:method:: get_component(self, name)

      Returns component by name.

      :param name: Name of component.
      :type name: str

      :returns: Component to return
      :rtype: Component


   .. py:method:: get_forecast_period(self, X)

      Generates all possible forecasting time points based on latest data point in X.

      :param X: Data the pipeline was trained on of shape [n_samples_train, n_features].
      :type X: pd.DataFrame, np.ndarray

      :raises ValueError: If pipeline is not trained.

      :returns: Datetime periods out to `forecast_horizon + gap`.
      :rtype: pd.Series

      .. rubric:: Example

      >>> X = pd.DataFrame({'date': pd.date_range(start='1-1-2022', periods=10, freq='D'), 'feature': range(10, 20)})
      >>> y = pd.Series(range(0, 10), name='target')
      >>> gap = 1
      >>> forecast_horizon = 2
      >>> pipeline = TimeSeriesRegressionPipeline(component_graph=["Linear Regressor"],
      ...                                         parameters={"Linear Regressor": {"normalize": True},
      ...                                                     "pipeline": {"gap": gap, "max_delay": 1, "forecast_horizon": forecast_horizon, "time_index": "date"}},
      ...                                         )
      >>> pipeline.fit(X, y)
      pipeline = TimeSeriesRegressionPipeline(component_graph={'Linear Regressor': ['Linear Regressor', 'X', 'y']}, parameters={'Linear Regressor':{'fit_intercept': True, 'normalize': True, 'n_jobs': -1}, 'pipeline':{'gap': 1, 'max_delay': 1, 'forecast_horizon': 2, 'time_index': 'date'}}, random_seed=0)
      >>> dates = pipeline.get_forecast_period(X)
      >>> expected = pd.Series(pd.date_range(start='2022-01-11', periods=(gap + forecast_horizon), freq='D'), name='date', index=[10, 11, 12])
      >>> assert dates.equals(expected)


   .. py:method:: get_forecast_predictions(self, X, y)

      Generates all possible forecasting predictions based on last period of X.

      :param X: Data the pipeline was trained on of shape [n_samples_train, n_features].
      :type X: pd.DataFrame, np.ndarray
      :param y: Targets used to train the pipeline of shape [n_samples_train].
      :type y: pd.Series, np.ndarray

      :returns: Predictions out to `forecast_horizon + gap` periods.
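      Continuing the ``get_forecast_period`` example above as an illustrative sketch (not generated documentation), the matching predictions for those future periods could be obtained like so:

      .. code-block:: python

         # `pipeline`, `X`, and `y` are the fitted pipeline and training data
         # from the get_forecast_period example above.
         future_preds = pipeline.get_forecast_predictions(X, y)

         # One prediction per period out to forecast_horizon + gap,
         # aligned with the dates returned by get_forecast_period(X).
         print(future_preds)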
   .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters)

      Returns hyperparameter ranges from all components as a dictionary.

      :param custom_hyperparameters: Custom hyperparameters for the pipeline.
      :type custom_hyperparameters: dict

      :returns: Dictionary of hyperparameter ranges for each component in the pipeline.
      :rtype: dict


   .. py:method:: graph(self, filepath=None)

      Generate an image representing the pipeline graph.

      :param filepath: Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
      :type filepath: str, optional

      :returns: Graph object that can be directly displayed in Jupyter notebooks.
      :rtype: graphviz.Digraph

      :raises RuntimeError: If graphviz is not installed.
      :raises ValueError: If path is not writeable.


   .. py:method:: graph_dict(self)

      Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases.

      x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools.

      Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]}

      :returns: A dictionary representing the DAG structure.
      :rtype: dag_dict (dict)


   .. py:method:: graph_feature_importance(self, importance_threshold=0)

      Generate a bar graph of the pipeline's feature importance.

      :param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero.
      :type importance_threshold: float, optional

      :returns: A bar graph showing features and their corresponding importance.
      :rtype: plotly.Figure

      :raises ValueError: If importance threshold is not valid.


   .. py:method:: inverse_transform(self, y)

      Apply component inverse_transform methods to estimator predictions in reverse order.

      Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd).

      :param y: Final component features.
      :type y: pd.Series

      :returns: The inverse transform of the target.
      :rtype: pd.Series


   .. py:method:: load(file_path)
      :staticmethod:

      Loads pipeline at file path.

      :param file_path: Location to load file.
      :type file_path: str

      :returns: PipelineBase object


   .. py:method:: model_family(self)
      :property:

      Returns model family of this pipeline.


   .. py:method:: name(self)
      :property:

      Name of the pipeline.


   .. py:method:: new(self, parameters, random_seed=0)

      Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with python's __new__ method.

      :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None.
      :type parameters: dict
      :param random_seed: Seed for the random number generator. Defaults to 0.
      :type random_seed: int

      :returns: A new instance of this pipeline with identical components.


   .. py:method:: parameters(self)
      :property:

      Parameter dictionary for this pipeline.

      :returns: Dictionary of all component parameters.
      :rtype: dict


   .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None)

      Predict on future data where target is not known.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame, or np.ndarray
      :param objective: The objective to use to make predictions.
      :type objective: Object or string
      :param X_train: Training data.
      :type X_train: pd.DataFrame or np.ndarray or None
      :param y_train: Training labels.
      :type y_train: pd.Series or None

      :raises ValueError: If X_train and/or y_train are None or if final component is not an Estimator.

      :returns: Predictions.
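      As an illustrative sketch (an assumption rather than generated documentation), predicting on a window of future data reuses the training data so that delayed features can be built; the names below continue the class-level sketch near the top of this page.

      .. code-block:: python

         # `pipeline`, `X_train`, `y_train`, and `X_holdout` come from the
         # class-level sketch above; `X_holdout` covers the periods directly
         # after the training window, within forecast_horizon + gap.
         preds = pipeline.predict(X_holdout, X_train=X_train, y_train=y_train)

         # Omitting X_train/y_train raises a ValueError for time series pipelines.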
   .. py:method:: predict_in_sample(self, X, y, X_train, y_train, objective=None)

      Predict on future data where the target is known, e.g. cross validation.

      :param X: Future data of shape [n_samples, n_features]
      :type X: pd.DataFrame or np.ndarray
      :param y: Future target of shape [n_samples]
      :type y: pd.Series, np.ndarray
      :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]
      :type X_train: pd.DataFrame, np.ndarray
      :param y_train: Targets used to train the pipeline of shape [n_samples_train]
      :type y_train: pd.Series, np.ndarray
      :param objective: Objective used to threshold predicted probabilities, optional.
      :type objective: ObjectiveBase, str, None

      :returns: Estimated target values.
      :rtype: pd.Series

      :raises ValueError: If final component is not an Estimator.


   .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

      Saves pipeline at file path.

      :param file_path: Location to save file.
      :type file_path: str
      :param pickle_protocol: The pickle data stream format.
      :type pickle_protocol: int


   .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None)

      Evaluate model performance on current and additional objectives.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: True labels of length [n_samples].
      :type y: pd.Series
      :param objectives: Non-empty list of objectives to score on.
      :type objectives: list
      :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features].
      :type X_train: pd.DataFrame, np.ndarray
      :param y_train: Targets used to train the pipeline of shape [n_samples_train].
      :type y_train: pd.Series, np.ndarray

      :returns: Ordered dictionary of objective scores.
      :rtype: dict


   .. py:method:: summary(self)
      :property:

      A short summary of the pipeline structure, describing the list of components used.

      Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder

      :returns: A string describing the pipeline structure.


   .. py:method:: transform(self, X, y=None)

      Transform the input.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame, or np.ndarray
      :param y: The target data of length [n_samples]. Defaults to None.
      :type y: pd.Series

      :returns: Transformed output.
      :rtype: pd.DataFrame


   .. py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None)

      Transforms the data by applying all pre-processing components.

      :param X: Input data to the pipeline to transform.
      :type X: pd.DataFrame
      :param y: Targets corresponding to the input data.
      :type y: pd.Series
      :param X_train: Training data used to generate features from past observations.
      :type X_train: pd.DataFrame
      :param y_train: Training targets used to generate features from past observations.
      :type y_train: pd.Series

      :returns: New transformed features.
      :rtype: pd.DataFrame
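      As a final illustrative sketch (assumed usage, not generated documentation), the feature matrix handed to the final estimator, including any delayed features derived from the training window, can be inspected like this, reusing the names from the class-level sketch above.

      .. code-block:: python

         # Produces the transformed features the final estimator would receive
         # for the holdout window, built with the help of the training data.
         engineered = pipeline.transform_all_but_final(X_holdout, y_holdout,
                                                       X_train=X_train, y_train=y_train)
         print(engineered.columns.tolist())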