Pipelines ========================== .. py:module:: evalml.pipelines .. autoapi-nested-parse:: EvalML pipelines. Subpackages ----------- .. toctree:: :titlesonly: :maxdepth: 3 components/index.rst Submodules ---------- .. toctree:: :titlesonly: :maxdepth: 1 binary_classification_pipeline/index.rst binary_classification_pipeline_mixin/index.rst classification_pipeline/index.rst component_graph/index.rst multiclass_classification_pipeline/index.rst pipeline_base/index.rst pipeline_meta/index.rst regression_pipeline/index.rst time_series_classification_pipelines/index.rst time_series_pipeline_base/index.rst time_series_regression_pipeline/index.rst utils/index.rst Package Contents ---------------- Classes Summary ~~~~~~~~~~~~~~~ .. autoapisummary:: evalml.pipelines.ARIMARegressor evalml.pipelines.BinaryClassificationPipeline evalml.pipelines.CatBoostClassifier evalml.pipelines.CatBoostRegressor evalml.pipelines.ClassificationPipeline evalml.pipelines.ComponentGraph evalml.pipelines.DecisionTreeClassifier evalml.pipelines.DecisionTreeRegressor evalml.pipelines.DFSTransformer evalml.pipelines.DropNaNRowsTransformer evalml.pipelines.ElasticNetClassifier evalml.pipelines.ElasticNetRegressor evalml.pipelines.Estimator evalml.pipelines.ExponentialSmoothingRegressor evalml.pipelines.ExtraTreesClassifier evalml.pipelines.ExtraTreesRegressor evalml.pipelines.FeatureSelector evalml.pipelines.Imputer evalml.pipelines.KNeighborsClassifier evalml.pipelines.LightGBMClassifier evalml.pipelines.LightGBMRegressor evalml.pipelines.LinearRegressor evalml.pipelines.LogisticRegressionClassifier evalml.pipelines.MulticlassClassificationPipeline evalml.pipelines.OneHotEncoder evalml.pipelines.OrdinalEncoder evalml.pipelines.PerColumnImputer evalml.pipelines.PipelineBase evalml.pipelines.ProphetRegressor evalml.pipelines.RandomForestClassifier evalml.pipelines.RandomForestRegressor evalml.pipelines.RegressionPipeline evalml.pipelines.RFClassifierSelectFromModel 
evalml.pipelines.RFRegressorSelectFromModel evalml.pipelines.SimpleImputer evalml.pipelines.StackedEnsembleBase evalml.pipelines.StackedEnsembleClassifier evalml.pipelines.StackedEnsembleRegressor evalml.pipelines.StandardScaler evalml.pipelines.SVMClassifier evalml.pipelines.SVMRegressor evalml.pipelines.TargetEncoder evalml.pipelines.TimeSeriesBinaryClassificationPipeline evalml.pipelines.TimeSeriesClassificationPipeline evalml.pipelines.TimeSeriesFeaturizer evalml.pipelines.TimeSeriesImputer evalml.pipelines.TimeSeriesMulticlassClassificationPipeline evalml.pipelines.TimeSeriesRegressionPipeline evalml.pipelines.TimeSeriesRegularizer evalml.pipelines.Transformer evalml.pipelines.VowpalWabbitBinaryClassifier evalml.pipelines.VowpalWabbitMulticlassClassifier evalml.pipelines.VowpalWabbitRegressor evalml.pipelines.XGBoostClassifier evalml.pipelines.XGBoostRegressor Contents ~~~~~~~~~~~~~~~~~~~ .. py:class:: ARIMARegressor(time_index: Optional[Hashable] = None, trend: Optional[str] = None, start_p: int = 2, d: int = 0, start_q: int = 2, max_p: int = 5, max_d: int = 2, max_q: int = 5, seasonal: bool = True, sp: int = 1, n_jobs: int = -1, random_seed: Union[int, float] = 0, maxiter: int = 10, use_covariates: bool = True, **kwargs) Autoregressive Integrated Moving Average Model. The three parameters (p, d, q) are the AR order, the degree of differencing, and the MA order. More information here: https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima.model.ARIMA.html. Currently ARIMARegressor isn't supported via conda install. It's recommended that it be installed via PyPI. :param time_index: Specifies the name of the column in X that provides the datetime objects. Defaults to None. :type time_index: str :param trend: Controls the deterministic trend. Options are ['n', 'c', 't', 'ct'] where 'c' is a constant term, 't' indicates a linear trend, and 'ct' is both. Can also be an iterable when defining a polynomial, such as [1, 1, 0, 1]. 
:type trend: str :param start_p: Minimum Autoregressive order. Defaults to 2. :type start_p: int :param d: Minimum Differencing degree. Defaults to 0. :type d: int :param start_q: Minimum Moving Average order. Defaults to 2. :type start_q: int :param max_p: Maximum Autoregressive order. Defaults to 5. :type max_p: int :param max_d: Maximum Differencing degree. Defaults to 2. :type max_d: int :param max_q: Maximum Moving Average order. Defaults to 5. :type max_q: int :param seasonal: Whether to fit a seasonal model to ARIMA. Defaults to True. :type seasonal: boolean :param sp: Period for seasonal differencing, specifically the number of periods in each season. If "detect", this model will automatically detect this parameter (provided the time series has a standard frequency) and will fall back to 1 (no seasonality) if it cannot be detected. Defaults to 1. :type sp: int or str :param n_jobs: Integer describing the level of parallelism used for pipelines. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "start_p": Integer(1, 3), "d": Integer(0, 2), "start_q": Integer(1, 3), "max_p": Integer(3, 10), "max_d": Integer(2, 5), "max_q": Integer(3, 10), "seasonal": [True, False],} * - **max_cols** - 7 * - **max_rows** - 1000 * - **model_family** - ModelFamily.ARIMA * - **modifies_features** - True * - **modifies_target** - False * - **name** - ARIMA Regressor * - **supported_problem_types** - [ProblemTypes.TIME_SERIES_REGRESSION] * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.ARIMARegressor.clone evalml.pipelines.ARIMARegressor.default_parameters evalml.pipelines.ARIMARegressor.describe evalml.pipelines.ARIMARegressor.feature_importance evalml.pipelines.ARIMARegressor.fit evalml.pipelines.ARIMARegressor.get_prediction_intervals evalml.pipelines.ARIMARegressor.load evalml.pipelines.ARIMARegressor.needs_fitting evalml.pipelines.ARIMARegressor.parameters evalml.pipelines.ARIMARegressor.predict evalml.pipelines.ARIMARegressor.predict_proba evalml.pipelines.ARIMARegressor.save evalml.pipelines.ARIMARegressor.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) -> numpy.ndarray :property: Returns array of 0's with a length of 1 as feature_importance is not defined for ARIMA regressor. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits ARIMA regressor to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self :raises ValueError: If y was not passed in. .. 
py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: pandas.Series = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted ARIMARegressor. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Optional. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Not used for ARIMA regressor. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) -> pandas.Series Make predictions using fitted ARIMA regressor. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Predicted values. :rtype: pd.Series :raises ValueError: If X was passed to `fit` but not passed in `predict`. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. 
:rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: BinaryClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0) Pipeline subclass for all binary classification pipelines. :param component_graph: ComponentGraph instance, list of components in order, or dictionary of components. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"] :type component_graph: ComponentGraph, list, dict :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param custom_name: Custom name for the pipeline. Defaults to None. :type custom_name: str :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int .. rubric:: Example >>> pipeline = BinaryClassificationPipeline(component_graph=["Simple Imputer", "Logistic Regression Classifier"], ... 
parameters={"Logistic Regression Classifier": {"penalty": "elasticnet", ... "solver": "liblinear"}}, ... custom_name="My Binary Pipeline") ... >>> assert pipeline.custom_name == "My Binary Pipeline" >>> assert pipeline.component_graph.component_dict.keys() == {'Simple Imputer', 'Logistic Regression Classifier'} The pipeline parameters will be chosen from the default parameters for every component, unless specific parameters were passed in as they were above. >>> assert pipeline.parameters == { ... 'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None}, ... 'Logistic Regression Classifier': {'penalty': 'elasticnet', ... 'C': 1.0, ... 'n_jobs': -1, ... 'multi_class': 'auto', ... 'solver': 'liblinear'}} **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **problem_type** - ProblemTypes.BINARY **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.BinaryClassificationPipeline.can_tune_threshold_with_objective evalml.pipelines.BinaryClassificationPipeline.classes_ evalml.pipelines.BinaryClassificationPipeline.clone evalml.pipelines.BinaryClassificationPipeline.create_objectives evalml.pipelines.BinaryClassificationPipeline.custom_name evalml.pipelines.BinaryClassificationPipeline.describe evalml.pipelines.BinaryClassificationPipeline.feature_importance evalml.pipelines.BinaryClassificationPipeline.fit evalml.pipelines.BinaryClassificationPipeline.fit_transform evalml.pipelines.BinaryClassificationPipeline.get_component evalml.pipelines.BinaryClassificationPipeline.get_hyperparameter_ranges evalml.pipelines.BinaryClassificationPipeline.graph evalml.pipelines.BinaryClassificationPipeline.graph_dict evalml.pipelines.BinaryClassificationPipeline.graph_feature_importance evalml.pipelines.BinaryClassificationPipeline.inverse_transform evalml.pipelines.BinaryClassificationPipeline.load evalml.pipelines.BinaryClassificationPipeline.model_family evalml.pipelines.BinaryClassificationPipeline.name 
evalml.pipelines.BinaryClassificationPipeline.new evalml.pipelines.BinaryClassificationPipeline.optimize_threshold evalml.pipelines.BinaryClassificationPipeline.parameters evalml.pipelines.BinaryClassificationPipeline.predict evalml.pipelines.BinaryClassificationPipeline.predict_proba evalml.pipelines.BinaryClassificationPipeline.save evalml.pipelines.BinaryClassificationPipeline.score evalml.pipelines.BinaryClassificationPipeline.summary evalml.pipelines.BinaryClassificationPipeline.threshold evalml.pipelines.BinaryClassificationPipeline.transform evalml.pipelines.BinaryClassificationPipeline.transform_all_but_final .. py:method:: can_tune_threshold_with_objective(self, objective) Determine whether the threshold of a binary classification pipeline can be tuned. :param objective: Primary AutoMLSearch objective. :type objective: ObjectiveBase :returns: True if the pipeline threshold can be tuned. :rtype: bool .. py:method:: classes_(self) :property: Gets the class names for the pipeline. Will return None before pipeline is fit. .. py:method:: clone(self) Constructs a new pipeline with the same components, parameters, and random seed. :returns: A new instance of this pipeline with identical components, parameters, and random seed. .. py:method:: create_objectives(objectives) :staticmethod: Create objective instances from a list of strings or objective classes. .. py:method:: custom_name(self) :property: Custom name of the pipeline. .. py:method:: describe(self, return_dict=False) Outputs pipeline details including component parameters. :param return_dict: If True, return dictionary of information about pipeline. Defaults to False. :type return_dict: bool :returns: Dictionary of all component parameters if return_dict is True, else None. :rtype: dict .. py:method:: feature_importance(self) :property: Importance associated with each feature. Features dropped by the feature selection are excluded. 
:returns: Feature names and their corresponding importance :rtype: pd.DataFrame .. py:method:: fit(self, X, y) Build a classification model. For string and categorical targets, classes are ordered by sorted(set(y)) and then mapped to integer values between 0 and n_classes-1. :param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame or np.ndarray :param y: The target training labels of length [n_samples] :type y: pd.Series or np.ndarray :returns: self :raises ValueError: If the number of unique classes in y is not appropriate for the type of pipeline. :raises TypeError: If the dtype is boolean but pd.NA exists in the series. :raises Exception: For all other exceptions. .. py:method:: fit_transform(self, X, y) Fit and transform all components in the component graph, if all components are Transformers. :param X: Input features of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target data of length [n_samples]. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame :raises ValueError: If final component is an Estimator. .. py:method:: get_component(self, name) Returns component by name. :param name: Name of component. :type name: str :returns: Component to return :rtype: Component .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters) Returns hyperparameter ranges from all components as a dictionary. :param custom_hyperparameters: Custom hyperparameters for the pipeline. :type custom_hyperparameters: dict :returns: Dictionary of hyperparameter ranges for each component in the pipeline. :rtype: dict .. py:method:: graph(self, filepath=None) Generate an image representing the pipeline graph. :param filepath: Path to where the graph should be saved. If set to None (as by default), the graph will not be saved. :type filepath: str, optional :returns: Graph object that can be directly displayed in Jupyter notebooks. :rtype: graphviz.Digraph :raises RuntimeError: If graphviz is not installed.
:raises ValueError: If path is not writeable. .. py:method:: graph_dict(self) Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases. x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]} :returns: A dictionary representing the DAG structure. :rtype: dict .. py:method:: graph_feature_importance(self, importance_threshold=0) Generate a bar graph of the pipeline's feature importance. :param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero. :type importance_threshold: float, optional :returns: A bar graph showing features and their corresponding importance. :rtype: plotly.Figure :raises ValueError: If importance threshold is not valid. .. py:method:: inverse_transform(self, y) Apply component inverse_transform methods to estimator predictions in reverse order. Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd). :param y: Final component features. :type y: pd.Series :returns: The inverse transform of the target. :rtype: pd.Series .. py:method:: load(file_path: Union[str, io.BytesIO]) :staticmethod: Loads pipeline at file path. :param file_path: File path to load from, or a BytesIO object. :type file_path: str|BytesIO :returns: PipelineBase object .. py:method:: model_family(self) :property: Returns model family of this pipeline. .. 
py:method:: name(self) :property: Name of the pipeline. .. py:method:: new(self, parameters, random_seed=0) Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python's __new__ method. :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :returns: A new instance of this pipeline with identical components. .. py:method:: optimize_threshold(self, X, y, y_pred_proba, objective) Optimize the pipeline threshold given the objective to use. Only used for binary problems with objectives whose thresholds can be tuned. :param X: Input features. :type X: pd.DataFrame :param y: Input target values. :type y: pd.Series :param y_pred_proba: The predicted probabilities of the target output by the pipeline. :type y_pred_proba: pd.Series :param objective: The objective to threshold with. Must have a tunable threshold. :type objective: ObjectiveBase :raises ValueError: If objective is not optimizable. .. py:method:: parameters(self) :property: Parameter dictionary for this pipeline. :returns: Dictionary of all component parameters. :rtype: dict .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None) Make predictions using selected features. Note: we cast y as ints first to address boolean values that may be returned from calculating predictions, which we would not otherwise be able to transform if we originally had integer targets. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param objective: The objective to use to make predictions. :type objective: Object or string :param X_train: Training data. Ignored. Only used for time series.
:type X_train: pd.DataFrame :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series :returns: Estimated labels. :rtype: pd.Series .. py:method:: predict_proba(self, X, X_train=None, y_train=None) Make probability estimates for labels. Assumes that the column at index 1 represents the positive label case. :param X: Data of shape [n_samples, n_features] :type X: pd.DataFrame or np.ndarray :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series or None :returns: Probability estimates :rtype: pd.Series .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves pipeline at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None) Evaluate model performance on objectives. :param X: Data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: True labels of length [n_samples] :type y: pd.Series :param objectives: List of objectives to score :type objectives: list :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series :returns: Ordered dictionary of objective scores. :rtype: dict .. py:method:: summary(self) :property: A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder :returns: A string describing the pipeline structure. .. py:method:: threshold(self) :property: Threshold used to make a prediction. Defaults to None. .. py:method:: transform(self, X, y=None) Transform the input. :param X: Data of shape [n_samples, n_features]. 
:type X: pd.DataFrame or np.ndarray :param y: The target data of length [n_samples]. Defaults to None. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame .. py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None) Transforms the data by applying all pre-processing components. :param X: Input data to the pipeline to transform. :type X: pd.DataFrame :param y: Targets corresponding to X. Optional. :type y: pd.Series or None :param X_train: Training data. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Only used for time series. :type y_train: pd.Series or None :returns: New transformed features. :rtype: pd.DataFrame .. py:class:: CatBoostClassifier(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=True, allow_writing_files=False, random_seed=0, n_jobs=-1, **kwargs) CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features. For more information, check out https://catboost.ai/ :param n_estimators: The maximum number of trees to build. Defaults to 10. :type n_estimators: int :param eta: The learning rate. Defaults to 0.03. :type eta: float :param max_depth: The maximum tree depth for base learners. Defaults to 6. :type max_depth: int :param bootstrap_type: Defines the method for sampling the weights of objects. Available methods are 'Bayesian', 'Bernoulli', 'MVS'. Defaults to None. :type bootstrap_type: string :param silent: Whether to use the "silent" logging mode. Defaults to True. :type silent: boolean :param allow_writing_files: Whether to allow writing snapshot files while training. Defaults to False. :type allow_writing_files: boolean :param n_jobs: Number of jobs to run in parallel. -1 uses all processes. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0.
:type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "n_estimators": Integer(4, 100), "eta": Real(0.000001, 1), "max_depth": Integer(4, 10),} * - **model_family** - ModelFamily.CATBOOST * - **modifies_features** - True * - **modifies_target** - False * - **name** - CatBoost Classifier * - **supported_problem_types** - [ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.CatBoostClassifier.clone evalml.pipelines.CatBoostClassifier.default_parameters evalml.pipelines.CatBoostClassifier.describe evalml.pipelines.CatBoostClassifier.feature_importance evalml.pipelines.CatBoostClassifier.fit evalml.pipelines.CatBoostClassifier.get_prediction_intervals evalml.pipelines.CatBoostClassifier.load evalml.pipelines.CatBoostClassifier.needs_fitting evalml.pipelines.CatBoostClassifier.parameters evalml.pipelines.CatBoostClassifier.predict evalml.pipelines.CatBoostClassifier.predict_proba evalml.pipelines.CatBoostClassifier.save evalml.pipelines.CatBoostClassifier.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Feature importance of fitted CatBoost classifier. .. py:method:: fit(self, X, y=None) Fits CatBoost classifier component to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. 
py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X) Make predictions using the fitted CatBoost classifier. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series .. py:method:: predict_proba(self, X) Make prediction probabilities using the fitted CatBoost classifier. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted probability values. :rtype: pd.DataFrame .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: CatBoostRegressor(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=False, allow_writing_files=False, random_seed=0, n_jobs=-1, **kwargs) CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features. For more information, check out https://catboost.ai/ :param n_estimators: The maximum number of trees to build. 
Defaults to 10. :type n_estimators: int :param eta: The learning rate. Defaults to 0.03. :type eta: float :param max_depth: The maximum tree depth for base learners. Defaults to 6. :type max_depth: int :param bootstrap_type: Defines the method for sampling the weights of objects. Available methods are 'Bayesian', 'Bernoulli', 'MVS'. Defaults to None. :type bootstrap_type: string :param silent: Whether to use the "silent" logging mode. Defaults to False. :type silent: boolean :param allow_writing_files: Whether to allow writing snapshot files while training. Defaults to False. :type allow_writing_files: boolean :param n_jobs: Number of jobs to run in parallel. -1 uses all processes. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "n_estimators": Integer(4, 100), "eta": Real(0.000001, 1), "max_depth": Integer(4, 10),} * - **model_family** - ModelFamily.CATBOOST * - **modifies_features** - True * - **modifies_target** - False * - **name** - CatBoost Regressor * - **supported_problem_types** - [ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.CatBoostRegressor.clone evalml.pipelines.CatBoostRegressor.default_parameters evalml.pipelines.CatBoostRegressor.describe evalml.pipelines.CatBoostRegressor.feature_importance evalml.pipelines.CatBoostRegressor.fit evalml.pipelines.CatBoostRegressor.get_prediction_intervals evalml.pipelines.CatBoostRegressor.load evalml.pipelines.CatBoostRegressor.needs_fitting evalml.pipelines.CatBoostRegressor.parameters evalml.pipelines.CatBoostRegressor.predict evalml.pipelines.CatBoostRegressor.predict_proba evalml.pipelines.CatBoostRegressor.save evalml.pipelines.CatBoostRegressor.update_parameters .. 
py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Feature importance of fitted CatBoost regressor. .. py:method:: fit(self, X, y=None) Fits CatBoost regressor component to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. 
:type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X) Make predictions using the fitted CatBoost regressor. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.DataFrame .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. 
py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: ClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0) Pipeline subclass for all classification pipelines. :param component_graph: List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"] :type component_graph: list or dict :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param custom_name: Custom name for the pipeline. Defaults to None. :type custom_name: str :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **problem_type** - None **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.ClassificationPipeline.can_tune_threshold_with_objective evalml.pipelines.ClassificationPipeline.classes_ evalml.pipelines.ClassificationPipeline.clone evalml.pipelines.ClassificationPipeline.create_objectives evalml.pipelines.ClassificationPipeline.custom_name evalml.pipelines.ClassificationPipeline.describe evalml.pipelines.ClassificationPipeline.feature_importance evalml.pipelines.ClassificationPipeline.fit evalml.pipelines.ClassificationPipeline.fit_transform evalml.pipelines.ClassificationPipeline.get_component evalml.pipelines.ClassificationPipeline.get_hyperparameter_ranges evalml.pipelines.ClassificationPipeline.graph evalml.pipelines.ClassificationPipeline.graph_dict evalml.pipelines.ClassificationPipeline.graph_feature_importance evalml.pipelines.ClassificationPipeline.inverse_transform evalml.pipelines.ClassificationPipeline.load evalml.pipelines.ClassificationPipeline.model_family evalml.pipelines.ClassificationPipeline.name evalml.pipelines.ClassificationPipeline.new evalml.pipelines.ClassificationPipeline.parameters evalml.pipelines.ClassificationPipeline.predict evalml.pipelines.ClassificationPipeline.predict_proba evalml.pipelines.ClassificationPipeline.save evalml.pipelines.ClassificationPipeline.score evalml.pipelines.ClassificationPipeline.summary evalml.pipelines.ClassificationPipeline.transform evalml.pipelines.ClassificationPipeline.transform_all_but_final .. py:method:: can_tune_threshold_with_objective(self, objective) Determine whether the threshold of a binary classification pipeline can be tuned. :param objective: Primary AutoMLSearch objective. :type objective: ObjectiveBase :returns: True if the pipeline threshold can be tuned. :rtype: bool .. py:method:: classes_(self) :property: Gets the class names for the pipeline. Will return None before pipeline is fit. .. py:method:: clone(self) Constructs a new pipeline with the same components, parameters, and random seed. 
:returns: A new instance of this pipeline with identical components, parameters, and random seed. .. py:method:: create_objectives(objectives) :staticmethod: Create objective instances from a list of strings or objective classes. .. py:method:: custom_name(self) :property: Custom name of the pipeline. .. py:method:: describe(self, return_dict=False) Outputs pipeline details including component parameters. :param return_dict: If True, return dictionary of information about pipeline. Defaults to False. :type return_dict: bool :returns: Dictionary of all component parameters if return_dict is True, else None. :rtype: dict .. py:method:: feature_importance(self) :property: Importance associated with each feature. Features dropped by feature selection are excluded. :returns: Feature names and their corresponding importance. :rtype: pd.DataFrame .. py:method:: fit(self, X, y) Build a classification model. For string and categorical targets, classes are ordered with sorted(set(y)) and then mapped to integer values between 0 and n_classes-1. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: The target training labels of length [n_samples]. :type y: pd.Series or np.ndarray :returns: self :raises ValueError: If the number of unique classes in y is not appropriate for the type of pipeline. :raises TypeError: If the dtype is boolean but pd.NA exists in the series. :raises Exception: For all other exceptions. .. py:method:: fit_transform(self, X, y) Fit and transform all components in the component graph, if all components are Transformers. :param X: Input features of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target data of length [n_samples]. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame :raises ValueError: If final component is an Estimator. .. py:method:: get_component(self, name) Returns component by name. :param name: Name of component.
:type name: str :returns: The component with the given name. :rtype: Component .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters) Returns hyperparameter ranges from all components as a dictionary. :param custom_hyperparameters: Custom hyperparameters for the pipeline. :type custom_hyperparameters: dict :returns: Dictionary of hyperparameter ranges for each component in the pipeline. :rtype: dict .. py:method:: graph(self, filepath=None) Generate an image representing the pipeline graph. :param filepath: Path where the graph should be saved. If None (the default), the graph will not be saved. :type filepath: str, optional :returns: Graph object that can be directly displayed in Jupyter notebooks. :rtype: graphviz.Digraph :raises RuntimeError: If graphviz is not installed. :raises ValueError: If path is not writeable. .. py:method:: graph_dict(self) Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases. x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]} :returns: A dictionary representing the DAG structure. :rtype: dict .. py:method:: graph_feature_importance(self, importance_threshold=0) Generate a bar graph of the pipeline's feature importance. :param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero.
:type importance_threshold: float, optional :returns: A bar graph showing features and their corresponding importance. :rtype: plotly.Figure :raises ValueError: If importance threshold is not valid. .. py:method:: inverse_transform(self, y) Apply component inverse_transform methods to estimator predictions in reverse order. Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd). :param y: Final component features. :type y: pd.Series :returns: The inverse transform of the target. :rtype: pd.Series .. py:method:: load(file_path: Union[str, io.BytesIO]) :staticmethod: Loads pipeline at file path. :param file_path: File path or BytesIO object to load the pipeline from. :type file_path: str|BytesIO :returns: PipelineBase object .. py:method:: model_family(self) :property: Returns model family of this pipeline. .. py:method:: name(self) :property: Name of the pipeline. .. py:method:: new(self, parameters, random_seed=0) Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python's __new__ method. :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :returns: A new instance of this pipeline with identical components. .. py:method:: parameters(self) :property: Parameter dictionary for this pipeline. :returns: Dictionary of all component parameters. :rtype: dict .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None) Make predictions using selected features. Note: we cast y as ints first to address boolean values that may be returned from calculating predictions which we would not be able to otherwise transform if we originally had integer targets.
:param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param objective: The objective to use to make predictions. :type objective: Object or string :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series :returns: Estimated labels. :rtype: pd.Series .. py:method:: predict_proba(self, X, X_train=None, y_train=None) Make probability estimates for labels. :param X: Data of shape [n_samples, n_features] :type X: pd.DataFrame or np.ndarray :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series or None :returns: Probability estimates :rtype: pd.DataFrame :raises ValueError: If final component is not an estimator. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves pipeline at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None) Evaluate model performance on objectives. :param X: Data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: True labels of length [n_samples] :type y: pd.Series :param objectives: List of objectives to score :type objectives: list :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series :returns: Ordered dictionary of objective scores. :rtype: dict .. py:method:: summary(self) :property: A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder :returns: A string describing the pipeline structure. 
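The summary format shown in the example above can be sketched in plain Python. The helper below is hypothetical: ``summarize_components`` is not an EvalML function, and it assumes that the last component name is the estimator and that every earlier name is a preprocessing component. EvalML's own implementation may differ.

```python
from typing import List


def summarize_components(component_names: List[str]) -> str:
    """Build a summary string in the style of the documented example.

    Hypothetical helper, not part of EvalML. Assumes the last name is the
    estimator and all earlier names are preprocessing components.
    """
    if not component_names:
        raise ValueError("component_names must not be empty")
    # Split off the final component (assumed estimator) from the rest.
    *transformers, estimator = component_names
    if not transformers:
        return estimator
    return f"{estimator} w/ {' + '.join(transformers)}"


names = ["Simple Imputer", "One Hot Encoder", "Logistic Regression Classifier"]
print(summarize_components(names))
# prints: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
```

Listing the estimator first mirrors the documented example, where the model name leads and the preprocessing steps follow ``w/``.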
.. py:method:: transform(self, X, y=None) Transform the input. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param y: The target data of length [n_samples]. Defaults to None. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame .. py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None) Transforms the data by applying all pre-processing components. :param X: Input data to the pipeline to transform. :type X: pd.DataFrame :param y: Targets corresponding to X. Optional. :type y: pd.Series or None :param X_train: Training data. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Only used for time series. :type y_train: pd.Series or None :returns: New transformed features. :rtype: pd.DataFrame .. py:class:: ComponentGraph(component_dict=None, cached_data=None, random_seed=0) Component graph for a pipeline as a directed acyclic graph (DAG). :param component_dict: A dictionary which specifies the components and edges between components that should be used to create the component graph. Defaults to None. :type component_dict: dict :param cached_data: A dictionary of nested cached data. If the hashes and components are in this cache, we skip fitting for these components. Expected to be of format {hash1: {component_name: trained_component, ...}, hash2: {...}, ...}. Defaults to None. :type cached_data: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int .. rubric:: Examples >>> component_dict = {'Imputer': ['Imputer', 'X', 'y'], ... 'Logistic Regression': ['Logistic Regression Classifier', 'Imputer.x', 'y']} >>> component_graph = ComponentGraph(component_dict) >>> assert component_graph.compute_order == ['Imputer', 'Logistic Regression'] ... ... >>> component_dict = {'Imputer': ['Imputer', 'X', 'y'], ... 'OHE': ['One Hot Encoder', 'Imputer.x', 'y'], ... 
'estimator_1': ['Random Forest Classifier', 'OHE.x', 'y'], ... 'estimator_2': ['Decision Tree Classifier', 'OHE.x', 'y'], ... 'final': ['Logistic Regression Classifier', 'estimator_1.x', 'estimator_2.x', 'y']} >>> component_graph = ComponentGraph(component_dict) The default parameters for every component in the component graph. >>> assert component_graph.default_parameters == { ... 'Imputer': {'categorical_impute_strategy': 'most_frequent', ... 'numeric_impute_strategy': 'mean', ... 'boolean_impute_strategy': 'most_frequent', ... 'categorical_fill_value': None, ... 'numeric_fill_value': None, ... 'boolean_fill_value': None}, ... 'One Hot Encoder': {'top_n': 10, ... 'features_to_encode': None, ... 'categories': None, ... 'drop': 'if_binary', ... 'handle_unknown': 'ignore', ... 'handle_missing': 'error'}, ... 'Random Forest Classifier': {'n_estimators': 100, ... 'max_depth': 6, ... 'n_jobs': -1}, ... 'Decision Tree Classifier': {'criterion': 'gini', ... 'max_features': 'auto', ... 'max_depth': 6, ... 'min_samples_split': 2, ... 'min_weight_fraction_leaf': 0.0}, ... 'Logistic Regression Classifier': {'penalty': 'l2', ... 'C': 1.0, ... 'n_jobs': -1, ... 'multi_class': 'auto', ... 'solver': 'lbfgs'}} **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.ComponentGraph.compute_order evalml.pipelines.ComponentGraph.default_parameters evalml.pipelines.ComponentGraph.describe evalml.pipelines.ComponentGraph.fit evalml.pipelines.ComponentGraph.fit_and_transform_all_but_final evalml.pipelines.ComponentGraph.fit_transform evalml.pipelines.ComponentGraph.generate_order evalml.pipelines.ComponentGraph.get_component evalml.pipelines.ComponentGraph.get_component_input_logical_types evalml.pipelines.ComponentGraph.get_estimators evalml.pipelines.ComponentGraph.get_inputs evalml.pipelines.ComponentGraph.get_last_component evalml.pipelines.ComponentGraph.graph evalml.pipelines.ComponentGraph.has_dfs evalml.pipelines.ComponentGraph.instantiate evalml.pipelines.ComponentGraph.inverse_transform evalml.pipelines.ComponentGraph.last_component_input_logical_types evalml.pipelines.ComponentGraph.predict evalml.pipelines.ComponentGraph.transform evalml.pipelines.ComponentGraph.transform_all_but_final .. py:method:: compute_order(self) :property: The order that components will be computed or called in. .. py:method:: default_parameters(self) :property: The default parameter dictionary for this pipeline. :returns: Dictionary of all component default parameters. :rtype: dict .. py:method:: describe(self, return_dict=False) Outputs component graph details including component parameters. :param return_dict: If True, return dictionary of information about component graph. Defaults to False. :type return_dict: bool :returns: Dictionary of all component parameters if return_dict is True, else None. :rtype: dict :raises ValueError: If the component graph is not instantiated. .. py:method:: fit(self, X, y) Fit each component in the graph. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self ..
py:method:: fit_and_transform_all_but_final(self, X, y) Fit and transform all components save the final one, usually an estimator. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: Transformed features and target. :rtype: Tuple (pd.DataFrame, pd.Series) .. py:method:: fit_transform(self, X, y) Fit and transform all components in the component graph, if all components are Transformers. :param X: Input features of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target data of length [n_samples]. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame :raises ValueError: If final component is an Estimator. .. py:method:: generate_order(cls, component_dict) :classmethod: Regenerates the topologically sorted order of the graph. .. py:method:: get_component(self, component_name) Retrieves a single component object from the graph. :param component_name: Name of the component to retrieve. :type component_name: str :returns: ComponentBase object :raises ValueError: If the component is not in the graph. .. py:method:: get_component_input_logical_types(self, component_name) Get the logical types that are passed to the given component. :param component_name: Name of component in the graph. :type component_name: str :returns: Dict - Mapping feature name to logical type instance. :raises ValueError: If the component is not in the graph. :raises ValueError: If the component graph has not been fitted. .. py:method:: get_estimators(self) Gets a list of all the estimator components within this graph. :returns: All estimator objects within the graph. :rtype: list :raises ValueError: If the component graph is not yet instantiated. .. py:method:: get_inputs(self, component_name) Retrieves all inputs for a given component. :param component_name: Name of the component to look up.
:type component_name: str :returns: List of inputs for the component to use. :rtype: list[str] :raises ValueError: If the component is not in the graph. .. py:method:: get_last_component(self) Retrieves the component that is computed last in the graph, usually the final estimator. :returns: ComponentBase object :raises ValueError: If the component graph has no edges. .. py:method:: graph(self, name=None, graph_format=None) Generate an image representing the component graph. :param name: Name of the graph. Defaults to None. :type name: str :param graph_format: File format to save the graph in. Defaults to None. :type graph_format: str :returns: Graph object that can be directly displayed in Jupyter notebooks. :rtype: graphviz.Digraph :raises RuntimeError: If graphviz is not installed. .. py:method:: has_dfs(self) :property: Whether this component graph contains a DFSTransformer or not. .. py:method:: instantiate(self, parameters=None) Instantiates all uninstantiated components within the graph using the given parameters. An error will be raised if a component is already instantiated but the parameters dict contains arguments for that component. :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary {} or None implies using all default values for component parameters. If a component in the component graph is already instantiated, it will not use any of its parameters defined in this dictionary. Defaults to None. :type parameters: dict :returns: self :raises ValueError: If component graph is already instantiated or if a component errored while instantiating. .. py:method:: inverse_transform(self, y) Apply component inverse_transform methods to estimator predictions in reverse order. Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd). :param y: Final component features. :type y: pd.Series
:returns: The target with inverse transformation applied. :rtype: pd.Series .. py:method:: last_component_input_logical_types(self) :property: Get the logical types that are passed to the last component in the pipeline. :returns: Dict - Mapping feature name to logical type instance. :raises ValueError: If the component is not in the graph. :raises ValueError: If the component graph has not been fitted. .. py:method:: predict(self, X) Make predictions using selected features. :param X: Input features of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises ValueError: If final component is not an Estimator. .. py:method:: transform(self, X, y=None) Transform the input using the component graph. :param X: Input features of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target data of length [n_samples]. Defaults to None. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame :raises ValueError: If final component is not a Transformer. .. py:method:: transform_all_but_final(self, X, y=None) Transforms all components save the final one and gathers the data from any number of parents to get all the information that should be fed to the final component. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. Defaults to None. :type y: pd.Series :returns: Transformed values. :rtype: pd.DataFrame .. py:class:: DecisionTreeClassifier(criterion='gini', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs) Decision Tree Classifier. :param criterion: The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "entropy" for the information gain. Defaults to "gini".
:type criterion: {"gini", "entropy"} :param max_features: The number of features to consider when looking for the best split: - If int, then consider max_features features at each split. - If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split. - If "auto", then max_features=sqrt(n_features). - If "sqrt", then max_features=sqrt(n_features). - If "log2", then max_features=log2(n_features). - If None, then max_features = n_features. The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features. Defaults to "auto". :type max_features: int, float or {"auto", "sqrt", "log2"} :param max_depth: The maximum depth of the tree. Defaults to 6. :type max_depth: int :param min_samples_split: The minimum number of samples required to split an internal node: - If int, then consider min_samples_split as the minimum number. - If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split. Defaults to 2. :type min_samples_split: int or float :param min_weight_fraction_leaf: The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0. :type min_weight_fraction_leaf: float :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. 
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "criterion": ["gini", "entropy"], "max_features": ["auto", "sqrt", "log2"], "max_depth": Integer(4, 10),} * - **model_family** - ModelFamily.DECISION_TREE * - **modifies_features** - True * - **modifies_target** - False * - **name** - Decision Tree Classifier * - **supported_problem_types** - [ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.DecisionTreeClassifier.clone evalml.pipelines.DecisionTreeClassifier.default_parameters evalml.pipelines.DecisionTreeClassifier.describe evalml.pipelines.DecisionTreeClassifier.feature_importance evalml.pipelines.DecisionTreeClassifier.fit evalml.pipelines.DecisionTreeClassifier.get_prediction_intervals evalml.pipelines.DecisionTreeClassifier.load evalml.pipelines.DecisionTreeClassifier.needs_fitting evalml.pipelines.DecisionTreeClassifier.parameters evalml.pipelines.DecisionTreeClassifier.predict evalml.pipelines.DecisionTreeClassifier.predict_proba evalml.pipelines.DecisionTreeClassifier.save evalml.pipelines.DecisionTreeClassifier.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) -> pandas.Series :property: Returns importance associated with each feature. :returns: Importance associated with each feature. :rtype: np.ndarray :raises MethodPropertyNotFoundError: If estimator does not have a feature_importance method or a component_obj that implements feature_importance. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. 
:type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. 
:type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: DecisionTreeRegressor(criterion='squared_error', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs) Decision Tree Regressor. :param criterion: The function to measure the quality of a split. Supported criteria are: - "squared_error" for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node - "friedman_mse", which uses mean squared error with Friedman's improvement score for potential splits - "absolute_error" for the mean absolute error, which minimizes the L1 loss using the median of each terminal node - "poisson", which uses reduction in Poisson deviance to find splits. :type criterion: {"squared_error", "friedman_mse", "absolute_error", "poisson"} :param max_features: The number of features to consider when looking for the best split: - If int, then consider max_features features at each split. - If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split. - If "auto", then max_features=sqrt(n_features). - If "sqrt", then max_features=sqrt(n_features). - If "log2", then max_features=log2(n_features). - If None, then max_features = n_features. The search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features. :type max_features: int, float or {"auto", "sqrt", "log2"} :param max_depth: The maximum depth of the tree. Defaults to 6. :type max_depth: int :param min_samples_split: The minimum number of samples required to split an internal node: - If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split. Defaults to 2. :type min_samples_split: int or float :param min_weight_fraction_leaf: The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0. :type min_weight_fraction_leaf: float :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "criterion": ["squared_error", "friedman_mse", "absolute_error"], "max_features": ["auto", "sqrt", "log2"], "max_depth": Integer(4, 10),} * - **model_family** - ModelFamily.DECISION_TREE * - **modifies_features** - True * - **modifies_target** - False * - **name** - Decision Tree Regressor * - **supported_problem_types** - [ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.DecisionTreeRegressor.clone evalml.pipelines.DecisionTreeRegressor.default_parameters evalml.pipelines.DecisionTreeRegressor.describe evalml.pipelines.DecisionTreeRegressor.feature_importance evalml.pipelines.DecisionTreeRegressor.fit evalml.pipelines.DecisionTreeRegressor.get_prediction_intervals evalml.pipelines.DecisionTreeRegressor.load evalml.pipelines.DecisionTreeRegressor.needs_fitting evalml.pipelines.DecisionTreeRegressor.parameters evalml.pipelines.DecisionTreeRegressor.predict evalml.pipelines.DecisionTreeRegressor.predict_proba evalml.pipelines.DecisionTreeRegressor.save evalml.pipelines.DecisionTreeRegressor.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. 
Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) -> pandas.Series :property: Returns importance associated with each feature. :returns: Importance associated with each feature. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a feature_importance method or a component_obj that implements feature_importance. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored.
:type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. 
:type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: DFSTransformer(index='index', features=None, random_seed=0, **kwargs) Featuretools DFS component that generates features for the input features. :param index: The name of the column that contains the indices. If no column with this name exists, then featuretools.EntitySet() creates a column with this name to serve as the index column. Defaults to 'index'. :type index: string :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :param features: List of features to run DFS on. Defaults to None. Features will only be computed if the columns used by the feature exist in the input and if the feature itself is not in input. If features is an empty list, no transformation will occur to inputted data. :type features: list **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - DFS Transformer * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.DFSTransformer.clone evalml.pipelines.DFSTransformer.contains_pre_existing_features evalml.pipelines.DFSTransformer.default_parameters evalml.pipelines.DFSTransformer.describe evalml.pipelines.DFSTransformer.fit evalml.pipelines.DFSTransformer.fit_transform evalml.pipelines.DFSTransformer.load evalml.pipelines.DFSTransformer.needs_fitting evalml.pipelines.DFSTransformer.parameters evalml.pipelines.DFSTransformer.save evalml.pipelines.DFSTransformer.transform evalml.pipelines.DFSTransformer.update_parameters .. 
py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: contains_pre_existing_features(dfs_features: Optional[List[featuretools.feature_base.FeatureBase]], input_feature_names: List[str], target: Optional[str] = None) :staticmethod: Determines whether or not features from a DFS Transformer match pipeline input features. :param dfs_features: List of features output from a DFS Transformer. :type dfs_features: Optional[List[FeatureBase]] :param input_feature_names: List of input features into the DFS Transformer. :type input_feature_names: List[str] :param target: The target whose values we are trying to predict. This is used to know which column to ignore if the target column is present in the list of features in the DFS Transformer's parameters. :type target: Optional[str] .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the DFSTransformer Transformer component. :param X: The input data to transform, of shape [n_samples, n_features]. :type X: pd.DataFrame, np.array :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. 
:type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Computes the feature matrix for the input X using featuretools' dfs algorithm. :param X: The input training data to transform. Has shape [n_samples, n_features] :type X: pd.DataFrame or np.ndarray :param y: Ignored. :type y: pd.Series, optional :returns: Feature matrix :rtype: pd.DataFrame .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: DropNaNRowsTransformer(parameters=None, component_obj=None, random_seed=0, **kwargs) Transformer to drop rows with NaN values. :param random_seed: Seed for the random number generator. Is not used by this component. Defaults to 0. :type random_seed: int **Attributes** .. 
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - True * - **name** - Drop NaN Rows Transformer * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.DropNaNRowsTransformer.clone evalml.pipelines.DropNaNRowsTransformer.default_parameters evalml.pipelines.DropNaNRowsTransformer.describe evalml.pipelines.DropNaNRowsTransformer.fit evalml.pipelines.DropNaNRowsTransformer.fit_transform evalml.pipelines.DropNaNRowsTransformer.load evalml.pipelines.DropNaNRowsTransformer.needs_fitting evalml.pipelines.DropNaNRowsTransformer.parameters evalml.pipelines.DropNaNRowsTransformer.save evalml.pipelines.DropNaNRowsTransformer.transform evalml.pipelines.DropNaNRowsTransformer.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. 
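As a rough illustration of the row-dropping behaviour, here is a minimal plain-Python sketch. It is not evalml's implementation: ``drop_nan_rows`` is a hypothetical helper, rows are modelled as lists rather than a ``pd.DataFrame``, and it assumes that a NaN in either the features or the target causes the row to be dropped from both.

```python
import math


def drop_nan_rows(X_rows, y=None):
    """Hypothetical helper: drop rows with a NaN in any feature or in the target."""

    def is_nan(value):
        # Only floats can be NaN; other types are treated as valid values.
        return isinstance(value, float) and math.isnan(value)

    # Keep the positions whose features (and target, if given) are all non-NaN.
    keep = [
        i
        for i, row in enumerate(X_rows)
        if not any(is_nan(v) for v in row) and (y is None or not is_nan(y[i]))
    ]
    X_out = [X_rows[i] for i in keep]
    y_out = None if y is None else [y[i] for i in keep]
    return X_out, y_out


X = [[1.0, 2.0], [float("nan"), 3.0], [4.0, 5.0], [6.0, 7.0]]
y = [0.0, 1.0, float("nan"), 1.0]
X_clean, y_clean = drop_nan_rows(X, y)
print(X_clean)  # [[1.0, 2.0], [6.0, 7.0]]
print(y_clean)  # [0.0, 1.0]
```

Features and target are filtered with the same positions so they stay aligned, which is consistent with the component being documented as modifying both.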
:param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data using fitted component. :param X: Features. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series, optional :returns: Data with NaN rows dropped. :rtype: (pd.DataFrame, pd.Series) .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: ElasticNetClassifier(penalty='elasticnet', C=1.0, l1_ratio=0.15, multi_class='auto', solver='saga', n_jobs=-1, random_seed=0, **kwargs) Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator. :param penalty: The norm used in penalization. Defaults to "elasticnet". 
:type penalty: {"l1", "l2", "elasticnet", "none"} :param C: Inverse of regularization strength. Must be a positive float. Defaults to 1.0. :type C: float :param l1_ratio: The mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty='elasticnet'. Setting l1_ratio=0 is equivalent to using penalty='l2', while setting l1_ratio=1 is equivalent to using penalty='l1'. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. Defaults to 0.15. :type l1_ratio: float :param multi_class: If the option chosen is "ovr", then a binary problem is fit for each label. For "multinomial" the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. "multinomial" is unavailable when solver="liblinear". "auto" selects "ovr" if the data is binary, or if solver="liblinear", and otherwise selects "multinomial". Defaults to "auto". :type multi_class: {"auto", "ovr", "multinomial"} :param solver: Algorithm to use in the optimization problem. For small datasets, "liblinear" is a good choice, whereas "sag" and "saga" are faster for large ones. For multiclass problems, only "newton-cg", "sag", "saga" and "lbfgs" handle multinomial loss; "liblinear" is limited to one-versus-rest schemes. - "newton-cg", "lbfgs", "sag" and "saga" handle L2 or no penalty - "liblinear" and "saga" also handle L1 penalty - "saga" also supports "elasticnet" penalty - "liblinear" does not support setting penalty='none' Defaults to "saga". :type solver: {"newton-cg", "lbfgs", "liblinear", "sag", "saga"} :param n_jobs: Number of parallel threads used to run the underlying estimator. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1. :type n_jobs: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** ..
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "C": Real(0.01, 10), "l1_ratio": Real(0, 1)} * - **model_family** - ModelFamily.LINEAR_MODEL * - **modifies_features** - True * - **modifies_target** - False * - **name** - Elastic Net Classifier * - **supported_problem_types** - [ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.ElasticNetClassifier.clone evalml.pipelines.ElasticNetClassifier.default_parameters evalml.pipelines.ElasticNetClassifier.describe evalml.pipelines.ElasticNetClassifier.feature_importance evalml.pipelines.ElasticNetClassifier.fit evalml.pipelines.ElasticNetClassifier.get_prediction_intervals evalml.pipelines.ElasticNetClassifier.load evalml.pipelines.ElasticNetClassifier.needs_fitting evalml.pipelines.ElasticNetClassifier.parameters evalml.pipelines.ElasticNetClassifier.predict evalml.pipelines.ElasticNetClassifier.predict_proba evalml.pipelines.ElasticNetClassifier.save evalml.pipelines.ElasticNetClassifier.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. 
:rtype: None or dict .. py:method:: feature_importance(self) :property: Feature importance for fitted ElasticNet classifier. .. py:method:: fit(self, X, y) Fits ElasticNet classifier component to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. 
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: ElasticNetRegressor(alpha=0.0001, l1_ratio=0.15, max_iter=1000, random_seed=0, **kwargs) Elastic Net Regressor. :param alpha: Constant that multiplies the penalty terms. Defaults to 0.0001. :type alpha: float :param l1_ratio: The mixing parameter, with 0 <= l1_ratio <= 1. Setting l1_ratio=0 gives a pure L2 penalty, while setting l1_ratio=1 gives a pure L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. Defaults to 0.15.
:type l1_ratio: float :param max_iter: The maximum number of iterations. Defaults to 1000. :type max_iter: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "alpha": Real(0, 1), "l1_ratio": Real(0, 1),} * - **model_family** - ModelFamily.LINEAR_MODEL * - **modifies_features** - True * - **modifies_target** - False * - **name** - Elastic Net Regressor * - **supported_problem_types** - [ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.ElasticNetRegressor.clone evalml.pipelines.ElasticNetRegressor.default_parameters evalml.pipelines.ElasticNetRegressor.describe evalml.pipelines.ElasticNetRegressor.feature_importance evalml.pipelines.ElasticNetRegressor.fit evalml.pipelines.ElasticNetRegressor.get_prediction_intervals evalml.pipelines.ElasticNetRegressor.load evalml.pipelines.ElasticNetRegressor.needs_fitting evalml.pipelines.ElasticNetRegressor.parameters evalml.pipelines.ElasticNetRegressor.predict evalml.pipelines.ElasticNetRegressor.predict_proba evalml.pipelines.ElasticNetRegressor.save evalml.pipelines.ElasticNetRegressor.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Feature importance for fitted ElasticNet regressor. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. 
py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. 
py:class:: Estimator(parameters: dict = None, component_obj: Type[evalml.pipelines.components.ComponentBase] = None, random_seed: Union[int, float] = 0, **kwargs) A component that fits and predicts given data. To implement a new Estimator, define your own class which is a subclass of Estimator, including a name and a list of acceptable ranges for any parameters to be tuned during the automl search (hyperparameters). Define an `__init__` method which sets up any necessary state and objects. Make sure your `__init__` only uses standard keyword arguments and calls `super().__init__()` with a parameters dict. You may also override the `fit`, `transform`, `fit_transform` and other methods in this class if appropriate. To see some examples, check out the definitions of any Estimator component subclass. :param parameters: Dictionary of parameters for the component. Defaults to None. :type parameters: dict :param component_obj: Third-party objects useful in component implementation. Defaults to None. :type component_obj: obj :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **model_family** - ModelFamily.NONE * - **modifies_features** - True * - **modifies_target** - False * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.Estimator.clone evalml.pipelines.Estimator.default_parameters evalml.pipelines.Estimator.describe evalml.pipelines.Estimator.feature_importance evalml.pipelines.Estimator.fit evalml.pipelines.Estimator.get_prediction_intervals evalml.pipelines.Estimator.load evalml.pipelines.Estimator.model_family evalml.pipelines.Estimator.name evalml.pipelines.Estimator.needs_fitting evalml.pipelines.Estimator.parameters evalml.pipelines.Estimator.predict evalml.pipelines.Estimator.predict_proba evalml.pipelines.Estimator.save evalml.pipelines.Estimator.supported_problem_types evalml.pipelines.Estimator.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) -> pandas.Series :property: Returns importance associated with each feature. :returns: Importance associated with each feature. :rtype: np.ndarray :raises MethodPropertyNotFoundError: If estimator does not have a feature_importance method or a component_obj that implements feature_importance. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. 
:param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: model_family(cls) :property: Returns ModelFamily of this component. .. py:method:: name(cls) :property: Returns string name of this component. .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. 
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: supported_problem_types(cls) :property: Problem types this estimator supports. .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: ExponentialSmoothingRegressor(trend: Optional[str] = None, damped_trend: bool = False, seasonal: Optional[str] = None, sp: int = 2, n_jobs: int = -1, random_seed: Union[int, float] = 0, **kwargs) Holt-Winters Exponential Smoothing Forecaster. Currently ExponentialSmoothingRegressor isn't supported via conda install. It's recommended that it be installed via PyPI. :param trend: Type of trend component. 
Defaults to None. :type trend: str :param damped_trend: If the trend component should be damped. Defaults to False. :type damped_trend: bool :param seasonal: Type of seasonal component. Takes one of {"additive", None}. Can also be "multiplicative" if none of the target data is 0, but AutoMLSearch will not tune for this. Defaults to None. :type seasonal: str :param sp: The number of seasonal periods to consider. Defaults to 2. :type sp: int :param n_jobs: Non-negative integer describing level of parallelism used for pipelines. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "trend": [None, "additive"], "damped_trend": [True, False], "seasonal": [None, "additive"], "sp": Integer(2, 8),} * - **model_family** - ModelFamily.EXPONENTIAL_SMOOTHING * - **modifies_features** - True * - **modifies_target** - False * - **name** - Exponential Smoothing Regressor * - **supported_problem_types** - [ProblemTypes.TIME_SERIES_REGRESSION] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.ExponentialSmoothingRegressor.clone evalml.pipelines.ExponentialSmoothingRegressor.default_parameters evalml.pipelines.ExponentialSmoothingRegressor.describe evalml.pipelines.ExponentialSmoothingRegressor.feature_importance evalml.pipelines.ExponentialSmoothingRegressor.fit evalml.pipelines.ExponentialSmoothingRegressor.get_prediction_intervals evalml.pipelines.ExponentialSmoothingRegressor.load evalml.pipelines.ExponentialSmoothingRegressor.needs_fitting evalml.pipelines.ExponentialSmoothingRegressor.parameters evalml.pipelines.ExponentialSmoothingRegressor.predict evalml.pipelines.ExponentialSmoothingRegressor.predict_proba evalml.pipelines.ExponentialSmoothingRegressor.save evalml.pipelines.ExponentialSmoothingRegressor.update_parameters ..
py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) -> pandas.Series :property: Returns array of 0's with a length of 1 as feature_importance is not defined for Exponential Smoothing regressor. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits Exponential Smoothing Regressor to data. :param X: The input training data of shape [n_samples, n_features]. Ignored. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self :raises ValueError: If y was not passed in. .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted ExponentialSmoothingRegressor. Calculates the prediction intervals by using a simulation of the time series following a specified state space model. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Optional. 
:type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: List[float] :param predictions: Not used for Exponential Smoothing regressor. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) -> pandas.Series Make predictions using fitted Exponential Smoothing regressor. :param X: Data of shape [n_samples, n_features]. Ignored except to set forecast horizon. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Predicted values. :rtype: pd.Series .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. 
py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: ExtraTreesClassifier(n_estimators=100, max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, n_jobs=-1, random_seed=0, **kwargs) Extra Trees Classifier. :param n_estimators: The number of trees in the forest. Defaults to 100. :type n_estimators: int :param max_features: The number of features to consider when looking for the best split: - If int, then consider max_features features at each split. - If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split. - If "auto", then max_features=sqrt(n_features). - If "sqrt", then max_features=sqrt(n_features). - If "log2", then max_features=log2(n_features). - If None, then max_features = n_features. The search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features. Defaults to "auto". :type max_features: int, float or {"auto", "sqrt", "log2"} :param max_depth: The maximum depth of the tree. Defaults to 6. :type max_depth: int :param min_samples_split: The minimum number of samples required to split an internal node: - If int, then consider min_samples_split as the minimum number. - If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split. Defaults to 2. :type min_samples_split: int or float :param min_weight_fraction_leaf: The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0. :type min_weight_fraction_leaf: float :param n_jobs: Number of jobs to run in parallel.
-1 uses all processes. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "n_estimators": Integer(10, 1000), "max_features": ["auto", "sqrt", "log2"], "max_depth": Integer(4, 10),} * - **model_family** - ModelFamily.EXTRA_TREES * - **modifies_features** - True * - **modifies_target** - False * - **name** - Extra Trees Classifier * - **supported_problem_types** - [ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.ExtraTreesClassifier.clone evalml.pipelines.ExtraTreesClassifier.default_parameters evalml.pipelines.ExtraTreesClassifier.describe evalml.pipelines.ExtraTreesClassifier.feature_importance evalml.pipelines.ExtraTreesClassifier.fit evalml.pipelines.ExtraTreesClassifier.get_prediction_intervals evalml.pipelines.ExtraTreesClassifier.load evalml.pipelines.ExtraTreesClassifier.needs_fitting evalml.pipelines.ExtraTreesClassifier.parameters evalml.pipelines.ExtraTreesClassifier.predict evalml.pipelines.ExtraTreesClassifier.predict_proba evalml.pipelines.ExtraTreesClassifier.save evalml.pipelines.ExtraTreesClassifier.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) -> pandas.Series :property: Returns importance associated with each feature. :returns: Importance associated with each feature. :rtype: np.ndarray :raises MethodPropertyNotFoundError: If estimator does not have a feature_importance method or a component_obj that implements feature_importance. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. 
:type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. 
:type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: ExtraTreesRegressor(n_estimators: int = 100, max_features: str = 'auto', max_depth: int = 6, min_samples_split: int = 2, min_weight_fraction_leaf: float = 0.0, n_jobs: int = -1, random_seed: Union[int, float] = 0, **kwargs) Extra Trees Regressor. :param n_estimators: The number of trees in the forest. Defaults to 100. :type n_estimators: int :param max_features: The number of features to consider when looking for the best split: - If int, then consider max_features features at each split. - If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split. - If "auto", then max_features=sqrt(n_features). - If "sqrt", then max_features=sqrt(n_features). - If "log2", then max_features=log2(n_features). - If None, then max_features = n_features. The search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features. Defaults to "auto". :type max_features: int, float or {"auto", "sqrt", "log2"} :param max_depth: The maximum depth of the tree. Defaults to 6. :type max_depth: int :param min_samples_split: The minimum number of samples required to split an internal node: - If int, then consider min_samples_split as the minimum number. - If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split. Defaults to 2. :type min_samples_split: int or float :param min_weight_fraction_leaf: The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0. :type min_weight_fraction_leaf: float :param n_jobs: Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
:type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "n_estimators": Integer(10, 1000), "max_features": ["auto", "sqrt", "log2"], "max_depth": Integer(4, 10),} * - **model_family** - ModelFamily.EXTRA_TREES * - **modifies_features** - True * - **modifies_target** - False * - **name** - Extra Trees Regressor * - **supported_problem_types** - [ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.ExtraTreesRegressor.clone evalml.pipelines.ExtraTreesRegressor.default_parameters evalml.pipelines.ExtraTreesRegressor.describe evalml.pipelines.ExtraTreesRegressor.feature_importance evalml.pipelines.ExtraTreesRegressor.fit evalml.pipelines.ExtraTreesRegressor.get_prediction_intervals evalml.pipelines.ExtraTreesRegressor.load evalml.pipelines.ExtraTreesRegressor.needs_fitting evalml.pipelines.ExtraTreesRegressor.parameters evalml.pipelines.ExtraTreesRegressor.predict evalml.pipelines.ExtraTreesRegressor.predict_proba evalml.pipelines.ExtraTreesRegressor.save evalml.pipelines.ExtraTreesRegressor.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) -> pandas.Series :property: Returns importance associated with each feature. :returns: Importance associated with each feature. :rtype: np.ndarray :raises MethodPropertyNotFoundError: If estimator does not have a feature_importance method or a component_obj that implements feature_importance. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted ExtraTreesRegressor. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Optional. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. 
py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: FeatureSelector(parameters=None, component_obj=None, random_seed=0, **kwargs) Selects top features based on importance weights. :param parameters: Dictionary of parameters for the component. Defaults to None. :type parameters: dict :param component_obj: Third-party objects useful in component implementation. Defaults to None. 
:type component_obj: obj :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **modifies_features** - True * - **modifies_target** - False * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.FeatureSelector.clone evalml.pipelines.FeatureSelector.default_parameters evalml.pipelines.FeatureSelector.describe evalml.pipelines.FeatureSelector.fit evalml.pipelines.FeatureSelector.fit_transform evalml.pipelines.FeatureSelector.get_names evalml.pipelines.FeatureSelector.load evalml.pipelines.FeatureSelector.name evalml.pipelines.FeatureSelector.needs_fitting evalml.pipelines.FeatureSelector.parameters evalml.pipelines.FeatureSelector.save evalml.pipelines.FeatureSelector.transform evalml.pipelines.FeatureSelector.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. 
:param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises MethodPropertyNotFoundError: If component does not have a fit method or a component_obj that implements fit. .. py:method:: fit_transform(self, X, y=None) Fit and transform data using the feature selector. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:method:: get_names(self) Get names of selected features. :returns: List of the names of features selected. :rtype: list[str] .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: name(cls) :property: Returns string name of this component. .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised. :param X: Data to transform. :type X: pd.DataFrame :param y: Target data. Ignored.
:type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If feature selector does not have a transform method or a component_obj that implements transform .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: Imputer(categorical_impute_strategy='most_frequent', categorical_fill_value=None, numeric_impute_strategy='mean', numeric_fill_value=None, boolean_impute_strategy='most_frequent', boolean_fill_value=None, random_seed=0, **kwargs) Imputes missing data according to a specified imputation strategy. :param categorical_impute_strategy: Impute strategy to use for string, object, boolean, categorical dtypes. Valid values include "most_frequent" and "constant". :type categorical_impute_strategy: string :param numeric_impute_strategy: Impute strategy to use for numeric columns. Valid values include "mean", "median", "most_frequent", and "constant". :type numeric_impute_strategy: string :param boolean_impute_strategy: Impute strategy to use for boolean columns. Valid values include "most_frequent" and "constant". :type boolean_impute_strategy: string :param categorical_fill_value: When categorical_impute_strategy == "constant", fill_value is used to replace missing data. The default value of None will fill with the string "missing_value". :type categorical_fill_value: string :param numeric_fill_value: When numeric_impute_strategy == "constant", fill_value is used to replace missing data. The default value of None will fill with 0. :type numeric_fill_value: int, float :param boolean_fill_value: When boolean_impute_strategy == "constant", fill_value is used to replace missing data. The default value of None will fill with True. 
:type boolean_fill_value: bool :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "categorical_impute_strategy": ["most_frequent"], "numeric_impute_strategy": ["mean", "median", "most_frequent", "knn"], "boolean_impute_strategy": ["most_frequent"]} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Imputer * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.Imputer.clone evalml.pipelines.Imputer.default_parameters evalml.pipelines.Imputer.describe evalml.pipelines.Imputer.fit evalml.pipelines.Imputer.fit_transform evalml.pipelines.Imputer.load evalml.pipelines.Imputer.needs_fitting evalml.pipelines.Imputer.parameters evalml.pipelines.Imputer.save evalml.pipelines.Imputer.transform evalml.pipelines.Imputer.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits imputer to data. 'None' values are converted to np.nan before imputation and are treated as the same. 
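The note about `None` and `np.nan` being treated the same can be illustrated with a minimal plain-Python sketch of mean imputation. This is not EvalML's implementation; the helper names `_is_missing`, `fit_mean`, and `transform_column` are hypothetical and exist only for this illustration.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as the same kind of missing value."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_mean(column):
    """Learn the fill value (mean of the observed entries) from one column."""
    observed = [v for v in column if not _is_missing(v)]
    if not observed:
        raise ValueError("column has no observed values to average")
    return sum(observed) / len(observed)


def transform_column(column, fill_value):
    """Replace every missing entry with the learned fill value."""
    return [fill_value if _is_missing(v) else v for v in column]


# One numeric column containing both kinds of missing marker.
train = [1.0, None, 3.0, float("nan")]
fill = fit_mean(train)                    # mean of the observed values 1.0 and 3.0
imputed = transform_column(train, fill)   # both None and NaN receive the same fill
print(imputed)
```

The fill value is learned once during fitting and reused during transformation, which mirrors the fit/transform split of the component.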
:param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame, np.ndarray :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data X by imputing missing values. :param X: Data to transform :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. 
py:class:: KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, random_seed=0, **kwargs) K-Nearest Neighbors Classifier. :param n_neighbors: Number of neighbors to use by default. Defaults to 5. :type n_neighbors: int :param weights: Weight function used in prediction. Can be: - ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally. - ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors that are farther away. - [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights. Defaults to "uniform". :type weights: {‘uniform’, ‘distance’} or callable :param algorithm: Algorithm used to compute the nearest neighbors: - ‘ball_tree’ will use BallTree - ‘kd_tree’ will use KDTree - ‘brute’ will use a brute-force search. ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit method. Defaults to "auto". Note: fitting on sparse input will override the setting of this parameter, using brute force. :type algorithm: {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’} :param leaf_size: Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30. :type leaf_size: int :param p: Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Defaults to 2. :type p: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** ..
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "n_neighbors": Integer(2, 12), "weights": ["uniform", "distance"], "algorithm": ["auto", "ball_tree", "kd_tree", "brute"], "leaf_size": Integer(10, 30), "p": Integer(1, 5),} * - **model_family** - ModelFamily.K_NEIGHBORS * - **modifies_features** - True * - **modifies_target** - False * - **name** - KNN Classifier * - **supported_problem_types** - [ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.KNeighborsClassifier.clone evalml.pipelines.KNeighborsClassifier.default_parameters evalml.pipelines.KNeighborsClassifier.describe evalml.pipelines.KNeighborsClassifier.feature_importance evalml.pipelines.KNeighborsClassifier.fit evalml.pipelines.KNeighborsClassifier.get_prediction_intervals evalml.pipelines.KNeighborsClassifier.load evalml.pipelines.KNeighborsClassifier.needs_fitting evalml.pipelines.KNeighborsClassifier.parameters evalml.pipelines.KNeighborsClassifier.predict evalml.pipelines.KNeighborsClassifier.predict_proba evalml.pipelines.KNeighborsClassifier.save evalml.pipelines.KNeighborsClassifier.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
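As a rough illustration of the `describe` signature and the default-parameters convention stated above, here is a sketch using a toy class. `ToyComponent` is hypothetical and is not an EvalML class; it exposes `default_parameters` as a classmethod that must be called, which is an assumption made for this sketch and may differ from how the library exposes it.

```python
import copy


class ToyComponent:
    """Hypothetical stand-in for a component; not part of EvalML."""

    name = "Toy Component"

    def __init__(self, n_neighbors=5, weights="uniform"):
        self._parameters = {"n_neighbors": n_neighbors, "weights": weights}

    @property
    def parameters(self):
        # Return a copy so callers cannot mutate the stored parameters.
        return copy.deepcopy(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Convention from the text: the defaults equal the parameters of a
        # component constructed with no arguments.
        return cls().parameters

    def describe(self, print_name=False, return_dict=False):
        # Optionally print the name, always print the parameters, and
        # return a dictionary only when return_dict is True.
        if print_name:
            print(self.name)
        for key, value in self._parameters.items():
            print(f"\t * {key} : {value}")
        if return_dict:
            return {"name": self.name, "parameters": self.parameters}
        return None


component = ToyComponent(n_neighbors=3)
description = component.describe(print_name=True, return_dict=True)
```

With `return_dict` left at its default of False, the same call prints the description and returns None.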
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Returns array of 0's matching the input number of features as feature_importance is not defined for KNN classifiers. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. 
:rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. 
py:class:: LightGBMClassifier(boosting_type='gbdt', learning_rate=0.1, n_estimators=100, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=-1, random_seed=0, **kwargs) LightGBM Classifier. :param boosting_type: Type of boosting to use. Defaults to "gbdt". - "gbdt" uses traditional Gradient Boosting Decision Tree - "dart" uses Dropouts meet Multiple Additive Regression Trees - "goss" uses Gradient-based One-Side Sampling - "rf" uses Random Forest :type boosting_type: string :param learning_rate: Boosting learning rate. Defaults to 0.1. :type learning_rate: float :param n_estimators: Number of boosted trees to fit. Defaults to 100. :type n_estimators: int :param max_depth: Maximum tree depth for base learners, <=0 means no limit. Defaults to 0. :type max_depth: int :param num_leaves: Maximum tree leaves for base learners. Defaults to 31. :type num_leaves: int :param min_child_samples: Minimum number of data needed in a child (leaf). Defaults to 20. :type min_child_samples: int :param bagging_fraction: LightGBM will randomly select a subset of the training data (rows) without resampling if this is smaller than 1.0. For example, if set to 0.8, LightGBM will select 80% of the rows each time bagging is performed. This can be used to speed up training and deal with overfitting. Has no effect while bagging_freq is 0. Defaults to 0.9. :type bagging_fraction: float :param bagging_freq: Frequency for bagging. 0 means bagging is disabled. k means perform bagging at every k iteration. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100 % of the data to use for the next k iterations. Defaults to 0. :type bagging_freq: int :param n_jobs: Number of threads to run in parallel. -1 uses all threads. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** ..
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "learning_rate": Real(0.000001, 1), "boosting_type": ["gbdt", "dart", "goss", "rf"], "n_estimators": Integer(10, 100), "max_depth": Integer(0, 10), "num_leaves": Integer(2, 100), "min_child_samples": Integer(1, 100), "bagging_fraction": Real(0.000001, 1), "bagging_freq": Integer(0, 1),} * - **model_family** - ModelFamily.LIGHTGBM * - **modifies_features** - True * - **modifies_target** - False * - **name** - LightGBM Classifier * - **SEED_MAX** - SEED_BOUNDS.max_bound * - **SEED_MIN** - 0 * - **supported_problem_types** - [ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.LightGBMClassifier.clone evalml.pipelines.LightGBMClassifier.default_parameters evalml.pipelines.LightGBMClassifier.describe evalml.pipelines.LightGBMClassifier.feature_importance evalml.pipelines.LightGBMClassifier.fit evalml.pipelines.LightGBMClassifier.get_prediction_intervals evalml.pipelines.LightGBMClassifier.load evalml.pipelines.LightGBMClassifier.needs_fitting evalml.pipelines.LightGBMClassifier.parameters evalml.pipelines.LightGBMClassifier.predict evalml.pipelines.LightGBMClassifier.predict_proba evalml.pipelines.LightGBMClassifier.save evalml.pipelines.LightGBMClassifier.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) -> pandas.Series :property: Returns importance associated with each feature. :returns: Importance associated with each feature. :rtype: np.ndarray :raises MethodPropertyNotFoundError: If estimator does not have a feature_importance method or a component_obj that implements feature_importance. .. py:method:: fit(self, X, y=None) Fits LightGBM classifier component to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. 
:type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X) Make predictions using the fitted LightGBM classifier. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.DataFrame .. py:method:: predict_proba(self, X) Make prediction probabilities using the fitted LightGBM classifier. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted probability values. :rtype: pd.DataFrame .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. 
py:class:: LightGBMRegressor(boosting_type='gbdt', learning_rate=0.1, n_estimators=20, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=-1, random_seed=0, **kwargs) LightGBM Regressor. :param boosting_type: Type of boosting to use. Defaults to "gbdt". - "gbdt" uses traditional Gradient Boosting Decision Tree - "dart" uses Dropouts meet Multiple Additive Regression Trees - "goss" uses Gradient-based One-Side Sampling - "rf" uses Random Forest :type boosting_type: string :param learning_rate: Boosting learning rate. Defaults to 0.1. :type learning_rate: float :param n_estimators: Number of boosted trees to fit. Defaults to 20. :type n_estimators: int :param max_depth: Maximum tree depth for base learners, <=0 means no limit. Defaults to 0. :type max_depth: int :param num_leaves: Maximum tree leaves for base learners. Defaults to 31. :type num_leaves: int :param min_child_samples: Minimum number of data needed in a child (leaf). Defaults to 20. :type min_child_samples: int :param bagging_fraction: LightGBM will randomly select a subset of the training data (rows) without resampling if this is smaller than 1.0. For example, if set to 0.8, LightGBM will select 80% of the rows each time bagging is performed. This can be used to speed up training and deal with overfitting. Has no effect while bagging_freq is 0. Defaults to 0.9. :type bagging_fraction: float :param bagging_freq: Frequency for bagging. 0 means bagging is disabled. k means perform bagging at every k iteration. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100 % of the data to use for the next k iterations. Defaults to 0. :type bagging_freq: int :param n_jobs: Number of threads to run in parallel. -1 uses all threads. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** ..
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "learning_rate": Real(0.000001, 1), "boosting_type": ["gbdt", "dart", "goss", "rf"], "n_estimators": Integer(10, 100), "max_depth": Integer(0, 10), "num_leaves": Integer(2, 100), "min_child_samples": Integer(1, 100), "bagging_fraction": Real(0.000001, 1), "bagging_freq": Integer(0, 1),} * - **model_family** - ModelFamily.LIGHTGBM * - **modifies_features** - True * - **modifies_target** - False * - **name** - LightGBM Regressor * - **SEED_MAX** - SEED_BOUNDS.max_bound * - **SEED_MIN** - 0 * - **supported_problem_types** - [ProblemTypes.REGRESSION] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.LightGBMRegressor.clone evalml.pipelines.LightGBMRegressor.default_parameters evalml.pipelines.LightGBMRegressor.describe evalml.pipelines.LightGBMRegressor.feature_importance evalml.pipelines.LightGBMRegressor.fit evalml.pipelines.LightGBMRegressor.get_prediction_intervals evalml.pipelines.LightGBMRegressor.load evalml.pipelines.LightGBMRegressor.needs_fitting evalml.pipelines.LightGBMRegressor.parameters evalml.pipelines.LightGBMRegressor.predict evalml.pipelines.LightGBMRegressor.predict_proba evalml.pipelines.LightGBMRegressor.save evalml.pipelines.LightGBMRegressor.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) -> pandas.Series :property: Returns importance associated with each feature. :returns: Importance associated with each feature. :rtype: np.ndarray :raises MethodPropertyNotFoundError: If estimator does not have a feature_importance method or a component_obj that implements feature_importance. .. py:method:: fit(self, X, y=None) Fits LightGBM regressor to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. 
:type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X) Make predictions using fitted LightGBM regressor. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. 
py:class:: LinearRegressor(fit_intercept=True, n_jobs=-1, random_seed=0, **kwargs) Linear Regressor. :param fit_intercept: Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered). Defaults to True. :type fit_intercept: boolean :param n_jobs: Number of jobs to run in parallel. -1 uses all threads. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "fit_intercept": [True, False],} * - **model_family** - ModelFamily.LINEAR_MODEL * - **modifies_features** - True * - **modifies_target** - False * - **name** - Linear Regressor * - **supported_problem_types** - [ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.LinearRegressor.clone evalml.pipelines.LinearRegressor.default_parameters evalml.pipelines.LinearRegressor.describe evalml.pipelines.LinearRegressor.feature_importance evalml.pipelines.LinearRegressor.fit evalml.pipelines.LinearRegressor.get_prediction_intervals evalml.pipelines.LinearRegressor.load evalml.pipelines.LinearRegressor.needs_fitting evalml.pipelines.LinearRegressor.parameters evalml.pipelines.LinearRegressor.predict evalml.pipelines.LinearRegressor.predict_proba evalml.pipelines.LinearRegressor.save evalml.pipelines.LinearRegressor.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. 
py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Feature importance for fitted linear regressor. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. 
:rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. 
py:class:: LogisticRegressionClassifier(penalty='l2', C=1.0, multi_class='auto', solver='lbfgs', n_jobs=-1, random_seed=0, **kwargs) Logistic Regression Classifier. :param penalty: The norm used in penalization. Defaults to "l2". :type penalty: {"l1", "l2", "elasticnet", "none"} :param C: Inverse of regularization strength. Must be a positive float. Defaults to 1.0. :type C: float :param multi_class: If the option chosen is "ovr", then a binary problem is fit for each label. For "multinomial" the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. "multinomial" is unavailable when solver="liblinear". "auto" selects "ovr" if the data is binary, or if solver="liblinear", and otherwise selects "multinomial". Defaults to "auto". :type multi_class: {"auto", "ovr", "multinomial"} :param solver: Algorithm to use in the optimization problem. For small datasets, "liblinear" is a good choice, whereas "sag" and "saga" are faster for large ones. For multiclass problems, only "newton-cg", "sag", "saga" and "lbfgs" handle multinomial loss; "liblinear" is limited to one-versus-rest schemes. - "newton-cg", "lbfgs", "sag" and "saga" handle L2 or no penalty - "liblinear" and "saga" also handle L1 penalty - "saga" also supports "elasticnet" penalty - "liblinear" does not support setting penalty='none' Defaults to "lbfgs". :type solver: {"newton-cg", "lbfgs", "liblinear", "sag", "saga"} :param n_jobs: Number of parallel jobs used when fitting the model. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1. :type n_jobs: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. 
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "penalty": ["l2"], "C": Real(0.01, 10),} * - **model_family** - ModelFamily.LINEAR_MODEL * - **modifies_features** - True * - **modifies_target** - False * - **name** - Logistic Regression Classifier * - **supported_problem_types** - [ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.LogisticRegressionClassifier.clone evalml.pipelines.LogisticRegressionClassifier.default_parameters evalml.pipelines.LogisticRegressionClassifier.describe evalml.pipelines.LogisticRegressionClassifier.feature_importance evalml.pipelines.LogisticRegressionClassifier.fit evalml.pipelines.LogisticRegressionClassifier.get_prediction_intervals evalml.pipelines.LogisticRegressionClassifier.load evalml.pipelines.LogisticRegressionClassifier.needs_fitting evalml.pipelines.LogisticRegressionClassifier.parameters evalml.pipelines.LogisticRegressionClassifier.predict evalml.pipelines.LogisticRegressionClassifier.predict_proba evalml.pipelines.LogisticRegressionClassifier.save evalml.pipelines.LogisticRegressionClassifier.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
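As a rough illustration of the `return_dict` behavior documented for this method, the sketch below uses a hypothetical stand-in class. `SketchComponent` is not part of EvalML, and printing the parameters to standard output is an assumption made only for the example; the only detail taken from this reference is the returned format {"name": name, "parameters": parameters}.

```python
class SketchComponent:
    """Hypothetical stand-in for a component; not an EvalML class."""

    name = "Logistic Regression Classifier"

    def __init__(self, **parameters):
        # Keep the parameters the component was initialized with.
        self.parameters = dict(parameters)

    def describe(self, print_name=False, return_dict=False):
        # Optionally print the component name, then each parameter (assumed output style).
        if print_name:
            print(self.name)
        for key, value in self.parameters.items():
            print(f"\t * {key} : {value}")
        # Return the documented dictionary format only when requested.
        if return_dict:
            return {"name": self.name, "parameters": self.parameters}
        return None


component = SketchComponent(penalty="l2", C=1.0)
description = component.describe(print_name=True, return_dict=True)
```

With `return_dict=False` the same call returns None, matching the `None or dict` return type listed for the method.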
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Feature importance for fitted logistic regression classifier. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. 
py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: MulticlassClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0) Pipeline subclass for all multiclass classification pipelines. 
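The duplicate-name rule described for `component_graph` can be sketched in plain Python. This is a minimal illustration, assuming that a repeated name is suffixed with its zero-based position in the list; `name_components` is a hypothetical helper, not an EvalML function.

```python
def name_components(component_names):
    """Return unique names, suffixing repeats with their list index (assumed rule)."""
    seen = set()
    unique_names = []
    for index, name in enumerate(component_names):
        # First occurrences keep their name; repeats get "_<index>" appended.
        unique_names.append(f"{name}_{index}" if name in seen else name)
        seen.add(name)
    return unique_names


names = name_components(
    ["Imputer", "One Hot Encoder", "Imputer", "Logistic Regression Classifier"]
)
print(names)
# ['Imputer', 'One Hot Encoder', 'Imputer_2', 'Logistic Regression Classifier']
```

The resulting names are the keys to use in the `parameters` dictionary when a pipeline contains the same component more than once.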
:param component_graph: ComponentGraph instance, list of components in order, or dictionary of components. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"] :type component_graph: ComponentGraph, list, dict :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param custom_name: Custom name for the pipeline. Defaults to None. :type custom_name: str :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int .. rubric:: Example >>> pipeline = MulticlassClassificationPipeline(component_graph=["Simple Imputer", "Logistic Regression Classifier"], ... parameters={"Logistic Regression Classifier": {"penalty": "elasticnet", ... "solver": "liblinear"}}, ... custom_name="My Multiclass Pipeline") ... >>> assert pipeline.custom_name == "My Multiclass Pipeline" >>> assert pipeline.component_graph.component_dict.keys() == {'Simple Imputer', 'Logistic Regression Classifier'} The pipeline parameters will be chosen from the default parameters for every component, unless specific parameters were passed in as they were above. >>> assert pipeline.parameters == { ... 'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None}, ... 'Logistic Regression Classifier': {'penalty': 'elasticnet', ... 'C': 1.0, ... 'n_jobs': -1, ... 'multi_class': 'auto', ... 'solver': 'liblinear'}} **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **problem_type** - ProblemTypes.MULTICLASS **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.MulticlassClassificationPipeline.can_tune_threshold_with_objective evalml.pipelines.MulticlassClassificationPipeline.classes_ evalml.pipelines.MulticlassClassificationPipeline.clone evalml.pipelines.MulticlassClassificationPipeline.create_objectives evalml.pipelines.MulticlassClassificationPipeline.custom_name evalml.pipelines.MulticlassClassificationPipeline.describe evalml.pipelines.MulticlassClassificationPipeline.feature_importance evalml.pipelines.MulticlassClassificationPipeline.fit evalml.pipelines.MulticlassClassificationPipeline.fit_transform evalml.pipelines.MulticlassClassificationPipeline.get_component evalml.pipelines.MulticlassClassificationPipeline.get_hyperparameter_ranges evalml.pipelines.MulticlassClassificationPipeline.graph evalml.pipelines.MulticlassClassificationPipeline.graph_dict evalml.pipelines.MulticlassClassificationPipeline.graph_feature_importance evalml.pipelines.MulticlassClassificationPipeline.inverse_transform evalml.pipelines.MulticlassClassificationPipeline.load evalml.pipelines.MulticlassClassificationPipeline.model_family evalml.pipelines.MulticlassClassificationPipeline.name evalml.pipelines.MulticlassClassificationPipeline.new evalml.pipelines.MulticlassClassificationPipeline.parameters evalml.pipelines.MulticlassClassificationPipeline.predict evalml.pipelines.MulticlassClassificationPipeline.predict_proba evalml.pipelines.MulticlassClassificationPipeline.save evalml.pipelines.MulticlassClassificationPipeline.score evalml.pipelines.MulticlassClassificationPipeline.summary evalml.pipelines.MulticlassClassificationPipeline.transform evalml.pipelines.MulticlassClassificationPipeline.transform_all_but_final .. py:method:: can_tune_threshold_with_objective(self, objective) Determine whether the threshold of a binary classification pipeline can be tuned. :param objective: Primary AutoMLSearch objective. 
:type objective: ObjectiveBase :returns: True if the pipeline threshold can be tuned. :rtype: bool .. py:method:: classes_(self) :property: Gets the class names for the pipeline. Returns None before the pipeline is fit. .. py:method:: clone(self) Constructs a new pipeline with the same components, parameters, and random seed. :returns: A new instance of this pipeline with identical components, parameters, and random seed. .. py:method:: create_objectives(objectives) :staticmethod: Create objective instances from a list of strings or objective classes. .. py:method:: custom_name(self) :property: Custom name of the pipeline. .. py:method:: describe(self, return_dict=False) Outputs pipeline details including component parameters. :param return_dict: If True, return dictionary of information about pipeline. Defaults to False. :type return_dict: bool :returns: Dictionary of all component parameters if return_dict is True, else None. :rtype: dict .. py:method:: feature_importance(self) :property: Importance associated with each feature. Features dropped by the feature selection are excluded. :returns: Feature names and their corresponding importance. :rtype: pd.DataFrame .. py:method:: fit(self, X, y) Build a classification model. For string and categorical targets, classes are ordered with sorted(set(y)) and then mapped to integer values between 0 and n_classes-1. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: The target training labels of length [n_samples]. :type y: pd.Series, np.ndarray :returns: self :raises ValueError: If the number of unique classes in y is not appropriate for the type of pipeline. :raises TypeError: If the dtype is boolean but pd.NA exists in the series. :raises Exception: For all other exceptions. .. py:method:: fit_transform(self, X, y) Fit and transform all components in the component graph, if all components are Transformers. :param X: Input features of shape [n_samples, n_features].
:type X: pd.DataFrame :param y: The target data of length [n_samples]. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame :raises ValueError: If final component is an Estimator. .. py:method:: get_component(self, name) Returns component by name. :param name: Name of component. :type name: str :returns: Component to return :rtype: Component .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters) Returns hyperparameter ranges from all components as a dictionary. :param custom_hyperparameters: Custom hyperparameters for the pipeline. :type custom_hyperparameters: dict :returns: Dictionary of hyperparameter ranges for each component in the pipeline. :rtype: dict .. py:method:: graph(self, filepath=None) Generate an image representing the pipeline graph. :param filepath: Path to where the graph should be saved. If set to None (as by default), the graph will not be saved. :type filepath: str, optional :returns: Graph object that can be directly displayed in Jupyter notebooks. :rtype: graphviz.Digraph :raises RuntimeError: If graphviz is not installed. :raises ValueError: If path is not writeable. .. py:method:: graph_dict(self) Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases. x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]} :returns: A dictionary representing the DAG structure. :rtype: dag_dict (dict) .. 
py:method:: graph_feature_importance(self, importance_threshold=0) Generate a bar graph of the pipeline's feature importance. :param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero. :type importance_threshold: float, optional :returns: A bar graph showing features and their corresponding importance. :rtype: plotly.Figure :raises ValueError: If importance threshold is not valid. .. py:method:: inverse_transform(self, y) Apply component inverse_transform methods to estimator predictions in reverse order. Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd). :param y: Final component features. :type y: pd.Series :returns: The inverse transform of the target. :rtype: pd.Series .. py:method:: load(file_path: Union[str, io.BytesIO]) :staticmethod: Loads pipeline at file path. :param file_path: load filepath or a BytesIO object. :type file_path: str|BytesIO :returns: PipelineBase object .. py:method:: model_family(self) :property: Returns model family of this pipeline. .. py:method:: name(self) :property: Name of the pipeline. .. py:method:: new(self, parameters, random_seed=0) Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with python's __new__ method. :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :returns: A new instance of this pipeline with identical components. .. py:method:: parameters(self) :property: Parameter dictionary for this pipeline. :returns: Dictionary of all component parameters. :rtype: dict .. 
py:method:: predict(self, X, objective=None, X_train=None, y_train=None) Make predictions using selected features. Note: we cast y as ints first to address boolean values that may be returned from calculating predictions which we would not be able to otherwise transform if we originally had integer targets. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param objective: The objective to use to make predictions. :type objective: Object or string :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series :returns: Estimated labels. :rtype: pd.Series .. py:method:: predict_proba(self, X, X_train=None, y_train=None) Make probability estimates for labels. :param X: Data of shape [n_samples, n_features] :type X: pd.DataFrame or np.ndarray :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series or None :returns: Probability estimates :rtype: pd.DataFrame :raises ValueError: If final component is not an estimator. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves pipeline at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None) Evaluate model performance on objectives. :param X: Data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: True labels of length [n_samples] :type y: pd.Series :param objectives: List of objectives to score :type objectives: list :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame :param y_train: Training labels. Ignored. Only used for time series. 
:type y_train: pd.Series :returns: Ordered dictionary of objective scores. :rtype: dict .. py:method:: summary(self) :property: A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder :returns: A string describing the pipeline structure. .. py:method:: transform(self, X, y=None) Transform the input. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param y: The target data of length [n_samples]. Defaults to None. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame .. py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None) Transforms the data by applying all pre-processing components. :param X: Input data to the pipeline to transform. :type X: pd.DataFrame :param y: Targets corresponding to X. Optional. :type y: pd.Series or None :param X_train: Training data. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Only used for time series. :type y_train: pd.Series or None :returns: New transformed features. :rtype: pd.DataFrame .. py:class:: OneHotEncoder(top_n=10, features_to_encode=None, categories=None, drop='if_binary', handle_unknown='ignore', handle_missing='error', random_seed=0, **kwargs) A transformer that encodes categorical features in a one-hot numeric array. :param top_n: Number of categories per column to encode. If None, all categories will be encoded. Otherwise, the `n` most frequent will be encoded and all others will be dropped. Defaults to 10. :type top_n: int :param features_to_encode: List of columns to encode. All other columns will remain untouched. If None, all appropriate columns will be encoded. Defaults to None. :type features_to_encode: list[str] :param categories: A two dimensional list of categories, where `categories[i]` is a list of the categories for the column at index `i`. 
This can also be `None`, or `"auto"` if `top_n` is not None. Defaults to None. :type categories: list :param drop: Method ("first" or "if_binary") to use to drop one category per feature. Can also be a list specifying which categories to drop for each feature. Defaults to 'if_binary'. :type drop: string, list :param handle_unknown: Whether to ignore unknown categories for a feature encountered during `fit` or `transform`, or to raise an error. If either `top_n` or `categories` is used to limit the number of categories per column, this must be "ignore". Defaults to "ignore". :type handle_unknown: string :param handle_missing: Options for how to handle missing (NaN) values encountered during `fit` or `transform`. If this is set to "as_category" and NaN values are within the `n` most frequent, "nan" values will be encoded as their own column. If this is set to "error", any missing values encountered will raise an error. Defaults to "error". :type handle_missing: string :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - One Hot Encoder * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.OneHotEncoder.categories evalml.pipelines.OneHotEncoder.clone evalml.pipelines.OneHotEncoder.default_parameters evalml.pipelines.OneHotEncoder.describe evalml.pipelines.OneHotEncoder.fit evalml.pipelines.OneHotEncoder.fit_transform evalml.pipelines.OneHotEncoder.get_feature_names evalml.pipelines.OneHotEncoder.load evalml.pipelines.OneHotEncoder.needs_fitting evalml.pipelines.OneHotEncoder.parameters evalml.pipelines.OneHotEncoder.save evalml.pipelines.OneHotEncoder.transform evalml.pipelines.OneHotEncoder.update_parameters .. 
py:method:: categories(self, feature_name) Returns a list of the unique categories to be encoded for the particular feature, in order. :param feature_name: The name of any feature provided to one-hot encoder during fit. :type feature_name: str :returns: The unique categories, in the same dtype as they were provided during fit. :rtype: np.ndarray :raises ValueError: If feature was not provided to one-hot encoder as a training feature. .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the one-hot encoder component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self :raises ValueError: If encoding a column failed. .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. 
py:method:: get_feature_names(self) Return feature names for the categorical features after fitting. Feature names are formatted as {column name}_{category name}. In the event of a duplicate name, an integer will be added at the end of the feature name to distinguish it. For example, consider a dataframe with a column called "A" and category "x_y" and another column called "A_x" with "y". In this example, the feature names would be "A_x_y" and "A_x_y_1". :returns: The feature names after encoding, provided in the same order as input_features. :rtype: np.ndarray .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) One-hot encode the input data. :param X: Features to one-hot encode. :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series :returns: Transformed data, where each categorical feature has been encoded into numerical columns using one-hot encoding. :rtype: pd.DataFrame .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. 
py:class:: OrdinalEncoder(features_to_encode=None, categories=None, handle_unknown='error', unknown_value=None, encoded_missing_value=None, random_seed=0, **kwargs) A transformer that encodes ordinal features as an array of ordinal integers representing the relative order of categories. :param features_to_encode: List of columns to encode. All other columns will remain untouched. If None, all appropriate columns will be encoded. Defaults to None. The order of columns does not matter. :type features_to_encode: list[str] :param categories: A dictionary mapping column names to their categories in the dataframes passed in at fit and transform. The order of categories specified for a column does not matter. Any category found in the data that is not present in categories will be handled as an unknown value. To not have unknown values raise an error, set handle_unknown to "use_encoded_value". Defaults to None. :type categories: dict[str, list[str]] :param handle_unknown: How to handle unknown categories for a feature encountered during `fit` or `transform`. When set to "error", an error will be raised when an unknown category is found. When set to "use_encoded_value", unknown categories will be encoded as the value given for the parameter unknown_value. Defaults to "error". :type handle_unknown: "error" or "use_encoded_value" :param unknown_value: The value to use for unknown categories seen during fit or transform. Required when the parameter handle_unknown is set to "use_encoded_value". The value has to be distinct from the values used to encode any of the categories in fit. Defaults to None. :type unknown_value: int or np.nan :param encoded_missing_value: The value to use for missing (null) values seen during fit or transform. Defaults to np.nan. :type encoded_missing_value: int or np.nan :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. 
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Ordinal Encoder * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.OrdinalEncoder.categories evalml.pipelines.OrdinalEncoder.clone evalml.pipelines.OrdinalEncoder.default_parameters evalml.pipelines.OrdinalEncoder.describe evalml.pipelines.OrdinalEncoder.fit evalml.pipelines.OrdinalEncoder.fit_transform evalml.pipelines.OrdinalEncoder.get_feature_names evalml.pipelines.OrdinalEncoder.load evalml.pipelines.OrdinalEncoder.needs_fitting evalml.pipelines.OrdinalEncoder.parameters evalml.pipelines.OrdinalEncoder.save evalml.pipelines.OrdinalEncoder.transform evalml.pipelines.OrdinalEncoder.update_parameters .. py:method:: categories(self, feature_name) Returns a list of the unique categories to be encoded for the particular feature, in order. :param feature_name: The name of any feature provided to ordinal encoder during fit. :type feature_name: str :returns: The unique categories, in the same dtype as they were provided during fit. :rtype: np.ndarray :raises ValueError: If feature was not provided to ordinal encoder as a training feature. .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the ordinal encoder component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self :raises ValueError: If encoding a column failed. :raises TypeError: If non-Ordinal columns are specified in features_to_encode. .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: get_feature_names(self) Return feature names for the ordinal features after fitting. Feature names are formatted as {column name}_ordinal_encoding. :returns: The feature names after encoding, provided in the same order as input_features. :rtype: np.ndarray .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. 
py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Ordinally encode the input data. :param X: Features to encode. :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series :returns: Transformed data, where each ordinal feature has been encoded into a numerical column where ordinal integers represent the relative order of categories. :rtype: pd.DataFrame .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: PerColumnImputer(impute_strategies=None, random_seed=0, **kwargs) Imputes missing data according to a specified imputation strategy per column. :param impute_strategies: Column and {"impute_strategy": strategy, "fill_value":value} pairings. Valid values for impute strategy include "mean", "median", "most_frequent", "constant" for numerical data, and "most_frequent", "constant" for object data types. Defaults to None, which uses "most_frequent" for all columns. When impute_strategy == "constant", fill_value is used to replace missing data. When None, uses 0 when imputing numerical data and "missing_value" for strings or object data types. :type impute_strategies: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Per Column Imputer * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.PerColumnImputer.clone evalml.pipelines.PerColumnImputer.default_parameters evalml.pipelines.PerColumnImputer.describe evalml.pipelines.PerColumnImputer.fit evalml.pipelines.PerColumnImputer.fit_transform evalml.pipelines.PerColumnImputer.load evalml.pipelines.PerColumnImputer.needs_fitting evalml.pipelines.PerColumnImputer.parameters evalml.pipelines.PerColumnImputer.save evalml.pipelines.PerColumnImputer.transform evalml.pipelines.PerColumnImputer.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits imputers on input data. :param X: The input training data of shape [n_samples, n_features] to fit. :type X: pd.DataFrame or np.ndarray :param y: The target training data of length [n_samples]. Ignored. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. 
:rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms input data by imputing missing values. :param X: The input training data of shape [n_samples, n_features] to transform. :type X: pd.DataFrame or np.ndarray :param y: The target training data of length [n_samples]. Ignored. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: PipelineBase(component_graph, parameters=None, custom_name=None, random_seed=0) Machine learning pipeline. :param component_graph: ComponentGraph instance, list of components in order, or dictionary of components. Accepts strings or ComponentBase subclasses in the list. 
Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"]. :type component_graph: ComponentGraph, list, dict :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param custom_name: Custom name for the pipeline. Defaults to None. :type custom_name: str :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **problem_type** - None **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.PipelineBase.can_tune_threshold_with_objective evalml.pipelines.PipelineBase.clone evalml.pipelines.PipelineBase.create_objectives evalml.pipelines.PipelineBase.custom_name evalml.pipelines.PipelineBase.describe evalml.pipelines.PipelineBase.feature_importance evalml.pipelines.PipelineBase.fit evalml.pipelines.PipelineBase.fit_transform evalml.pipelines.PipelineBase.get_component evalml.pipelines.PipelineBase.get_hyperparameter_ranges evalml.pipelines.PipelineBase.graph evalml.pipelines.PipelineBase.graph_dict evalml.pipelines.PipelineBase.graph_feature_importance evalml.pipelines.PipelineBase.inverse_transform evalml.pipelines.PipelineBase.load evalml.pipelines.PipelineBase.model_family evalml.pipelines.PipelineBase.name evalml.pipelines.PipelineBase.new evalml.pipelines.PipelineBase.parameters evalml.pipelines.PipelineBase.predict evalml.pipelines.PipelineBase.save evalml.pipelines.PipelineBase.score evalml.pipelines.PipelineBase.summary evalml.pipelines.PipelineBase.transform 
evalml.pipelines.PipelineBase.transform_all_but_final .. py:method:: can_tune_threshold_with_objective(self, objective) Determine whether the threshold of a binary classification pipeline can be tuned. :param objective: Primary AutoMLSearch objective. :type objective: ObjectiveBase :returns: True if the pipeline threshold can be tuned. :rtype: bool .. py:method:: clone(self) Constructs a new pipeline with the same components, parameters, and random seed. :returns: A new instance of this pipeline with identical components, parameters, and random seed. .. py:method:: create_objectives(objectives) :staticmethod: Create objective instances from a list of strings or objective classes. .. py:method:: custom_name(self) :property: Custom name of the pipeline. .. py:method:: describe(self, return_dict=False) Outputs pipeline details including component parameters. :param return_dict: If True, return dictionary of information about pipeline. Defaults to False. :type return_dict: bool :returns: Dictionary of all component parameters if return_dict is True, else None. :rtype: dict .. py:method:: feature_importance(self) :property: Importance associated with each feature. Features dropped by the feature selection are excluded. :returns: Feature names and their corresponding importance :rtype: pd.DataFrame .. py:method:: fit(self, X, y) :abstractmethod: Build a model. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: The target training data of length [n_samples]. :type y: pd.Series, np.ndarray :returns: self .. py:method:: fit_transform(self, X, y) Fit and transform all components in the component graph, if all components are Transformers. :param X: Input features of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target data of length [n_samples]. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame :raises ValueError: If final component is an Estimator. .. 
py:method:: get_component(self, name) Returns component by name. :param name: Name of component. :type name: str :returns: The component with the given name. :rtype: Component .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters) Returns hyperparameter ranges from all components as a dictionary. :param custom_hyperparameters: Custom hyperparameters for the pipeline. :type custom_hyperparameters: dict :returns: Dictionary of hyperparameter ranges for each component in the pipeline. :rtype: dict .. py:method:: graph(self, filepath=None) Generate an image representing the pipeline graph. :param filepath: Path where the graph should be saved. If None (the default), the graph is not saved. :type filepath: str, optional :returns: Graph object that can be directly displayed in Jupyter notebooks. :rtype: graphviz.Digraph :raises RuntimeError: If graphviz is not installed. :raises ValueError: If path is not writable. .. py:method:: graph_dict(self) Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases. x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]} :returns: A dictionary representing the DAG structure. :rtype: dict .. py:method:: graph_feature_importance(self, importance_threshold=0) Generate a bar graph of the pipeline's feature importance.
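The thresholding that ``importance_threshold`` controls can be sketched in plain Python. This is only an illustration, not EvalML's implementation: the helper name ``filter_by_importance``, the ``(feature, importance)`` pair layout, and the choice to treat a negative threshold as invalid are all assumptions made for the example.

```python
from typing import List, Tuple


def filter_by_importance(
    rows: List[Tuple[str, float]], importance_threshold: float = 0.0
) -> List[Tuple[str, float]]:
    """Keep features whose absolute importance exceeds the threshold.

    Hypothetical helper: `rows` holds (feature name, importance) pairs.
    """
    # Assumption: a negative threshold is what counts as "not valid".
    if importance_threshold < 0:
        raise ValueError("importance_threshold must be non-negative")
    # Compare on absolute value so strongly negative importances are kept.
    kept = [(name, value) for name, value in rows if abs(value) > importance_threshold]
    # Order the bars from most to least important by magnitude.
    return sorted(kept, key=lambda row: abs(row[1]), reverse=True)


example = [("age", 0.40), ("zip", 0.01), ("income", -0.25)]
kept = filter_by_importance(example, importance_threshold=0.05)
```

With a threshold of 0.05, ``zip`` is dropped while ``income`` survives despite its negative sign, because the comparison uses the absolute value.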
:param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero. :type importance_threshold: float, optional :returns: A bar graph showing features and their corresponding importance. :rtype: plotly.Figure :raises ValueError: If importance threshold is not valid. .. py:method:: inverse_transform(self, y) Apply component inverse_transform methods to estimator predictions in reverse order. Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd). :param y: Final component features. :type y: pd.Series :returns: The inverse transform of the target. :rtype: pd.Series .. py:method:: load(file_path: Union[str, io.BytesIO]) :staticmethod: Loads pipeline at file path. :param file_path: load filepath or a BytesIO object. :type file_path: str|BytesIO :returns: PipelineBase object .. py:method:: model_family(self) :property: Returns model family of this pipeline. .. py:method:: name(self) :property: Name of the pipeline. .. py:method:: new(self, parameters, random_seed=0) Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with python's __new__ method. :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :returns: A new instance of this pipeline with identical components. .. py:method:: parameters(self) :property: Parameter dictionary for this pipeline. :returns: Dictionary of all component parameters. :rtype: dict .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None) Make predictions using selected features. 
:param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param objective: The objective to use to make predictions. :type objective: Object or string :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series or None :returns: Predicted values. :rtype: pd.Series .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves pipeline at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None) :abstractmethod: Evaluate model performance on current and additional objectives. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: True labels of length [n_samples]. :type y: pd.Series, np.ndarray :param objectives: Non-empty list of objectives to score on. :type objectives: list :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series or None :returns: Ordered dictionary of objective scores. :rtype: dict .. py:method:: summary(self) :property: A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder :returns: A string describing the pipeline structure. .. py:method:: transform(self, X, y=None) Transform the input. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param y: The target data of length [n_samples]. Defaults to None. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame .. 
py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None) Transforms the data by applying all pre-processing components. :param X: Input data to the pipeline to transform. :type X: pd.DataFrame :param y: Targets corresponding to X. Optional. :type y: pd.Series or None :param X_train: Training data. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Only used for time series. :type y_train: pd.Series or None :returns: New transformed features. :rtype: pd.DataFrame .. py:class:: ProphetRegressor(time_index: Optional[Hashable] = None, changepoint_prior_scale: float = 0.05, seasonality_prior_scale: int = 10, holidays_prior_scale: int = 10, seasonality_mode: str = 'additive', stan_backend: str = 'CMDSTANPY', interval_width: float = 0.95, random_seed: Union[int, float] = 0, **kwargs) Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. More information here: https://facebook.github.io/prophet/ :param time_index: Specifies the name of the column in X that provides the datetime objects. Defaults to None. :type time_index: str :param changepoint_prior_scale: Determines the strength of the sparse prior for fitting on rate changes. Increasing this value increases the flexibility of the trend. Defaults to 0.05. :type changepoint_prior_scale: float :param seasonality_prior_scale: Similar to changepoint_prior_scale. Adjusts the extent to which the seasonality model will fit the data. Defaults to 10. :type seasonality_prior_scale: int :param holidays_prior_scale: Similar to changepoint_prior_scale. Adjusts the extent to which holidays will fit the data. Defaults to 10. 
:type holidays_prior_scale: int :param seasonality_mode: Determines how this component fits the seasonality. Options are "additive" and "multiplicative". Defaults to "additive". :type seasonality_mode: str :param stan_backend: Determines the backend that should be used to run Prophet. Options are "CMDSTANPY" and "PYSTAN". Defaults to "CMDSTANPY". :type stan_backend: str :param interval_width: Determines the confidence of the prediction interval range when calling `get_prediction_intervals`. Accepts values in the range (0,1). Defaults to 0.95. :type interval_width: float :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "changepoint_prior_scale": Real(0.001, 0.5), "seasonality_prior_scale": Real(0.01, 10), "holidays_prior_scale": Real(0.01, 10), "seasonality_mode": ["additive", "multiplicative"],} * - **model_family** - ModelFamily.PROPHET * - **modifies_features** - True * - **modifies_target** - False * - **name** - Prophet Regressor * - **supported_problem_types** - [ProblemTypes.TIME_SERIES_REGRESSION] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.ProphetRegressor.build_prophet_df evalml.pipelines.ProphetRegressor.clone evalml.pipelines.ProphetRegressor.default_parameters evalml.pipelines.ProphetRegressor.describe evalml.pipelines.ProphetRegressor.feature_importance evalml.pipelines.ProphetRegressor.fit evalml.pipelines.ProphetRegressor.get_params evalml.pipelines.ProphetRegressor.get_prediction_intervals evalml.pipelines.ProphetRegressor.load evalml.pipelines.ProphetRegressor.needs_fitting evalml.pipelines.ProphetRegressor.parameters evalml.pipelines.ProphetRegressor.predict evalml.pipelines.ProphetRegressor.predict_proba evalml.pipelines.ProphetRegressor.save evalml.pipelines.ProphetRegressor.update_parameters .. 
py:method:: build_prophet_df(X: pandas.DataFrame, y: Optional[pandas.Series] = None, time_index: str = 'ds') -> pandas.DataFrame :staticmethod: Build the Prophet DataFrame to pass to fit and predict. .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) -> dict Returns the default parameters for this component. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: Whether to print the name of the component. :type print_name: bool, optional :param return_dict: Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}. :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) -> numpy.ndarray :property: Returns an array of zeros of length 1, since feature_importance is not defined for the Prophet regressor. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits Prophet regressor component to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self .. py:method:: get_params(self) -> dict Get parameters for the Prophet regressor. .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted ProphetRegressor. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored.
:type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: List[float] :param predictions: Not used for Prophet estimator. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) -> pandas.Series Make predictions using fitted Prophet regressor. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :returns: Predicted values. :rtype: pd.Series .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. 
:param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: RandomForestClassifier(n_estimators=100, max_depth=6, n_jobs=-1, random_seed=0, **kwargs) Random Forest Classifier. :param n_estimators: The number of trees in the forest. Defaults to 100. :type n_estimators: int :param max_depth: Maximum tree depth for base learners. Defaults to 6. :type max_depth: int :param n_jobs: Number of jobs to run in parallel. -1 uses all processes. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "n_estimators": Integer(10, 1000), "max_depth": Integer(1, 10),} * - **model_family** - ModelFamily.RANDOM_FOREST * - **modifies_features** - True * - **modifies_target** - False * - **name** - Random Forest Classifier * - **supported_problem_types** - [ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.RandomForestClassifier.clone evalml.pipelines.RandomForestClassifier.default_parameters evalml.pipelines.RandomForestClassifier.describe evalml.pipelines.RandomForestClassifier.feature_importance evalml.pipelines.RandomForestClassifier.fit evalml.pipelines.RandomForestClassifier.get_prediction_intervals evalml.pipelines.RandomForestClassifier.load evalml.pipelines.RandomForestClassifier.needs_fitting evalml.pipelines.RandomForestClassifier.parameters evalml.pipelines.RandomForestClassifier.predict evalml.pipelines.RandomForestClassifier.predict_proba evalml.pipelines.RandomForestClassifier.save evalml.pipelines.RandomForestClassifier.update_parameters ..
py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: Whether to print the name of the component. :type print_name: bool, optional :param return_dict: Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}. :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) -> pandas.Series :property: Returns importance associated with each feature. :returns: Importance associated with each feature. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a feature_importance method or a component_obj that implements feature_importance. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5.
The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. 
:rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: RandomForestRegressor(n_estimators: int = 100, max_depth: int = 6, n_jobs: int = -1, random_seed: Union[int, float] = 0, **kwargs) Random Forest Regressor. :param n_estimators: The number of trees in the forest. Defaults to 100. :type n_estimators: int :param max_depth: Maximum tree depth for base learners. Defaults to 6. :type max_depth: int :param n_jobs: Number of jobs to run in parallel. -1 uses all processes. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "n_estimators": Integer(10, 1000), "max_depth": Integer(1, 32),} * - **model_family** - ModelFamily.RANDOM_FOREST * - **modifies_features** - True * - **modifies_target** - False * - **name** - Random Forest Regressor * - **supported_problem_types** - [ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,] * - **training_only** - False **Methods** ..
autoapisummary:: :nosignatures: evalml.pipelines.RandomForestRegressor.clone evalml.pipelines.RandomForestRegressor.default_parameters evalml.pipelines.RandomForestRegressor.describe evalml.pipelines.RandomForestRegressor.feature_importance evalml.pipelines.RandomForestRegressor.fit evalml.pipelines.RandomForestRegressor.get_prediction_intervals evalml.pipelines.RandomForestRegressor.load evalml.pipelines.RandomForestRegressor.needs_fitting evalml.pipelines.RandomForestRegressor.parameters evalml.pipelines.RandomForestRegressor.predict evalml.pipelines.RandomForestRegressor.predict_proba evalml.pipelines.RandomForestRegressor.save evalml.pipelines.RandomForestRegressor.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: Whether to print the name of the component. :type print_name: bool, optional :param return_dict: Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}. :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) -> pandas.Series :property: Returns importance associated with each feature. :returns: Importance associated with each feature. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a feature_importance method or a component_obj that implements feature_importance. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data.
:param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted RandomForestRegressor. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Optional. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. 
py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: RegressionPipeline(component_graph, parameters=None, custom_name=None, random_seed=0) Pipeline subclass for all regression pipelines. :param component_graph: ComponentGraph instance, list of components in order, or dictionary of components. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"] :type component_graph: ComponentGraph, list, dict :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param custom_name: Custom name for the pipeline. Defaults to None. :type custom_name: str :param random_seed: Seed for the random number generator. Defaults to 0. 
:type random_seed: int .. rubric:: Example >>> pipeline = RegressionPipeline(component_graph=["Simple Imputer", "Linear Regressor"], ... parameters={"Simple Imputer": {"impute_strategy": "mean"}}, ... custom_name="My Regression Pipeline") ... >>> assert pipeline.custom_name == "My Regression Pipeline" >>> assert pipeline.component_graph.component_dict.keys() == {'Simple Imputer', 'Linear Regressor'} The pipeline parameters will be chosen from the default parameters for every component, unless specific parameters were passed in as they were above. >>> assert pipeline.parameters == { ... 'Simple Imputer': {'impute_strategy': 'mean', 'fill_value': None}, ... 'Linear Regressor': {'fit_intercept': True, 'n_jobs': -1}} **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **problem_type** - ProblemTypes.REGRESSION **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.RegressionPipeline.can_tune_threshold_with_objective evalml.pipelines.RegressionPipeline.clone evalml.pipelines.RegressionPipeline.create_objectives evalml.pipelines.RegressionPipeline.custom_name evalml.pipelines.RegressionPipeline.describe evalml.pipelines.RegressionPipeline.feature_importance evalml.pipelines.RegressionPipeline.fit evalml.pipelines.RegressionPipeline.fit_transform evalml.pipelines.RegressionPipeline.get_component evalml.pipelines.RegressionPipeline.get_hyperparameter_ranges evalml.pipelines.RegressionPipeline.graph evalml.pipelines.RegressionPipeline.graph_dict evalml.pipelines.RegressionPipeline.graph_feature_importance evalml.pipelines.RegressionPipeline.inverse_transform evalml.pipelines.RegressionPipeline.load evalml.pipelines.RegressionPipeline.model_family evalml.pipelines.RegressionPipeline.name evalml.pipelines.RegressionPipeline.new evalml.pipelines.RegressionPipeline.parameters evalml.pipelines.RegressionPipeline.predict evalml.pipelines.RegressionPipeline.save evalml.pipelines.RegressionPipeline.score evalml.pipelines.RegressionPipeline.summary 
evalml.pipelines.RegressionPipeline.transform evalml.pipelines.RegressionPipeline.transform_all_but_final .. py:method:: can_tune_threshold_with_objective(self, objective) Determine whether the threshold of a binary classification pipeline can be tuned. :param objective: Primary AutoMLSearch objective. :type objective: ObjectiveBase :returns: True if the pipeline threshold can be tuned. :rtype: bool .. py:method:: clone(self) Constructs a new pipeline with the same components, parameters, and random seed. :returns: A new instance of this pipeline with identical components, parameters, and random seed. .. py:method:: create_objectives(objectives) :staticmethod: Create objective instances from a list of strings or objective classes. .. py:method:: custom_name(self) :property: Custom name of the pipeline. .. py:method:: describe(self, return_dict=False) Outputs pipeline details including component parameters. :param return_dict: If True, return dictionary of information about pipeline. Defaults to False. :type return_dict: bool :returns: Dictionary of all component parameters if return_dict is True, else None. :rtype: dict .. py:method:: feature_importance(self) :property: Importance associated with each feature. Features dropped by the feature selection are excluded. :returns: Feature names and their corresponding importance :rtype: pd.DataFrame .. py:method:: fit(self, X, y) Build a regression model. :param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame or np.ndarray :param y: The target training data of length [n_samples] :type y: pd.Series, np.ndarray :returns: self :raises ValueError: If the target is not numeric. .. py:method:: fit_transform(self, X, y) Fit and transform all components in the component graph, if all components are Transformers. :param X: Input features of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target data of length [n_samples]. :type y: pd.Series :returns: Transformed output. 
:rtype: pd.DataFrame :raises ValueError: If final component is an Estimator. .. py:method:: get_component(self, name) Returns component by name. :param name: Name of component. :type name: str :returns: Component to return :rtype: Component .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters) Returns hyperparameter ranges from all components as a dictionary. :param custom_hyperparameters: Custom hyperparameters for the pipeline. :type custom_hyperparameters: dict :returns: Dictionary of hyperparameter ranges for each component in the pipeline. :rtype: dict .. py:method:: graph(self, filepath=None) Generate an image representing the pipeline graph. :param filepath: Path to where the graph should be saved. If set to None (as by default), the graph will not be saved. :type filepath: str, optional :returns: Graph object that can be directly displayed in Jupyter notebooks. :rtype: graphviz.Digraph :raises RuntimeError: If graphviz is not installed. :raises ValueError: If path is not writeable. .. py:method:: graph_dict(self) Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases. x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]} :returns: A dictionary representing the DAG structure. :rtype: dag_dict (dict) .. py:method:: graph_feature_importance(self, importance_threshold=0) Generate a bar graph of the pipeline's feature importance. 
:param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero. :type importance_threshold: float, optional :returns: A bar graph showing features and their corresponding importance. :rtype: plotly.Figure :raises ValueError: If importance threshold is not valid. .. py:method:: inverse_transform(self, y) Apply component inverse_transform methods to estimator predictions in reverse order. Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd). :param y: Final component features. :type y: pd.Series :returns: The inverse transform of the target. :rtype: pd.Series .. py:method:: load(file_path: Union[str, io.BytesIO]) :staticmethod: Loads pipeline at file path. :param file_path: File path to load from, or a BytesIO object. :type file_path: str|BytesIO :returns: PipelineBase object .. py:method:: model_family(self) :property: Returns model family of this pipeline. .. py:method:: name(self) :property: Name of the pipeline. .. py:method:: new(self, parameters, random_seed=0) Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python's __new__ method. :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :returns: A new instance of this pipeline with identical components. .. py:method:: parameters(self) :property: Parameter dictionary for this pipeline. :returns: Dictionary of all component parameters. :rtype: dict .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None) Make predictions using selected features.
:param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param objective: The objective to use to make predictions. :type objective: Object or string :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series or None :returns: Predicted values. :rtype: pd.Series .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves pipeline at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None) Evaluate model performance on current and additional objectives. :param X: Data of shape [n_samples, n_features] :type X: pd.DataFrame, or np.ndarray :param y: True values of length [n_samples] :type y: pd.Series, or np.ndarray :param objectives: Non-empty list of objectives to score on :type objectives: list :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series or None :returns: Ordered dictionary of objective scores. :rtype: dict .. py:method:: summary(self) :property: A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder :returns: A string describing the pipeline structure. .. py:method:: transform(self, X, y=None) Transform the input. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param y: The target data of length [n_samples]. Defaults to None. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame .. 
py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None) Transforms the data by applying all pre-processing components. :param X: Input data to the pipeline to transform. :type X: pd.DataFrame :param y: Targets corresponding to X. Optional. :type y: pd.Series or None :param X_train: Training data. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Only used for time series. :type y_train: pd.Series or None :returns: New transformed features. :rtype: pd.DataFrame .. py:class:: RFClassifierSelectFromModel(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold='median', n_jobs=-1, random_seed=0, **kwargs) Selects top features based on importance weights using a Random Forest classifier. :param number_features: The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None. :type number_features: int :param n_estimators: The number of trees in the forest. Defaults to 10. :type n_estimators: int :param max_depth: Maximum tree depth for base learners. Defaults to None. :type max_depth: int :param percent_features: Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5. :type percent_features: float :param threshold: The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If "median", then the threshold value is the median of the feature importances. A scaling factor (e.g., "1.25*mean") may also be used. Defaults to median. :type threshold: string or float :param n_jobs: Number of jobs to run in parallel. -1 uses all processes. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. 
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "percent_features": Real(0.01, 1), "threshold": ["mean", "median"],} * - **modifies_features** - True * - **modifies_target** - False * - **name** - RF Classifier Select From Model * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.RFClassifierSelectFromModel.clone evalml.pipelines.RFClassifierSelectFromModel.default_parameters evalml.pipelines.RFClassifierSelectFromModel.describe evalml.pipelines.RFClassifierSelectFromModel.fit evalml.pipelines.RFClassifierSelectFromModel.fit_transform evalml.pipelines.RFClassifierSelectFromModel.get_names evalml.pipelines.RFClassifierSelectFromModel.load evalml.pipelines.RFClassifierSelectFromModel.needs_fitting evalml.pipelines.RFClassifierSelectFromModel.parameters evalml.pipelines.RFClassifierSelectFromModel.save evalml.pipelines.RFClassifierSelectFromModel.transform evalml.pipelines.RFClassifierSelectFromModel.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. 
:param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises MethodPropertyNotFoundError: If component does not have a fit method or a component_obj that implements fit. .. py:method:: fit_transform(self, X, y=None) Fit and transform data using the feature selector. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:method:: get_names(self) Get names of selected features. :returns: List of the names of features selected. :rtype: list[str] .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms input data by selecting features. If the component_obj does not have a transform method, raises a MethodPropertyNotFoundError exception. :param X: Data to transform. :type X: pd.DataFrame :param y: Target data. Ignored.
:type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If feature selector does not have a transform method or a component_obj that implements transform .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: RFRegressorSelectFromModel(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold='median', n_jobs=-1, random_seed=0, **kwargs) Selects top features based on importance weights using a Random Forest regressor. :param number_features: The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None. :type number_features: int :param n_estimators: The number of trees in the forest. Defaults to 10. :type n_estimators: int :param max_depth: Maximum tree depth for base learners. Defaults to None. :type max_depth: int :param percent_features: Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5. :type percent_features: float :param threshold: The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If "median", then the threshold value is the median of the feature importances. A scaling factor (e.g., "1.25*mean") may also be used. Defaults to median. :type threshold: string or float :param n_jobs: Number of jobs to run in parallel. -1 uses all processes. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** ..
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "percent_features": Real(0.01, 1), "threshold": ["mean", "median"],} * - **modifies_features** - True * - **modifies_target** - False * - **name** - RF Regressor Select From Model * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.RFRegressorSelectFromModel.clone evalml.pipelines.RFRegressorSelectFromModel.default_parameters evalml.pipelines.RFRegressorSelectFromModel.describe evalml.pipelines.RFRegressorSelectFromModel.fit evalml.pipelines.RFRegressorSelectFromModel.fit_transform evalml.pipelines.RFRegressorSelectFromModel.get_names evalml.pipelines.RFRegressorSelectFromModel.load evalml.pipelines.RFRegressorSelectFromModel.needs_fitting evalml.pipelines.RFRegressorSelectFromModel.parameters evalml.pipelines.RFRegressorSelectFromModel.save evalml.pipelines.RFRegressorSelectFromModel.transform evalml.pipelines.RFRegressorSelectFromModel.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. 
:param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises MethodPropertyNotFoundError: If component does not have a fit method or a component_obj that implements fit. .. py:method:: fit_transform(self, X, y=None) Fit and transform data using the feature selector. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:method:: get_names(self) Get names of selected features. :returns: List of the names of features selected. :rtype: list[str] .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms input data by selecting features. If the component_obj does not have a transform method, raises a MethodPropertyNotFoundError exception. :param X: Data to transform. :type X: pd.DataFrame :param y: Target data. Ignored.
:type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If feature selector does not have a transform method or a component_obj that implements transform .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: SimpleImputer(impute_strategy='most_frequent', fill_value=None, random_seed=0, **kwargs) Imputes missing data according to a specified imputation strategy. Natural language columns are ignored. :param impute_strategy: Impute strategy to use. Valid values include "mean", "median", "most_frequent", "constant" for numerical data, and "most_frequent", "constant" for object data types. :type impute_strategy: string :param fill_value: When impute_strategy == "constant", fill_value is used to replace missing data. Defaults to 0 when imputing numerical data and "missing_value" for strings or object data types. :type fill_value: string :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "impute_strategy": ["mean", "median", "most_frequent"]} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Simple Imputer * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.SimpleImputer.clone evalml.pipelines.SimpleImputer.default_parameters evalml.pipelines.SimpleImputer.describe evalml.pipelines.SimpleImputer.fit evalml.pipelines.SimpleImputer.fit_transform evalml.pipelines.SimpleImputer.load evalml.pipelines.SimpleImputer.needs_fitting evalml.pipelines.SimpleImputer.parameters evalml.pipelines.SimpleImputer.save evalml.pipelines.SimpleImputer.transform evalml.pipelines.SimpleImputer.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits imputer to data. 'None' values are converted to np.nan before imputation and are treated as the same. :param X: the input training data of shape [n_samples, n_features] :type X: pd.DataFrame or np.ndarray :param y: the target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises ValueError: if the SimpleImputer receives a dataframe with both Boolean and Categorical data. .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform :type X: pd.DataFrame :param y: Target data. 
:type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms input by imputing missing values. 'None' and np.nan values are treated as the same. :param X: Data to transform. :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: StackedEnsembleBase(final_estimator=None, n_jobs=-1, random_seed=0, **kwargs) Stacked Ensemble Base Class. :param final_estimator: The estimator used to combine the base estimators. :type final_estimator: Estimator or subclass :param n_jobs: Integer describing level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Defaults to -1.
- Note: there could be some multi-process errors thrown for values of `n_jobs != 1`. If this is the case, please use `n_jobs = 1`. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **model_family** - ModelFamily.ENSEMBLE * - **modifies_features** - True * - **modifies_target** - False * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.StackedEnsembleBase.clone evalml.pipelines.StackedEnsembleBase.default_parameters evalml.pipelines.StackedEnsembleBase.describe evalml.pipelines.StackedEnsembleBase.feature_importance evalml.pipelines.StackedEnsembleBase.fit evalml.pipelines.StackedEnsembleBase.get_prediction_intervals evalml.pipelines.StackedEnsembleBase.load evalml.pipelines.StackedEnsembleBase.name evalml.pipelines.StackedEnsembleBase.needs_fitting evalml.pipelines.StackedEnsembleBase.parameters evalml.pipelines.StackedEnsembleBase.predict evalml.pipelines.StackedEnsembleBase.predict_proba evalml.pipelines.StackedEnsembleBase.save evalml.pipelines.StackedEnsembleBase.supported_problem_types evalml.pipelines.StackedEnsembleBase.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for stacked ensemble classes. :returns: default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. 
:rtype: None or dict .. py:method:: feature_importance(self) :property: Not implemented for StackedEnsembleClassifier and StackedEnsembleRegressor. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: name(cls) :property: Returns string name of this component. .. 
py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: supported_problem_types(cls) :property: Problem types this estimator supports. .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: StackedEnsembleClassifier(final_estimator=None, n_jobs=-1, random_seed=0, **kwargs) Stacked Ensemble Classifier. :param final_estimator: The classifier used to combine the base estimators. If None, uses ElasticNetClassifier. 
:type final_estimator: Estimator or subclass :param n_jobs: Integer describing level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Defaults to -1. - Note: there could be some multi-process errors thrown for values of `n_jobs != 1`. If this is the case, please use `n_jobs = 1`. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int .. rubric:: Example >>> from evalml.pipelines.component_graph import ComponentGraph >>> from evalml.pipelines.components.estimators.classifiers.decision_tree_classifier import DecisionTreeClassifier >>> from evalml.pipelines.components.estimators.classifiers.elasticnet_classifier import ElasticNetClassifier ... >>> component_graph = { ... "Decision Tree": [DecisionTreeClassifier(random_seed=3), "X", "y"], ... "Decision Tree B": [DecisionTreeClassifier(random_seed=4), "X", "y"], ... "Stacked Ensemble": [ ... StackedEnsembleClassifier(n_jobs=1, final_estimator=DecisionTreeClassifier()), ... "Decision Tree.x", ... "Decision Tree B.x", ... "y", ... ], ... } ... >>> cg = ComponentGraph(component_graph) >>> assert cg.default_parameters == { ... 'Decision Tree Classifier': {'criterion': 'gini', ... 'max_features': 'auto', ... 'max_depth': 6, ... 'min_samples_split': 2, ... 'min_weight_fraction_leaf': 0.0}, ... 'Stacked Ensemble Classifier': {'final_estimator': ElasticNetClassifier, ... 'n_jobs': -1}} **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **model_family** - ModelFamily.ENSEMBLE * - **modifies_features** - True * - **modifies_target** - False * - **name** - Stacked Ensemble Classifier * - **supported_problem_types** - [ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,] * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.StackedEnsembleClassifier.clone evalml.pipelines.StackedEnsembleClassifier.default_parameters evalml.pipelines.StackedEnsembleClassifier.describe evalml.pipelines.StackedEnsembleClassifier.feature_importance evalml.pipelines.StackedEnsembleClassifier.fit evalml.pipelines.StackedEnsembleClassifier.get_prediction_intervals evalml.pipelines.StackedEnsembleClassifier.load evalml.pipelines.StackedEnsembleClassifier.needs_fitting evalml.pipelines.StackedEnsembleClassifier.parameters evalml.pipelines.StackedEnsembleClassifier.predict evalml.pipelines.StackedEnsembleClassifier.predict_proba evalml.pipelines.StackedEnsembleClassifier.save evalml.pipelines.StackedEnsembleClassifier.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for stacked ensemble classes. :returns: default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Not implemented for StackedEnsembleClassifier and StackedEnsembleRegressor. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. 
py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. 
:type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: StackedEnsembleRegressor(final_estimator=None, n_jobs=-1, random_seed=0, **kwargs) Stacked Ensemble Regressor. :param final_estimator: The regressor used to combine the base estimators. If None, uses ElasticNetRegressor. :type final_estimator: Estimator or subclass :param n_jobs: Integer describing level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Defaults to -1. - Note: there could be some multi-process errors thrown for values of `n_jobs != 1`. If this is the case, please use `n_jobs = 1`. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int ..
rubric:: Example >>> from evalml.pipelines.component_graph import ComponentGraph >>> from evalml.pipelines.components.estimators.regressors.rf_regressor import RandomForestRegressor >>> from evalml.pipelines.components.estimators.regressors.elasticnet_regressor import ElasticNetRegressor ... >>> component_graph = { ... "Random Forest": [RandomForestRegressor(random_seed=3), "X", "y"], ... "Random Forest B": [RandomForestRegressor(random_seed=4), "X", "y"], ... "Stacked Ensemble": [ ... StackedEnsembleRegressor(n_jobs=1, final_estimator=RandomForestRegressor()), ... "Random Forest.x", ... "Random Forest B.x", ... "y", ... ], ... } ... >>> cg = ComponentGraph(component_graph) >>> assert cg.default_parameters == { ... 'Random Forest Regressor': {'n_estimators': 100, ... 'max_depth': 6, ... 'n_jobs': -1}, ... 'Stacked Ensemble Regressor': {'final_estimator': ElasticNetRegressor, ... 'n_jobs': -1}} **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **model_family** - ModelFamily.ENSEMBLE * - **modifies_features** - True * - **modifies_target** - False * - **name** - Stacked Ensemble Regressor * - **supported_problem_types** - [ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,] * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.StackedEnsembleRegressor.clone evalml.pipelines.StackedEnsembleRegressor.default_parameters evalml.pipelines.StackedEnsembleRegressor.describe evalml.pipelines.StackedEnsembleRegressor.feature_importance evalml.pipelines.StackedEnsembleRegressor.fit evalml.pipelines.StackedEnsembleRegressor.get_prediction_intervals evalml.pipelines.StackedEnsembleRegressor.load evalml.pipelines.StackedEnsembleRegressor.needs_fitting evalml.pipelines.StackedEnsembleRegressor.parameters evalml.pipelines.StackedEnsembleRegressor.predict evalml.pipelines.StackedEnsembleRegressor.predict_proba evalml.pipelines.StackedEnsembleRegressor.save evalml.pipelines.StackedEnsembleRegressor.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for stacked ensemble classes. :returns: default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Not implemented for StackedEnsembleClassifier and StackedEnsembleRegressor. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. 
py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. 
:type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: StandardScaler(random_seed=0, **kwargs) A transformer that standardizes input features by removing the mean and scaling to unit variance. :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Standard Scaler * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.StandardScaler.clone evalml.pipelines.StandardScaler.default_parameters evalml.pipelines.StandardScaler.describe evalml.pipelines.StandardScaler.fit evalml.pipelines.StandardScaler.fit_transform evalml.pipelines.StandardScaler.load evalml.pipelines.StandardScaler.needs_fitting evalml.pipelines.StandardScaler.parameters evalml.pipelines.StandardScaler.save evalml.pipelines.StandardScaler.transform evalml.pipelines.StandardScaler.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the standard scaler on the given data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fit and transform data using the standard scaler component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame ..
py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transform data using the fitted standard scaler. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: SVMClassifier(C=1.0, kernel='rbf', gamma='auto', probability=True, random_seed=0, **kwargs) Support Vector Machine Classifier. :param C: The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0. :type C: float :param kernel: Specifies the kernel type to be used in the algorithm. Defaults to "rbf". :type kernel: {"poly", "rbf", "sigmoid"} :param gamma: Kernel coefficient for "rbf", "poly" and "sigmoid". Defaults to "auto". 
- If gamma='scale' is passed then it uses 1 / (n_features * X.var()) as value of gamma - If "auto" (default), uses 1 / n_features :type gamma: {"scale", "auto"} or float :param probability: Whether to enable probability estimates. Defaults to True. :type probability: boolean :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "C": Real(0, 10), "kernel": ["poly", "rbf", "sigmoid"], "gamma": ["scale", "auto"],} * - **model_family** - ModelFamily.SVM * - **modifies_features** - True * - **modifies_target** - False * - **name** - SVM Classifier * - **supported_problem_types** - [ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.SVMClassifier.clone evalml.pipelines.SVMClassifier.default_parameters evalml.pipelines.SVMClassifier.describe evalml.pipelines.SVMClassifier.feature_importance evalml.pipelines.SVMClassifier.fit evalml.pipelines.SVMClassifier.get_prediction_intervals evalml.pipelines.SVMClassifier.load evalml.pipelines.SVMClassifier.needs_fitting evalml.pipelines.SVMClassifier.parameters evalml.pipelines.SVMClassifier.predict evalml.pipelines.SVMClassifier.predict_proba evalml.pipelines.SVMClassifier.save evalml.pipelines.SVMClassifier.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Feature importance only works with linear kernels. If the kernel isn't linear, we return a numpy array of zeros. :returns: Feature importance of fitted SVM classifier or a numpy array of zeroes if the kernel is not linear. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. 
:type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. 
:type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: SVMRegressor(C=1.0, kernel='rbf', gamma='auto', random_seed=0, **kwargs) Support Vector Machine Regressor. :param C: The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0. :type C: float :param kernel: Specifies the kernel type to be used in the algorithm. Defaults to "rbf". :type kernel: {"poly", "rbf", "sigmoid"} :param gamma: Kernel coefficient for "rbf", "poly" and "sigmoid". Defaults to "auto". - If gamma='scale' is passed then it uses 1 / (n_features * X.var()) as value of gamma - If "auto" (default), uses 1 / n_features :type gamma: {"scale", "auto"} or float :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "C": Real(0, 10), "kernel": ["poly", "rbf", "sigmoid"], "gamma": ["scale", "auto"],} * - **model_family** - ModelFamily.SVM * - **modifies_features** - True * - **modifies_target** - False * - **name** - SVM Regressor * - **supported_problem_types** - [ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.SVMRegressor.clone evalml.pipelines.SVMRegressor.default_parameters evalml.pipelines.SVMRegressor.describe evalml.pipelines.SVMRegressor.feature_importance evalml.pipelines.SVMRegressor.fit evalml.pipelines.SVMRegressor.get_prediction_intervals evalml.pipelines.SVMRegressor.load evalml.pipelines.SVMRegressor.needs_fitting evalml.pipelines.SVMRegressor.parameters evalml.pipelines.SVMRegressor.predict evalml.pipelines.SVMRegressor.predict_proba evalml.pipelines.SVMRegressor.save evalml.pipelines.SVMRegressor.update_parameters .. 
py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Feature importance of fitted SVM regressor. Only works with linear kernels. If the kernel isn't linear, we return a numpy array of zeros. :returns: The feature importance of the fitted SVM regressor, or an array of zeroes if the kernel is not linear. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5.
The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. 
:rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: TargetEncoder(cols=None, smoothing=1, handle_unknown='value', handle_missing='value', random_seed=0, **kwargs) A transformer that encodes categorical features into target encodings. :param cols: Columns to encode. If None, all string columns will be encoded; otherwise, only the columns provided will be encoded. Defaults to None. :type cols: list :param smoothing: The smoothing factor to apply. The larger this value is, the more influence the expected target value has on the resulting target encodings. Must be strictly larger than 0. Defaults to 1.0. :type smoothing: float :param handle_unknown: Determines how to handle unknown categories encountered for a feature. Options are 'value', 'error', and 'return_nan'. Defaults to 'value', which replaces with the target mean. :type handle_unknown: string :param handle_missing: Determines how to handle missing values encountered during `fit` or `transform`. Options are 'value', 'error', and 'return_nan'. Defaults to 'value', which replaces with the target mean. :type handle_missing: string :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** ..
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Target Encoder * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.TargetEncoder.clone evalml.pipelines.TargetEncoder.default_parameters evalml.pipelines.TargetEncoder.describe evalml.pipelines.TargetEncoder.fit evalml.pipelines.TargetEncoder.fit_transform evalml.pipelines.TargetEncoder.get_feature_names evalml.pipelines.TargetEncoder.load evalml.pipelines.TargetEncoder.needs_fitting evalml.pipelines.TargetEncoder.parameters evalml.pipelines.TargetEncoder.save evalml.pipelines.TargetEncoder.transform evalml.pipelines.TargetEncoder.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y) Fits the target encoder. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y) Fit and transform data using the target encoder. 
:param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:method:: get_feature_names(self) Return feature names for the input features after fitting. :returns: The feature names after encoding. :rtype: np.array .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transform data using the fitted target encoder. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: TimeSeriesBinaryClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0) Pipeline base class for time series binary classification problems. 
:param component_graph: List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"] :type component_graph: list or dict :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary {} implies using all default values for component parameters. Pipeline-level parameters such as time_index, gap, and max_delay must be specified with the "pipeline" key. For example: Pipeline(parameters={"pipeline": {"time_index": "Date", "max_delay": 4, "gap": 2}}). :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int .. rubric:: Example >>> pipeline = TimeSeriesBinaryClassificationPipeline(component_graph=["Simple Imputer", "Logistic Regression Classifier"], ... parameters={"Logistic Regression Classifier": {"penalty": "elasticnet", ... "solver": "liblinear"}, ... "pipeline": {"gap": 1, "max_delay": 1, "forecast_horizon": 1, "time_index": "date"}}, ... custom_name="My TimeSeriesBinary Pipeline") ... >>> assert pipeline.custom_name == "My TimeSeriesBinary Pipeline" >>> assert pipeline.component_graph.component_dict.keys() == {'Simple Imputer', 'Logistic Regression Classifier'} ... >>> assert pipeline.parameters == { ... 'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None}, ... 'Logistic Regression Classifier': {'penalty': 'elasticnet', ... 'C': 1.0, ... 'n_jobs': -1, ... 'multi_class': 'auto', ... 'solver': 'liblinear'}, ... 'pipeline': {'gap': 1, 'max_delay': 1, 'forecast_horizon': 1, 'time_index': "date"}} **Attributes** .. 
list-table:: :widths: 15 85 :header-rows: 0 * - **problem_type** - None **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.TimeSeriesBinaryClassificationPipeline.can_tune_threshold_with_objective evalml.pipelines.TimeSeriesBinaryClassificationPipeline.classes_ evalml.pipelines.TimeSeriesBinaryClassificationPipeline.clone evalml.pipelines.TimeSeriesBinaryClassificationPipeline.create_objectives evalml.pipelines.TimeSeriesBinaryClassificationPipeline.custom_name evalml.pipelines.TimeSeriesBinaryClassificationPipeline.dates_needed_for_prediction evalml.pipelines.TimeSeriesBinaryClassificationPipeline.dates_needed_for_prediction_range evalml.pipelines.TimeSeriesBinaryClassificationPipeline.describe evalml.pipelines.TimeSeriesBinaryClassificationPipeline.feature_importance evalml.pipelines.TimeSeriesBinaryClassificationPipeline.fit evalml.pipelines.TimeSeriesBinaryClassificationPipeline.fit_transform evalml.pipelines.TimeSeriesBinaryClassificationPipeline.get_component evalml.pipelines.TimeSeriesBinaryClassificationPipeline.get_hyperparameter_ranges evalml.pipelines.TimeSeriesBinaryClassificationPipeline.graph evalml.pipelines.TimeSeriesBinaryClassificationPipeline.graph_dict evalml.pipelines.TimeSeriesBinaryClassificationPipeline.graph_feature_importance evalml.pipelines.TimeSeriesBinaryClassificationPipeline.inverse_transform evalml.pipelines.TimeSeriesBinaryClassificationPipeline.load evalml.pipelines.TimeSeriesBinaryClassificationPipeline.model_family evalml.pipelines.TimeSeriesBinaryClassificationPipeline.name evalml.pipelines.TimeSeriesBinaryClassificationPipeline.new evalml.pipelines.TimeSeriesBinaryClassificationPipeline.optimize_threshold evalml.pipelines.TimeSeriesBinaryClassificationPipeline.parameters evalml.pipelines.TimeSeriesBinaryClassificationPipeline.predict evalml.pipelines.TimeSeriesBinaryClassificationPipeline.predict_in_sample evalml.pipelines.TimeSeriesBinaryClassificationPipeline.predict_proba 
evalml.pipelines.TimeSeriesBinaryClassificationPipeline.predict_proba_in_sample evalml.pipelines.TimeSeriesBinaryClassificationPipeline.save evalml.pipelines.TimeSeriesBinaryClassificationPipeline.score evalml.pipelines.TimeSeriesBinaryClassificationPipeline.summary evalml.pipelines.TimeSeriesBinaryClassificationPipeline.threshold evalml.pipelines.TimeSeriesBinaryClassificationPipeline.transform evalml.pipelines.TimeSeriesBinaryClassificationPipeline.transform_all_but_final .. py:method:: can_tune_threshold_with_objective(self, objective) Determine whether the threshold of a binary classification pipeline can be tuned. :param objective: Primary AutoMLSearch objective. :type objective: ObjectiveBase :returns: True if the pipeline threshold can be tuned. :rtype: bool .. py:method:: classes_(self) :property: Gets the class names for the pipeline. Will return None before pipeline is fit. .. py:method:: clone(self) Constructs a new pipeline with the same components, parameters, and random seed. :returns: A new instance of this pipeline with identical components, parameters, and random seed. .. py:method:: create_objectives(objectives) :staticmethod: Create objective instances from a list of strings or objective classes. .. py:method:: custom_name(self) :property: Custom name of the pipeline. .. py:method:: dates_needed_for_prediction(self, date) Return dates needed to forecast the given date in the future. :param date: Date to forecast in the future. :type date: pd.Timestamp :returns: Range of dates needed to forecast the given date. :rtype: dates_needed (tuple(pd.Timestamp)) .. py:method:: dates_needed_for_prediction_range(self, start_date, end_date) Return dates needed to forecast the given date in the future. :param start_date: Start date of range to forecast in the future. :type start_date: pd.Timestamp :param end_date: End date of range to forecast in the future. :type end_date: pd.Timestamp :returns: Range of dates needed to forecast the given date. 
:rtype: dates_needed (tuple(pd.Timestamp)) :raises ValueError: If start_date doesn't come before end_date. .. py:method:: describe(self, return_dict=False) Outputs pipeline details including component parameters. :param return_dict: If True, return a dictionary of information about the pipeline. Defaults to False. :type return_dict: bool :returns: Dictionary of all component parameters if return_dict is True, else None. :rtype: dict .. py:method:: feature_importance(self) :property: Importance associated with each feature. Features dropped by feature selection are excluded. :returns: Feature names and their corresponding importance. :rtype: pd.DataFrame .. py:method:: fit(self, X, y) Fit a time series classification model. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: The target training labels of length [n_samples]. :type y: pd.Series, np.ndarray :returns: self :raises ValueError: If the number of unique classes in y is not appropriate for the type of pipeline. .. py:method:: fit_transform(self, X, y) Fit and transform all components in the component graph, if all components are Transformers. :param X: Input features of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target data of length [n_samples]. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame :raises ValueError: If final component is an Estimator. .. py:method:: get_component(self, name) Returns component by name. :param name: Name of component. :type name: str :returns: Component to return. :rtype: Component .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters) Returns hyperparameter ranges from all components as a dictionary. :param custom_hyperparameters: Custom hyperparameters for the pipeline. :type custom_hyperparameters: dict :returns: Dictionary of hyperparameter ranges for each component in the pipeline. :rtype: dict ..
py:method:: graph(self, filepath=None) Generate an image representing the pipeline graph. :param filepath: Path to where the graph should be saved. If set to None (as by default), the graph will not be saved. :type filepath: str, optional :returns: Graph object that can be directly displayed in Jupyter notebooks. :rtype: graphviz.Digraph :raises RuntimeError: If graphviz is not installed. :raises ValueError: If path is not writeable. .. py:method:: graph_dict(self) Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases. x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]} :returns: A dictionary representing the DAG structure. :rtype: dag_dict (dict) .. py:method:: graph_feature_importance(self, importance_threshold=0) Generate a bar graph of the pipeline's feature importance. :param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero. :type importance_threshold: float, optional :returns: A bar graph showing features and their corresponding importance. :rtype: plotly.Figure :raises ValueError: If importance threshold is not valid. .. py:method:: inverse_transform(self, y) Apply component inverse_transform methods to estimator predictions in reverse order. Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd). 
:param y: Final component features. :type y: pd.Series :returns: The inverse transform of the target. :rtype: pd.Series .. py:method:: load(file_path: Union[str, io.BytesIO]) :staticmethod: Loads pipeline at file path. :param file_path: File path to load from, or a BytesIO object. :type file_path: str|BytesIO :returns: PipelineBase object .. py:method:: model_family(self) :property: Returns model family of this pipeline. .. py:method:: name(self) :property: Name of the pipeline. .. py:method:: new(self, parameters, random_seed=0) Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python's __new__ method. :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :returns: A new instance of this pipeline with identical components. .. py:method:: optimize_threshold(self, X, y, y_pred_proba, objective) Optimize the pipeline threshold given the objective to use. Only used for binary problems with objectives whose thresholds can be tuned. :param X: Input features. :type X: pd.DataFrame :param y: Input target values. :type y: pd.Series :param y_pred_proba: The predicted probabilities of the target outputted by the pipeline. :type y_pred_proba: pd.Series :param objective: The objective to threshold with. Must have a tunable threshold. :type objective: ObjectiveBase :raises ValueError: If objective is not optimizable. .. py:method:: parameters(self) :property: Parameter dictionary for this pipeline. :returns: Dictionary of all component parameters. :rtype: dict .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None) Predict on future data where the target is not known.
:param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param objective: The objective to use to make predictions. :type objective: Object or string :param X_train: Training data. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. :type y_train: pd.Series or None :raises ValueError: If X_train and/or y_train are None or if final component is not an Estimator. :returns: Predictions. .. py:method:: predict_in_sample(self, X, y, X_train, y_train, objective=None) Predict on future data where the target is known, e.g. cross validation. :param X: Future data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Future target of shape [n_samples]. :type y: pd.Series :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series :param objective: Objective used to threshold predicted probabilities, optional. Defaults to None. :type objective: ObjectiveBase, str :returns: Estimated labels. :rtype: pd.Series :raises ValueError: If objective is not defined for time-series binary classification problems. .. py:method:: predict_proba(self, X, X_train=None, y_train=None) Predict on future data where the target is unknown. :param X: Future data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :returns: Estimated probabilities. :rtype: pd.Series :raises ValueError: If final component is not an Estimator. .. py:method:: predict_proba_in_sample(self, X_holdout, y_holdout, X_train, y_train) Predict on future data where the target is known, e.g. cross validation.
:param X_holdout: Future data of shape [n_samples, n_features]. :type X_holdout: pd.DataFrame or np.ndarray :param y_holdout: Future target of shape [n_samples]. :type y_holdout: pd.Series, np.ndarray :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :returns: Estimated probabilities. :rtype: pd.Series :raises ValueError: If the final component is not an Estimator. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves pipeline at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None) Evaluate model performance on current and additional objectives. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: True labels of length [n_samples]. :type y: pd.Series :param objectives: Non-empty list of objectives to score on. :type objectives: list :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :returns: Ordered dictionary of objective scores. :rtype: dict .. py:method:: summary(self) :property: A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder :returns: A string describing the pipeline structure. .. py:method:: threshold(self) :property: Threshold used to make a prediction. Defaults to None. .. py:method:: transform(self, X, y=None) Transform the input. :param X: Data of shape [n_samples, n_features]. 
:type X: pd.DataFrame or np.ndarray :param y: The target data of length [n_samples]. Defaults to None. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame .. py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None, calculating_residuals=False) Transforms the data by applying all pre-processing components. :param X: Input data to the pipeline to transform. :type X: pd.DataFrame :param y: Targets corresponding to the pipeline targets. :type y: pd.Series :param X_train: Training data used to generate features from past observations. :type X_train: pd.DataFrame :param y_train: Training targets used to generate features from past observations. :type y_train: pd.Series :param calculating_residuals: Whether we're calling predict_in_sample to calculate the residuals. This means the X and y arguments are not future data, but actually the training data. :type calculating_residuals: bool :returns: New transformed features. :rtype: pd.DataFrame .. py:class:: TimeSeriesClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0) Pipeline base class for time series classification problems. :param component_graph: ComponentGraph instance, list of components in order, or dictionary of components. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"]. :type component_graph: ComponentGraph, list, dict :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary {} implies using all default values for component parameters.
Pipeline-level parameters such as time_index, gap, and max_delay must be specified with the "pipeline" key. For example: Pipeline(parameters={"pipeline": {"time_index": "Date", "max_delay": 4, "gap": 2}}). :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **problem_type** - None **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.TimeSeriesClassificationPipeline.can_tune_threshold_with_objective evalml.pipelines.TimeSeriesClassificationPipeline.classes_ evalml.pipelines.TimeSeriesClassificationPipeline.clone evalml.pipelines.TimeSeriesClassificationPipeline.create_objectives evalml.pipelines.TimeSeriesClassificationPipeline.custom_name evalml.pipelines.TimeSeriesClassificationPipeline.dates_needed_for_prediction evalml.pipelines.TimeSeriesClassificationPipeline.dates_needed_for_prediction_range evalml.pipelines.TimeSeriesClassificationPipeline.describe evalml.pipelines.TimeSeriesClassificationPipeline.feature_importance evalml.pipelines.TimeSeriesClassificationPipeline.fit evalml.pipelines.TimeSeriesClassificationPipeline.fit_transform evalml.pipelines.TimeSeriesClassificationPipeline.get_component evalml.pipelines.TimeSeriesClassificationPipeline.get_hyperparameter_ranges evalml.pipelines.TimeSeriesClassificationPipeline.graph evalml.pipelines.TimeSeriesClassificationPipeline.graph_dict evalml.pipelines.TimeSeriesClassificationPipeline.graph_feature_importance evalml.pipelines.TimeSeriesClassificationPipeline.inverse_transform evalml.pipelines.TimeSeriesClassificationPipeline.load evalml.pipelines.TimeSeriesClassificationPipeline.model_family evalml.pipelines.TimeSeriesClassificationPipeline.name evalml.pipelines.TimeSeriesClassificationPipeline.new evalml.pipelines.TimeSeriesClassificationPipeline.parameters evalml.pipelines.TimeSeriesClassificationPipeline.predict 
evalml.pipelines.TimeSeriesClassificationPipeline.predict_in_sample evalml.pipelines.TimeSeriesClassificationPipeline.predict_proba evalml.pipelines.TimeSeriesClassificationPipeline.predict_proba_in_sample evalml.pipelines.TimeSeriesClassificationPipeline.save evalml.pipelines.TimeSeriesClassificationPipeline.score evalml.pipelines.TimeSeriesClassificationPipeline.summary evalml.pipelines.TimeSeriesClassificationPipeline.transform evalml.pipelines.TimeSeriesClassificationPipeline.transform_all_but_final .. py:method:: can_tune_threshold_with_objective(self, objective) Determine whether the threshold of a binary classification pipeline can be tuned. :param objective: Primary AutoMLSearch objective. :type objective: ObjectiveBase :returns: True if the pipeline threshold can be tuned. :rtype: bool .. py:method:: classes_(self) :property: Gets the class names for the pipeline. Will return None before pipeline is fit. .. py:method:: clone(self) Constructs a new pipeline with the same components, parameters, and random seed. :returns: A new instance of this pipeline with identical components, parameters, and random seed. .. py:method:: create_objectives(objectives) :staticmethod: Create objective instances from a list of strings or objective classes. .. py:method:: custom_name(self) :property: Custom name of the pipeline. .. py:method:: dates_needed_for_prediction(self, date) Return dates needed to forecast the given date in the future. :param date: Date to forecast in the future. :type date: pd.Timestamp :returns: Range of dates needed to forecast the given date. :rtype: dates_needed (tuple(pd.Timestamp)) .. py:method:: dates_needed_for_prediction_range(self, start_date, end_date) Return dates needed to forecast the given date in the future. :param start_date: Start date of range to forecast in the future. :type start_date: pd.Timestamp :param end_date: End date of range to forecast in the future. 
:type end_date: pd.Timestamp :returns: Range of dates needed to forecast the given date. :rtype: dates_needed (tuple(pd.Timestamp)) :raises ValueError: If start_date doesn't come before end_date. .. py:method:: describe(self, return_dict=False) Outputs pipeline details including component parameters. :param return_dict: If True, return a dictionary of information about the pipeline. Defaults to False. :type return_dict: bool :returns: Dictionary of all component parameters if return_dict is True, else None. :rtype: dict .. py:method:: feature_importance(self) :property: Importance associated with each feature. Features dropped by feature selection are excluded. :returns: Feature names and their corresponding importance. :rtype: pd.DataFrame .. py:method:: fit(self, X, y) Fit a time series classification model. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: The target training labels of length [n_samples]. :type y: pd.Series, np.ndarray :returns: self :raises ValueError: If the number of unique classes in y is not appropriate for the type of pipeline. .. py:method:: fit_transform(self, X, y) Fit and transform all components in the component graph, if all components are Transformers. :param X: Input features of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target data of length [n_samples]. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame :raises ValueError: If final component is an Estimator. .. py:method:: get_component(self, name) Returns component by name. :param name: Name of component. :type name: str :returns: Component to return. :rtype: Component .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters) Returns hyperparameter ranges from all components as a dictionary. :param custom_hyperparameters: Custom hyperparameters for the pipeline.
:type custom_hyperparameters: dict :returns: Dictionary of hyperparameter ranges for each component in the pipeline. :rtype: dict .. py:method:: graph(self, filepath=None) Generate an image representing the pipeline graph. :param filepath: Path to where the graph should be saved. If set to None (as by default), the graph will not be saved. :type filepath: str, optional :returns: Graph object that can be directly displayed in Jupyter notebooks. :rtype: graphviz.Digraph :raises RuntimeError: If graphviz is not installed. :raises ValueError: If path is not writeable. .. py:method:: graph_dict(self) Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases. x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]} :returns: A dictionary representing the DAG structure. :rtype: dag_dict (dict) .. py:method:: graph_feature_importance(self, importance_threshold=0) Generate a bar graph of the pipeline's feature importance. :param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero. :type importance_threshold: float, optional :returns: A bar graph showing features and their corresponding importance. :rtype: plotly.Figure :raises ValueError: If importance threshold is not valid. .. 
py:method:: inverse_transform(self, y) Apply component inverse_transform methods to estimator predictions in reverse order. Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd). :param y: Final component features. :type y: pd.Series :returns: The inverse transform of the target. :rtype: pd.Series .. py:method:: load(file_path: Union[str, io.BytesIO]) :staticmethod: Loads pipeline at file path. :param file_path: File path to load from, or a BytesIO object. :type file_path: str|BytesIO :returns: PipelineBase object .. py:method:: model_family(self) :property: Returns model family of this pipeline. .. py:method:: name(self) :property: Name of the pipeline. .. py:method:: new(self, parameters, random_seed=0) Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with Python's __new__ method. :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :returns: A new instance of this pipeline with identical components. .. py:method:: parameters(self) :property: Parameter dictionary for this pipeline. :returns: Dictionary of all component parameters. :rtype: dict .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None) Predict on future data where the target is not known. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param objective: The objective to use to make predictions. :type objective: Object or string :param X_train: Training data. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels.
:type y_train: pd.Series or None :raises ValueError: If X_train and/or y_train are None or if final component is not an Estimator. :returns: Predictions. .. py:method:: predict_in_sample(self, X, y, X_train, y_train, objective=None) Predict on future data where the target is known, e.g. cross validation. Note: we cast y as ints first to address boolean values that may be returned from calculating predictions which we would not be able to otherwise transform if we originally had integer targets. :param X: Future data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: Future target of shape [n_samples]. :type y: pd.Series, np.ndarray :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :param objective: Objective used to threshold predicted probabilities, optional. :type objective: ObjectiveBase, str, None :returns: Estimated labels. :rtype: pd.Series :raises ValueError: If final component is not an Estimator. .. py:method:: predict_proba(self, X, X_train=None, y_train=None) Predict on future data where the target is unknown. :param X: Future data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :returns: Estimated probabilities. :rtype: pd.Series :raises ValueError: If final component is not an Estimator. .. py:method:: predict_proba_in_sample(self, X_holdout, y_holdout, X_train, y_train) Predict on future data where the target is known, e.g. cross validation. :param X_holdout: Future data of shape [n_samples, n_features]. 
:type X_holdout: pd.DataFrame or np.ndarray :param y_holdout: Future target of shape [n_samples]. :type y_holdout: pd.Series, np.ndarray :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :returns: Estimated probabilities. :rtype: pd.Series :raises ValueError: If the final component is not an Estimator. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves pipeline at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None) Evaluate model performance on current and additional objectives. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: True labels of length [n_samples]. :type y: pd.Series :param objectives: Non-empty list of objectives to score on. :type objectives: list :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :returns: Ordered dictionary of objective scores. :rtype: dict .. py:method:: summary(self) :property: A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder :returns: A string describing the pipeline structure. .. py:method:: transform(self, X, y=None) Transform the input. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param y: The target data of length [n_samples]. Defaults to None. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame .. 
py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None, calculating_residuals=False) Transforms the data by applying all pre-processing components. :param X: Input data to the pipeline to transform. :type X: pd.DataFrame :param y: Targets corresponding to the pipeline targets. :type y: pd.Series :param X_train: Training data used to generate features from past observations. :type X_train: pd.DataFrame :param y_train: Training targets used to generate features from past observations. :type y_train: pd.Series :param calculating_residuals: Whether we're calling predict_in_sample to calculate the residuals. This means the X and y arguments are not future data, but actually the train data. :type calculating_residuals: bool :returns: New transformed features. :rtype: pd.DataFrame .. py:class:: TimeSeriesFeaturizer(time_index=None, max_delay=2, gap=0, forecast_horizon=1, conf_level=0.05, rolling_window_size=0.25, delay_features=True, delay_target=True, random_seed=0, **kwargs) Transformer that delays input features and target variable for time series problems. This component uses an algorithm based on the autocorrelation values of the target variable to determine which lags to select from the set of all possible lags. The algorithm is based on the idea that the local maxima of the autocorrelation function indicate the lags that have the most impact on the present time. The algorithm computes the autocorrelation values and finds the local maxima, called "peaks", that are significant at the given conf_level. Since lags in the range [0, 10] tend to be predictive but not local maxima, the union of the peaks is taken with the significant lags in the range [0, 10]. At the end, only selected lags in the range [0, max_delay] are used. Parametrizing the algorithm by conf_level lets the AutoMLAlgorithm tune the set of lags chosen so that the chance of finding a good set of lags is higher. Using a conf_level value of 1 selects all possible lags. 
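The lag selection described above can be sketched roughly as follows. This is an illustrative approximation, not EvalML's implementation: ``autocorrelation`` and ``select_lags`` are hypothetical helpers, and the significance test uses a simple normal-approximation band of ``z / sqrt(n)``, which is an assumption made for this sketch.

.. code-block:: python

   from statistics import NormalDist

   import numpy as np


   def autocorrelation(y, nlags):
       """Sample autocorrelation of y for lags 0..nlags (hypothetical helper)."""
       y = np.asarray(y, dtype=float)
       y = y - y.mean()
       denom = np.dot(y, y)
       return np.array(
           [np.dot(y[: len(y) - k], y[k:]) / denom for k in range(nlags + 1)]
       )


   def select_lags(y, max_delay, conf_level=0.05):
       """Pick lags from significant autocorrelation peaks (hypothetical helper)."""
       n = len(y)
       acf = autocorrelation(y, nlags=n - 1)
       lags = np.arange(n)

       # Assumed significance rule: |acf| must exceed z / sqrt(n).
       # With conf_level == 1 the threshold is 0, so every nonzero lag passes.
       z = NormalDist().inv_cdf(1 - conf_level / 2)
       significant = np.abs(acf) > z / np.sqrt(n)

       # Peaks: interior lags whose autocorrelation exceeds both neighbours.
       is_peak = np.zeros(n, dtype=bool)
       is_peak[1:-1] = (acf[1:-1] > acf[:-2]) & (acf[1:-1] > acf[2:])

       peaks = set(lags[is_peak & significant])
       first_ten = set(lags[(lags <= 10) & significant])

       # Union of peaks and significant short lags, limited to 1..max_delay.
       chosen = {int(lag) for lag in peaks | first_ten if 1 <= lag <= max_delay}
       chosen.add(1)  # a delay of 1 is always computed
       return sorted(chosen)


   # Usage: a series with a weekly cycle, so lag 7 should be a strong peak.
   y = np.sin(2 * np.pi * np.arange(70) / 7)
   print(select_lags(y, max_delay=14))

Lowering ``conf_level`` narrows the set of lags that pass the significance test, while a value of 1 keeps every lag up to ``max_delay``.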
:param time_index: Name of the column containing the datetime information used to order the data. Ignored. :type time_index: str :param max_delay: Maximum number of time units to delay each feature. Defaults to 2. :type max_delay: int :param forecast_horizon: The number of time periods the pipeline is expected to forecast. :type forecast_horizon: int :param conf_level: Float in range (0, 1] that determines the confidence interval size used to select which lags to compute from the set of [1, max_delay]. A delay of 1 will always be computed. If 1, selects all possible lags in the set of [1, max_delay], inclusive. :type conf_level: float :param rolling_window_size: Float in range (0, 1] that determines the size of the window used for rolling features. Size is computed as rolling_window_size * max_delay. :type rolling_window_size: float :param delay_features: Whether to delay the input features. Defaults to True. :type delay_features: bool :param delay_target: Whether to delay the target. Defaults to True. :type delay_target: bool :param gap: The number of time units between when the features are collected and when the target is collected. For example, if you are predicting the next time step's target, gap=1. This is only needed because when gap=0, we need to be sure to start the lagging of the target variable at 1. Defaults to 0. :type gap: int :param random_seed: Seed for the random number generator. This transformer performs the same regardless of the random seed provided. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {"conf_level": Real(0.001, 1.0), "rolling_window_size": Real(0.001, 1.0)} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Time Series Featurizer * - **needs_fitting** - True * - **target_colname_prefix** - target_delay_{} * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.TimeSeriesFeaturizer.clone evalml.pipelines.TimeSeriesFeaturizer.default_parameters evalml.pipelines.TimeSeriesFeaturizer.describe evalml.pipelines.TimeSeriesFeaturizer.fit evalml.pipelines.TimeSeriesFeaturizer.fit_transform evalml.pipelines.TimeSeriesFeaturizer.load evalml.pipelines.TimeSeriesFeaturizer.parameters evalml.pipelines.TimeSeriesFeaturizer.save evalml.pipelines.TimeSeriesFeaturizer.transform evalml.pipelines.TimeSeriesFeaturizer.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: Whether to print the name of the component. :type print_name: bool, optional :param return_dict: Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}. :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the TimeSeriesFeaturizer. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self :raises ValueError: If self.time_index is None. .. py:method:: fit_transform(self, X, y=None) Fit the component and transform the input data. :param X: Data to transform. :type X: pd.DataFrame :param y: Target. :type y: pd.Series, or None :returns: Transformed X. :rtype: pd.DataFrame .. py:method:: load(file_path) :staticmethod: Loads component at file path. 
:param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Computes the delayed values and rolling means for X and y. The chosen delays are determined by the autocorrelation function of the target variable. See the class docstring for more information on how they are chosen. If y is None, all possible lags are chosen. If y is not None, it will also compute the delayed values for the target variable. The rolling means for all numeric features in X and y, if y is numeric, are also returned. :param X: Data to transform. None is expected when only the target variable is being used. :type X: pd.DataFrame or None :param y: Target. :type y: pd.Series, or None :returns: Transformed X. No original features are returned. :rtype: pd.DataFrame .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: TimeSeriesImputer(categorical_impute_strategy='forwards_fill', numeric_impute_strategy='interpolate', target_impute_strategy='forwards_fill', random_seed=0, **kwargs) Imputes missing data according to a specified timeseries-specific imputation strategy. This Transformer should be used after the `TimeSeriesRegularizer` in order to impute the missing values that were added to X and y (if passed). :param categorical_impute_strategy: Impute strategy to use for string, object, boolean, categorical dtypes. 
Valid values include "backwards_fill" and "forwards_fill". Defaults to "forwards_fill". :type categorical_impute_strategy: string :param numeric_impute_strategy: Impute strategy to use for numeric columns. Valid values include "backwards_fill", "forwards_fill", and "interpolate". Defaults to "interpolate". :type numeric_impute_strategy: string :param target_impute_strategy: Impute strategy to use for the target column. Valid values include "backwards_fill", "forwards_fill", and "interpolate". Defaults to "forwards_fill". :type target_impute_strategy: string :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :raises ValueError: If categorical_impute_strategy, numeric_impute_strategy, or target_impute_strategy is not one of the valid values. **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "categorical_impute_strategy": ["backwards_fill", "forwards_fill"], "numeric_impute_strategy": ["backwards_fill", "forwards_fill", "interpolate"], "target_impute_strategy": ["backwards_fill", "forwards_fill", "interpolate"],} * - **modifies_features** - True * - **modifies_target** - True * - **name** - Time Series Imputer * - **training_only** - True **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.TimeSeriesImputer.clone evalml.pipelines.TimeSeriesImputer.default_parameters evalml.pipelines.TimeSeriesImputer.describe evalml.pipelines.TimeSeriesImputer.fit evalml.pipelines.TimeSeriesImputer.fit_transform evalml.pipelines.TimeSeriesImputer.load evalml.pipelines.TimeSeriesImputer.needs_fitting evalml.pipelines.TimeSeriesImputer.parameters evalml.pipelines.TimeSeriesImputer.save evalml.pipelines.TimeSeriesImputer.transform evalml.pipelines.TimeSeriesImputer.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. 
py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits imputer to data. 'None' values are converted to np.nan before imputation and are treated as the same. If a value is missing at the beginning or end of a column, that value will be imputed using backwards fill or forwards fill as necessary, respectively. :param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame, np.ndarray :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. 
py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data X by imputing missing values using specified timeseries-specific strategies. 'None' values are converted to np.nan before imputation and are treated as the same. :param X: Data to transform. :type X: pd.DataFrame :param y: Optionally, target data to transform. :type y: pd.Series, optional :returns: Transformed X and y :rtype: pd.DataFrame .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: TimeSeriesMulticlassClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0) Pipeline base class for time series multiclass classification problems. :param component_graph: List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"] :type component_graph: list or dict :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary {} implies using all default values for component parameters. 
Pipeline-level parameters such as time_index, gap, and max_delay must be specified with the "pipeline" key. For example: Pipeline(parameters={"pipeline": {"time_index": "Date", "max_delay": 4, "gap": 2}}). :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int .. rubric:: Example >>> pipeline = TimeSeriesMulticlassClassificationPipeline(component_graph=["Simple Imputer", "Logistic Regression Classifier"], ... parameters={"Logistic Regression Classifier": {"penalty": "elasticnet", ... "solver": "liblinear"}, ... "pipeline": {"gap": 1, "max_delay": 1, "forecast_horizon": 1, "time_index": "date"}}, ... custom_name="My TimeSeriesMulticlass Pipeline") >>> assert pipeline.custom_name == "My TimeSeriesMulticlass Pipeline" >>> assert pipeline.component_graph.component_dict.keys() == {'Simple Imputer', 'Logistic Regression Classifier'} >>> assert pipeline.parameters == { ... 'Simple Imputer': {'impute_strategy': 'most_frequent', 'fill_value': None}, ... 'Logistic Regression Classifier': {'penalty': 'elasticnet', ... 'C': 1.0, ... 'n_jobs': -1, ... 'multi_class': 'auto', ... 'solver': 'liblinear'}, ... 'pipeline': {'gap': 1, 'max_delay': 1, 'forecast_horizon': 1, 'time_index': "date"}} **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **problem_type** - ProblemTypes.TIME_SERIES_MULTICLASS **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.can_tune_threshold_with_objective evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.classes_ evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.clone evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.create_objectives evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.custom_name evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.dates_needed_for_prediction evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.dates_needed_for_prediction_range evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.describe evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.feature_importance evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.fit evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.fit_transform evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.get_component evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.get_hyperparameter_ranges evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.graph evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.graph_dict evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.graph_feature_importance evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.inverse_transform evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.load evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.model_family evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.name evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.new evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.parameters evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.predict evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.predict_in_sample evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.predict_proba evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.predict_proba_in_sample 
evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.save evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.score evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.summary evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.transform evalml.pipelines.TimeSeriesMulticlassClassificationPipeline.transform_all_but_final .. py:method:: can_tune_threshold_with_objective(self, objective) Determine whether the threshold of a binary classification pipeline can be tuned. :param objective: Primary AutoMLSearch objective. :type objective: ObjectiveBase :returns: True if the pipeline threshold can be tuned. :rtype: bool .. py:method:: classes_(self) :property: Gets the class names for the pipeline. Will return None before pipeline is fit. .. py:method:: clone(self) Constructs a new pipeline with the same components, parameters, and random seed. :returns: A new instance of this pipeline with identical components, parameters, and random seed. .. py:method:: create_objectives(objectives) :staticmethod: Create objective instances from a list of strings or objective classes. .. py:method:: custom_name(self) :property: Custom name of the pipeline. .. py:method:: dates_needed_for_prediction(self, date) Return dates needed to forecast the given date in the future. :param date: Date to forecast in the future. :type date: pd.Timestamp :returns: Range of dates needed to forecast the given date. :rtype: dates_needed (tuple(pd.Timestamp)) .. py:method:: dates_needed_for_prediction_range(self, start_date, end_date) Return dates needed to forecast the given date in the future. :param start_date: Start date of range to forecast in the future. :type start_date: pd.Timestamp :param end_date: End date of range to forecast in the future. :type end_date: pd.Timestamp :returns: Range of dates needed to forecast the given date. :rtype: dates_needed (tuple(pd.Timestamp)) :raises ValueError: If start_date doesn't come before end_date .. 
py:method:: describe(self, return_dict=False) Outputs pipeline details including component parameters. :param return_dict: If True, return dictionary of information about pipeline. Defaults to False. :type return_dict: bool :returns: Dictionary of all component parameters if return_dict is True, else None. :rtype: dict .. py:method:: feature_importance(self) :property: Importance associated with each feature. Features dropped by the feature selection are excluded. :returns: Feature names and their corresponding importance :rtype: pd.DataFrame .. py:method:: fit(self, X, y) Fit a time series classification model. :param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame or np.ndarray :param y: The target training labels of length [n_samples] :type y: pd.Series, np.ndarray :returns: self :raises ValueError: If the number of unique classes in y are not appropriate for the type of pipeline. .. py:method:: fit_transform(self, X, y) Fit and transform all components in the component graph, if all components are Transformers. :param X: Input features of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target data of length [n_samples]. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame :raises ValueError: If final component is an Estimator. .. py:method:: get_component(self, name) Returns component by name. :param name: Name of component. :type name: str :returns: Component to return :rtype: Component .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters) Returns hyperparameter ranges from all components as a dictionary. :param custom_hyperparameters: Custom hyperparameters for the pipeline. :type custom_hyperparameters: dict :returns: Dictionary of hyperparameter ranges for each component in the pipeline. :rtype: dict .. py:method:: graph(self, filepath=None) Generate an image representing the pipeline graph. :param filepath: Path to where the graph should be saved. 
If set to None (as by default), the graph will not be saved. :type filepath: str, optional :returns: Graph object that can be directly displayed in Jupyter notebooks. :rtype: graphviz.Digraph :raises RuntimeError: If graphviz is not installed. :raises ValueError: If path is not writeable. .. py:method:: graph_dict(self) Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases. x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]} :returns: A dictionary representing the DAG structure. :rtype: dag_dict (dict) .. py:method:: graph_feature_importance(self, importance_threshold=0) Generate a bar graph of the pipeline's feature importance. :param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero. :type importance_threshold: float, optional :returns: A bar graph showing features and their corresponding importance. :rtype: plotly.Figure :raises ValueError: If importance threshold is not valid. .. py:method:: inverse_transform(self, y) Apply component inverse_transform methods to estimator predictions in reverse order. Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd). :param y: Final component features. :type y: pd.Series :returns: The inverse transform of the target. :rtype: pd.Series .. 
py:method:: load(file_path: Union[str, io.BytesIO]) :staticmethod: Loads pipeline at file path. :param file_path: load filepath or a BytesIO object. :type file_path: str|BytesIO :returns: PipelineBase object .. py:method:: model_family(self) :property: Returns model family of this pipeline. .. py:method:: name(self) :property: Name of the pipeline. .. py:method:: new(self, parameters, random_seed=0) Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with python's __new__ method. :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :returns: A new instance of this pipeline with identical components. .. py:method:: parameters(self) :property: Parameter dictionary for this pipeline. :returns: Dictionary of all component parameters. :rtype: dict .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None) Predict on future data where target is not known. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param objective: The objective to use to make predictions. :type objective: Object or string :param X_train: Training data. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. :type y_train: pd.Series or None :raises ValueError: If X_train and/or y_train are None or if final component is not an Estimator. :returns: Predictions. .. py:method:: predict_in_sample(self, X, y, X_train, y_train, objective=None) Predict on future data where the target is known, e.g. cross validation. 
Note: we cast y as ints first to address boolean values that may be returned from calculating predictions which we would not be able to otherwise transform if we originally had integer targets. :param X: Future data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: Future target of shape [n_samples]. :type y: pd.Series, np.ndarray :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :param objective: Objective used to threshold predicted probabilities, optional. :type objective: ObjectiveBase, str, None :returns: Estimated labels. :rtype: pd.Series :raises ValueError: If final component is not an Estimator. .. py:method:: predict_proba(self, X, X_train=None, y_train=None) Predict on future data where the target is unknown. :param X: Future data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :returns: Estimated probabilities. :rtype: pd.Series :raises ValueError: If final component is not an Estimator. .. py:method:: predict_proba_in_sample(self, X_holdout, y_holdout, X_train, y_train) Predict on future data where the target is known, e.g. cross validation. :param X_holdout: Future data of shape [n_samples, n_features]. :type X_holdout: pd.DataFrame or np.ndarray :param y_holdout: Future target of shape [n_samples]. :type y_holdout: pd.Series, np.ndarray :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. 
:type y_train: pd.Series, np.ndarray :returns: Estimated probabilities. :rtype: pd.Series :raises ValueError: If the final component is not an Estimator. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves pipeline at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None) Evaluate model performance on current and additional objectives. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: True labels of length [n_samples]. :type y: pd.Series :param objectives: Non-empty list of objectives to score on. :type objectives: list :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :returns: Ordered dictionary of objective scores. :rtype: dict .. py:method:: summary(self) :property: A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder :returns: A string describing the pipeline structure. .. py:method:: transform(self, X, y=None) Transform the input. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param y: The target data of length [n_samples]. Defaults to None. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame .. py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None, calculating_residuals=False) Transforms the data by applying all pre-processing components. :param X: Input data to the pipeline to transform. :type X: pd.DataFrame :param y: Targets corresponding to the pipeline targets. 
:type y: pd.Series :param X_train: Training data used to generate features from past observations. :type X_train: pd.DataFrame :param y_train: Training targets used to generate features from past observations. :type y_train: pd.Series :param calculating_residuals: Whether we're calling predict_in_sample to calculate the residuals. This means the X and y arguments are not future data, but actually the train data. :type calculating_residuals: bool :returns: New transformed features. :rtype: pd.DataFrame .. py:class:: TimeSeriesRegressionPipeline(component_graph, parameters=None, custom_name=None, random_seed=0) Pipeline base class for time series regression problems. :param component_graph: ComponentGraph instance, list of components in order, or dictionary of components. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"]. :type component_graph: ComponentGraph, list, dict :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary {} implies using all default values for component parameters. Pipeline-level parameters such as time_index, gap, and max_delay must be specified with the "pipeline" key. For example: Pipeline(parameters={"pipeline": {"time_index": "Date", "max_delay": 4, "gap": 2}}). :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int .. rubric:: Example >>> pipeline = TimeSeriesRegressionPipeline(component_graph=["Simple Imputer", "Linear Regressor"], ... parameters={"Simple Imputer": {"impute_strategy": "mean"}, ...
"pipeline": {"gap": 1, "max_delay": 1, "forecast_horizon": 1, "time_index": "date"}}, ... custom_name="My TimeSeriesRegression Pipeline") ... >>> assert pipeline.custom_name == "My TimeSeriesRegression Pipeline" >>> assert pipeline.component_graph.component_dict.keys() == {'Simple Imputer', 'Linear Regressor'} The pipeline parameters will be chosen from the default parameters for every component, unless specific parameters were passed in as they were above. >>> assert pipeline.parameters == { ... 'Simple Imputer': {'impute_strategy': 'mean', 'fill_value': None}, ... 'Linear Regressor': {'fit_intercept': True, 'n_jobs': -1}, ... 'pipeline': {'gap': 1, 'max_delay': 1, 'forecast_horizon': 1, 'time_index': "date"}} **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **NO_PREDS_PI_ESTIMATORS** - ProblemTypes.TIME_SERIES_REGRESSION * - **problem_type** - None **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.TimeSeriesRegressionPipeline.can_tune_threshold_with_objective evalml.pipelines.TimeSeriesRegressionPipeline.clone evalml.pipelines.TimeSeriesRegressionPipeline.create_objectives evalml.pipelines.TimeSeriesRegressionPipeline.custom_name evalml.pipelines.TimeSeriesRegressionPipeline.dates_needed_for_prediction evalml.pipelines.TimeSeriesRegressionPipeline.dates_needed_for_prediction_range evalml.pipelines.TimeSeriesRegressionPipeline.describe evalml.pipelines.TimeSeriesRegressionPipeline.feature_importance evalml.pipelines.TimeSeriesRegressionPipeline.fit evalml.pipelines.TimeSeriesRegressionPipeline.fit_transform evalml.pipelines.TimeSeriesRegressionPipeline.get_component evalml.pipelines.TimeSeriesRegressionPipeline.get_forecast_period evalml.pipelines.TimeSeriesRegressionPipeline.get_forecast_predictions evalml.pipelines.TimeSeriesRegressionPipeline.get_hyperparameter_ranges evalml.pipelines.TimeSeriesRegressionPipeline.get_prediction_intervals evalml.pipelines.TimeSeriesRegressionPipeline.graph 
evalml.pipelines.TimeSeriesRegressionPipeline.graph_dict evalml.pipelines.TimeSeriesRegressionPipeline.graph_feature_importance evalml.pipelines.TimeSeriesRegressionPipeline.inverse_transform evalml.pipelines.TimeSeriesRegressionPipeline.load evalml.pipelines.TimeSeriesRegressionPipeline.model_family evalml.pipelines.TimeSeriesRegressionPipeline.name evalml.pipelines.TimeSeriesRegressionPipeline.new evalml.pipelines.TimeSeriesRegressionPipeline.parameters evalml.pipelines.TimeSeriesRegressionPipeline.predict evalml.pipelines.TimeSeriesRegressionPipeline.predict_in_sample evalml.pipelines.TimeSeriesRegressionPipeline.save evalml.pipelines.TimeSeriesRegressionPipeline.score evalml.pipelines.TimeSeriesRegressionPipeline.summary evalml.pipelines.TimeSeriesRegressionPipeline.transform evalml.pipelines.TimeSeriesRegressionPipeline.transform_all_but_final .. py:method:: can_tune_threshold_with_objective(self, objective) Determine whether the threshold of a binary classification pipeline can be tuned. :param objective: Primary AutoMLSearch objective. :type objective: ObjectiveBase :returns: True if the pipeline threshold can be tuned. :rtype: bool .. py:method:: clone(self) Constructs a new pipeline with the same components, parameters, and random seed. :returns: A new instance of this pipeline with identical components, parameters, and random seed. .. py:method:: create_objectives(objectives) :staticmethod: Create objective instances from a list of strings or objective classes. .. py:method:: custom_name(self) :property: Custom name of the pipeline. .. py:method:: dates_needed_for_prediction(self, date) Return dates needed to forecast the given date in the future. :param date: Date to forecast in the future. :type date: pd.Timestamp :returns: Range of dates needed to forecast the given date. :rtype: dates_needed (tuple(pd.Timestamp)) .. py:method:: dates_needed_for_prediction_range(self, start_date, end_date) Return dates needed to forecast the given date range in the future.
:param start_date: Start date of range to forecast in the future. :type start_date: pd.Timestamp :param end_date: End date of range to forecast in the future. :type end_date: pd.Timestamp :returns: Range of dates needed to forecast the given date range. :rtype: dates_needed (tuple(pd.Timestamp)) :raises ValueError: If start_date doesn't come before end_date. .. py:method:: describe(self, return_dict=False) Outputs pipeline details including component parameters. :param return_dict: If True, return dictionary of information about pipeline. Defaults to False. :type return_dict: bool :returns: Dictionary of all component parameters if return_dict is True, else None. :rtype: dict .. py:method:: feature_importance(self) :property: Importance associated with each feature. Features dropped by the feature selection are excluded. :returns: Feature names and their corresponding importance. :rtype: pd.DataFrame .. py:method:: fit(self, X, y) Fit a time series pipeline. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: The training targets of length [n_samples]. :type y: pd.Series, np.ndarray :returns: self :raises ValueError: If the target is not numeric. .. py:method:: fit_transform(self, X, y) Fit and transform all components in the component graph, if all components are Transformers. :param X: Input features of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target data of length [n_samples]. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame :raises ValueError: If final component is an Estimator. .. py:method:: get_component(self, name) Returns component by name. :param name: Name of component. :type name: str :returns: Component to return. :rtype: Component .. py:method:: get_forecast_period(self, X) Generates all possible forecasting time points based on latest data point in X. :param X: Data the pipeline was trained on of shape [n_samples_train, n_features].
:type X: pd.DataFrame, np.ndarray :raises ValueError: If pipeline is not trained. :returns: Datetime periods out to `forecast_horizon + gap`. :rtype: pd.Series .. rubric:: Example >>> X = pd.DataFrame({'date': pd.date_range(start='1-1-2022', periods=10, freq='D'), 'feature': range(10, 20)}) >>> y = pd.Series(range(0, 10), name='target') >>> gap = 1 >>> forecast_horizon = 2 >>> pipeline = TimeSeriesRegressionPipeline(component_graph=["Linear Regressor"], ... parameters={"Simple Imputer": {"impute_strategy": "mean"}, ... "pipeline": {"gap": gap, "max_delay": 1, "forecast_horizon": forecast_horizon, "time_index": "date"}}, ... ) >>> pipeline.fit(X, y) pipeline = TimeSeriesRegressionPipeline(component_graph={'Linear Regressor': ['Linear Regressor', 'X', 'y']}, parameters={'Linear Regressor':{'fit_intercept': True, 'n_jobs': -1}, 'pipeline':{'gap': 1, 'max_delay': 1, 'forecast_horizon': 2, 'time_index': 'date'}}, random_seed=0) >>> dates = pipeline.get_forecast_period(X) >>> expected = pd.Series(pd.date_range(start='2022-01-11', periods=(gap + forecast_horizon), freq='D'), name='date', index=[10, 11, 12]) >>> assert dates.equals(expected) .. py:method:: get_forecast_predictions(self, X, y) Generates all possible forecasting predictions based on last period of X. :param X: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X: pd.DataFrame, np.ndarray :param y: Targets used to train the pipeline of shape [n_samples_train]. :type y: pd.Series, np.ndarray :returns: Predictions out to `forecast_horizon + gap` periods. .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters) Returns hyperparameter ranges from all components as a dictionary. :param custom_hyperparameters: Custom hyperparameters for the pipeline. :type custom_hyperparameters: dict :returns: Dictionary of hyperparameter ranges for each component in the pipeline. :rtype: dict .. 
py:method:: get_prediction_intervals(self, X, y=None, X_train=None, y_train=None, coverage=None) Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. Certain estimators (Extra Trees Estimator, XGBoost Estimator, Prophet Estimator, ARIMA, and Exponential Smoothing estimator) utilize a different methodology to calculate prediction intervals. See the docs for these estimators to learn more. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: graph(self, filepath=None) Generate an image representing the pipeline graph. :param filepath: Path to where the graph should be saved. If set to None (as by default), the graph will not be saved. :type filepath: str, optional :returns: Graph object that can be directly displayed in Jupyter notebooks. :rtype: graphviz.Digraph :raises RuntimeError: If graphviz is not installed. :raises ValueError: If path is not writeable. .. 
py:method:: graph_dict(self) Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases. x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]} :returns: A dictionary representing the DAG structure. :rtype: dag_dict (dict) .. py:method:: graph_feature_importance(self, importance_threshold=0) Generate a bar graph of the pipeline's feature importance. :param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero. :type importance_threshold: float, optional :returns: A bar graph showing features and their corresponding importance. :rtype: plotly.Figure :raises ValueError: If importance threshold is not valid. .. py:method:: inverse_transform(self, y) Apply component inverse_transform methods to estimator predictions in reverse order. Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd). :param y: Final component features. :type y: pd.Series :returns: The inverse transform of the target. :rtype: pd.Series .. py:method:: load(file_path: Union[str, io.BytesIO]) :staticmethod: Loads pipeline at file path. :param file_path: load filepath or a BytesIO object. :type file_path: str|BytesIO :returns: PipelineBase object .. py:method:: model_family(self) :property: Returns model family of this pipeline. .. 
py:method:: name(self) :property: Name of the pipeline. .. py:method:: new(self, parameters, random_seed=0) Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with python's __new__ method. :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :returns: A new instance of this pipeline with identical components. .. py:method:: parameters(self) :property: Parameter dictionary for this pipeline. :returns: Dictionary of all component parameters. :rtype: dict .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None) Predict on future data where target is not known. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param objective: The objective to use to make predictions. :type objective: Object or string :param X_train: Training data. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. :type y_train: pd.Series or None :raises ValueError: If X_train and/or y_train are None or if final component is not an Estimator. :returns: Predictions. .. py:method:: predict_in_sample(self, X, y, X_train, y_train, objective=None, calculating_residuals=False) Predict on future data where the target is known, e.g. cross validation. 
:param X: Future data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: Future target of shape [n_samples]. :type y: pd.Series, np.ndarray :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :param objective: Objective used to threshold predicted probabilities, optional. :type objective: ObjectiveBase, str, None :param calculating_residuals: Whether we're calling predict_in_sample to calculate the residuals. This means the X and y arguments are not future data, but actually the train data. :type calculating_residuals: bool :returns: Estimated labels. :rtype: pd.Series :raises ValueError: If final component is not an Estimator. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves pipeline at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None) Evaluate model performance on current and additional objectives. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: True labels of length [n_samples]. :type y: pd.Series :param objectives: Non-empty list of objectives to score on. :type objectives: list :param X_train: Data the pipeline was trained on of shape [n_samples_train, n_features]. :type X_train: pd.DataFrame, np.ndarray :param y_train: Targets used to train the pipeline of shape [n_samples_train]. :type y_train: pd.Series, np.ndarray :returns: Ordered dictionary of objective scores. :rtype: dict .. py:method:: summary(self) :property: A short summary of the pipeline structure, describing the list of components used.
Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder :returns: A string describing the pipeline structure. .. py:method:: transform(self, X, y=None) Transform the input. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param y: The target data of length [n_samples]. Defaults to None. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame .. py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None, calculating_residuals=False) Transforms the data by applying all pre-processing components. :param X: Input data to the pipeline to transform. :type X: pd.DataFrame :param y: Targets corresponding to the pipeline targets. :type y: pd.Series :param X_train: Training data used to generate features from past observations. :type X_train: pd.DataFrame :param y_train: Training targets used to generate features from past observations. :type y_train: pd.Series :param calculating_residuals: Whether we're calling predict_in_sample to calculate the residuals. This means the X and y arguments are not future data, but actually the train data. :type calculating_residuals: bool :returns: New transformed features. :rtype: pd.DataFrame .. py:class:: TimeSeriesRegularizer(time_index=None, frequency_payload=None, window_length=4, threshold=0.4, random_seed=0, **kwargs) Transformer that regularizes an inconsistently spaced datetime column. If X is passed in to fit/transform, the column `time_index` will be checked for an inferrable offset frequency. If the `time_index` column is perfectly inferrable then this Transformer will do nothing and return the original X and y. If X does not have a perfectly inferrable frequency but one can be estimated, then X and y will be reformatted based on the estimated frequency for `time_index`. In the original X and y passed: - Missing datetime values will be added and will have their corresponding columns in X and y set to None.
- Duplicate datetime values will be dropped. - Extra datetime values will be dropped. - If it can be determined that a duplicate or extra value is misaligned, then it will be repositioned to take the place of a missing value. This Transformer should be used before the `TimeSeriesImputer` in order to impute the missing values that were added to X and y (if passed). :param time_index: Name of the column containing the datetime information used to order the data, required. Defaults to None. :type time_index: string :param frequency_payload: Payload returned from Woodwork's infer_frequency function where debug is True. Defaults to None. :type frequency_payload: tuple :param window_length: The size of the rolling window over which inference is conducted to determine the prevalence of uninferrable frequencies. Lower values make this component more sensitive to recognizing numerous faulty datetime values. Defaults to 4. :type window_length: int :param threshold: The minimum percentage of windows that need to have been able to infer a frequency. Lower values make this component more sensitive to recognizing numerous faulty datetime values. Defaults to 0.4. :type threshold: float :param random_seed: Seed for the random number generator. This transformer performs the same regardless of the random seed provided. Defaults to 0. :type random_seed: int :raises ValueError: If the frequency_payload parameter has not been passed a tuple. **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - True * - **name** - Time Series Regularizer * - **training_only** - True **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.TimeSeriesRegularizer.clone evalml.pipelines.TimeSeriesRegularizer.default_parameters evalml.pipelines.TimeSeriesRegularizer.describe evalml.pipelines.TimeSeriesRegularizer.fit evalml.pipelines.TimeSeriesRegularizer.fit_transform evalml.pipelines.TimeSeriesRegularizer.load evalml.pipelines.TimeSeriesRegularizer.needs_fitting evalml.pipelines.TimeSeriesRegularizer.parameters evalml.pipelines.TimeSeriesRegularizer.save evalml.pipelines.TimeSeriesRegularizer.transform evalml.pipelines.TimeSeriesRegularizer.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the TimeSeriesRegularizer. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self :raises ValueError: if self.time_index is None, if X and y have different lengths, if `time_index` in X does not have an offset frequency that can be estimated :raises TypeError: if the `time_index` column is not of type Datetime :raises KeyError: if the `time_index` column doesn't exist .. 
py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Regularizes a dataframe and target data to an inferrable offset frequency. A 'clean' X and y (if y was passed in) are created based on an inferrable offset frequency and matching datetime values with the original X and y are imputed into the clean X and y. Datetime values identified as misaligned are shifted into their appropriate position. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Data with an inferrable `time_index` offset frequency. :rtype: (pd.DataFrame, pd.Series) .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. 
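As a rough illustration of the behavior described for ``update_parameters``, the sketch below uses a hypothetical ``SketchComponent`` class. It is not part of evalml and is not evalml's implementation; the ``_parameters`` attribute and the body of ``fit`` are assumptions made for the example. It only shows the documented contract: entries from ``update_dict`` are merged into the stored parameters, and the fitted flag is cleared unless ``reset_fit`` is False.

```python
class SketchComponent:
    """Hypothetical stand-in for a component (illustration only, not evalml code)."""

    def __init__(self, parameters=None):
        # Assumed storage for the parameters used to initialize the component.
        self._parameters = dict(parameters or {})
        self._is_fitted = False

    @property
    def parameters(self):
        # Return a copy so callers cannot mutate internal state by accident.
        return dict(self._parameters)

    def fit(self):
        # A real component would learn from X and y; here we only set the flag.
        self._is_fitted = True
        return self

    def update_parameters(self, update_dict, reset_fit=True):
        # Merge the new values into the existing parameter dictionary.
        self._parameters.update(update_dict)
        # Changed parameters may invalidate an earlier fit, so clear the flag.
        if reset_fit:
            self._is_fitted = False


component = SketchComponent({"window_length": 4, "threshold": 0.4}).fit()
component.update_parameters({"window_length": 6})
```

After the call above, ``component.parameters`` holds the new ``window_length`` and the component reports itself as unfitted, so it would need to be fit again before use. Passing ``reset_fit=False`` keeps the fitted flag untouched.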
:param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: Transformer(parameters=None, component_obj=None, random_seed=0, **kwargs) A component that may or may not need fitting that transforms data. These components are used before an estimator. To implement a new Transformer, define your own class which is a subclass of Transformer, including a name and a list of acceptable ranges for any parameters to be tuned during the automl search (hyperparameters). Define an `__init__` method which sets up any necessary state and objects. Make sure your `__init__` only uses standard keyword arguments and calls `super().__init__()` with a parameters dict. You may also override the `fit`, `transform`, `fit_transform` and other methods in this class if appropriate. To see some examples, check out the definitions of any Transformer component. :param parameters: Dictionary of parameters for the component. Defaults to None. :type parameters: dict :param component_obj: Third-party objects useful in component implementation. Defaults to None. :type component_obj: obj :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **modifies_features** - True * - **modifies_target** - False * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.Transformer.clone evalml.pipelines.Transformer.default_parameters evalml.pipelines.Transformer.describe evalml.pipelines.Transformer.fit evalml.pipelines.Transformer.fit_transform evalml.pipelines.Transformer.load evalml.pipelines.Transformer.name evalml.pipelines.Transformer.needs_fitting evalml.pipelines.Transformer.parameters evalml.pipelines.Transformer.save evalml.pipelines.Transformer.transform evalml.pipelines.Transformer.update_parameters .. 
py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. :param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises MethodPropertyNotFoundError: If component does not have a fit method or a component_obj that implements fit. .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: name(cls) :property: Returns string name of this component. .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. 
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) :abstractmethod: Transforms data X. :param X: Data to transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: VowpalWabbitBinaryClassifier(loss_function='logistic', learning_rate=0.5, decay_learning_rate=1.0, power_t=0.5, passes=1, random_seed=0, **kwargs) Vowpal Wabbit Binary Classifier. :param loss_function: Specifies the loss function to use. One of {"squared", "classic", "hinge", "logistic", "quantile"}. Defaults to "logistic". :type loss_function: str :param learning_rate: Boosting learning rate. Defaults to 0.5. :type learning_rate: float :param decay_learning_rate: Decay factor for learning_rate. Defaults to 1.0. :type decay_learning_rate: float :param power_t: Power on learning rate decay. Defaults to 0.5. :type power_t: float :param passes: Number of training passes. Defaults to 1. :type passes: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. 
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - None * - **model_family** - ModelFamily.VOWPAL_WABBIT * - **modifies_features** - True * - **modifies_target** - False * - **name** - Vowpal Wabbit Binary Classifier * - **supported_problem_types** - [ ProblemTypes.BINARY, ProblemTypes.TIME_SERIES_BINARY,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.VowpalWabbitBinaryClassifier.clone evalml.pipelines.VowpalWabbitBinaryClassifier.default_parameters evalml.pipelines.VowpalWabbitBinaryClassifier.describe evalml.pipelines.VowpalWabbitBinaryClassifier.feature_importance evalml.pipelines.VowpalWabbitBinaryClassifier.fit evalml.pipelines.VowpalWabbitBinaryClassifier.get_prediction_intervals evalml.pipelines.VowpalWabbitBinaryClassifier.load evalml.pipelines.VowpalWabbitBinaryClassifier.needs_fitting evalml.pipelines.VowpalWabbitBinaryClassifier.parameters evalml.pipelines.VowpalWabbitBinaryClassifier.predict evalml.pipelines.VowpalWabbitBinaryClassifier.predict_proba evalml.pipelines.VowpalWabbitBinaryClassifier.save evalml.pipelines.VowpalWabbitBinaryClassifier.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. 
:rtype: None or dict .. py:method:: feature_importance(self) :property: Feature importance for Vowpal Wabbit classifiers. This is not implemented. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. 
py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: VowpalWabbitMulticlassClassifier(loss_function='logistic', learning_rate=0.5, decay_learning_rate=1.0, power_t=0.5, passes=1, random_seed=0, **kwargs) Vowpal Wabbit Multiclass Classifier. :param loss_function: Specifies the loss function to use. One of {"squared", "classic", "hinge", "logistic", "quantile"}. Defaults to "logistic". 
:type loss_function: str :param learning_rate: Boosting learning rate. Defaults to 0.5. :type learning_rate: float :param decay_learning_rate: Decay factor for learning_rate. Defaults to 1.0. :type decay_learning_rate: float :param power_t: Power on learning rate decay. Defaults to 0.5. :type power_t: float :param passes: Number of training passes. Defaults to 1. :type passes: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - None * - **model_family** - ModelFamily.VOWPAL_WABBIT * - **modifies_features** - True * - **modifies_target** - False * - **name** - Vowpal Wabbit Multiclass Classifier * - **supported_problem_types** - [ ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_MULTICLASS,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.VowpalWabbitMulticlassClassifier.clone evalml.pipelines.VowpalWabbitMulticlassClassifier.default_parameters evalml.pipelines.VowpalWabbitMulticlassClassifier.describe evalml.pipelines.VowpalWabbitMulticlassClassifier.feature_importance evalml.pipelines.VowpalWabbitMulticlassClassifier.fit evalml.pipelines.VowpalWabbitMulticlassClassifier.get_prediction_intervals evalml.pipelines.VowpalWabbitMulticlassClassifier.load evalml.pipelines.VowpalWabbitMulticlassClassifier.needs_fitting evalml.pipelines.VowpalWabbitMulticlassClassifier.parameters evalml.pipelines.VowpalWabbitMulticlassClassifier.predict evalml.pipelines.VowpalWabbitMulticlassClassifier.predict_proba evalml.pipelines.VowpalWabbitMulticlassClassifier.save evalml.pipelines.VowpalWabbitMulticlassClassifier.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Feature importance for Vowpal Wabbit classifiers. This is not implemented. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. 
:type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. 
:type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: VowpalWabbitRegressor(learning_rate=0.5, decay_learning_rate=1.0, power_t=0.5, passes=1, random_seed=0, **kwargs) Vowpal Wabbit Regressor. :param learning_rate: Boosting learning rate. Defaults to 0.5. :type learning_rate: float :param decay_learning_rate: Decay factor for learning_rate. Defaults to 1.0. :type decay_learning_rate: float :param power_t: Power on learning rate decay. Defaults to 0.5. :type power_t: float :param passes: Number of training passes. Defaults to 1. :type passes: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - None * - **model_family** - ModelFamily.VOWPAL_WABBIT * - **modifies_features** - True * - **modifies_target** - False * - **name** - Vowpal Wabbit Regressor * - **supported_problem_types** - [ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.VowpalWabbitRegressor.clone evalml.pipelines.VowpalWabbitRegressor.default_parameters evalml.pipelines.VowpalWabbitRegressor.describe evalml.pipelines.VowpalWabbitRegressor.feature_importance evalml.pipelines.VowpalWabbitRegressor.fit evalml.pipelines.VowpalWabbitRegressor.get_prediction_intervals evalml.pipelines.VowpalWabbitRegressor.load evalml.pipelines.VowpalWabbitRegressor.needs_fitting evalml.pipelines.VowpalWabbitRegressor.parameters evalml.pipelines.VowpalWabbitRegressor.predict evalml.pipelines.VowpalWabbitRegressor.predict_proba evalml.pipelines.VowpalWabbitRegressor.save evalml.pipelines.VowpalWabbitRegressor.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. 
.. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Feature importance for Vowpal Wabbit regressor. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits estimator to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. 
:type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict method or a component_obj that implements predict. .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. 
py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: XGBoostClassifier(eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, random_seed=0, eval_metric='logloss', n_jobs=12, **kwargs) XGBoost Classifier. :param eta: Boosting learning rate. Defaults to 0.1. :type eta: float :param max_depth: Maximum tree depth for base learners. Defaults to 6. :type max_depth: int :param min_child_weight: Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0. :type min_child_weight: float :param n_estimators: Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100. :type n_estimators: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :param n_jobs: Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to 12. :type n_jobs: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "eta": Real(0.000001, 1), "max_depth": Integer(1, 10), "min_child_weight": Real(1, 10), "n_estimators": Integer(1, 1000),} * - **model_family** - ModelFamily.XGBOOST * - **modifies_features** - True * - **modifies_target** - False * - **name** - XGBoost Classifier * - **SEED_MAX** - None * - **SEED_MIN** - None * - **supported_problem_types** - [ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,] * - **training_only** - False **Methods** ..
autoapisummary:: :nosignatures: evalml.pipelines.XGBoostClassifier.clone evalml.pipelines.XGBoostClassifier.default_parameters evalml.pipelines.XGBoostClassifier.describe evalml.pipelines.XGBoostClassifier.feature_importance evalml.pipelines.XGBoostClassifier.fit evalml.pipelines.XGBoostClassifier.get_prediction_intervals evalml.pipelines.XGBoostClassifier.load evalml.pipelines.XGBoostClassifier.needs_fitting evalml.pipelines.XGBoostClassifier.parameters evalml.pipelines.XGBoostClassifier.predict evalml.pipelines.XGBoostClassifier.predict_proba evalml.pipelines.XGBoostClassifier.save evalml.pipelines.XGBoostClassifier.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: feature_importance(self) :property: Feature importance of fitted XGBoost classifier. .. py:method:: fit(self, X, y=None) Fits XGBoost classifier component to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self .. 
py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted regressor. This function takes the predictions of the fitted estimator and calculates the rolling standard deviation across all predictions using a window size of 5. The lower and upper predictions are determined by taking the percent point (quantile) function of the lower tail probability at each bound multiplied by the rolling standard deviation. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: list[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict :raises MethodPropertyNotFoundError: If the estimator does not support Time Series Regression as a problem type. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X) Make predictions using the fitted XGBoost classifier. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted values. 
:rtype: pd.DataFrame .. py:method:: predict_proba(self, X) Make probability predictions using the fitted XGBoost classifier. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :returns: Predicted probabilities. :rtype: pd.DataFrame .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional .. py:class:: XGBoostRegressor(eta: float = 0.1, max_depth: int = 6, min_child_weight: int = 1, n_estimators: int = 100, random_seed: Union[int, float] = 0, n_jobs: int = 12, **kwargs) XGBoost Regressor. :param eta: Boosting learning rate. Defaults to 0.1. :type eta: float :param max_depth: Maximum tree depth for base learners. Defaults to 6. :type max_depth: int :param min_child_weight: Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0. :type min_child_weight: float :param n_estimators: Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100. :type n_estimators: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :param n_jobs: Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to 12. :type n_jobs: int **Attributes** ..
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "eta": Real(0.000001, 1), "max_depth": Integer(1, 20), "min_child_weight": Real(1, 10), "n_estimators": Integer(1, 1000),} * - **model_family** - ModelFamily.XGBOOST * - **modifies_features** - True * - **modifies_target** - False * - **name** - XGBoost Regressor * - **SEED_MAX** - None * - **SEED_MIN** - None * - **supported_problem_types** - [ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,] * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.XGBoostRegressor.clone evalml.pipelines.XGBoostRegressor.default_parameters evalml.pipelines.XGBoostRegressor.describe evalml.pipelines.XGBoostRegressor.feature_importance evalml.pipelines.XGBoostRegressor.fit evalml.pipelines.XGBoostRegressor.get_prediction_intervals evalml.pipelines.XGBoostRegressor.load evalml.pipelines.XGBoostRegressor.needs_fitting evalml.pipelines.XGBoostRegressor.parameters evalml.pipelines.XGBoostRegressor.predict evalml.pipelines.XGBoostRegressor.predict_proba evalml.pipelines.XGBoostRegressor.save evalml.pipelines.XGBoostRegressor.update_parameters .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. 
:rtype: None or dict .. py:method:: feature_importance(self) -> pandas.Series :property: Feature importance of fitted XGBoost regressor. .. py:method:: fit(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None) Fits XGBoost regressor component to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: get_prediction_intervals(self, X: pandas.DataFrame, y: Optional[pandas.Series] = None, coverage: List[float] = None, predictions: pandas.Series = None) -> Dict[str, pandas.Series] Find the prediction intervals using the fitted XGBoostRegressor. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series :param coverage: A list of floats between the values 0 and 1 that the upper and lower bounds of the prediction interval should be calculated for. :type coverage: List[float] :param predictions: Optional list of predictions to use. If None, will generate predictions using `X`. :type predictions: pd.Series :returns: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper. :rtype: dict .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: predict(self, X: pandas.DataFrame) -> pandas.Series Make predictions using fitted XGBoost regressor. :param X: Data of shape [n_samples, n_features]. 
:type X: pd.DataFrame :returns: Predicted values. :rtype: pd.Series .. py:method:: predict_proba(self, X: pandas.DataFrame) -> pandas.Series Make probability estimates for labels. :param X: Features. :type X: pd.DataFrame :returns: Probability estimates. :rtype: pd.Series :raises MethodPropertyNotFoundError: If estimator does not have a predict_proba method or a component_obj that implements predict_proba. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: update_parameters(self, update_dict, reset_fit=True) Updates the parameter dictionary of the component. :param update_dict: A dict of parameters to update. :type update_dict: dict :param reset_fit: If True, will set `_is_fitted` to False. :type reset_fit: bool, optional
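The generic ``get_prediction_intervals`` description given for the Vowpal Wabbit estimators and ``XGBoostClassifier`` above (rolling standard deviation over a window of 5, scaled by a quantile of the lower tail probability) can be illustrated with a minimal standard-library sketch. This is not EvalML's implementation: the helper names ``rolling_std`` and ``prediction_intervals`` are hypothetical, a standard normal distribution is assumed for the percent point function, and leaving the first ``window - 1`` positions as NaN is likewise an assumption.

```python
import math
from statistics import NormalDist, stdev


def rolling_std(values, window=5):
    """Sample standard deviation over a trailing window.

    Positions without a full window are NaN (an assumption of this sketch).
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)
        else:
            out.append(stdev(values[i + 1 - window : i + 1]))
    return out


def prediction_intervals(predictions, coverage=(0.95,), window=5):
    """Return a dict keyed "{coverage}_lower" / "{coverage}_upper"."""
    stds = rolling_std(predictions, window)
    intervals = {}
    for cov in coverage:
        lower_tail = (1 - cov) / 2
        # Quantile of the lower tail probability; negative for cov > 0.
        z = NormalDist().inv_cdf(lower_tail)
        intervals[f"{cov}_lower"] = [p + z * s for p, s in zip(predictions, stds)]
        intervals[f"{cov}_upper"] = [p - z * s for p, s in zip(predictions, stds)]
    return intervals


result = prediction_intervals([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], coverage=[0.95])
```

The returned keys follow the ``{coverage}_lower`` / ``{coverage}_upper`` format from the method description. ``XGBoostRegressor`` documents its own ``get_prediction_intervals`` separately, so this sketch should not be read as describing that override.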