transformers ================================================== .. py:module:: evalml.pipelines.components.transformers .. autoapi-nested-parse:: Components that transform data. Subpackages ----------- .. toctree:: :titlesonly: :maxdepth: 3 dimensionality_reduction/index.rst encoders/index.rst feature_selection/index.rst imputers/index.rst preprocessing/index.rst samplers/index.rst scalers/index.rst Submodules ---------- .. toctree:: :titlesonly: :maxdepth: 1 column_selectors/index.rst transformer/index.rst Package Contents ---------------- Classes Summary ~~~~~~~~~~~~~~~ .. autoapisummary:: evalml.pipelines.components.transformers.DateTimeFeaturizer evalml.pipelines.components.transformers.DFSTransformer evalml.pipelines.components.transformers.DropColumns evalml.pipelines.components.transformers.DropNaNRowsTransformer evalml.pipelines.components.transformers.DropNullColumns evalml.pipelines.components.transformers.DropRowsTransformer evalml.pipelines.components.transformers.EmailFeaturizer evalml.pipelines.components.transformers.FeatureSelector evalml.pipelines.components.transformers.Imputer evalml.pipelines.components.transformers.LabelEncoder evalml.pipelines.components.transformers.LinearDiscriminantAnalysis evalml.pipelines.components.transformers.LogTransformer evalml.pipelines.components.transformers.LSA evalml.pipelines.components.transformers.NaturalLanguageFeaturizer evalml.pipelines.components.transformers.OneHotEncoder evalml.pipelines.components.transformers.OrdinalEncoder evalml.pipelines.components.transformers.Oversampler evalml.pipelines.components.transformers.PCA evalml.pipelines.components.transformers.PerColumnImputer evalml.pipelines.components.transformers.PolynomialDecomposer evalml.pipelines.components.transformers.ReplaceNullableTypes evalml.pipelines.components.transformers.RFClassifierSelectFromModel evalml.pipelines.components.transformers.RFRegressorSelectFromModel evalml.pipelines.components.transformers.SelectByType 
evalml.pipelines.components.transformers.SelectColumns evalml.pipelines.components.transformers.SimpleImputer evalml.pipelines.components.transformers.StandardScaler evalml.pipelines.components.transformers.STLDecomposer evalml.pipelines.components.transformers.TargetEncoder evalml.pipelines.components.transformers.TargetImputer evalml.pipelines.components.transformers.TimeSeriesFeaturizer evalml.pipelines.components.transformers.TimeSeriesImputer evalml.pipelines.components.transformers.TimeSeriesRegularizer evalml.pipelines.components.transformers.Transformer evalml.pipelines.components.transformers.Undersampler evalml.pipelines.components.transformers.URLFeaturizer Contents ~~~~~~~~~~~~~~~~~~~ .. py:class:: DateTimeFeaturizer(features_to_extract=None, encode_as_categories=False, time_index=None, random_seed=0, **kwargs) Transformer that can automatically extract features from datetime columns. :param features_to_extract: List of features to extract. Valid options include "year", "month", "day_of_week", "hour". Defaults to None. :type features_to_extract: list :param encode_as_categories: Whether day-of-week and month features should be encoded as pandas "category" dtype. This allows OneHotEncoders to encode these features. Defaults to False. :type encode_as_categories: bool :param time_index: Name of the column containing the datetime information used to order the data. Ignored. :type time_index: str :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - DateTime Featurizer * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.DateTimeFeaturizer.clone evalml.pipelines.components.transformers.DateTimeFeaturizer.default_parameters evalml.pipelines.components.transformers.DateTimeFeaturizer.describe evalml.pipelines.components.transformers.DateTimeFeaturizer.fit evalml.pipelines.components.transformers.DateTimeFeaturizer.fit_transform evalml.pipelines.components.transformers.DateTimeFeaturizer.get_feature_names evalml.pipelines.components.transformers.DateTimeFeaturizer.load evalml.pipelines.components.transformers.DateTimeFeaturizer.needs_fitting evalml.pipelines.components.transformers.DateTimeFeaturizer.parameters evalml.pipelines.components.transformers.DateTimeFeaturizer.save evalml.pipelines.components.transformers.DateTimeFeaturizer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fit the datetime featurizer component. :param X: Input features. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. 
:type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: get_feature_names(self) Gets the categories of each datetime feature. :returns: Dictionary, where each key-value pair is a column name and a dictionary mapping the unique feature values to their integer encoding. :rtype: dict .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data X by creating new features using existing DateTime columns, and then dropping those DateTime columns. :param X: Input features. :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame .. py:class:: DFSTransformer(index='index', features=None, random_seed=0, **kwargs) Featuretools DFS component that generates features for the input features. :param index: The name of the column that contains the indices. If no column with this name exists, then featuretools.EntitySet() creates a column with this name to serve as the index column. Defaults to 'index'. 
:type index: string :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :param features: List of features to run DFS on. Defaults to None. Features will only be computed if the columns used by the feature exist in the input and if the feature itself is not in input. :type features: list **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - DFS Transformer * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.DFSTransformer.clone evalml.pipelines.components.transformers.DFSTransformer.default_parameters evalml.pipelines.components.transformers.DFSTransformer.describe evalml.pipelines.components.transformers.DFSTransformer.fit evalml.pipelines.components.transformers.DFSTransformer.fit_transform evalml.pipelines.components.transformers.DFSTransformer.load evalml.pipelines.components.transformers.DFSTransformer.needs_fitting evalml.pipelines.components.transformers.DFSTransformer.parameters evalml.pipelines.components.transformers.DFSTransformer.save evalml.pipelines.components.transformers.DFSTransformer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the DFSTransformer component. :param X: The input data to transform, of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Computes the feature matrix for the input X using featuretools' dfs algorithm. :param X: The input training data to transform. 
Has shape [n_samples, n_features] :type X: pd.DataFrame or np.ndarray :param y: Ignored. :type y: pd.Series, optional :returns: Feature matrix :rtype: pd.DataFrame .. py:class:: DropColumns(columns=None, random_seed=0, **kwargs) Drops specified columns in input data. :param columns: List of column names, used to determine which columns to drop. :type columns: list(string) :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Drop Columns Transformer * - **needs_fitting** - False * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.DropColumns.clone evalml.pipelines.components.transformers.DropColumns.default_parameters evalml.pipelines.components.transformers.DropColumns.describe evalml.pipelines.components.transformers.DropColumns.fit evalml.pipelines.components.transformers.DropColumns.fit_transform evalml.pipelines.components.transformers.DropColumns.load evalml.pipelines.components.transformers.DropColumns.parameters evalml.pipelines.components.transformers.DropColumns.save evalml.pipelines.components.transformers.DropColumns.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the transformer by checking if column names are present in the dataset. :param X: Data to check. :type X: pd.DataFrame :param y: Targets. Ignored. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data X by dropping columns. :param X: Data to transform. :type X: pd.DataFrame :param y: Targets. :type y: pd.Series, optional :returns: Transformed X. :rtype: pd.DataFrame .. py:class:: DropNaNRowsTransformer(parameters=None, component_obj=None, random_seed=0, **kwargs) Transformer to drop rows with NaN values. :param random_seed: Seed for the random number generator. Is not used by this component. Defaults to 0. :type random_seed: int **Attributes** .. 
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - True * - **name** - Drop NaN Rows Transformer * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.DropNaNRowsTransformer.clone evalml.pipelines.components.transformers.DropNaNRowsTransformer.default_parameters evalml.pipelines.components.transformers.DropNaNRowsTransformer.describe evalml.pipelines.components.transformers.DropNaNRowsTransformer.fit evalml.pipelines.components.transformers.DropNaNRowsTransformer.fit_transform evalml.pipelines.components.transformers.DropNaNRowsTransformer.load evalml.pipelines.components.transformers.DropNaNRowsTransformer.needs_fitting evalml.pipelines.components.transformers.DropNaNRowsTransformer.parameters evalml.pipelines.components.transformers.DropNaNRowsTransformer.save evalml.pipelines.components.transformers.DropNaNRowsTransformer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. :param X: The input training data of shape [n_samples, n_features]. 
:type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data using fitted component. :param X: Features. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series, optional :returns: Data with NaN rows dropped. :rtype: (pd.DataFrame, pd.Series) .. py:class:: DropNullColumns(pct_null_threshold=1.0, random_seed=0, **kwargs) Transformer to drop features whose percentage of NaN values exceeds a specified threshold. :param pct_null_threshold: The percentage of NaN values in an input feature to drop. Must be a value between [0, 1] inclusive. If equal to 0.0, will drop columns with any null values. If equal to 1.0, will drop columns with all null values. Defaults to 1.0. 
:type pct_null_threshold: float :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Drop Null Columns Transformer * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.DropNullColumns.clone evalml.pipelines.components.transformers.DropNullColumns.default_parameters evalml.pipelines.components.transformers.DropNullColumns.describe evalml.pipelines.components.transformers.DropNullColumns.fit evalml.pipelines.components.transformers.DropNullColumns.fit_transform evalml.pipelines.components.transformers.DropNullColumns.load evalml.pipelines.components.transformers.DropNullColumns.needs_fitting evalml.pipelines.components.transformers.DropNullColumns.parameters evalml.pipelines.components.transformers.DropNullColumns.save evalml.pipelines.components.transformers.DropNullColumns.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. 
:param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data X by dropping columns that exceed the threshold of null values. :param X: Data to transform :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame .. py:class:: DropRowsTransformer(indices_to_drop=None, random_seed=0) Transformer to drop rows specified by row indices. :param indices_to_drop: List of indices to drop in the input data. Defaults to None. :type indices_to_drop: list :param random_seed: Seed for the random number generator. Is not used by this component. Defaults to 0. 
:type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - True * - **name** - Drop Rows Transformer * - **training_only** - True **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.DropRowsTransformer.clone evalml.pipelines.components.transformers.DropRowsTransformer.default_parameters evalml.pipelines.components.transformers.DropRowsTransformer.describe evalml.pipelines.components.transformers.DropRowsTransformer.fit evalml.pipelines.components.transformers.DropRowsTransformer.fit_transform evalml.pipelines.components.transformers.DropRowsTransformer.load evalml.pipelines.components.transformers.DropRowsTransformer.needs_fitting evalml.pipelines.components.transformers.DropRowsTransformer.parameters evalml.pipelines.components.transformers.DropRowsTransformer.save evalml.pipelines.components.transformers.DropRowsTransformer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. :param X: The input training data of shape [n_samples, n_features]. 
:type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self :raises ValueError: If indices to drop do not exist in input features or target. .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data using fitted component. :param X: Features. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series, optional :returns: Data with row indices dropped. :rtype: (pd.DataFrame, pd.Series) .. py:class:: EmailFeaturizer(random_seed=0, **kwargs) Transformer that can automatically extract features from emails. :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. 
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Email Featurizer * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.EmailFeaturizer.clone evalml.pipelines.components.transformers.EmailFeaturizer.default_parameters evalml.pipelines.components.transformers.EmailFeaturizer.describe evalml.pipelines.components.transformers.EmailFeaturizer.fit evalml.pipelines.components.transformers.EmailFeaturizer.fit_transform evalml.pipelines.components.transformers.EmailFeaturizer.load evalml.pipelines.components.transformers.EmailFeaturizer.needs_fitting evalml.pipelines.components.transformers.EmailFeaturizer.parameters evalml.pipelines.components.transformers.EmailFeaturizer.save evalml.pipelines.components.transformers.EmailFeaturizer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. 
:param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises MethodPropertyNotFoundError: If component does not have a fit method or a component_obj that implements fit. .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data X. :param X: Data to transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:class:: FeatureSelector(parameters=None, component_obj=None, random_seed=0, **kwargs) Selects top features based on importance weights. 
:param parameters: Dictionary of parameters for the component. Defaults to None. :type parameters: dict :param component_obj: Third-party objects useful in component implementation. Defaults to None. :type component_obj: obj :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **modifies_features** - True * - **modifies_target** - False * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.FeatureSelector.clone evalml.pipelines.components.transformers.FeatureSelector.default_parameters evalml.pipelines.components.transformers.FeatureSelector.describe evalml.pipelines.components.transformers.FeatureSelector.fit evalml.pipelines.components.transformers.FeatureSelector.fit_transform evalml.pipelines.components.transformers.FeatureSelector.get_names evalml.pipelines.components.transformers.FeatureSelector.load evalml.pipelines.components.transformers.FeatureSelector.name evalml.pipelines.components.transformers.FeatureSelector.needs_fitting evalml.pipelines.components.transformers.FeatureSelector.parameters evalml.pipelines.components.transformers.FeatureSelector.save evalml.pipelines.components.transformers.FeatureSelector.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. :param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises MethodPropertyNotFoundError: If component does not have a fit method or a component_obj that implements fit. .. py:method:: fit_transform(self, X, y=None) Fit and transform data using the feature selector. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:method:: get_names(self) Get names of selected features. :returns: List of the names of features selected. :rtype: list[str] .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: name(cls) :property: Returns string name of this component. .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. 
:type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised. :param X: Data to transform. :type X: pd.DataFrame :param y: Target data. Ignored. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If feature selector does not have a transform method or a component_obj that implements transform. .. py:class:: Imputer(categorical_impute_strategy='most_frequent', categorical_fill_value=None, numeric_impute_strategy='mean', numeric_fill_value=None, boolean_impute_strategy='most_frequent', boolean_fill_value=None, random_seed=0, **kwargs) Imputes missing data according to a specified imputation strategy. :param categorical_impute_strategy: Impute strategy to use for string, object, boolean, and categorical dtypes. Valid values include "most_frequent" and "constant". :type categorical_impute_strategy: string :param numeric_impute_strategy: Impute strategy to use for numeric columns. Valid values include "mean", "median", "most_frequent", and "constant". :type numeric_impute_strategy: string :param boolean_impute_strategy: Impute strategy to use for boolean columns. Valid values include "most_frequent" and "constant". :type boolean_impute_strategy: string :param categorical_fill_value: When categorical_impute_strategy == "constant", this value is used to replace missing data. The default value of None will fill with the string "missing_value". :type categorical_fill_value: string :param numeric_fill_value: When numeric_impute_strategy == "constant", this value is used to replace missing data. The default value of None will fill with 0. :type numeric_fill_value: int, float :param boolean_fill_value: When boolean_impute_strategy == "constant", this value is used to replace missing data.
The default value of None will fill with True. :type boolean_fill_value: bool :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "categorical_impute_strategy": ["most_frequent"], "numeric_impute_strategy": ["mean", "median", "most_frequent", "knn"], "boolean_impute_strategy": ["most_frequent", "knn"]} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Imputer * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.Imputer.clone evalml.pipelines.components.transformers.Imputer.default_parameters evalml.pipelines.components.transformers.Imputer.describe evalml.pipelines.components.transformers.Imputer.fit evalml.pipelines.components.transformers.Imputer.fit_transform evalml.pipelines.components.transformers.Imputer.load evalml.pipelines.components.transformers.Imputer.needs_fitting evalml.pipelines.components.transformers.Imputer.parameters evalml.pipelines.components.transformers.Imputer.save evalml.pipelines.components.transformers.Imputer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits imputer to data. 'None' values are converted to np.nan before imputation and are treated as the same. :param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame, np.ndarray :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data X by imputing missing values. 
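The imputation strategies named in the ``Imputer`` parameters ("mean", "median", "most_frequent", "constant") can be sketched on a plain Python list as follows. This is a minimal illustration of the strategies, not evalml's implementation: ``impute_column`` is a hypothetical helper, missing entries are represented by ``None``, and the constant strategy uses whatever ``fill_value`` is passed rather than the per-type defaults described above.

.. code-block:: python

    from collections import Counter
    from statistics import mean, median


    def impute_column(values, strategy="mean", fill_value=None):
        """Replace None entries in a list according to the given strategy.

        Hypothetical helper for illustration only; not an evalml API.
        """
        observed = [v for v in values if v is not None]
        if strategy == "mean":
            replacement = mean(observed)
        elif strategy == "median":
            replacement = median(observed)
        elif strategy == "most_frequent":
            replacement = Counter(observed).most_common(1)[0][0]
        elif strategy == "constant":
            replacement = fill_value
        else:
            raise ValueError(f"Unknown strategy: {strategy}")
        return [replacement if v is None else v for v in values]


    numeric = impute_column([1.0, None, 3.0], strategy="mean")
    categorical = impute_column(["a", "b", None, "a"], strategy="most_frequent")
    boolean = impute_column([True, None], strategy="constant", fill_value=True)
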
:param X: Data to transform :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame .. py:class:: LabelEncoder(positive_label=None, random_seed=0, **kwargs) A transformer that encodes target labels using values between 0 and num_classes - 1. :param positive_label: The label for the class that should be treated as positive (1) for binary classification problems. Ignored for multiclass problems. Defaults to None. :type positive_label: int, str :param random_seed: Seed for the random number generator. Defaults to 0. Ignored. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - False * - **modifies_target** - True * - **name** - Label Encoder * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.LabelEncoder.clone evalml.pipelines.components.transformers.LabelEncoder.default_parameters evalml.pipelines.components.transformers.LabelEncoder.describe evalml.pipelines.components.transformers.LabelEncoder.fit evalml.pipelines.components.transformers.LabelEncoder.fit_transform evalml.pipelines.components.transformers.LabelEncoder.inverse_transform evalml.pipelines.components.transformers.LabelEncoder.load evalml.pipelines.components.transformers.LabelEncoder.needs_fitting evalml.pipelines.components.transformers.LabelEncoder.parameters evalml.pipelines.components.transformers.LabelEncoder.save evalml.pipelines.components.transformers.LabelEncoder.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. 
:rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y) Fits the label encoder. :param X: The input training data of shape [n_samples, n_features]. Ignored. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: self :raises ValueError: If input `y` is None. .. py:method:: fit_transform(self, X, y) Fit and transform data using the label encoder. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: The original features and an encoded version of the target. :rtype: pd.DataFrame, pd.Series .. py:method:: inverse_transform(self, y) Decodes the target data. :param y: Target data. :type y: pd.Series :returns: The decoded version of the target. :rtype: pd.Series :raises ValueError: If input `y` is None. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. 
:param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transform the target using the fitted label encoder. :param X: The input training data of shape [n_samples, n_features]. Ignored. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series :returns: The original features and an encoded version of the target. :rtype: pd.DataFrame, pd.Series :raises ValueError: If input `y` is None. .. py:class:: LinearDiscriminantAnalysis(n_components=None, random_seed=0, **kwargs) Reduces the number of features by using Linear Discriminant Analysis. :param n_components: The number of features to maintain after computation. Defaults to None. :type n_components: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Linear Discriminant Analysis Transformer * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.LinearDiscriminantAnalysis.clone evalml.pipelines.components.transformers.LinearDiscriminantAnalysis.default_parameters evalml.pipelines.components.transformers.LinearDiscriminantAnalysis.describe evalml.pipelines.components.transformers.LinearDiscriminantAnalysis.fit evalml.pipelines.components.transformers.LinearDiscriminantAnalysis.fit_transform evalml.pipelines.components.transformers.LinearDiscriminantAnalysis.load evalml.pipelines.components.transformers.LinearDiscriminantAnalysis.needs_fitting evalml.pipelines.components.transformers.LinearDiscriminantAnalysis.parameters evalml.pipelines.components.transformers.LinearDiscriminantAnalysis.save evalml.pipelines.components.transformers.LinearDiscriminantAnalysis.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y) Fits the LDA component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self :raises ValueError: If input data is not all numeric. .. 
py:method:: fit_transform(self, X, y=None) Fit and transform data using the LDA component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame :raises ValueError: If input data is not all numeric. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transform data using the fitted LDA component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame :raises ValueError: If input data is not all numeric. .. py:class:: LogTransformer(random_seed=0) Applies a log transformation to the target data. **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - False * - **modifies_target** - True * - **name** - Log Transformer * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.LogTransformer.clone evalml.pipelines.components.transformers.LogTransformer.default_parameters evalml.pipelines.components.transformers.LogTransformer.describe evalml.pipelines.components.transformers.LogTransformer.fit evalml.pipelines.components.transformers.LogTransformer.fit_transform evalml.pipelines.components.transformers.LogTransformer.inverse_transform evalml.pipelines.components.transformers.LogTransformer.load evalml.pipelines.components.transformers.LogTransformer.needs_fitting evalml.pipelines.components.transformers.LogTransformer.parameters evalml.pipelines.components.transformers.LogTransformer.save evalml.pipelines.components.transformers.LogTransformer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the LogTransformer. :param X: Ignored. :type X: pd.DataFrame or np.ndarray :param y: Ignored. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Log transforms the target variable. :param X: Ignored. :type X: pd.DataFrame, optional :param y: Target variable to log transform. 
:type y: pd.Series :returns: The input features are returned without modification. The target variable y is log transformed. :rtype: tuple of pd.DataFrame, pd.Series .. py:method:: inverse_transform(self, y) Apply exponential to target data. :param y: Target variable. :type y: pd.Series :returns: Target with exponential applied. :rtype: pd.Series .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Log transforms the target variable. :param X: Ignored. :type X: pd.DataFrame, optional :param y: Target data to log transform. :type y: pd.Series :returns: The input features are returned without modification. The target variable y is log transformed. :rtype: tuple of pd.DataFrame, pd.Series .. py:class:: LSA(random_seed=0, **kwargs) Transformer to calculate the Latent Semantic Analysis Values of text input. :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - LSA Transformer * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.LSA.clone evalml.pipelines.components.transformers.LSA.default_parameters evalml.pipelines.components.transformers.LSA.describe evalml.pipelines.components.transformers.LSA.fit evalml.pipelines.components.transformers.LSA.fit_transform evalml.pipelines.components.transformers.LSA.load evalml.pipelines.components.transformers.LSA.needs_fitting evalml.pipelines.components.transformers.LSA.parameters evalml.pipelines.components.transformers.LSA.save evalml.pipelines.components.transformers.LSA.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the input data. :param X: The data to transform. :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. 
py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data X by applying the LSA pipeline. :param X: The data to transform. :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series, optional :returns: Transformed X. The original column is removed and replaced with two columns of the format `LSA(original_column_name)[feature_number]`, where `feature_number` is 0 or 1. :rtype: pd.DataFrame .. py:class:: NaturalLanguageFeaturizer(random_seed=0, **kwargs) Transformer that can automatically featurize text columns using featuretools' nlp_primitives. Since models cannot handle non-numeric data, any text must be broken down into features that provide useful information about that text. This component splits each text column into several informative features: Diversity Score, Mean Characters per Word, Polarity Score, LSA (Latent Semantic Analysis), Number of Characters, and Number of Words. Calling transform on this component will replace any text columns in the given dataset with these numeric columns. :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. 
list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Natural Language Featurizer * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.NaturalLanguageFeaturizer.clone evalml.pipelines.components.transformers.NaturalLanguageFeaturizer.default_parameters evalml.pipelines.components.transformers.NaturalLanguageFeaturizer.describe evalml.pipelines.components.transformers.NaturalLanguageFeaturizer.fit evalml.pipelines.components.transformers.NaturalLanguageFeaturizer.fit_transform evalml.pipelines.components.transformers.NaturalLanguageFeaturizer.load evalml.pipelines.components.transformers.NaturalLanguageFeaturizer.needs_fitting evalml.pipelines.components.transformers.NaturalLanguageFeaturizer.parameters evalml.pipelines.components.transformers.NaturalLanguageFeaturizer.save evalml.pipelines.components.transformers.NaturalLanguageFeaturizer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. 
:param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame or np.ndarray :param y: The target training data of length [n_samples] :type y: pd.Series :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data X by creating new features using existing text columns. :param X: The data to transform. :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame .. py:class:: OneHotEncoder(top_n=10, features_to_encode=None, categories=None, drop='if_binary', handle_unknown='ignore', handle_missing='error', random_seed=0, **kwargs) A transformer that encodes categorical features in a one-hot numeric array. :param top_n: Number of categories per column to encode. If None, all categories will be encoded. 
Otherwise, the `top_n` most frequent categories will be encoded and all others will be dropped. Defaults to 10. :type top_n: int :param features_to_encode: List of columns to encode. All other columns will remain untouched. If None, all appropriate columns will be encoded. Defaults to None. :type features_to_encode: list[str] :param categories: A two-dimensional list of categories, where `categories[i]` is a list of the categories for the column at index `i`. This can also be `None`, or `"auto"` if `top_n` is not None. Defaults to None. :type categories: list :param drop: Method ("first" or "if_binary") to use to drop one category per feature. Can also be a list specifying which categories to drop for each feature. Defaults to 'if_binary'. :type drop: string, list :param handle_unknown: Whether to ignore unknown categories encountered for a feature during `fit` or `transform`, or to raise an error. If either `top_n` or `categories` is used to limit the number of categories per column, this must be "ignore". Defaults to "ignore". :type handle_unknown: string :param handle_missing: Options for how to handle missing (NaN) values encountered during `fit` or `transform`. If this is set to "as_category" and NaN values are within the `top_n` most frequent, "nan" values will be encoded as their own column. If this is set to "error", any missing values encountered will raise an error. Defaults to "error". :type handle_missing: string :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - One Hot Encoder * - **training_only** - False **Methods** ..
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.OneHotEncoder.categories evalml.pipelines.components.transformers.OneHotEncoder.clone evalml.pipelines.components.transformers.OneHotEncoder.default_parameters evalml.pipelines.components.transformers.OneHotEncoder.describe evalml.pipelines.components.transformers.OneHotEncoder.fit evalml.pipelines.components.transformers.OneHotEncoder.fit_transform evalml.pipelines.components.transformers.OneHotEncoder.get_feature_names evalml.pipelines.components.transformers.OneHotEncoder.load evalml.pipelines.components.transformers.OneHotEncoder.needs_fitting evalml.pipelines.components.transformers.OneHotEncoder.parameters evalml.pipelines.components.transformers.OneHotEncoder.save evalml.pipelines.components.transformers.OneHotEncoder.transform .. py:method:: categories(self, feature_name) Returns a list of the unique categories to be encoded for the particular feature, in order. :param feature_name: The name of any feature provided to one-hot encoder during fit. :type feature_name: str :returns: The unique categories, in the same dtype as they were provided during fit. :rtype: np.ndarray :raises ValueError: If feature was not provided to one-hot encoder as a training feature. .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the one-hot encoder component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self :raises ValueError: If encoding a column failed. .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: get_feature_names(self) Return feature names for the categorical features after fitting. Feature names are formatted as {column name}_{category name}. In the event of a duplicate name, an integer will be added at the end of the feature name to distinguish it. For example, consider a dataframe with a column called "A" and category "x_y" and another column called "A_x" with "y". In this example, the feature names would be "A_x_y" and "A_x_y_1". :returns: The feature names after encoding, provided in the same order as input_features. :rtype: np.ndarray .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. 
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) One-hot encode the input data. :param X: Features to one-hot encode. :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series :returns: Transformed data, where each categorical feature has been encoded into numerical columns using one-hot encoding. :rtype: pd.DataFrame .. py:class:: OrdinalEncoder(features_to_encode=None, categories=None, handle_unknown='error', unknown_value=None, encoded_missing_value=None, random_seed=0, **kwargs) A transformer that encodes ordinal features as an array of ordinal integers representing the relative order of categories. :param features_to_encode: List of columns to encode. All other columns will remain untouched. If None, all appropriate columns will be encoded. Defaults to None. The order of columns does not matter. :type features_to_encode: list[str] :param categories: A dictionary mapping column names to their categories in the dataframes passed in at fit and transform. The order of categories specified for a column does not matter. Any category found in the data that is not present in categories will be handled as an unknown value. To not have unknown values raise an error, set handle_unknown to "use_encoded_value". Defaults to None. :type categories: dict[str, list[str]] :param handle_unknown: Whether to ignore or error for unknown categories for a feature encountered during `fit` or `transform`. When set to "error", an error will be raised when an unknown category is found. 
When set to "use_encoded_value", unknown categories will be encoded as the value given for the parameter unknown_value. Defaults to "error." :type handle_unknown: "error" or "use_encoded_value" :param unknown_value: The value to use for unknown categories seen during fit or transform. Required when the parameter handle_unknown is set to "use_encoded_value." The value has to be distinct from the values used to encode any of the categories in fit. Defaults to None. :type unknown_value: int or np.nan :param encoded_missing_value: The value to use for missing (null) values seen during fit or transform. Defaults to np.nan. :type encoded_missing_value: int or np.nan :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Ordinal Encoder * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.OrdinalEncoder.categories evalml.pipelines.components.transformers.OrdinalEncoder.clone evalml.pipelines.components.transformers.OrdinalEncoder.default_parameters evalml.pipelines.components.transformers.OrdinalEncoder.describe evalml.pipelines.components.transformers.OrdinalEncoder.fit evalml.pipelines.components.transformers.OrdinalEncoder.fit_transform evalml.pipelines.components.transformers.OrdinalEncoder.get_feature_names evalml.pipelines.components.transformers.OrdinalEncoder.load evalml.pipelines.components.transformers.OrdinalEncoder.needs_fitting evalml.pipelines.components.transformers.OrdinalEncoder.parameters evalml.pipelines.components.transformers.OrdinalEncoder.save evalml.pipelines.components.transformers.OrdinalEncoder.transform .. py:method:: categories(self, feature_name) Returns a list of the unique categories to be encoded for the particular feature, in order. 
:param feature_name: The name of any feature provided to ordinal encoder during fit. :type feature_name: str :returns: The unique categories, in the same dtype as they were provided during fit. :rtype: np.ndarray :raises ValueError: If feature was not provided to ordinal encoder as a training feature. .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the ordinal encoder component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self :raises ValueError: If encoding a column failed. :raises TypeError: If non-Ordinal columns are specified in features_to_encode. .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: get_feature_names(self) Return feature names for the ordinal features after fitting. 
Feature names are formatted as {column name}_ordinal_encoding. :returns: The feature names after encoding, provided in the same order as input_features. :rtype: np.ndarray .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Ordinally encode the input data. :param X: Features to encode. :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series :returns: Transformed data, where each ordinal feature has been encoded into a numerical column where ordinal integers represent the relative order of categories. :rtype: pd.DataFrame .. py:class:: Oversampler(sampling_ratio=0.25, sampling_ratio_dict=None, k_neighbors_default=5, n_jobs=-1, random_seed=0, **kwargs) SMOTE Oversampler component. Will automatically select whether to use SMOTE, SMOTEN, or SMOTENC based on inputs to the component. :param sampling_ratio: This is the goal ratio of the minority to majority class, with range (0, 1]. A value of 0.25 means we want a 1:4 ratio of the minority to majority class after oversampling. We will create a sampling dictionary using this ratio, with the keys corresponding to the class and the values corresponding to the number of samples. Defaults to 0.25.
:type sampling_ratio: float :param sampling_ratio_dict: A dictionary specifying the desired balanced ratio for each target value. For instance, in a binary case where class 1 is the minority, we could specify: `sampling_ratio_dict={0: 0.5, 1: 1}`, which means we would undersample class 0 to have twice the number of samples as class 1 (minority:majority ratio = 0.5), and don't sample class 1. Overrides sampling_ratio if provided. Defaults to None. :type sampling_ratio_dict: dict :param k_neighbors_default: The number of nearest neighbors used to construct synthetic samples. This is the default value used, but the actual k_neighbors value might be smaller if there are fewer samples. Defaults to 5. :type k_neighbors_default: int :param n_jobs: The number of CPU cores to use. Defaults to -1. :type n_jobs: int :param random_seed: The seed to use for random sampling. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - True * - **name** - Oversampler * - **training_only** - True **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.Oversampler.clone evalml.pipelines.components.transformers.Oversampler.default_parameters evalml.pipelines.components.transformers.Oversampler.describe evalml.pipelines.components.transformers.Oversampler.fit evalml.pipelines.components.transformers.Oversampler.fit_transform evalml.pipelines.components.transformers.Oversampler.load evalml.pipelines.components.transformers.Oversampler.needs_fitting evalml.pipelines.components.transformers.Oversampler.parameters evalml.pipelines.components.transformers.Oversampler.save evalml.pipelines.components.transformers.Oversampler.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. ..
py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y) Fits oversampler to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y) Fit and transform data using the sampler component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: (pd.DataFrame, pd.Series) .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. 
:type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms the input data by sampling the data. :param X: Training features. :type X: pd.DataFrame :param y: Target. :type y: pd.Series :returns: Transformed features and target. :rtype: pd.DataFrame, pd.Series .. py:class:: PCA(variance=0.95, n_components=None, random_seed=0, **kwargs) Reduces the number of features by using Principal Component Analysis (PCA). :param variance: The percentage of the original data variance that should be preserved when reducing the number of features. Defaults to 0.95. :type variance: float :param n_components: The number of features to maintain after computing SVD. Defaults to None, but will override variance variable if set. :type n_components: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {"variance": Real(0.25, 1)} * - **modifies_features** - True * - **modifies_target** - False * - **name** - PCA Transformer * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.PCA.clone evalml.pipelines.components.transformers.PCA.default_parameters evalml.pipelines.components.transformers.PCA.describe evalml.pipelines.components.transformers.PCA.fit evalml.pipelines.components.transformers.PCA.fit_transform evalml.pipelines.components.transformers.PCA.load evalml.pipelines.components.transformers.PCA.needs_fitting evalml.pipelines.components.transformers.PCA.parameters evalml.pipelines.components.transformers.PCA.save evalml.pipelines.components.transformers.PCA.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. ..
py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the PCA component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self :raises ValueError: If input data is not all numeric. .. py:method:: fit_transform(self, X, y=None) Fit and transform data using the PCA component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame :raises ValueError: If input data is not all numeric. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. 
py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transform data using fitted PCA component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame :raises ValueError: If input data is not all numeric. .. py:class:: PerColumnImputer(impute_strategies=None, random_seed=0, **kwargs) Imputes missing data according to a specified imputation strategy per column. :param impute_strategies: Column and {"impute_strategy": strategy, "fill_value":value} pairings. Valid values for impute strategy include "mean", "median", "most_frequent", "constant" for numerical data, and "most_frequent", "constant" for object data types. Defaults to None, which uses "most_frequent" for all columns. When impute_strategy == "constant", fill_value is used to replace missing data. When None, uses 0 when imputing numerical data and "missing_value" for strings or object data types. :type impute_strategies: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Per Column Imputer * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.PerColumnImputer.clone evalml.pipelines.components.transformers.PerColumnImputer.default_parameters evalml.pipelines.components.transformers.PerColumnImputer.describe evalml.pipelines.components.transformers.PerColumnImputer.fit evalml.pipelines.components.transformers.PerColumnImputer.fit_transform evalml.pipelines.components.transformers.PerColumnImputer.load evalml.pipelines.components.transformers.PerColumnImputer.needs_fitting evalml.pipelines.components.transformers.PerColumnImputer.parameters evalml.pipelines.components.transformers.PerColumnImputer.save evalml.pipelines.components.transformers.PerColumnImputer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits imputers on input data. :param X: The input training data of shape [n_samples, n_features] to fit. :type X: pd.DataFrame or np.ndarray :param y: The target training data of length [n_samples]. Ignored. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. 
:type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms input data by imputing missing values. :param X: The input training data of shape [n_samples, n_features] to transform. :type X: pd.DataFrame or np.ndarray :param y: The target training data of length [n_samples]. Ignored. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame .. py:class:: PolynomialDecomposer(time_index: str = None, degree: int = 1, seasonal_period: int = -1, random_seed: int = 0, **kwargs) Removes trends and seasonality from time series by fitting a polynomial and moving average to the data. Scikit-learn's PolynomialForecaster is used to generate the additive trend portion of the target data. A polynomial will be fit to the data during fit. That additive polynomial trend will be removed during fit so that statsmodels' seasonal_decompose can determine the additive seasonality of the data by using rolling averages over the series' inferred periodicity.
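As a rough illustration of the two steps described above (fit and subtract an additive trend, then average what remains over one seasonal period), here is a minimal pure-Python sketch. It is not EvalML's implementation: it assumes a degree-1 (linear) trend fit by ordinary least squares, and it uses plain per-position averages rather than the rolling averages computed by seasonal_decompose. The function names ``fit_linear_trend``, ``seasonal_means``, and ``decompose`` are hypothetical.

```python
def fit_linear_trend(y):
    """Ordinary least-squares line through (0, y[0]), (1, y[1]), ...

    Returns (intercept, slope). Stands in for the polynomial fit with degree=1.
    """
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def seasonal_means(detrended, period):
    """Average the detrended values at each position in the cycle, then center them."""
    buckets = [[] for _ in range(period)]
    for t, v in enumerate(detrended):
        buckets[t % period].append(v)
    means = [sum(b) / len(b) for b in buckets]
    center = sum(means) / period  # remove the mean so the seasonal part sums to ~0
    return [m - center for m in means]


def decompose(y, period):
    """Split y into additive trend, seasonal, and residual lists of equal length."""
    intercept, slope = fit_linear_trend(y)
    trend = [intercept + slope * t for t in range(len(y))]
    detrended = [v - tr for v, tr in zip(y, trend)]
    season = seasonal_means(detrended, period)
    # Repeat the one-period pattern across the whole series.
    seasonal = [season[t % period] for t in range(len(y))]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual
```

Because the model is additive, the three returned lists sum back to the input series element by element, which mirrors how the trend and seasonality can later be added back.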
For example, daily time series data will generate rolling averages over the first week of data, normalize out the mean and return those 7 averages repeated over the entire length of the given series. Those seven averages, repeated as many times as necessary to match the length of the given target data, will be used as the seasonal signal of the data. :param time_index: Specifies the name of the column in X that provides the datetime objects. Defaults to None. :type time_index: str :param degree: Degree for the polynomial. If 1, linear model is fit to the data. If 2, quadratic model is fit, etc. Defaults to 1. :type degree: int :param seasonal_period: The number of entries in the time series data that corresponds to one period of a cyclic signal. For instance, if data is known to possess a weekly seasonal signal, and if the data is daily data, seasonal_period should be 7. For daily data with a yearly seasonal signal, seasonal_period should be 365. Defaults to -1, which uses the statsmodels library's freq_to_period function. https://github.com/statsmodels/statsmodels/blob/main/statsmodels/tsa/tsatools.py :type seasonal_period: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "degree": Integer(1, 3)} * - **modifies_features** - False * - **modifies_target** - True * - **name** - Polynomial Decomposer * - **needs_fitting** - True * - **training_only** - False **Methods** ..
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.PolynomialDecomposer.clone evalml.pipelines.components.transformers.PolynomialDecomposer.default_parameters evalml.pipelines.components.transformers.PolynomialDecomposer.describe evalml.pipelines.components.transformers.PolynomialDecomposer.determine_periodicity evalml.pipelines.components.transformers.PolynomialDecomposer.fit evalml.pipelines.components.transformers.PolynomialDecomposer.fit_transform evalml.pipelines.components.transformers.PolynomialDecomposer.get_trend_dataframe evalml.pipelines.components.transformers.PolynomialDecomposer.inverse_transform evalml.pipelines.components.transformers.PolynomialDecomposer.load evalml.pipelines.components.transformers.PolynomialDecomposer.parameters evalml.pipelines.components.transformers.PolynomialDecomposer.plot_decomposition evalml.pipelines.components.transformers.PolynomialDecomposer.save evalml.pipelines.components.transformers.PolynomialDecomposer.set_seasonal_period evalml.pipelines.components.transformers.PolynomialDecomposer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. 
py:method:: determine_periodicity(self, X: pandas.DataFrame, y: pandas.Series, method: str = 'autocorrelation') Function that uses autocorrelative methods to determine the first significant period of the seasonal signal. :param X: The feature data of the time series problem. :type X: pandas.DataFrame :param y: The target data of a time series problem. :type y: pandas.Series :param method: Either "autocorrelation" or "partial-autocorrelation". The method by which to determine the first period of the seasonal part of the target signal. "partial-autocorrelation" should currently not be used. Defaults to "autocorrelation". :type method: str :returns: The integer number of entries in time series data over which the seasonal part of the target data repeats. If the time series data is in days, then this is the number of days that it takes the target's seasonal signal to repeat. Note: the target data can contain multiple seasonal signals. This function will only return the first, and thus, shortest period. E.g. if the target has both weekly and yearly seasonality, the function will only return "7" and not return "365". If no period is detected, returns [None]. :rtype: (list[int]) .. py:method:: fit(self, X: pandas.DataFrame, y: pandas.Series = None) -> PolynomialDecomposer Fits the PolynomialDecomposer and determines the seasonal signal. Currently only fits the polynomial detrender. The seasonality is determined by removing the trend from the signal and using statsmodels' seasonal_decompose(). Both the trend and seasonality are currently assumed to be additive. :param X: Conditionally used to build datetime index. :type X: pd.DataFrame, optional :param y: Target variable to detrend and deseasonalize. :type y: pd.Series :returns: self :raises NotImplementedError: If the input data has a frequency of "month-begin". This isn't supported by statsmodels decompose as the freqstr "MS" is misinterpreted as milliseconds. :raises ValueError: If y is None.
:raises ValueError: If target data doesn't have DatetimeIndex AND no Datetime features in features data .. py:method:: fit_transform(self, X: pandas.DataFrame, y: pandas.Series = None) -> tuple[pandas.DataFrame, pandas.Series] Removes fitted trend and seasonality from target variable. :param X: Ignored. :type X: pd.DataFrame, optional :param y: Target variable to detrend and deseasonalize. :type y: pd.Series :returns: The first element is the input features, returned without modification. The second element is the target variable y with the fitted trend removed. :rtype: tuple of pd.DataFrame, pd.Series .. py:method:: get_trend_dataframe(self, X: pandas.DataFrame, y: pandas.Series) -> list[pandas.DataFrame] Return a list of dataframes with 4 columns: signal, trend, seasonality, residual. Scikit-learn's PolynomialForecaster is used to generate the trend portion of the target data. statsmodels' seasonal_decompose is used to generate the seasonality of the data. :param X: Input data with time series data in index. :type X: pd.DataFrame :param y: Target variable data provided as a Series for univariate problems or a DataFrame for multivariate problems. :type y: pd.Series or pd.DataFrame :returns: Each DataFrame contains the columns "signal", "trend", "seasonality" and "residual," with the latter 3 column values being the decomposed elements of the target data. The "signal" column is simply the input target signal but reindexed with a datetime index to match the input features. :rtype: list of pd.DataFrame :raises TypeError: If X does not have time-series data in the index. :raises ValueError: If time series index of X does not have an inferred frequency. :raises ValueError: If the forecaster associated with the detrender has not been fit yet. :raises TypeError: If y is not provided as a pandas Series or DataFrame. ..
py:method:: inverse_transform(self, y_t: pandas.Series) -> tuple[pandas.DataFrame, pandas.Series] Adds back fitted trend and seasonality to target variable. The polynomial trend is added back into the signal, calling the detrender's inverse_transform(). Then, the seasonality is projected forward and added back into the signal. :param y_t: Target variable. :type y_t: pd.Series :returns: The first element is the input features, returned without modification. The second element is the target variable y with the trend and seasonality added back in. :rtype: tuple of pd.DataFrame, pd.Series :raises ValueError: If y is None. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: plot_decomposition(self, X: pandas.DataFrame, y: pandas.Series, show: bool = False) -> tuple[matplotlib.pyplot.Figure, list] Plots the decomposition of the target signal. :param X: Input data with time series data in index. :type X: pd.DataFrame :param y: Target variable data provided as a Series for univariate problems or a DataFrame for multivariate problems. :type y: pd.Series or pd.DataFrame :param show: Whether to display the plot or not. Defaults to False. :type show: bool :returns: The figure and axes that have the decompositions plotted on them :rtype: matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes] .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: set_seasonal_period(self, X: pandas.DataFrame, y: pandas.Series) Function to set the component's seasonal period based on the target's seasonality.
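One way to infer a seasonal period from a target series is to look for the first lag at which its autocorrelation peaks. The sketch below illustrates that idea in pure Python. It is a simplification under stated assumptions, not the logic EvalML uses, and the names ``autocorrelation``, ``first_period``, ``min_lag``, and ``threshold`` are hypothetical.

```python
def autocorrelation(y, lag):
    """Sample autocorrelation of y at the given lag (0.0 for a constant series)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y)
    if var == 0:
        return 0.0
    cov = sum((y[t] - mean) * (y[t + lag] - mean) for t in range(n - lag))
    return cov / var


def first_period(y, min_lag=2, threshold=0.5):
    """Return the first lag that is a local autocorrelation peak above threshold.

    Only lags up to half the series length are considered. Returns None when
    no lag qualifies, i.e. no period is detected.
    """
    max_lag = len(y) // 2
    acf = [autocorrelation(y, lag) for lag in range(max_lag + 1)]
    for lag in range(min_lag, max_lag):
        is_peak = acf[lag] >= acf[lag - 1] and acf[lag] >= acf[lag + 1]
        if acf[lag] > threshold and is_peak:
            return lag
    return None


# Usage: six repetitions of a 7-entry pattern, e.g. daily data with a weekly cycle.
weekly = [0.0, 2.0, 5.0, 3.0, 0.0, -4.0, -6.0] * 6
period = first_period(weekly)
```

Returning only the first qualifying lag matches the behavior described for determine_periodicity, which reports the shortest period when several seasonal signals are present.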
:param X: The feature data of the time series problem. :type X: pandas.DataFrame :param y: The target data of a time series problem. :type y: pandas.Series .. py:method:: transform(self, X: pandas.DataFrame, y: pandas.Series = None) -> tuple[pandas.DataFrame, pandas.Series] Transforms the target data by removing the polynomial trend and rolling average seasonality. Applies the fit polynomial detrender to the target data, removing the additive polynomial trend. Then, utilizes the first period's worth of seasonal data determined in the .fit() function to extrapolate the seasonal signal of the data to be transformed. This seasonal signal is also assumed to be additive and is removed. :param X: Conditionally used to build datetime index. :type X: pd.DataFrame, optional :param y: Target variable to detrend and deseasonalize. :type y: pd.Series :returns: The input features are returned without modification. The target variable y is detrended and deseasonalized. :rtype: tuple of pd.DataFrame, pd.Series :raises ValueError: If target data doesn't have DatetimeIndex AND no Datetime features in features data .. py:class:: ReplaceNullableTypes(random_seed=0, **kwargs) Transformer to replace features with the new nullable dtypes with a dtype that is compatible in EvalML. **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - None * - **modifies_features** - True * - **modifies_target** - {} * - **name** - Replace Nullable Types Transformer * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.ReplaceNullableTypes.clone evalml.pipelines.components.transformers.ReplaceNullableTypes.default_parameters evalml.pipelines.components.transformers.ReplaceNullableTypes.describe evalml.pipelines.components.transformers.ReplaceNullableTypes.fit evalml.pipelines.components.transformers.ReplaceNullableTypes.fit_transform evalml.pipelines.components.transformers.ReplaceNullableTypes.load evalml.pipelines.components.transformers.ReplaceNullableTypes.needs_fitting evalml.pipelines.components.transformers.ReplaceNullableTypes.parameters evalml.pipelines.components.transformers.ReplaceNullableTypes.save evalml.pipelines.components.transformers.ReplaceNullableTypes.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Substitutes non-nullable types for the new pandas nullable types in the data and target data. 
:param X: Input features. :type X: pd.DataFrame, optional :param y: Target data. :type y: pd.Series :returns: The input features and target data with the non-nullable types set. :rtype: tuple of pd.DataFrame, pd.Series .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data by replacing columns that contain nullable types with the appropriate replacement type: "float64" for nullable integers and "category" for nullable booleans. :param X: Data to transform :type X: pd.DataFrame :param y: Target data to transform :type y: pd.Series, optional :returns: Transformed X and transformed y. :rtype: tuple of pd.DataFrame, pd.Series .. py:class:: RFClassifierSelectFromModel(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold='median', n_jobs=-1, random_seed=0, **kwargs) Selects top features based on importance weights using a Random Forest classifier. :param number_features: The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None. :type number_features: int :param n_estimators: The number of trees in the forest. Defaults to 10.
:type n_estimators: int :param max_depth: Maximum tree depth for base learners. Defaults to None. :type max_depth: int :param percent_features: Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5. :type percent_features: float :param threshold: The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept while the others are discarded. If "median", then the threshold value is the median of the feature importances. A scaling factor (e.g., "1.25*mean") may also be used. Defaults to "median". :type threshold: string or float :param n_jobs: Number of jobs to run in parallel. -1 uses all processes. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "percent_features": Real(0.01, 1), "threshold": ["mean", "median"],} * - **modifies_features** - True * - **modifies_target** - False * - **name** - RF Classifier Select From Model * - **training_only** - False **Methods** ..
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.RFClassifierSelectFromModel.clone evalml.pipelines.components.transformers.RFClassifierSelectFromModel.default_parameters evalml.pipelines.components.transformers.RFClassifierSelectFromModel.describe evalml.pipelines.components.transformers.RFClassifierSelectFromModel.fit evalml.pipelines.components.transformers.RFClassifierSelectFromModel.fit_transform evalml.pipelines.components.transformers.RFClassifierSelectFromModel.get_names evalml.pipelines.components.transformers.RFClassifierSelectFromModel.load evalml.pipelines.components.transformers.RFClassifierSelectFromModel.needs_fitting evalml.pipelines.components.transformers.RFClassifierSelectFromModel.parameters evalml.pipelines.components.transformers.RFClassifierSelectFromModel.save evalml.pipelines.components.transformers.RFClassifierSelectFromModel.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. 
:param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises MethodPropertyNotFoundError: If component does not have a fit method or a component_obj that implements fit. .. py:method:: fit_transform(self, X, y=None) Fit and transform data using the feature selector. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:method:: get_names(self) Get names of selected features. :returns: List of the names of features selected. :rtype: list[str] .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised. :param X: Data to transform. :type X: pd.DataFrame :param y: Target data. Ignored.
:type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If feature selector does not have a transform method or a component_obj that implements transform .. py:class:: RFRegressorSelectFromModel(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold='median', n_jobs=-1, random_seed=0, **kwargs) Selects top features based on importance weights using a Random Forest regressor. :param number_features: The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None. :type number_features: int :param n_estimators: The number of trees in the forest. Defaults to 10. :type n_estimators: int :param max_depth: Maximum tree depth for base learners. Defaults to None. :type max_depth: int :param percent_features: Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5. :type percent_features: float :param threshold: The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept while the others are discarded. If "median", then the threshold value is the median of the feature importances. A scaling factor (e.g., "1.25*mean") may also be used. Defaults to "median". :type threshold: string or float :param n_jobs: Number of jobs to run in parallel. -1 uses all processes. Defaults to -1. :type n_jobs: int or None :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "percent_features": Real(0.01, 1), "threshold": ["mean", "median"],} * - **modifies_features** - True * - **modifies_target** - False * - **name** - RF Regressor Select From Model * - **training_only** - False **Methods** ..
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.RFRegressorSelectFromModel.clone evalml.pipelines.components.transformers.RFRegressorSelectFromModel.default_parameters evalml.pipelines.components.transformers.RFRegressorSelectFromModel.describe evalml.pipelines.components.transformers.RFRegressorSelectFromModel.fit evalml.pipelines.components.transformers.RFRegressorSelectFromModel.fit_transform evalml.pipelines.components.transformers.RFRegressorSelectFromModel.get_names evalml.pipelines.components.transformers.RFRegressorSelectFromModel.load evalml.pipelines.components.transformers.RFRegressorSelectFromModel.needs_fitting evalml.pipelines.components.transformers.RFRegressorSelectFromModel.parameters evalml.pipelines.components.transformers.RFRegressorSelectFromModel.save evalml.pipelines.components.transformers.RFRegressorSelectFromModel.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. 
:param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises MethodPropertyNotFoundError: If component does not have a fit method or a component_obj that implements fit. .. py:method:: fit_transform(self, X, y=None) Fit and transform data using the feature selector. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:method:: get_names(self) Get names of selected features. :returns: List of the names of features selected. :rtype: list[str] .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised. :param X: Data to transform. :type X: pd.DataFrame :param y: Target data. Ignored.
:type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If feature selector does not have a transform method or a component_obj that implements transform .. py:class:: SelectByType(column_types=None, exclude=False, random_seed=0, **kwargs) Selects columns by specified Woodwork logical type or semantic tag in input data. :param column_types: List of Woodwork types or tags, used to determine which columns to select or exclude. :type column_types: string, ww.LogicalType, list(string), list(ww.LogicalType) :param exclude: If true, exclude the column_types instead of including them. Defaults to False. :type exclude: bool :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Select Columns By Type Transformer * - **needs_fitting** - False * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.SelectByType.clone evalml.pipelines.components.transformers.SelectByType.default_parameters evalml.pipelines.components.transformers.SelectByType.describe evalml.pipelines.components.transformers.SelectByType.fit evalml.pipelines.components.transformers.SelectByType.fit_transform evalml.pipelines.components.transformers.SelectByType.load evalml.pipelines.components.transformers.SelectByType.parameters evalml.pipelines.components.transformers.SelectByType.save evalml.pipelines.components.transformers.SelectByType.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. 
Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the transformer by checking if column names are present in the dataset. :param X: Data to check. :type X: pd.DataFrame :param y: Targets. :type y: pd.Series, ignored :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data X by selecting columns. :param X: Data to transform. :type X: pd.DataFrame :param y: Targets. :type y: pd.Series, optional :returns: Transformed X. :rtype: pd.DataFrame .. 
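The include/exclude behaviour described for SelectByType can be pictured with a plain-Python sketch. The mapping from column names to type labels and the helper ``select_by_type`` are hypothetical stand-ins for a Woodwork schema; they are not part of EvalML and only mirror the documented meaning of ``column_types`` and ``exclude``.

```python
def select_by_type(column_types_by_name, column_types, exclude=False):
    """Hypothetical helper: return the column names kept by a type filter."""
    wanted = set(column_types)
    kept = []
    for name, col_type in column_types_by_name.items():
        matches = col_type in wanted
        # exclude=True inverts the filter: keep columns that do NOT match.
        if matches != exclude:
            kept.append(name)
    return kept


# Assumed toy schema: column name -> type label.
schema = {"age": "Integer", "city": "Categorical", "income": "Double"}
included = select_by_type(schema, ["Categorical"])
excluded = select_by_type(schema, ["Categorical"], exclude=True)
```

With ``exclude=False`` only the matching columns survive; with ``exclude=True`` the same filter keeps everything else.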
py:class:: SelectColumns(columns=None, random_seed=0, **kwargs) Selects specified columns in input data. :param columns: List of column names, used to determine which columns to select. If columns are not present, they will not be selected. :type columns: list(string) :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Select Columns Transformer * - **needs_fitting** - False * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.SelectColumns.clone evalml.pipelines.components.transformers.SelectColumns.default_parameters evalml.pipelines.components.transformers.SelectColumns.describe evalml.pipelines.components.transformers.SelectColumns.fit evalml.pipelines.components.transformers.SelectColumns.fit_transform evalml.pipelines.components.transformers.SelectColumns.load evalml.pipelines.components.transformers.SelectColumns.parameters evalml.pipelines.components.transformers.SelectColumns.save evalml.pipelines.components.transformers.SelectColumns.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the transformer by checking if column names are present in the dataset. :param X: Data to check. :type X: pd.DataFrame :param y: Targets. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transform data using fitted column selector component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:class:: SimpleImputer(impute_strategy='most_frequent', fill_value=None, random_seed=0, **kwargs) Imputes missing data according to a specified imputation strategy. Natural language columns are ignored. 
:param impute_strategy: Impute strategy to use. Valid values include "mean", "median", "most_frequent", "constant" for numerical data, and "most_frequent", "constant" for object data types. :type impute_strategy: string :param fill_value: When impute_strategy == "constant", fill_value is used to replace missing data. Defaults to 0 when imputing numerical data and "missing_value" for strings or object data types. :type fill_value: string :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "impute_strategy": ["mean", "median", "most_frequent"]} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Simple Imputer * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.SimpleImputer.clone evalml.pipelines.components.transformers.SimpleImputer.default_parameters evalml.pipelines.components.transformers.SimpleImputer.describe evalml.pipelines.components.transformers.SimpleImputer.fit evalml.pipelines.components.transformers.SimpleImputer.fit_transform evalml.pipelines.components.transformers.SimpleImputer.load evalml.pipelines.components.transformers.SimpleImputer.needs_fitting evalml.pipelines.components.transformers.SimpleImputer.parameters evalml.pipelines.components.transformers.SimpleImputer.save evalml.pipelines.components.transformers.SimpleImputer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. 
py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits imputer to data. 'None' values are converted to np.nan before imputation and are treated as the same. :param X: the input training data of shape [n_samples, n_features] :type X: pd.DataFrame or np.ndarray :param y: the target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises ValueError: if the SimpleImputer receives a dataframe with both Boolean and Categorical data. .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform :type X: pd.DataFrame :param y: Target data. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. 
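As a rough illustration of the imputation strategies listed for SimpleImputer, the sketch below fills missing entries in a single column using only the standard library. The helper names ``is_missing`` and ``impute_column`` are hypothetical and not part of EvalML; the sketch only mirrors the documented semantics ("mean", "median", "most_frequent", "constant", with None and NaN treated alike) and says nothing about how the component is implemented.

```python
from collections import Counter
from statistics import mean, median


def is_missing(value):
    """Treat None and float NaN as the same kind of missing value."""
    return value is None or (isinstance(value, float) and value != value)


def impute_column(values, strategy="most_frequent", fill_value=None):
    """Hypothetical helper: fill missing entries in one column."""
    observed = [v for v in values if not is_missing(v)]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "most_frequent":
        fill = Counter(observed).most_common(1)[0][0]
    elif strategy == "constant":
        fill = fill_value
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return [fill if is_missing(v) else v for v in values]


numeric = [1.0, None, 3.0, 1.0, float("nan")]
filled_mode = impute_column(numeric, "most_frequent")
filled_median = impute_column(numeric, "median")
filled_const = impute_column(["a", None], "constant", fill_value="missing_value")
```

The "mean" and "median" branches only make sense for numerical columns, which matches the restriction stated in the parameter description.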
py:method:: transform(self, X, y=None) Transforms input by imputing missing values. 'None' and np.nan values are treated as the same. :param X: Data to transform. :type X: pd.DataFrame :param y: Ignored. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame .. py:class:: StandardScaler(random_seed=0, **kwargs) A transformer that standardizes input features by removing the mean and scaling to unit variance. :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Standard Scaler * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.StandardScaler.clone evalml.pipelines.components.transformers.StandardScaler.default_parameters evalml.pipelines.components.transformers.StandardScaler.describe evalml.pipelines.components.transformers.StandardScaler.fit evalml.pipelines.components.transformers.StandardScaler.fit_transform evalml.pipelines.components.transformers.StandardScaler.load evalml.pipelines.components.transformers.StandardScaler.needs_fitting evalml.pipelines.components.transformers.StandardScaler.parameters evalml.pipelines.components.transformers.StandardScaler.save evalml.pipelines.components.transformers.StandardScaler.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. 
:param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the standard scaler on the given data. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fit and transform data using the standard scaler component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transform data using the fitted standard scaler. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples].
:type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:class:: STLDecomposer(time_index: str = None, degree: int = 1, seasonal_period: int = 7, random_seed: int = 0, **kwargs) Removes trends and seasonality from time series using the STL algorithm. https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.STL.html :param time_index: Specifies the name of the column in X that provides the datetime objects. Defaults to None. :type time_index: str :param degree: Not currently used. STL has three "degree-like" values, none of which can be set at this time. Defaults to 1. :type degree: int :param seasonal_period: The number of entries in the time series data that corresponds to one period of a cyclic signal. For instance, if data is known to possess a weekly seasonal signal, and if the data is daily data, seasonal_period should be 7. For daily data with a yearly seasonal signal, seasonal_period should be 365. For compatibility with the underlying STL algorithm, the value must be odd. If an even number is provided, the next-highest odd number will be used. Defaults to 7. :type seasonal_period: int :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - None * - **modifies_features** - False * - **modifies_target** - True * - **name** - STL Decomposer * - **needs_fitting** - True * - **training_only** - False **Methods** ..
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.STLDecomposer.clone evalml.pipelines.components.transformers.STLDecomposer.default_parameters evalml.pipelines.components.transformers.STLDecomposer.describe evalml.pipelines.components.transformers.STLDecomposer.determine_periodicity evalml.pipelines.components.transformers.STLDecomposer.fit evalml.pipelines.components.transformers.STLDecomposer.fit_transform evalml.pipelines.components.transformers.STLDecomposer.get_trend_dataframe evalml.pipelines.components.transformers.STLDecomposer.inverse_transform evalml.pipelines.components.transformers.STLDecomposer.load evalml.pipelines.components.transformers.STLDecomposer.parameters evalml.pipelines.components.transformers.STLDecomposer.plot_decomposition evalml.pipelines.components.transformers.STLDecomposer.save evalml.pipelines.components.transformers.STLDecomposer.set_seasonal_period evalml.pipelines.components.transformers.STLDecomposer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. 
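The odd-period rule stated for the ``seasonal_period`` parameter of STLDecomposer can be sketched in a few lines. The helper ``effective_seasonal_period`` is a hypothetical name used for illustration only; it restates the documented rule (an even value is replaced by the next-highest odd number) and is not taken from the EvalML source.

```python
def effective_seasonal_period(seasonal_period):
    """Hypothetical helper: bump an even period up to the next odd number."""
    if seasonal_period % 2 == 0:
        return seasonal_period + 1
    return seasonal_period


# Daily data with weekly and yearly seasonality already use odd periods.
weekly_on_daily = effective_seasonal_period(7)
yearly_on_daily = effective_seasonal_period(365)
# An even request, such as a 30-entry cycle, would be adjusted.
thirty_entry_cycle = effective_seasonal_period(30)
```

Under this rule a requested period of 30 behaves like 31, so the seasonal window ends up slightly longer than the value passed in.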
py:method:: determine_periodicity(self, X: pandas.DataFrame, y: pandas.Series, method: str = 'autocorrelation') Function that uses autocorrelative methods to determine the first significant period of the seasonal signal. :param X: The feature data of the time series problem. :type X: pandas.DataFrame :param y: The target data of a time series problem. :type y: pandas.Series :param method: Either "autocorrelation" or "partial-autocorrelation". The method by which to determine the first period of the seasonal part of the target signal. "partial-autocorrelation" should currently not be used. Defaults to "autocorrelation". :type method: str :returns: The integer number of entries in time series data over which the seasonal part of the target data repeats. If the time series data is in days, then this is the number of days that it takes the target's seasonal signal to repeat. Note: the target data can contain multiple seasonal signals. This function will only return the first, and thus shortest, period. E.g. if the target has both weekly and yearly seasonality, the function will only return "7" and not return "365". If no period is detected, returns [None]. :rtype: (list[int]) .. py:method:: fit(self, X: pandas.DataFrame, y: pandas.Series = None) -> STLDecomposer Fits the STLDecomposer and determines the seasonal signal. Instantiates a statsmodels STL decompose object with the component's stored parameters and fits it. Since the statsmodels object does not fit the sklearn API, it is not saved during __init__() in _component_obj and will be re-instantiated each time fit is called. To emulate the sklearn API, when the STL decomposer is fit, the full seasonal component, a single period sample of the seasonal component, the full trend-cycle component and the residual are saved. y(t) = S(t) + T(t) + R(t) :param X: Conditionally used to build datetime index. :type X: pd.DataFrame, optional :param y: Target variable to detrend and deseasonalize. 
:type y: pd.Series :returns: self :raises ValueError: If y is None. :raises ValueError: If the target data doesn't have a DatetimeIndex and the features data has no Datetime features. .. py:method:: fit_transform(self, X: pandas.DataFrame, y: pandas.Series = None) -> tuple[pandas.DataFrame, pandas.Series] Removes fitted trend and seasonality from target variable. :param X: Ignored. :type X: pd.DataFrame, optional :param y: Target variable to detrend and deseasonalize. :type y: pd.Series :returns: The first element is the input features returned without modification. The second element is the target variable y with the fitted trend and seasonality removed. :rtype: tuple of pd.DataFrame, pd.Series .. py:method:: get_trend_dataframe(self, X, y) Return a list of dataframes with 4 columns: signal, trend, seasonality, residual. :param X: Input data with time series data in index. :type X: pd.DataFrame :param y: Target variable data provided as a Series for univariate problems or a DataFrame for multivariate problems. :type y: pd.Series or pd.DataFrame :returns: Each DataFrame contains the columns "signal", "trend", "seasonality" and "residual", with the latter three columns holding the decomposed elements of the target data. The "signal" column is simply the input target signal but reindexed with a datetime index to match the input features. :rtype: list of pd.DataFrame :raises TypeError: If X does not have time-series data in the index. :raises ValueError: If time series index of X does not have an inferred frequency. :raises ValueError: If the forecaster associated with the detrender has not been fit yet. :raises TypeError: If y is not provided as a pandas Series or DataFrame. .. py:method:: inverse_transform(self, y_t: pandas.Series) -> tuple[pandas.DataFrame, pandas.Series] Adds back fitted trend and seasonality to target variable. The STL trend is projected to cover the entire requested target range, then added back into the signal. 
Then, the seasonality is projected forward and added back into the signal. :param y_t: Target variable. :type y_t: pd.Series :returns: The first element is the input features returned without modification. The second element is the target variable y with the trend and seasonality added back in. :rtype: tuple of pd.DataFrame, pd.Series :raises ValueError: If y is None. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: plot_decomposition(self, X: pandas.DataFrame, y: pandas.Series, show: bool = False) -> tuple[matplotlib.pyplot.Figure, list] Plots the decomposition of the target signal. :param X: Input data with time series data in index. :type X: pd.DataFrame :param y: Target variable data provided as a Series for univariate problems or a DataFrame for multivariate problems. :type y: pd.Series or pd.DataFrame :param show: Whether to display the plot or not. Defaults to False. :type show: bool :returns: The figure and axes that have the decompositions plotted on them :rtype: matplotlib.pyplot.Figure, list[matplotlib.pyplot.Axes] .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: set_seasonal_period(self, X: pandas.DataFrame, y: pandas.Series) Function to set the component's seasonal period based on the target's seasonality. :param X: The feature data of the time series problem. :type X: pandas.DataFrame :param y: The target data of a time series problem. :type y: pandas.Series .. 
py:method:: transform(self, X: pandas.DataFrame, y: pandas.Series = None) -> tuple[pandas.DataFrame, pandas.Series] Transforms the target data by removing the STL trend and seasonality. Uses an ARIMA model to project forward the additive trend and removes it. Then, utilizes the first period's worth of seasonal data determined in the .fit() function to extrapolate the seasonal signal of the data to be transformed. This seasonal signal is also assumed to be additive and is removed. :param X: Conditionally used to build datetime index. :type X: pd.DataFrame, optional :param y: Target variable to detrend and deseasonalize. :type y: pd.Series :returns: The input features are returned without modification. The target variable y is detrended and deseasonalized. :rtype: tuple of pd.DataFrame, pd.Series :raises ValueError: If the target data doesn't have a DatetimeIndex and the features data has no Datetime features. .. py:class:: TargetEncoder(cols=None, smoothing=1, handle_unknown='value', handle_missing='value', random_seed=0, **kwargs) A transformer that encodes categorical features into target encodings. :param cols: Columns to encode. If None, all string columns will be encoded, otherwise only the columns provided will be encoded. Defaults to None. :type cols: list :param smoothing: The smoothing factor to apply. The larger this value is, the more influence the expected target value has on the resulting target encodings. Must be strictly larger than 0. Defaults to 1.0. :type smoothing: float :param handle_unknown: Determines how to handle unknown categories for a feature encountered. Options are 'value', 'error', and 'return_nan'. Defaults to 'value', which replaces with the target mean. :type handle_unknown: string :param handle_missing: Determines how to handle missing values encountered during `fit` or `transform`. Options are 'value', 'error', and 'return_nan'. 
Defaults to 'value', which replaces with the target mean :type handle_missing: string :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Target Encoder * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.TargetEncoder.clone evalml.pipelines.components.transformers.TargetEncoder.default_parameters evalml.pipelines.components.transformers.TargetEncoder.describe evalml.pipelines.components.transformers.TargetEncoder.fit evalml.pipelines.components.transformers.TargetEncoder.fit_transform evalml.pipelines.components.transformers.TargetEncoder.get_feature_names evalml.pipelines.components.transformers.TargetEncoder.load evalml.pipelines.components.transformers.TargetEncoder.needs_fitting evalml.pipelines.components.transformers.TargetEncoder.parameters evalml.pipelines.components.transformers.TargetEncoder.save evalml.pipelines.components.transformers.TargetEncoder.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. 
:rtype: None or dict .. py:method:: fit(self, X, y) Fits the target encoder. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y) Fit and transform data using the target encoder. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. py:method:: get_feature_names(self) Return feature names for the input features after fitting. :returns: The feature names after encoding. :rtype: np.array .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transform data using the fitted target encoder. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: pd.DataFrame .. 
py:class:: TargetImputer(impute_strategy='most_frequent', fill_value=None, random_seed=0, **kwargs) Imputes missing target data according to a specified imputation strategy. :param impute_strategy: Impute strategy to use. Valid values include "mean", "median", "most_frequent", "constant" for numerical data, and "most_frequent", "constant" for object data types. Defaults to "most_frequent". :type impute_strategy: string :param fill_value: When impute_strategy == "constant", fill_value is used to replace missing data. Defaults to None which uses 0 when imputing numerical data and "missing_value" for strings or object data types. :type fill_value: string :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "impute_strategy": ["mean", "median", "most_frequent"]} * - **modifies_features** - False * - **modifies_target** - True * - **name** - Target Imputer * - **training_only** - False **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.TargetImputer.clone evalml.pipelines.components.transformers.TargetImputer.default_parameters evalml.pipelines.components.transformers.TargetImputer.describe evalml.pipelines.components.transformers.TargetImputer.fit evalml.pipelines.components.transformers.TargetImputer.fit_transform evalml.pipelines.components.transformers.TargetImputer.load evalml.pipelines.components.transformers.TargetImputer.needs_fitting evalml.pipelines.components.transformers.TargetImputer.parameters evalml.pipelines.components.transformers.TargetImputer.save evalml.pipelines.components.transformers.TargetImputer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. 
py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y) Fits imputer to target data. 'None' values are converted to np.nan before imputation and are treated as the same. :param X: The input training data of shape [n_samples, n_features]. Ignored. :type X: pd.DataFrame or np.ndarray :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: self :raises TypeError: If target is filled with all null values. .. py:method:: fit_transform(self, X, y) Fits on and transforms the input target data. :param X: Features. Ignored. :type X: pd.DataFrame :param y: Target data to impute. :type y: pd.Series :returns: The original X, transformed y :rtype: (pd.DataFrame, pd.Series) .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. 
py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y) Transforms input target data by imputing missing values. 'None' and np.nan values are treated as the same. :param X: Features. Ignored. :type X: pd.DataFrame :param y: Target data to impute. :type y: pd.Series :returns: The original X, transformed y :rtype: (pd.DataFrame, pd.Series) .. py:class:: TimeSeriesFeaturizer(time_index=None, max_delay=2, gap=0, forecast_horizon=1, conf_level=0.05, rolling_window_size=0.25, delay_features=True, delay_target=True, random_seed=0, **kwargs) Transformer that delays input features and target variable for time series problems. This component uses an algorithm based on the autocorrelation values of the target variable to determine which lags to select from the set of all possible lags. The algorithm is based on the idea that the local maxima of the autocorrelation function indicate the lags that have the most impact on the present time. The algorithm computes the autocorrelation values and finds the local maxima, called "peaks", that are significant at the given conf_level. Since lags in the range [0, 10] tend to be predictive but not local maxima, the union of the peaks is taken with the significant lags in the range [0, 10]. At the end, only selected lags in the range [0, max_delay] are used. Parametrizing the algorithm by conf_level lets the AutoMLAlgorithm tune the set of lags chosen so that the chance of finding a good set of lags is higher. Using a conf_level value of 1 selects all possible lags. :param time_index: Name of the column containing the datetime information used to order the data. Ignored. :type time_index: str :param max_delay: Maximum number of time units to delay each feature. Defaults to 2. 
:type max_delay: int :param forecast_horizon: The number of time periods the pipeline is expected to forecast. :type forecast_horizon: int :param conf_level: Float in range (0, 1] that determines the confidence interval size used to select which lags to compute from the set of [1, max_delay]. A delay of 1 will always be computed. If 1, selects all possible lags in the set of [1, max_delay], inclusive. :type conf_level: float :param rolling_window_size: Float in range (0, 1] that determines the size of the window used for rolling features. Size is computed as rolling_window_size * max_delay. :type rolling_window_size: float :param delay_features: Whether to delay the input features. Defaults to True. :type delay_features: bool :param delay_target: Whether to delay the target. Defaults to True. :type delay_target: bool :param gap: The number of time units between when the features are collected and when the target is collected. For example, if you are predicting the next time step's target, gap=1. This is only needed because when gap=0, we need to be sure to start the lagging of the target variable at 1. Defaults to 0. :type gap: int :param random_seed: Seed for the random number generator. This transformer performs the same regardless of the random seed provided. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {"conf_level": Real(0.001, 1.0), "rolling_window_size": Real(0.001, 1.0)} * - **modifies_features** - True * - **modifies_target** - False * - **name** - Time Series Featurizer * - **needs_fitting** - True * - **target_colname_prefix** - target_delay_{} * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.TimeSeriesFeaturizer.clone evalml.pipelines.components.transformers.TimeSeriesFeaturizer.default_parameters evalml.pipelines.components.transformers.TimeSeriesFeaturizer.describe evalml.pipelines.components.transformers.TimeSeriesFeaturizer.fit evalml.pipelines.components.transformers.TimeSeriesFeaturizer.fit_transform evalml.pipelines.components.transformers.TimeSeriesFeaturizer.load evalml.pipelines.components.transformers.TimeSeriesFeaturizer.parameters evalml.pipelines.components.transformers.TimeSeriesFeaturizer.save evalml.pipelines.components.transformers.TimeSeriesFeaturizer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the TimeSeriesFeaturizer. :param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame or np.ndarray :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises ValueError: if self.time_index is None .. py:method:: fit_transform(self, X, y=None) Fit the component and transform the input data. :param X: Data to transform. 
:type X: pd.DataFrame :param y: Target. :type y: pd.Series, or None :returns: Transformed X. :rtype: pd.DataFrame .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Computes the delayed values and rolling means for X and y. The chosen delays are determined by the autocorrelation function of the target variable. See the class docstring for more information on how they are chosen. If y is None, all possible lags are chosen. If y is not None, it will also compute the delayed values for the target variable. The rolling means for all numeric features in X and y, if y is numeric, are also returned. :param X: Data to transform. None is expected when only the target variable is being used. :type X: pd.DataFrame or None :param y: Target. :type y: pd.Series, or None :returns: Transformed X. No original features are returned. :rtype: pd.DataFrame .. py:class:: TimeSeriesImputer(categorical_impute_strategy='forwards_fill', numeric_impute_strategy='interpolate', target_impute_strategy='forwards_fill', random_seed=0, **kwargs) Imputes missing data according to a specified timeseries-specific imputation strategy. This Transformer should be used after the `TimeSeriesRegularizer` in order to impute the missing values that were added to X and y (if passed). :param categorical_impute_strategy: Impute strategy to use for string, object, boolean, categorical dtypes. Valid values include "backwards_fill" and "forwards_fill". Defaults to "forwards_fill". 
:type categorical_impute_strategy: string :param numeric_impute_strategy: Impute strategy to use for numeric columns. Valid values include "backwards_fill", "forwards_fill", and "interpolate". Defaults to "interpolate". :type numeric_impute_strategy: string :param target_impute_strategy: Impute strategy to use for the target column. Valid values include "backwards_fill", "forwards_fill", and "interpolate". Defaults to "forwards_fill". :type target_impute_strategy: string :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :raises ValueError: If categorical_impute_strategy, numeric_impute_strategy, or target_impute_strategy is not one of the valid values. **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - { "categorical_impute_strategy": ["backwards_fill", "forwards_fill"], "numeric_impute_strategy": ["backwards_fill", "forwards_fill", "interpolate"], "target_impute_strategy": ["backwards_fill", "forwards_fill", "interpolate"],} * - **modifies_features** - True * - **modifies_target** - True * - **name** - Time Series Imputer * - **training_only** - True **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.TimeSeriesImputer.clone evalml.pipelines.components.transformers.TimeSeriesImputer.default_parameters evalml.pipelines.components.transformers.TimeSeriesImputer.describe evalml.pipelines.components.transformers.TimeSeriesImputer.fit evalml.pipelines.components.transformers.TimeSeriesImputer.fit_transform evalml.pipelines.components.transformers.TimeSeriesImputer.load evalml.pipelines.components.transformers.TimeSeriesImputer.needs_fitting evalml.pipelines.components.transformers.TimeSeriesImputer.parameters evalml.pipelines.components.transformers.TimeSeriesImputer.save evalml.pipelines.components.transformers.TimeSeriesImputer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. 
:returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits imputer to data. 'None' values are converted to np.nan before imputation and are treated as the same. If a value is missing at the beginning or end of a column, that value will be imputed using backwards fill or forwards fill as necessary, respectively. :param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame, np.ndarray :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. 
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data X by imputing missing values using specified timeseries-specific strategies. 'None' values are converted to np.nan before imputation and are treated as the same. :param X: Data to transform. :type X: pd.DataFrame :param y: Optionally, target data to transform. :type y: pd.Series, optional :returns: Transformed X and y :rtype: pd.DataFrame .. py:class:: TimeSeriesRegularizer(time_index=None, frequency_payload=None, window_length=4, threshold=0.4, random_seed=0, **kwargs) Transformer that regularizes an inconsistently spaced datetime column. If X is passed in to fit/transform, the column `time_index` will be checked for an inferrable offset frequency. If the `time_index` column is perfectly inferrable then this Transformer will do nothing and return the original X and y. If X does not have a perfectly inferrable frequency but one can be estimated, then X and y will be reformatted based on the estimated frequency for `time_index`. In the original X and y passed: - Missing datetime values will be added and will have their corresponding columns in X and y set to None. - Duplicate datetime values will be dropped. - Extra datetime values will be dropped. - If it can be determined that a duplicate or extra value is misaligned, then it will be repositioned to take the place of a missing value. This Transformer should be used before the `TimeSeriesImputer` in order to impute the missing values that were added to X and y (if passed). 
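The reformatting rules above can be illustrated with a minimal pandas-only sketch. It is not EvalML's implementation: the helper name ``regularize_sketch`` and its ``freq`` argument are hypothetical, the frequency is supplied by the caller rather than estimated, and misaligned values are not repositioned. It only shows how missing datetimes gain empty rows and how duplicate or off-grid datetimes are dropped.

```python
import pandas as pd


def regularize_sketch(X, y, time_index, freq):
    """Toy illustration: align X and y to a fixed-frequency datetime grid.

    Hypothetical helper, not part of EvalML. `freq` is given explicitly
    instead of being inferred from the data.
    """
    # Keep only the first row for each duplicated timestamp.
    keep = ~X[time_index].duplicated(keep="first")
    X_dedup = X.loc[keep].set_index(time_index)
    y_dedup = pd.Series(y.loc[keep].to_numpy(), index=X_dedup.index)

    # Build the regular grid spanning the observed datetime range.
    grid = pd.date_range(X_dedup.index.min(), X_dedup.index.max(), freq=freq)

    # Reindexing adds NaN rows for missing timestamps and drops any
    # timestamp that does not fall on the grid ("extra" values).
    X_clean = X_dedup.reindex(grid).rename_axis(time_index).reset_index()
    y_clean = y_dedup.reindex(grid).reset_index(drop=True)
    return X_clean, y_clean


X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2021-01-01", "2021-01-02", "2021-01-02", "2021-01-04", "2021-01-05"]
        ),
        "feature": [1.0, 2.0, 2.5, 4.0, 5.0],
    }
)
y = pd.Series([10.0, 20.0, 25.0, 40.0, 50.0])

X_clean, y_clean = regularize_sketch(X, y, time_index="date", freq="D")
print(X_clean)
print(y_clean)
```

In this toy input, the duplicate 2021-01-02 row is dropped and a row of missing values is added for 2021-01-03, which a later imputation step would fill.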
:param time_index: Name of the column containing the datetime information used to order the data, required. Defaults to None. :type time_index: string :param frequency_payload: Payload returned from Woodwork's infer_frequency function where debug is True. Defaults to None. :type frequency_payload: tuple :param window_length: The size of the rolling window over which inference is conducted to determine the prevalence of uninferrable frequencies. Lower values make this component more sensitive to recognizing numerous faulty datetime values. Defaults to 4. :type window_length: int :param threshold: The minimum percentage of windows that need to have been able to infer a frequency. Lower values make this component more sensitive to recognizing numerous faulty datetime values. Defaults to 0.4. :type threshold: float :param random_seed: Seed for the random number generator. This transformer performs the same regardless of the random seed provided. Defaults to 0. :type random_seed: int :raises ValueError: if the frequency_payload parameter has not been passed a tuple **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - True * - **name** - Time Series Regularizer * - **training_only** - True **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.TimeSeriesRegularizer.clone evalml.pipelines.components.transformers.TimeSeriesRegularizer.default_parameters evalml.pipelines.components.transformers.TimeSeriesRegularizer.describe evalml.pipelines.components.transformers.TimeSeriesRegularizer.fit evalml.pipelines.components.transformers.TimeSeriesRegularizer.fit_transform evalml.pipelines.components.transformers.TimeSeriesRegularizer.load evalml.pipelines.components.transformers.TimeSeriesRegularizer.needs_fitting evalml.pipelines.components.transformers.TimeSeriesRegularizer.parameters evalml.pipelines.components.transformers.TimeSeriesRegularizer.save evalml.pipelines.components.transformers.TimeSeriesRegularizer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits the TimeSeriesRegularizer. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. 
:type y: pd.Series, optional :returns: self :raises ValueError: if self.time_index is None, if X and y have different lengths, if `time_index` in X does not have an offset frequency that can be estimated :raises TypeError: if the `time_index` column is not of type Datetime :raises KeyError: if the `time_index` column doesn't exist .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Regularizes a dataframe and target data to an inferrable offset frequency. A 'clean' X and y (if y was passed in) are created based on an inferrable offset frequency and matching datetime values with the original X and y are imputed into the clean X and y. Datetime values identified as misaligned are shifted into their appropriate position. :param X: The input training data of shape [n_samples, n_features]. 
:type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Data with an inferrable `time_index` offset frequency. :rtype: (pd.DataFrame, pd.Series) .. py:class:: Transformer(parameters=None, component_obj=None, random_seed=0, **kwargs) A component that may or may not need fitting that transforms data. These components are used before an estimator. To implement a new Transformer, define your own class which is a subclass of Transformer, including a name and a list of acceptable ranges for any parameters to be tuned during the automl search (hyperparameters). Define an `__init__` method which sets up any necessary state and objects. Make sure your `__init__` only uses standard keyword arguments and calls `super().__init__()` with a parameters dict. You may also override the `fit`, `transform`, `fit_transform` and other methods in this class if appropriate. To see some examples, check out the definitions of any Transformer component. :param parameters: Dictionary of parameters for the component. Defaults to None. :type parameters: dict :param component_obj: Third-party objects useful in component implementation. Defaults to None. :type component_obj: obj :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **modifies_features** - True * - **modifies_target** - False * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.Transformer.clone evalml.pipelines.components.transformers.Transformer.default_parameters evalml.pipelines.components.transformers.Transformer.describe evalml.pipelines.components.transformers.Transformer.fit evalml.pipelines.components.transformers.Transformer.fit_transform evalml.pipelines.components.transformers.Transformer.load evalml.pipelines.components.transformers.Transformer.name evalml.pipelines.components.transformers.Transformer.needs_fitting evalml.pipelines.components.transformers.Transformer.parameters evalml.pipelines.components.transformers.Transformer.save evalml.pipelines.components.transformers.Transformer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. :param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises MethodPropertyNotFoundError: If component does not have a fit method or a component_obj that implements fit. .. 
py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. :param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: name(cls) :property: Returns string name of this component. .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) :abstractmethod: Transforms data X. :param X: Data to transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:class:: Undersampler(sampling_ratio=0.25, sampling_ratio_dict=None, min_samples=100, min_percentage=0.1, random_seed=0, **kwargs) Initializes an undersampling transformer to downsample the majority classes in the dataset. This component is only run during training and not during predict. :param sampling_ratio: The smallest minority:majority ratio that is accepted as 'balanced'. 
For instance, a 1:4 ratio would be represented as 0.25, while a 1:1 ratio is 1.0. Must be in the range (0, 1]. Defaults to 0.25. :type sampling_ratio: float :param sampling_ratio_dict: A dictionary specifying the desired balanced ratio for each target value. For instance, in a binary case where class 1 is the minority, we could specify: `sampling_ratio_dict={0: 0.5, 1: 1}`, which means we would undersample class 0 to have twice the number of samples as class 1 (minority:majority ratio = 0.5), and don't sample class 1. Overrides sampling_ratio if provided. Defaults to None. :type sampling_ratio_dict: dict :param min_samples: The minimum number of samples that we must have for any class, pre or post sampling. If a class must be downsampled, it will not be downsampled past this value. To determine severe imbalance, the minority class must occur less often than this and must have a class ratio below min_percentage. Must be greater than 0. Defaults to 100. :type min_samples: int :param min_percentage: The minimum percentage of the minimum class to total dataset that we tolerate, as long as it is above min_samples. If min_percentage and min_samples are not met, treat this as severely imbalanced, and we will not resample the data. Must be between 0 and 0.5, inclusive. Defaults to 0.1. :type min_percentage: float :param random_seed: The seed to use for random sampling. Defaults to 0. :type random_seed: int :raises ValueError: If sampling_ratio is not in the range (0, 1]. :raises ValueError: If min_samples is not greater than 0. :raises ValueError: If min_percentage is not between 0 and 0.5, inclusive. **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - True * - **name** - Undersampler * - **training_only** - True **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.Undersampler.clone evalml.pipelines.components.transformers.Undersampler.default_parameters evalml.pipelines.components.transformers.Undersampler.describe evalml.pipelines.components.transformers.Undersampler.fit evalml.pipelines.components.transformers.Undersampler.fit_resample evalml.pipelines.components.transformers.Undersampler.fit_transform evalml.pipelines.components.transformers.Undersampler.load evalml.pipelines.components.transformers.Undersampler.needs_fitting evalml.pipelines.components.transformers.Undersampler.parameters evalml.pipelines.components.transformers.Undersampler.save evalml.pipelines.components.transformers.Undersampler.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y) Fits the sampler to the data. :param X: Input features. :type X: pd.DataFrame :param y: Target. :type y: pd.Series :returns: self :raises ValueError: If y is None. .. py:method:: fit_resample(self, X, y) Resampling technique for this sampler. :param X: Training data to fit and resample. :type X: pd.DataFrame :param y: Training data targets to fit and resample. 
:type y: pd.Series :returns: Indices to keep for training data. :rtype: list .. py:method:: fit_transform(self, X, y) Fit and transform data using the sampler component. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target training data of length [n_samples]. :type y: pd.Series, optional :returns: Transformed data. :rtype: (pd.DataFrame, pd.Series) .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms the input data by sampling the data. :param X: Training features. :type X: pd.DataFrame :param y: Target. :type y: pd.Series :returns: Transformed features and target. :rtype: (pd.DataFrame, pd.Series) .. py:class:: URLFeaturizer(random_seed=0, **kwargs) Transformer that can automatically extract features from URLs. :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **hyperparameter_ranges** - {} * - **modifies_features** - True * - **modifies_target** - False * - **name** - URL Featurizer * - **training_only** - False **Methods** .. 
autoapisummary:: :nosignatures: evalml.pipelines.components.transformers.URLFeaturizer.clone evalml.pipelines.components.transformers.URLFeaturizer.default_parameters evalml.pipelines.components.transformers.URLFeaturizer.describe evalml.pipelines.components.transformers.URLFeaturizer.fit evalml.pipelines.components.transformers.URLFeaturizer.fit_transform evalml.pipelines.components.transformers.URLFeaturizer.load evalml.pipelines.components.transformers.URLFeaturizer.needs_fitting evalml.pipelines.components.transformers.URLFeaturizer.parameters evalml.pipelines.components.transformers.URLFeaturizer.save evalml.pipelines.components.transformers.URLFeaturizer.transform .. py:method:: clone(self) Constructs a new component with the same parameters and random state. :returns: A new instance of this component with identical parameters and random state. .. py:method:: default_parameters(cls) Returns the default parameters for this component. Our convention is that Component.default_parameters == Component().parameters. :returns: Default parameters for this component. :rtype: dict .. py:method:: describe(self, print_name=False, return_dict=False) Describe a component and its parameters. :param print_name: whether to print name of component :type print_name: bool, optional :param return_dict: whether to return description as dictionary in the format {"name": name, "parameters": parameters} :type return_dict: bool, optional :returns: Returns dictionary if return_dict is True, else None. :rtype: None or dict .. py:method:: fit(self, X, y=None) Fits component to data. :param X: The input training data of shape [n_samples, n_features] :type X: pd.DataFrame :param y: The target training data of length [n_samples] :type y: pd.Series, optional :returns: self :raises MethodPropertyNotFoundError: If component does not have a fit method or a component_obj that implements fit. .. py:method:: fit_transform(self, X, y=None) Fits on X and transforms X. 
:param X: Data to fit and transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series :returns: Transformed X. :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform. .. py:method:: load(file_path) :staticmethod: Loads component at file path. :param file_path: Location to load file. :type file_path: str :returns: ComponentBase object .. py:method:: needs_fitting(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing. :returns: True. .. py:method:: parameters(self) :property: Returns the parameters which were used to initialize the component. .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: transform(self, X, y=None) Transforms data X. :param X: Data to transform. :type X: pd.DataFrame :param y: Target data. :type y: pd.Series, optional :returns: Transformed X :rtype: pd.DataFrame :raises MethodPropertyNotFoundError: If transformer does not have a transform method or a component_obj that implements transform.
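The URLFeaturizer entry above says only that features are extracted from URLs. As a rough illustration of what URL-derived features can look like, the following standard-library sketch parses a URL into a few parts. The helper name ``sketch_url_features`` and the chosen features (protocol, domain, naive top-level domain) are assumptions made for this example. They are not taken from evalml and may differ from what the component actually produces.

```python
from urllib.parse import urlparse


def sketch_url_features(url):
    """Hypothetical helper: split a URL string into a few simple features."""
    parsed = urlparse(url)
    # hostname is None when the string has no network location part.
    host = parsed.hostname or ""
    # Naive top-level domain: the last dot-separated label of the host.
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    return {"protocol": parsed.scheme, "domain": host, "tld": tld}


if __name__ == "__main__":
    print(sketch_url_features("https://www.example.com/path?q=1"))
```

Strings that do not parse as URLs yield empty feature values here rather than raising. That is a design choice of this sketch, not documented behavior of the component.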