gen_utils
================================

.. py:module:: evalml.utils.gen_utils

.. autoapi-nested-parse::

   General utility methods.


Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.utils.gen_utils.classproperty


Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   evalml.utils.gen_utils.are_datasets_separated_by_gap_time_index
   evalml.utils.gen_utils.are_ts_parameters_valid_for_split
   evalml.utils.gen_utils.contains_all_ts_parameters
   evalml.utils.gen_utils.convert_to_seconds
   evalml.utils.gen_utils.deprecate_arg
   evalml.utils.gen_utils.drop_rows_with_nans
   evalml.utils.gen_utils.get_importable_subclasses
   evalml.utils.gen_utils.get_random_seed
   evalml.utils.gen_utils.get_random_state
   evalml.utils.gen_utils.get_time_index
   evalml.utils.gen_utils.import_or_raise
   evalml.utils.gen_utils.is_all_numeric
   evalml.utils.gen_utils.is_categorical_actually_boolean
   evalml.utils.gen_utils.jupyter_check
   evalml.utils.gen_utils.pad_with_nans
   evalml.utils.gen_utils.safe_repr
   evalml.utils.gen_utils.save_plot
   evalml.utils.gen_utils.validate_holdout_datasets


Attributes Summary
~~~~~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.utils.gen_utils.logger
   evalml.utils.gen_utils.SEED_BOUNDS


Contents
~~~~~~~~~~~~~~~~~~~
.. py:function:: are_datasets_separated_by_gap_time_index(train, test, pipeline_params)

   Determine whether the train and test datasets are separated by ``gap`` time units, measured along the ``time_index``.

   This is expected to be true when users predict on unseen data, but not during cross-validation,
   where the target is known.

   :param train: Training data.
   :type train: pd.DataFrame
   :param test: Test data of shape [n_samples, n_features].
   :type test: pd.DataFrame
   :param pipeline_params: Dictionary of time series parameters.
   :type pipeline_params: dict

   :returns: True if the difference in time units is equal to gap + 1.
   :rtype: bool
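
   The separation check can be sketched with the standard library. This sketch assumes the time index
   has a fixed, known spacing passed in as ``step``, whereas the library may infer the spacing from the
   data. ``separated_by_gap`` is a hypothetical helper, not part of evalml.

   .. code-block:: python

       from datetime import datetime, timedelta


       def separated_by_gap(train_times, test_times, gap, step):
           """Return True if the first test timestamp is exactly gap + 1 steps after the last train timestamp.

           Hypothetical sketch: ``step`` is assumed to be the fixed spacing of the time index.
           """
           # Distance between the end of the training data and the start of the test data, in units of `step`.
           units_apart = (min(test_times) - max(train_times)) / step
           return units_apart == gap + 1


       # Hourly index: training ends at 05:00.
       train = [datetime(2021, 1, 1, h) for h in range(6)]
       # With gap=2, the first unseen observation should be at 08:00 (gap + 1 = 3 hours later).
       test = [datetime(2021, 1, 1, h) for h in range(8, 10)]
       print(separated_by_gap(train, test, gap=2, step=timedelta(hours=1)))  # True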


.. py:function:: are_ts_parameters_valid_for_split(gap, max_delay, forecast_horizon, n_obs, n_splits)

   Validates that the time series parameters in problem_configuration are compatible with the split sizes.

   :param gap: gap value.
   :type gap: int
   :param max_delay: max_delay value.
   :type max_delay: int
   :param forecast_horizon: forecast_horizon value.
   :type forecast_horizon: int
   :param n_obs: Number of observations in the dataset.
   :type n_obs: int
   :param n_splits: Number of cross validation splits.
   :type n_splits: int

   :returns: TsParameterValidationResult, a named tuple with four fields:

             - ``is_valid`` (bool): True if the parameters are valid.
             - ``msg`` (str): Error message to display. Empty if ``is_valid`` is True.
             - ``smallest_split_size`` (int): Smallest split size given ``n_obs`` and ``n_splits``.
             - ``max_window_size`` (int): Maximum window size given ``gap``, ``max_delay``, and ``forecast_horizon``.
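
   A minimal sketch of this kind of validation follows. As an illustration rather than the library's
   documented formula, it assumes that each of the ``n_splits`` validation folds holds ``forecast_horizon``
   rows (so the smallest training split is ``n_obs - forecast_horizon * n_splits``) and that the window a
   pipeline needs is ``gap + max_delay + forecast_horizon``. ``check_ts_split`` is a hypothetical name.

   .. code-block:: python

       from collections import namedtuple

       # Mirrors the four documented fields of TsParameterValidationResult.
       TsParameterValidationResult = namedtuple(
           "TsParameterValidationResult",
           ["is_valid", "msg", "smallest_split_size", "max_window_size"],
       )


       def check_ts_split(gap, max_delay, forecast_horizon, n_obs, n_splits):
           # Assumption: each validation fold holds forecast_horizon rows, so the first
           # (smallest) training split is what remains after removing all of them.
           smallest_split_size = n_obs - forecast_horizon * n_splits
           # Assumption: rows the pipeline needs to build lagged features and a forecast.
           max_window_size = gap + max_delay + forecast_horizon
           msg = ""
           if smallest_split_size <= max_window_size:
               msg = (
                   f"Smallest split has {smallest_split_size} observations, which is not more than "
                   f"gap + max_delay + forecast_horizon = {max_window_size}."
               )
           return TsParameterValidationResult(not msg, msg, smallest_split_size, max_window_size)


       # Smallest split 100 - 5 * 3 = 85 is larger than the window 1 + 3 + 5 = 9, so this is valid.
       print(check_ts_split(gap=1, max_delay=3, forecast_horizon=5, n_obs=100, n_splits=3).is_valid)  # True
       # Smallest split 20 - 15 = 5 does not exceed the window of 9, so this is invalid.
       print(check_ts_split(gap=1, max_delay=3, forecast_horizon=5, n_obs=20, n_splits=3).is_valid)  # False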


.. py:class:: classproperty(func)

   Allows a function to be accessed as a class-level property.

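   Example:

   .. code-block:: python

       from evalml.pipelines import PipelineBase
       from evalml.pipelines.components.utils import handle_component_class
       from evalml.utils.gen_utils import classproperty

       class LogisticRegressionBinaryPipeline(PipelineBase):
           component_graph = ['Simple Imputer', 'Logistic Regression Classifier']

           @classproperty
           def summary(cls):
               # Build a "Name + Name + " string from the class-level component graph.
               summary = ""
               for component in cls.component_graph:
                   component = handle_component_class(component)
                   summary += component.name + " + "
               return summary

       # The property resolves both on the class itself and on instances.
       assert LogisticRegressionBinaryPipeline.summary == "Simple Imputer + Logistic Regression Classifier + "
       assert LogisticRegressionBinaryPipeline().summary == "Simple Imputer + Logistic Regression Classifier + "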
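
   One way such a decorator can work is through Python's descriptor protocol: ``__get__`` receives the
   owner class even when the attribute is looked up on the class itself, so the wrapped function can be
   called with the class. The sketch below is an assumption about the mechanism; it is named
   ``classproperty_sketch`` to make clear it is not evalml's source.

   .. code-block:: python

       class classproperty_sketch:
           """Minimal read-only class-level property built on the descriptor protocol."""

           def __init__(self, func):
               self.func = func

           def __get__(self, instance, owner):
               # `owner` is the class whether the lookup is Cls.attr or Cls().attr,
               # so the wrapped function always receives the class.
               return self.func(owner)


       class Pipeline:
           component_graph = ["Simple Imputer", "Logistic Regression Classifier"]

           @classproperty_sketch
           def summary(cls):
               return " + ".join(cls.component_graph)


       print(Pipeline.summary)    # Simple Imputer + Logistic Regression Classifier
       print(Pipeline().summary)  # Simple Imputer + Logistic Regression Classifier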


.. py:function:: contains_all_ts_parameters(problem_configuration)

   Validates that the problem configuration contains all required keys.

   :param problem_configuration: Problem configuration.
   :type problem_configuration: dict

   :returns: A tuple ``(is_valid, msg)``. ``is_valid`` is True if the configuration contains all
             required parameters. If it is False, ``msg`` is a non-empty string with the error message.
   :rtype: (bool, str)
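
   A sketch of this check, assuming the required keys are ``time_index``, ``gap``, ``max_delay``, and
   ``forecast_horizon`` (the parameters the other time series helpers in this module refer to) and that
   ``time_index`` must not be ``None``. The helper name ``has_all_ts_parameters`` and the exact messages
   are hypothetical.

   .. code-block:: python

       REQUIRED_TS_KEYS = ("time_index", "gap", "max_delay", "forecast_horizon")  # assumed key set


       def has_all_ts_parameters(problem_configuration):
           """Return (is_valid, msg); msg is non-empty only when is_valid is False."""
           if not problem_configuration:
               return False, "problem_configuration must be a non-empty dict."
           missing = [key for key in REQUIRED_TS_KEYS if key not in problem_configuration]
           if missing:
               return False, f"problem_configuration is missing: {', '.join(missing)}"
           if problem_configuration["time_index"] is None:
               return False, "time_index cannot be None."
           return True, ""


       ok, msg = has_all_ts_parameters(
           {"time_index": "date", "gap": 1, "max_delay": 3, "forecast_horizon": 5}
       )
       print(ok, repr(msg))  # True ''
       print(has_all_ts_parameters({"time_index": "date", "gap": 1}))
       # (False, 'problem_configuration is missing: max_delay, forecast_horizon')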


.. py:function:: convert_to_seconds(input_str)

   Converts a string describing a length of time to its length in seconds.

   :param input_str: The string to be parsed and converted to seconds.
   :type input_str: str

   :returns: The length of time in seconds.
   :rtype: float

   :raises AssertionError: If an invalid unit is used.

   .. rubric:: Examples

   >>> assert convert_to_seconds("10 hr") == 36000.0
   >>> assert convert_to_seconds("30 minutes") == 1800.0
   >>> assert convert_to_seconds("2.5 min") == 150.0
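
   The parsing can be sketched as splitting the string into a number and a unit and multiplying by a
   per-unit factor. It assumes the number and unit are separated by whitespace, and the unit spellings
   beyond those in the examples above (``hr``, ``minutes``, ``min``) are assumptions. ``to_seconds`` is a
   hypothetical name; like the documented function, it raises ``AssertionError`` for an unknown unit.

   .. code-block:: python

       # Assumed unit spellings; only "hr", "minutes" and "min" appear in the documented examples.
       _SECONDS_PER_UNIT = {
           "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
           "min": 60, "mins": 60, "minute": 60, "minutes": 60,
           "h": 3600, "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
           "d": 86400, "day": 86400, "days": 86400,
       }


       def to_seconds(input_str):
           """Convert a string such as "2.5 min" to a float number of seconds."""
           amount, unit = input_str.split()
           unit = unit.lower()
           assert unit in _SECONDS_PER_UNIT, f"Invalid unit: {unit!r}"
           return float(amount) * _SECONDS_PER_UNIT[unit]


       print(to_seconds("10 hr"))       # 36000.0
       print(to_seconds("30 minutes"))  # 1800.0
       print(to_seconds("2.5 min"))     # 150.0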


.. py:function:: deprecate_arg(old_arg, new_arg, old_value, new_value)

   Helper that raises a warning when a deprecated argument is used.

   :param old_arg: Name of old/deprecated argument.
   :type old_arg: str
   :param new_arg: Name of new argument.
   :type new_arg: str
   :param old_value: Value the user passed in for the old argument.
   :type old_value: Any
   :param new_value: Value the user passed in for the new argument.
   :type new_value: Any

   :returns: old_value if not None, else new_value
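
   A minimal sketch of the documented behavior: warn when the deprecated argument was supplied and
   prefer its value, otherwise fall back to the new argument. The warning text and category (the default
   ``UserWarning``) are assumptions, and ``deprecate_arg_sketch`` and ``search`` are hypothetical names.

   .. code-block:: python

       import warnings


       def deprecate_arg_sketch(old_arg, new_arg, old_value, new_value):
           """Warn if the deprecated argument was used and return the value to use."""
           if old_value is not None:
               # Assumed message wording; warnings.warn defaults to UserWarning.
               warnings.warn(f"Argument '{old_arg}' has been deprecated in favor of '{new_arg}'.")
               return old_value
           return new_value


       def search(data, n_iterations=None, max_iterations=None):
           # Hypothetical caller that renamed `max_iterations` to `n_iterations`.
           return deprecate_arg_sketch("max_iterations", "n_iterations", max_iterations, n_iterations)


       with warnings.catch_warnings(record=True) as caught:
           warnings.simplefilter("always")
           print(search([], max_iterations=10))  # 10
       print(len(caught))  # 1
       print(search([], n_iterations=5))  # 5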


.. py:function:: drop_rows_with_nans(*pd_data)

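   Drop rows that contain a NaN in any of the given DataFrames or Series.

   A row is dropped from every input if it contains a NaN in at least one of them, so the outputs stay aligned.

   :param \*pd_data: Sequence of pd.Series, pd.DataFrame, or None.

   :returns: List of pd.DataFrame, pd.Series, or None, in the same order as the inputs.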
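
   A pandas sketch of the row-dropping logic, assuming the inputs share the same length and row order:
   a boolean mask marks rows with no NaN in any input, and the same mask is applied to every
   non-``None`` input. ``drop_nan_rows`` is a hypothetical name.

   .. code-block:: python

       from functools import reduce

       import numpy as np
       import pandas as pd


       def drop_nan_rows(*pd_data):
           """Drop every row that has a NaN in any of the given Series/DataFrames."""

           def not_nan(data):
               # Row-wise "has no NaN" mask; None contributes no constraint.
               if data is None:
                   return None
               if isinstance(data, pd.DataFrame):
                   return ~data.isna().any(axis=1).to_numpy()
               return ~data.isna().to_numpy()

           masks = [m for m in (not_nan(d) for d in pd_data) if m is not None]
           if not masks:
               return list(pd_data)
           mask = reduce(np.logical_and, masks)
           return [d.iloc[mask] if d is not None else None for d in pd_data]


       X = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "b": [1, 2, 3, 4]})
       y = pd.Series([0, 1, np.nan, 1])
       X_clean, y_clean, nothing = drop_nan_rows(X, y, None)
       print(X_clean.index.tolist(), y_clean.index.tolist(), nothing)  # [0, 3] [0, 3] None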


.. py:function:: get_importable_subclasses(base_class, used_in_automl=True)

   Get importable subclasses of a base class. Used to list all of our estimators, transformers, components and pipelines dynamically.

   :param base_class: Base class to find all of the subclasses for.
   :type base_class: abc.ABCMeta
   :param used_in_automl: Not all components, pipelines, and estimators are used in AutoML search. If True,
                          only include the subclasses that are used in the search, which excludes classes related to
                          ExtraTrees, ElasticNet, and Baseline estimators. Defaults to True.
   :type used_in_automl: bool

   :returns: List of subclasses.
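
   The discovery step can be sketched by walking the ``__subclasses__()`` tree and skipping abstract
   classes. The name-based exclusion standing in for ``used_in_automl=True`` is a hypothetical
   simplification (the text above says which estimators are excluded, not how), and the sketch ignores
   whether optional dependencies are importable. ``all_concrete_subclasses`` and the example classes are
   hypothetical.

   .. code-block:: python

       import inspect
       from abc import ABC, abstractmethod


       def all_concrete_subclasses(base_class, exclude_substrings=()):
           """Return non-abstract descendants of base_class, minus names containing an excluded substring."""
           found, seen = [], set()
           stack = list(base_class.__subclasses__())
           while stack:
               cls = stack.pop()
               if cls in seen:
                   continue
               seen.add(cls)
               stack.extend(cls.__subclasses__())  # descend into grandchildren
               if inspect.isabstract(cls):
                   continue
               if any(s in cls.__name__ for s in exclude_substrings):
                   continue
               found.append(cls)
           return sorted(found, key=lambda c: c.__name__)


       class Estimator(ABC):
           @abstractmethod
           def fit(self):
               ...


       class RandomForest(Estimator):
           def fit(self):
               return self


       class ElasticNet(Estimator):
           def fit(self):
               return self


       class BaselineRegressor(Estimator):
           def fit(self):
               return self


       names = [c.__name__ for c in all_concrete_subclasses(Estimator, ("ElasticNet", "Baseline"))]
       print(names)  # ['RandomForest']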


.. py:function:: get_random_seed(random_state, min_bound=SEED_BOUNDS.min_bound, max_bound=SEED_BOUNDS.max_bound)

   Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator. If given an int, return that int, projected into range if it is out of bounds.

   To protect against invalid input to a particular library's random number generator, if an int value is provided, and it is outside the bounds "[min_bound, max_bound)", the value will be projected into the range between the min_bound (inclusive) and max_bound (exclusive) using modular arithmetic.

   :param random_state: random state
   :type random_state: int, numpy.random.RandomState
   :param min_bound: Minimum bound (inclusive) when generating the seed. Defaults to SEED_BOUNDS.min_bound. Must be less than max_bound.
   :type min_bound: None, int
   :param max_bound: Maximum bound (exclusive) when generating the seed. Defaults to SEED_BOUNDS.max_bound. Must be greater than min_bound.
   :type max_bound: None, int

   :returns: Seed for random number generator
   :rtype: int

   :raises ValueError: If boundaries are not valid.
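
   The modular projection for integer inputs can be sketched as follows. This covers only the int
   branch; a ``numpy.random.RandomState`` input would instead be used to draw a value. The bounds in the
   example are illustrative, not the actual ``SEED_BOUNDS`` values, and ``project_seed`` is a hypothetical
   name.

   .. code-block:: python

       def project_seed(seed, min_bound, max_bound):
           """Map an int into [min_bound, max_bound) using modular arithmetic."""
           if not min_bound < max_bound:
               raise ValueError(f"min_bound ({min_bound}) must be less than max_bound ({max_bound})")
           if min_bound <= seed < max_bound:
               return seed  # already in range: returned unchanged
           # Shift so the range starts at 0, wrap around its width, then shift back.
           return (seed - min_bound) % (max_bound - min_bound) + min_bound


       print(project_seed(5, 0, 10))   # 5
       print(project_seed(12, 0, 10))  # 2
       print(project_seed(-1, 0, 10))  # 9
       print(project_seed(10, 0, 10))  # 0 (max_bound is exclusive)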


.. py:function:: get_random_state(seed)

   Generates a numpy.random.RandomState instance using seed.

   :param seed: seed to use to generate numpy.random.RandomState. Must be between SEED_BOUNDS.min_bound and SEED_BOUNDS.max_bound, inclusive.
   :type seed: None, int, np.random.RandomState object

   :raises ValueError: If the input seed is not within the acceptable range.

   :returns: A numpy.random.RandomState instance.


.. py:function:: get_time_index(X: pandas.DataFrame, y: pandas.Series, time_index_name: str)

   Determines the column in the given data that should be used as the time index.


.. py:function:: import_or_raise(library, error_msg=None, warning=False)

   Attempts to import the requested library by name. If the import fails, raises an ImportError or warning.

   :param library: The name of the library.
   :type library: str
   :param error_msg: Error message to display if the import fails.
   :type error_msg: str
   :param warning: If True, import_or_raise gives a warning instead of ImportError. Defaults to False.
   :type warning: bool

   :returns: Returns the library if importing succeeded.

   :raises ImportError: If attempting to import the library fails because the library is not installed.
   :raises Exception: If importing the library fails for any other reason.
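
   A sketch of this pattern using ``importlib.import_module``. The message wording, and returning
   ``None`` when only a warning is issued, are assumptions; ``import_or_raise_sketch`` is a hypothetical
   name.

   .. code-block:: python

       import importlib
       import warnings


       def import_or_raise_sketch(library, error_msg=None, warning=False):
           """Import a module by name, raising (or warning) with a clearer message on failure."""
           try:
               return importlib.import_module(library)
           except ImportError as exc:
               # Typically the library is not installed (ModuleNotFoundError subclasses ImportError).
               msg = f"Missing optional dependency '{library}'."
               if error_msg:
                   msg += f" {error_msg}"
               if warning:
                   warnings.warn(msg)
                   return None  # assumption: nothing is returned when only warning
               raise ImportError(msg) from exc
           except Exception as exc:
               # The module was found but raised some other error while being imported.
               raise Exception(f"An exception occurred while trying to import '{library}': {exc}") from exc


       json_module = import_or_raise_sketch("json")
       print(json_module.dumps([1, 2]))  # [1, 2]
       try:
           import_or_raise_sketch("surely_not_a_real_module_xyz", error_msg="Try pip install.")
       except ImportError as err:
           print(err)  # Missing optional dependency 'surely_not_a_real_module_xyz'. Try pip install.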


.. py:function:: is_all_numeric(df)

   Checks if the given DataFrame contains only numeric values.

   :param df: The DataFrame to check data types of.
   :type df: pd.DataFrame

   :returns: True if all the columns are numeric and are not missing any values, False otherwise.
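
   The documented check can be sketched with pandas. Treating any dtype accepted by
   ``pandas.api.types.is_numeric_dtype`` as numeric is an assumption (that function also accepts boolean
   columns); ``all_numeric_no_missing`` is a hypothetical name.

   .. code-block:: python

       import numpy as np
       import pandas as pd


       def all_numeric_no_missing(df):
           """True if every column has a numeric dtype and no value is missing."""
           # Assumption: pandas' is_numeric_dtype defines "numeric".
           numeric = all(pd.api.types.is_numeric_dtype(dtype) for dtype in df.dtypes)
           return bool(numeric and not df.isna().to_numpy().any())


       print(all_numeric_no_missing(pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})))  # True
       print(all_numeric_no_missing(pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})))  # False
       print(all_numeric_no_missing(pd.DataFrame({"a": [1.0, np.nan]})))            # False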


.. py:function:: is_categorical_actually_boolean(df, df_col)

   Check whether a DataFrame column contains True, False, and null values.

   The function is intended to be applied to columns that are identified as Categorical
   by the Imputer/SimpleImputer.

   :param df: Pandas dataframe with data.
   :type df: pandas.DataFrame
   :param df_col: Name of the column to check for being a nullable Boolean.
   :type df_col: str

   :returns: Whether the column contains True, False and a null type.
   :rtype: bool
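
   A pandas sketch of the check. It assumes "contains True, False and a null type" means every non-null
   value is a Boolean and at least one value is missing; the library's exact rule may differ.
   ``looks_like_nullable_boolean`` is a hypothetical name.

   .. code-block:: python

       import numpy as np
       import pandas as pd


       def looks_like_nullable_boolean(df, df_col):
           """True if the column holds only True/False values plus at least one null."""
           col = df[df_col]
           non_null = col.dropna()
           # Assumption: all non-null values must be Python or NumPy booleans.
           only_booleans = all(isinstance(v, (bool, np.bool_)) for v in non_null)
           return bool(col.isna().any() and len(non_null) > 0 and only_booleans)


       df = pd.DataFrame(
           {
               "flag": [True, False, None, True],
               "color": ["red", None, "blue", "red"],
           }
       )
       print(looks_like_nullable_boolean(df, "flag"))   # True
       print(looks_like_nullable_boolean(df, "color"))  # False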


.. py:function:: jupyter_check()

   Determine whether the code is being run in an IPython environment (such as Jupyter Notebook or JupyterLab).

   :returns: True if running in IPython, False otherwise.
   :rtype: bool
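
   One common way to detect IPython is to ask IPython itself for the active shell. The sketch below
   assumes that approach; ``in_ipython`` is a hypothetical name, and it returns False when IPython is not
   installed.

   .. code-block:: python

       def in_ipython():
           """Return True when running inside IPython (e.g. a Jupyter kernel), else False."""
           try:
               # IPython is an optional dependency; if it is not installed we cannot be in it.
               from IPython import get_ipython
           except ImportError:
               return False
           # get_ipython() returns the active shell, or None in a plain Python process.
           return get_ipython() is not None


       print(in_ipython())  # False when run as a plain script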


.. py:data:: logger
   

.. py:function:: pad_with_nans(pd_data, num_to_pad)

   Pad the beginning of the data with ``num_to_pad`` rows of NaNs.

   :param pd_data: Data to pad.
   :type pd_data: pd.DataFrame or pd.Series
   :param num_to_pad: Number of rows of NaNs to add.
   :type num_to_pad: int

   :returns: The padded data.
   :rtype: pd.DataFrame or pd.Series
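
   A pandas sketch of the padding: build a block of NaN rows with the same columns (or Series name) and
   concatenate it in front of the data. Resetting the index with ``ignore_index=True`` is an assumption
   about how the index is handled, and ``pad_front_with_nans`` is a hypothetical name. Padding an integer
   column with NaN upcasts it to float.

   .. code-block:: python

       import numpy as np
       import pandas as pd


       def pad_front_with_nans(pd_data, num_to_pad):
           """Prepend num_to_pad rows of NaN to a Series or DataFrame."""
           if isinstance(pd_data, pd.Series):
               padding = pd.Series([np.nan] * num_to_pad, name=pd_data.name)
           else:
               padding = pd.DataFrame({col: [np.nan] * num_to_pad for col in pd_data.columns})
           # ignore_index=True renumbers the rows 0..n-1 (assumed index handling).
           return pd.concat([padding, pd_data], ignore_index=True)


       s = pd.Series([1, 2, 3], name="y")
       print(pad_front_with_nans(s, 2).tolist())  # [nan, nan, 1.0, 2.0, 3.0]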


.. py:function:: safe_repr(value)

   Convert the given value into a string that can safely be used in a repr.

   :param value: The item to convert.

   :returns: String representation of the value.
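
   A plausible reason for such a helper is that ``repr(float('nan'))`` is ``'nan'`` and
   ``repr(float('inf'))`` is ``'inf'``, neither of which evaluates as Python code. The sketch below
   assumes that is the problem being solved and maps such floats to evaluable strings; the exact output
   strings and the name ``safe_repr_sketch`` are assumptions.

   .. code-block:: python

       import math


       def safe_repr_sketch(value):
           """repr() that returns evaluable text for NaN and infinite floats."""
           if isinstance(value, float):
               if math.isnan(value):
                   return "float('nan')"
               if math.isinf(value):
                   # repr gives 'inf' or '-inf'; wrap it so it evaluates back to a float.
                   return f"float('{value!r}')"
           return repr(value)


       print(safe_repr_sketch(float("nan")))  # float('nan')
       print(safe_repr_sketch(-math.inf))     # float('-inf')
       print(safe_repr_sketch("text"))        # 'text'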


.. py:function:: save_plot(fig, filepath=None, format='png', interactive=False, return_filepath=False)

   Saves fig to filepath if specified, or to a default location if not.

   :param fig: Figure to be saved.
   :type fig: Figure
   :param filepath: Location to save the file. If not specified, the figure is saved with the filename "test_plot".
   :type filepath: str or Path, optional
   :param format: Extension for figure to be saved as. Ignored if interactive is True and fig
                  is of type plotly.Figure. Defaults to 'png'.
   :type format: str
   :param interactive: If True and fig is of type plotly.Figure, saves the fig as interactive
                       instead of static, and format will be set to 'html'. Defaults to False.
   :type interactive: bool, optional
   :param return_filepath: Whether to return the final filepath the image is saved to. Defaults to False.
   :type return_filepath: bool, optional

   :returns: The final filepath the image was saved to if return_filepath is True, otherwise None.
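
   The filepath rules described above can be sketched with ``pathlib``. The hypothetical
   ``resolve_plot_path`` only computes where a figure would be written and saves nothing; treating the
   default location as the current working directory and replacing any existing suffix with the chosen
   format are assumptions.

   .. code-block:: python

       from pathlib import Path


       def resolve_plot_path(filepath=None, format="png", interactive=False, is_plotly=False):
           """Return the path a figure would be saved to under the documented rules."""
           # Documented: the default filename is "test_plot". Assumed: the default directory is the CWD.
           path = Path(filepath) if filepath is not None else Path.cwd() / "test_plot"
           # Documented: interactive plotly figures are saved as HTML and `format` is ignored.
           extension = "html" if (interactive and is_plotly) else format
           return path.with_suffix("." + extension)


       print(resolve_plot_path("plots/roc", format="jpeg").as_posix())               # plots/roc.jpeg
       print(resolve_plot_path("plots/roc", interactive=True, is_plotly=True).name)  # roc.html
       print(resolve_plot_path().name)                                               # test_plot.png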


.. py:data:: SEED_BOUNDS
   

.. py:function:: validate_holdout_datasets(X, X_train, pipeline_params)

   Validate that the holdout datasets match expectations.

   This function is run before calling predict in a time series pipeline. It verifies that X (the holdout set)
   is ``gap`` units away from the training set and that its length is less than or equal to ``forecast_horizon``.

   :param X: Data of shape [n_samples, n_features].
   :type X: pd.DataFrame
   :param X_train: Training data.
   :type X_train: pd.DataFrame
   :param pipeline_params: Dictionary of time series parameters with gap, forecast_horizon, and time_index being required.
   :type pipeline_params: dict

   :returns: TSHoldoutValidationResult, a named tuple with three fields:

             - ``is_valid`` (bool): True if the holdout data is valid.
             - ``error_messages`` (list): Error messages to display. Empty if ``is_valid`` is True.
             - ``error_codes`` (list): Error codes to display. Empty if ``is_valid`` is True.
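
   A sketch of the two documented checks, with lists of timestamps standing in for DataFrames: the
   holdout must start ``gap + 1`` time steps after the end of the training data and contain at most
   ``forecast_horizon`` rows. It assumes a fixed, known time step; the error-code strings and the
   ``check_holdout`` name are hypothetical.

   .. code-block:: python

       from collections import namedtuple
       from datetime import datetime, timedelta

       # Mirrors the three documented fields of TSHoldoutValidationResult.
       HoldoutResult = namedtuple("HoldoutResult", ["is_valid", "error_messages", "error_codes"])


       def check_holdout(holdout_times, train_times, gap, forecast_horizon, step):
           messages, codes = [], []
           if len(holdout_times) > forecast_horizon:
               messages.append(
                   f"Holdout has {len(holdout_times)} rows, more than forecast_horizon={forecast_horizon}."
               )
               codes.append("HOLDOUT_TOO_LONG")  # hypothetical code
           units_apart = (min(holdout_times) - max(train_times)) / step
           if units_apart != gap + 1:
               messages.append(f"Holdout starts {units_apart:g} steps after training; expected {gap + 1}.")
               codes.append("HOLDOUT_GAP_MISMATCH")  # hypothetical code
           return HoldoutResult(not messages, messages, codes)


       day = timedelta(days=1)
       train = [datetime(2021, 1, d) for d in range(1, 11)]     # Jan 1 .. Jan 10
       holdout = [datetime(2021, 1, d) for d in range(12, 15)]  # Jan 12 .. Jan 14
       print(check_holdout(holdout, train, gap=1, forecast_horizon=3, step=day).is_valid)  # True
       print(check_holdout(holdout, train, gap=0, forecast_horizon=2, step=day).error_codes)
       # ['HOLDOUT_TOO_LONG', 'HOLDOUT_GAP_MISMATCH']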