Utils#

Utility methods.

Package Contents#

Classes Summary#

classproperty

Allows function to be accessed as a class level property.

Functions#

`convert_to_seconds`	Converts a string describing a length of time to its length in seconds.
`deprecate_arg`	Helper to raise warnings when a deprecated arg is used.
`downcast_nullable_types`	Downcasts IntegerNullable, BooleanNullable types to Double, Boolean in order to support certain estimators like ARIMA, CatBoost, and LightGBM.
`drop_rows_with_nans`	Drop rows that have any NaNs in any of the given dataframes or series.
`get_importable_subclasses`	Get importable subclasses of a base class. Used to list all of our estimators, transformers, components and pipelines dynamically.
`get_logger`	Get the logger with the associated name.
`get_random_seed`	Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator. Or, if given an int, return that int.
`get_random_state`	Generates a numpy.random.RandomState instance using seed.
`get_time_index`	Determines the column in the given data that should be used as the time index.
`import_or_raise`	Attempts to import the requested library by name. If the import fails, raises an ImportError or warning.
`infer_feature_types`	Create a Woodwork structure from the given list, pandas, or numpy input, with specified types for columns. If a column's type is not specified, it will be inferred by Woodwork.
`is_all_numeric`	Checks if the given DataFrame contains only numeric values.
`jupyter_check`	Get whether or not the code is being run in an IPython environment (such as Jupyter Notebook or Jupyter Lab).
`log_subtitle`	Log with a subtitle.
`log_title`	Log with a title.
`pad_with_nans`	Pad the beginning num_to_pad rows with nans.
`safe_repr`	Convert the given value into a string that can safely be used for repr.
`save_plot`	Saves fig to filepath if specified, or to a default location if not.

Attributes Summary#

SEED_BOUNDS

Contents#

class evalml.utils.classproperty(func)[source]#

Allows function to be accessed as a class level property.

Example:

class LogisticRegressionBinaryPipeline(PipelineBase):
    component_graph = ['Simple Imputer', 'Logistic Regression Classifier']

    @classproperty
    def summary(cls):
        summary = ""
        for component in cls.component_graph:
            component = handle_component_class(component)
            summary += component.name + " + "
        return summary

assert LogisticRegressionBinaryPipeline.summary == "Simple Imputer + Logistic Regression Classifier + "
assert LogisticRegressionBinaryPipeline().summary == "Simple Imputer + Logistic Regression Classifier + "
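
The example above depends on evalml's pipeline classes. As a self-contained illustration of the same idea, here is a minimal sketch of a class-level property built on Python's descriptor protocol. The names `classproperty_sketch` and `Pipeline` are hypothetical, and this is not evalml's actual implementation.

```python
class classproperty_sketch:
    """Hypothetical stand-in for a class-level property decorator."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        # `owner` is the class, whether the attribute is read
        # from the class itself or from one of its instances.
        return self.func(owner)


class Pipeline:
    component_graph = ["Simple Imputer", "Logistic Regression Classifier"]

    @classproperty_sketch
    def summary(cls):
        return " + ".join(cls.component_graph)


print(Pipeline.summary)    # read from the class
print(Pipeline().summary)  # read from an instance
```

Both reads go through `__get__`, so the wrapped function always receives the class rather than an instance.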

evalml.utils.convert_to_seconds(input_str)[source]#

Converts a string describing a length of time to its length in seconds.

Parameters: input_str (str) – The string to be parsed and converted to seconds.
Returns: The length of time described by input_str, in seconds.
Raises: AssertionError – If an invalid unit is used.

Examples

>>> assert convert_to_seconds("10 hr") == 36000.0
>>> assert convert_to_seconds("30 minutes") == 1800.0
>>> assert convert_to_seconds("2.5 min") == 150.0
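
To illustrate the kind of parsing described here, the following is a rough sketch of a "number plus unit" converter. The function name `to_seconds_sketch` and the table of accepted unit spellings are assumptions made for illustration; the set of units evalml actually accepts may differ.

```python
def to_seconds_sketch(input_str):
    """Hypothetical parser for strings such as '10 hr' or '2.5 min'."""
    # Assumed unit spellings, mapped to their length in seconds.
    units = {
        "s": 1, "sec": 1, "second": 1, "seconds": 1,
        "m": 60, "min": 60, "minute": 60, "minutes": 60,
        "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
    }
    # Split into the numeric part and the unit part.
    value, unit = input_str.strip().split(maxsplit=1)
    # Mirror the documented behaviour of raising AssertionError on a bad unit.
    assert unit.lower() in units, f"Invalid unit: {unit}"
    return float(value) * units[unit.lower()]


print(to_seconds_sketch("10 hr"))
print(to_seconds_sketch("2.5 min"))
```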

evalml.utils.deprecate_arg(old_arg, new_arg, old_value, new_value)[source]#

Helper to raise warnings when a deprecated arg is used.

Parameters

old_arg (str) – Name of old/deprecated argument.
new_arg (str) – Name of new argument.
old_value (Any) – Value the user passed in for the old argument.
new_value (Any) – Value the user passed in for the new argument.

Returns

old_value if not None, else new_value
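
The behaviour described above can be sketched with the standard `warnings` module. This is an illustrative re-implementation under assumptions: the name `deprecate_arg_sketch`, the warning category, and the message wording are all hypothetical and may not match evalml.

```python
import warnings


def deprecate_arg_sketch(old_arg, new_arg, old_value, new_value):
    """Return old_value (with a warning) if it was given, else new_value."""
    if old_value is not None:
        # Assumed category and wording; the real helper may differ.
        warnings.warn(
            f"Argument '{old_arg}' has been deprecated in favor of '{new_arg}'.",
            DeprecationWarning,
            stacklevel=2,
        )
        return old_value
    return new_value


# Typical use inside a function that renamed `n_trees` to `n_estimators`
# (both parameter names are made up for this example).
def fit(n_estimators=10, n_trees=None):
    return deprecate_arg_sketch("n_trees", "n_estimators", n_trees, n_estimators)
```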

evalml.utils.downcast_nullable_types(data, ignore_null_cols=True)[source]#

Downcasts IntegerNullable, BooleanNullable types to Double, Boolean in order to support certain estimators like ARIMA, CatBoost, and LightGBM.

Parameters

data (pd.DataFrame, pd.Series) – Feature data.
ignore_null_cols (bool) – Whether to ignore downcasting columns with null values or not. Defaults to True.

Returns

DataFrame or Series initialized with logical type information, where IntegerNullable and BooleanNullable columns are downcast to Double and Boolean, respectively.

Return type

pd.DataFrame or pd.Series

evalml.utils.drop_rows_with_nans(*pd_data)[source]#

Drop rows that have any NaNs in any of the given dataframes or series.

Parameters: *pd_data – sequence of pd.Series or pd.DataFrame or None
Returns: list of pd.DataFrame or pd.Series or None

evalml.utils.get_importable_subclasses(base_class, used_in_automl=True)[source]#

Get importable subclasses of a base class. Used to list all of our estimators, transformers, components and pipelines dynamically.

Parameters

base_class (abc.ABCMeta) – Base class to find all of the subclasses for.
used_in_automl (bool) – Not all components/pipelines/estimators are used in automl search. If True, only include those subclasses that are used in the search. This would mean excluding classes related to ExtraTrees, ElasticNet, and Baseline estimators.

Returns

List of subclasses.

evalml.utils.get_logger(name)[source]#

Get the logger with the associated name.

Parameters: name (str) – Name of the logger to get.
Returns: The logger object with the associated name.

evalml.utils.get_random_seed(random_state, min_bound=SEED_BOUNDS.min_bound, max_bound=SEED_BOUNDS.max_bound)[source]#

Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator. Or, if given an int, return that int.

To protect against invalid input to a particular library’s random number generator, if an int value is provided, and it is outside the bounds “[min_bound, max_bound)”, the value will be projected into the range between the min_bound (inclusive) and max_bound (exclusive) using modular arithmetic.

Parameters

random_state (int, numpy.random.RandomState) – random state
min_bound (None, int) – if not default of None, will be min bound when generating seed (inclusive). Must be less than max_bound.
max_bound (None, int) – if not default of None, will be max bound when generating seed (exclusive). Must be greater than min_bound.

Returns

Seed for random number generator

Return type

int

Raises

ValueError – If boundaries are not valid.
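
The projection of an out-of-range int into [min_bound, max_bound) can be sketched as follows. The function `project_seed` and its exact formula are assumptions, shown as one common way to do such a projection with modular arithmetic; evalml's own arithmetic may differ. Handling of `numpy.random.RandomState` inputs is omitted.

```python
def project_seed(seed, min_bound, max_bound):
    """Hypothetical helper: map an int seed into [min_bound, max_bound)."""
    if min_bound >= max_bound:
        raise ValueError("min_bound must be less than max_bound")
    if min_bound <= seed < max_bound:
        # Already valid, so return it unchanged.
        return seed
    # Python's % always returns a non-negative result for a positive
    # modulus, so negative seeds wrap around into the range as well.
    return min_bound + (seed - min_bound) % (max_bound - min_bound)


print(project_seed(12, 0, 10))
print(project_seed(-1, 0, 10))
```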

evalml.utils.get_random_state(seed)[source]#

Generates a numpy.random.RandomState instance using seed.

Parameters: seed (None, int, np.random.RandomState object) – seed to use to generate numpy.random.RandomState. Must be between SEED_BOUNDS.min_bound and SEED_BOUNDS.max_bound, inclusive.
Raises: ValueError – If the input seed is not within the acceptable range.
Returns: A numpy.random.RandomState instance.

evalml.utils.get_time_index(X: pandas.DataFrame, y: pandas.Series, time_index_name: str)[source]#

Determines the column in the given data that should be used as the time index.

evalml.utils.import_or_raise(library, error_msg=None, warning=False)[source]#

Attempts to import the requested library by name. If the import fails, raises an ImportError or warning.

Parameters

library (str) – The name of the library.
error_msg (str) – Error message to return if the import fails.
warning (bool) – If True, import_or_raise gives a warning instead of ImportError. Defaults to False.

Returns

Returns the library if importing succeeded.

Raises

ImportError – If attempting to import the library fails because the library is not installed.
Exception – If importing the library fails.
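
A minimal sketch of this import-or-warn pattern, built on the standard library's `importlib.import_module`. The name `import_or_raise_sketch`, the message wording, and the choice to return None after warning are assumptions for illustration, not evalml's implementation.

```python
import importlib
import warnings


def import_or_raise_sketch(library, error_msg=None, warning=False):
    """Import `library` by name, or raise ImportError (or warn) on failure."""
    try:
        return importlib.import_module(library)
    except ImportError:
        # Assumed message format.
        msg = f"Missing optional dependency '{library}'. {error_msg or ''}".strip()
        if warning:
            warnings.warn(msg)
            return None
        raise ImportError(msg)


# Importing a module that ships with Python succeeds and returns the module.
json_module = import_or_raise_sketch("json")
print(json_module.dumps({"ok": True}))
```

Deferring the import this way lets a package keep optional dependencies out of its top-level imports and report a targeted message only when the feature is actually used.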

evalml.utils.infer_feature_types(data, feature_types=None)[source]#

Create a Woodwork structure from the given list, pandas, or numpy input, with specified types for columns. If a column’s type is not specified, it will be inferred by Woodwork.

Parameters

data (pd.DataFrame, pd.Series) – Input data to convert to a Woodwork data structure.
feature_types (string, ww.logical_type obj, dict, optional) – If data is a 2D structure, feature_types must be a dictionary mapping column names to the type of data represented in the column. If data is a 1D structure, then feature_types must be a Woodwork logical type or a string representing a Woodwork logical type (“Double”, “Integer”, “Boolean”, “Categorical”, “Datetime”, “NaturalLanguage”)

Returns

A Woodwork data structure where the data type of each column was either specified or inferred.

Raises

ValueError – If there is a mismatch between the dataframe and the woodwork schema.

evalml.utils.is_all_numeric(df)[source]#

Checks if the given DataFrame contains only numeric values.

Parameters: df (pd.DataFrame) – The DataFrame to check data types of.
Returns: True if all the columns are numeric and are not missing any values, False otherwise.

evalml.utils.jupyter_check()[source]#

Get whether or not the code is being run in an IPython environment (such as Jupyter Notebook or Jupyter Lab).

Returns: True if IPython, False otherwise.
Return type: boolean

evalml.utils.log_subtitle(logger, title, underline='=')[source]#

Log with a subtitle.

evalml.utils.log_title(logger, title)[source]#

Log with a title.

evalml.utils.pad_with_nans(pd_data, num_to_pad)[source]#

Pad the beginning num_to_pad rows with nans.

Parameters

pd_data (pd.DataFrame or pd.Series) – Data to pad.
num_to_pad (int) – Number of nans to pad.

Returns

pd.DataFrame or pd.Series

evalml.utils.safe_repr(value)[source]#

Convert the given value into a string that can safely be used for repr.

Parameters: value – The item to convert
Returns: String representation of the value

evalml.utils.save_plot(fig, filepath=None, format='png', interactive=False, return_filepath=False)[source]#

Saves fig to filepath if specified, or to a default location if not.

Parameters

fig (Figure) – Figure to be saved.
filepath (str or Path, optional) – Location to save file. Defaults to a file named “test_plot”.
format (str) – Extension for figure to be saved as. Ignored if interactive is True and fig is of type plotly.Figure. Defaults to ‘png’.
interactive (bool, optional) – If True and fig is of type plotly.Figure, saves the fig as interactive instead of static, and format will be set to ‘html’. Defaults to False.
return_filepath (bool, optional) – Whether to return the final filepath the image is saved to. Defaults to False.

Returns

String representing the final filepath the image was saved to if return_filepath is True; otherwise None.

evalml.utils.SEED_BOUNDS#