natural_language_nan_data_check

Module Contents

Classes Summary

NaturalLanguageNaNDataCheck

Checks each natural language column in the input and issues an error if any of them contains NaN values.

Attributes Summary

error_contains_nan

Contents

evalml.data_checks.natural_language_nan_data_check.error_contains_nan = Input natural language column(s) ({}) contains NaN values. Please impute NaN values or drop...
class evalml.data_checks.natural_language_nan_data_check.NaturalLanguageNaNDataCheck

Checks each natural language column in the input and issues an error if any of them contains NaN values.

Methods

name

Returns a name describing the data check.

validate

Checks if any natural language columns contain NaN values.

name(cls)

Returns a name describing the data check.

validate(self, X, y=None)

Checks if any natural language columns contain NaN values.

Parameters
  • X (pd.DataFrame, np.ndarray) – Features.

  • y (pd.Series, np.ndarray) – Ignored. Defaults to None.

Returns

A dictionary with "warnings", "errors", and "actions" keys. The "errors" list contains a DataCheckError if any natural language column contains NaN values.

Return type

dict

Example

>>> import pandas as pd
>>> import woodwork as ww
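>>> import numpy as np
>>> from evalml.data_checks import NaturalLanguageNaNDataCheck, DataCheckError, DataCheckMessageCode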
>>> data = pd.DataFrame()
>>> data['A'] = [None, "string_that_is_long_enough_for_natural_language"]
>>> data['B'] = ['string_that_is_long_enough_for_natural_language', 'string_that_is_long_enough_for_natural_language']
>>> data['C'] = np.random.randint(0, 3, size=len(data))
>>> data.ww.init(logical_types={'A': 'NaturalLanguage', 'B': 'NaturalLanguage'})
>>> nl_nan_check = NaturalLanguageNaNDataCheck()
>>> assert nl_nan_check.validate(data) == {
...        "warnings": [],
...        "actions": [],
...        "errors": [DataCheckError(message='Input natural language column(s) (A) contains NaN values. Please impute NaN values or drop these rows or columns.',
...                      data_check_name=NaturalLanguageNaNDataCheck.name,
...                      message_code=DataCheckMessageCode.NATURAL_LANGUAGE_HAS_NAN,
...                      details={"columns": 'A'}).to_dict()]
...    }
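
The logic exercised by the example above can be approximated without evalml or Woodwork. The following is a minimal, dependency-free sketch, not the library's implementation: `sketch_validate` and `is_missing` are hypothetical names, the natural language columns are passed in explicitly (evalml reads them from the Woodwork logical types), and the error entry is a simplified dict rather than the output of `DataCheckError(...).to_dict()`.

```python
import math

# Message template as it appears in the example above; in evalml it is the
# module attribute error_contains_nan.
ERROR_CONTAINS_NAN = (
    "Input natural language column(s) ({}) contains NaN values. "
    "Please impute NaN values or drop these rows or columns."
)


def is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def sketch_validate(columns, natural_language_columns):
    """Hypothetical stand-in for validate().

    columns: dict mapping column name -> list of values.
    natural_language_columns: names of the columns to treat as natural
    language (an assumption; evalml infers these from logical types).
    """
    # Collect the natural language columns that contain a missing value.
    nan_columns = [
        name
        for name in natural_language_columns
        if any(is_missing(value) for value in columns[name])
    ]
    result = {"warnings": [], "errors": [], "actions": []}
    if nan_columns:
        names = ", ".join(str(name) for name in nan_columns)
        # Simplified error entry: only the message and details are modeled.
        result["errors"].append(
            {"message": ERROR_CONTAINS_NAN.format(names), "details": {"columns": names}}
        )
    return result


data = {
    "A": [None, "string_that_is_long_enough_for_natural_language"],
    "B": ["string_that_is_long_enough_for_natural_language"] * 2,
    "C": [0, 2],
}
result = sketch_validate(data, natural_language_columns=["A", "B"])
print(result["errors"][0]["message"])
```

Column C is never inspected because it is not listed as a natural language column, which mirrors how the check limits itself to natural language features.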