natural_language_nan_data_check
Module Contents
Classes Summary
NaturalLanguageNaNDataCheck: Checks each column in the input for natural language features and will issue an error if NaN values are present.
Attributes Summary
Contents
evalml.data_checks.natural_language_nan_data_check.error_contains_nan = Input natural language column(s) ({}) contains NaN values. Please impute NaN values or drop...
class evalml.data_checks.natural_language_nan_data_check.NaturalLanguageNaNDataCheck
Checks each column in the input for natural language features and will issue an error if NaN values are present.
Methods
name: Returns a name describing the data check.
validate: Checks if any natural language columns contain NaN values.
name(cls)
Returns a name describing the data check.
validate(self, X, y=None)
Checks if any natural language columns contain NaN values.
Parameters
X (pd.DataFrame, np.ndarray) – Features.
y (pd.Series, np.ndarray) – Ignored. Defaults to None.
Returns
dict with "warnings", "errors", and "actions" keys. "errors" contains a DataCheckError (in dict form) if NaN values are present in natural language columns.
Return type
dict
Example
>>> import pandas as pd
>>> import woodwork as ww
>>> import numpy as np
>>> from evalml.data_checks import DataCheckError, DataCheckMessageCode, NaturalLanguageNaNDataCheck
>>> data = pd.DataFrame()
>>> data['A'] = [None, "string_that_is_long_enough_for_natural_language"]
>>> data['B'] = ['string_that_is_long_enough_for_natural_language', 'string_that_is_long_enough_for_natural_language']
>>> data['C'] = np.random.randint(0, 3, size=len(data))
>>> data.ww.init(logical_types={'A': 'NaturalLanguage', 'B': 'NaturalLanguage'})
>>> nl_nan_check = NaturalLanguageNaNDataCheck()
>>> assert nl_nan_check.validate(data) == {
...     "warnings": [],
...     "actions": [],
...     "errors": [DataCheckError(message='Input natural language column(s) (A) contains NaN values. Please impute NaN values or drop these rows or columns.',
...                               data_check_name=NaturalLanguageNaNDataCheck.name,
...                               message_code=DataCheckMessageCode.NATURAL_LANGUAGE_HAS_NAN,
...                               details={"columns": 'A'}).to_dict()]
... }
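The logic that validate describes can be approximated without evalml. The following is a minimal, dependency-free sketch and not the library's implementation: sketch_validate, is_missing, and ERROR_CONTAINS_NAN are hypothetical names, columns are plain lists rather than a DataFrame, and each error entry is a simplified dict rather than the output of DataCheckError.to_dict(). The message text is taken from the formatted message in the example.

```python
import math

# Template modeled on the formatted message shown in the example
# (assumption: the real constant is filled in with the column names).
ERROR_CONTAINS_NAN = (
    "Input natural language column(s) ({}) contains NaN values. "
    "Please impute NaN values or drop these rows or columns."
)


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def sketch_validate(columns, natural_language_columns):
    """Hypothetical stand-in for validate(); not evalml's code.

    columns: dict mapping column name -> list of values.
    natural_language_columns: names of the columns to inspect.
    """
    results = {"warnings": [], "errors": [], "actions": []}
    # Collect every natural language column with at least one missing value.
    nan_columns = [
        name
        for name in natural_language_columns
        if any(is_missing(v) for v in columns[name])
    ]
    if nan_columns:
        joined = ", ".join(nan_columns)
        results["errors"].append(
            {
                "message": ERROR_CONTAINS_NAN.format(joined),
                "details": {"columns": joined},
            }
        )
    return results


data = {
    "A": [None, "string_that_is_long_enough_for_natural_language"],
    "B": ["string_that_is_long_enough_for_natural_language"] * 2,
    "C": [0, 2],
}
result = sketch_validate(data, ["A", "B"])
print(result["errors"][0]["message"])
```

Only the columns named in natural_language_columns are inspected, so a missing value in a numeric column such as C would not produce an error in this sketch.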