evalml.data_checks.HighlyNullDataCheck.validate

HighlyNullDataCheck.validate(X, y=None)

Checks whether any columns in the input are highly null, i.e. have a fraction of null values at or above pct_null_threshold.

Parameters
  • X (ww.DataTable, pd.DataFrame, np.ndarray) – Features

  • y (ww.DataColumn, pd.Series, np.ndarray) – Ignored.

Returns

dict with "errors", "warnings", and "actions" keys. Each highly-null column adds a DataCheckWarning to "warnings" and a DROP_COL action to "actions".

Return type

dict
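As a rough illustration of how the returned dict might be consumed, the sketch below collects the columns named by DROP_COL actions and drops them with pandas. The `result` literal is copied from the structure shown in this page's example rather than produced by a live call, and the names `to_drop` and `cleaned` are hypothetical.

```python
import pandas as pd

# Literal copy of the documented result structure (assumption: not a live validate() call).
result = {
    "errors": [],
    "warnings": [{"message": "Column 'lots_of_null' is 80.0% or more null",
                  "data_check_name": "HighlyNullDataCheck",
                  "level": "warning",
                  "code": "HIGHLY_NULL",
                  "details": {"column": "lots_of_null"}}],
    "actions": [{"code": "DROP_COL",
                 "metadata": {"column": "lots_of_null"}}],
}

df = pd.DataFrame({
    "lots_of_null": [None, None, None, None, 5],
    "no_null": [1, 2, 3, 4, 5],
})

# Gather every column that a DROP_COL action points at.
to_drop = [action["metadata"]["column"]
           for action in result["actions"]
           if action["code"] == "DROP_COL"]

# Drop the flagged columns; the original DataFrame is left unchanged.
cleaned = df.drop(columns=to_drop)
print(list(cleaned.columns))
```

Filtering on the action code keeps the loop safe if other action types appear in the list.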

Example

>>> import pandas as pd
>>> from evalml.data_checks import HighlyNullDataCheck
>>> df = pd.DataFrame({
...    'lots_of_null': [None, None, None, None, 5],
...    'no_null': [1, 2, 3, 4, 5]
... })
>>> null_check = HighlyNullDataCheck(pct_null_threshold=0.8)
>>> assert null_check.validate(df) == {
...     "errors": [],
...     "warnings": [{"message": "Column 'lots_of_null' is 80.0% or more null",
...                   "data_check_name": "HighlyNullDataCheck",
...                   "level": "warning",
...                   "code": "HIGHLY_NULL",
...                   "details": {"column": "lots_of_null"}}],
...     "actions": [{"code": "DROP_COL",
...                  "metadata": {"column": "lots_of_null"}}]
... }
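
The threshold comparison behind this example can be approximated with pandas alone. The following is a minimal sketch, not evalml's implementation: the helper name `highly_null_columns` is hypothetical, and it assumes a column is flagged when its null fraction is greater than or equal to the threshold, which matches the "80.0% or more" wording of the warning above.

```python
import pandas as pd


def highly_null_columns(df, pct_null_threshold):
    """Return names of columns whose null fraction is >= pct_null_threshold.

    Hypothetical helper for illustration; not part of evalml.
    """
    # isnull() gives a boolean frame; the column-wise mean is the null fraction.
    null_fraction = df.isnull().mean()
    return [col for col, frac in null_fraction.items()
            if frac >= pct_null_threshold]


df = pd.DataFrame({
    "lots_of_null": [None, None, None, None, 5],
    "no_null": [1, 2, 3, 4, 5],
})

flagged = highly_null_columns(df, pct_null_threshold=0.8)
print(flagged)  # ['lots_of_null']
```

Here 4 of 5 values in 'lots_of_null' are null, a fraction of 0.8, so the column meets the 0.8 threshold exactly and is flagged.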