evalml.data_checks.HighlyNullDataCheck.validate

HighlyNullDataCheck.validate(X, y=None)

Checks if there are any highly-null columns in the input.

Parameters
  • X (pd.DataFrame, pd.Series, np.array, list) – Input features to check for null values.

  • y – Ignored.

Returns

List with a DataCheckWarning for each highly-null column found.

Return type

list (DataCheckWarning)

Example

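>>> import pandas as pd
>>> from evalml.data_checks import HighlyNullDataCheck, DataCheckWarning
>>> # 'lots_of_null' is 4/5 = 80% null; 'no_null' has no missing values
>>> df = pd.DataFrame({
...    'lots_of_null': [None, None, None, None, 5],
...    'no_null': [1, 2, 3, 4, 5]
... })
>>> null_check = HighlyNullDataCheck(pct_null_threshold=0.8)
>>> assert null_check.validate(df) == [DataCheckWarning("Column 'lots_of_null' is 80.0% or more null", "HighlyNullDataCheck")]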
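
The arithmetic behind the example can be sketched in plain Python. This is an illustration only, not evalml's implementation: the helpers null_fraction and highly_null_columns are hypothetical names, and the "greater than or equal to" comparison is an assumption inferred from the "80.0% or more null" wording of the warning message.

```python
import math


def null_fraction(values):
    """Return the fraction of entries that are None or NaN (hypothetical helper)."""
    if not values:
        return 0.0
    nulls = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return nulls / len(values)


def highly_null_columns(columns, pct_null_threshold):
    """Return names of columns whose null fraction meets the threshold.

    Assumption: a column is flagged when its null fraction is greater than
    or equal to pct_null_threshold.
    """
    return [
        name for name, values in columns.items()
        if null_fraction(values) >= pct_null_threshold
    ]


# Same data as the example above, as a plain dict of lists.
columns = {
    'lots_of_null': [None, None, None, None, 5],
    'no_null': [1, 2, 3, 4, 5],
}

# 4 of the 5 values in 'lots_of_null' are missing, so its null fraction is 0.8.
flagged = highly_null_columns(columns, pct_null_threshold=0.8)
print(flagged)
```

Under these assumptions only 'lots_of_null' is flagged at a threshold of 0.8, and raising the threshold above 0.8 would flag nothing.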