evalml.guardrails.detect_highly_null

evalml.guardrails.detect_highly_null(X, percent_threshold=0.95)

Checks if there are any highly-null columns in a dataframe.

Parameters
  • X (pd.DataFrame) – features

  • percent_threshold (float) – Fraction of null values (between 0 and 1) at or above which a column is considered “highly-null”. Defaults to 0.95.

Returns

A dictionary mapping the name or index of each highly-null column to its fraction of null values

Example

>>> import pandas as pd
>>> from evalml.guardrails import detect_highly_null
>>> df = pd.DataFrame({
...    'lots_of_null': [None, None, None, None, 5],
...    'no_null': [1, 2, 3, 4, 5]
... })
>>> detect_highly_null(df, percent_threshold=0.8)
{'lots_of_null': 0.8}
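
The check described above can be approximated in a few lines of pandas. The following is only a minimal sketch, not evalml's actual source: the name sketch_detect_highly_null and the range validation are assumptions made for illustration, and the inclusive comparison (>=) is inferred from the example, where a column that is exactly 80% null is reported at a threshold of 0.8.

```python
import pandas as pd


def sketch_detect_highly_null(X, percent_threshold=0.95):
    """Hypothetical stand-in for detect_highly_null (not evalml's code)."""
    # Assumed validation: the threshold is treated as a fraction in [0, 1].
    if not 0 <= percent_threshold <= 1:
        raise ValueError("percent_threshold must be between 0 and 1")

    X = pd.DataFrame(X)
    # isnull() yields a boolean frame; the column-wise mean of booleans
    # is the fraction of null entries in each column.
    null_fraction = X.isnull().mean()
    # Keep only columns whose null fraction meets the threshold (inclusive).
    flagged = null_fraction[null_fraction >= percent_threshold]
    # Convert to plain Python floats keyed by column name or index.
    return {col: float(frac) for col, frac in flagged.items()}


df = pd.DataFrame({
    'lots_of_null': [None, None, None, None, 5],
    'no_null': [1, 2, 3, 4, 5],
})
result = sketch_detect_highly_null(df, percent_threshold=0.8)
print(result)
```

On the dataframe from the example, this sketch should flag only lots_of_null. Raising the threshold to 0.9 should return an empty dictionary, since no column is at least 90% null.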