no_variance_data_check

Data check that detects whether the target or any of the features have no variance.

Module Contents

Classes Summary

NoVarianceDataCheck – Check if the target or any of the features have no variance.

Contents

class evalml.data_checks.no_variance_data_check.NoVarianceDataCheck(count_nan_as_value=False)[source]

Check if the target or any of the features have no variance.

Parameters

count_nan_as_value (bool) – If True, missing values are counted as their own unique value. Additionally, if True, a DataCheckWarning is returned instead of an error when a feature has mostly missing data and only one unique value. Defaults to False.
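
To illustrate what counting missing values "as their own unique value" means, here is a minimal sketch using pandas' `Series.nunique` and its `dropna` argument. The helper `count_unique` is hypothetical and written only for this illustration; it is not evalml's implementation, which may count values differently.

```python
import pandas as pd


def count_unique(values, count_nan_as_value=False):
    """Hypothetical helper: count distinct values in a column.

    When count_nan_as_value is True, missing values count as one
    additional distinct value (pandas: dropna=False).
    """
    return pd.Series(values).nunique(dropna=not count_nan_as_value)


half_missing = [2, 2, 2, 2, None, None, None, None]
all_missing = [None] * 8

# One real value; missing entries are ignored by default.
n_default = count_unique(half_missing)
# The missing entries are now counted as a second distinct value.
n_with_nan = count_unique(half_missing, count_nan_as_value=True)
# A column with nothing but missing entries has no distinct values by default.
n_empty = count_unique(all_missing)

print(n_default, n_with_nan, n_empty)
```

Under this reading, the same column has 1 unique value by default and 2 when missing values are counted, which is consistent with the "two unique values including nulls" warning in the examples on this page.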

Methods

  • name – Return a name describing the data check.

  • validate – Check if the target or any of the features have no variance (1 unique value).

name(cls)

Return a name describing the data check.

validate(self, X, y)[source]

Check if the target or any of the features have no variance (1 unique value).

Parameters
  • X (pd.DataFrame, np.ndarray) – The input features.

  • y (pd.Series, np.ndarray) – The target data.

Returns

A dict with 'warnings', 'errors', and 'actions' keys describing any features or target with no variance.

Return type

dict
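
The returned dict can be walked with ordinary Python. The sketch below assumes the result layout shown in the examples on this page ('warnings' and 'errors' lists whose entries carry 'level', 'code', and 'details'). The helper `summarize` is hypothetical, and `sample` is a hand-written stand-in for a real `validate` result so that the snippet runs without evalml installed.

```python
def summarize(results):
    """Hypothetical helper: flatten a validate()-style result.

    Returns a list of (level, code, columns) tuples, errors first.
    """
    rows = []
    for key in ("errors", "warnings"):
        for entry in results.get(key, []):
            rows.append(
                (entry["level"], entry["code"], entry["details"]["columns"])
            )
    return rows


# Hand-written stand-in mirroring the structure documented on this page.
sample = {
    "warnings": [],
    "errors": [
        {
            "message": "'First_Column' has 1 unique value.",
            "data_check_name": "NoVarianceDataCheck",
            "level": "error",
            "details": {"columns": ["First_Column"], "rows": None},
            "code": "NO_VARIANCE",
        }
    ],
    "actions": [
        {"code": "DROP_COL",
         "metadata": {"columns": ["First_Column"], "rows": None}}
    ],
}

summary = summarize(sample)
for level, code, columns in summary:
    print(f"{level}: {code} on {columns}")
```

Keeping the helper independent of evalml makes it easy to unit test, at the cost of hard-coding key names that could change between library versions.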

Examples

>>> import pandas as pd
>>> from evalml.data_checks import NoVarianceDataCheck
...
>>> X = pd.DataFrame([2, 2, 2, 2, 2, 2, 2, 2], columns=["First_Column"])
>>> y = pd.Series([1, 1, 1, 1, 1, 1, 1, 1])
...
>>> novar_dc = NoVarianceDataCheck()
>>> assert novar_dc.validate(X, y) == {
...     'warnings': [],
...     'errors': [{'message': "'First_Column' has 1 unique value.",
...                 'data_check_name': 'NoVarianceDataCheck',
...                 'level': 'error',
...                 'details': {'columns': ['First_Column'], 'rows': None},
...                 'code': 'NO_VARIANCE'},
...                {'message': 'Y has 1 unique value.',
...                 'data_check_name': 'NoVarianceDataCheck',
...                 'level': 'error',
...                 'details': {'columns': ['Y'], 'rows': None},
...                 'code': 'NO_VARIANCE'}],
...     'actions': [{'code': 'DROP_COL',
...                  'metadata': {'columns': ['First_Column'], 'rows': None}}]}
...
...
>>> X["First_Column"] = [2, 2, 2, 3, 3, 3, None, None]
>>> y = pd.Series([1, 1, 1, 2, 2, 2, None, None])
>>> assert novar_dc.validate(X, y) == {'warnings': [], 'errors': [], 'actions': []}
...
...
>>> y = pd.Series([None] * 8)
>>> assert novar_dc.validate(X, y) == {
...     'warnings': [],
...     'errors': [{'message': 'Y has 0 unique values.',
...                 'data_check_name': 'NoVarianceDataCheck',
...                 'level': 'error',
...                 'details': {'columns': ['Y'], 'rows': None},
...                 'code': 'NO_VARIANCE'}],
...     'actions': []}
...
...
>>> X["First_Column"] = [2, 2, 2, 2, None, None, None, None]
>>> y = pd.Series([1, 1, 1, 1, None, None, None, None])
>>> assert novar_dc.validate(X, y) == {
...     'warnings': [],
...     'errors': [{'message': "'First_Column' has 1 unique value.",
...                 'data_check_name': 'NoVarianceDataCheck',
...                 'level': 'error',
...                 'details': {'columns': ['First_Column'], 'rows': None},
...                 'code': 'NO_VARIANCE'},
...                {'message': 'Y has 1 unique value.',
...                 'data_check_name': 'NoVarianceDataCheck',
...                 'level': 'error',
...                 'details': {'columns': ['Y'], 'rows': None},
...                 'code': 'NO_VARIANCE'}],
...     'actions': [{'code': 'DROP_COL',
...                  'metadata': {'columns': ['First_Column'], 'rows': None}}]}
...
...
>>> novar_dc = NoVarianceDataCheck(count_nan_as_value=True)
>>> assert novar_dc.validate(X, y) == {
...     'warnings': [{'message': "'First_Column' has two unique values including nulls. Consider encoding the nulls for this column to be useful for machine learning.",
...                   'data_check_name': 'NoVarianceDataCheck',
...                   'level': 'warning',
...                   'details': {'columns': ['First_Column'], 'rows': None},
...                   'code': 'NO_VARIANCE_WITH_NULL'},
...                  {'message': 'Y has two unique values including nulls. Consider encoding the nulls for this column to be useful for machine learning.',
...                   'data_check_name': 'NoVarianceDataCheck',
...                   'level': 'warning',
...                   'details': {'columns': ['Y'], 'rows': None},
...                   'code': 'NO_VARIANCE_WITH_NULL'}],
...     'errors': [],
...     'actions': [{'code': 'DROP_COL',
...                  'metadata': {'columns': ['First_Column'], 'rows': None}}]}
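
The 'actions' entries in the results above suggest dropping the flagged columns. As a rough sketch of how a caller might act on them, the hypothetical helper `apply_drop_col_actions` below reads the 'DROP_COL' entries in the shape shown in these examples and drops the listed columns with pandas. It is an illustration under those assumptions, not part of evalml's API.

```python
import pandas as pd


def apply_drop_col_actions(X, actions):
    """Hypothetical helper: drop every column named by a DROP_COL action.

    Returns a new DataFrame; the input is left unchanged.
    """
    to_drop = []
    for action in actions:
        if action["code"] == "DROP_COL":
            to_drop.extend(action["metadata"]["columns"])
    return X.drop(columns=to_drop)


X = pd.DataFrame(
    {"First_Column": [2, 2, 2, 2], "Second_Column": [1, 2, 3, 4]}
)
# Hand-written action list in the same shape as the examples above.
actions = [
    {"code": "DROP_COL",
     "metadata": {"columns": ["First_Column"], "rows": None}}
]

X_clean = apply_drop_col_actions(X, actions)
print(list(X_clean.columns))
```

Returning a copy rather than modifying `X` in place keeps the original data available in case the suggested action needs to be reviewed first.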