id_columns_data_check

Module Contents

Classes Summary

IDColumnsDataCheck

Check if any of the features are likely to be ID columns.

Contents

class evalml.data_checks.id_columns_data_check.IDColumnsDataCheck(id_threshold=1.0)

Check if any of the features are likely to be ID columns.

Parameters: id_threshold (float) – The probability threshold at or above which a column is considered an ID column. Defaults to 1.0.

Methods

`name`	Returns a name describing the data check.
`validate`	Check if any of the features are likely to be ID columns.

name(cls): Returns a name describing the data check.

validate(self, X, y=None)

Check if any of the features are likely to be ID columns. Currently performs these simple checks:

- column name is “id”
- column name ends in “_id”
- column contains all unique values (and is categorical / integer type)
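
The three checks above can be sketched in plain pandas. This is an illustrative approximation, not evalml's implementation: the helper name `id_column_signals`, the signal labels, and the case-insensitive name matching are assumptions made for the example, and it reports which checks a column matches rather than computing a probability.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def id_column_signals(df: pd.DataFrame) -> dict:
    """Map each column to the documented checks it matches (hypothetical helper)."""
    signals = {}
    for col in df.columns:
        name = str(col).lower()  # assumption: names are compared case-insensitively
        series = df[col]
        matched = []

        # Check 1: the column is literally named "id".
        if name == "id":
            matched.append("named_id")

        # Check 2: the column name ends in "_id".
        if name.endswith("_id"):
            matched.append("ends_in_id")

        # Check 3: every value is unique and the column is integer or categorical.
        is_candidate_type = is_integer_dtype(series) or isinstance(
            series.dtype, pd.CategoricalDtype
        )
        if is_candidate_type and len(series) > 0 and series.nunique() == len(series):
            matched.append("all_unique")

        if matched:
            signals[col] = matched
    return signals


df = pd.DataFrame({
    "df_id": [0, 1, 2, 3, 4],
    "x": [10, 42, 31, 51, 61],
    "y": [42, 54, 12, 64, 12],
})
signals = id_column_signals(df)
print(signals)
```

Here `df_id` matches two checks while `x` matches only the uniqueness check. In the documented example only `df_id` is reported at the default `id_threshold` of 1.0, which suggests that matching a single check is not enough to reach that threshold.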

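Parameters: X (pd.DataFrame, np.ndarray) – The input features to check. y – Optional; defaults to None.
Returns: A dictionary with "errors", "warnings", and "actions" entries. Each warning identifies, by column name or index, a feature whose probability of being an ID column meets id_threshold.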
Return type: dict

Example

>>> import pandas as pd
>>> from evalml.data_checks.id_columns_data_check import IDColumnsDataCheck
>>> df = pd.DataFrame({
...     'df_id': [0, 1, 2, 3, 4],
...     'x': [10, 42, 31, 51, 61],
...     'y': [42, 54, 12, 64, 12]
... })
>>> id_col_check = IDColumnsDataCheck()
>>> assert id_col_check.validate(df) == {
...     "errors": [],
...     "warnings": [{"message": "Column 'df_id' is 100.0% or more likely to be an ID column",
...                   "data_check_name": "IDColumnsDataCheck",
...                   "level": "warning", "code": "HAS_ID_COLUMN",
...                   "details": {"column": "df_id"}}],
...     "actions": [{"code": "DROP_COL", "metadata": {"column": "df_id"}}]
... }
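
One way to act on the result is to collect the columns named in `DROP_COL` actions and drop them. The sketch below hard-codes a result dictionary in the shape shown in the example above rather than calling evalml, so it assumes only that shape; the variable names are illustrative.

```python
import pandas as pd

# Result dictionary copied from the documented example (not produced by a live call).
results = {
    "errors": [],
    "warnings": [{
        "message": "Column 'df_id' is 100.0% or more likely to be an ID column",
        "data_check_name": "IDColumnsDataCheck",
        "level": "warning",
        "code": "HAS_ID_COLUMN",
        "details": {"column": "df_id"},
    }],
    "actions": [{"code": "DROP_COL", "metadata": {"column": "df_id"}}],
}

df = pd.DataFrame({
    "df_id": [0, 1, 2, 3, 4],
    "x": [10, 42, 31, 51, 61],
    "y": [42, 54, 12, 64, 12],
})

# Gather every column that a DROP_COL action points at.
cols_to_drop = [
    action["metadata"]["column"]
    for action in results["actions"]
    if action["code"] == "DROP_COL"
]

# Drop the flagged columns, leaving the original frame untouched.
cleaned = df.drop(columns=cols_to_drop)
print(list(cleaned.columns))
```

Reading the column names from "actions" rather than parsing the warning messages keeps the code independent of the message wording.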
