evalml.guardrails.detect_id_columns

evalml.guardrails.detect_id_columns(X, threshold=1.0)[source]

Check if any of the features are ID columns. Currently performs these simple checks:

  • column name is “id”

  • column name ends in “_id”

  • column contains all unique values (and is not float / boolean)

Parameters
  • X (pd.DataFrame) – The input features to check

  • threshold (float) – the probability threshold to be considered an ID column. Defaults to 1.0

Returns

A dictionary of features with column name or index and their probability of being ID columns

Example

>>> df = pd.DataFrame({
...     'df_id': [0, 1, 2, 3, 4],
...     'x': [10, 42, 31, 51, 61],
...     'y': [42, 54, 12, 64, 12]
... })
>>> detect_id_columns(df)
{'df_id': 1.0}