evalml.data_checks.OutliersDataCheck.validate

OutliersDataCheck.validate(X, y=None)

Checks whether a dataframe contains outliers, using the interquartile range (IQR) to detect column anomalies. Columns with anomalies are considered to contain outliers.
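To illustrate the general idea of an IQR-based check, here is a minimal standard-library sketch. It is not evalml's implementation: the helper name `iqr_outlier_columns`, the fence multiplier `k=1.5` (the conventional Tukey value), and the choice of quartile method are all assumptions made for illustration, and the library's internal rule may differ.

```python
from statistics import quantiles


def iqr_outlier_columns(columns, k=1.5):
    """Return names of columns with at least one value outside the IQR fences.

    Hypothetical helper for illustration only; `k` is an assumed multiplier.
    """
    flagged = []
    for name, values in columns.items():
        # Quartiles with linear interpolation between data points.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        # A column is flagged if any value falls outside [lower, upper].
        if any(v < lower or v > upper for v in values):
            flagged.append(name)
    return flagged


data = {
    "x": [1, 2, 3, 4, 5],
    "y": [6, 7, 8, 9, 10],
    "z": [-1, -2, -3, -1201, -4],
}
flagged = iqr_outlier_columns(data)
print(flagged)  # ['z']
```

For column `z`, the quartiles are -4 and -2, so the fences are -7 and 1, and -1201 falls well below the lower fence.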

Parameters
  • X (pd.DataFrame, np.ndarray) – Features

  • y (pd.Series, np.ndarray) – Ignored.

Returns

A dictionary with warnings if any columns have outliers.

Return type

dict

Example

>>> import pandas as pd
>>> from evalml.data_checks import OutliersDataCheck
>>> df = pd.DataFrame({
...     'x': [1, 2, 3, 4, 5],
...     'y': [6, 7, 8, 9, 10],
...     'z': [-1, -2, -3, -1201, -4]
... })
>>> outliers_check = OutliersDataCheck()
>>> assert outliers_check.validate(df) == {
...     "warnings": [{"message": "Column(s) 'z' are likely to have outlier data.",
...                   "data_check_name": "OutliersDataCheck",
...                   "level": "warning",
...                   "code": "HAS_OUTLIERS",
...                   "details": {"columns": ["z"]}}],
...     "errors": [],
...     "actions": []}
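The returned dictionary can be inspected with ordinary dict operations. The sketch below pulls the flagged column names out of a result shaped like the one in the example above. The helper `columns_with_outliers` is hypothetical (not part of evalml), and the code operates on a literal copy of that dictionary so that it runs without the library installed.

```python
def columns_with_outliers(results):
    """Collect column names from HAS_OUTLIERS warnings (hypothetical helper)."""
    cols = []
    for warning in results.get("warnings", []):
        if warning.get("code") == "HAS_OUTLIERS":
            cols.extend(warning.get("details", {}).get("columns", []))
    return cols


# Literal copy of the dictionary shown in the example above.
results = {
    "warnings": [{"message": "Column(s) 'z' are likely to have outlier data.",
                  "data_check_name": "OutliersDataCheck",
                  "level": "warning",
                  "code": "HAS_OUTLIERS",
                  "details": {"columns": ["z"]}}],
    "errors": [],
    "actions": [],
}
outlier_cols = columns_with_outliers(results)
print(outlier_cols)  # ['z']
```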