outliers_data_check

Module Contents

Classes Summary

OutliersDataCheck

Checks whether the input data contains outliers, using the interquartile range (IQR) to flag anomalous values. Columns with flagged values are considered to contain outliers.

Contents

class evalml.data_checks.outliers_data_check.OutliersDataCheck

Checks whether the input data contains outliers, using the interquartile range (IQR) to flag anomalous values. Columns with flagged values are considered to contain outliers.

Methods

name

Returns a name describing the data check.

validate

Checks whether a dataframe contains outliers, using the interquartile range (IQR) to identify column anomalies. Columns with anomalies are considered to contain outliers.

name(cls)

Returns a name describing the data check.

validate(self, X, y=None)

Checks whether a dataframe contains outliers, using the interquartile range (IQR) to identify column anomalies. Columns with anomalies are considered to contain outliers.

Parameters
  • X (pd.DataFrame, np.ndarray) – Features.

  • y (pd.Series, np.ndarray) – Ignored.

Returns

A dictionary with warnings if any columns have outliers.

Return type

dict
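
As a rough illustration of consuming the returned dictionary, the sketch below pulls the flagged column names out of the warnings. The `result` literal is copied from the output shown in this method's example rather than produced by running evalml, and the name `outlier_columns` is hypothetical.

```python
# Result dictionary in the shape shown in the validate example
# (copied as a literal here so the sketch runs without evalml installed).
result = {
    "warnings": [{
        "message": "Column(s) 'z' are likely to have outlier data.",
        "data_check_name": "OutliersDataCheck",
        "level": "warning",
        "code": "HAS_OUTLIERS",
        "details": {"columns": ["z"]},
    }],
    "errors": [],
    "actions": [],
}

# Collect every column named in a HAS_OUTLIERS warning.
outlier_columns = [
    column
    for warning in result["warnings"]
    if warning["code"] == "HAS_OUTLIERS"
    for column in warning["details"]["columns"]
]
```

Filtering on the `code` field rather than parsing the message string keeps the lookup independent of the message wording.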

Example

>>> import pandas as pd
>>> from evalml.data_checks.outliers_data_check import OutliersDataCheck
>>> df = pd.DataFrame({
...     'x': [1, 2, 3, 4, 5],
...     'y': [6, 7, 8, 9, 10],
...     'z': [-1, -2, -3, -1201, -4]
... })
>>> outliers_check = OutliersDataCheck()
>>> assert outliers_check.validate(df) == {
...     "warnings": [{"message": "Column(s) 'z' are likely to have outlier data.",
...                   "data_check_name": "OutliersDataCheck",
...                   "level": "warning",
...                   "code": "HAS_OUTLIERS",
...                   "details": {"columns": ["z"]}}],
...     "errors": [],
...     "actions": []}
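
The IQR rule that validate describes can be sketched with pandas alone. This is a minimal illustration, not evalml's implementation: the helper name `columns_with_iqr_outliers` is hypothetical, and the fence multiplier `k=1.5` is the conventional Tukey value, assumed here because this page does not state which multiplier the library uses.

```python
import pandas as pd


def columns_with_iqr_outliers(df, k=1.5):
    """Return names of numeric columns with any value outside the IQR fences.

    Hypothetical helper for illustration only. The multiplier k=1.5 is an
    assumption (the conventional Tukey fence), not a documented evalml value.
    """
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr  # per-column lower fence
    upper = q3 + k * iqr  # per-column upper fence
    # Compare each column against its own fences.
    outside = numeric.lt(lower, axis="columns") | numeric.gt(upper, axis="columns")
    return [str(col) for col in numeric.columns if outside[col].any()]


# Same data as the example above.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [6, 7, 8, 9, 10],
    "z": [-1, -2, -3, -1201, -4],
})
flagged = columns_with_iqr_outliers(df)
```

For column `z` the quartiles are -4 and -2, so the IQR is 2 and the fences with `k=1.5` are -7 and 1. The value -1201 falls below the lower fence, so `z` is the only column flagged.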