outliers_data_check
Data check that checks if there are any outliers in input data by using IQR to determine score anomalies.
Module Contents
Classes Summary
OutliersDataCheck | Checks if there are any outliers in input data by using IQR to determine score anomalies.
Contents
class evalml.data_checks.outliers_data_check.OutliersDataCheck
Checks if there are any outliers in input data by using the interquartile range (IQR) to determine score anomalies.
Columns with score anomalies are considered to contain outliers.
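As a rough illustration of the IQR idea (not EvalML's actual implementation), a column can be flagged when any of its values falls below Q1 - k * IQR or above Q3 + k * IQR. The sketch below assumes the conventional multiplier k = 1.5 and pandas' default linear quantile interpolation; the function name iqr_outlier_columns is hypothetical, and the thresholds EvalML applies internally may differ.

```python
import pandas as pd


def iqr_outlier_columns(df: pd.DataFrame, k: float = 1.5) -> list:
    """Return the numeric columns with at least one value outside the IQR fences.

    Hypothetical helper for illustration only; k = 1.5 is an assumed,
    conventional multiplier, not a value taken from EvalML.
    """
    flagged = []
    for name in df.select_dtypes(include="number").columns:
        col = df[name].dropna()
        q1, q3 = col.quantile(0.25), col.quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        # A single value beyond either fence is enough to flag the column.
        if ((col < lower) | (col > upper)).any():
            flagged.append(name)
    return flagged


df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [6, 7, 8, 9, 10],
    "z": [-1, -2, -3, -1201, -4],
})
print(iqr_outlier_columns(df))  # ['z']
```

For column z the quartiles are -4 and -2, so the fences sit at -7 and 1, and -1201 falls well below the lower fence.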
Methods
name | Return a name describing the data check.
validate | Check if there are any outliers in a dataframe by using IQR to determine column anomalies. Columns with anomalies are considered to contain outliers.
name(cls)
Return a name describing the data check.
validate(self, X, y=None)
Check if there are any outliers in a dataframe by using IQR to determine column anomalies. Columns with anomalies are considered to contain outliers.
- Parameters
X (pd.DataFrame, np.ndarray) – Input features.
y (pd.Series, np.ndarray) – Ignored. Defaults to None.
- Returns
A dictionary with warnings if any columns have outliers.
- Return type
dict
Example
>>> import pandas as pd
>>> from evalml.data_checks.outliers_data_check import OutliersDataCheck
>>> df = pd.DataFrame({
...     'x': [1, 2, 3, 4, 5],
...     'y': [6, 7, 8, 9, 10],
...     'z': [-1, -2, -3, -1201, -4]
... })
>>> outliers_check = OutliersDataCheck()
>>> assert outliers_check.validate(df) == {
...     "warnings": [{"message": "Column(s) 'z' are likely to have outlier data.",
...                   "data_check_name": "OutliersDataCheck",
...                   "level": "warning",
...                   "code": "HAS_OUTLIERS",
...                   "details": {"columns": ["z"]}}],
...     "errors": [],
...     "actions": []}
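The returned dictionary can be consumed with plain Python. The sketch below assumes the result shape shown in the example above ("warnings" entries carrying "code" and "details" keys) and uses a hand-written copy of that result so it runs without EvalML installed; the helper name columns_with_outliers is hypothetical.

```python
# Hand-written copy of the dictionary from the example above.
result = {
    "warnings": [{
        "message": "Column(s) 'z' are likely to have outlier data.",
        "data_check_name": "OutliersDataCheck",
        "level": "warning",
        "code": "HAS_OUTLIERS",
        "details": {"columns": ["z"]},
    }],
    "errors": [],
    "actions": [],
}


def columns_with_outliers(result: dict) -> list:
    """Collect the column names reported by HAS_OUTLIERS warnings."""
    columns = []
    for warning in result.get("warnings", []):
        if warning.get("code") == "HAS_OUTLIERS":
            columns.extend(warning["details"]["columns"])
    return columns


print(columns_with_outliers(result))  # ['z']
```

Filtering on the "code" field rather than on the message text keeps the lookup independent of the wording of the warning.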