outliers_data_check
Module Contents
Classes Summary
OutliersDataCheck | Checks if there are any outliers in input data by using IQR to determine score anomalies. Columns with score anomalies are considered to contain outliers.
Contents
class evalml.data_checks.outliers_data_check.OutliersDataCheck
Checks if there are any outliers in input data by using the interquartile range (IQR) to determine score anomalies. Columns with score anomalies are considered to contain outliers.
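To make the IQR idea concrete, here is a minimal sketch of the conventional box-plot rule, under which a value is anomalous if it lies more than 1.5 × IQR below the first quartile or above the third. This is an assumption for illustration: the helper `iqr_outlier_columns` and the `k` parameter are hypothetical, and the scoring that OutliersDataCheck actually performs may differ.

```python
import pandas as pd


def iqr_outlier_columns(df: pd.DataFrame, k: float = 1.5) -> list:
    """Return names of numeric columns with a value outside the IQR fences.

    Hypothetical helper for illustration only; not part of evalml.
    """
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)  # first quartile, per column
    q3 = numeric.quantile(0.75)  # third quartile, per column
    iqr = q3 - q1
    lower = q1 - k * iqr  # lower fence (assumed k = 1.5 convention)
    upper = q3 + k * iqr  # upper fence
    # Boolean frame: True where a value falls outside its column's fences.
    outside = numeric.lt(lower, axis="columns") | numeric.gt(upper, axis="columns")
    return [col for col in numeric.columns if outside[col].any()]


df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [6, 7, 8, 9, 10],
    "z": [-1, -2, -3, -1201, -4],
})
flagged = iqr_outlier_columns(df)
print(flagged)
```

In this data only column z contains a value (-1201) far outside its fences, so it is the only column the sketch flags.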
Methods
name | Returns a name describing the data check.
validate | Checks if there are any outliers in a dataframe by using IQR to determine column anomalies. Columns with anomalies are considered to contain outliers.
name(cls)
Returns a name describing the data check.
validate(self, X, y=None)
Checks if there are any outliers in a dataframe by using IQR to determine column anomalies. Columns with anomalies are considered to contain outliers.
- Parameters
X (pd.DataFrame, np.ndarray) – Features
y (pd.Series, np.ndarray) – Ignored.
- Returns
A dictionary with warnings if any columns have outliers.
- Return type
dict
Example
>>> import pandas as pd
>>> from evalml.data_checks.outliers_data_check import OutliersDataCheck
>>> df = pd.DataFrame({
...     'x': [1, 2, 3, 4, 5],
...     'y': [6, 7, 8, 9, 10],
...     'z': [-1, -2, -3, -1201, -4]
... })
>>> outliers_check = OutliersDataCheck()
>>> assert outliers_check.validate(df) == {
...     "warnings": [{"message": "Column(s) 'z' are likely to have outlier data.",
...                   "data_check_name": "OutliersDataCheck",
...                   "level": "warning",
...                   "code": "HAS_OUTLIERS",
...                   "details": {"columns": ["z"]}}],
...     "errors": [],
...     "actions": []
... }
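The returned dictionary can be inspected programmatically. The sketch below copies the result literal from the example above rather than calling evalml, and pulls out the flagged column names; the helper `columns_with_outliers` is hypothetical and assumes the dictionary layout shown in that example.

```python
# Result dictionary copied from the documented example above.
result = {
    "warnings": [{
        "message": "Column(s) 'z' are likely to have outlier data.",
        "data_check_name": "OutliersDataCheck",
        "level": "warning",
        "code": "HAS_OUTLIERS",
        "details": {"columns": ["z"]},
    }],
    "errors": [],
    "actions": [],
}


def columns_with_outliers(result: dict) -> list:
    """Collect column names from HAS_OUTLIERS warnings (hypothetical helper)."""
    columns = []
    for warning in result.get("warnings", []):
        if warning.get("code") == "HAS_OUTLIERS":
            columns.extend(warning.get("details", {}).get("columns", []))
    return columns


outlier_columns = columns_with_outliers(result)
print(outlier_columns)
```

Filtering on the "code" field rather than on the message text keeps the lookup stable if the wording of the warning changes.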