evalml.guardrails.detect_outliers

evalml.guardrails.detect_outliers(X, random_state=0)

Checks a dataframe for outliers. An Isolation Forest first assigns an anomaly score to each index; the interquartile range (IQR) of those scores is then used to identify anomalous scores. Indices with anomalous scores are considered outliers.

Parameters

X (pd.DataFrame) – features

random_state (int) – seed for the random number generator; defaults to 0

Returns

A list of indices that may have outlier data.

Example

>>> import pandas as pd
>>> from evalml.guardrails import detect_outliers
>>> df = pd.DataFrame({
...     'x': [1, 2, 3, 40, 5],
...     'y': [6, 7, 8, 990, 10],
...     'z': [-1, -2, -3, -1201, -4]
... })
>>> detect_outliers(df)
[3]
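
The two-step procedure described above (score each row with an Isolation Forest, then apply an IQR rule to the scores) can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: the function name `sketch_detect_outliers`, the IQR multiplier `k=1.5`, and the default `IsolationForest` settings are all hypothetical choices made for this example, so its results may differ from those of `detect_outliers`.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest


def sketch_detect_outliers(X, random_state=0, k=1.5):
    """Illustrative sketch only; the name and `k` are assumptions."""
    # Step 1: fit an Isolation Forest and score every row.
    # Lower decision_function values mean a row is more anomalous.
    forest = IsolationForest(random_state=random_state)
    forest.fit(X)
    scores = pd.Series(forest.decision_function(X), index=X.index)

    # Step 2: build IQR fences around the scores themselves.
    q1 = scores.quantile(0.25)
    q3 = scores.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr

    # Rows whose score falls outside the fences are flagged as outliers.
    flagged = (scores < lower) | (scores > upper)
    return scores[flagged].index.tolist()


df = pd.DataFrame({
    'x': [1, 2, 3, 40, 5],
    'y': [6, 7, 8, 990, 10],
    'z': [-1, -2, -3, -1201, -4],
})
print(sketch_detect_outliers(df))
```

Applying the IQR rule to the anomaly scores, rather than to each feature column, means a row is judged on how unusual it is across all features at once.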