evalml.guardrails.detect_label_leakage

evalml.guardrails.detect_label_leakage(X, y, threshold=0.95)

Check if any of the features are highly correlated with the target.

Currently, only binary and numeric targets and features are supported.

Parameters

X (pd.DataFrame) – The input features to check.
y (pd.Series) – The target labels.
threshold (float) – The correlation threshold for a feature to be considered leakage. Defaults to 0.95.

Returns

leakage, a dictionary mapping each feature with leakage to its correlation with the target

Example

>>> import pandas as pd
>>> from evalml.guardrails import detect_label_leakage
>>> X = pd.DataFrame({
...    'leak': [10, 42, 31, 51, 61],
...    'x': [42, 54, 12, 64, 12],
...    'y': [12, 5, 13, 74, 24],
... })
>>> y = pd.Series([10, 42, 31, 51, 40])
>>> detect_label_leakage(X, y, threshold=0.8)
{'leak': 0.8827072320669518}
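
To make the check concrete, the following is a minimal sketch of how a correlation-based leakage test could be written with pandas alone. It is not evalml's implementation: the function name sketch_label_leakage is hypothetical, and the use of Pearson correlation, the absolute value, and the >= comparison are all assumptions made for illustration.

```python
import pandas as pd


def sketch_label_leakage(X, y, threshold=0.95):
    """Hypothetical stand-in for illustration; not evalml's implementation."""
    # Keep only numeric and boolean columns, mirroring the documented limitation.
    numeric = X.select_dtypes(include=["number", "bool"])
    if numeric.shape[1] == 0:
        return {}
    # Pearson correlation of each column with the target. Taking the absolute
    # value (an assumption) also flags strong negative correlation.
    corrs = numeric.astype(float).corrwith(y.astype(float)).abs()
    # Treating values equal to the threshold as leakage is also an assumption.
    return corrs[corrs >= threshold].to_dict()


X = pd.DataFrame({
    "leak": [10, 42, 31, 51, 61],
    "x": [42, 54, 12, 64, 12],
    "y": [12, 5, 13, 74, 24],
})
y = pd.Series([10, 42, 31, 51, 40])

# Only the 'leak' column correlates with the target strongly enough to be reported.
result = sketch_label_leakage(X, y, threshold=0.8)
print(result)
```

With the data from the example above, only the leak column clears the 0.8 threshold; the other two columns correlate with the target far more weakly and are left out of the result.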