evalml.data_checks.TargetLeakageDataCheck.validate

TargetLeakageDataCheck.validate(X, y)

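Check whether any features are highly correlated with the target, that is, whether a feature's correlation with the target meets or exceeds the check's pct_corr_threshold.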

Currently supports only binary and numeric targets and features.
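The idea behind the check can be approximated in plain pandas. The sketch below is an illustration under stated assumptions, not evalml's implementation. It assumes the correlation measure is Pearson (the default for pandas' `Series.corr`) and that a column is flagged when the absolute correlation is at or above the threshold. `find_leaky_columns` is a hypothetical helper introduced only for this example.

```python
import pandas as pd


def find_leaky_columns(X: pd.DataFrame, y: pd.Series, pct_corr_threshold: float) -> list:
    """Hypothetical helper (not part of evalml): return the names of columns
    whose absolute Pearson correlation with y is >= pct_corr_threshold."""
    leaky = []
    for name, column in X.items():
        # Series.corr uses Pearson correlation by default.
        corr = column.corr(y)
        # A NaN correlation (e.g. a constant column) never compares >= the
        # threshold, so such columns are not flagged.
        if abs(corr) >= pct_corr_threshold:
            leaky.append(name)
    return leaky


X = pd.DataFrame({
    "leak": [10, 42, 31, 51, 61],
    "x": [42, 54, 12, 64, 12],
    "y": [12, 5, 13, 74, 24],
})
y = pd.Series([10, 42, 31, 51, 40])
print(find_leaky_columns(X, y, pct_corr_threshold=0.8))  # ['leak']
```

On this data, 'leak' has a Pearson correlation of roughly 0.88 with the target, while 'x' and 'y' are well below 0.8, so only 'leak' is flagged. Raising the threshold to 0.9 in this sketch flags nothing.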

Parameters
  • X (pd.DataFrame) – The input features to check

  • y (pd.Series) – The target data

Returns

List with a DataCheckWarning if target leakage is detected.

Return type

list (DataCheckWarning)

Example

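>>> import pandas as pd
>>> from evalml.data_checks import TargetLeakageDataCheck, DataCheckWarning
>>> X = pd.DataFrame({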
...    'leak': [10, 42, 31, 51, 61],
...    'x': [42, 54, 12, 64, 12],
...    'y': [12, 5, 13, 74, 24],
... })
>>> y = pd.Series([10, 42, 31, 51, 40])
>>> target_leakage_check = TargetLeakageDataCheck(pct_corr_threshold=0.8)
>>> assert target_leakage_check.validate(X, y) == [DataCheckWarning("Column 'leak' is 80.0% or more correlated with the target", "TargetLeakageDataCheck")]