multicollinearity_data_check
Data check to check if any set of features are likely to be multicollinear.
Module Contents
Classes Summary
MulticollinearityDataCheck – Check if any set of features are likely to be multicollinear.
Contents
class evalml.data_checks.multicollinearity_data_check.MulticollinearityDataCheck(threshold=0.9)
Check if any set of features are likely to be multicollinear.
- Parameters
threshold (float) – The threshold at or above which a set of features is flagged as likely to be multicollinear. Defaults to 0.9.
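The idea of thresholding pairwise feature dependence can be sketched as follows. This is only an illustration under stated assumptions: it uses absolute Pearson correlation from pandas, which may not be the measure this data check uses internally, and `find_correlated_pairs` is a hypothetical helper, not part of evalml.

```python
import itertools

import pandas as pd


def find_correlated_pairs(X: pd.DataFrame, threshold: float = 0.9):
    """Hypothetical helper: return column pairs whose |Pearson r| >= threshold."""
    # Assumption: Pearson correlation stands in for whatever dependence
    # measure the real data check computes.
    corr = X.corr().abs()
    pairs = []
    for a, b in itertools.combinations(X.columns, 2):
        if corr.loc[a, b] >= threshold:
            pairs.append((a, b))
    return pairs


col = pd.Series([1, 0, 2, 3, 4])
X = pd.DataFrame({"col_1": col, "col_2": col * 3, "col_3": [5, 1, 4, 1, 3]})

# col_2 is an exact multiple of col_1; col_3 is uncorrelated with both.
pairs = find_correlated_pairs(X, threshold=0.9)
print(pairs)
```

Lowering the threshold in this sketch flags more pairs, and raising it toward 1.0 restricts the result to nearly exact linear relationships.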
Methods
name – Return a name describing the data check.
validate – Check if any set of features are likely to be multicollinear.
name(cls)
Return a name describing the data check.
validate(self, X, y=None)
Check if any set of features are likely to be multicollinear.
- Parameters
X (pd.DataFrame) – The input features to check.
y (pd.Series) – The target. Ignored.
- Returns
dict with a DataCheckWarning if there are any potentially multicollinear columns.
- Return type
dict
Example
>>> import pandas as pd
>>> from evalml.data_checks import MulticollinearityDataCheck
...
>>> col = pd.Series([1, 0, 2, 3, 4])
>>> X = pd.DataFrame({"col_1": col, "col_2": col * 3})
>>> y = pd.Series([1, 0, 0, 1, 0])
...
>>> multicollinearity_check = MulticollinearityDataCheck(threshold=1.0)
>>> assert multicollinearity_check.validate(X, y) == {
...     "errors": [],
...     "warnings": [{'message': "Columns are likely to be correlated: [('col_1', 'col_2')]",
...                   "data_check_name": "MulticollinearityDataCheck",
...                   "level": "warning",
...                   "code": "IS_MULTICOLLINEAR",
...                   'details': {"columns": [('col_1', 'col_2')], "rows": None}}],
...     "actions": []}
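A result shaped like the one in the example above could be consumed as sketched here. The dict literal is copied from that example so the snippet runs without evalml installed; `multicollinear_pairs` is a hypothetical helper introduced only for illustration.

```python
# Result copied from the example above; in practice it would come from
# multicollinearity_check.validate(X, y).
result = {
    "errors": [],
    "warnings": [{
        "message": "Columns are likely to be correlated: [('col_1', 'col_2')]",
        "data_check_name": "MulticollinearityDataCheck",
        "level": "warning",
        "code": "IS_MULTICOLLINEAR",
        "details": {"columns": [("col_1", "col_2")], "rows": None},
    }],
    "actions": [],
}


def multicollinear_pairs(result):
    """Hypothetical helper: collect flagged column groups from a validate() result."""
    pairs = []
    for warning in result["warnings"]:
        # Only pick up warnings emitted with the multicollinearity code.
        if warning["code"] == "IS_MULTICOLLINEAR":
            pairs.extend(warning["details"]["columns"])
    return pairs


flagged = multicollinear_pairs(result)
print(flagged)
```

Filtering on the `code` field rather than the message text keeps the lookup stable if the wording of the message changes.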