invalid_targets_data_check
Data check that determines whether the target data contains missing or invalid values.
Module Contents
Classes Summary
InvalidTargetDataCheck – Check if the target data contains missing or invalid values.
Contents
class evalml.data_checks.invalid_targets_data_check.InvalidTargetDataCheck(problem_type, objective, n_unique=100)
Check if the target data contains missing or invalid values.
Parameters
problem_type (str or ProblemTypes) – The specific problem type to data check for, e.g. ‘binary’, ‘multiclass’, ‘regression’, ‘time series regression’.
objective (str or ObjectiveBase) – Name or instance of the objective class.
n_unique (int) – Number of unique target values to store when problem type is binary and target incorrectly has more than 2 unique values. Non-negative integer. If None, stores all unique values. Defaults to 100.
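The `n_unique` parameter only matters when a binary problem's target turns out to have more than two distinct values. The sketch below shows one way that truncation could work. `unique_values_to_store` and `check_binary_target` are hypothetical helpers, and keeping values in order of first appearance is an assumption; the library may order the stored values differently (for example, by frequency).

```python
from typing import Dict, Hashable, List, Optional, Sequence


def unique_values_to_store(y: Sequence[Hashable], n_unique: Optional[int] = 100) -> List[Hashable]:
    """Hypothetical helper: keep at most n_unique distinct target values (all of them if None)."""
    if n_unique is not None and n_unique < 0:
        raise ValueError("n_unique must be a non-negative integer or None")
    # dict.fromkeys drops duplicates while keeping first-seen order (an assumed ordering)
    distinct = list(dict.fromkeys(y))
    return distinct if n_unique is None else distinct[:n_unique]


def check_binary_target(y: Sequence[Hashable], n_unique: Optional[int] = 100) -> Optional[Dict[str, object]]:
    """Hypothetical check: return error details if a binary target has more than 2 classes."""
    num_classes = len(set(y))
    # Only the "more than 2 unique values" case described for n_unique is sketched here
    if num_classes <= 2:
        return None
    return {"num_classes": num_classes, "target_values": unique_values_to_store(y, n_unique)}


print(check_binary_target([0, 1, 2, 3, 1], n_unique=2))
# {'num_classes': 4, 'target_values': [0, 1]}
```

Passing `n_unique=None` keeps every distinct value, which can make the error details very large for a target that is actually continuous; the default of 100 bounds that size.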
Attributes
multiclass_continuous_threshold = 0.05
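The page lists `multiclass_continuous_threshold` without describing its role. The following is a minimal sketch, assuming the threshold is compared against the ratio of distinct target values to rows so that a "multiclass" target with very many classes can be flagged as possibly continuous. The helper `multiclass_target_issues` and the label `high_unique_ratio` are hypothetical. The `TARGET_MULTICLASS_NOT_ENOUGH_CLASSES` code and the two-class cutoff come from the Examples section.

```python
from typing import Hashable, List, Sequence

MULTICLASS_CONTINUOUS_THRESHOLD = 0.05  # the attribute value listed on this page


def multiclass_target_issues(y: Sequence[Hashable],
                             threshold: float = MULTICLASS_CONTINUOUS_THRESHOLD) -> List[str]:
    """Hypothetical multiclass checks; returns a list of issue labels."""
    issues: List[str] = []
    if len(y) == 0:
        return issues
    num_classes = len(set(y))
    # Too few classes: the Examples section reports this for a 2-class target
    if num_classes <= 2:
        issues.append("TARGET_MULTICLASS_NOT_ENOUGH_CLASSES")
    # Assumed use of the threshold: many distinct values per row hints at a continuous target
    if num_classes / len(y) >= threshold:
        issues.append("high_unique_ratio")  # hypothetical label
    return issues


print(multiclass_target_issues([i % 2 for i in range(50)]))  # ['TARGET_MULTICLASS_NOT_ENOUGH_CLASSES']
print(multiclass_target_issues(list(range(50))))             # ['high_unique_ratio']
```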
Methods
name – Return a name describing the data check.
validate – Check if the target data contains missing or invalid values.
name(cls)
Return a name describing the data check.
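In the Examples section, the `data_check_name` field of every message is the class name, which suggests that `name` reports the class's `__name__`. Here is a minimal sketch with hypothetical classes. Note that the real library may expose `name` as a class-level property rather than as a method you call.

```python
class DataCheckSketch:
    """Hypothetical stand-in for a data check base class."""

    @classmethod
    def name(cls) -> str:
        # Report the concrete class's own name, so subclasses need no override
        return cls.__name__


class InvalidTargetDataCheckSketch(DataCheckSketch):
    """Hypothetical subclass used only to show the inherited name."""


print(InvalidTargetDataCheckSketch.name())  # InvalidTargetDataCheckSketch
```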
validate(self, X, y)
Check if the target data contains missing or invalid values.
Parameters
X (pd.DataFrame, np.ndarray) – Features. Ignored.
y (pd.Series, np.ndarray) – Target data to check for invalid values.
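Returns
Dictionary with ‘warnings’, ‘errors’, and ‘actions’ lists. ‘errors’ holds a DataCheckError entry for each invalid condition found in the target data; ‘warnings’ and ‘actions’ hold any warnings and suggested fixes (such as imputing the target).
Return type
dict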
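To illustrate the shape of the returned dictionary, the sketch below reproduces the two null-related results from the Examples section, using plain Python lists in place of `pd.Series`. `null_target_result` and `is_null` are hypothetical. Defaulting `impute_strategy` to `'mean'` only mirrors the regression example; other problem types may receive a different strategy.

```python
from typing import Any, Dict, List, Sequence


def is_null(value: Any) -> bool:
    # Stand-in for pandas.isna on scalars: None, or NaN (the only float unequal to itself)
    return value is None or value != value


def null_target_result(y: Sequence[Any], impute_strategy: str = "mean") -> Dict[str, List[dict]]:
    """Hypothetical sketch of the null-related results of validate()."""
    results: Dict[str, List[dict]] = {"warnings": [], "errors": [], "actions": []}
    num_null_rows = sum(1 for value in y if is_null(value))
    base = {"data_check_name": "InvalidTargetDataCheck", "level": "error"}

    if len(y) == 0 or num_null_rows == len(y):
        results["errors"].append({"message": "Target is either empty or fully null.", **base,
                                  "details": {"columns": None, "rows": None},
                                  "code": "TARGET_IS_EMPTY_OR_FULLY_NULL"})
        return results

    if num_null_rows > 0:
        pct_null_rows = num_null_rows / len(y) * 100
        results["errors"].append({
            "message": f"{num_null_rows} row(s) ({pct_null_rows}%) of target values are null",
            **base,
            "details": {"columns": None, "rows": None,
                        "num_null_rows": num_null_rows, "pct_null_rows": pct_null_rows},
            "code": "TARGET_HAS_NULL",
        })
        # Suggested fix: impute the target column
        results["actions"].append({"code": "IMPUTE_COL",
                                   "metadata": {"columns": None, "rows": None, "is_target": True,
                                                "impute_strategy": impute_strategy}})
    return results


print(null_target_result([1, None, 3, None])["errors"][0]["message"])
# 2 row(s) (50.0%) of target values are null
```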
Examples
>>> import pandas as pd
>>> from evalml.data_checks.invalid_targets_data_check import InvalidTargetDataCheck
...
>>> X = pd.DataFrame({"col": [1, 2, 3, 1]})
>>> y = pd.Series(["cat_1", "cat_2", "cat_1", "cat_2"])
>>> target_check = InvalidTargetDataCheck('regression', 'R2')
>>> assert target_check.validate(X, y) == {
...     'warnings': [],
...     'errors': [{'message': 'Target is unsupported Unknown type. Valid Woodwork logical types include: integer, double, boolean',
...                 'data_check_name': 'InvalidTargetDataCheck',
...                 'level': 'error',
...                 'details': {'columns': None, 'rows': None, 'unsupported_type': 'unknown'},
...                 'code': 'TARGET_UNSUPPORTED_TYPE'},
...                {'message': 'Target data type should be numeric for regression type problems.',
...                 'data_check_name': 'InvalidTargetDataCheck',
...                 'level': 'error',
...                 'details': {'columns': None, 'rows': None},
...                 'code': 'TARGET_UNSUPPORTED_TYPE'}],
...     'actions': []}
...
...
>>> y = pd.Series([None, pd.NA, pd.NaT, None])
>>> assert target_check.validate(X, y) == {
...     'warnings': [],
...     'errors': [{'message': 'Target is either empty or fully null.',
...                 'data_check_name': 'InvalidTargetDataCheck',
...                 'level': 'error',
...                 'details': {'columns': None, 'rows': None},
...                 'code': 'TARGET_IS_EMPTY_OR_FULLY_NULL'}],
...     'actions': []}
...
...
>>> y = pd.Series([1, None, 3, None])
>>> assert target_check.validate(None, y) == {
...     'warnings': [],
...     'errors': [{'message': '2 row(s) (50.0%) of target values are null',
...                 'data_check_name': 'InvalidTargetDataCheck',
...                 'level': 'error',
...                 'details': {'columns': None,
...                             'rows': None,
...                             'num_null_rows': 2,
...                             'pct_null_rows': 50.0},
...                 'code': 'TARGET_HAS_NULL'}],
...     'actions': [{'code': 'IMPUTE_COL',
...                  'metadata': {'columns': None,
...                               'rows': None,
...                               'is_target': True,
...                               'impute_strategy': 'mean'}}]}
...
...
>>> X = pd.DataFrame([i for i in range(50)])
>>> y = pd.Series([i%2 for i in range(50)])
>>> target_check = InvalidTargetDataCheck('multiclass', 'Log Loss Multiclass')
>>> assert target_check.validate(X, y) == {
...     'warnings': [],
...     'errors': [{'message': 'Target has two or less classes, which is too few for multiclass problems. Consider changing to binary.',
...                 'data_check_name': 'InvalidTargetDataCheck',
...                 'level': 'error',
...                 'details': {'columns': None, 'rows': None, 'num_classes': 2},
...                 'code': 'TARGET_MULTICLASS_NOT_ENOUGH_CLASSES'}],
...     'actions': []}
...
...
>>> target_check = InvalidTargetDataCheck('regression', 'R2')
>>> X = pd.DataFrame([i for i in range(5)])
>>> y = pd.Series([1, 2, 4, 3], index=[1, 2, 4, 3])
>>> assert target_check.validate(X, y) == {
...     'warnings': [{'message': 'Input target and features have different lengths',
...                   'data_check_name': 'InvalidTargetDataCheck',
...                   'level': 'warning',
...                   'details': {'columns': None,
...                               'rows': None,
...                               'features_length': 5,
...                               'target_length': 4},
...                   'code': 'MISMATCHED_LENGTHS'},
...                  {'message': 'Input target and features have mismatched indices',
...                   'data_check_name': 'InvalidTargetDataCheck',
...                   'level': 'warning',
...                   'details': {'columns': None,
...                               'rows': None,
...                               'indices_not_in_features': [],
...                               'indices_not_in_target': [0]},
...                   'code': 'MISMATCHED_INDICES'}],
...     'errors': [],
...     'actions': []}
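The last example shows that a length mismatch and an index mismatch are reported as two independent warnings. Below is a sketch of that comparison, with plain index sequences standing in for pandas indexes. `mismatch_warnings` is hypothetical, and the library may order or truncate the listed indices differently.

```python
from typing import Dict, Hashable, List, Sequence


def mismatch_warnings(features_index: Sequence[Hashable],
                      target_index: Sequence[Hashable]) -> List[Dict]:
    """Hypothetical sketch of the MISMATCHED_LENGTHS / MISMATCHED_INDICES warnings."""
    base = {"data_check_name": "InvalidTargetDataCheck", "level": "warning"}
    found: List[Dict] = []
    # The length check and the index check run independently, so both can fire at once
    if len(features_index) != len(target_index):
        found.append({"message": "Input target and features have different lengths", **base,
                      "details": {"columns": None, "rows": None,
                                  "features_length": len(features_index),
                                  "target_length": len(target_index)},
                      "code": "MISMATCHED_LENGTHS"})
    features_set, target_set = set(features_index), set(target_index)
    if features_set != target_set:
        found.append({"message": "Input target and features have mismatched indices", **base,
                      "details": {"columns": None, "rows": None,
                                  # Keep each index's original order when listing differences
                                  "indices_not_in_features": [i for i in target_index if i not in features_set],
                                  "indices_not_in_target": [i for i in features_index if i not in target_set]},
                      "code": "MISMATCHED_INDICES"})
    return found


for w in mismatch_warnings(list(range(5)), [1, 2, 4, 3]):
    print(w["code"], w["details"])
# MISMATCHED_LENGTHS {'columns': None, 'rows': None, 'features_length': 5, 'target_length': 4}
# MISMATCHED_INDICES {'columns': None, 'rows': None, 'indices_not_in_features': [], 'indices_not_in_target': [0]}
```

Comparing the indexes as sets means that a target in a different row order (for example, index `[1, 0]` against `[0, 1]`) raises no warning.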