invalid_targets_data_check
Module Contents
Classes Summary
InvalidTargetDataCheck | Checks if the target data contains missing or invalid values.
Contents
class evalml.data_checks.invalid_targets_data_check.InvalidTargetDataCheck(problem_type, objective, n_unique=100)
Checks if the target data contains missing or invalid values.
- Parameters
problem_type (str or ProblemTypes) – The problem type to run the data check for, e.g. ‘binary’, ‘multiclass’, ‘regression’, ‘time series regression’.
objective (str or ObjectiveBase) – Name or instance of the objective class.
n_unique (int) – Number of unique target values to store when problem type is binary and target incorrectly has more than 2 unique values. Non-negative integer. If None, stores all unique values. Defaults to 100.
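The effect of n_unique can be pictured with a small pure-Python sketch. It is not evalml's implementation: the helper name summarize_unique_values is hypothetical, and the sketch only assumes that "storing" unique values means keeping the first n_unique distinct values in the order they appear.

```python
def summarize_unique_values(target, n_unique=100):
    """Return the distinct values of `target`, keeping at most `n_unique`.

    Hypothetical helper for illustration only, not part of evalml.
    `n_unique=None` keeps every distinct value.
    """
    if n_unique is not None and n_unique < 0:
        raise ValueError("n_unique must be a non-negative integer or None")
    # dict.fromkeys drops duplicates while preserving first-seen order.
    unique_values = list(dict.fromkeys(target))
    if n_unique is None:
        return unique_values
    return unique_values[:n_unique]


# A "binary" target that incorrectly holds four distinct values.
target = [0, 1, 2, 1, 3, 0]
print(summarize_unique_values(target, n_unique=2))     # [0, 1]
print(summarize_unique_values(target, n_unique=None))  # [0, 1, 2, 3]
```

Capping the stored values keeps the error details small when a mislabeled target has thousands of distinct values.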
Attributes
multiclass_continuous_threshold | 0.05
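This page does not say how multiclass_continuous_threshold is applied. One plausible reading, assumed here and not confirmed by the text above, is that a multiclass target is flagged as possibly continuous when the ratio of distinct values to rows reaches the threshold. The helper looks_continuous below is hypothetical and only sketches that assumption.

```python
def looks_continuous(target, threshold=0.05):
    """Return True if the share of distinct values is at or above `threshold`.

    Hypothetical helper: assumes the threshold is compared against
    (number of distinct values) / (number of non-null rows).
    """
    values = [v for v in target if v is not None]
    if not values:
        return False
    ratio = len(set(values)) / len(values)
    return ratio >= threshold


# 3 classes over 100 rows: ratio 0.03, below the 0.05 threshold.
few_classes = [i % 3 for i in range(100)]
# 50 distinct values over 100 rows: ratio 0.5, above the threshold.
many_values = [i % 50 for i in range(100)]

print(looks_continuous(few_classes))  # False
print(looks_continuous(many_values))  # True
```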
Methods
name | Returns a name describing the data check.
validate | Checks if the target data contains missing or invalid values.
name(cls)
Returns a name describing the data check.
validate(self, X, y)
Checks if the target data contains missing or invalid values.
- Parameters
X (pd.DataFrame, np.ndarray) – Features. Ignored.
y (pd.Series, np.ndarray) – Target data to check for invalid values.
- Returns
Dictionary with "errors", "warnings", and "actions" keys. The "errors" list holds DataCheckErrors if any invalid values are found in the target data.
- Return type
dict (DataCheckError)
Example
>>> import pandas as pd
>>> from evalml.data_checks.invalid_targets_data_check import InvalidTargetDataCheck
>>> X = pd.DataFrame({"col": [1, 2, 3, 1]})
>>> y = pd.Series([0, 1, None, None])
>>> target_check = InvalidTargetDataCheck('binary', 'Log Loss Binary')
>>> assert target_check.validate(X, y) == {
...     "errors": [{"message": "2 row(s) (50.0%) of target values are null",
...                 "data_check_name": "InvalidTargetDataCheck",
...                 "level": "error",
...                 "code": "TARGET_HAS_NULL",
...                 "details": {"num_null_rows": 2, "pct_null_rows": 50}}],
...     "warnings": [],
...     "actions": [{"code": "IMPUTE_COL",
...                  "metadata": {"column": None,
...                               "impute_strategy": "most_frequent",
...                               "is_target": True}}]}
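To show where the numbers in the example come from, here is a minimal pure-Python sketch of the null-target part of the check. It is not evalml's code: the helpers is_null and summarize_null_targets are hypothetical, and the result dictionary simply mirrors the shape shown in the example above.

```python
import math


def is_null(value):
    """Treat None and float NaN as missing target values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize_null_targets(target):
    """Build a result dictionary shaped like the documented example.

    Hypothetical helper: covers only the null-value case.
    """
    result = {"errors": [], "warnings": [], "actions": []}
    num_null = sum(1 for v in target if is_null(v))
    if num_null == 0:
        return result
    pct_null = num_null / len(target) * 100
    result["errors"].append({
        "message": f"{num_null} row(s) ({pct_null}%) of target values are null",
        "data_check_name": "InvalidTargetDataCheck",
        "level": "error",
        "code": "TARGET_HAS_NULL",
        "details": {"num_null_rows": num_null, "pct_null_rows": pct_null},
    })
    # Mirror the suggested action from the example: impute the target.
    result["actions"].append({
        "code": "IMPUTE_COL",
        "metadata": {"column": None,
                     "impute_strategy": "most_frequent",
                     "is_target": True},
    })
    return result


result = summarize_null_targets([0, 1, None, None])
print(result["errors"][0]["message"])
# 2 row(s) (50.0%) of target values are null
```

Two of the four target values are missing, which gives num_null_rows of 2 and pct_null_rows of 50.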