invalid_targets_data_check

Module Contents

Classes Summary

  • InvalidTargetDataCheck – Checks if the target data contains missing or invalid values.

Contents

class evalml.data_checks.invalid_targets_data_check.InvalidTargetDataCheck(problem_type, objective, n_unique=100)

Checks if the target data contains missing or invalid values.

Parameters
  • problem_type (str or ProblemTypes) – The specific problem type to data check for, e.g. ‘binary’, ‘multiclass’, ‘regression’, ‘time series regression’.

  • objective (str or ObjectiveBase) – Name or instance of the objective class.

  • n_unique (int) – Number of unique target values to store when problem type is binary and target incorrectly has more than 2 unique values. Non-negative integer. If None, stores all unique values. Defaults to 100.
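
To illustrate what the n_unique cap means, here is a minimal pure-Python sketch. It is not evalml's implementation: the helper name unique_values_to_store, the most-frequent-first ordering, and the ValueError for negative input are all assumptions made for illustration.

```python
from collections import Counter


def unique_values_to_store(target, n_unique=100):
    """Hypothetical helper: return up to n_unique distinct non-null target values.

    Ordering (most frequent first) is an assumption, not documented behavior.
    """
    if n_unique is not None and n_unique < 0:
        # Assumption: reject negative caps, since the docs require a non-negative integer.
        raise ValueError("n_unique must be a non-negative integer or None")
    counts = Counter(v for v in target if v is not None)
    ordered = [value for value, _ in counts.most_common()]
    # None means "keep every unique value"; otherwise truncate to the cap.
    return ordered if n_unique is None else ordered[:n_unique]


# A "binary" target that incorrectly contains three distinct values.
target = [0, 1, 2, 2, 1, 2]
print(unique_values_to_store(target, n_unique=2))     # [2, 1]
print(unique_values_to_store(target, n_unique=None))  # [2, 1, 0]
```

A small cap keeps the stored details compact when a target has many distinct values, while None keeps every value for inspection.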

Attributes

  • multiclass_continuous_threshold – 0.05

Methods

  • name – Returns a name describing the data check.

  • validate – Checks if the target data contains missing or invalid values.

name(cls)

Returns a name describing the data check.

validate(self, X, y)

Checks if the target data contains missing or invalid values.

Parameters
  • X (pd.DataFrame, np.ndarray) – Features. Ignored.

  • y (pd.Series, np.ndarray) – Target data to check for invalid values.

Returns

Dictionary with DataCheckErrors if any invalid values are found in the target data.

Return type

dict (DataCheckError)

Example

>>> import pandas as pd
>>> from evalml.data_checks import InvalidTargetDataCheck
>>> X = pd.DataFrame({"col": [1, 2, 3, 1]})
>>> y = pd.Series([0, 1, None, None])
>>> target_check = InvalidTargetDataCheck('binary', 'Log Loss Binary')
>>> assert target_check.validate(X, y) == {
...     "errors": [{"message": "2 row(s) (50.0%) of target values are null",
...                 "data_check_name": "InvalidTargetDataCheck",
...                 "level": "error",
...                 "code": "TARGET_HAS_NULL",
...                 "details": {"num_null_rows": 2, "pct_null_rows": 50}}],
...     "warnings": [],
...     "actions": [{'code': 'IMPUTE_COL',
...                  'metadata': {'column': None, 'impute_strategy': 'most_frequent', 'is_target': True}}]}
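
A caller will usually inspect the returned dictionary rather than compare it to a literal. The sketch below shows one way to do that in plain Python. The helper error_codes is hypothetical and not part of evalml, and the result dictionary is copied from the example above so the snippet runs without evalml installed.

```python
def error_codes(result):
    """Hypothetical helper: collect the 'code' of every entry under 'errors'."""
    return [error["code"] for error in result.get("errors", [])]


# Copied from the documented example output; not produced by calling evalml here.
result = {
    "errors": [{"message": "2 row(s) (50.0%) of target values are null",
                "data_check_name": "InvalidTargetDataCheck",
                "level": "error",
                "code": "TARGET_HAS_NULL",
                "details": {"num_null_rows": 2, "pct_null_rows": 50}}],
    "warnings": [],
    "actions": [{"code": "IMPUTE_COL",
                 "metadata": {"column": None,
                              "impute_strategy": "most_frequent",
                              "is_target": True}}],
}

print(error_codes(result))  # ['TARGET_HAS_NULL']

# Each suggested action carries a code plus metadata describing the fix.
for action in result["actions"]:
    print(action["code"], action["metadata"]["impute_strategy"])  # IMPUTE_COL most_frequent
```

Matching on the code field rather than the message text is likely to be more robust, since the message embeds counts and percentages that vary with the data.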