target_distribution_data_check

Module Contents

Classes Summary

TargetDistributionDataCheck

Checks if the target data contains certain distributions that may need to be transformed prior to training to improve model performance.

Contents

class evalml.data_checks.target_distribution_data_check.TargetDistributionDataCheck[source]

Checks if the target data contains certain distributions that may need to be transformed prior to training to improve model performance.

Methods

name

Returns a name describing the data check.

validate

Checks if the target data has a certain distribution.

name(cls)

Returns a name describing the data check.

validate(self, X, y)[source]

Checks if the target data has a certain distribution.

Parameters
  • X (pd.DataFrame, np.ndarray) – Features. Ignored.

  • y (pd.Series, np.ndarray) – Target data to check for underlying distributions.

Returns

Dictionary with DataCheckWarnings if certain distributions are found in the target data.

Return type

dict (DataCheckWarning)

Example

>>> from scipy.stats import lognorm
>>> from evalml.data_checks import TargetDistributionDataCheck
>>> X = None
>>> y = [0.946, 0.972, 1.154, 0.954, 0.969, 1.222, 1.038, 0.999, 0.973, 0.897]
>>> target_check = TargetDistributionDataCheck()
>>> assert target_check.validate(X, y) == {
...     "errors": [],
...     "warnings": [{"message": "Target may have a lognormal distribution.",
...                   "data_check_name": "TargetDistributionDataCheck",
...                   "level": "warning",
...                   "code": "TARGET_LOGNORMAL_DISTRIBUTION",
...                   "details": {"shapiro-statistic/pvalue": '0.84/0.045'}}],
...     "actions": [{"code": "TRANSFORM_TARGET",
...                  "metadata": {"column": None,
...                               "transformation_strategy": "lognormal",
...                               "is_target": True}}]}
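
The dictionary returned by validate in the example can be inspected with plain Python. The sketch below is not part of EvalML: summarize_check_result is a hypothetical helper, and it assumes the result has exactly the "errors", "warnings", and "actions" layout shown in the example, including the "statistic/pvalue" string format under "details". The result is hard-coded here so the snippet runs without EvalML installed.

```python
# Result dictionary copied from the documented example (hard-coded for illustration).
result = {
    "errors": [],
    "warnings": [
        {
            "message": "Target may have a lognormal distribution.",
            "data_check_name": "TargetDistributionDataCheck",
            "level": "warning",
            "code": "TARGET_LOGNORMAL_DISTRIBUTION",
            "details": {"shapiro-statistic/pvalue": "0.84/0.045"},
        }
    ],
    "actions": [
        {
            "code": "TRANSFORM_TARGET",
            "metadata": {
                "column": None,
                "transformation_strategy": "lognormal",
                "is_target": True,
            },
        }
    ],
}


def summarize_check_result(result):
    """Hypothetical helper: collect messages and suggested target transformations."""
    # Gather every reported message, errors first, then warnings.
    messages = [entry["message"] for entry in result["errors"] + result["warnings"]]
    # Keep only the actions that ask for a transformation of the target.
    strategies = [
        action["metadata"]["transformation_strategy"]
        for action in result["actions"]
        if action["code"] == "TRANSFORM_TARGET" and action["metadata"].get("is_target")
    ]
    return messages, strategies


def parse_shapiro_details(warning):
    """Hypothetical helper: split the 'statistic/pvalue' string into two floats.

    Assumes the string format shown in the example above.
    """
    statistic, pvalue = warning["details"]["shapiro-statistic/pvalue"].split("/")
    return float(statistic), float(pvalue)


messages, strategies = summarize_check_result(result)
shapiro = parse_shapiro_details(result["warnings"][0])
print(messages)
print(strategies)
print(shapiro)
```

In practice, result would come from target_check.validate(X, y) rather than a literal. The string parsing is the most fragile part, since it depends on the "details" format shown in this example.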