target_distribution_data_check
Module Contents
Classes Summary
TargetDistributionDataCheck | Checks if the target data contains certain distributions that may need to be transformed prior to training to improve model performance.
Contents
class evalml.data_checks.target_distribution_data_check.TargetDistributionDataCheck
Checks if the target data contains certain distributions that may need to be transformed prior to training to improve model performance.
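This page does not document how the check decides that a target "may have a lognormal distribution"; the example further down only reports a Shapiro-Wilk statistic and p-value. As a rough, dependency-free illustration of the general idea, the sketch below swaps in a simpler heuristic than a Shapiro-Wilk test: it flags a strictly positive target that is noticeably right-skewed but becomes more symmetric after taking logarithms. The helpers `sample_skewness` and `looks_lognormal` and the `skew_threshold` default of 0.5 are hypothetical and are not part of evalml.

```python
import math
import statistics


def sample_skewness(values):
    """Biased sample skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0  # constant data has no skew
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def looks_lognormal(y, skew_threshold=0.5):
    """Hypothetical heuristic (not evalml's rule): flag a strictly positive
    target that is right-skewed but becomes more symmetric after log()."""
    if len(y) < 3 or min(y) <= 0:
        return False  # log is undefined for non-positive values
    raw_skew = sample_skewness(y)
    log_skew = sample_skewness([math.log(v) for v in y])
    # skew_threshold is an illustrative assumption, not taken from evalml.
    return raw_skew > skew_threshold and abs(log_skew) < abs(raw_skew)
```

With `y = [math.exp(v) for v in (-2, -1, -0.5, 0, 0.5, 1, 2)]`, `looks_lognormal(y)` returns True, because the raw values are strongly right-skewed while their logarithms are symmetric; for `[1, 2, 3, 4, 5]` it returns False.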
Methods
name | Returns a name describing the data check.
validate | Checks if the target data has a certain distribution.
name(cls)
Returns a name describing the data check.
validate(self, X, y)
Checks if the target data has a certain distribution (for example, lognormal).
- Parameters
X (pd.DataFrame, np.ndarray) – Features. Ignored.
y (pd.Series, np.ndarray) – Target data to check for underlying distributions.
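- Returns
Dictionary with "errors", "warnings", and "actions" lists. If the target appears to follow a distribution that may benefit from transformation (for example, lognormal), a warning and a matching TRANSFORM_TARGET action are included.
- Return type
dict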
Example
>>> from evalml.data_checks import TargetDistributionDataCheck
>>> X = None
>>> y = [0.946, 0.972, 1.154, 0.954, 0.969, 1.222, 1.038, 0.999, 0.973, 0.897]
>>> target_check = TargetDistributionDataCheck()
>>> assert target_check.validate(X, y) == {
...     "errors": [],
...     "warnings": [{"message": "Target may have a lognormal distribution.",
...                   "data_check_name": "TargetDistributionDataCheck",
...                   "level": "warning",
...                   "code": "TARGET_LOGNORMAL_DISTRIBUTION",
...                   "details": {"shapiro-statistic/pvalue": '0.84/0.045'}}],
...     "actions": [{'code': 'TRANSFORM_TARGET',
...                  'metadata': {'column': None,
...                               'transformation_strategy': 'lognormal',
...                               'is_target': True}}]
... }
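The check reports its findings in the returned dictionary rather than transforming the data itself. As one illustration of how the "actions" entry in the example above could be consumed, the sketch below log-transforms the target when a lognormal TRANSFORM_TARGET action is present. The helper `apply_target_actions` is hypothetical, not an evalml API, and it assumes every target value is strictly positive.

```python
import math


def apply_target_actions(y, result):
    """Hypothetical helper (not an evalml API): return log(y) if the
    validate() result contains a lognormal TRANSFORM_TARGET action,
    otherwise return an unchanged copy of y."""
    for action in result.get("actions", []):
        metadata = action.get("metadata", {})
        if (
            action.get("code") == "TRANSFORM_TARGET"
            and metadata.get("is_target")
            and metadata.get("transformation_strategy") == "lognormal"
        ):
            # Natural log; assumes every target value is strictly positive.
            return [math.log(v) for v in y]
    return list(y)


# "actions" entry copied from the example output above; the "errors" and
# "warnings" entries are omitted because this helper does not read them.
result = {
    "actions": [
        {
            "code": "TRANSFORM_TARGET",
            "metadata": {
                "column": None,
                "transformation_strategy": "lognormal",
                "is_target": True,
            },
        }
    ]
}
y = [0.946, 0.972, 1.154, 0.954, 0.969, 1.222, 1.038, 0.999, 0.973, 0.897]
y_transformed = apply_target_actions(y, result)
```

Predictions made on the log scale would need `math.exp` applied to map them back to the original units of the target.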