target_distribution_data_check
Data check that checks whether the target data follows a distribution that may need to be transformed before training to improve model performance.
Module Contents
Classes Summary
TargetDistributionDataCheck | Check whether the target data follows a distribution that may need to be transformed before training to improve model performance.
Contents
class evalml.data_checks.target_distribution_data_check.TargetDistributionDataCheck
Check whether the target data follows a distribution that may need to be transformed before training to improve model performance.
Methods
name | Return a name describing the data check.
validate | Check if the target data has a certain distribution.
name(cls)
Return a name describing the data check.
validate(self, X, y)
Check if the target data has a certain distribution.
Parameters
X (pd.DataFrame, np.ndarray) – Features. Ignored.
y (pd.Series, np.ndarray) – Target data to check for underlying distributions.
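Returns
Dictionary with "errors", "warnings", and "actions" entries describing any distributions found in the target data. In the example below, a possible lognormal distribution is reported as a warning together with a TRANSFORM_TARGET action.
Return type
dict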
Example
>>> from evalml.data_checks.target_distribution_data_check import TargetDistributionDataCheck
>>> X = None
>>> y = [0.946, 0.972, 1.154, 0.954, 0.969, 1.222, 1.038, 0.999, 0.973, 0.897]
>>> target_check = TargetDistributionDataCheck()
>>> assert target_check.validate(X, y) == {
...     "errors": [],
...     "warnings": [{"message": "Target may have a lognormal distribution.",
...                   "data_check_name": "TargetDistributionDataCheck",
...                   "level": "warning",
...                   "code": "TARGET_LOGNORMAL_DISTRIBUTION",
...                   "details": {"shapiro-statistic/pvalue": '0.84/0.045'}}],
...     "actions": [{'code': 'TRANSFORM_TARGET',
...                  'metadata': {'column': None,
...                               'transformation_strategy': 'lognormal',
...                               'is_target': True}}]}
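The dictionary returned by validate can be inspected with plain Python. The following is a minimal sketch that reuses the result shown in the example above rather than calling evalml. The helper name summarize_results is hypothetical, and it assumes that entries under "errors" have the same keys as entries under "warnings".

```python
# Result dictionary copied from the documented example (not a live evalml call).
results = {
    "errors": [],
    "warnings": [{
        "message": "Target may have a lognormal distribution.",
        "data_check_name": "TargetDistributionDataCheck",
        "level": "warning",
        "code": "TARGET_LOGNORMAL_DISTRIBUTION",
        "details": {"shapiro-statistic/pvalue": "0.84/0.045"},
    }],
    "actions": [{
        "code": "TRANSFORM_TARGET",
        "metadata": {
            "column": None,
            "transformation_strategy": "lognormal",
            "is_target": True,
        },
    }],
}


def summarize_results(results):
    """Hypothetical helper: collect readable messages and suggested action codes.

    Assumes error entries carry the same "level" and "message" keys as warnings.
    """
    messages = [
        f'{entry["level"]}: {entry["message"]}'
        for entry in results["errors"] + results["warnings"]
    ]
    action_codes = [action["code"] for action in results["actions"]]
    return messages, action_codes


messages, action_codes = summarize_results(results)
for message in messages:
    print(message)
print("Suggested actions:", action_codes)
```

Keeping the summary separate from the check itself makes it easy to log findings or to decide whether a suggested action should be applied before training.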
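When the check suggests transforming the target, one common reading of the "lognormal" strategy is to model the natural log of the target and invert the transform on predictions. The sketch below illustrates that idea with the standard library only. It is not evalml's implementation, and the helper names log_transform, inverse_log_transform, and skewness are hypothetical.

```python
import math
import statistics

# Target values taken from the documented example.
y = [0.946, 0.972, 1.154, 0.954, 0.969, 1.222, 1.038, 0.999, 0.973, 0.897]


def log_transform(values):
    """Return the natural log of each value; requires strictly positive input."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive target values")
    return [math.log(v) for v in values]


def inverse_log_transform(values):
    """Map log-scale values (for example, model predictions) back to the original scale."""
    return [math.exp(v) for v in values]


def skewness(values):
    """Population skewness: mean cubed deviation divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * std ** 3)


y_log = log_transform(y)
print("skewness before:", round(skewness(y), 3))
print("skewness after: ", round(skewness(y_log), 3))

# Round trip: predictions made on the log scale are mapped back with exp.
restored = inverse_log_transform(y_log)
```

Comparing skewness before and after is only a rough diagnostic. A normality test on the transformed target, such as the Shapiro-Wilk statistic reported in the warning details, gives a more direct check.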