target_distribution_data_check
===========================================================

.. py:module:: evalml.data_checks.target_distribution_data_check

.. autoapi-nested-parse::

   Data check that checks if the target data contains certain distributions that may need to be transformed prior to training to improve model performance.


Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.data_checks.target_distribution_data_check.TargetDistributionDataCheck


Contents
~~~~~~~~~~~~~~~~~~~

.. py:class:: TargetDistributionDataCheck

   Check if the target data contains certain distributions that may need to be transformed prior to training to improve model performance.

   Uses the Shapiro-Wilk test when the dataset has 5000 samples or fewer; otherwise uses the Jarque-Bera test.

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.data_checks.target_distribution_data_check.TargetDistributionDataCheck.name
      evalml.data_checks.target_distribution_data_check.TargetDistributionDataCheck.validate

   .. py:method:: name(cls)

      Return a name describing the data check.

   .. py:method:: validate(self, X, y)

      Check if the target data has a certain distribution.

      :param X: Features. Ignored.
      :type X: pd.DataFrame, np.ndarray
      :param y: Target data to check for underlying distributions.
      :type y: pd.Series, np.ndarray

      :returns: List of dictionaries, one per warning or error found for the target data. The list is empty if no issue is found.
      :rtype: list[dict]

      .. rubric:: Examples

      >>> import pandas as pd
      >>> from evalml.data_checks.target_distribution_data_check import TargetDistributionDataCheck

      Targets that exhibit a lognormal distribution produce a warning that suggests transforming the target.

      >>> y = [0.946, 0.972, 1.154, 0.954, 0.969, 1.222, 1.038, 0.999, 0.973, 0.897]
      >>> target_check = TargetDistributionDataCheck()
      >>> assert target_check.validate(None, y) == [
      ...     {
      ...         "message": "Target may have a lognormal distribution.",
      ...         "data_check_name": "TargetDistributionDataCheck",
      ...         "level": "warning",
      ...         "code": "TARGET_LOGNORMAL_DISTRIBUTION",
      ...         "details": {"normalization_method": "shapiro", "statistic": 0.8, "p-value": 0.045, "columns": None, "rows": None},
      ...         "action_options": [
      ...             {
      ...                 "code": "TRANSFORM_TARGET",
      ...                 "data_check_name": "TargetDistributionDataCheck",
      ...                 "parameters": {},
      ...                 "metadata": {
      ...                     "transformation_strategy": "lognormal",
      ...                     "is_target": True,
      ...                     "columns": None,
      ...                     "rows": None
      ...                 }
      ...             }
      ...         ]
      ...     }
      ... ]

      Targets that are not flagged as lognormal return an empty list.

      >>> y = pd.Series([1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5])
      >>> assert target_check.validate(None, y) == []

      Targets of an unsupported type, such as datetime, return an error.

      >>> y = pd.Series(pd.date_range("1/1/21", periods=10))
      >>> assert target_check.validate(None, y) == [
      ...     {
      ...         "message": "Target is unsupported datetime type. Valid Woodwork logical types include: integer, double, age, age_fractional",
      ...         "data_check_name": "TargetDistributionDataCheck",
      ...         "level": "error",
      ...         "details": {"columns": None, "rows": None, "unsupported_type": "datetime"},
      ...         "code": "TARGET_UNSUPPORTED_TYPE",
      ...         "action_options": []
      ...     }
      ... ]
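
The entries returned by ``validate`` are plain dictionaries, so they can be post-processed with ordinary Python. The sketch below is not part of evalml: ``summarize_results`` is a hypothetical helper, and the sample input is copied from the lognormal example above. Assuming every entry carries the ``level``, ``message`` and ``action_options`` keys shown in the examples, it groups messages by level and collects any suggested ``transformation_strategy`` values.

```python
from collections import defaultdict


def summarize_results(results):
    """Group data check messages by level and collect suggested target transforms.

    Hypothetical helper, not an evalml API. Assumes each entry has the keys
    shown in the documented examples ("level", "message", "action_options").
    """
    messages_by_level = defaultdict(list)
    strategies = []
    for entry in results:
        messages_by_level[entry["level"]].append(entry["message"])
        for option in entry.get("action_options", []):
            strategy = option.get("metadata", {}).get("transformation_strategy")
            # Only TRANSFORM_TARGET options carry a transformation strategy.
            if option.get("code") == "TRANSFORM_TARGET" and strategy is not None:
                strategies.append(strategy)
    return dict(messages_by_level), strategies


# Sample entry copied from the documented lognormal example.
sample = [
    {
        "message": "Target may have a lognormal distribution.",
        "data_check_name": "TargetDistributionDataCheck",
        "level": "warning",
        "code": "TARGET_LOGNORMAL_DISTRIBUTION",
        "details": {
            "normalization_method": "shapiro",
            "statistic": 0.8,
            "p-value": 0.045,
            "columns": None,
            "rows": None,
        },
        "action_options": [
            {
                "code": "TRANSFORM_TARGET",
                "data_check_name": "TargetDistributionDataCheck",
                "parameters": {},
                "metadata": {
                    "transformation_strategy": "lognormal",
                    "is_target": True,
                    "columns": None,
                    "rows": None,
                },
            }
        ],
    }
]

messages, strategies = summarize_results(sample)
print(messages)    # {'warning': ['Target may have a lognormal distribution.']}
print(strategies)  # ['lognormal']
```

A caller could, for instance, stop early when ``messages`` contains an ``"error"`` key and consider a log transform of the target when ``strategies`` contains ``"lognormal"``.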