ts_splitting_data_check
====================================================

.. py:module:: evalml.data_checks.ts_splitting_data_check

.. autoapi-nested-parse::

   Data check that checks whether the time series training and validation splits have adequate class representation.



Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.data_checks.ts_splitting_data_check.TimeSeriesSplittingDataCheck


Contents
~~~~~~~~~~~~~~~~~~~

.. py:class:: TimeSeriesSplittingDataCheck(problem_type, n_splits)

   Checks whether the time series target data is compatible with splitting.

   For time series classification problems, the training and validation targets of every split must
   contain at least one instance of each class. If any split is missing a class, the estimators cannot
   train on all potential outcomes, which causes errors during prediction.

   :param problem_type: Problem type.
   :type problem_type: str or ProblemTypes
   :param n_splits: Number of time series splits.
   :type n_splits: int

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.data_checks.ts_splitting_data_check.TimeSeriesSplittingDataCheck.name
      evalml.data_checks.ts_splitting_data_check.TimeSeriesSplittingDataCheck.validate


   .. py:method:: name(cls)

      Return a name describing the data check.


   .. py:method:: validate(self, X, y)

      Check if the training and validation targets are compatible with time series data splitting.

      :param X: Ignored. Features.
      :type X: pd.DataFrame, np.ndarray
      :param y: Target data.
      :type y: pd.Series, np.ndarray

      :returns: List containing a DataCheckError dict if splitting would result in inadequate class representation.
      :rtype: list[dict]

      .. rubric:: Example

      >>> import pandas as pd
      >>> from evalml.data_checks.ts_splitting_data_check import TimeSeriesSplittingDataCheck

      Passing n_splits as 3 means that the data will be segmented into 4 parts to be iterated over for training and validation splits.
      The first split results in training indices of [0:25] and validation indices of [25:50].
      The training targets of the first split contain only one unique value (0).
      The third split results in training indices of [0:75] and validation indices of [75:100].
      The validation targets of the third split contain only one unique value (1).

      >>> X = None
      >>> y = pd.Series([0 if i < 45 else i % 2 if i < 55 else 1 for i in range(100)])
      >>> ts_splitting_check = TimeSeriesSplittingDataCheck("time series binary", 3)
      >>> assert ts_splitting_check.validate(X, y) == [
      ...     {
      ...         "message": "Time Series Binary and Time Series Multiclass problem "
      ...                    "types require every training and validation split to "
      ...                    "have at least one instance of all the target classes. "
      ...                    "The following splits are invalid: [1, 3]",
      ...         "data_check_name": "TimeSeriesSplittingDataCheck",
      ...         "level": "error",
      ...         "details": {
      ...             "columns": None, "rows": None,
      ...             "invalid_splits": {
      ...                 1: {"Training": [0, 25]},
      ...                 3: {"Validation": [75, 100]}
      ...             }
      ...         },
      ...         "code": "TIMESERIES_TARGET_NOT_COMPATIBLE_WITH_SPLIT",
      ...         "action_options": []
      ...     }
      ... ]
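
The split arithmetic in the example above can be reproduced without evalml. The following is a minimal sketch in plain Python, not the library's implementation: the helper name ``find_invalid_splits`` is hypothetical, and the expanding-window layout (``n_splits + 1`` equal segments, each validation window following all earlier data) is an assumption inferred from the indices quoted in the example.

```python
def find_invalid_splits(y, n_splits):
    """Return {split_number: {"Training"/"Validation": [start, end]}} for
    every split whose targets are missing at least one class.

    Hypothetical helper for illustration only; assumes expanding-window
    splits over n_splits + 1 equal segments.
    """
    n_samples = len(y)
    fold_size = n_samples // (n_splits + 1)
    all_classes = set(y)
    # The first validation window starts after the first segment.
    first_val_start = n_samples - n_splits * fold_size

    invalid = {}
    val_starts = range(first_val_start, n_samples, fold_size)
    for split_num, val_start in enumerate(val_starts, start=1):
        val_end = val_start + fold_size
        problems = {}
        # Training always covers everything before the validation window.
        if set(y[:val_start]) != all_classes:
            problems["Training"] = [0, val_start]
        if set(y[val_start:val_end]) != all_classes:
            problems["Validation"] = [val_start, val_end]
        if problems:
            invalid[split_num] = problems
    return invalid


# Same target as the example: all 0s, a short alternating stretch, then all 1s.
y = [0 if i < 45 else i % 2 if i < 55 else 1 for i in range(100)]
result = find_invalid_splits(y, 3)
print(result)  # {1: {'Training': [0, 25]}, 3: {'Validation': [75, 100]}}
```

Split 2 passes because both its training range [0:50] and its validation range [50:75] overlap the alternating stretch between indices 45 and 55, so each contains both classes.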