datetime_format_data_check

Data check that verifies the datetime column has equally spaced intervals and is monotonically increasing (sorted in ascending order), as required by time series estimators.

Module Contents

Classes Summary

DateTimeFormatDataCheck

Check if the datetime column has equally spaced intervals and is monotonically increasing (sorted in ascending order), as required by time series estimators.

Contents

class evalml.data_checks.datetime_format_data_check.DateTimeFormatDataCheck(datetime_column='index')

Check if the datetime column has equally spaced intervals and is monotonically increasing (sorted in ascending order), as required by time series estimators.
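The two conditions named above can be illustrated with a minimal standard-library sketch. This is not evalml's implementation: the helper names `is_ascending` and `has_equal_intervals` are hypothetical, the ordering test is assumed to be non-strict, and only fixed-length gaps are handled (calendar frequencies such as months would need real frequency inference).

```python
from datetime import datetime, timedelta


def is_ascending(values):
    """Return True if every value is >= the one before it (non-strict)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def has_equal_intervals(values):
    """Return True if all consecutive differences are identical."""
    gaps = {b - a for a, b in zip(values, values[1:])}
    return len(gaps) <= 1


start = datetime(2021, 1, 1)
# Nine consecutive days: evenly spaced and ascending.
daily = [start + timedelta(days=i) for i in range(9)]
# Appending a date 22 days after the last one breaks the even spacing.
with_gap = daily + [datetime(2021, 1, 31)]

print(is_ascending(daily), has_equal_intervals(daily))
print(is_ascending(with_gap), has_equal_intervals(with_gap))
print(is_ascending(daily[::-1]), has_equal_intervals(daily[::-1]))
```

The reversed list still has equal gaps but is no longer ascending, which mirrors the distinction between the uneven-interval and not-monotonic error codes shown in the examples on this page.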

Parameters

datetime_column (str, int) – The name of the datetime column. If the datetime values are in the index, then pass “index”.

Methods

name – Return a name describing the data check.

validate – Check if the datetime column has equal intervals and is sorted in ascending order.

name(cls)

Return a name describing the data check.

validate(self, X, y)

Check if the datetime column has equal intervals and is sorted in ascending order.

Parameters
  • X (pd.DataFrame, np.ndarray) – Features.

  • y (pd.Series, np.ndarray) – Target data.

Returns

Dictionary with DataCheckErrors if the datetime column cannot be found, has unequal intervals, or is not sorted in ascending order.

Return type

dict (DataCheckError)
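Because the result is a plain dictionary with "errors", "warnings", and "actions" keys, callers can inspect it without any evalml-specific code. The sketch below pulls out the error codes; `error_codes` is a hypothetical helper, and the `results` dictionary is hand-written to match the shape shown in the examples on this page rather than produced by a real call.

```python
def error_codes(results):
    """Collect the "code" field of every entry under "errors"."""
    return [error["code"] for error in results.get("errors", [])]


# Hand-written stand-in for the dictionary returned by validate().
results = {
    "warnings": [],
    "errors": [
        {
            "message": "Datetime values must be sorted in ascending order.",
            "data_check_name": "DateTimeFormatDataCheck",
            "level": "error",
            "details": {"columns": None, "rows": None},
            "code": "DATETIME_IS_NOT_MONOTONIC",
        }
    ],
    "actions": [],
}

codes = error_codes(results)
if "DATETIME_IS_NOT_MONOTONIC" in codes:
    print("Sort the data by its datetime column before fitting.")
```

Branching on the "code" value rather than the "message" text keeps the caller independent of the exact wording of the messages.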

Examples

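>>> import pandas as pd
>>> from evalml.data_checks.datetime_format_data_check import DateTimeFormatDataCheck
>>> # Nine consecutive days followed by a 22-day jump: the intervals are uneven.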
>>> X = pd.DataFrame(pd.date_range("2021-01-01", periods=9).append(pd.date_range("2021-01-31", periods=1)), columns=["dates"])
>>> y = pd.Series([0, 1, 0, 1, 1, 0, 0, 0, 1, 0])
>>> datetime_format_dc = DateTimeFormatDataCheck(datetime_column="dates")
>>> assert datetime_format_dc.validate(X, y) == {
...     "errors": [{"message": "No frequency could be detected in dates, possibly due to uneven intervals.",
...                 "data_check_name": "DateTimeFormatDataCheck",
...                 "level": "error",
...                 "code": "DATETIME_HAS_UNEVEN_INTERVALS",
...                 "details": {"columns": None, "rows": None}
...                 }],
...     "warnings": [],
...     "actions": []}
...
>>> # A column of plain integers carries no datetime information.
>>> X = pd.DataFrame([1, 2, 3, 4], columns=["Weeks"])
>>> y = pd.Series([0] * 4)
>>> datetime_format_dc = DateTimeFormatDataCheck(datetime_column="Weeks")
>>> assert datetime_format_dc.validate(X, y) == {
...     'warnings': [],
...     'errors': [{'message': 'Datetime information could not be found in the data, or was not in a supported datetime format.',
...                 'data_check_name': 'DateTimeFormatDataCheck',
...                 'level': 'error',
...                 'details': {'columns': None, 'rows': None},
...                 'code': 'DATETIME_INFORMATION_NOT_FOUND'}],
...     'actions': []}
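...
>>> # pd.to_datetime reads the same integers as nanoseconds since the epoch, giving evenly spaced datetimes.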
>>> X = pd.DataFrame(pd.to_datetime([1, 2, 3, 4]), columns=["Weeks"])
>>> datetime_format_dc = DateTimeFormatDataCheck(datetime_column="Weeks")
>>> assert datetime_format_dc.validate(X, y) == {'warnings': [], 'errors': [], 'actions': []}
...
>>> # Weekly dates in ascending order pass the check.
>>> X = pd.DataFrame(pd.date_range("2021-01-01", freq='W', periods=10), columns=["Weeks"])
>>> datetime_format_dc = DateTimeFormatDataCheck(datetime_column="Weeks")
>>> assert datetime_format_dc.validate(X, y) == {'warnings': [], 'errors': [], 'actions': []}
...
>>> # Reversing the rows puts the dates in descending order, which fails.
>>> X = X.iloc[::-1]
>>> datetime_format_dc = DateTimeFormatDataCheck(datetime_column="Weeks")
>>> assert datetime_format_dc.validate(X, y) == {
...     'warnings': [],
...     'errors': [{'message': 'Datetime values must be sorted in ascending order.',
...                 'data_check_name': 'DateTimeFormatDataCheck',
...                 'level': 'error',
...                 'details': {'columns': None, 'rows': None},
...                 'code': 'DATETIME_IS_NOT_MONOTONIC'}],
...     'actions': []}