datetime_format_data_check¶
Data check that checks if the datetime column has equally spaced intervals and is monotonically increasing or decreasing in order to be supported by time series estimators.
Module Contents¶
Classes Summary¶
Check if the datetime column has equally spaced intervals and is monotonically increasing or decreasing in order to be supported by time series estimators. |
Contents¶
-
class
evalml.data_checks.datetime_format_data_check.
DateTimeFormatDataCheck
(datetime_column='index')[source]¶ Check if the datetime column has equally spaced intervals and is monotonically increasing or decreasing in order to be supported by time series estimators.
- Parameters
datetime_column (str, int) – The name of the datetime column. If the datetime values are in the index, then pass “index”.
Methods
Return a name describing the data check.
Checks if the target data has equal intervals and is sorted.
-
name
(cls)¶ Return a name describing the data check.
-
validate
(self, X, y)[source]¶ Checks if the target data has equal intervals and is sorted.
- Parameters
X (pd.DataFrame, np.ndarray) – Features.
y (pd.Series, np.ndarray) – Target data.
- Returns
List with DataCheckErrors if unequal intervals are found in the datetime column.
- Return type
dict (DataCheckError)
Examples
>>> import pandas as pd
The column “dates” has the date 2021-01-31 appended to the end, which is inconsistent with the frequency of the previous 9 dates (1 day).
>>> X = pd.DataFrame(pd.date_range("2021-01-01", periods=9).append(pd.date_range("2021-01-31", periods=1)), columns=["dates"]) >>> y = pd.Series([0, 1, 0, 1, 1, 0, 0, 0, 1, 0]) >>> datetime_format_dc = DateTimeFormatDataCheck(datetime_column="dates") >>> assert datetime_format_dc.validate(X, y) == [ ... { ... "message": "No frequency could be detected in dates, possibly due to uneven intervals.", ... "data_check_name": "DateTimeFormatDataCheck", ... "level": "error", ... "code": "DATETIME_HAS_UNEVEN_INTERVALS", ... "details": {"columns": None, "rows": None}, ... "action_options": [] ... } ... ]
The column “Weeks” passed integers instead of datetime data, which will raise an error.
>>> X = pd.DataFrame([1, 2, 3, 4], columns=["Weeks"]) >>> y = pd.Series([0] * 4) >>> datetime_format_dc = DateTimeFormatDataCheck(datetime_column="Weeks") >>> assert datetime_format_dc.validate(X, y) == [ ... { ... "message": "Datetime information could not be found in the data, or was not in a supported datetime format.", ... "data_check_name": "DateTimeFormatDataCheck", ... "level": "error", ... "details": {"columns": None, "rows": None}, ... "code": "DATETIME_INFORMATION_NOT_FOUND", ... "action_options": [] ... } ... ]
Converting that same integer data to datetime, however, is valid.
>>> X = pd.DataFrame(pd.to_datetime([1, 2, 3, 4]), columns=["Weeks"]) >>> datetime_format_dc = DateTimeFormatDataCheck(datetime_column="Weeks") >>> assert datetime_format_dc.validate(X, y) == []
>>> X = pd.DataFrame(pd.date_range("2021-01-01", freq="W", periods=10), columns=["Weeks"]) >>> datetime_format_dc = DateTimeFormatDataCheck(datetime_column="Weeks") >>> assert datetime_format_dc.validate(X, y) == []
While the data passed in is of datetime type, time series requires the datetime information in datetime_column to be monotonically increasing (ascending).
>>> X = X.iloc[::-1] >>> datetime_format_dc = DateTimeFormatDataCheck(datetime_column="Weeks") >>> assert datetime_format_dc.validate(X, y) == [ ... { ... "message": "Datetime values must be sorted in ascending order.", ... "data_check_name": "DateTimeFormatDataCheck", ... "level": "error", ... "details": {"columns": None, "rows": None}, ... "code": "DATETIME_IS_NOT_MONOTONIC", ... "action_options": [] ... } ... ]