evalml.automl.make_data_splitter
evalml.automl.make_data_splitter(X, y, problem_type, problem_configuration=None, n_splits=3, shuffle=True, random_state=None, random_seed=0)
Given the training data and ML problem parameters, compute a data splitting method to use during AutoML search.
- Parameters
X (ww.DataTable, pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (ww.DataColumn, pd.Series) – The target training data of length [n_samples].
problem_type (ProblemType) – The type of machine learning problem.
problem_configuration (dict, None) – Additional parameters needed to configure the search. For example, in time series problems, values should be passed in for the gap and max_delay variables. Defaults to None.
n_splits (int, None) – The number of CV splits, if applicable. Defaults to 3.
shuffle (bool) – Whether or not to shuffle the data before splitting, if applicable. Defaults to True.
random_state (None, int) – Deprecated; use random_seed instead. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
- Returns
Data splitting method.
- Return type
sklearn.model_selection.BaseCrossValidator
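
Because the return value is a sklearn.model_selection.BaseCrossValidator, it can be consumed through the standard scikit-learn splitter interface (split and get_n_splits). The sketch below illustrates that interface only. It does not call make_data_splitter, and it uses StratifiedKFold as a stand-in, since this page does not state which splitter is chosen for a given problem type. The toy data and the choice of StratifiedKFold are assumptions made for illustration; the n_splits, shuffle, and seed values mirror the defaults documented above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data (assumption): 12 samples, 2 features, balanced binary target.
X = np.arange(24).reshape(12, 2)
y = np.array([0, 1] * 6)

# Stand-in splitter (assumption). With evalml installed, the object would
# instead come from make_data_splitter(X, y, problem_type, ...).
splitter = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Every BaseCrossValidator yields (train_indices, validation_indices) pairs.
folds = list(splitter.split(X, y))
for i, (train_idx, valid_idx) in enumerate(folds):
    print(f"fold {i}: train={len(train_idx)} valid={len(valid_idx)}")
```

Each sample lands in exactly one validation fold, so with 12 samples and 3 splits every fold holds out 4 samples and trains on the remaining 8.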