evalml.preprocessing.split_data
Splits data into train and test sets.
Parameters:
X (ww.DataTable, pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
y (ww.DataColumn, pd.Series, or np.ndarray) – Target data of length [n_samples].
problem_type (str or ProblemTypes) – Type of supervised learning problem. See evalml.problem_types.problemtype.all_problem_types for a full list.
problem_configuration (dict) – Additional parameters needed to configure the search. For example, in time series problems, values should be passed in for the gap and max_delay variables.
test_size (float) – Fraction of data points to include in the test set. Defaults to 0.2 (20%).
random_state (None, int) – Deprecated - use random_seed instead.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Returns: Feature and target data, each split into train and test sets.
Return type: ww.DataTable, ww.DataTable, ww.DataColumn, ww.DataColumn
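
The effect of test_size and random_seed can be illustrated with a minimal, standard-library-only sketch. The helper sketch_split below is hypothetical and is not evalml's implementation; it assumes a plain shuffled split and ignores problem_type and problem_configuration. It only shows how a fractional test_size sets the size of the test set and how a fixed seed makes the split reproducible.

```python
import random


def sketch_split(X, y, test_size=0.2, random_seed=0):
    """Hypothetical stand-in for a seeded, shuffled train/test split.

    Not evalml's implementation: it only illustrates how a fractional
    test_size and an integer random_seed determine the result.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction strictly between 0 and 1")

    indices = list(range(len(X)))
    # A dedicated generator keeps the split reproducible for a given seed.
    random.Random(random_seed).shuffle(indices)

    # The first n_test shuffled indices form the test set, the rest the train set.
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[i, i * 2] for i in range(10)]  # 10 samples, 2 features
y = list(range(10))                  # one target value per sample

X_train, X_test, y_train, y_test = sketch_split(X, y, test_size=0.2, random_seed=0)
print(len(X_train), len(X_test))  # prints "8 2"
```

With evalml installed, the analogous call would presumably be split_data(X, y, problem_type="regression", test_size=0.2, random_seed=0), unpacked in the same order as the return types listed above: features for train and test, then targets for train and test.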