evalml.preprocessing.split_data(X, y, regression=False, test_size=0.2, random_state=None)

Splits data into train and test sets.

Parameters

  • X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]

  • y (pd.Series) – Target data of length [n_samples]

  • regression (bool) – If True, do not use a stratified split

  • test_size (float) – Fraction of the data to hold out for testing (the default of 0.2 holds out 20%)

  • random_state (int, np.random.RandomState) – Seed for the random number generator


Returns

Feature and target data, each split into train and test sets

Return type

pd.DataFrame, pd.DataFrame, pd.Series, pd.Series
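
Example

The sketch below illustrates what the regression and test_size parameters control. It is not evalml's implementation: split_indices is a hypothetical helper written with the standard library only, and it returns row indices rather than DataFrames. With regression=False it splits each class separately so the test set keeps the class proportions; with regression=True it shuffles all rows together. Rounding the per-class test count is a simplification made for this sketch.

```python
import random
from collections import defaultdict


def split_indices(y, regression=False, test_size=0.2, random_state=None):
    """Return (train_idx, test_idx) for targets y.

    Hypothetical helper for illustration; not part of evalml.
    """
    rng = random.Random(random_state)
    if regression:
        # Plain shuffle split: ignore the target values entirely.
        groups = [list(range(len(y)))]
    else:
        # Stratified split: handle each class on its own so that
        # class proportions carry over to both subsets.
        by_class = defaultdict(list)
        for i, label in enumerate(y):
            by_class[label].append(i)
        groups = list(by_class.values())

    train_idx, test_idx = [], []
    for group in groups:
        rng.shuffle(group)
        n_test = round(len(group) * test_size)  # simplification: per-group rounding
        test_idx.extend(group[:n_test])
        train_idx.extend(group[n_test:])
    return sorted(train_idx), sorted(test_idx)


# 80 rows of class "a" and 20 rows of class "b"
y = ["a"] * 80 + ["b"] * 20
train_idx, test_idx = split_indices(y, test_size=0.2, random_state=0)
test_labels = [y[i] for i in test_idx]

print(len(train_idx), len(test_idx))                    # 80 20
print(test_labels.count("a"), test_labels.count("b"))   # 16 4
```

Here test_size=0.2 holds out 20 of the 100 rows, and the stratified split keeps the 80/20 class ratio in the held-out rows. Passing the same random_state reproduces the same split.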