training_validation_split
=======================================================================

.. py:module:: evalml.preprocessing.data_splitters.training_validation_split

.. autoapi-nested-parse::

   Training Validation Split class.



Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.preprocessing.data_splitters.training_validation_split.TrainingValidationSplit


Contents
~~~~~~~~~~~~~~~~~~~

.. py:class:: TrainingValidationSplit(test_size=None, train_size=None, shuffle=False, stratify=None, random_seed=0)

   Split the training data into training and validation sets.

   :param test_size: What percentage of data points should be included in the validation set. Defaults to the complement of `train_size` if `train_size` is set, and 0.25 otherwise.
   :type test_size: float
   :param train_size: What percentage of data points should be included in the training set. Defaults to the complement of `test_size`.
   :type train_size: float
   :param shuffle: Whether to shuffle the data before splitting. Defaults to False.
   :type shuffle: boolean
   :param stratify: Splits the data in a stratified fashion, using this argument as class labels. Defaults to None.
   :type stratify: list
   :param random_seed: The seed to use for random sampling. Defaults to 0.
   :type random_seed: int

   .. rubric:: Examples

   >>> import numpy as np
   >>> import pandas as pd
   ...
   >>> X = pd.DataFrame([i for i in range(10)], columns=["First"])
   >>> y = pd.Series([i for i in range(10)])
   ...
   >>> tv_split = TrainingValidationSplit()
   >>> split_ = next(tv_split.split(X, y))
   >>> assert (split_[0] == np.array([0, 1, 2, 3, 4, 5, 6])).all()
   >>> assert (split_[1] == np.array([7, 8, 9])).all()
   ...
   ...
   >>> tv_split = TrainingValidationSplit(test_size=0.5)
   >>> split_ = next(tv_split.split(X, y))
   >>> assert (split_[0] == np.array([0, 1, 2, 3, 4])).all()
   >>> assert (split_[1] == np.array([5, 6, 7, 8, 9])).all()
   ...
   ...
   >>> tv_split = TrainingValidationSplit(shuffle=True)
   >>> split_ = next(tv_split.split(X, y))
   >>> assert (split_[0] == np.array([9, 1, 6, 7, 3, 0, 5])).all()
   >>> assert (split_[1] == np.array([2, 8, 4])).all()
   ...
   ...
   >>> y = pd.Series([i % 3 for i in range(10)])
   >>> tv_split = TrainingValidationSplit(shuffle=True, stratify=y)
   >>> split_ = next(tv_split.split(X, y))
   >>> assert (split_[0] == np.array([1, 9, 3, 2, 8, 6, 7])).all()
   >>> assert (split_[1] == np.array([0, 4, 5])).all()

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.preprocessing.data_splitters.training_validation_split.TrainingValidationSplit.get_metadata_routing
      evalml.preprocessing.data_splitters.training_validation_split.TrainingValidationSplit.get_n_splits
      evalml.preprocessing.data_splitters.training_validation_split.TrainingValidationSplit.is_cv
      evalml.preprocessing.data_splitters.training_validation_split.TrainingValidationSplit.split

   .. py:method:: get_metadata_routing(self)

      Get metadata routing of this object.

      Please check :ref:`User Guide <metadata_routing>` on how the routing mechanism works.

      :returns: **routing** -- A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information.
      :rtype: MetadataRequest

   .. py:method:: get_n_splits()
      :staticmethod:

      Return the number of splits of this object.

      :returns: Always returns 1.
      :rtype: int

   .. py:method:: is_cv(self)
      :property:

      Returns whether or not the data splitter is a cross-validation data splitter.

      :returns: True if the splitter is a cross-validation data splitter, False otherwise.
      :rtype: bool
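      .. rubric:: Example

      A minimal sketch of the splitter's bookkeeping helpers. Since ``TrainingValidationSplit`` produces exactly one train/validation split, ``get_n_splits()`` always returns 1, and ``is_cv`` should accordingly be False (a single holdout split is not cross-validation).

      >>> from evalml.preprocessing.data_splitters.training_validation_split import TrainingValidationSplit
      >>> splitter = TrainingValidationSplit()
      >>> splitter.get_n_splits()
      1
      >>> splitter.is_cv
      False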
   .. py:method:: split(self, X, y=None)

      Divide the data into training and testing sets.

      :param X: DataFrame of points to split.
      :type X: pd.DataFrame
      :param y: Series of points to split.
      :type y: pd.Series
      :returns: Indices to split data into training and test set.
      :rtype: list
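      .. rubric:: Example

      A short usage sketch on the same toy data as the class-level examples; with the default ``test_size`` of 0.25 on 10 rows, this yields the 7/3 split shown above. Selecting rows with ``DataFrame.iloc`` is ordinary pandas usage, not part of this API.

      >>> import pandas as pd
      >>> from evalml.preprocessing.data_splitters.training_validation_split import TrainingValidationSplit
      >>> X = pd.DataFrame({"First": range(10)})
      >>> y = pd.Series(range(10))
      >>> train_idx, test_idx = next(TrainingValidationSplit().split(X, y))
      >>> X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
      >>> y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
      >>> (len(X_train), len(X_test))
      (7, 3)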