sk_splitters

SKLearn data splitter wrapper classes.

Module Contents

Classes Summary

KFold

Wrapper class for sklearn's KFold splitter.

StratifiedKFold

Wrapper class for sklearn's StratifiedKFold splitter.

Contents

class evalml.preprocessing.data_splitters.sk_splitters.KFold(n_splits=5, *, shuffle=False, random_state=None)

Wrapper class for sklearn’s KFold splitter.

Methods

get_metadata_routing

Get metadata routing of this object.

get_n_splits

Returns the number of splitting iterations in the cross-validator.

is_cv

Returns whether or not the data splitter is a cross-validation data splitter.

split

Generate indices to split data into training and test sets.

get_metadata_routing(self)

Get metadata routing of this object.

See the scikit-learn User Guide for details on how the routing mechanism works.

Returns

routing – A MetadataRequest encapsulating routing information.

Return type

MetadataRequest

get_n_splits(self, X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Parameters
  • X (object) – Always ignored, exists for compatibility.

  • y (object) – Always ignored, exists for compatibility.

  • groups (object) – Always ignored, exists for compatibility.

Returns

n_splits – The number of splitting iterations in the cross-validator.

Return type

int
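As a minimal sketch of the call described above: the import path is taken from the class signature on this page, and the fallback to scikit-learn's own KFold is an assumption added only so the snippet also runs where evalml is not installed.

```python
try:
    # Import path as given in the class signature on this page.
    from evalml.preprocessing.data_splitters.sk_splitters import KFold
except ImportError:
    # Assumption: without evalml, use the sklearn class that is being wrapped.
    from sklearn.model_selection import KFold

splitter = KFold(n_splits=3)

# X, y and groups are ignored, so no data needs to be passed.
n_splits = splitter.get_n_splits()
print(n_splits)  # 3
```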

property is_cv(self)

Returns whether or not the data splitter is a cross-validation data splitter.

Returns

True if the splitter is a cross-validation data splitter.

Return type

bool

split(self, X, y=None, groups=None)

Generate indices to split data into training and test sets.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like of shape (n_samples,), default=None) – The target variable for supervised learning problems.

  • groups (array-like of shape (n_samples,), default=None) – Group labels for the samples used while splitting the dataset into train/test set.

Yields
  • train (ndarray) – The training set indices for that split.

  • test (ndarray) – The testing set indices for that split.
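The following sketch illustrates how the yielded index arrays might be consumed. The toy data and variable names are hypothetical, and the fallback import to scikit-learn's KFold is an assumption for environments without evalml.

```python
import numpy as np

try:
    # Import path as given in the class signature on this page.
    from evalml.preprocessing.data_splitters.sk_splitters import KFold
except ImportError:
    # Assumption: without evalml, use the sklearn class that is being wrapped.
    from sklearn.model_selection import KFold

# Toy data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

splitter = KFold(n_splits=5, shuffle=True, random_state=0)

# split() is a generator; materialize it to inspect every fold.
folds = list(splitter.split(X, y))
for fold, (train_idx, test_idx) in enumerate(folds):
    # train_idx and test_idx are integer positions into X and y.
    X_train, X_test = X[train_idx], X[test_idx]
    print(fold, X_train.shape, X_test.shape, test_idx)
```

Each sample index appears in exactly one test fold, so the five test sets together cover the whole dataset.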

class evalml.preprocessing.data_splitters.sk_splitters.StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)

Wrapper class for sklearn’s StratifiedKFold splitter.

Methods

get_metadata_routing

Get metadata routing of this object.

get_n_splits

Returns the number of splitting iterations in the cross-validator.

is_cv

Returns whether or not the data splitter is a cross-validation data splitter.

split

Generate indices to split data into training and test sets.

get_metadata_routing(self)

Get metadata routing of this object.

See the scikit-learn User Guide for details on how the routing mechanism works.

Returns

routing – A MetadataRequest encapsulating routing information.

Return type

MetadataRequest

get_n_splits(self, X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Parameters
  • X (object) – Always ignored, exists for compatibility.

  • y (object) – Always ignored, exists for compatibility.

  • groups (object) – Always ignored, exists for compatibility.

Returns

n_splits – The number of splitting iterations in the cross-validator.

Return type

int

property is_cv(self)

Returns whether or not the data splitter is a cross-validation data splitter.

Returns

True if the splitter is a cross-validation data splitter.

Return type

bool

split(self, X, y, groups=None)

Generate indices to split data into training and test sets.

Parameters
  • X (array-like of shape (n_samples, n_features)) –

    Training data, where n_samples is the number of samples and n_features is the number of features.

    Note that providing y is sufficient to generate the splits and hence np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.

  • y (array-like of shape (n_samples,)) – The target variable for supervised learning problems. Stratification is done based on the y labels.

  • groups (object) – Always ignored, exists for compatibility.

Yields
  • train (ndarray) – The training set indices for that split.

  • test (ndarray) – The testing set indices for that split.
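To illustrate stratification and the placeholder for X mentioned above, here is a small sketch. The class ratio and variable names are hypothetical, and the fallback import to scikit-learn's StratifiedKFold is an assumption for environments without evalml.

```python
import numpy as np

try:
    # Import path as given in the class signature on this page.
    from evalml.preprocessing.data_splitters.sk_splitters import StratifiedKFold
except ImportError:
    # Assumption: without evalml, use the sklearn class that is being wrapped.
    from sklearn.model_selection import StratifiedKFold

# Imbalanced toy target: 8 samples of class 0 and 4 of class 1 (ratio 2:1).
y = np.array([0] * 8 + [1] * 4)
# Only y drives the stratification, so a zero array can stand in for X.
X = np.zeros(len(y))

splitter = StratifiedKFold(n_splits=4)

test_counts = []
for train_idx, test_idx in splitter.split(X, y):
    # Count how many samples of each class landed in this test fold.
    counts = np.bincount(y[test_idx], minlength=2)
    test_counts.append(counts.tolist())
    print(test_idx, counts)
```

With these sizes every test fold should hold two samples of class 0 and one of class 1, mirroring the 2:1 ratio of the full target.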

Notes

Randomized CV splitters may return different results for each call of split. You can make the results identical by setting random_state to an integer.
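The note above can be checked with a short sketch. The toy target is hypothetical, and the fallback import to scikit-learn's StratifiedKFold is an assumption for environments without evalml.

```python
import numpy as np

try:
    # Import path as given in the class signature on this page.
    from evalml.preprocessing.data_splitters.sk_splitters import StratifiedKFold
except ImportError:
    # Assumption: without evalml, use the sklearn class that is being wrapped.
    from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 6 + [1] * 6)
X = np.zeros(len(y))  # placeholder; only y matters for the splits

# shuffle=True makes the splitter randomized; an integer seed pins the result.
splitter = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Call split() twice on the same splitter and keep the test indices.
first = [test_idx for _, test_idx in splitter.split(X, y)]
second = [test_idx for _, test_idx in splitter.split(X, y)]

same = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same)
```

Leaving random_state as None with shuffle=True would instead allow the two calls to produce different folds.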