sk_splitters

SKLearn data splitter wrapper classes.

Module Contents

Classes Summary

`KFold`	Wrapper class for sklearn's KFold splitter.
`StratifiedKFold`	Wrapper class for sklearn's Stratified KFold splitter.

Contents

class evalml.preprocessing.data_splitters.sk_splitters.KFold(n_splits=5, *, shuffle=False, random_state=None)

Wrapper class for sklearn’s KFold splitter.

Methods

`get_metadata_routing`	Get metadata routing of this object.
`get_n_splits`	Returns the number of splitting iterations in the cross-validator.
`is_cv`	Returns whether or not the data splitter is a cross-validation data splitter.
`split`	Generate indices to split data into training and test set.

get_metadata_routing(self)

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns: routing – A MetadataRequest encapsulating routing information.
Return type: MetadataRequest

get_n_splits(self, X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Parameters

X (object) – Always ignored, exists for compatibility.
y (object) – Always ignored, exists for compatibility.
groups (object) – Always ignored, exists for compatibility.

Returns: n_splits – The number of splitting iterations in the cross-validator.
Return type: int
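As a minimal sketch of the behavior described above, the snippet below calls `get_n_splits` with and without arguments. It assumes evalml is installed and, purely for illustration, falls back to sklearn's own `KFold` (the class being wrapped) if it is not. The argument values are made-up toy data.

```python
try:
    from evalml.preprocessing.data_splitters.sk_splitters import KFold
except ImportError:
    # Illustration-only fallback: the sklearn class that the wrapper wraps.
    from sklearn.model_selection import KFold

splitter = KFold(n_splits=4)

# X, y, and groups are accepted only for API compatibility and are ignored.
n_default = splitter.get_n_splits()
n_with_args = splitter.get_n_splits(X=[[0], [1]], y=[0, 1], groups=[0, 0])

print(n_default, n_with_args)  # both equal the n_splits passed to the constructor
```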

property is_cv(self)

Returns whether or not the data splitter is a cross-validation data splitter.

Returns: True if the splitter is a cross-validation data splitter.
Return type: bool

split(self, X, y=None, groups=None)

Generate indices to split data into training and test set.

Parameters

X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples,), default=None) – The target variable for supervised learning problems.
groups (array-like of shape (n_samples,), default=None) – Group labels for the samples used while splitting the dataset into train/test set.

Yields

train (ndarray) – The training set indices for that split.
test (ndarray) – The testing set indices for that split.
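The following is a small usage sketch of `split`, not a definitive recipe. It assumes evalml is installed, with an illustration-only fallback to sklearn's `KFold`. The array `X` is made-up toy data with 6 samples and 2 features. With `shuffle=False` (the default), the test folds are consecutive blocks of indices.

```python
import numpy as np

try:
    from evalml.preprocessing.data_splitters.sk_splitters import KFold
except ImportError:
    # Illustration-only fallback: the sklearn class that the wrapper wraps.
    from sklearn.model_selection import KFold

X = np.arange(12).reshape(6, 2)  # toy data: 6 samples, 2 features
splitter = KFold(n_splits=3)     # shuffle=False by default

# split() is a generator that yields one (train, test) pair of index arrays per fold.
folds = [(train.tolist(), test.tolist()) for train, test in splitter.split(X)]

for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
# train: [2, 3, 4, 5] test: [0, 1]
# train: [0, 1, 4, 5] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
```

The yielded arrays are positional indices, so they can be used to index `X` (and `y`, if present) directly, for example `X[train_idx]`.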

class evalml.preprocessing.data_splitters.sk_splitters.StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)

Wrapper class for sklearn’s Stratified KFold splitter.

Methods

`get_metadata_routing`	Get metadata routing of this object.
`get_n_splits`	Returns the number of splitting iterations in the cross-validator.
`is_cv`	Returns whether or not the data splitter is a cross-validation data splitter.
`split`	Generate indices to split data into training and test set.

get_metadata_routing(self)

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns: routing – A MetadataRequest encapsulating routing information.
Return type: MetadataRequest

get_n_splits(self, X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Parameters

X (object) – Always ignored, exists for compatibility.
y (object) – Always ignored, exists for compatibility.
groups (object) – Always ignored, exists for compatibility.

Returns: n_splits – The number of splitting iterations in the cross-validator.
Return type: int

property is_cv(self)

Returns whether or not the data splitter is a cross-validation data splitter.

Returns: True if the splitter is a cross-validation data splitter.
Return type: bool

split(self, X, y, groups=None)

Generate indices to split data into training and test set.

Parameters

X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features. Note that providing y is sufficient to generate the splits and hence np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.
y (array-like of shape (n_samples,)) – The target variable for supervised learning problems. Stratification is done based on the y labels.
groups (object) – Always ignored, exists for compatibility.

Yields

train (ndarray) – The training set indices for that split.
test (ndarray) – The testing set indices for that split.
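To illustrate stratification on the y labels, here is a hedged sketch that counts the classes in each test fold. It assumes evalml is installed, with an illustration-only fallback to sklearn's `StratifiedKFold`. The label array `y` is made-up toy data with a 2:1 class ratio, and `X_placeholder` uses the `np.zeros(n_samples)` placeholder mentioned in the parameter description.

```python
import numpy as np

try:
    from evalml.preprocessing.data_splitters.sk_splitters import StratifiedKFold
except ImportError:
    # Illustration-only fallback: the sklearn class that the wrapper wraps.
    from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 6 + [1] * 3)    # toy labels: six of class 0, three of class 1
X_placeholder = np.zeros(len(y))   # y alone determines the splits

splitter = StratifiedKFold(n_splits=3)

test_class_counts = []
for train_idx, test_idx in splitter.split(X_placeholder, y):
    # Count how many samples of each class landed in this test fold.
    test_class_counts.append(np.bincount(y[test_idx], minlength=2).tolist())

print(test_class_counts)  # [[2, 1], [2, 1], [2, 1]]
```

Each test fold keeps the overall 2:1 class ratio, which is the point of stratifying.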

Notes

Randomized CV splitters may return different results for each call of split. You can make the results identical by setting random_state to an integer.
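The note on reproducibility can be checked with a short sketch like the one below. It assumes evalml is installed, with an illustration-only fallback to sklearn's `StratifiedKFold`. The helper `collect_test_folds`, the labels `y`, and the seed value 42 are hypothetical choices made for this example.

```python
import numpy as np

try:
    from evalml.preprocessing.data_splitters.sk_splitters import StratifiedKFold
except ImportError:
    # Illustration-only fallback: the sklearn class that the wrapper wraps.
    from sklearn.model_selection import StratifiedKFold

y = np.array([0, 1] * 5)  # toy labels: ten samples, two balanced classes


def collect_test_folds(random_state):
    """Return the test indices of every fold for a shuffled splitter (hypothetical helper)."""
    splitter = StratifiedKFold(n_splits=2, shuffle=True, random_state=random_state)
    return [test.tolist() for _, test in splitter.split(np.zeros(len(y)), y)]


# Two independently constructed splitters with the same integer seed.
first = collect_test_folds(42)
second = collect_test_folds(42)

print(first == second)  # True: an integer random_state makes the folds repeatable
```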