sk_splitters
SKLearn data splitter wrapper classes.
Module Contents
Classes Summary
KFold | Wrapper class for sklearn's KFold splitter.
StratifiedKFold | Wrapper class for sklearn's Stratified KFold splitter.
Contents
- class evalml.preprocessing.data_splitters.sk_splitters.KFold(n_splits=5, *, shuffle=False, random_state=None)
Wrapper class for sklearn’s KFold splitter.
Methods
get_metadata_routing | Get metadata routing of this object.
get_n_splits | Returns the number of splitting iterations in the cross-validator.
is_cv | Returns whether or not the data splitter is a cross-validation data splitter.
split | Generate indices to split data into training and test set.
- get_metadata_routing(self)
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns
routing – A MetadataRequest encapsulating routing information.
- Return type
MetadataRequest
- get_n_splits(self, X=None, y=None, groups=None)
Returns the number of splitting iterations in the cross-validator.
- Parameters
X (object) – Always ignored, exists for compatibility.
y (object) – Always ignored, exists for compatibility.
groups (object) – Always ignored, exists for compatibility.
- Returns
n_splits – Returns the number of splitting iterations in the cross-validator.
- Return type
int
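As a quick illustration, the value returned by get_n_splits matches the number of train/test pairs that split yields. This is a minimal sketch that uses sklearn's own KFold, which this class wraps, so it runs without evalml installed; swapping in the evalml import shown in the comment is assumed to behave the same way. The 8-row array of zeros is made-up placeholder data.

```python
import numpy as np
from sklearn.model_selection import KFold
# Assumed equivalent import for the wrapper documented here:
# from evalml.preprocessing.data_splitters.sk_splitters import KFold

splitter = KFold(n_splits=4)

# X, y, and groups are ignored by get_n_splits, so none are passed.
n_splits = splitter.get_n_splits()

# Placeholder data (hypothetical): 8 samples, 1 feature.
X = np.zeros((8, 1))
folds = list(splitter.split(X))

print(n_splits, len(folds))
```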
- property is_cv(self)
Returns whether or not the data splitter is a cross-validation data splitter.
- Returns
Whether the splitter is a cross-validation data splitter.
- Return type
bool
- split(self, X, y=None, groups=None)
Generate indices to split data into training and test set.
- Parameters
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples,), default=None) – The target variable for supervised learning problems.
groups (array-like of shape (n_samples,), default=None) – Group labels for the samples used while splitting the dataset into train/test set.
- Yields
train (ndarray) – The training set indices for that split.
test (ndarray) – The testing set indices for that split.
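The index pairs yielded by split can be consumed in a simple loop. The sketch below uses sklearn's KFold directly (the class this wrapper is built on) with a made-up 6-sample array; with shuffle left at its default of False, the test folds are consecutive blocks of indices. Using the evalml wrapper in its place is assumed to give the same indices.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical data: 6 samples, 2 features.
X = np.arange(12).reshape(6, 2)

splitter = KFold(n_splits=3)  # shuffle=False by default

test_folds = []
for train_idx, test_idx in splitter.split(X):
    # Each sample index lands in exactly one of the two arrays.
    assert set(train_idx).isdisjoint(test_idx)
    test_folds.append(test_idx.tolist())
    print("train:", train_idx, "test:", test_idx)
```

Each sample appears in exactly one test fold, and the training indices for a split are all the remaining samples.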
- class evalml.preprocessing.data_splitters.sk_splitters.StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)
Wrapper class for sklearn’s Stratified KFold splitter.
Methods
get_metadata_routing | Get metadata routing of this object.
get_n_splits | Returns the number of splitting iterations in the cross-validator.
is_cv | Returns whether or not the data splitter is a cross-validation data splitter.
split | Generate indices to split data into training and test set.
- get_metadata_routing(self)
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns
routing – A MetadataRequest encapsulating routing information.
- Return type
MetadataRequest
- get_n_splits(self, X=None, y=None, groups=None)
Returns the number of splitting iterations in the cross-validator.
- Parameters
X (object) – Always ignored, exists for compatibility.
y (object) – Always ignored, exists for compatibility.
groups (object) – Always ignored, exists for compatibility.
- Returns
n_splits – Returns the number of splitting iterations in the cross-validator.
- Return type
int
- property is_cv(self)
Returns whether or not the data splitter is a cross-validation data splitter.
- Returns
Whether the splitter is a cross-validation data splitter.
- Return type
bool
- split(self, X, y, groups=None)
Generate indices to split data into training and test set.
- Parameters
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features. Note that providing y is sufficient to generate the splits, so np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.
y (array-like of shape (n_samples,)) – The target variable for supervised learning problems. Stratification is done based on the y labels.
groups (object) – Always ignored, exists for compatibility.
- Yields
train (ndarray) – The training set indices for that split.
test (ndarray) – The testing set indices for that split.
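To illustrate stratification and the placeholder trick for X described above, here is a minimal sketch using sklearn's StratifiedKFold, the class this wrapper is built on. The label array is made up (six samples of class 0, three of class 1), and the evalml wrapper is assumed to produce the same splits when given the same inputs.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 6 of class 0, 3 of class 1.
y = np.array([0] * 6 + [1] * 3)

# Only y drives the splits, so zeros stand in for real features.
X_placeholder = np.zeros(len(y))

splitter = StratifiedKFold(n_splits=3)

test_class_counts = []
for train_idx, test_idx in splitter.split(X_placeholder, y):
    # Count how many samples of each class fall in this test fold.
    counts = np.bincount(y[test_idx], minlength=2).tolist()
    test_class_counts.append(counts)
    print("test:", test_idx, "class counts:", counts)
```

Every test fold keeps the 2:1 class ratio of the full label array, which is the property that distinguishes this splitter from plain KFold.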
Notes
Randomized CV splitters may return different results for each call of split. You can make the results identical by setting random_state to an integer.
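The note on random_state can be checked with a short sketch. It uses sklearn's KFold with shuffle=True on made-up placeholder data; the helper collect_test_folds is a hypothetical name introduced only for this example, and the evalml wrappers are assumed to behave the same way.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder data (hypothetical): 10 samples, 1 feature.
X = np.zeros((10, 1))


def collect_test_folds(splitter):
    """Return the test indices of every split as plain lists."""
    return [test_idx.tolist() for _, test_idx in splitter.split(X)]


# An integer random_state fixes the shuffle.
seeded = KFold(n_splits=5, shuffle=True, random_state=0)
first_call = collect_test_folds(seeded)
second_call = collect_test_folds(seeded)

# A separate instance with the same seed gives the same folds.
other_instance = collect_test_folds(
    KFold(n_splits=5, shuffle=True, random_state=0)
)

print(first_call == second_call == other_instance)
```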