classifiers¶
Classification model components.
Submodules¶
Package Contents¶
Classes Summary¶
Classifier that predicts using the specified strategy. |
|
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features. |
|
Decision Tree Classifier. |
|
Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator. |
|
Extra Trees Classifier. |
|
K-Nearest Neighbors Classifier. |
|
LightGBM Classifier. |
|
Logistic Regression Classifier. |
|
Random Forest Classifier. |
|
Support Vector Machine Classifier. |
|
Vowpal Wabbit Binary Classifier. |
|
Vowpal Wabbit Multiclass Classifier. |
|
XGBoost Classifier. |
Contents¶
-
class
evalml.pipelines.components.estimators.classifiers.
BaselineClassifier
(strategy='mode', random_seed=0, **kwargs)[source]¶ Classifier that predicts using the specified strategy.
This is useful as a simple baseline classifier to compare with other classifiers.
- Parameters
strategy (str) – Method used to predict. Valid options are “mode”, “random” and “random_weighted”. Defaults to “mode”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.BASELINE
modifies_features
True
modifies_target
False
name
Baseline Classifier
supported_problem_types
[ProblemTypes.BINARY, ProblemTypes.MULTICLASS]
training_only
False
Methods
Returns class labels. Will return None before fitting.
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature. Since baseline classifiers do not use input features to calculate predictions, returns an array of zeroes.
Fits baseline classifier component to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using the baseline classification strategy.
Make prediction probabilities using the baseline classification strategy.
Saves component at file path.
-
property
classes_
(self)¶ Returns class labels. Will return None before fitting.
- Returns
Class names
- Return type
list[str] or list(float)
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature. Since baseline classifiers do not use input features to calculate predictions, returns an array of zeroes.
- Returns
An array of zeroes
- Return type
np.ndarray (float)
-
fit
(self, X, y=None)[source]¶ Fits baseline classifier component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
- Raises
ValueError – If y is None.
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)[source]¶ Make predictions using the baseline classification strategy.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
-
predict_proba
(self, X)[source]¶ Make prediction probabilities using the baseline classification strategy.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted probability values.
- Return type
pd.DataFrame
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
-
class
evalml.pipelines.components.estimators.classifiers.
CatBoostClassifier
(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=True, allow_writing_files=False, random_seed=0, n_jobs=- 1, **kwargs)[source]¶ CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.
For more information, check out https://catboost.ai/
- Parameters
n_estimators (float) – The maximum number of trees to build. Defaults to 10.
eta (float) – The learning rate. Defaults to 0.03.
max_depth (int) – The maximum tree depth for base learners. Defaults to 6.
bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.
silent (boolean) – Whether to use the “silent” logging mode. Defaults to True.
allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(4, 100), “eta”: Real(0.000001, 1), “max_depth”: Integer(4, 10),}
model_family
ModelFamily.CATBOOST
modifies_features
True
modifies_target
False
name
CatBoost Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance of fitted CatBoost classifier.
Fits CatBoost classifier component to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using the fitted CatBoost classifier.
Make probability estimates for labels.
Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Feature importance of fitted CatBoost classifier.
-
fit
(self, X, y=None)[source]¶ Fits CatBoost classifier component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)[source]¶ Make predictions using the fitted CatBoost classifier.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.DataFrame
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
-
class
evalml.pipelines.components.estimators.classifiers.
DecisionTreeClassifier
(criterion='gini', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)[source]¶ Decision Tree Classifier.
- Parameters
criterion ({"gini", "entropy"}) – The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. Defaults to “gini”.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “criterion”: [“gini”, “entropy”], “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family
ModelFamily.DECISION_TREE
modifies_features
True
modifies_target
False
name
Decision Tree Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits estimator to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
-
fit
(self, X, y=None)¶ Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
-
class
evalml.pipelines.components.estimators.classifiers.
ElasticNetClassifier
(penalty='elasticnet', C=1.0, l1_ratio=0.15, multi_class='auto', solver='saga', n_jobs=- 1, random_seed=0, **kwargs)[source]¶ Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator.
- Parameters
penalty ({"l1", "l2", "elasticnet", "none"}) – The norm used in penalization. Defaults to “elasticnet”.
C (float) – Inverse of regularization strength. Must be a positive float. Defaults to 1.0.
l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty=’elasticnet’. Setting l1_ratio=0 is equivalent to using penalty=’l2’, while setting l1_ratio=1 is equivalent to using penalty=’l1’. For 0 < l1_ratio <1, the penalty is a combination of L1 and L2. Defaults to 0.15.
multi_class ({"auto", "ovr", "multinomial"}) – If the option chosen is “ovr”, then a binary problem is fit for each label. For “multinomial” the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. “multinomial” is unavailable when solver=”liblinear”. “auto” selects “ovr” if the data is binary, or if solver=”liblinear”, and otherwise selects “multinomial”. Defaults to “auto”.
solver ({"newton-cg", "lbfgs", "liblinear", "sag", "saga"}) –
Algorithm to use in the optimization problem. For small datasets, “liblinear” is a good choice, whereas “sag” and “saga” are faster for large ones. For multiclass problems, only “newton-cg”, “sag”, “saga” and “lbfgs” handle multinomial loss; “liblinear” is limited to one-versus-rest schemes.
”newton-cg”, “lbfgs”, “sag” and “saga” handle L2 or no penalty
”liblinear” and “saga” also handle L1 penalty
”saga” also supports “elasticnet” penalty
”liblinear” does not support setting penalty=’none’
Defaults to “saga”.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “C”: Real(0.01, 10), “l1_ratio”: Real(0, 1)}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Elastic Net Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance for fitted ElasticNet classifier.
Fits ElasticNet classifier component to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Feature importance for fitted ElasticNet classifier.
-
fit
(self, X, y)[source]¶ Fits ElasticNet classifier component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
-
class
evalml.pipelines.components.estimators.classifiers.
ExtraTreesClassifier
(n_estimators=100, max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, n_jobs=- 1, random_seed=0, **kwargs)[source]¶ Extra Trees Classifier.
- Parameters
n_estimators (float) – The number of trees in the forest. Defaults to 100.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
to 2. (Defaults) –
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family
ModelFamily.EXTRA_TREES
modifies_features
True
modifies_target
False
name
Extra Trees Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits estimator to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
-
fit
(self, X, y=None)¶ Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
-
class
evalml.pipelines.components.estimators.classifiers.
KNeighborsClassifier
(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, random_seed=0, **kwargs)[source]¶ K-Nearest Neighbors Classifier.
- Parameters
n_neighbors (int) – Number of neighbors to use by default. Defaults to 5.
weights ({‘uniform’, ‘distance’} or callable) –
Weight function used in prediction. Can be:
‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. in this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
Defaults to “uniform”.
algorithm ({‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}) –
Algorithm used to compute the nearest neighbors:
‘ball_tree’ will use BallTree
‘kd_tree’ will use KDTree
‘brute’ will use a brute-force search.
‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit method. Defaults to “auto”. Note: fitting on sparse input will override the setting of this parameter, using brute force.
leaf_size (int) – Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30.
p (int) – Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Defaults to 2.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_neighbors”: Integer(2, 12), “weights”: [“uniform”, “distance”], “algorithm”: [“auto”, “ball_tree”, “kd_tree”, “brute”], “leaf_size”: Integer(10, 30), “p”: Integer(1, 5),}
model_family
ModelFamily.K_NEIGHBORS
modifies_features
True
modifies_target
False
name
KNN Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns array of 0’s matching the input number of features as feature_importance is not defined for KNN classifiers.
Fits estimator to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns array of 0’s matching the input number of features as feature_importance is not defined for KNN classifiers.
-
fit
(self, X, y=None)¶ Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
-
class
evalml.pipelines.components.estimators.classifiers.
LightGBMClassifier
(boosting_type='gbdt', learning_rate=0.1, n_estimators=100, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=- 1, random_seed=0, **kwargs)[source]¶ LightGBM Classifier.
- Parameters
boosting_type (string) – Type of boosting to use. Defaults to “gbdt”. - ‘gbdt’ uses traditional Gradient Boosting Decision Tree - “dart”, uses Dropouts meet Multiple Additive Regression Trees - “goss”, uses Gradient-based One-Side Sampling - “rf”, uses Random Forest
learning_rate (float) – Boosting learning rate. Defaults to 0.1.
n_estimators (int) – Number of boosted trees to fit. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.
num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.
min_child_samples (int) – Minimum number of data needed in a child (leaf). Defaults to 20.
bagging_fraction (float) – LightGBM will randomly select a subset of features on each iteration (tree) without resampling if this is smaller than 1.0. For example, if set to 0.8, LightGBM will select 80% of features before training each tree. This can be used to speed up training and deal with overfitting. Defaults to 0.9.
bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled. k means perform bagging at every k iteration. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100 % of the data to use for the next k iterations. Defaults to 0.
n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “learning_rate”: Real(0.000001, 1), “boosting_type”: [“gbdt”, “dart”, “goss”, “rf”], “n_estimators”: Integer(10, 100), “max_depth”: Integer(0, 10), “num_leaves”: Integer(2, 100), “min_child_samples”: Integer(1, 100), “bagging_fraction”: Real(0.000001, 1), “bagging_freq”: Integer(0, 1),}
model_family
ModelFamily.LIGHTGBM
modifies_features
True
modifies_target
False
name
LightGBM Classifier
SEED_MAX
SEED_BOUNDS.max_bound
SEED_MIN
0
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits LightGBM classifier component to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using the fitted LightGBM classifier.
Make prediction probabilities using the fitted LightGBM classifier.
Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
-
fit
(self, X, y=None)[source]¶ Fits LightGBM classifier component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)[source]¶ Make predictions using the fitted LightGBM classifier.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.DataFrame
-
predict_proba
(self, X)[source]¶ Make prediction probabilities using the fitted LightGBM classifier.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted probability values.
- Return type
pd.DataFrame
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
-
class
evalml.pipelines.components.estimators.classifiers.
LogisticRegressionClassifier
(penalty='l2', C=1.0, multi_class='auto', solver='lbfgs', n_jobs=- 1, random_seed=0, **kwargs)[source]¶ Logistic Regression Classifier.
- Parameters
penalty ({"l1", "l2", "elasticnet", "none"}) – The norm used in penalization. Defaults to “l2”.
C (float) – Inverse of regularization strength. Must be a positive float. Defaults to 1.0.
multi_class ({"auto", "ovr", "multinomial"}) – If the option chosen is “ovr”, then a binary problem is fit for each label. For “multinomial” the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. “multinomial” is unavailable when solver=”liblinear”. “auto” selects “ovr” if the data is binary, or if solver=”liblinear”, and otherwise selects “multinomial”. Defaults to “auto”.
solver ({"newton-cg", "lbfgs", "liblinear", "sag", "saga"}) –
Algorithm to use in the optimization problem. For small datasets, “liblinear” is a good choice, whereas “sag” and “saga” are faster for large ones. For multiclass problems, only “newton-cg”, “sag”, “saga” and “lbfgs” handle multinomial loss; “liblinear” is limited to one-versus-rest schemes.
”newton-cg”, “lbfgs”, “sag” and “saga” handle L2 or no penalty
”liblinear” and “saga” also handle L1 penalty
”saga” also supports “elasticnet” penalty
”liblinear” does not support setting penalty=’none’
Defaults to “lbfgs”.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “penalty”: [“l2”], “C”: Real(0.01, 10),}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Logistic Regression Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance for fitted logistic regression classifier.
Fits estimator to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Feature importance for fitted logistic regression classifier.
-
fit
(self, X, y=None)¶ Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
-
class
evalml.pipelines.components.estimators.classifiers.
RandomForestClassifier
(n_estimators=100, max_depth=6, n_jobs=- 1, random_seed=0, **kwargs)[source]¶ Random Forest Classifier.
- Parameters
n_estimators (float) – The number of trees in the forest. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 10),}
model_family
ModelFamily.RANDOM_FOREST
modifies_features
True
modifies_target
False
name
Random Forest Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Returns importance associated with each feature.
Fits estimator to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature.
- Return type
np.ndarray
- Raises
MethodPropertyNotFoundError – If estimator does not have a feature_importance method or a component_obj that implements feature_importance.
-
fit
(self, X, y=None)¶ Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
-
class
evalml.pipelines.components.estimators.classifiers.
SVMClassifier
(C=1.0, kernel='rbf', gamma='auto', probability=True, random_seed=0, **kwargs)[source]¶ Support Vector Machine Classifier.
- Parameters
C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.
kernel ({"poly", "rbf", "sigmoid"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.
gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. Defaults to “auto”. - If gamma=’scale’ is passed then it uses 1 / (n_features * X.var()) as value of gamma - If “auto” (default), uses 1 / n_features
probability (boolean) – Whether to enable probability estimates. Defaults to True.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “C”: Real(0, 10), “kernel”: [“poly”, “rbf”, “sigmoid”], “gamma”: [“scale”, “auto”],}
model_family
ModelFamily.SVM
modifies_features
True
modifies_target
False
name
SVM Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance only works with linear kernels.
Fits estimator to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Feature importance only works with linear kernels.
If the kernel isn’t linear, we return a numpy array of zeros.
- Returns
Feature importance of fitted SVM classifier or a numpy array of zeroes if the kernel is not linear.
-
fit
(self, X, y=None)¶ Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
-
class
evalml.pipelines.components.estimators.classifiers.
VowpalWabbitBinaryClassifier
(loss_function='logistic', learning_rate=0.5, decay_learning_rate=1.0, power_t=0.5, passes=1, random_seed=0, **kwargs)[source]¶ Vowpal Wabbit Binary Classifier.
- Parameters
loss_function (str) – Specifies the loss function to use. One of {“squared”, “classic”, “hinge”, “logistic”, “quantile”}. Defaults to “logistic”.
learning_rate (float) – Boosting learning rate. Defaults to 0.5.
decay_learning_rate (float) – Decay factor for learning_rate. Defaults to 1.0.
power_t (float) – Power on learning rate decay. Defaults to 0.5.
passes (int) – Number of training passes. Defaults to 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
None
model_family
ModelFamily.VOWPAL_WABBIT
modifies_features
True
modifies_target
False
name
Vowpal Wabbit Binary Classifier
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.TIME_SERIES_BINARY,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance for Vowpal Wabbit classifiers. This is not implemented.
Fits estimator to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Feature importance for Vowpal Wabbit classifiers. This is not implemented.
-
fit
(self, X, y=None)¶ Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
-
class
evalml.pipelines.components.estimators.classifiers.
VowpalWabbitMulticlassClassifier
(loss_function='logistic', learning_rate=0.5, decay_learning_rate=1.0, power_t=0.5, passes=1, random_seed=0, **kwargs)[source]¶ Vowpal Wabbit Multiclass Classifier.
- Parameters
loss_function (str) – Specifies the loss function to use. One of {“squared”, “classic”, “hinge”, “logistic”, “quantile”}. Defaults to “logistic”.
learning_rate (float) – Boosting learning rate. Defaults to 0.5.
decay_learning_rate (float) – Decay factor for learning_rate. Defaults to 1.0.
power_t (float) – Power on learning rate decay. Defaults to 0.5.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
None
model_family
ModelFamily.VOWPAL_WABBIT
modifies_features
True
modifies_target
False
name
Vowpal Wabbit Multiclass Classifier
supported_problem_types
[ ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance for Vowpal Wabbit classifiers. This is not implemented.
Fits estimator to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Feature importance for Vowpal Wabbit classifiers. This is not implemented.
-
fit
(self, X, y=None)¶ Fits estimator to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series, optional) – The target training data of length [n_samples].
- Returns
self
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict method or a component_obj that implements predict.
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame) – Features.
- Returns
Probability estimates.
- Return type
pd.Series
- Raises
MethodPropertyNotFoundError – If estimator does not have a predict_proba method or a component_obj that implements predict_proba.
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
-
class
evalml.pipelines.components.estimators.classifiers.
XGBoostClassifier
(eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, random_seed=0, eval_metric='logloss', n_jobs=12, **kwargs)[source]¶ XGBoost Classifier.
- Parameters
eta (float) – Boosting learning rate. Defaults to 0.1.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0
n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.
random_seed (int) – Seed for the random number generator. Defaults to 0.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to 12.
Attributes
hyperparameter_ranges
{ “eta”: Real(0.000001, 1), “max_depth”: Integer(1, 10), “min_child_weight”: Real(1, 10), “n_estimators”: Integer(1, 1000),}
model_family
ModelFamily.XGBOOST
modifies_features
True
modifies_target
False
name
XGBoost Classifier
SEED_MAX
None
SEED_MIN
None
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
training_only
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters.
Feature importance of fitted XGBoost classifier.
Fits XGBoost classifier component to data.
Loads component at file path.
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component.
Make predictions using the fitted XGBoost classifier.
Make predictions using the fitted CatBoost classifier.
Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Returns dictionary if return_dict is True, else None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Feature importance of fitted XGBoost classifier.
-
fit
(self, X, y=None)[source]¶ Fits XGBoost classifier component to data.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
self
-
static
load
(file_path)¶ Loads component at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
- Returns
True.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component.
-
predict
(self, X)[source]¶ Make predictions using the fitted XGBoost classifier.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.DataFrame
-
predict_proba
(self, X)[source]¶ Make predictions using the fitted CatBoost classifier.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
- Returns
Predicted values.
- Return type
pd.DataFrame
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.