estimators
Subpackages
Package Contents
Classes Summary
ARIMARegressor – Autoregressive Integrated Moving Average Model.
BaselineClassifier – Classifier that predicts using the specified strategy.
BaselineRegressor – Baseline regressor that uses a simple strategy to make predictions.
CatBoostClassifier – CatBoost Classifier, a classifier that uses gradient-boosting on decision trees.
CatBoostRegressor – CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.
DecisionTreeClassifier – Decision Tree Classifier.
DecisionTreeRegressor – Decision Tree Regressor.
ElasticNetClassifier – Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator.
ElasticNetRegressor – Elastic Net Regressor.
Estimator – A component that fits and predicts given data.
ExtraTreesClassifier – Extra Trees Classifier.
ExtraTreesRegressor – Extra Trees Regressor.
KNeighborsClassifier – K-Nearest Neighbors Classifier.
LightGBMClassifier – LightGBM Classifier.
LightGBMRegressor – LightGBM Regressor.
LinearRegressor – Linear Regressor.
LogisticRegressionClassifier – Logistic Regression Classifier.
ProphetRegressor – Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
RandomForestClassifier – Random Forest Classifier.
RandomForestRegressor – Random Forest Regressor.
SVMClassifier – Support Vector Machine Classifier.
SVMRegressor – Support Vector Machine Regressor.
TimeSeriesBaselineEstimator – Time series estimator that predicts using the naive forecasting approach.
XGBoostClassifier – XGBoost Classifier.
XGBoostRegressor – XGBoost Regressor.
Contents
class evalml.pipelines.components.estimators.ARIMARegressor(date_index=None, trend=None, start_p=2, d=0, start_q=2, max_p=5, max_d=2, max_q=5, seasonal=True, n_jobs=-1, random_seed=0, **kwargs)
Autoregressive Integrated Moving Average Model. The three parameters (p, d, q) are the AR order, the degree of differencing, and the MA order. More information here: https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima_model.ARIMA.html
Currently ARIMARegressor isn’t supported via conda install. It’s recommended that it be installed via PyPI.
- Parameters
date_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.
trend (str) – Controls the deterministic trend. Options are [‘n’, ‘c’, ‘t’, ‘ct’], where ‘n’ is no trend, ‘c’ is a constant term, ‘t’ is a linear trend, and ‘ct’ is both a constant and a linear trend. Can also be an iterable defining a polynomial, such as [1, 1, 0, 1]. Defaults to None.
start_p (int) – Minimum Autoregressive order. Defaults to 2.
d (int) – Minimum Differencing degree. Defaults to 0.
start_q (int) – Minimum Moving Average order. Defaults to 2.
max_p (int) – Maximum Autoregressive order. Defaults to 5.
max_d (int) – Maximum Differencing degree. Defaults to 2.
max_q (int) – Maximum Moving Average order. Defaults to 5.
seasonal (boolean) – Whether to fit a seasonal model to ARIMA. Defaults to True.
n_jobs (int or None) – Integer describing the level of parallelism used for pipelines. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
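The degree of differencing (d, bounded by max_d) counts how many times the series is differenced before the AR and MA terms are fit. The following is a minimal pure-Python sketch of that single step; the helper `difference` is hypothetical and is not part of evalml or of the underlying ARIMA implementation.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (hypothetical helper)."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with its consecutive differences,
        # which shortens it by one element.
        values = [b - a for a, b in zip(values, values[1:])]
    return values


# A series with a quadratic trend.
series = [1, 4, 9, 16, 25]
once = difference(series, d=1)   # a linear trend remains
twice = difference(series, d=2)  # only a constant remains
print(once, twice)
```

On this toy series, two rounds of differencing remove the quadratic trend, which is the kind of non-stationarity a larger d is meant to absorb.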
Attributes
hyperparameter_ranges
{"start_p": Integer(1, 3), "d": Integer(0, 2), "start_q": Integer(1, 3), "max_p": Integer(3, 10), "max_d": Integer(2, 5), "max_q": Integer(3, 10), "seasonal": [True, False]}
model_family
ModelFamily.ARIMA
modifies_features
True
modifies_target
False
name
ARIMA Regressor
predict_uses_y
False
supported_problem_types
[ProblemTypes.TIME_SERIES_REGRESSION]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describes a component and its parameters.
feature_importance – Returns an array of zeros with a length of 1, as feature_importance is not defined for the ARIMA regressor.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
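The convention Component.default_parameters == Component().parameters can be illustrated with a small stand-in class. This is only a sketch: `ToyComponent` is hypothetical and not an evalml class, and it uses an ordinary classmethod that must be called, whereas the documented convention reads `default_parameters` without a call.

```python
import inspect


class ToyComponent:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, strategy="mean", random_seed=0):
        # Record the arguments that were used to initialize the component.
        self.parameters = {"strategy": strategy, "random_seed": random_seed}

    @classmethod
    def default_parameters(cls):
        # Read the defaults from the __init__ signature, skipping `self`.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name != "self" and param.default is not inspect.Parameter.empty
        }


defaults = ToyComponent.default_parameters()
# The defaults match the parameters of an instance built with no arguments.
print(defaults == ToyComponent().parameters)
```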
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self) Returns an array of zeros with a length of 1, as feature_importance is not defined for the ARIMA regressor.
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X, y=None)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
class evalml.pipelines.components.estimators.BaselineClassifier(strategy='mode', random_seed=0, **kwargs)
Classifier that predicts using the specified strategy. This is useful as a simple baseline classifier to compare with other classifiers.
- Parameters
strategy (str) – Method used to predict. Valid options are “mode”, “random” and “random_weighted”. Defaults to “mode”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
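The three strategies can be sketched in plain Python to show what each one is expected to do. The function `baseline_predict` is hypothetical, and the reading of "random" as a uniform draw over observed classes and "random_weighted" as a draw proportional to class frequency is an assumption based on the option names, not a copy of the evalml implementation.

```python
import random
from collections import Counter


def baseline_predict(y_train, n_samples, strategy="mode", random_seed=0):
    """Hypothetical illustration of the three strategies; features are ignored."""
    counts = Counter(y_train)
    classes = sorted(counts)
    rng = random.Random(random_seed)
    if strategy == "mode":
        # Always predict the most frequent training label.
        most_common = counts.most_common(1)[0][0]
        return [most_common] * n_samples
    if strategy == "random":
        # Assumed meaning: draw uniformly from the observed classes.
        return [rng.choice(classes) for _ in range(n_samples)]
    if strategy == "random_weighted":
        # Assumed meaning: draw classes in proportion to training frequency.
        weights = [counts[c] for c in classes]
        return rng.choices(classes, weights=weights, k=n_samples)
    raise ValueError(f"Unknown strategy: {strategy!r}")


y_train = ["a", "a", "a", "b"]
mode_predictions = baseline_predict(y_train, 3, strategy="mode")
print(mode_predictions)
```

None of the strategies looks at the input features, which is why such a model is useful only as a floor for comparing other classifiers.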
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.BASELINE
modifies_features
True
modifies_target
False
name
Baseline Classifier
predict_uses_y
False
supported_problem_types
[ProblemTypes.BINARY, ProblemTypes.MULTICLASS]
Methods
classes_ – Returns class labels. Will return None before fitting.
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describes a component and its parameters.
feature_importance – Returns importance associated with each feature. Since baseline classifiers do not use input features to calculate predictions, returns an array of zeroes.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
property
classes_
(self)¶ Returns class labels. Will return None before fitting.
- Returns
Class names
- Return type
list[str] or list[float]
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature. Since baseline classifiers do not use input features to calculate predictions, returns an array of zeroes.
- Returns
An array of zeroes
- Return type
np.ndarray (float)
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)[source]¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
class evalml.pipelines.components.estimators.BaselineRegressor(strategy='mean', random_seed=0, **kwargs)
Baseline regressor that uses a simple strategy to make predictions. This is useful as a simple baseline regressor to compare with other regressors.
- Parameters
strategy (str) – Method used to predict. Valid options are “mean” and “median”. Defaults to “mean”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
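A minimal sketch of the two strategies, using only the standard library. The function `baseline_regress` is hypothetical and only illustrates the idea of predicting one constant derived from the training target.

```python
from statistics import mean, median


def baseline_regress(y_train, n_samples, strategy="mean"):
    """Hypothetical illustration: predict one constant for every row."""
    if strategy == "mean":
        constant = mean(y_train)
    elif strategy == "median":
        constant = median(y_train)
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    # Input features play no role; every prediction is the same value.
    return [constant] * n_samples


y_train = [1.0, 2.0, 3.0, 10.0]
mean_predictions = baseline_regress(y_train, 2, strategy="mean")
median_predictions = baseline_regress(y_train, 2, strategy="median")
print(mean_predictions, median_predictions)
```

The outlier 10.0 pulls the mean above the median, so the two strategies can give noticeably different baselines on skewed targets.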
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.BASELINE
modifies_features
True
modifies_target
False
name
Baseline Regressor
predict_uses_y
False
supported_problem_types
[ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describes a component and its parameters.
feature_importance – Returns importance associated with each feature. Since baseline regressors do not use input features to calculate predictions, returns an array of zeroes.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature. Since baseline regressors do not use input features to calculate predictions, returns an array of zeroes.
- Returns
An array of zeroes
- Return type
np.ndarray (float)
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
class evalml.pipelines.components.estimators.CatBoostClassifier(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=True, allow_writing_files=False, random_seed=0, n_jobs=-1, **kwargs)
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.
For more information, check out https://catboost.ai/
- Parameters
n_estimators (int) – The maximum number of trees to build. Defaults to 10.
eta (float) – The learning rate. Defaults to 0.03.
max_depth (int) – The maximum tree depth for base learners. Defaults to 6.
bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.
silent (boolean) – Whether to use the “silent” logging mode. Defaults to True.
allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
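The interplay between n_estimators and the learning rate eta can be illustrated with a deliberately simplified toy: gradient boosting for squared error in which every base learner is a single constant. This is an assumption-laden sketch and not how CatBoost builds trees; `boost_constant` is a hypothetical helper.

```python
def boost_constant(y, n_estimators=10, eta=0.03):
    """Toy boosting with constant base learners (hypothetical, not CatBoost)."""
    target_mean = sum(y) / len(y)
    prediction = 0.0
    history = []
    for _ in range(n_estimators):
        # For squared error, the best constant fit to the residuals
        # y - prediction is their mean.
        residual_mean = target_mean - prediction
        # Shrink each learner's contribution by the learning rate.
        prediction += eta * residual_mean
        history.append(prediction)
    return history


y = [10.0, 10.0]
fast = boost_constant(y, n_estimators=3, eta=0.5)
slow = boost_constant(y, n_estimators=3, eta=0.03)
print(fast, slow)
```

With a small eta, each round moves the prediction only slightly toward the target, so more estimators are typically needed to reach the same fit.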
Attributes
hyperparameter_ranges
{"n_estimators": Integer(4, 100), "eta": Real(0.000001, 1), "max_depth": Integer(4, 10)}
model_family
ModelFamily.CATBOOST
modifies_features
True
modifies_target
False
name
CatBoost Classifier
predict_uses_y
False
supported_problem_types
[ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describes a component and its parameters.
feature_importance – Returns importance associated with each feature.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
class evalml.pipelines.components.estimators.CatBoostRegressor(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=False, allow_writing_files=False, random_seed=0, n_jobs=-1, **kwargs)
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.
For more information, check out https://catboost.ai/
- Parameters
n_estimators (int) – The maximum number of trees to build. Defaults to 10.
eta (float) – The learning rate. Defaults to 0.03.
max_depth (int) – The maximum tree depth for base learners. Defaults to 6.
bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.
silent (boolean) – Whether to use the “silent” logging mode. Defaults to False.
allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{"n_estimators": Integer(4, 100), "eta": Real(0.000001, 1), "max_depth": Integer(4, 10)}
model_family
ModelFamily.CATBOOST
modifies_features
True
modifies_target
False
name
CatBoost Regressor
predict_uses_y
False
supported_problem_types
[ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describes a component and its parameters.
feature_importance – Returns importance associated with each feature.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
class evalml.pipelines.components.estimators.DecisionTreeClassifier(criterion='gini', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)
Decision Tree Classifier.
- Parameters
criterion ({"gini", "entropy"}) – The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. Defaults to “gini”.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
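The rules for max_features and min_samples_split listed above can be written out as two small resolver functions. This is a sketch of the documented rules for the classifier only; the function names are hypothetical, and truncating the square root and logarithm to an integer with a floor of 1 is an assumption, since the text does not say how fractional results are rounded.

```python
import math


def resolve_max_features(max_features, n_features):
    """Number of features considered per split (hypothetical helper)."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            # Assumption: truncate to an integer and keep at least one feature.
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        raise ValueError(f"Unknown max_features: {max_features!r}")
    if isinstance(max_features, float):
        # A float is a fraction of the available features.
        return int(max_features * n_features)
    # An int is used as given.
    return max_features


def resolve_min_samples_split(min_samples_split, n_samples):
    """Minimum number of samples needed to split a node (hypothetical helper)."""
    if isinstance(min_samples_split, float):
        # A float is a fraction of the samples, rounded up.
        return math.ceil(min_samples_split * n_samples)
    return min_samples_split


print(resolve_max_features("auto", 16), resolve_max_features(0.5, 16))
print(resolve_min_samples_split(0.1, 25))
```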
Attributes
hyperparameter_ranges
{"criterion": ["gini", "entropy"], "max_features": ["auto", "sqrt", "log2"], "max_depth": Integer(4, 10)}
model_family
ModelFamily.DECISION_TREE
modifies_features
True
modifies_target
False
name
Decision Tree Classifier
predict_uses_y
False
supported_problem_types
[ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describes a component and its parameters.
feature_importance – Returns importance associated with each feature.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
class evalml.pipelines.components.estimators.DecisionTreeRegressor(criterion='mse', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)
Decision Tree Regressor.
- Parameters
criterion ({"mse", "friedman_mse", "mae", "poisson"}) –
The function to measure the quality of a split. Supported criteria are:
“mse” for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node,
“friedman_mse”, which uses mean squared error with Friedman’s improvement score for potential splits,
“mae” for the mean absolute error, which minimizes the L1 loss using the median of each terminal node,
“poisson”, which uses reduction in Poisson deviance to find splits.
Defaults to “mse”.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=n_features.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
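The criterion descriptions state that “mse” minimizes the L2 loss using the mean of each terminal node and “mae” minimizes the L1 loss using the median. A small standard-library check on one hypothetical leaf shows why those two statistics are the natural leaf predictions; the leaf values and the grid search are illustrative assumptions, not part of any tree implementation.

```python
from statistics import mean, median


def l2_loss(values, prediction):
    # Sum of squared errors for a single constant prediction.
    return sum((v - prediction) ** 2 for v in values)


def l1_loss(values, prediction):
    # Sum of absolute errors for a single constant prediction.
    return sum(abs(v - prediction) for v in values)


# Hypothetical target values that ended up in one terminal node.
leaf = [1.0, 2.0, 3.0, 10.0, 14.0]
leaf_mean, leaf_median = mean(leaf), median(leaf)

# Scan candidate predictions 0.0, 0.1, ..., 15.0 for each loss's minimizer.
candidates = [i / 10 for i in range(0, 151)]
best_l2 = min(candidates, key=lambda c: l2_loss(leaf, c))
best_l1 = min(candidates, key=lambda c: l1_loss(leaf, c))
print(best_l2, leaf_mean, best_l1, leaf_median)
```

On this leaf the L2 minimizer coincides with the mean and the L1 minimizer with the median, so “mae” is less sensitive to the two large values.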
Attributes
hyperparameter_ranges
{"criterion": ["mse", "friedman_mse", "mae"], "max_features": ["auto", "sqrt", "log2"], "max_depth": Integer(4, 10)}
model_family
ModelFamily.DECISION_TREE
modifies_features
True
modifies_target
False
name
Decision Tree Regressor
predict_uses_y
False
supported_problem_types
[ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describes a component and its parameters.
feature_importance – Returns importance associated with each feature.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
ElasticNetClassifier
(penalty='elasticnet', C=1.0, l1_ratio=0.15, multi_class='auto', solver='saga', n_jobs=-1, random_seed=0, **kwargs)[source]¶ Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator.
- Parameters
penalty ({"l1", "l2", "elasticnet", "none"}) – The norm used in penalization. Defaults to “elasticnet”.
C (float) – Inverse of regularization strength. Must be a positive float. Defaults to 1.0.
l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty=’elasticnet’. Setting l1_ratio=0 is equivalent to using penalty=’l2’, while setting l1_ratio=1 is equivalent to using penalty=’l1’. For 0 < l1_ratio <1, the penalty is a combination of L1 and L2. Defaults to 0.15.
multi_class ({"auto", "ovr", "multinomial"}) – If the option chosen is “ovr”, then a binary problem is fit for each label. For “multinomial” the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. “multinomial” is unavailable when solver=”liblinear”. “auto” selects “ovr” if the data is binary, or if solver=”liblinear”, and otherwise selects “multinomial”. Defaults to “auto”.
solver ({"newton-cg", "lbfgs", "liblinear", "sag", "saga"}) –
Algorithm to use in the optimization problem. For small datasets, “liblinear” is a good choice, whereas “sag” and “saga” are faster for large ones. For multiclass problems, only “newton-cg”, “sag”, “saga” and “lbfgs” handle multinomial loss; “liblinear” is limited to one-versus-rest schemes.
”newton-cg”, “lbfgs”, “sag” and “saga” handle L2 or no penalty
”liblinear” and “saga” also handle L1 penalty
”saga” also supports “elasticnet” penalty
”liblinear” does not support setting penalty=’none’
Defaults to “saga”.
n_jobs (int) – Number of parallel threads used to fit the model. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
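The solver and penalty notes above, together with the "auto" rule for multi_class, can be written down as a lookup table and a small resolver. This is a sketch only: `SOLVER_PENALTIES`, `check_solver_penalty`, and `resolve_multi_class` are hypothetical helpers transcribed from the parameter descriptions, not functions of any library.

```python
# Penalties each solver accepts, transcribed from the parameter notes above.
SOLVER_PENALTIES = {
    "newton-cg": {"l2", "none"},
    "lbfgs": {"l2", "none"},
    "sag": {"l2", "none"},
    "saga": {"l1", "l2", "elasticnet", "none"},
    "liblinear": {"l1", "l2"},
}


def check_solver_penalty(solver, penalty):
    """Return True when the documented rules allow this solver/penalty pair."""
    return penalty in SOLVER_PENALTIES[solver]


def resolve_multi_class(multi_class, solver, n_classes):
    """Apply the documented "auto" rule; explicit choices pass through unchanged."""
    if multi_class != "auto":
        return multi_class
    if n_classes <= 2 or solver == "liblinear":
        return "ovr"
    return "multinomial"
```

Under these rules the default combination (solver "saga" with penalty "elasticnet") is the only one that supports the elastic net penalty.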
Attributes
hyperparameter_ranges
{ “C”: Real(0.01, 10), “l1_ratio”: Real(0, 1)}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Elastic Net Classifier
predict_uses_y
False
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component needs to be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
ElasticNetRegressor
(alpha=0.0001, l1_ratio=0.15, max_iter=1000, normalize=False, random_seed=0, **kwargs)[source]¶ Elastic Net Regressor.
- Parameters
alpha (float) – Constant that multiplies the penalty terms. Defaults to 0.0001.
l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Setting l1_ratio=0 gives a pure L2 penalty, while setting l1_ratio=1 gives a pure L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. Defaults to 0.15.
max_iter (int) – The maximum number of iterations. Defaults to 1000.
normalize (boolean) – If True, the regressors will be normalized before regression by subtracting the mean and dividing by the l2-norm. Defaults to False.
random_seed (int) – Seed for the random number generator. Defaults to 0.
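How alpha and l1_ratio interact can be seen by computing the penalty term directly. The sketch below assumes the scikit-learn ElasticNet formulation of the penalty, alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2); the source does not state the formula, and `elastic_net_penalty` is a hypothetical helper.

```python
def elastic_net_penalty(weights, alpha=0.0001, l1_ratio=0.15):
    """Penalty added to the loss for a coefficient vector (assumed formulation)."""
    l1_norm = sum(abs(w) for w in weights)
    l2_norm_squared = sum(w * w for w in weights)
    # alpha scales the whole penalty; l1_ratio splits it between L1 and L2.
    return alpha * (l1_ratio * l1_norm + 0.5 * (1.0 - l1_ratio) * l2_norm_squared)


coefficients = [3.0, -4.0]
pure_l1 = elastic_net_penalty(coefficients, alpha=1.0, l1_ratio=1.0)
pure_l2 = elastic_net_penalty(coefficients, alpha=1.0, l1_ratio=0.0)
mixed = elastic_net_penalty(coefficients, alpha=1.0, l1_ratio=0.5)
```

With l1_ratio=1 only the absolute values of the coefficients count, with l1_ratio=0 only their squares, and intermediate values blend the two.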
Attributes
hyperparameter_ranges
{ “alpha”: Real(0, 1), “l1_ratio”: Real(0, 1),}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Elastic Net Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component needs to be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
Estimator
(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]¶ A component that fits and predicts given data.
To implement a new Estimator, define your own class which is a subclass of Estimator, including a name and a list of acceptable ranges for any parameters to be tuned during the automl search (hyperparameters). Define an __init__ method which sets up any necessary state and objects. Make sure your __init__ only uses standard keyword arguments and calls super().__init__() with a parameters dict. You may also override the fit, transform, fit_transform and other methods in this class if appropriate.
To see some examples, check out the definitions of any Estimator component.
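The subclassing pattern described above might look like the following sketch. So that it runs without evalml installed, it uses `EstimatorStandIn`, a hypothetical stand-in whose constructor mirrors the documented Estimator signature; `MeanRegressor`, its `shrinkage` parameter, and the plain tuple used as a range are likewise illustrative assumptions. In real use the subclass would inherit from Estimator and declare ranges with the search-space types shown in the Attributes sections.

```python
class EstimatorStandIn:
    """Hypothetical stand-in for the Estimator base class (not evalml code)."""

    name = None
    hyperparameter_ranges = {}

    def __init__(self, parameters=None, component_obj=None, random_seed=0, **kwargs):
        self.parameters = dict(parameters or {})
        self._component_obj = component_obj
        self.random_seed = random_seed


class MeanRegressor(EstimatorStandIn):
    """Toy estimator that predicts a shrunken mean of the training target."""

    name = "Mean Regressor"
    # Ranges the search may tune; a plain tuple stands in for a search-space object.
    hyperparameter_ranges = {"shrinkage": (0.0, 1.0)}

    def __init__(self, shrinkage=0.0, random_seed=0, **kwargs):
        # Only keyword arguments, collected into a parameters dict for the base class.
        parameters = {"shrinkage": shrinkage}
        parameters.update(kwargs)
        super().__init__(parameters=parameters, component_obj=None, random_seed=random_seed)
        self._mean = None

    def fit(self, X, y=None):
        self._mean = sum(y) / len(y)
        return self

    def predict(self, X):
        scaled = (1.0 - self.parameters["shrinkage"]) * self._mean
        return [scaled for _ in X]


model = MeanRegressor(shrinkage=0.5).fit([[1], [2], [3]], [2.0, 4.0, 6.0])
predictions = model.predict([[0], [0]])
```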
- Parameters
parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
predict_uses_y
False
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns string name of this component
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
Problem types this estimator supports
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
property
name
(cls)¶ Returns string name of this component
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component needs to be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)[source]¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
property
supported_problem_types
(cls)¶ Problem types this estimator supports
-
class
evalml.pipelines.components.estimators.
ExtraTreesClassifier
(n_estimators=100, max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, n_jobs=-1, random_seed=0, **kwargs)[source]¶ Extra Trees Classifier.
- Parameters
n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
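The max_features rules listed above reduce to a single resolver. This is a sketch under assumptions: `resolve_max_features` is a hypothetical helper, and flooring the square root and logarithm to an integer and clamping the result to at least one feature are choices made here rather than behaviour stated in the text.

```python
import math


def resolve_max_features(max_features, n_features):
    """Translate a max_features setting into a number of candidate features."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            value = int(math.sqrt(n_features))
        elif max_features == "log2":
            value = int(math.log2(n_features))
        else:
            raise ValueError(f"Unknown max_features option: {max_features!r}")
    elif isinstance(max_features, float):
        # A float is a fraction of the available features.
        value = int(max_features * n_features)
    else:
        value = int(max_features)
    # Assumption: always consider at least one feature.
    return max(1, value)
```

For example, with 16 features both "auto" and "log2" resolve to 4 candidate features per split, while 0.5 resolves to 8.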
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family
ModelFamily.EXTRA_TREES
modifies_features
True
modifies_target
False
name
Extra Trees Classifier
predict_uses_y
False
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component needs to be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
ExtraTreesRegressor
(n_estimators=100, max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, n_jobs=-1, random_seed=0, **kwargs)[source]¶ Extra Trees Regressor.
- Parameters
n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
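The two readings of min_samples_split described above (absolute count versus fraction) can be made concrete with a short helper. `resolve_min_samples_split` and `can_split` are hypothetical names used only for this sketch.

```python
import math


def resolve_min_samples_split(min_samples_split, n_samples):
    """Return the minimum number of samples a node needs before it may be split."""
    if isinstance(min_samples_split, float):
        # A float is a fraction of the training set, rounded up.
        return math.ceil(min_samples_split * n_samples)
    return int(min_samples_split)


def can_split(n_node_samples, min_samples_split, n_samples):
    """True when a node holds enough samples to be considered for splitting."""
    return n_node_samples >= resolve_min_samples_split(min_samples_split, n_samples)
```

With 95 training samples, a setting of 0.1 requires ceil(9.5) = 10 samples in a node, whereas the integer default of 2 lets any node with two or more samples split.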
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family
ModelFamily.EXTRA_TREES
modifies_features
True
modifies_target
False
name
Extra Trees Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component needs to be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
KNeighborsClassifier
(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, random_seed=0, **kwargs)[source]¶ K-Nearest Neighbors Classifier.
- Parameters
n_neighbors (int) – Number of neighbors to use by default. Defaults to 5.
weights ({‘uniform’, ‘distance’} or callable) –
Weight function used in prediction. Can be:
‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
Defaults to “uniform”.
algorithm ({‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}) –
Algorithm used to compute the nearest neighbors:
‘ball_tree’ will use BallTree
‘kd_tree’ will use KDTree
‘brute’ will use a brute-force search.
‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to the fit method.
Defaults to “auto”.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
leaf_size (int) – Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30.
p (int) – Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Defaults to 2.
random_seed (int) – Seed for the random number generator. Defaults to 0.
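The roles of p and weights can be seen in a tiny brute-force nearest-neighbor vote. This is an illustrative sketch, not the component's implementation: `minkowski_distance` and `knn_predict` are hypothetical helpers, and giving an exact match infinite weight under "distance" weighting is an assumption of the example.

```python
from collections import defaultdict


def minkowski_distance(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan (l1), p=2 is Euclidean (l2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, n_neighbors=5, weights="uniform", p=2):
    """Predict a label by voting among the n_neighbors closest training points."""
    neighbors = sorted(
        (minkowski_distance(row, query, p), label)
        for row, label in zip(train_X, train_y)
    )[:n_neighbors]
    votes = defaultdict(float)
    for distance, label in neighbors:
        if weights == "uniform":
            votes[label] += 1.0
        elif weights == "distance":
            # Closer neighbors count more; an exact match dominates the vote.
            votes[label] += float("inf") if distance == 0 else 1.0 / distance
        else:
            raise ValueError(f"Unsupported weights option: {weights!r}")
    return max(votes, key=votes.get)


train_X = [[0.1], [2.0], [2.1]]
train_y = ["a", "b", "b"]
uniform_label = knn_predict(train_X, train_y, [0.0], n_neighbors=3, weights="uniform")
distance_label = knn_predict(train_X, train_y, [0.0], n_neighbors=3, weights="distance")
```

In this toy data the two "b" points outvote the single "a" point under uniform weights, while distance weighting lets the much closer "a" point win.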
Attributes
hyperparameter_ranges
{ “n_neighbors”: Integer(2, 12), “weights”: [“uniform”, “distance”], “algorithm”: [“auto”, “ball_tree”, “kd_tree”, “brute”], “leaf_size”: Integer(10, 30), “p”: Integer(1, 5),}
model_family
ModelFamily.K_NEIGHBORS
modifies_features
True
modifies_target
False
name
KNN Classifier
predict_uses_y
False
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns array of 0’s matching the input number of features as feature_importance is
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns array of 0’s matching the input number of features as feature_importance is not defined for KNN classifiers.
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component needs to be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
LightGBMClassifier
(boosting_type='gbdt', learning_rate=0.1, n_estimators=100, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=-1, random_seed=0, **kwargs)[source]¶ LightGBM Classifier.
- Parameters
boosting_type (string) – Type of boosting to use. Defaults to “gbdt”.
“gbdt” uses traditional Gradient Boosting Decision Tree
“dart” uses Dropouts meet Multiple Additive Regression Trees
“goss” uses Gradient-based One-Side Sampling
“rf” uses Random Forest
learning_rate (float) – Boosting learning rate. Defaults to 0.1.
n_estimators (int) – Number of boosted trees to fit. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.
num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.
min_child_samples (int) – Minimum number of samples needed in a child (leaf). Defaults to 20.
bagging_fraction (float) – LightGBM will randomly select a subset of the data (rows) without resampling if this is smaller than 1.0. For example, if set to 0.8, LightGBM will select 80% of the data for each bagging round. This can be used to speed up training and deal with overfitting. Only takes effect when bagging_freq is greater than 0. Defaults to 0.9.
bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled. k means perform bagging at every k iteration. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100 % of the data to use for the next k iterations. Defaults to 0.
n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
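The interplay of bagging_freq and bagging_fraction can be simulated in a few lines. The sketch below follows the bagging_freq description (every k-th iteration a fresh share of the rows is drawn and reused for the next k iterations); `bagging_schedule` is a hypothetical helper and its sampling details are assumptions, not LightGBM internals.

```python
import random


def bagging_schedule(n_rows, n_iterations, bagging_fraction=0.9, bagging_freq=0, seed=0):
    """Return, for each boosting iteration, the row indices used for training."""
    rng = random.Random(seed)
    all_rows = list(range(n_rows))
    current = all_rows  # with bagging disabled, every iteration sees all rows
    schedule = []
    for iteration in range(n_iterations):
        resample = (
            bagging_freq > 0
            and bagging_fraction < 1.0
            and iteration % bagging_freq == 0
        )
        if resample:
            # Draw a new subset without replacement; keep it until the next resample.
            n_selected = max(1, int(bagging_fraction * n_rows))
            current = sorted(rng.sample(all_rows, n_selected))
        schedule.append(current)
    return schedule


no_bagging = bagging_schedule(n_rows=10, n_iterations=4)
bagged = bagging_schedule(n_rows=10, n_iterations=4, bagging_fraction=0.5, bagging_freq=2)
```

With the defaults (bagging_freq=0) the value of bagging_fraction has no effect in this sketch, which matches the statement that 0 disables bagging.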
Attributes
hyperparameter_ranges
{ “learning_rate”: Real(0.000001, 1), “boosting_type”: [“gbdt”, “dart”, “goss”, “rf”], “n_estimators”: Integer(10, 100), “max_depth”: Integer(0, 10), “num_leaves”: Integer(2, 100), “min_child_samples”: Integer(1, 100), “bagging_fraction”: Real(0.000001, 1), “bagging_freq”: Integer(0, 1),}
model_family
ModelFamily.LIGHTGBM
modifies_features
True
modifies_target
False
name
LightGBM Classifier
predict_uses_y
False
SEED_MAX
SEED_BOUNDS.max_bound
SEED_MIN
0
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)[source]¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
LightGBMRegressor
(boosting_type='gbdt', learning_rate=0.1, n_estimators=20, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=-1, random_seed=0, **kwargs)[source]¶ LightGBM Regressor.
- Parameters
boosting_type (string) – Type of boosting to use. Defaults to “gbdt”.
“gbdt” uses traditional Gradient Boosting Decision Tree
“dart” uses Dropouts meet Multiple Additive Regression Trees
“goss” uses Gradient-based One-Side Sampling
“rf” uses Random Forest
learning_rate (float) – Boosting learning rate. Defaults to 0.1.
n_estimators (int) – Number of boosted trees to fit. Defaults to 20.
max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.
num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.
min_child_samples (int) – Minimum number of data needed in a child (leaf). Defaults to 20.
bagging_fraction (float) – LightGBM will randomly select a subset of the training data (rows), without resampling, if this is smaller than 1.0. For example, if set to 0.8, LightGBM will use 80% of the rows for training. This can be used to speed up training and deal with overfitting. It only takes effect when bagging_freq is nonzero. Defaults to 0.9.
bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled. k means bagging is performed at every k-th iteration: every k-th iteration, LightGBM will randomly select bagging_fraction * 100% of the data to use for the next k iterations. Defaults to 0.
n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
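num_leaves and max_depth both limit the size of each tree. Because a binary tree of depth d has at most 2^d leaves, the tighter of the two bounds applies whenever max_depth is positive. The helper below is a hypothetical illustration of that arithmetic and is not part of evalml or LightGBM.

```python
def effective_leaf_cap(num_leaves=31, max_depth=0):
    """Upper bound on leaves per tree implied by num_leaves and max_depth."""
    if max_depth <= 0:
        # <= 0 means no depth limit, so only num_leaves constrains the tree.
        return num_leaves
    # A binary tree of depth d can hold at most 2**d leaves.
    return min(num_leaves, 2 ** max_depth)
```

For example, with the default num_leaves of 31, setting max_depth to 3 caps each tree at 8 leaves, while max_depth of 10 leaves the cap at 31.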
Attributes
hyperparameter_ranges
{ “learning_rate”: Real(0.000001, 1), “boosting_type”: [“gbdt”, “dart”, “goss”, “rf”], “n_estimators”: Integer(10, 100), “max_depth”: Integer(0, 10), “num_leaves”: Integer(2, 100), “min_child_samples”: Integer(1, 100), “bagging_fraction”: Real(0.000001, 1), “bagging_freq”: Integer(0, 1),}
model_family
ModelFamily.LIGHTGBM
modifies_features
True
modifies_target
False
name
LightGBM Regressor
predict_uses_y
False
SEED_MAX
SEED_BOUNDS.max_bound
SEED_MIN
0
supported_problem_types
[ProblemTypes.REGRESSION]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
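A minimal sketch of what cloning amounts to, using a hypothetical `ToyComponent` class rather than evalml itself: the clone is rebuilt from the stored parameters and seed, so it is a separate object. That no fitted state is carried over is an assumption of this sketch.

```python
class ToyComponent:
    """Hypothetical stand-in for a component; not part of evalml."""

    def __init__(self, n_estimators=20, random_seed=0):
        self.parameters = {"n_estimators": n_estimators}
        self.random_seed = random_seed

    def clone(self):
        # Rebuild from the stored parameters and seed; nothing else is copied.
        return type(self)(**self.parameters, random_seed=self.random_seed)


original = ToyComponent(n_estimators=50, random_seed=7)
duplicate = original.clone()
```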
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
LinearRegressor
(fit_intercept=True, normalize=False, n_jobs=-1, random_seed=0, **kwargs)[source]¶ Linear Regressor.
- Parameters
fit_intercept (boolean) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered). Defaults to True.
normalize (boolean) – If True, the regressors will be normalized before regression by subtracting the mean and dividing by the l2-norm. This parameter is ignored when fit_intercept is set to False. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
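The effect of normalize=True on a single feature column can be sketched with the standard library. The helper name `normalize_column` is hypothetical, and taking the l2-norm of the already centered column is an assumption about the order of operations implied by the description above.

```python
import math


def normalize_column(values):
    """Center a feature column, then scale it to unit l2-norm."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    # A constant column has zero norm after centering; leave it at zeros.
    if norm == 0:
        return centered
    return [c / norm for c in centered]


column = [1.0, 2.0, 3.0]
normalized = normalize_column(column)
```

After this transformation the column sums to zero and its squared entries sum to one.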
Attributes
hyperparameter_ranges
{ “fit_intercept”: [True, False], “normalize”: [True, False]}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Linear Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
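The convention that the defaults equal the parameters of a default-constructed instance can be illustrated with a toy class. `ToyRegressor` is hypothetical, and reading the defaults from the constructor signature is only one possible way to satisfy the convention, not necessarily how evalml does it. The sketch also uses a plain classmethod, so it is called with parentheses.

```python
import inspect


class ToyRegressor:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, fit_intercept=True, normalize=False):
        self.parameters = {"fit_intercept": fit_intercept, "normalize": normalize}

    @classmethod
    def default_parameters(cls):
        # Read the defaults straight from the constructor signature.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if param.default is not inspect.Parameter.empty
        }
```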
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
LogisticRegressionClassifier
(penalty='l2', C=1.0, multi_class='auto', solver='lbfgs', n_jobs=-1, random_seed=0, **kwargs)[source]¶ Logistic Regression Classifier.
- Parameters
penalty ({"l1", "l2", "elasticnet", "none"}) – The norm used in penalization. Defaults to “l2”.
C (float) – Inverse of regularization strength. Must be a positive float. Defaults to 1.0.
multi_class ({"auto", "ovr", "multinomial"}) – If the option chosen is “ovr”, then a binary problem is fit for each label. For “multinomial” the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. “multinomial” is unavailable when solver=”liblinear”. “auto” selects “ovr” if the data is binary, or if solver=”liblinear”, and otherwise selects “multinomial”. Defaults to “auto”.
solver ({"newton-cg", "lbfgs", "liblinear", "sag", "saga"}) –
Algorithm to use in the optimization problem. For small datasets, “liblinear” is a good choice, whereas “sag” and “saga” are faster for large ones. For multiclass problems, only “newton-cg”, “sag”, “saga” and “lbfgs” handle multinomial loss; “liblinear” is limited to one-versus-rest schemes.
”newton-cg”, “lbfgs”, “sag” and “saga” handle L2 or no penalty
”liblinear” and “saga” also handle L1 penalty
”saga” also supports “elasticnet” penalty
”liblinear” does not support setting penalty=’none’
Defaults to “lbfgs”.
n_jobs (int) – Number of parallel threads used to fit the model. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
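The solver and penalty compatibility rules, and the way multi_class="auto" is resolved, can be written down as a small lookup. The table below is transcribed from the parameter notes above; the helper names are hypothetical, and treating "binary data" as exactly two classes is an assumption of this sketch.

```python
# Penalties each solver accepts, transcribed from the parameter notes above.
SUPPORTED_PENALTIES = {
    "newton-cg": {"l2", "none"},
    "lbfgs": {"l2", "none"},
    "sag": {"l2", "none"},
    "saga": {"l1", "l2", "elasticnet", "none"},
    "liblinear": {"l1", "l2"},
}


def check_penalty(solver, penalty):
    """Return True if the solver/penalty pair is allowed by the rules above."""
    return penalty in SUPPORTED_PENALTIES[solver]


def resolve_multi_class(multi_class, solver, n_classes):
    """Resolve multi_class="auto" to "ovr" or "multinomial"."""
    if multi_class == "multinomial" and solver == "liblinear":
        raise ValueError('"multinomial" is unavailable when solver="liblinear"')
    if multi_class != "auto":
        return multi_class
    # "auto" picks one-versus-rest for binary data or the liblinear solver.
    if n_classes == 2 or solver == "liblinear":
        return "ovr"
    return "multinomial"
```

A check like this is useful before constructing the component, since an unsupported pair would otherwise only surface when the model is fit.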
Attributes
hyperparameter_ranges
{ “penalty”: [“l2”], “C”: Real(0.01, 10),}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Logistic Regression Classifier
predict_uses_y
False
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
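One way to picture how a needs_fitting flag might be used is a guard inside predict. The classes below are hypothetical, and the guard logic and the exception type are assumptions made for illustration; evalml's own checks may differ.

```python
class ToyEstimator:
    """Hypothetical component; the guard logic is an illustration only."""

    needs_fitting = True

    def __init__(self):
        self._is_fitted = False
        self._mean = None

    def fit(self, X, y):
        self._mean = sum(y) / len(y)
        self._is_fitted = True
        return self

    def predict(self, X):
        # Refuse to predict if fitting is required but has not happened yet.
        if self.needs_fitting and not self._is_fitted:
            raise RuntimeError("This component must be fit before calling predict.")
        return [self._mean for _ in X]


class ToyConstant(ToyEstimator):
    # Overridden to False: this component has nothing to learn from data.
    needs_fitting = False

    def predict(self, X):
        return [0 for _ in X]
```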
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
ProphetRegressor
(date_column='ds', changepoint_prior_scale=0.05, seasonality_prior_scale=10, holidays_prior_scale=10, seasonality_mode='additive', random_seed=0, stan_backend='CMDSTANPY', **kwargs)[source]¶ Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
More information here: https://facebook.github.io/prophet/
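The additive model mentioned above is commonly written as a sum of a trend term, a seasonal term, and a holiday term. The symbols below follow the usual Prophet notation and are not defined elsewhere in this reference: g(t) is the trend, s(t) the seasonality, h(t) the holiday effects, and the last term is the error.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

When seasonality_mode is set to "multiplicative", the seasonal component scales the trend rather than being added to it.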
Attributes
hyperparameter_ranges
{ “changepoint_prior_scale”: Real(0.001, 0.5), “seasonality_prior_scale”: Real(0.01, 10), “holidays_prior_scale”: Real(0.01, 10), “seasonality_mode”: [“additive”, “multiplicative”],}
model_family
ModelFamily.PROPHET
modifies_features
True
modifies_target
False
name
Prophet Regressor
predict_uses_y
False
supported_problem_types
[ProblemTypes.TIME_SERIES_REGRESSION]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns an array of zeros of length 1, since feature importance is not defined for the Prophet regressor.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
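The shape of the dictionary returned when return_dict is True can be shown with a toy class. `ToyDescribable` is hypothetical and its printed layout is an assumption; only the {"name": name, "parameters": parameters} format comes from the description above.

```python
class ToyDescribable:
    """Hypothetical component used to illustrate describe(); not evalml code."""

    name = "Toy Component"

    def __init__(self, **parameters):
        self.parameters = parameters

    def describe(self, print_name=False, return_dict=False):
        if print_name:
            print(self.name)
        for key, value in self.parameters.items():
            print(f"\t * {key} : {value}")
        if return_dict:
            return {"name": self.name, "parameters": self.parameters}
        return None


description = ToyDescribable(seasonality_mode="additive").describe(return_dict=True)
```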
-
property
feature_importance
(self)¶ Returns an array of zeros of length 1, since feature importance is not defined for the Prophet regressor.
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X, y=None)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
RandomForestClassifier
(n_estimators=100, max_depth=6, n_jobs=-1, random_seed=0, **kwargs)[source]¶ Random Forest Classifier.
- Parameters
n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 10),}
model_family
ModelFamily.RANDOM_FOREST
modifies_features
True
modifies_target
False
name
Random Forest Classifier
predict_uses_y
False
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
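The save and load round trip, including the pickle_protocol argument, can be sketched with the standard library. The documented signature uses cloudpickle.DEFAULT_PROTOCOL; this sketch substitutes the built-in pickle module and a plain dictionary in place of a fitted component, both of which are assumptions made to keep the example self-contained.

```python
import os
import pickle
import tempfile


def save(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize obj to file_path using the given pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load(file_path):
    """Read back whatever object was stored at file_path."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for a fitted component in this sketch.
state = {
    "name": "Random Forest Classifier",
    "parameters": {"n_estimators": 100, "max_depth": 6},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "component.pkl")
    save(state, path)
    restored = load(path)
```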
-
class
evalml.pipelines.components.estimators.
RandomForestRegressor
(n_estimators=100, max_depth=6, n_jobs=-1, random_seed=0, **kwargs)[source]¶ Random Forest Regressor.
- Parameters
n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 32),}
model_family
ModelFamily.RANDOM_FOREST
modifies_features
True
modifies_target
False
name
Random Forest Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
SVMClassifier
(C=1.0, kernel='rbf', gamma='scale', probability=True, random_seed=0, **kwargs)[source]¶ Support Vector Machine Classifier.
- Parameters
C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.
kernel ({"linear", "poly", "rbf", "sigmoid", "precomputed"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.
gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. Defaults to “scale”.
If “scale”, uses 1 / (n_features * X.var()) as the value of gamma
If “auto”, uses 1 / n_features
probability (boolean) – Whether to enable probability estimates. Defaults to True.
random_seed (int) – Seed for the random number generator. Defaults to 0.
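The two string options for gamma reduce to simple arithmetic on the training data. The sketch below works on a small row-major list of lists using only the standard library; `resolve_gamma` is a hypothetical helper, and X.var() is read here as the population variance over every entry of X.

```python
from statistics import pvariance


def resolve_gamma(gamma, X):
    """Turn gamma="scale"/"auto" into a number for a row-major feature matrix X."""
    n_features = len(X[0])
    if gamma == "auto":
        return 1 / n_features
    if gamma == "scale":
        # Flatten X and take the population variance of all of its entries.
        flat = [value for row in X for value in row]
        return 1 / (n_features * pvariance(flat))
    # Any other value is treated as an explicit float coefficient.
    return float(gamma)


X = [[0.0, 2.0], [2.0, 4.0]]
gamma_auto = resolve_gamma("auto", X)
gamma_scale = resolve_gamma("scale", X)
```

For this X there are two features and the variance of the four entries is 2, so "auto" gives 1 / 2 and "scale" gives 1 / (2 * 2).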
Attributes
hyperparameter_ranges
{ “C”: Real(0, 10), “kernel”: [“linear”, “poly”, “rbf”, “sigmoid”, “precomputed”], “gamma”: [“scale”, “auto”],}
model_family
ModelFamily.SVM
modifies_features
True
modifies_target
False
name
SVM Classifier
predict_uses_y
False
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Feature importance only works with linear kernels.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Feature importance only works with linear kernels. If the kernel isn’t linear, we return a numpy array of zeros
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
SVMRegressor
(C=1.0, kernel='rbf', gamma='scale', random_seed=0, **kwargs)[source]¶ Support Vector Machine Regressor.
- Parameters
C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.
kernel ({"linear", "poly", "rbf", "sigmoid", "precomputed"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.
gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. Defaults to “scale”.
If “scale”, uses 1 / (n_features * X.var()) as the value of gamma
If “auto”, uses 1 / n_features
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “C”: Real(0, 10), “kernel”: [“linear”, “poly”, “rbf”, “sigmoid”, “precomputed”], “gamma”: [“scale”, “auto”],}
model_family
ModelFamily.SVM
modifies_features
True
modifies_target
False
name
SVM Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Feature importance only works with linear kernels.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
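The convention Component.default_parameters == Component().parameters can be checked mechanically. The sketch below uses a hypothetical `ToyEstimator` class, not an evalml component, and exposes `default_parameters` as an ordinary classmethod (so it is called with parentheses here).

```python
import inspect


class ToyEstimator:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, C=1.0, kernel="rbf", gamma="scale"):
        # Record exactly the values the instance was initialized with.
        self.parameters = {"C": C, "kernel": kernel, "gamma": gamma}

    @classmethod
    def default_parameters(cls):
        # Collect every default declared in __init__, skipping self.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name != "self" and param.default is not inspect.Parameter.empty
        }


# A default-constructed instance reports the same values as the class.
print(ToyEstimator.default_parameters() == ToyEstimator().parameters)
```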
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
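The interaction of print_name and return_dict can be sketched as follows. This is a hypothetical free function written for illustration; the printed layout is an assumption, and only the returned dictionary format is taken from the description above.

```python
def describe(name, parameters, print_name=False, return_dict=False):
    """Print a component's parameters and optionally return them as a dict."""
    if print_name:
        print(name)
    for key, value in parameters.items():
        print(f"  * {key} : {value}")  # layout is illustrative only
    if return_dict:
        return {"name": name, "parameters": parameters}
    return None


summary = describe(
    "SVM Regressor",
    {"C": 1.0, "kernel": "rbf", "gamma": "scale"},
    print_name=True,
    return_dict=True,
)
```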
-
property
feature_importance
(self)¶ Feature importance only works with linear kernels. If the kernel isn’t linear, a numpy array of zeros is returned.
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
TimeSeriesBaselineEstimator
(gap=1, random_seed=0, **kwargs)[source]¶ Time series estimator that predicts using the naive forecasting approach.
This is useful as a simple baseline estimator for time series problems.
- Parameters
gap (int) – Gap between the prediction date and the target date. Must be a non-negative integer. If gap is 0, the target date will be shifted ahead by 1 time period. Defaults to 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
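A naive forecast simply reuses an earlier observation as the prediction. The sketch below is a plain-Python illustration, not the estimator's code: `naive_forecast` is a hypothetical helper, and its handling of a positive gap (returning the values unchanged, on the assumption that the target was already lagged upstream) is a guess made for the example.

```python
def naive_forecast(y, gap=1):
    """Return naive predictions for a sequence of target values."""
    if gap < 0:
        raise ValueError(f"gap must not be negative, got {gap}")
    if gap == 0:
        # Shift ahead by one period: each prediction is the previous value,
        # and the first period has nothing to look back on.
        return [None] + list(y[:-1])
    # Assumption: with a positive gap the values are used as they are.
    return list(y)


y = [10, 12, 11, 13]
print(naive_forecast(y, gap=0))
print(naive_forecast(y, gap=1))
```

Since the prediction is built from the target itself, it makes sense that predict_uses_y is True for this estimator.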
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.BASELINE
modifies_features
True
modifies_target
False
name
Time Series Baseline Estimator
predict_uses_y
True
supported_problem_types
[ ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
Since baseline estimators do not use input features to calculate predictions, returns an array of zeroes.
- Returns
an array of zeroes
- Return type
np.ndarray (float)
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X, y=None)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X, y=None)[source]¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
XGBoostClassifier
(eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, random_seed=0, n_jobs=-1, **kwargs)[source]¶ XGBoost Classifier.
- Parameters
eta (float) – Boosting learning rate. Defaults to 0.1.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0.
n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.
random_seed (int) – Seed for the random number generator. Defaults to 0.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
Attributes
hyperparameter_ranges
{ “eta”: Real(0.000001, 1), “max_depth”: Integer(1, 10), “min_child_weight”: Real(1, 10), “n_estimators”: Integer(1, 1000),}
model_family
ModelFamily.XGBOOST
modifies_features
True
modifies_target
False
name
XGBoost Classifier
predict_uses_y
False
SEED_MAX
None
SEED_MIN
None
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
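The hyperparameter_ranges entry describes a search space of real and integer intervals. As a rough illustration of what drawing one candidate from it could look like, the sketch below samples uniformly with the standard library. `sample_parameters` is a hypothetical helper; an actual tuner may sample differently, and treating both interval ends as inclusive is an assumption.

```python
import random

# Bounds copied from hyperparameter_ranges above as (low, high) pairs.
REAL_RANGES = {"eta": (0.000001, 1.0), "min_child_weight": (1.0, 10.0)}
INTEGER_RANGES = {"max_depth": (1, 10), "n_estimators": (1, 1000)}


def sample_parameters(seed=0):
    """Draw one candidate parameter set uniformly from the ranges."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    candidate = {
        name: rng.uniform(low, high) for name, (low, high) in REAL_RANGES.items()
    }
    candidate.update(
        {name: rng.randint(low, high) for name, (low, high) in INTEGER_RANGES.items()}
    )
    return candidate


print(sample_parameters(seed=0))
```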
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
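One way to read "identical parameters and random state" is that a clone is rebuilt from the constructor arguments alone. The sketch below shows that idea with a hypothetical `ToyComponent` class. Whether fitted state is dropped in the real component is an assumption of this example, not something stated above.

```python
import copy


class ToyComponent:
    """Hypothetical component; not part of the library."""

    def __init__(self, random_seed=0, **parameters):
        self.random_seed = random_seed
        self.parameters = parameters
        self.is_fitted = False  # flipped by a real fit(); unused here

    def clone(self):
        # Rebuild from the constructor arguments only, so nothing learned
        # during fitting is carried over (assumption of this sketch).
        return ToyComponent(
            random_seed=self.random_seed, **copy.deepcopy(self.parameters)
        )


original = ToyComponent(random_seed=3, eta=0.1, max_depth=6)
original.is_fitted = True
duplicate = original.clone()
print(duplicate.parameters, duplicate.random_seed, duplicate.is_fitted)
```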
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)[source]¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.
XGBoostRegressor
(eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, random_seed=0, n_jobs=-1, **kwargs)[source]¶ XGBoost Regressor.
- Parameters
eta (float) – Boosting learning rate. Defaults to 0.1.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0.
n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.
random_seed (int) – Seed for the random number generator. Defaults to 0.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
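The interplay between eta and n_estimators can be seen in a toy boosting loop. The sketch below is not XGBoost: `boosted_constant_fit` is a hypothetical function that assumes squared-error loss and uses the mean residual as each round's "learner" instead of a tree. It only shows how the learning rate shrinks each round's contribution.

```python
def boosted_constant_fit(y, eta=0.1, n_estimators=100):
    """Fit a single constant to y by repeated, shrunken residual updates."""
    prediction = 0.0
    for _ in range(n_estimators):
        # The round's learner: the mean of the current residuals.
        residual_mean = sum(value - prediction for value in y) / len(y)
        # Only a fraction eta of the learner's output is added.
        prediction += eta * residual_mean
    return prediction


y = [2.0, 4.0, 6.0]  # the best constant under squared error is the mean, 4.0
print(boosted_constant_fit(y, eta=0.1, n_estimators=1))
print(boosted_constant_fit(y, eta=0.1, n_estimators=100))
```

A small eta needs many rounds to approach the target, which is why the two parameters are usually tuned together.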
Attributes
hyperparameter_ranges
{ “eta”: Real(0.000001, 1), “max_depth”: Integer(1, 20), “min_child_weight”: Real(1, 10), “n_estimators”: Integer(1, 1000),}
model_family
ModelFamily.XGBOOST
modifies_features
True
modifies_target
False
name
XGBoost Regressor
predict_uses_y
False
SEED_MAX
None
SEED_MIN
None
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None