regressors¶
Submodules¶
Package Contents¶
Classes Summary¶
Autoregressive Integrated Moving Average Model. |
|
Baseline regressor that uses a simple strategy to make predictions. |
|
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. |
|
Decision Tree Regressor. |
|
Elastic Net Regressor. |
|
Extra Trees Regressor. |
|
LightGBM Regressor. |
|
Linear Regressor. |
|
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. |
|
Random Forest Regressor. |
|
Support Vector Machine Regressor. |
|
Time series estimator that predicts using the naive forecasting approach. |
|
XGBoost Regressor. |
Contents¶
-
class
evalml.pipelines.components.estimators.regressors.
ARIMARegressor
(date_index=None, trend=None, start_p=2, d=0, start_q=2, max_p=5, max_d=2, max_q=5, seasonal=True, n_jobs=- 1, random_seed=0, **kwargs)[source]¶ Autoregressive Integrated Moving Average Model. The three parameters (p, d, q) are the AR order, the degree of differencing, and the MA order. More information here: https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima_model.ARIMA.html
Currently ARIMARegressor isn’t supported via conda install. It’s recommended that it be installed via PyPI.
- Parameters
date_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.
trend (str) – Controls the deterministic trend. Options are [‘n’, ‘c’, ‘t’, ‘ct’] where ‘c’ is a constant term, ‘t’ indicates a linear trend, and ‘ct’ is both. Can also be an iterable when defining a polynomial, such as [1, 1, 0, 1].
start_p (int) – Minimum Autoregressive order. Defaults to 2.
d (int) – Minimum Differencing degree. Defaults to 0.
start_q (int) – Minimum Moving Average order. Defaults to 2.
max_p (int) – Maximum Autoregressive order. Defaults to 5.
max_d (int) – Maximum Differencing degree. Defaults to 2.
max_q (int) – Maximum Moving Average order. Defaults to 5.
seasonal (boolean) – Whether to fit a seasonal model to ARIMA. Defaults to True.
n_jobs (int or None) – Non-negative integer describing level of parallelism used for pipelines. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “start_p”: Integer(1, 3), “d”: Integer(0, 2), “start_q”: Integer(1, 3), “max_p”: Integer(3, 10), “max_d”: Integer(2, 5), “max_q”: Integer(3, 10), “seasonal”: [True, False],}
model_family
ModelFamily.ARIMA
modifies_features
True
modifies_target
False
name
ARIMA Regressor
predict_uses_y
False
supported_problem_types
[ProblemTypes.TIME_SERIES_REGRESSION]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns array of 0’s with a length of 1 as feature_importance is not defined for ARIMA regressor.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns array of 0’s with a length of 1 as feature_importance is not defined for ARIMA regressor.
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X, y=None)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.regressors.
BaselineRegressor
(strategy='mean', random_seed=0, **kwargs)[source]¶ Baseline regressor that uses a simple strategy to make predictions. This is useful as a simple baseline regressor to compare with other regressors.
- Parameters
strategy (str) – Method used to predict. Valid options are “mean”, “median”. Defaults to “mean”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.BASELINE
modifies_features
True
modifies_target
False
name
Baseline Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature. Since baseline regressors do not use input features to calculate predictions, returns an array of zeroes.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature. Since baseline regressors do not use input features to calculate predictions, returns an array of zeroes.
- Returns
An array of zeroes
- Return type
np.ndarray (float)
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.regressors.
CatBoostRegressor
(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=False, allow_writing_files=False, random_seed=0, n_jobs=- 1, **kwargs)[source]¶ CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.
For more information, check out https://catboost.ai/
- Parameters
n_estimators (float) – The maximum number of trees to build. Defaults to 10.
eta (float) – The learning rate. Defaults to 0.03.
max_depth (int) – The maximum tree depth for base learners. Defaults to 6.
bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.
silent (boolean) – Whether to use the “silent” logging mode. Defaults to True.
allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(4, 100), “eta”: Real(0.000001, 1), “max_depth”: Integer(4, 10),}
model_family
ModelFamily.CATBOOST
modifies_features
True
modifies_target
False
name
CatBoost Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.regressors.
DecisionTreeRegressor
(criterion='mse', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)[source]¶ Decision Tree Regressor.
- Parameters
criterion ({"mse", "friedman_mse", "mae", "poisson"}) –
The function to measure the quality of a split. Supported criteria are:
”mse” for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node
”friedman_mse”, which uses mean squared error with Friedman”s improvement score for potential splits
”mae” for the mean absolute error, which minimizes the L1 loss using the median of each terminal node,
”poisson” which uses reduction in Poisson deviance to find splits.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “criterion”: [“mse”, “friedman_mse”, “mae”], “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family
ModelFamily.DECISION_TREE
modifies_features
True
modifies_target
False
name
Decision Tree Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.regressors.
ElasticNetRegressor
(alpha=0.0001, l1_ratio=0.15, max_iter=1000, normalize=False, random_seed=0, **kwargs)[source]¶ Elastic Net Regressor.
- Parameters
alpha (float) – Constant that multiplies the penalty terms. Defaults to 0.0001.
l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty=’elasticnet’. Setting l1_ratio=0 is equivalent to using penalty=’l2’, while setting l1_ratio=1 is equivalent to using penalty=’l1’. For 0 < l1_ratio <1, the penalty is a combination of L1 and L2. Defaults to 0.15.
max_iter (int) – The maximum number of iterations. Defaults to 1000.
normalize (boolean) – If True, the regressors will be normalized before regression by subtracting the mean and dividing by the l2-norm. Defaults to False.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “alpha”: Real(0, 1), “l1_ratio”: Real(0, 1),}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Elastic Net Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.regressors.
ExtraTreesRegressor
(n_estimators=100, max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, n_jobs=- 1, random_seed=0, **kwargs)[source]¶ Extra Trees Regressor.
- Parameters
n_estimators (float) – The number of trees in the forest. Defaults to 100.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
to 2. (Defaults) –
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family
ModelFamily.EXTRA_TREES
modifies_features
True
modifies_target
False
name
Extra Trees Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.regressors.
LightGBMRegressor
(boosting_type='gbdt', learning_rate=0.1, n_estimators=20, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=- 1, random_seed=0, **kwargs)[source]¶ LightGBM Regressor.
- Parameters
boosting_type (string) – Type of boosting to use. Defaults to “gbdt”. - ‘gbdt’ uses traditional Gradient Boosting Decision Tree - “dart”, uses Dropouts meet Multiple Additive Regression Trees - “goss”, uses Gradient-based One-Side Sampling - “rf”, uses Random Forest
learning_rate (float) – Boosting learning rate. Defaults to 0.1.
n_estimators (int) – Number of boosted trees to fit. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.
num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.
min_child_samples (int) – Minimum number of data needed in a child (leaf). Defaults to 20.
bagging_fraction (float) – LightGBM will randomly select a subset of features on each iteration (tree) without resampling if this is smaller than 1.0. For example, if set to 0.8, LightGBM will select 80% of features before training each tree. This can be used to speed up training and deal with overfitting. Defaults to 0.9.
bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled. k means perform bagging at every k iteration. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100 % of the data to use for the next k iterations. Defaults to 0.
n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “learning_rate”: Real(0.000001, 1), “boosting_type”: [“gbdt”, “dart”, “goss”, “rf”], “n_estimators”: Integer(10, 100), “max_depth”: Integer(0, 10), “num_leaves”: Integer(2, 100), “min_child_samples”: Integer(1, 100), “bagging_fraction”: Real(0.000001, 1), “bagging_freq”: Integer(0, 1),}
model_family
ModelFamily.LIGHTGBM
modifies_features
True
modifies_target
False
name
LightGBM Regressor
predict_uses_y
False
SEED_MAX
SEED_BOUNDS.max_bound
SEED_MIN
0
supported_problem_types
[ProblemTypes.REGRESSION]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.regressors.
LinearRegressor
(fit_intercept=True, normalize=False, n_jobs=- 1, random_seed=0, **kwargs)[source]¶ Linear Regressor.
- Parameters
fit_intercept (boolean) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered). Defaults to True.
normalize (boolean) – If True, the regressors will be normalized before regression by subtracting the mean and dividing by the l2-norm. This parameter is ignored when fit_intercept is set to False. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “fit_intercept”: [True, False], “normalize”: [True, False]}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Linear Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.regressors.
ProphetRegressor
(date_index=None, changepoint_prior_scale=0.05, seasonality_prior_scale=10, holidays_prior_scale=10, seasonality_mode='additive', random_seed=0, stan_backend='CMDSTANPY', **kwargs)[source]¶ Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
More information here: https://facebook.github.io/prophet/
Attributes
hyperparameter_ranges
{ “changepoint_prior_scale”: Real(0.001, 0.5), “seasonality_prior_scale”: Real(0.01, 10), “holidays_prior_scale”: Real(0.01, 10), “seasonality_mode”: [“additive”, “multiplicative”],}
model_family
ModelFamily.PROPHET
modifies_features
True
modifies_target
False
name
Prophet Regressor
predict_uses_y
False
supported_problem_types
[ProblemTypes.TIME_SERIES_REGRESSION]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns array of 0’s with len(1) as feature_importance is not defined for Prophet regressor.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns array of 0’s with len(1) as feature_importance is not defined for Prophet regressor.
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X, y=None)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
-
class
evalml.pipelines.components.estimators.regressors.
RandomForestRegressor
(n_estimators=100, max_depth=6, n_jobs=- 1, random_seed=0, **kwargs)[source]¶ Random Forest Regressor.
- Parameters
n_estimators (float) – The number of trees in the forest. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 32),}
model_family
ModelFamily.RANDOM_FOREST
modifies_features
True
modifies_target
False
name
Random Forest Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.regressors.
SVMRegressor
(C=1.0, kernel='rbf', gamma='auto', random_seed=0, **kwargs)[source]¶ Support Vector Machine Regressor.
- Parameters
C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.
kernel ({"poly", "rbf", "sigmoid"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.
gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. Defaults to “auto”. - If gamma=’scale’ is passed then it uses 1 / (n_features * X.var()) as value of gamma - If “auto” (default), uses 1 / n_features
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “C”: Real(0, 10), “kernel”: [“poly”, “rbf”, “sigmoid”], “gamma”: [“scale”, “auto”],}
model_family
ModelFamily.SVM
modifies_features
True
modifies_target
False
name
SVM Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Feature importance only works with linear kernels.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Feature importance only works with linear kernels. If the kernel isn’t linear, we return a numpy array of zeros
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.regressors.
TimeSeriesBaselineEstimator
(gap=1, random_seed=0, **kwargs)[source]¶ Time series estimator that predicts using the naive forecasting approach.
This is useful as a simple baseline estimator for time series problems.
- Parameters
gap (int) – Gap between prediction date and target date and must be a positive integer. If gap is 0, target date will be shifted ahead by 1 time period. Defaults to 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.BASELINE
modifies_features
True
modifies_target
False
name
Time Series Baseline Estimator
predict_uses_y
True
supported_problem_types
[ ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
Since baseline estimators do not use input features to calculate predictions, returns an array of zeroes.
- Returns
an array of zeroes
- Return type
np.ndarray (float)
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X, y=None)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X, y=None)[source]¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.components.estimators.regressors.
XGBoostRegressor
(eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, random_seed=0, n_jobs=- 1, **kwargs)[source]¶ XGBoost Regressor.
- Parameters
eta (float) – Boosting learning rate. Defaults to 0.1.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0
n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.
random_seed (int) – Seed for the random number generator. Defaults to 0.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
Attributes
hyperparameter_ranges
{ “eta”: Real(0.000001, 1), “max_depth”: Integer(1, 20), “min_child_weight”: Real(1, 10), “n_estimators”: Integer(1, 1000),}
model_family
ModelFamily.XGBOOST
modifies_features
True
modifies_target
False
name
XGBoost Regressor
predict_uses_y
False
SEED_MAX
None
SEED_MIN
None
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None