regressors¶

Submodules¶

Package Contents¶

Classes Summary¶

`ARIMARegressor`	Autoregressive Integrated Moving Average Model.
`BaselineRegressor`	Baseline regressor that uses a simple strategy to make predictions.
`CatBoostRegressor`	CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.
`DecisionTreeRegressor`	Decision Tree Regressor.
`ElasticNetRegressor`	Elastic Net Regressor.
`ExtraTreesRegressor`	Extra Trees Regressor.
`LightGBMRegressor`	LightGBM Regressor.
`LinearRegressor`	Linear Regressor.
`ProphetRegressor`	Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
`RandomForestRegressor`	Random Forest Regressor.
`SVMRegressor`	Support Vector Machine Regressor.
`TimeSeriesBaselineEstimator`	Time series estimator that predicts using the naive forecasting approach.
`XGBoostRegressor`	XGBoost Regressor.

Contents¶

class evalml.pipelines.components.estimators.regressors.ARIMARegressor(date_index=None, trend=None, start_p=2, d=0, start_q=2, max_p=5, max_d=2, max_q=5, seasonal=True, n_jobs=-1, random_seed=0, **kwargs)[source]¶

Autoregressive Integrated Moving Average Model. The three parameters (p, d, q) are the AR order, the degree of differencing, and the MA order. More information here: https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima_model.ARIMA.html
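To make the "degree of differencing" d concrete, here is a minimal sketch in plain Python. The helper name `difference` is hypothetical and is not part of evalml or statsmodels; it only illustrates what applying first-order differencing d times does to a series.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (illustrative helper, not a library API)."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with the differences of consecutive values.
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


series = [1, 4, 9, 16, 25]
first = difference(series, d=1)   # [3, 5, 7, 9]
second = difference(series, d=2)  # [2, 2, 2]
```

Each pass shortens the series by one value; with d=0 the series is left unchanged.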

Currently ARIMARegressor isn’t supported via conda install. It’s recommended that it be installed via PyPI.

Parameters

date_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.
trend (str) – Controls the deterministic trend. Options are [‘n’, ‘c’, ‘t’, ‘ct’] where ‘c’ is a constant term, ‘t’ indicates a linear trend, and ‘ct’ is both. Can also be an iterable when defining a polynomial, such as [1, 1, 0, 1].
start_p (int) – Minimum Autoregressive order. Defaults to 2.
d (int) – Minimum Differencing degree. Defaults to 0.
start_q (int) – Minimum Moving Average order. Defaults to 2.
max_p (int) – Maximum Autoregressive order. Defaults to 5.
max_d (int) – Maximum Differencing degree. Defaults to 2.
max_q (int) – Maximum Moving Average order. Defaults to 5.
seasonal (boolean) – Whether to fit a seasonal model to ARIMA. Defaults to True.
n_jobs (int or None) – Integer describing the level of parallelism used for pipelines. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “start_p”: Integer(1, 3), “d”: Integer(0, 2), “start_q”: Integer(1, 3), “max_p”: Integer(3, 10), “max_d”: Integer(2, 5), “max_q”: Integer(3, 10), “seasonal”: [True, False],}
model_family	ModelFamily.ARIMA
modifies_features	True
modifies_target	False
name	ARIMA Regressor
predict_uses_y	False
supported_problem_types	[ProblemTypes.TIME_SERIES_REGRESSION]
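As a rough illustration of how a search procedure might draw one candidate configuration from the hyperparameter_ranges above, the sketch below uses only the standard library. The tuple/list encoding and the `sample_parameters` helper are assumptions made for this example; the real attribute uses `Integer(...)` range objects, and treating both bounds as inclusive is also an assumption here.

```python
import random

# Hypothetical stand-in for the ranges listed above:
# (low, high) tuples are integer ranges, lists are categorical choices.
ranges = {
    "start_p": (1, 3),
    "d": (0, 2),
    "start_q": (1, 3),
    "max_p": (3, 10),
    "max_d": (2, 5),
    "max_q": (3, 10),
    "seasonal": [True, False],
}


def sample_parameters(ranges, rng):
    """Draw one value per hyperparameter from its range."""
    params = {}
    for name, space in ranges.items():
        if isinstance(space, tuple):
            low, high = space
            params[name] = rng.randint(low, high)  # both bounds inclusive
        else:
            params[name] = rng.choice(space)
    return params


rng = random.Random(0)
candidate = sample_parameters(ranges, rng)
```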

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns an array of zeros of length 1, because feature_importance is not defined for the ARIMA regressor.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.
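The contract described above can be sketched with a toy class. `ToyComponent` is hypothetical and much simpler than a real component; in particular, the choice that fitted state is not carried over to the clone is an assumption of this sketch.

```python
import copy


class ToyComponent:
    """Hypothetical stand-in for a component; not the evalml implementation."""

    def __init__(self, random_seed=0, **parameters):
        self.parameters = parameters
        self.random_seed = random_seed
        self.is_fitted = False

    def clone(self):
        # Build a fresh instance from a copy of the parameters and the same seed.
        return self.__class__(
            random_seed=self.random_seed, **copy.deepcopy(self.parameters)
        )


original = ToyComponent(random_seed=7, strategy="median")
original.is_fitted = True
duplicate = original.clone()
```

Here `duplicate` has the same parameters and seed as `original` but is a separate, unfitted object.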

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict
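The convention stated above can be illustrated with a toy class. `ToyEstimator` is hypothetical, and it exposes `default_parameters` as a classmethod for simplicity, whereas the convention above is written as an attribute-style access.

```python
class ToyEstimator:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, strategy="mean", random_seed=0):
        self.parameters = {"strategy": strategy}
        self.random_seed = random_seed

    @classmethod
    def default_parameters(cls):
        # Defaults are whatever a no-argument construction produces.
        return cls().parameters


defaults = ToyEstimator.default_parameters()
```

Because the defaults are derived from a no-argument instance, they cannot drift out of sync with the constructor signature.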

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict
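A minimal sketch of the documented return format follows. The free function `describe` and its printed layout are assumptions for illustration; only the dictionary shape `{"name": name, "parameters": parameters}` comes from the description above.

```python
def describe(name, parameters, print_name=False, return_dict=False):
    """Print a component's parameters and optionally return them as a dict."""
    if print_name:
        print(name)
    for key, value in parameters.items():
        print(f"  * {key} : {value}")
    if return_dict:
        return {"name": name, "parameters": parameters}
    return None


info = describe("Baseline Regressor", {"strategy": "mean"}, return_dict=True)
```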

property feature_importance(self)¶: Returns an array of zeros of length 1, because feature_importance is not defined for the ARIMA regressor.

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X, y=None)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None
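The save/load pair amounts to a pickle round trip. The sketch below uses the standard-library `pickle` module as a stand-in for cloudpickle, and a plain dictionary as a stand-in for a component; both substitutions are assumptions made so the example has no third-party dependencies.

```python
import os
import pickle
import tempfile


def save(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize obj to file_path using the given pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load(file_path):
    """Deserialize and return the object stored at file_path."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# A dictionary stands in for a component's state in this sketch.
state = {"name": "ARIMA Regressor", "parameters": {"start_p": 2, "d": 0}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "component.pkl")
    save(state, path)
    restored = load(path)
```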

class evalml.pipelines.components.estimators.regressors.BaselineRegressor(strategy='mean', random_seed=0, **kwargs)[source]¶

Baseline regressor that uses a simple strategy to make predictions. This is useful as a simple baseline regressor to compare with other regressors.

Parameters

strategy (str) – Method used to predict. Valid options are “mean”, “median”. Defaults to “mean”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
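The two strategies can be sketched in a few lines of standard-library Python. `ToyBaseline` is a hypothetical class written for this example; it is not the evalml implementation and ignores the input features entirely, as a baseline would.

```python
from statistics import mean, median


class ToyBaseline:
    """Predicts a single constant learned from the training target."""

    def __init__(self, strategy="mean"):
        if strategy not in ("mean", "median"):
            raise ValueError(f"Unsupported strategy: {strategy!r}")
        self.strategy = strategy
        self._value = None

    def fit(self, X, y):
        # The features X are not used; only the target matters.
        self._value = mean(y) if self.strategy == "mean" else median(y)
        return self

    def predict(self, X):
        return [self._value] * len(X)


X_train = [[0], [1], [2], [3]]
y_train = [1, 2, 3, 10]
mean_preds = ToyBaseline("mean").fit(X_train, y_train).predict([[5], [6]])
median_preds = ToyBaseline("median").fit(X_train, y_train).predict([[5], [6]])
```

With the outlier 10 in the target, the mean strategy predicts 4 while the median strategy predicts 2.5, which is why the two baselines can score quite differently on skewed targets.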

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.BASELINE
modifies_features	True
modifies_target	False
name	Baseline Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature. Since baseline regressors do not use input features to calculate predictions, returns an array of zeroes.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature. Since baseline regressors do not use input features to calculate predictions, returns an array of zeroes.

Returns: An array of zeroes
Return type: np.ndarray (float)

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.estimators.regressors.CatBoostRegressor(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=False, allow_writing_files=False, random_seed=0, n_jobs=-1, **kwargs)[source]¶

CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.

For more information, check out https://catboost.ai/

Parameters

n_estimators (int) – The maximum number of trees to build. Defaults to 10.
eta (float) – The learning rate. Defaults to 0.03.
max_depth (int) – The maximum tree depth for base learners. Defaults to 6.
bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.
silent (boolean) – Whether to use the “silent” logging mode. Defaults to False.
allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
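To show how n_estimators and eta interact in gradient boosting, here is a deliberately small sketch using depth-1 trees (stumps) on one-dimensional data. It is not CatBoost's algorithm, and every function name in it is hypothetical; it only illustrates the general idea of repeatedly fitting a tree to the residuals and adding a fraction eta of its output.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def boost(xs, ys, n_estimators=10, eta=0.03):
    """Fit stumps to residuals; each stump contributes eta times its output."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + eta * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return base, stumps


def predict(base, stumps, x, eta):
    return base + sum(eta * (lv if x <= t else rv) for t, lv, rv in stumps)


def training_mse(xs, ys, n_estimators, eta=0.03):
    base, stumps = boost(xs, ys, n_estimators, eta)
    return sum((y - predict(base, stumps, x, eta)) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9]
mse_0 = training_mse(xs, ys, n_estimators=0)
mse_10 = training_mse(xs, ys, n_estimators=10)
mse_100 = training_mse(xs, ys, n_estimators=100)
```

A small eta makes each tree's contribution small, so more trees are needed to reach the same training error; that is the usual trade-off between the two parameters.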

Attributes

hyperparameter_ranges	{ “n_estimators”: Integer(4, 100), “eta”: Real(0.000001, 1), “max_depth”: Integer(4, 10),}
model_family	ModelFamily.CATBOOST
modifies_features	True
modifies_target	False
name	CatBoost Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.estimators.regressors.DecisionTreeRegressor(criterion='mse', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)[source]¶

Decision Tree Regressor.

Parameters

criterion ({"mse", "friedman_mse", "mae", "poisson"}) –
The function to measure the quality of a split. Supported criteria are:
- “mse” for the mean squared error, which is equal to variance reduction as a feature selection criterion and minimizes the L2 loss using the mean of each terminal node,
- “friedman_mse”, which uses mean squared error with Friedman’s improvement score for potential splits,
- “mae” for the mean absolute error, which minimizes the L1 loss using the median of each terminal node,
- “poisson”, which uses reduction in Poisson deviance to find splits.
Defaults to “mse”.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If “auto”, then max_features=n_features.
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
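The rules for max_features and min_samples_split listed above can be written as two small resolver functions. Both helpers are hypothetical and written only for illustration. Clamping the results to at least 1 feature and at least 2 samples is an assumption of this sketch, and the “auto” option is deliberately left out because its meaning depends on the underlying estimator.

```python
import math


def resolve_max_features(max_features, n_features):
    """Turn a max_features setting into a concrete number of features."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features == "sqrt":
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        raise ValueError(f"Unsupported max_features: {max_features!r}")
    if isinstance(max_features, float):
        # A float is a fraction of the available features.
        return max(1, int(max_features * n_features))
    return max_features  # an int is used as given


def resolve_min_samples_split(min_samples_split, n_samples):
    """Turn a min_samples_split setting into a concrete sample count."""
    if isinstance(min_samples_split, float):
        # A float is a fraction of the training samples, rounded up.
        return max(2, math.ceil(min_samples_split * n_samples))
    return min_samples_split


n_sqrt = resolve_max_features("sqrt", 16)           # 4
n_fraction = resolve_max_features(0.5, 10)          # 5
split_count = resolve_min_samples_split(0.25, 10)   # 3
```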

Attributes

hyperparameter_ranges	{ “criterion”: [“mse”, “friedman_mse”, “mae”], “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family	ModelFamily.DECISION_TREE
modifies_features	True
modifies_target	False
name	Decision Tree Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.estimators.regressors.ElasticNetRegressor(alpha=0.0001, l1_ratio=0.15, max_iter=1000, normalize=False, random_seed=0, **kwargs)[source]¶

Elastic Net Regressor.

Parameters

alpha (float) – Constant that multiplies the penalty terms. Defaults to 0.0001.
l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty=’elasticnet’. Setting l1_ratio=0 is equivalent to using penalty=’l2’, while setting l1_ratio=1 is equivalent to using penalty=’l1’. For 0 < l1_ratio <1, the penalty is a combination of L1 and L2. Defaults to 0.15.
max_iter (int) – The maximum number of iterations. Defaults to 1000.
normalize (boolean) – If True, the regressors will be normalized before regression by subtracting the mean and dividing by the l2-norm. Defaults to False.
random_seed (int) – Seed for the random number generator. Defaults to 0.
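The roles of alpha and l1_ratio are easiest to see in the penalty term itself. The sketch below assumes the scikit-learn style elastic net objective, in which the L2 part carries a factor of 0.5; that the component uses exactly this form is an assumption, and the function name is hypothetical.

```python
def elastic_net_penalty(weights, alpha=0.0001, l1_ratio=0.15):
    """Penalty = alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2)."""
    l1_norm = sum(abs(w) for w in weights)
    l2_norm_squared = sum(w * w for w in weights)
    return alpha * (l1_ratio * l1_norm + 0.5 * (1 - l1_ratio) * l2_norm_squared)


weights = [1.0, -2.0]
pure_l1 = elastic_net_penalty(weights, alpha=1.0, l1_ratio=1.0)  # 3.0
pure_l2 = elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.0)  # 2.5
mixed = elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5)    # 2.75
```

alpha scales the whole penalty, while l1_ratio moves weight between the L1 term, which encourages sparse coefficients, and the L2 term, which shrinks them smoothly.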

Attributes

hyperparameter_ranges	{ “alpha”: Real(0, 1), “l1_ratio”: Real(0, 1),}
model_family	ModelFamily.LINEAR_MODEL
modifies_features	True
modifies_target	False
name	Elastic Net Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.estimators.regressors.ExtraTreesRegressor(n_estimators=100, max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, n_jobs=-1, random_seed=0, **kwargs)[source]¶

Extra Trees Regressor.

Parameters

n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If “auto”, then max_features=n_features.
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
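The n_jobs convention described above can be sketched as a small resolver. The helper is hypothetical; treating None as a single job and rejecting other non-positive values are assumptions of this sketch, not documented behavior.

```python
import os


def resolve_n_jobs(n_jobs):
    """Map an n_jobs setting to a concrete number of parallel workers."""
    available = os.cpu_count() or 1
    if n_jobs is None:
        return 1  # assumption: None means no parallelism
    if n_jobs == -1:
        return available  # -1 uses everything available
    if n_jobs < 1:
        raise ValueError(f"Unsupported n_jobs: {n_jobs}")
    return n_jobs


workers = resolve_n_jobs(-1)
```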

Attributes

hyperparameter_ranges	{ “n_estimators”: Integer(10, 1000), “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family	ModelFamily.EXTRA_TREES
modifies_features	True
modifies_target	False
name	Extra Trees Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.estimators.regressors.LightGBMRegressor(boosting_type='gbdt', learning_rate=0.1, n_estimators=20, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=-1, random_seed=0, **kwargs)[source]¶

LightGBM Regressor.

Parameters

boosting_type (string) –
Type of boosting to use. Defaults to “gbdt”.
- “gbdt” uses traditional Gradient Boosting Decision Tree
- “dart” uses Dropouts meet Multiple Additive Regression Trees
- “goss” uses Gradient-based One-Side Sampling
- “rf” uses Random Forest
learning_rate (float) – Boosting learning rate. Defaults to 0.1.
n_estimators (int) – Number of boosted trees to fit. Defaults to 20.
max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.
num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.
min_child_samples (int) – Minimum number of data needed in a child (leaf). Defaults to 20.
bagging_fraction (float) – LightGBM will randomly select a subset of the data, without resampling, each time bagging is performed if this is smaller than 1.0. For example, if set to 0.8, LightGBM will select 80% of the data. This can be used to speed up training and deal with overfitting. Defaults to 0.9.
bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled. A value of k means bagging is performed every k iterations: at every k-th iteration, LightGBM randomly selects bagging_fraction * 100% of the data to use for the next k iterations. Defaults to 0.
n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
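The interaction between bagging_fraction and bagging_freq can be sketched as a schedule of which rows each boosting iteration sees. This is an illustration written for this page, not LightGBM's implementation; the function name and the use of Python's `random` module are assumptions.

```python
import random


def bagging_schedule(n_rows, n_iterations, bagging_fraction=0.9, bagging_freq=0, seed=0):
    """Return, for each iteration, the sorted row indices used for training."""
    rng = random.Random(seed)
    all_rows = list(range(n_rows))
    current = all_rows
    schedule = []
    for iteration in range(n_iterations):
        resample = (
            bagging_freq > 0
            and bagging_fraction < 1.0
            and iteration % bagging_freq == 0
        )
        if resample:
            # Draw a new subset without replacement; it is reused until the next draw.
            sample_size = max(1, int(bagging_fraction * n_rows))
            current = sorted(rng.sample(all_rows, sample_size))
        schedule.append(current)
    return schedule


disabled = bagging_schedule(n_rows=10, n_iterations=4, bagging_freq=0)
every_two = bagging_schedule(
    n_rows=10, n_iterations=4, bagging_fraction=0.5, bagging_freq=2
)
```

With bagging_freq=0 every iteration sees all rows regardless of bagging_fraction, which is why the default bagging_fraction of 0.9 has no effect unless bagging_freq is also set to a positive value.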

Attributes

hyperparameter_ranges	{ “learning_rate”: Real(0.000001, 1), “boosting_type”: [“gbdt”, “dart”, “goss”, “rf”], “n_estimators”: Integer(10, 100), “max_depth”: Integer(0, 10), “num_leaves”: Integer(2, 100), “min_child_samples”: Integer(1, 100), “bagging_fraction”: Real(0.000001, 1), “bagging_freq”: Integer(0, 1),}
model_family	ModelFamily.LIGHTGBM
modifies_features	True
modifies_target	False
name	LightGBM Regressor
predict_uses_y	False
SEED_MAX	SEED_BOUNDS.max_bound
SEED_MIN	0
supported_problem_types	[ProblemTypes.REGRESSION]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict
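
The convention above can be illustrated with a toy class. ToyComponent is a hypothetical stand-in rather than an evalml class, and reading the defaults from the constructor signature is only one possible way to satisfy the convention.

```python
import inspect


class ToyComponent:
    """Hypothetical stand-in for a component; not an evalml class."""

    def __init__(self, n_estimators=100, max_depth=6):
        # Keep the constructor arguments as the component's parameters.
        self.parameters = {"n_estimators": n_estimators, "max_depth": max_depth}

    @classmethod
    def default_parameters(cls):
        # Read the defaults straight from the constructor signature.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name != "self" and param.default is not inspect.Parameter.empty
        }


defaults = ToyComponent.default_parameters()
print(defaults)
print(defaults == ToyComponent().parameters)
```

Because the defaults are derived from the signature, they cannot drift away from what a no-argument construction produces.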

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None
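
A save/load round trip might look like the sketch below. It makes two simplifying assumptions: the standard-library pickle module stands in for cloudpickle, and only the parameters are written to disk rather than the whole component. ToyComponent and the file name are hypothetical.

```python
import os
import pickle
import tempfile


class ToyComponent:
    """Hypothetical component used only to illustrate the round trip."""

    def __init__(self, **parameters):
        self.parameters = parameters

    def save(self, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
        # Only the parameters are stored here, which keeps the sketch simple.
        with open(file_path, "wb") as f:
            pickle.dump(self.parameters, f, protocol=pickle_protocol)

    @staticmethod
    def load(file_path):
        # Rebuild a component from the stored parameters.
        with open(file_path, "rb") as f:
            return ToyComponent(**pickle.load(f))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "component.pkl")
    original = ToyComponent(n_estimators=50, max_depth=4)
    original.save(path)
    restored = ToyComponent.load(path)

print(restored.parameters)
```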

class evalml.pipelines.components.estimators.regressors.LinearRegressor(fit_intercept=True, normalize=False, n_jobs=- 1, random_seed=0, **kwargs)[source]¶

Linear Regressor.

Parameters

fit_intercept (boolean) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered). Defaults to True.
normalize (boolean) – If True, the regressors will be normalized before regression by subtracting the mean and dividing by the l2-norm. This parameter is ignored when fit_intercept is set to False. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
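
The normalize option can be pictured for a single feature column as follows. This is a sketch under one assumption: the l2-norm is taken over the column after the mean has been subtracted. The helper name normalize_column is hypothetical.

```python
import math


def normalize_column(values):
    """Center a feature column, then scale it to unit l2-norm."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    l2_norm = math.sqrt(sum(c * c for c in centered))
    # A constant column has zero norm after centering; leave it centered as-is.
    if l2_norm == 0:
        return centered
    return [c / l2_norm for c in centered]


column = [1.0, 2.0, 3.0, 4.0]
normalized = normalize_column(column)
print(normalized)
```

After this step the column sums to zero and its squared entries sum to one, so features measured on very different scales become comparable.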

Attributes

hyperparameter_ranges	{ “fit_intercept”: [True, False], “normalize”: [True, False]}
model_family	ModelFamily.LINEAR_MODEL
modifies_features	True
modifies_target	False
name	Linear Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict
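
The two flags of describe can be sketched with a toy class. ToyComponent and its printed layout are hypothetical; only the dictionary format {"name": name, "parameters": parameters} is taken from the description above.

```python
class ToyComponent:
    """Hypothetical component; only the describe() contract is illustrated."""

    name = "Toy Component"

    def __init__(self, **parameters):
        self.parameters = parameters

    def describe(self, print_name=False, return_dict=False):
        if print_name:
            print(self.name)
        # Print one line per parameter.
        for key, value in self.parameters.items():
            print(f"\t * {key} : {value}")
        if return_dict:
            return {"name": self.name, "parameters": self.parameters}
        return None


component = ToyComponent(fit_intercept=True, normalize=False)
description = component.describe(print_name=True, return_dict=True)
```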

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.estimators.regressors.ProphetRegressor(date_index=None, changepoint_prior_scale=0.05, seasonality_prior_scale=10, holidays_prior_scale=10, seasonality_mode='additive', random_seed=0, stan_backend='CMDSTANPY', **kwargs)[source]¶

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

More information here: https://facebook.github.io/prophet/

Attributes

hyperparameter_ranges	{ “changepoint_prior_scale”: Real(0.001, 0.5), “seasonality_prior_scale”: Real(0.01, 10), “holidays_prior_scale”: Real(0.01, 10), “seasonality_mode”: [“additive”, “multiplicative”],}
model_family	ModelFamily.PROPHET
modifies_features	True
modifies_target	False
name	Prophet Regressor
predict_uses_y	False
supported_problem_types	[ProblemTypes.TIME_SERIES_REGRESSION]

Methods

`build_prophet_df`
`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns an array of zeros of length 1, because feature importance is not defined for the Prophet regressor.
`fit`	Fits component to data
`get_params`
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

static build_prophet_df(X, y=None, date_column='ds')[source]¶

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.
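
A minimal sketch of what cloning by parameters and random seed could look like is shown below. ToyComponent is hypothetical, and the sketch assumes that fitted state is not carried over to the clone, which this reference does not state either way.

```python
class ToyComponent:
    """Hypothetical component used to illustrate clone()."""

    def __init__(self, random_seed=0, **parameters):
        self.random_seed = random_seed
        self.parameters = dict(parameters)
        self.is_fitted = False

    def clone(self):
        # Rebuild from the constructor arguments only (assumption: no fitted state).
        return self.__class__(random_seed=self.random_seed, **self.parameters)


original = ToyComponent(random_seed=7, changepoint_prior_scale=0.05)
original.is_fitted = True
copy = original.clone()
print(copy.parameters, copy.random_seed, copy.is_fitted)
```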

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶: Returns an array of zeros of length 1, because feature importance is not defined for the Prophet regressor.

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

get_params(self)[source]¶

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X, y=None)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.estimators.regressors.RandomForestRegressor(n_estimators=100, max_depth=6, n_jobs=- 1, random_seed=0, **kwargs)[source]¶

Random Forest Regressor.

Parameters

n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Attributes

hyperparameter_ranges	{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 32),}
model_family	ModelFamily.RANDOM_FOREST
modifies_features	True
modifies_target	False
name	Random Forest Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
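
To give a feel for how a tuner might draw one candidate from hyperparameter_ranges, the sketch below samples uniformly from the two integer ranges listed above. It assumes Integer(low, high) denotes an inclusive interval and encodes the ranges as plain tuples; the names ranges and sample_parameters are hypothetical, and no claim is made about how evalml actually samples.

```python
import random

# Hypothetical plain-Python encoding of the ranges listed above.
ranges = {"n_estimators": (10, 1000), "max_depth": (1, 32)}


def sample_parameters(ranges, seed=0):
    """Draw one value per hyperparameter, uniformly over its inclusive range."""
    rng = random.Random(seed)
    return {name: rng.randint(low, high) for name, (low, high) in ranges.items()}


candidate = sample_parameters(ranges)
print(candidate)
```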

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.estimators.regressors.SVMRegressor(C=1.0, kernel='rbf', gamma='auto', random_seed=0, **kwargs)[source]¶

Support Vector Machine Regressor.

Parameters

C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.
kernel ({"poly", "rbf", "sigmoid"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.
gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. If “scale”, uses 1 / (n_features * X.var()) as the value of gamma; if “auto”, uses 1 / n_features. Defaults to “auto”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
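
The two string options for gamma reduce to simple arithmetic, sketched here in plain Python. The helper resolve_gamma is hypothetical; it treats X.var() as the population variance over all entries of X and does not guard against a zero variance.

```python
def resolve_gamma(X, gamma="auto"):
    """Turn a gamma setting into a numeric kernel coefficient (illustrative only)."""
    n_features = len(X[0])
    if gamma == "auto":
        return 1.0 / n_features
    if gamma == "scale":
        # Population variance over every entry of X, mirroring X.var().
        flat = [value for row in X for value in row]
        mean = sum(flat) / len(flat)
        variance = sum((value - mean) ** 2 for value in flat) / len(flat)
        return 1.0 / (n_features * variance)
    # Any other value is taken to be an explicit float coefficient.
    return float(gamma)


X = [[0.0, 4.0], [4.0, 0.0]]
print(resolve_gamma(X, "auto"))   # 1 / 2
print(resolve_gamma(X, "scale"))  # 1 / (2 * 4)
```

Unlike "auto", the "scale" setting depends on the spread of the data, so rescaling the features changes the resulting gamma.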

Attributes

hyperparameter_ranges	{ “C”: Real(0, 10), “kernel”: [“poly”, “rbf”, “sigmoid”], “gamma”: [“scale”, “auto”],}
model_family	ModelFamily.SVM
modifies_features	True
modifies_target	False
name	SVM Regressor
predict_uses_y	False
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Feature importance only works with linear kernels.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶: Feature importance only works with linear kernels. If the kernel isn’t linear, a numpy array of zeros is returned.

fit(self, X, y=None)¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
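
One way to picture the role of needs_fitting is as a guard checked before prediction. The classes below are hypothetical and model needs_fitting as a plain class attribute for brevity; they do not reproduce how evalml enforces the check.

```python
class ToyEstimator:
    """Hypothetical estimator that predicts the mean of the training target."""

    needs_fitting = True  # subclasses may override this with False

    def __init__(self):
        self._is_fitted = False
        self._mean = None

    def fit(self, X, y=None):
        self._mean = sum(y) / len(y)
        self._is_fitted = True
        return self

    def predict(self, X):
        # Refuse to predict when fitting is required but has not happened yet.
        if self.needs_fitting and not self._is_fitted:
            raise RuntimeError("This component must be fit before calling predict.")
        return [self._mean for _ in X]


class ToyZeroEstimator(ToyEstimator):
    """Always predicts 0.0, so fitting is unnecessary."""

    needs_fitting = False

    def __init__(self):
        super().__init__()
        self._mean = 0.0


print(ToyZeroEstimator().predict([[1], [2]]))
print(ToyEstimator().fit([[1], [2]], [1.0, 3.0]).predict([[5]]))
```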

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.estimators.regressors.TimeSeriesBaselineEstimator(gap=1, random_seed=0, **kwargs)[source]¶

Time series estimator that predicts using the naive forecasting approach.

This is useful as a simple baseline estimator for time series problems.

Parameters

gap (int) – Gap between the prediction date and the target date. Must be a non-negative integer. If gap is 0, target date will be shifted ahead by 1 time period. Defaults to 1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
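
The naive forecasting approach in its textbook form predicts each value with the most recent earlier observation. The sketch below shows that idea only; the function naive_forecast and its periods argument are hypothetical, and the sketch does not model how this estimator combines the shift with gap inside a pipeline.

```python
def naive_forecast(y, periods=1):
    """Predict each value with the observation `periods` steps earlier."""
    if periods < 1:
        raise ValueError("periods must be a positive integer")
    # The first `periods` positions have no earlier observation to copy.
    shifted = [None] * periods + list(y)
    return shifted[: len(y)]


observed = [10, 12, 15, 11, 14]
predicted = naive_forecast(observed)
print(predicted)
```

Any model that cannot beat this forecast on held-out data is adding little beyond persistence of the last observed value, which is what makes it a useful baseline.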

Attributes

hyperparameter_ranges	{}
model_family	ModelFamily.BASELINE
modifies_features	True
modifies_target	False
name	Time Series Baseline Estimator
predict_uses_y	True
supported_problem_types	[ ProblemTypes.TIME_SERIES_REGRESSION, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Since baseline estimators do not use input features to calculate predictions, returns an array of zeroes.

Returns: an array of zeroes
Return type: np.ndarray (float)

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X, y=None)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X, y=None)[source]¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None

class evalml.pipelines.components.estimators.regressors.XGBoostRegressor(eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, random_seed=0, n_jobs=- 1, **kwargs)[source]¶

XGBoost Regressor.

Parameters

eta (float) – Boosting learning rate. Defaults to 0.1.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0.
n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.
random_seed (int) – Seed for the random number generator. Defaults to 0.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
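
The effect of eta and n_estimators can be illustrated with a deliberately tiny boosting loop. This is a toy, not XGBoost: each "tree" is just the mean of the current residuals under squared error, and the function name boost_constant_learners is hypothetical. It shows how eta shrinks each round's contribution, so a smaller eta needs more rounds to reach the same fit.

```python
def boost_constant_learners(y, eta=0.1, n_estimators=100):
    """Toy boosting where each learner is the mean of the current residuals."""
    prediction = 0.0
    for _ in range(n_estimators):
        residuals = [target - prediction for target in y]
        learner_output = sum(residuals) / len(residuals)
        # eta shrinks each learner's contribution before it is added.
        prediction += eta * learner_output
    return prediction


y = [2.0, 4.0, 6.0]
slow = boost_constant_learners(y, eta=0.1, n_estimators=10)
fast = boost_constant_learners(y, eta=0.5, n_estimators=10)
print(slow, fast)
```

In this toy the prediction after n rounds equals mean(y) * (1 - (1 - eta) ** n), so both runs approach the target mean of 4.0 and the larger eta gets there sooner.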

Attributes

hyperparameter_ranges	{ “eta”: Real(0.000001, 1), “max_depth”: Integer(1, 20), “min_child_weight”: Real(1, 10), “n_estimators”: Integer(1, 1000),}
model_family	ModelFamily.XGBOOST
modifies_features	True
modifies_target	False
name	XGBoost Regressor
predict_uses_y	False
SEED_MAX	None
SEED_MIN	None
supported_problem_types	[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]

Methods

`clone`	Constructs a new component with the same parameters and random state.
`default_parameters`	Returns the default parameters for this component.
`describe`	Describe a component and its parameters
`feature_importance`	Returns importance associated with each feature.
`fit`	Fits component to data
`load`	Loads component at file path
`needs_fitting`	Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
`parameters`	Returns the parameters which were used to initialize the component
`predict`	Make predictions using selected features.
`predict_proba`	Make probability estimates for labels.
`save`	Saves component at file path

clone(self)¶

Constructs a new component with the same parameters and random state.

Returns: A new instance of this component with identical parameters and random state.

default_parameters(cls)¶

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns: default parameters for this component.
Return type: dict

describe(self, print_name=False, return_dict=False)¶

Describe a component and its parameters

Parameters

print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}

Returns

prints and returns dictionary

Return type

None or dict

property feature_importance(self)¶

Returns importance associated with each feature.

Returns: Importance associated with each feature
Return type: np.ndarray

fit(self, X, y=None)[source]¶

Fits component to data

Parameters

X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]

Returns

self

static load(file_path)¶

Loads component at file path

Parameters: file_path (str) – Location to load file
Returns: ComponentBase object

needs_fitting(self)¶: Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

property parameters(self)¶: Returns the parameters which were used to initialize the component

predict(self, X)[source]¶

Make predictions using selected features.

Parameters: X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
Returns: Predicted values
Return type: pd.Series

predict_proba(self, X)¶

Make probability estimates for labels.

Parameters: X (pd.DataFrame, or np.ndarray) – Features
Returns: Probability estimates
Return type: pd.Series

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶

Saves component at file path

Parameters

file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.

Returns

None
