Pipelines
Submodules
Package Contents
Classes Summary
Autoregressive Integrated Moving Average Model.
Pipeline subclass for all binary classification pipelines.
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees.
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.
Pipeline subclass for all classification pipelines.
Component graph for a pipeline as a directed acyclic graph (DAG).
Decision Tree Classifier.
Decision Tree Regressor.
Transformer that delays input features and target variable for time series problems.
Featuretools DFS component that generates features for the input features.
Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator.
Elastic Net Regressor.
A component that fits and predicts given data.
Extra Trees Classifier.
Extra Trees Regressor.
Selects top features based on importance weights.
K-Nearest Neighbors Classifier.
LightGBM Classifier.
LightGBM Regressor.
Linear Regressor.
Logistic Regression Classifier.
Pipeline subclass for all multiclass classification pipelines.
A transformer that encodes categorical features in a one-hot numeric array.
Imputes missing data according to a specified imputation strategy per column.
Machine learning pipeline made out of transformers and an Estimator.
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
Random Forest Classifier.
Random Forest Regressor.
Pipeline subclass for all regression pipelines.
Selects top features based on importance weights using a Random Forest classifier.
Selects top features based on importance weights using a Random Forest regressor.
Imputes missing data according to a specified imputation strategy.
Scikit-learn Stacked Ensemble Classifier.
Scikit-learn Stacked Ensemble Regressor.
A transformer that standardizes input features by removing the mean and scaling to unit variance.
Support Vector Machine Classifier.
Support Vector Machine Regressor.
A transformer that encodes categorical features into target encodings.
Pipeline base class for time series binary classification problems.
Pipeline base class for time series classification problems.
Pipeline base class for time series multiclass classification problems.
Pipeline base class for time series regression problems.
A component that transforms data and may or may not need fitting.
XGBoost Classifier.
XGBoost Regressor.
Contents
-
class evalml.pipelines.ARIMARegressor(date_index=None, trend=None, start_p=2, d=0, start_q=2, max_p=5, max_d=2, max_q=5, seasonal=True, n_jobs=-1, random_seed=0, **kwargs)
Autoregressive Integrated Moving Average Model. The three parameters (p, d, q) are the AR order, the degree of differencing, and the MA order. More information here: https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima_model.ARIMA.html
ARIMARegressor is currently not supported via conda install; install it via PyPI instead.
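To make "degree of differencing" concrete: differencing once replaces a series with the gaps between consecutive values, and d rounds repeat that step. The sketch below is a plain-Python illustration only; the helper name `difference` is hypothetical and not part of evalml.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (hypothetical helper, not evalml API)."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with the gaps between consecutive values,
        # so the result is one element shorter than its input.
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


squares = [1, 4, 9, 16, 25]
print(difference(squares, d=1))  # [3, 5, 7, 9]
print(difference(squares, d=2))  # [2, 2, 2]
```

A quadratic sequence becomes constant after two rounds of differencing, which illustrates how d removes polynomial trend before the AR and MA terms are fit.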
- Parameters
date_index (str) – Specifies the name of the column in X that provides the datetime objects. Defaults to None.
trend (str) – Controls the deterministic trend. Options are ['n', 'c', 't', 'ct'], where 'n' indicates no trend, 'c' a constant term, 't' a linear trend, and 'ct' both a constant and a linear trend. Can also be an iterable when defining a polynomial, such as [1, 1, 0, 1]. Defaults to None.
start_p (int) – Minimum Autoregressive order. Defaults to 2.
d (int) – Minimum Differencing degree. Defaults to 0.
start_q (int) – Minimum Moving Average order. Defaults to 2.
max_p (int) – Maximum Autoregressive order. Defaults to 5.
max_d (int) – Maximum Differencing degree. Defaults to 2.
max_q (int) – Maximum Moving Average order. Defaults to 5.
seasonal (boolean) – Whether to fit a seasonal model to ARIMA. Defaults to True.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
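The start_*, d, and max_* parameters bound the (p, d, q) orders considered during model selection. As a rough picture of that search space, the sketch below enumerates every candidate order within the bounds, assuming they are inclusive. It is a plain exhaustive grid for illustration, not the (possibly stepwise) search the underlying library performs, and `candidate_orders` is a hypothetical name.

```python
from itertools import product


def candidate_orders(start_p=2, max_p=5, d=0, max_d=2, start_q=2, max_q=5):
    """Enumerate (p, d, q) orders within inclusive bounds (hypothetical helper)."""
    p_values = range(start_p, max_p + 1)
    d_values = range(d, max_d + 1)
    q_values = range(start_q, max_q + 1)
    # Cartesian product of the three ranges, yielded in (p, d, q) order.
    return list(product(p_values, d_values, q_values))


orders = candidate_orders()  # uses the ARIMARegressor defaults listed above
print(len(orders))  # 48
print(orders[0], orders[-1])  # (2, 0, 2) (5, 2, 5)
```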
Attributes
hyperparameter_ranges
{"start_p": Integer(1, 3), "d": Integer(0, 2), "start_q": Integer(1, 3), "max_p": Integer(3, 10), "max_d": Integer(2, 5), "max_q": Integer(3, 10), "seasonal": [True, False]}
model_family
ModelFamily.ARIMA
modifies_features
True
modifies_target
False
name
ARIMA Regressor
predict_uses_y
False
supported_problem_types
[ProblemTypes.TIME_SERIES_REGRESSION]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters.
feature_importance – Returns an array of zeros with length 1, since feature importance is not defined for the ARIMA regressor.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone(self)
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters(cls)
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
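One way to satisfy the convention Component.default_parameters == Component().parameters is to derive the defaults from the constructor signature. The toy class below is a hypothetical sketch of that idea using only `inspect`; it is not evalml's implementation. Two simplifications are assumptions: default_parameters is a classmethod here, whereas the signature above suggests a class-level property; and random_seed is excluded from the parameters.

```python
import inspect


class ToyComponent:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, n_estimators=10, eta=0.03, random_seed=0):
        self._parameters = {"n_estimators": n_estimators, "eta": eta}
        self.random_seed = random_seed

    @property
    def parameters(self):
        # Return a copy so callers cannot mutate the component's state.
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Read default values straight from the __init__ signature, skipping
        # `self` and `random_seed`, which this sketch does not store as a parameter.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name not in ("self", "random_seed")
            and param.default is not inspect.Parameter.empty
        }


assert ToyComponent.default_parameters() == ToyComponent().parameters
print(ToyComponent.default_parameters())  # {'n_estimators': 10, 'eta': 0.03}
```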
-
describe(self, print_name=False, return_dict=False)
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {"name": name, "parameters": parameters}
- Returns
Prints the component's parameters; returns a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
property feature_importance(self)
Returns an array of zeros with length 1, since feature importance is not defined for the ARIMA regressor.
-
fit(self, X, y=None)
Fits component to data.
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static load(file_path)
Loads component at file path.
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting(self)
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property parameters(self)
Returns the parameters which were used to initialize the component.
-
predict(self, X, y=None)
Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba(self, X)
Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)
Saves component at file path.
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
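save and load serialize the whole component object to disk with a configurable pickle protocol. The round trip below is a minimal sketch that uses the standard-library `pickle` module as a stand-in for cloudpickle. `save_object` and `load_object` are hypothetical helpers, not evalml functions, and a plain dict stands in for a fitted component.

```python
import os
import pickle
import tempfile


def save_object(obj, file_path, pickle_protocol=pickle.HIGHEST_PROTOCOL):
    """Serialize obj to file_path (hypothetical stand-in for Component.save)."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load_object(file_path):
    """Deserialize an object from file_path (hypothetical stand-in for Component.load)."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# A plain dict plays the role of a fitted component's state.
state = {"name": "ARIMA Regressor", "parameters": {"start_p": 2, "max_p": 5}}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "component.pkl")
    save_object(state, path)
    restored = load_object(path)

print(restored == state)  # True
```

Unpickling can execute arbitrary code, so only load files from trusted sources.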
-
class evalml.pipelines.BinaryClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)
Pipeline subclass for all binary classification pipelines.
- Parameters
component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”]
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None.
custom_name (str) – Custom name for the pipeline. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
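The renaming rule for duplicate components in a list-based component_graph fits in a few lines. This hypothetical sketch reproduces the example in the component_graph description. It assumes that the suffix is the 0-based position in the list and that only repeated names get a suffix. `name_components` is not an evalml function.

```python
def name_components(component_names):
    """Return unique names, suffixing repeats with their list index (hypothetical)."""
    seen = set()
    unique_names = []
    for index, name in enumerate(component_names):
        # The first occurrence keeps its name; later repeats get "_<index>".
        unique_names.append(f"{name}_{index}" if name in seen else name)
        seen.add(name)
    return unique_names


graph = ["Imputer", "One Hot Encoder", "Imputer", "Logistic Regression Classifier"]
print(name_components(graph))
# ['Imputer', 'One Hot Encoder', 'Imputer_2', 'Logistic Regression Classifier']
```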
Attributes
problem_type
ProblemTypes.BINARY
Methods
can_tune_threshold_with_objective – Determine whether the threshold of a binary classification pipeline can be tuned.
classes_ – Gets the class names for the problem.
clone – Constructs a new pipeline with the same components, parameters, and random state.
compute_estimator_features – Transforms the data by applying all pre-processing components.
custom_name – Custom name of the pipeline.
describe – Outputs pipeline details including component parameters.
feature_importance – Importance associated with each feature. Features dropped by the feature selection are excluded.
fit – Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then are mapped to values between 0 and n_classes-1.
get_component – Returns component by name.
get_hyperparameter_ranges – Returns hyperparameter ranges from all components as a dictionary.
graph – Generate an image representing the pipeline graph.
graph_feature_importance – Generate a bar graph of the pipeline’s feature importance.
inverse_transform – Apply component inverse_transform methods to estimator predictions in reverse order.
load – Loads pipeline at file path.
model_family – Returns model family of this pipeline.
name – Name of the pipeline.
new – Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
optimize_threshold – Optimize the pipeline threshold given the objective to use. Only used for binary problems with objectives whose thresholds can be tuned.
parameters – Parameter dictionary for this pipeline.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels. Assumes that the column at index 1 represents the positive label case.
save – Saves pipeline at file path.
score – Evaluate model performance on objectives.
summary – A short summary of the pipeline structure, describing the list of components used.
threshold – Threshold used to make a prediction. Defaults to None.
transform – Transform the input.
-
can_tune_threshold_with_objective(self, objective)
Determine whether the threshold of a binary classification pipeline can be tuned.
- Parameters
objective – Primary AutoMLSearch objective.
-
property classes_(self)
Gets the class names for the problem.
-
clone(self)
Constructs a new pipeline with the same components, parameters, and random state.
- Returns
A new instance of this pipeline with identical components, parameters, and random state.
-
compute_estimator_features(self, X, y=None, X_train=None, y_train=None)
Transforms the data by applying all pre-processing components.
- Parameters
X (pd.DataFrame) – Input data to the pipeline to transform.
y (pd.Series or None) – Targets corresponding to X. Optional.
X_train (pd.DataFrame or np.ndarray or None) – Training data. Only used for time series.
y_train (pd.Series or None) – Training labels. Only used for time series.
- Returns
New transformed features.
- Return type
pd.DataFrame
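compute_estimator_features runs the data through every pre-processing component but stops before the final estimator. A minimal sketch of that control flow follows. It makes two simplifying assumptions: each component is a plain function in a list, and the last entry is the estimator. evalml itself works with a component graph, and `apply_preprocessing` is a hypothetical name.

```python
def apply_preprocessing(components, X):
    """Apply every component except the last one (the estimator) to X (hypothetical)."""
    *transformers, _estimator = components
    for transform in transformers:
        X = transform(X)
    return X


def fill_missing(rows):
    # Toy imputer: replace None with 0.0 in every row.
    return [[0.0 if v is None else v for v in row] for row in rows]


def double(rows):
    # Toy scaler: multiply every value by 2.
    return [[2 * v for v in row] for row in rows]


def estimator(rows):
    # Toy estimator: sum each row. apply_preprocessing never calls it.
    return [sum(row) for row in rows]


X = [[1.0, None], [3.0, 4.0]]
print(apply_preprocessing([fill_missing, double, estimator], X))
# [[2.0, 0.0], [6.0, 8.0]]
```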
-
static create_objectives(objectives)
-
property custom_name(self)
Custom name of the pipeline.
-
describe(self, return_dict=False)
Outputs pipeline details including component parameters.
- Parameters
return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.
- Returns
Dictionary of all component parameters if return_dict is True, else None
- Return type
dict or None
-
property feature_importance(self)
Importance associated with each feature. Features dropped by the feature selection are excluded.
- Returns
pd.DataFrame including feature names and their corresponding importance
-
fit(self, X, y)
Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then are mapped to values between 0 and n_classes-1.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, np.ndarray) – The target training labels of length [n_samples]
- Returns
self
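The label mapping described for fit can be sketched as follows: sort the unique labels, then replace each label with its position in that sorted list. `encode_targets` is a hypothetical helper for illustration; evalml performs this step internally.

```python
def encode_targets(y):
    """Map class labels to integers 0..n_classes-1 in sorted order (hypothetical)."""
    classes = sorted(set(y))
    mapping = {label: index for index, label in enumerate(classes)}
    return classes, [mapping[label] for label in y]


classes, y_encoded = encode_targets(["dog", "cat", "dog", "bird"])
print(classes)    # ['bird', 'cat', 'dog']
print(y_encoded)  # [2, 1, 2, 0]
```

Because the ordering comes from sorted(), labels of mixed, non-comparable types (for example str and int together) would raise a TypeError in this sketch.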
-
get_component(self, name)
Returns component by name.
- Parameters
name (str) – Name of component
- Returns
Component to return
- Return type
Component
-
get_hyperparameter_ranges(self, custom_hyperparameters)
Returns hyperparameter ranges from all components as a dictionary.
- Parameters
custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.
- Returns
Dictionary of hyperparameter ranges for each component in the pipeline.
- Return type
dict
-
graph(self, filepath=None)
Generate an image representing the pipeline graph.
- Parameters
filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
- Returns
Graph object that can be directly displayed in Jupyter notebooks.
- Return type
graphviz.Digraph
-
graph_feature_importance(self, importance_threshold=0)
Generate a bar graph of the pipeline’s feature importance.
- Parameters
importance_threshold (float, optional) – If provided, graph only features whose feature importance has an absolute value larger than importance_threshold. Defaults to zero.
- Returns
plotly.Figure, a bar graph showing features and their corresponding importance
-
inverse_transform(self, y)
Apply component inverse_transform methods to estimator predictions in reverse order.
Components that implement inverse_transform are PolynomialDetrender and LabelEncoder (LabelEncoder support is still to be determined).
- Parameters
y (pd.Series) – Final component features
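Applying inverse transforms "in reverse order" means undoing the last target transformation first. The sketch below assumes two hypothetical target transformers, a shift and a scale, each with transform and inverse_transform methods. The classes and `inverse_transform_all` are illustrations, not evalml components.

```python
class Shift:
    """Hypothetical target transformer that subtracts a constant."""

    def __init__(self, offset):
        self.offset = offset

    def transform(self, y):
        return [v - self.offset for v in y]

    def inverse_transform(self, y):
        return [v + self.offset for v in y]


class Scale:
    """Hypothetical target transformer that divides by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, y):
        return [v / self.factor for v in y]

    def inverse_transform(self, y):
        return [v * self.factor for v in y]


def inverse_transform_all(components, y):
    """Undo each component's transform, starting from the last component."""
    for component in reversed(components):
        y = component.inverse_transform(y)
    return y


components = [Shift(10), Scale(2)]
y = [12.0, 14.0, 20.0]
for component in components:  # forward pass: shift, then scale
    y = component.transform(y)
print(y)  # [1.0, 2.0, 5.0]
print(inverse_transform_all(components, y))  # [12.0, 14.0, 20.0]
```

Applying the inverses in forward order instead would give [22.0, 24.0, 30.0], which is why the order matters.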
-
static load(file_path)
Loads pipeline at file path.
- Parameters
file_path (str) – location to load file
- Returns
PipelineBase object
-
property model_family(self)
Returns model family of this pipeline.
-
property name(self)
Name of the pipeline.
-
new(self, parameters, random_seed=0)
Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
Not to be confused with Python’s __new__ method.
- Parameters
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.
random_seed (int) – Seed for the random number generator. Defaults to 0.
- Returns
A new instance of this pipeline with identical components.
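The parameters argument maps component names to per-component overrides, and an empty dictionary or None falls back to the defaults. The hypothetical merge below shows one way such a dictionary could be resolved against defaults. `resolve_parameters` and the default values are illustrative assumptions, not evalml internals.

```python
def resolve_parameters(defaults, parameters=None):
    """Merge per-component overrides into per-component defaults (hypothetical)."""
    parameters = parameters or {}  # None and {} both mean "use all defaults"
    return {
        component: {**component_defaults, **parameters.get(component, {})}
        for component, component_defaults in defaults.items()
    }


# Illustrative defaults only; real components define their own.
defaults = {
    "Imputer": {"numeric_impute_strategy": "mean"},
    "Logistic Regression Classifier": {"C": 1.0, "penalty": "l2"},
}
print(resolve_parameters(defaults, {"Logistic Regression Classifier": {"C": 0.5}}))
# {'Imputer': {'numeric_impute_strategy': 'mean'},
#  'Logistic Regression Classifier': {'C': 0.5, 'penalty': 'l2'}}
```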
-
optimize_threshold(self, X, y, y_pred_proba, objective)
Optimize the pipeline threshold given the objective to use. Only used for binary problems with objectives whose thresholds can be tuned.
- Parameters
X (pd.DataFrame) – Input features
y (pd.Series) – Input target values
y_pred_proba (pd.Series) – The predicted probabilities of the target outputted by the pipeline
objective (ObjectiveBase) – The objective to threshold with. Must have a tunable threshold.
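For a binary pipeline, a row is labelled positive when its positive-class probability (column index 1 of predict_proba) exceeds the threshold. Threshold optimization then picks the value that scores best on the objective. The sketch below uses a simple grid search with F1 as the objective. The grid search, the hand-written `f1`, and the helper names are stand-ins chosen for illustration, not evalml's optimizer.

```python
def f1(y_true, y_pred):
    """F1 score for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def predict_with_threshold(y_pred_proba, threshold):
    """Label a row positive when its positive-class probability exceeds threshold."""
    return [1 if p > threshold else 0 for p in y_pred_proba]


def grid_search_threshold(y_true, y_pred_proba, objective=f1, steps=100):
    """Try thresholds in [0, 1] and return (best_threshold, best_score) (hypothetical)."""
    candidates = [i / steps for i in range(steps + 1)]
    scores = [objective(y_true, predict_with_threshold(y_pred_proba, t)) for t in candidates]
    # max() keeps the first (smallest) threshold among ties.
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


y_true = [0, 0, 1, 1]
positive_proba = [0.1, 0.4, 0.35, 0.8]  # column index 1 of predict_proba
print(grid_search_threshold(y_true, positive_proba))  # (0.1, 0.8)
```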
-
property parameters(self)
Parameter dictionary for this pipeline.
- Returns
Dictionary of all component parameters.
- Return type
dict
-
predict(self, X, objective=None, X_train=None, y_train=None)
Make predictions using selected features.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features]
objective (ObjectiveBase or str) – The objective to use to make predictions
X_train (pd.DataFrame or np.ndarray or None) – Training data. Ignored. Only used for time series.
y_train (pd.Series or None) – Training labels. Ignored. Only used for time series.
- Returns
Estimated labels
- Return type
pd.Series
-
predict_proba(self, X, X_train=None, y_train=None)
Make probability estimates for labels. Assumes that the column at index 1 represents the positive label case.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]
X_train (pd.DataFrame or np.ndarray or None) – Training data. Ignored. Only used for time series.
y_train (pd.Series or None) – Training labels. Ignored. Only used for time series.
- Returns
Probability estimates
- Return type
pd.Series
-
save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)
Saves pipeline at file path.
- Parameters
file_path (str) – location to save file
pickle_protocol (int) – the pickle data stream format.
- Returns
None
-
score(self, X, y, objectives, X_train=None, y_train=None)
Evaluate model performance on objectives.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]
y (pd.Series, or np.ndarray) – True labels of length [n_samples]
objectives (list) – List of objectives to score
X_train (pd.DataFrame or np.ndarray) – Training data. Ignored. Only used for time series.
y_train (pd.Series) – Training labels. Ignored. Only used for time series.
- Returns
Ordered dictionary of objective scores
- Return type
dict
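score returns an ordered dictionary with one entry per objective, in the order the objectives were given. A hypothetical sketch follows. It assumes each objective is a (name, function) pair rather than an evalml objective object, and `score_objectives` is an illustrative name.

```python
from collections import OrderedDict


def score_objectives(y_true, y_pred, objectives):
    """Evaluate each (name, function) objective, preserving input order (hypothetical)."""
    return OrderedDict((name, fn(y_true, y_pred)) for name, fn in objectives)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_count(y_true, y_pred):
    return sum(t != p for t, p in zip(y_true, y_pred))


scores = score_objectives([1, 0, 1, 1], [1, 0, 0, 1], [("Accuracy", accuracy), ("Errors", error_count)])
print(list(scores.items()))  # [('Accuracy', 0.75), ('Errors', 1)]
```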
-
property summary(self)
A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
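The summary format in the example is the estimator name, then " w/ ", then the other components joined by " + ". The hypothetical sketch below reproduces it, with the format inferred from that single example. It assumes the estimator is the last component and that a lone estimator is reported by name alone.

```python
def pipeline_summary(component_names):
    """Build '<estimator> w/ <component> + <component>' (format inferred from the example)."""
    *others, estimator = component_names
    if not others:
        return estimator
    return f"{estimator} w/ {' + '.join(others)}"


print(pipeline_summary(["Simple Imputer", "One Hot Encoder", "Logistic Regression Classifier"]))
# Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
```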
-
property threshold(self)
Threshold used to make a prediction. Defaults to None.
-
transform(self, X, y=None)
Transform the input.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples]. Defaults to None.
- Returns
Transformed output.
- Return type
pd.DataFrame
-
class evalml.pipelines.CatBoostClassifier(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=True, allow_writing_files=False, random_seed=0, n_jobs=-1, **kwargs)
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.
For more information, check out https://catboost.ai/
- Parameters
n_estimators (int) – The maximum number of trees to build. Defaults to 10.
eta (float) – The learning rate. Defaults to 0.03.
max_depth (int) – The maximum tree depth for base learners. Defaults to 6.
bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.
silent (boolean) – Whether to use the “silent” logging mode. Defaults to True.
allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{"n_estimators": Integer(4, 100), "eta": Real(0.000001, 1), "max_depth": Integer(4, 10)}
model_family
ModelFamily.CATBOOST
modifies_features
True
modifies_target
False
name
CatBoost Classifier
predict_uses_y
False
supported_problem_types
[ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS]
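hyperparameter_ranges describes the search space used during tuning: Integer(low, high) is an integer range and Real(low, high) a continuous one. The stdlib sketch below draws one random configuration from the CatBoost ranges above. It assumes both bounds are inclusive, and it models each range as a hypothetical ("int" or "real", low, high) tuple instead of the real Integer/Real classes.

```python
import random

# Hypothetical tuple encoding of the CatBoost hyperparameter_ranges listed above.
HYPERPARAMETER_RANGES = {
    "n_estimators": ("int", 4, 100),
    "eta": ("real", 0.000001, 1),
    "max_depth": ("int", 4, 10),
}


def sample_parameters(ranges, rng):
    """Draw one configuration uniformly from each range (bounds treated as inclusive)."""
    sample = {}
    for name, (kind, low, high) in ranges.items():
        if kind == "int":
            sample[name] = rng.randint(low, high)  # randint includes both ends
        else:
            sample[name] = rng.uniform(low, high)
    return sample


rng = random.Random(0)  # seeded for reproducibility
print(sample_parameters(HYPERPARAMETER_RANGES, rng))
```

Real tuners usually sample more cleverly than uniformly at random; the sketch only shows what the ranges mean.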
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters.
feature_importance – Returns importance associated with each feature.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone(self)
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters(cls)
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe(self, print_name=False, return_dict=False)
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {"name": name, "parameters": parameters}
- Returns
Prints the component's parameters; returns a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
property feature_importance(self)
Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit(self, X, y=None)
Fits component to data.
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static load(file_path)
Loads component at file path.
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting(self)
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property parameters(self)
Returns the parameters which were used to initialize the component.
-
predict(self, X)
Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba(self, X)
Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)
Saves component at file path.
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class evalml.pipelines.CatBoostRegressor(n_estimators=10, eta=0.03, max_depth=6, bootstrap_type=None, silent=False, allow_writing_files=False, random_seed=0, n_jobs=-1, **kwargs)
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features.
For more information, check out https://catboost.ai/
- Parameters
n_estimators (int) – The maximum number of trees to build. Defaults to 10.
eta (float) – The learning rate. Defaults to 0.03.
max_depth (int) – The maximum tree depth for base learners. Defaults to 6.
bootstrap_type (string) – Defines the method for sampling the weights of objects. Available methods are ‘Bayesian’, ‘Bernoulli’, ‘MVS’. Defaults to None.
silent (boolean) – Whether to use the “silent” logging mode. Defaults to False.
allow_writing_files (boolean) – Whether to allow writing snapshot files while training. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{"n_estimators": Integer(4, 100), "eta": Real(0.000001, 1), "max_depth": Integer(4, 10)}
model_family
ModelFamily.CATBOOST
modifies_features
True
modifies_target
False
name
CatBoost Regressor
predict_uses_y
False
supported_problem_types
[ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters.
feature_importance – Returns importance associated with each feature.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone(self)
Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters(cls)
Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe(self, print_name=False, return_dict=False)
Describe a component and its parameters.
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {"name": name, "parameters": parameters}
- Returns
Prints the component's parameters; returns a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
property feature_importance(self)
Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit(self, X, y=None)
Fits component to data.
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static load(file_path)
Loads component at file path.
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting(self)
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property parameters(self)
Returns the parameters which were used to initialize the component.
-
predict(self, X)
Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba(self, X)
Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)
Saves component at file path.
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class evalml.pipelines.ClassificationPipeline(component_graph, parameters=None, custom_name=None, random_seed=0)
Pipeline subclass for all classification pipelines.
- Parameters
component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”]
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None.
custom_name (str) – Custom name for the pipeline. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
problem_type
None
Methods
can_tune_threshold_with_objective – Determine whether the threshold of a binary classification pipeline can be tuned.
classes_ – Gets the class names for the problem.
clone – Constructs a new pipeline with the same components, parameters, and random state.
compute_estimator_features – Transforms the data by applying all pre-processing components.
custom_name – Custom name of the pipeline.
describe – Outputs pipeline details including component parameters.
feature_importance – Importance associated with each feature. Features dropped by the feature selection are excluded.
fit – Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then are mapped to values between 0 and n_classes-1.
get_component – Returns component by name.
get_hyperparameter_ranges – Returns hyperparameter ranges from all components as a dictionary.
graph – Generate an image representing the pipeline graph.
graph_feature_importance – Generate a bar graph of the pipeline’s feature importance.
inverse_transform – Apply component inverse_transform methods to estimator predictions in reverse order.
load – Loads pipeline at file path.
model_family – Returns model family of this pipeline.
name – Name of the pipeline.
new – Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
parameters – Parameter dictionary for this pipeline.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves pipeline at file path.
score – Evaluate model performance on objectives.
summary – A short summary of the pipeline structure, describing the list of components used.
transform – Transform the input.
-
can_tune_threshold_with_objective(self, objective)
Determine whether the threshold of a binary classification pipeline can be tuned.
- Parameters
objective – Primary AutoMLSearch objective.
-
property classes_(self)
Gets the class names for the problem.
-
clone(self)
Constructs a new pipeline with the same components, parameters, and random state.
- Returns
A new instance of this pipeline with identical components, parameters, and random state.
-
compute_estimator_features(self, X, y=None, X_train=None, y_train=None)
Transforms the data by applying all pre-processing components.
- Parameters
X (pd.DataFrame) – Input data to the pipeline to transform.
y (pd.Series or None) – Targets corresponding to X. Optional.
X_train (pd.DataFrame or np.ndarray or None) – Training data. Only used for time series.
y_train (pd.Series or None) – Training labels. Only used for time series.
- Returns
New transformed features.
- Return type
pd.DataFrame
-
static create_objectives(objectives)
-
property custom_name(self)
Custom name of the pipeline.
-
describe(self, return_dict=False)
Outputs pipeline details including component parameters.
- Parameters
return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.
- Returns
Dictionary of all component parameters if return_dict is True, else None
- Return type
dict or None
-
property feature_importance(self)
Importance associated with each feature. Features dropped by the feature selection are excluded.
- Returns
pd.DataFrame including feature names and their corresponding importance
-
fit(self, X, y)
Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then are mapped to values between 0 and n_classes-1.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, np.ndarray) – The target training labels of length [n_samples]
- Returns
self
-
get_component(self, name)
Returns component by name.
- Parameters
name (str) – Name of component
- Returns
Component to return
- Return type
Component
-
get_hyperparameter_ranges(self, custom_hyperparameters)
Returns hyperparameter ranges from all components as a dictionary.
- Parameters
custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.
- Returns
Dictionary of hyperparameter ranges for each component in the pipeline.
- Return type
dict
-
graph(self, filepath=None)
Generate an image representing the pipeline graph.
- Parameters
filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
- Returns
Graph object that can be directly displayed in Jupyter notebooks.
- Return type
graphviz.Digraph
-
graph_feature_importance(self, importance_threshold=0)
Generate a bar graph of the pipeline’s feature importance.
- Parameters
importance_threshold (float, optional) – If provided, graph only features whose feature importance has an absolute value larger than importance_threshold. Defaults to zero.
- Returns
plotly.Figure, a bar graph showing features and their corresponding importance
-
inverse_transform(self, y)
Apply component inverse_transform methods to estimator predictions in reverse order.
Components that implement inverse_transform are PolynomialDetrender and LabelEncoder (LabelEncoder support is still to be determined).
- Parameters
y (pd.Series) – Final component features
-
static load(file_path)
Loads pipeline at file path.
- Parameters
file_path (str) – location to load file
- Returns
PipelineBase object
-
property model_family(self)
Returns model family of this pipeline.
-
property name(self)
Name of the pipeline.
-
new
(self, parameters, random_seed=0)¶ - Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
Not to be confused with Python’s __new__ method.
- Parameters
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.
random_seed (int) – Seed for the random number generator. Defaults to 0.
- Returns
A new instance of this pipeline with identical components.
-
property
parameters
(self)¶ Parameter dictionary for this pipeline.
- Returns
Dictionary of all component parameters.
- Return type
dict
-
predict
(self, X, objective=None, X_train=None, y_train=None)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features]
objective (Object or string) – The objective to use to make predictions
X_train (pd.DataFrame or np.ndarray or None) – Training data. Ignored. Only used for time series.
y_train (pd.Series or None) – Training labels. Ignored. Only used for time series.
- Returns
Estimated labels
- Return type
pd.Series
-
predict_proba
(self, X, X_train=None, y_train=None)[source]¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]
X_train (pd.DataFrame or np.ndarray or None) – Training data. Ignored. Only used for time series.
y_train (pd.Series or None) – Training labels. Ignored. Only used for time series.
- Returns
Probability estimates
- Return type
pd.DataFrame
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves pipeline at file path
- Parameters
file_path (str) – location to save file
pickle_protocol (int) – the pickle data stream format.
- Returns
None
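save and load serialize the object with cloudpickle, as the default pickle_protocol=cloudpickle.DEFAULT_PROTOCOL indicates. The sketch below shows the same write/read round trip with the standard-library pickle module, whose dump and load functions cloudpickle mirrors; `save_object` and `load_object` are hypothetical helpers, not evalml functions. cloudpickle can additionally serialize objects such as lambdas and interactively defined classes that plain pickle rejects.

```python
import os
import pickle
import tempfile


def save_object(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Write obj to file_path with the given pickle protocol (hypothetical helper)."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load_object(file_path):
    """Read back an object written by save_object (hypothetical helper)."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


params = {"Decision Tree Classifier": {"max_depth": 6, "criterion": "gini"}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pipeline.pkl")
    save_object(params, path)
    restored = load_object(path)
print(restored == params)  # True
```

Unpickling can execute arbitrary code, so only load files from trusted sources.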
-
score
(self, X, y, objectives, X_train=None, y_train=None)[source]¶ Evaluate model performance on objectives
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]
y (pd.Series, or np.ndarray) – True labels of length [n_samples]
objectives (list) – List of objectives to score
X_train (pd.DataFrame or np.ndarray) – Training data. Ignored. Only used for time series.
y_train (pd.Series) – Training labels. Ignored. Only used for time series.
- Returns
Ordered dictionary of objective scores
- Return type
dict
-
property
summary
(self)¶ A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
-
transform
(self, X, y=None)¶ Transform the input.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples]. Defaults to None.
- Returns
Transformed output.
- Return type
pd.DataFrame
-
class
evalml.pipelines.
ComponentGraph
(component_dict=None, random_seed=0)[source]¶ Component graph for a pipeline as a directed acyclic graph (DAG).
- Parameters
component_dict (dict) – A dictionary which specifies the components and edges between components that should be used to create the component graph. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Example
>>> from evalml.pipelines import ComponentGraph
>>> component_dict = {'imputer': ['Imputer'],
...                   'ohe': ['One Hot Encoder', 'imputer.x'],
...                   'estimator_1': ['Random Forest Classifier', 'ohe.x'],
...                   'estimator_2': ['Decision Tree Classifier', 'ohe.x'],
...                   'final': ['Logistic Regression Classifier', 'estimator_1', 'estimator_2']}
>>> component_graph = ComponentGraph(component_dict)
Methods
Transform all components except the final one, and gather the data from any number of parents to feed to the final component.
The order that components will be computed or called in.
The default parameter dictionary for this pipeline.
Outputs component graph details including component parameters
Fit each component in the graph.
Fit all components save the final one, usually an estimator.
Regenerate the topologically sorted order of the graph.
Retrieves a single component object from the graph.
Gets a list of all the estimator components within this graph.
Retrieves all inputs for a given component.
Retrieves the component that is computed last in the graph, usually the final estimator.
Generate an image representing the component graph
Instantiates all uninstantiated components within the graph using the given parameters.
Apply component inverse_transform methods to estimator predictions in reverse order.
Make predictions using selected features.
Transform the input using the component graph.
-
compute_final_component_features
(self, X, y=None)[source]¶ Transform all components except the final one, and gather the data from any number of parents to get all the information that should be fed to the final component.
- Parameters
X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples]. Defaults to None.
- Returns
Transformed values.
- Return type
pd.DataFrame
-
property
compute_order
(self)¶ The order that components will be computed or called in.
-
property
default_parameters
(self)¶ The default parameter dictionary for this pipeline.
- Returns
Dictionary of all component default parameters.
- Return type
dict
-
describe
(self, return_dict=False)[source]¶ Outputs component graph details including component parameters
- Parameters
return_dict (bool) – If True, return dictionary of information about component graph. Defaults to False.
- Returns
Dictionary of all component parameters if return_dict is True, else None
- Return type
dict or None
-
fit
(self, X, y)[source]¶ Fit each component in the graph.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
-
fit_features
(self, X, y)[source]¶ Fit all components except the final one, which is usually an estimator.
- Parameters
X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].
- Returns
Transformed values.
- Return type
pd.DataFrame
-
classmethod
generate_order
(cls, component_dict)[source]¶ Regenerates the topologically sorted order of the graph.
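A topological order of component_dict can be computed with Kahn's algorithm. This sketch assumes, as in the ComponentGraph example, that each value is [component class name, *inputs] and that an input such as 'imputer.x' refers to the component 'imputer'. `get_inputs` and `generate_order` here are hypothetical stand-ins, and evalml may return a different order when several valid orders exist.

```python
from collections import deque
from typing import Dict, List


def get_inputs(component_dict: Dict[str, List[str]], name: str) -> List[str]:
    """Input references of a component: everything after its class name."""
    return component_dict[name][1:]


def generate_order(component_dict: Dict[str, List[str]]) -> List[str]:
    """Return one topological order of the components (Kahn's algorithm)."""
    # 'imputer.x' -> 'imputer': drop the output suffix to get the parent component.
    parents = {
        name: {ref.split(".")[0] for ref in get_inputs(component_dict, name)}
        for name in component_dict
    }
    in_degree = {name: len(ps) for name, ps in parents.items()}
    children: Dict[str, List[str]] = {name: [] for name in component_dict}
    for name, ps in parents.items():
        for parent in ps:
            children[parent].append(name)
    # Start from components with no inputs, then release each child once
    # all of its parents have been placed.
    ready = deque(name for name in component_dict if in_degree[name] == 0)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)
    if len(order) != len(component_dict):
        raise ValueError("component_dict contains a cycle")
    return order


component_dict = {
    "imputer": ["Imputer"],
    "ohe": ["One Hot Encoder", "imputer.x"],
    "estimator_1": ["Random Forest Classifier", "ohe.x"],
    "estimator_2": ["Decision Tree Classifier", "ohe.x"],
    "final": ["Logistic Regression Classifier", "estimator_1", "estimator_2"],
}
print(generate_order(component_dict))
# ['imputer', 'ohe', 'estimator_1', 'estimator_2', 'final']
```

A graph with a cycle leaves some components with unmet inputs, so the sketch raises instead of returning a partial order.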
-
get_component
(self, component_name)[source]¶ Retrieves a single component object from the graph.
- Parameters
component_name (str) – Name of the component to retrieve
- Returns
ComponentBase object
-
get_estimators
(self)[source]¶ Gets a list of all the estimator components within this graph.
- Returns
All estimator objects within the graph.
- Return type
list
-
get_inputs
(self, component_name)[source]¶ Retrieves all inputs for a given component.
- Parameters
component_name (str) – Name of the component to look up.
- Returns
List of inputs for the component to use.
- Return type
list[str]
-
get_last_component
(self)[source]¶ Retrieves the component that is computed last in the graph, usually the final estimator.
- Returns
ComponentBase object
-
graph
(self, name=None, graph_format=None)[source]¶ Generate an image representing the component graph
- Parameters
name (str) – Name of the graph. Defaults to None.
graph_format (str) – file format to save the graph in. Defaults to None.
- Returns
Graph object that can be directly displayed in Jupyter notebooks.
- Return type
graphviz.Digraph
-
instantiate
(self, parameters)[source]¶ Instantiates all uninstantiated components within the graph using the given parameters. An error will be raised if a component is already instantiated but the parameters dict contains arguments for that component.
- Parameters
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary {} or None implies using all default values for component parameters. If a component in the component graph is already instantiated, it will not use any of its parameters defined in this dictionary.
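The parameters argument maps component names to per-component dictionaries. One way to picture how None, {} and partial overrides behave is to overlay the supplied values on each component's defaults. `resolve_parameters` is a hypothetical sketch of that idea, not evalml's implementation; it does not model the rule that already-instantiated components ignore their entries, and the parameter names and values are illustrative.

```python
from typing import Any, Dict, Optional


def resolve_parameters(
    defaults: Dict[str, Dict[str, Any]],
    parameters: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Overlay per-component overrides on per-component defaults (hypothetical helper)."""
    parameters = parameters or {}  # None and {} both mean "use every default"
    resolved = {}
    for name, component_defaults in defaults.items():
        merged = dict(component_defaults)  # copy so the defaults are not mutated
        merged.update(parameters.get(name, {}))
        resolved[name] = merged
    return resolved


# Illustrative defaults keyed by the component names from the ComponentGraph example.
defaults = {
    "imputer": {"strategy": "mean"},
    "estimator_1": {"n_estimators": 100, "max_depth": 6},
}
print(resolve_parameters(defaults, {"estimator_1": {"max_depth": 4}}))
# {'imputer': {'strategy': 'mean'}, 'estimator_1': {'n_estimators': 100, 'max_depth': 4}}
```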
-
inverse_transform
(self, y)[source]¶ Apply component inverse_transform methods to estimator predictions in reverse order.
Components that implement inverse_transform are PolynomialDetrender, LabelEncoder (tbd).
- Parameters
y (pd.Series) – Final component features
-
class
evalml.pipelines.
DecisionTreeClassifier
(criterion='gini', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)[source]¶ Decision Tree Classifier.
- Parameters
criterion ({"gini", "entropy"}) – The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. Defaults to “gini”.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
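The max_features and min_samples_split rules above resolve to concrete counts at fit time. This sketch follows the classifier rules as listed; the floor of one feature matches what scikit-learn does for very small fractions. `resolve_max_features` and `resolve_min_samples_split` are hypothetical helpers.

```python
import math
from typing import Optional, Union


def resolve_max_features(max_features: Optional[Union[int, float, str]], n_features: int) -> int:
    """Number of features considered at each split (classifier rules)."""
    if max_features is None:
        return n_features
    if max_features in ("auto", "sqrt"):
        # For the classifier, "auto" and "sqrt" both mean sqrt(n_features).
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    # A float is a fraction of the available features, kept to at least one.
    return max(1, int(max_features * n_features))


def resolve_min_samples_split(min_samples_split: Union[int, float], n_samples: int) -> int:
    """Minimum number of samples needed to split an internal node."""
    if isinstance(min_samples_split, int):
        return min_samples_split
    # A float is a fraction of the samples, rounded up.
    return math.ceil(min_samples_split * n_samples)


print(resolve_max_features("auto", 16))      # 4
print(resolve_max_features(0.5, 10))         # 5
print(resolve_min_samples_split(0.1, 25))    # 3
```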
Attributes
hyperparameter_ranges
{ “criterion”: [“gini”, “entropy”], “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family
ModelFamily.DECISION_TREE
modifies_features
True
modifies_target
False
name
Decision Tree Classifier
predict_uses_y
False
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
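The convention Component.default_parameters == Component().parameters holds automatically when the defaults are read off an instance built with no arguments. `ToyComponent` is a hypothetical class for illustration only, with default_parameters written here as a classmethod.

```python
class ToyComponent:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, max_depth=6, criterion="gini"):
        # Record exactly the arguments the component was initialized with.
        self.parameters = {"max_depth": max_depth, "criterion": criterion}

    @classmethod
    def default_parameters(cls):
        # A no-argument instance carries the defaults, so the two cannot drift apart.
        return cls().parameters


print(ToyComponent.default_parameters())  # {'max_depth': 6, 'criterion': 'gini'}
```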
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
DecisionTreeRegressor
(criterion='mse', max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, random_seed=0, **kwargs)[source]¶ Decision Tree Regressor.
- Parameters
criterion ({"mse", "friedman_mse", "mae", "poisson"}) –
The function to measure the quality of a split. Supported criteria are:
“mse” for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node
“friedman_mse”, which uses mean squared error with Friedman’s improvement score for potential splits
“mae” for the mean absolute error, which minimizes the L1 loss using the median of each terminal node
“poisson”, which uses reduction in Poisson deviance to find splits.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=n_features.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
random_seed (int) – Seed for the random number generator. Defaults to 0.
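The “mse” and “mae” criteria differ in what a terminal node predicts and how its impurity is measured: the mean with squared error versus the median with absolute error. The sketch covers only these two criteria, and `mse_node` and `mae_node` are hypothetical helpers.

```python
from statistics import mean, median
from typing import Sequence, Tuple


def mse_node(y: Sequence[float]) -> Tuple[float, float]:
    """(prediction, impurity) of a node under the "mse" criterion."""
    prediction = mean(y)  # the terminal node predicts the mean
    impurity = mean((value - prediction) ** 2 for value in y)
    return prediction, impurity


def mae_node(y: Sequence[float]) -> Tuple[float, float]:
    """(prediction, impurity) of a node under the "mae" criterion."""
    prediction = median(y)  # the terminal node predicts the median
    impurity = mean(abs(value - prediction) for value in y)
    return prediction, impurity


print(mse_node([1, 2, 3, 10]))  # (4, 12.5)
print(mae_node([1, 2, 3, 10]))  # (2.5, 2.5)
```

The outlier 10 pulls the “mse” prediction up to 4, while the “mae” prediction stays at the median, 2.5, which is why “mae” is the more robust choice for targets with outliers.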
Attributes
hyperparameter_ranges
{ “criterion”: [“mse”, “friedman_mse”, “mae”], “max_features”: [“auto”, “sqrt”, “log2”], “max_depth”: Integer(4, 10),}
model_family
ModelFamily.DECISION_TREE
modifies_features
True
modifies_target
False
name
Decision Tree Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
DelayedFeatureTransformer
(date_index=None, max_delay=2, gap=0, forecast_horizon=1, delay_features=True, delay_target=True, random_seed=0, **kwargs)[source]¶ Transformer that delays input features and target variable for time series problems.
- Parameters
date_index (str) – Name of the column containing the datetime information used to order the data. Ignored.
max_delay (int) – Maximum number of time units to delay each feature. Defaults to 2.
forecast_horizon (int) – The number of time periods the pipeline is expected to forecast. Defaults to 1.
delay_features (bool) – Whether to delay the input features. Defaults to True.
delay_target (bool) – Whether to delay the target. Defaults to True.
gap (int) – The number of time units between when the features are collected and when the target is collected. For example, if you are predicting the next time step’s target, gap=1. The transformer needs this value because, when gap=0, it must start lagging the target variable at 1. Defaults to 0.
random_seed (int) – Seed for the random number generator. This transformer performs the same regardless of the random seed provided. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Delayed Feature Transformer
needs_fitting
False
target_colname_prefix
target_delay_{}
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the DelayedFeatureTransformer.
Fits on X and transforms X
Loads component at file path
Returns the parameters which were used to initialize the component
Saves component at file path
Computes the delayed features for all features in X and y.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits the DelayedFeatureTransformer.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)[source]¶ Computes the delayed features for all features in X and y.
For each feature in X, it adds a column to the output DataFrame for each delay in the inclusive range [1, max_delay]. Each delayed feature is the original feature shifted forward in time by the delay amount. For example, with a delay of 3 units, the value at row n is taken from row n-3 of the original feature.
If y is not None, it will also compute the delayed values for the target variable.
- Parameters
X (pd.DataFrame or None) – Data to transform. None is expected when only the target variable is being used.
y (pd.Series, or None) – Target.
- Returns
Transformed X.
- Return type
pd.DataFrame
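The delay computation described above maps naturally onto pandas.Series.shift. `delay_features` is a hypothetical helper: the `<column>_delay_<k>` naming for feature columns is an assumption, while target columns use the target_delay_{} prefix listed under Attributes. The sketch does not model the gap, forecast_horizon, delay_features, or delay_target options.

```python
from typing import Optional

import pandas as pd


def delay_features(X: Optional[pd.DataFrame], y: Optional[pd.Series] = None, max_delay: int = 2) -> pd.DataFrame:
    """Add lagged copies of every column of X, and of y if given (hypothetical helper)."""
    if X is None:
        # Only the target is being lagged; reuse its index for the output.
        output = pd.DataFrame(index=y.index)
    else:
        output = X.copy()
        for column in X.columns:
            for delay in range(1, max_delay + 1):  # inclusive range [1, max_delay]
                # shift(delay) moves values forward: row n receives row n - delay.
                output[f"{column}_delay_{delay}"] = X[column].shift(delay)
    if y is not None:
        for delay in range(1, max_delay + 1):
            output[f"target_delay_{delay}"] = y.shift(delay)
    return output


X = pd.DataFrame({"temp": [10, 11, 12, 13]})
y = pd.Series([1, 0, 1, 1])
print(delay_features(X, y, max_delay=2))
```

The first max_delay rows contain NaN in the lagged columns, because no earlier observations exist to shift into them.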
-
class
evalml.pipelines.
DFSTransformer
(index='index', random_seed=0, **kwargs)[source]¶ Featuretools DFS component that generates features for the input features.
- Parameters
index (string) – The name of the column that contains the indices. If no column with this name exists, then featuretools.EntitySet() creates a column with this name to serve as the index column. Defaults to ‘index’.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
DFS Transformer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits the DFSTransformer component.
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Computes the feature matrix for the input X using featuretools’ dfs algorithm.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits the DFSTransformer component.
- Parameters
X (pd.DataFrame, np.ndarray) – The input data to transform, of shape [n_samples, n_features]
y (pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)[source]¶ Computes the feature matrix for the input X using featuretools’ dfs algorithm.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data to transform. Has shape [n_samples, n_features]
y (pd.Series, optional) – Ignored.
- Returns
Feature matrix
- Return type
pd.DataFrame
-
class
evalml.pipelines.
ElasticNetClassifier
(penalty='elasticnet', C=1.0, l1_ratio=0.15, multi_class='auto', solver='saga', n_jobs=-1, random_seed=0, **kwargs)[source]¶ Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator.
- Parameters
penalty ({"l1", "l2", "elasticnet", "none"}) – The norm used in penalization. Defaults to “elasticnet”.
C (float) – Inverse of regularization strength. Must be a positive float. Defaults to 1.0.
l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty=‘elasticnet’. Setting l1_ratio=0 is equivalent to using penalty=‘l2’, while setting l1_ratio=1 is equivalent to using penalty=‘l1’. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. Defaults to 0.15.
multi_class ({"auto", "ovr", "multinomial"}) – If the option chosen is “ovr”, then a binary problem is fit for each label. For “multinomial” the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. “multinomial” is unavailable when solver=“liblinear”. “auto” selects “ovr” if the data is binary, or if solver=“liblinear”, and otherwise selects “multinomial”. Defaults to “auto”.
solver ({"newton-cg", "lbfgs", "liblinear", "sag", "saga"}) –
Algorithm to use in the optimization problem. For small datasets, “liblinear” is a good choice, whereas “sag” and “saga” are faster for large ones. For multiclass problems, only “newton-cg”, “sag”, “saga” and “lbfgs” handle multinomial loss; “liblinear” is limited to one-versus-rest schemes.
“newton-cg”, “lbfgs”, “sag” and “saga” handle L2 or no penalty
“liblinear” and “saga” also handle L1 penalty
“saga” also supports “elasticnet” penalty
“liblinear” does not support setting penalty=‘none’
Defaults to “saga”.
n_jobs (int) – Number of parallel jobs used to fit the model. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
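The solver and penalty constraints listed above can be checked before fitting. `check_solver` is a hypothetical helper that encodes only those documented rules; it is not part of evalml or scikit-learn.

```python
# Penalties each solver accepts, transcribed from the solver notes above.
SOLVER_PENALTIES = {
    "newton-cg": {"l2", "none"},
    "lbfgs": {"l2", "none"},
    "sag": {"l2", "none"},
    "saga": {"l1", "l2", "elasticnet", "none"},
    "liblinear": {"l1", "l2"},
}


def check_solver(solver: str, penalty: str, multi_class: str = "auto") -> None:
    """Raise ValueError for combinations the parameter notes rule out (hypothetical helper)."""
    if solver not in SOLVER_PENALTIES:
        raise ValueError(f"unknown solver {solver!r}")
    if penalty not in SOLVER_PENALTIES[solver]:
        raise ValueError(f"solver {solver!r} does not support penalty {penalty!r}")
    if solver == "liblinear" and multi_class == "multinomial":
        # liblinear is limited to one-versus-rest schemes.
        raise ValueError("liblinear does not support multi_class='multinomial'")


check_solver("saga", "elasticnet")  # the defaults: passes silently
```

Under these rules the default pairing of penalty=“elasticnet” with solver=“saga” is the only valid way to use the elastic net penalty.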
Attributes
hyperparameter_ranges
{ “C”: Real(0.01, 10), “l1_ratio”: Real(0, 1)}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Elastic Net Classifier
predict_uses_y
False
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
ElasticNetRegressor
(alpha=0.0001, l1_ratio=0.15, max_iter=1000, normalize=False, random_seed=0, **kwargs)[source]¶ Elastic Net Regressor.
- Parameters
alpha (float) – Constant that multiplies the penalty terms. Defaults to 0.0001.
l1_ratio (float) – The mixing parameter, with 0 <= l1_ratio <= 1. Setting l1_ratio=0 gives a pure L2 penalty, while setting l1_ratio=1 gives a pure L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. Defaults to 0.15.
max_iter (int) – The maximum number of iterations. Defaults to 1000.
normalize (boolean) – If True, the regressors will be normalized before regression by subtracting the mean and dividing by the l2-norm. Defaults to False.
random_seed (int) – Seed for the random number generator. Defaults to 0.
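To make the roles of alpha and l1_ratio concrete, the sketch below evaluates the elastic-net objective in the form used by scikit-learn's ElasticNet, which this component presumably wraps: (1 / (2 * n_samples)) * ||y - Xw||²₂ + alpha * l1_ratio * ||w||₁ + 0.5 * alpha * (1 - l1_ratio) * ||w||²₂. The helper names are hypothetical, and the intercept is omitted for brevity.

```python
def elastic_net_penalty(w, alpha, l1_ratio):
    """Elastic-net penalty term (scikit-learn ElasticNet convention)."""
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must satisfy 0 <= l1_ratio <= 1")
    l1 = sum(abs(wi) for wi in w)      # ||w||_1
    l2_sq = sum(wi * wi for wi in w)   # ||w||_2^2
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2_sq


def elastic_net_objective(X, y, w, alpha=0.0001, l1_ratio=0.15):
    """Squared error scaled by 1 / (2 * n_samples) plus the penalty (no intercept)."""
    n_samples = len(y)
    residual_sq = 0.0
    for row, target in zip(X, y):
        prediction = sum(xi * wi for xi, wi in zip(row, w))
        residual_sq += (target - prediction) ** 2
    return residual_sq / (2 * n_samples) + elastic_net_penalty(w, alpha, l1_ratio)
```

With l1_ratio=1 only the L1 (lasso) term remains, and with l1_ratio=0 only the L2 (ridge-style) term remains; the default 0.15 leans mostly toward L2.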
Attributes
hyperparameter_ranges
{"alpha": Real(0, 1), "l1_ratio": Real(0, 1)}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Elastic Net Regressor
predict_uses_y
False
supported_problem_types
[ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.
- Returns
Prints the component's description, and returns it as a dictionary if return_dict is True.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to return False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features of shape [n_samples, n_features]
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
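save writes the component to disk with a given pickle protocol (cloudpickle.DEFAULT_PROTOCOL by default), and the static load reads it back. The dependency-free sketch below shows that round trip using the standard-library pickle module in place of cloudpickle, whose dump and load take the same file and protocol arguments. save_component and load_component are hypothetical helpers, not evalml functions, and a SimpleNamespace stands in for a real component.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_component(component, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize `component` to `file_path` using the requested pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(component, f, protocol=pickle_protocol)


def load_component(file_path):
    """Deserialize a component. Only load trusted files: unpickling can run arbitrary code."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Stand-in "component": any picklable object round-trips the same way.
component = SimpleNamespace(
    name="Elastic Net Regressor", parameters={"alpha": 0.0001, "l1_ratio": 0.15}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "component.pkl"
    save_component(component, path)
    restored = load_component(path)

print(restored.parameters)  # {'alpha': 0.0001, 'l1_ratio': 0.15}
```

With evalml itself you would use the documented save(file_path) method and the static load(file_path) instead of these helpers.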
-
class
evalml.pipelines.
Estimator
(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]¶ A component that fits and predicts given data.
To implement a new Estimator, define a subclass of Estimator that sets a name and a dictionary of acceptable ranges (hyperparameter_ranges) for any parameters to be tuned during the AutoML search. Define an __init__ method that sets up any necessary state and objects. Make sure your __init__ uses only standard keyword arguments and calls super().__init__() with a parameters dict. You may also override fit, predict, predict_proba, and other methods of this class where appropriate.
To see some examples, check out the definitions of any Estimator component.
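The recipe above can be sketched as follows. To keep the example runnable without evalml installed, EstimatorSketch is a hypothetical stand-in for Estimator that implements only what the recipe relies on (a parameters dict, default_parameters, and clone); MedianRegressor and its scale parameter are likewise invented for illustration.

```python
import statistics


class EstimatorSketch:
    """Hypothetical stand-in for evalml's Estimator base class (NOT the real API)."""

    name = "Estimator Sketch"
    hyperparameter_ranges = {}

    def __init__(self, parameters=None, component_obj=None, random_seed=0, **kwargs):
        self._parameters = dict(parameters or {})
        self._component_obj = component_obj
        self.random_seed = random_seed

    @property
    def parameters(self):
        # Return a copy so callers cannot mutate the stored parameters.
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Mirrors the documented convention:
        # Component.default_parameters == Component().parameters
        return cls().parameters

    def clone(self):
        # Same parameters and random seed, but fresh (unfitted) state.
        return type(self)(**self.parameters, random_seed=self.random_seed)


class MedianRegressor(EstimatorSketch):
    """Hypothetical estimator predicting the training median multiplied by `scale`."""

    name = "Median Regressor"
    # Search space an AutoML tuner could explore (hypothetical values).
    hyperparameter_ranges = {"scale": [0.5, 1.0, 2.0]}

    def __init__(self, scale=1.0, random_seed=0, **kwargs):
        # Standard keyword arguments only: gather them into a parameters dict
        # and pass that dict to the base class.
        parameters = {"scale": scale}
        parameters.update(kwargs)
        super().__init__(parameters=parameters, component_obj=None, random_seed=random_seed)
        self._median = None

    def fit(self, X, y=None):
        if y is None:
            raise ValueError("MedianRegressor requires a target y")
        self._median = statistics.median(y)
        return self

    def predict(self, X):
        if self._median is None:
            raise RuntimeError("fit must be called before predict")
        return [self._median * self._parameters["scale"]] * len(X)
```

In real code you would subclass evalml.pipelines.Estimator, which already supplies clone, describe, save, and load. Note that the documented convention reads default_parameters without a call (Component.default_parameters); it is a classmethod here only to keep the stand-in simple.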
- Parameters
parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
predict_uses_y
False
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
name – Returns string name of this component
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
supported_problem_types – Problem types this estimator supports
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.
- Returns
Prints the component's description, and returns it as a dictionary if return_dict is True.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
property
name
(cls)¶ Returns string name of this component
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to return False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)[source]¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features of shape [n_samples, n_features]
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
property
supported_problem_types
(cls)¶ Problem types this estimator supports
-
class
evalml.pipelines.
ExtraTreesClassifier
(n_estimators=100, max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, n_jobs=-1, random_seed=0, **kwargs)[source]¶ Extra Trees Classifier.
- Parameters
n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
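The rules for max_features and min_samples_split above can be written out directly. This is a sketch of scikit-learn's resolution logic, which this component presumably forwards its parameters to; the function names are hypothetical. scikit-learn additionally keeps at least one feature per split and requires at least two samples to split, which the sketch reproduces.

```python
import math


def resolve_max_features(max_features, n_features):
    """Number of features a classifier tree examines per split."""
    if max_features is None:
        return n_features
    if max_features in ("auto", "sqrt"):
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, float):
        # Fraction of the features, truncated, but never fewer than one.
        return max(1, int(max_features * n_features))
    if isinstance(max_features, int):
        return max_features
    raise ValueError(f"Unsupported max_features: {max_features!r}")


def resolve_min_samples_split(min_samples_split, n_samples):
    """Minimum number of samples required to split an internal node."""
    if isinstance(min_samples_split, float):
        # Fraction of the training samples, rounded up, and at least 2.
        return max(2, math.ceil(min_samples_split * n_samples))
    return min_samples_split
```

For example, with 100 features the default “auto” examines 10 features per split, while “log2” examines 6.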
Attributes
hyperparameter_ranges
{"n_estimators": Integer(10, 1000), "max_features": ["auto", "sqrt", "log2"], "max_depth": Integer(4, 10)}
model_family
ModelFamily.EXTRA_TREES
modifies_features
True
modifies_target
False
name
Extra Trees Classifier
predict_uses_y
False
supported_problem_types
[ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.
- Returns
Prints the component's description, and returns it as a dictionary if return_dict is True.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to return False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features of shape [n_samples, n_features]
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
ExtraTreesRegressor
(n_estimators=100, max_features='auto', max_depth=6, min_samples_split=2, min_weight_fraction_leaf=0.0, n_jobs=-1, random_seed=0, **kwargs)[source]¶ Extra Trees Regressor.
- Parameters
n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_features (int, float or {"auto", "sqrt", "log2"}) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=n_features (for regressors, unlike classifiers, “auto” considers all features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features = n_features.
The search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features. Defaults to “auto”.
max_depth (int) – The maximum depth of the tree. Defaults to 6.
min_samples_split (int or float) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
Defaults to 2.
min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Defaults to 0.0.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processes. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
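min_weight_fraction_leaf is easiest to read as an absolute weight floor: the fraction times the total sample weight (or times n_samples when no sample weights are given), which each child of a candidate split must reach. The sketch below follows scikit-learn's tree semantics, which this component presumably inherits; the function names are hypothetical, and the other stopping criteria (max_depth, min_samples_split) are not modeled.

```python
def min_weight_leaf(min_weight_fraction_leaf, n_samples, sample_weight=None):
    """Absolute leaf-weight floor implied by min_weight_fraction_leaf."""
    # Without sample weights, every sample contributes a weight of 1.
    total_weight = n_samples if sample_weight is None else sum(sample_weight)
    return min_weight_fraction_leaf * total_weight


def split_allowed(left_weights, right_weights, min_weight_fraction_leaf, n_samples, sample_weight=None):
    """True if both children of a candidate split meet the weight floor."""
    floor = min_weight_leaf(min_weight_fraction_leaf, n_samples, sample_weight)
    return sum(left_weights) >= floor and sum(right_weights) >= floor
```

With the default of 0.0 the floor is zero, so this constraint never blocks a split.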
Attributes
hyperparameter_ranges
{"n_estimators": Integer(10, 1000), "max_features": ["auto", "sqrt", "log2"], "max_depth": Integer(4, 10)}
model_family
ModelFamily.EXTRA_TREES
modifies_features
True
modifies_target
False
name
Extra Trees Regressor
predict_uses_y
False
supported_problem_types
[ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.
- Returns
Prints the component's description, and returns it as a dictionary if return_dict is True.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to return False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features of shape [n_samples, n_features]
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
FeatureSelector
(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]¶ Selects top features based on importance weights.
- Parameters
parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
fit – Fits component to data
fit_transform – Fits on X and transforms X
get_names – Get names of selected features.
load – Loads component at file path
name – Returns string name of this component
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component
save – Saves component at file path
transform – Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.
- Returns
Prints the component's description, and returns it as a dictionary if return_dict is True.
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
get_names
(self)[source]¶ Get names of selected features.
- Returns
List of the names of features selected
- Return type
list[str]
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
property
name
(cls)¶ Returns string name of this component
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to return False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)[source]¶ Transforms input data by selecting features. If the component_obj does not have a transform method, a MethodPropertyNotFoundError is raised.
- Parameters
X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data. Ignored.
- Returns
Transformed X
- Return type
pd.DataFrame
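The fit / get_names / transform contract above can be sketched with a toy selector that keeps the k columns with the largest importance weights. Everything here is hypothetical: TopKFeatureSelector and its k parameter are invented, the importance weights are passed in directly rather than learned from a fitted model, and a dict of column lists stands in for a pandas DataFrame so the sketch has no third-party dependencies.

```python
class TopKFeatureSelector:
    """Hypothetical selector keeping the k columns with the largest importance weights."""

    def __init__(self, importances, k):
        self.importances = dict(importances)  # column name -> importance weight
        self.k = k
        self._selected = None

    def fit(self, X, y=None):
        # Rank the columns present in X by importance, highest first (stable for ties).
        ranked = sorted(X, key=lambda col: self.importances.get(col, 0.0), reverse=True)
        top = set(ranked[: self.k])
        # Keep the surviving columns in their original order.
        self._selected = [col for col in X if col in top]
        return self

    def get_names(self):
        if self._selected is None:
            raise RuntimeError("fit must be called before get_names")
        return list(self._selected)

    def transform(self, X, y=None):
        # y is accepted for API symmetry and ignored, like the documented transform.
        return {col: X[col] for col in self.get_names()}

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X, y)
```

For example, with importances {"a": 0.1, "b": 0.7, "c": 0.2} and k=2, the selector keeps columns "b" and "c".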
-
class
evalml.pipelines.
KNeighborsClassifier
(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, random_seed=0, **kwargs)[source]¶ K-Nearest Neighbors Classifier.
- Parameters
n_neighbors (int) – Number of neighbors to use by default. Defaults to 5.
weights ({‘uniform’, ‘distance’} or callable) –
Weight function used in prediction. Can be:
‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
Defaults to “uniform”.
algorithm ({‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}) –
Algorithm used to compute the nearest neighbors:
‘ball_tree’ will use BallTree
‘kd_tree’ will use KDTree
‘brute’ will use a brute-force search.
‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit method. Defaults to “auto”. Note: fitting on sparse input will override the setting of this parameter, using brute force.
leaf_size (int) – Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem. Defaults to 30.
p (int) – Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Defaults to 2.
random_seed (int) – Seed for the random number generator. Defaults to 0.
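The effect of weights and p can be seen in a brute-force nearest-neighbor vote, which corresponds to algorithm=“brute”. “ball_tree” and “kd_tree” find the same neighbors faster but do not change the vote. The function names are hypothetical. The handling of exact matches under “distance” weighting follows scikit-learn's documented behavior, but tie-breaking between labels here (first label seen) may differ from scikit-learn.

```python
from collections import defaultdict


def minkowski_distance(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan (l1), p=2 is Euclidean (l2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(X_train, y_train, query, n_neighbors=5, weights="uniform", p=2):
    """Brute-force k-nearest-neighbors vote for one query point."""
    ranked = sorted(
        ((minkowski_distance(x, query, p), label) for x, label in zip(X_train, y_train)),
        key=lambda pair: pair[0],
    )
    neighbors = ranked[:n_neighbors]
    votes = defaultdict(float)
    if weights == "uniform":
        for _, label in neighbors:
            votes[label] += 1.0
    elif weights == "distance":
        exact = [label for dist, label in neighbors if dist == 0.0]
        if exact:
            # Exact matches get weight 1 and all other neighbors weight 0.
            for label in exact:
                votes[label] += 1.0
        else:
            for dist, label in neighbors:
                votes[label] += 1.0 / dist
    else:
        raise ValueError(f"Unsupported weights: {weights!r}")
    # Ties go to the label encountered first among the neighbors.
    return max(votes, key=votes.get)
```

With five neighbors around a point near the minority class, “uniform” lets the three distant majority points win the vote, while “distance” lets the two close points dominate.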
Attributes
hyperparameter_ranges
{"n_neighbors": Integer(2, 12), "weights": ["uniform", "distance"], "algorithm": ["auto", "ball_tree", "kd_tree", "brute"], "leaf_size": Integer(10, 30), "p": Integer(1, 5)}
model_family
ModelFamily.K_NEIGHBORS
modifies_features
True
modifies_target
False
name
KNN Classifier
predict_uses_y
False
supported_problem_types
[ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns an array of zeros matching the number of input features, since feature importance is not defined for KNN classifiers.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.
- Returns
Prints the component's description, and returns it as a dictionary if return_dict is True.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns an array of zeros, one per input feature, because feature importance is not defined for KNN classifiers.
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to return False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features of shape [n_samples, n_features]
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
LightGBMClassifier
(boosting_type='gbdt', learning_rate=0.1, n_estimators=100, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=-1, random_seed=0, **kwargs)[source]¶ LightGBM Classifier.
- Parameters
boosting_type (string) – Type of boosting to use. Defaults to “gbdt”.
“gbdt” uses traditional Gradient Boosting Decision Tree.
“dart” uses Dropouts meet Multiple Additive Regression Trees.
“goss” uses Gradient-based One-Side Sampling.
“rf” uses Random Forest.
learning_rate (float) – Boosting learning rate. Defaults to 0.1.
n_estimators (int) – Number of boosted trees to fit. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.
num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.
min_child_samples (int) – Minimum number of samples needed in a child (leaf). Defaults to 20.
bagging_fraction (float) – If smaller than 1.0, LightGBM will randomly select this fraction of the data (rows), without resampling, each time bagging is performed. For example, if set to 0.8, LightGBM will select 80% of the data. Bagging only takes effect when bagging_freq is greater than 0. This can be used to speed up training and deal with overfitting. Defaults to 0.9.
bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled; k means bagging is performed every k iterations. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100% of the data to use for the next k iterations. Defaults to 0.
n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
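The interaction between bagging_fraction and bagging_freq is easier to see as a schedule of which rows each tree is trained on. This is a simplified model of LightGBM's row bagging, not its actual implementation (LightGBM uses its own random number generator and sampling code). bagging_schedule is a hypothetical helper.

```python
import random


def bagging_schedule(n_rows, n_iterations, bagging_fraction=0.9, bagging_freq=0, seed=0):
    """For each boosting iteration, the sorted row indices used to train that tree."""
    rng = random.Random(seed)
    all_rows = list(range(n_rows))
    if bagging_freq <= 0 or bagging_fraction >= 1.0:
        # Bagging disabled: every tree sees every row.
        return [all_rows for _ in range(n_iterations)]
    sample_size = max(1, int(bagging_fraction * n_rows))
    schedule = []
    current = all_rows
    for iteration in range(n_iterations):
        if iteration % bagging_freq == 0:
            # Draw a fresh subset without replacement every bagging_freq iterations...
            current = sorted(rng.sample(all_rows, sample_size))
        # ...and reuse it until the next draw.
        schedule.append(current)
    return schedule
```

With the documented defaults (bagging_fraction=0.9, bagging_freq=0) bagging is off, so every tree is trained on all rows.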
Attributes
hyperparameter_ranges
{"learning_rate": Real(0.000001, 1), "boosting_type": ["gbdt", "dart", "goss", "rf"], "n_estimators": Integer(10, 100), "max_depth": Integer(0, 10), "num_leaves": Integer(2, 100), "min_child_samples": Integer(1, 100), "bagging_fraction": Real(0.000001, 1), "bagging_freq": Integer(0, 1)}
model_family
ModelFamily.LIGHTGBM
modifies_features
True
modifies_target
False
name
LightGBM Classifier
predict_uses_y
False
SEED_MAX
SEED_BOUNDS.max_bound
SEED_MIN
0
supported_problem_types
[ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters
feature_importance – Returns importance associated with each feature.
fit – Fits component to data
load – Loads component at file path
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – Whether to print the name of the component.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.
- Returns
Prints the component's description, and returns it as a dictionary if return_dict is True.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns a boolean indicating whether the component must be fit before calling predict, predict_proba, transform, or feature_importance. This can be overridden to return False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)[source]¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features of shape [n_samples, n_features]
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
LightGBMRegressor
(boosting_type='gbdt', learning_rate=0.1, n_estimators=20, max_depth=0, num_leaves=31, min_child_samples=20, bagging_fraction=0.9, bagging_freq=0, n_jobs=-1, random_seed=0, **kwargs)[source]¶ LightGBM Regressor.
- Parameters
boosting_type (string) – Type of boosting to use. Defaults to “gbdt”.
“gbdt” uses traditional Gradient Boosting Decision Tree.
“dart” uses Dropouts meet Multiple Additive Regression Trees.
“goss” uses Gradient-based One-Side Sampling.
“rf” uses Random Forest.
learning_rate (float) – Boosting learning rate. Defaults to 0.1.
n_estimators (int) – Number of boosted trees to fit. Defaults to 20.
max_depth (int) – Maximum tree depth for base learners, <=0 means no limit. Defaults to 0.
num_leaves (int) – Maximum tree leaves for base learners. Defaults to 31.
min_child_samples (int) – Minimum number of samples needed in a child (leaf). Defaults to 20.
bagging_fraction (float) – If smaller than 1.0, LightGBM will randomly select this fraction of the data (rows), without resampling, each time bagging is performed. For example, if set to 0.8, LightGBM will select 80% of the data. Bagging only takes effect when bagging_freq is greater than 0. This can be used to speed up training and deal with overfitting. Defaults to 0.9.
bagging_freq (int) – Frequency for bagging. 0 means bagging is disabled; k means bagging is performed every k iterations. Every k-th iteration, LightGBM will randomly select bagging_fraction * 100% of the data to use for the next k iterations. Defaults to 0.
n_jobs (int or None) – Number of threads to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
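The interaction between bagging_fraction and bagging_freq is easiest to see in a small simulation. This is a plain-Python illustration written for this page, not LightGBM code: the function `bagging_schedule` is hypothetical, and LightGBM's internal sampling is more involved. It only shows which rows each iteration would train on under the rule described above.

```python
import random


def bagging_schedule(n_rows, n_iterations, bagging_fraction, bagging_freq, seed=0):
    """Hypothetical simulation of which rows each boosting iteration sees."""
    rng = random.Random(seed)
    all_rows = list(range(n_rows))
    if bagging_freq == 0 or bagging_fraction >= 1.0:
        # Bagging disabled: every iteration trains on all rows.
        return [all_rows] * n_iterations
    bag_size = max(1, int(n_rows * bagging_fraction))
    schedule = []
    current = None
    for iteration in range(n_iterations):
        if iteration % bagging_freq == 0:
            # Every k-th iteration, draw a fresh subset without replacement.
            current = sorted(rng.sample(all_rows, bag_size))
        # The same subset is reused until the next bagging round.
        schedule.append(current)
    return schedule


sched = bagging_schedule(n_rows=10, n_iterations=6, bagging_fraction=0.8, bagging_freq=3)
for i, rows in enumerate(sched):
    print(i, len(rows), rows)
```

With the component defaults (bagging_freq=0), the simulation returns the full set of rows for every iteration, which matches the note that bagging is disabled.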
Attributes
hyperparameter_ranges
{ “learning_rate”: Real(0.000001, 1), “boosting_type”: [“gbdt”, “dart”, “goss”, “rf”], “n_estimators”: Integer(10, 100), “max_depth”: Integer(0, 10), “num_leaves”: Integer(2, 100), “min_child_samples”: Integer(1, 100), “bagging_fraction”: Real(0.000001, 1), “bagging_freq”: Integer(0, 1),}
model_family
ModelFamily.LIGHTGBM
modifies_features
True
modifies_target
False
name
LightGBM Regressor
predict_uses_y
False
SEED_MAX
SEED_BOUNDS.max_bound
SEED_MIN
0
supported_problem_types
[ProblemTypes.REGRESSION]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters.
feature_importance – Returns importance associated with each feature.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
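The convention Component.default_parameters == Component().parameters can be illustrated with a toy class. This is a hypothetical sketch, not evalml's implementation: `HypotheticalComponent` is invented for illustration, it uses a classmethod where evalml exposes a class-level property, and it derives the defaults from the `__init__` signature so the two sides agree by construction.

```python
import inspect


class HypotheticalComponent:
    """Toy class illustrating the default_parameters convention."""

    def __init__(self, learning_rate=0.1, n_estimators=20, random_seed=0):
        self._parameters = {"learning_rate": learning_rate, "n_estimators": n_estimators}
        self.random_seed = random_seed

    @property
    def parameters(self):
        # Return a copy so callers cannot mutate the stored parameters.
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Read keyword defaults from __init__, skipping self and random_seed.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name not in ("self", "random_seed")
            and param.default is not inspect.Parameter.empty
        }


assert HypotheticalComponent.default_parameters() == HypotheticalComponent().parameters
print(HypotheticalComponent.default_parameters())  # {'learning_rate': 0.1, 'n_estimators': 20}
```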
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
LinearRegressor
(fit_intercept=True, normalize=False, n_jobs=-1, random_seed=0, **kwargs)[source]¶ Linear Regressor.
- Parameters
fit_intercept (boolean) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered). Defaults to True.
normalize (boolean) – If True, the regressors will be normalized before regression by subtracting the mean and dividing by the l2-norm. This parameter is ignored when fit_intercept is set to False. Defaults to False.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all threads. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
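The normalize option centers each feature and then divides it by its l2-norm. The sketch below restates that arithmetic in plain Python; the function `center_and_l2_normalize` and its list-of-columns layout are hypothetical, and the handling of constant columns (left at zero) is an assumption made here to avoid dividing by zero.

```python
import math


def center_and_l2_normalize(columns):
    """Subtract each column's mean, then divide by the centered column's l2-norm.

    `columns` is a hypothetical layout: one inner list per feature.
    """
    result = []
    for col in columns:
        mean = sum(col) / len(col)
        centered = [v - mean for v in col]
        norm = math.sqrt(sum(v * v for v in centered))
        # Constant columns stay at zero instead of dividing by zero.
        result.append([v / norm if norm > 0 else 0.0 for v in centered])
    return result


normalized = center_and_l2_normalize([[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]])
print(normalized)
```

After this step every non-constant feature has mean 0 and unit l2-norm, which is why the option is tied to fitting an intercept: the intercept absorbs the removed means.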
Attributes
hyperparameter_ranges
{ “fit_intercept”: [True, False], “normalize”: [True, False]}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Linear Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters.
feature_importance – Returns importance associated with each feature.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
LogisticRegressionClassifier
(penalty='l2', C=1.0, multi_class='auto', solver='lbfgs', n_jobs=-1, random_seed=0, **kwargs)[source]¶ Logistic Regression Classifier.
- Parameters
penalty ({"l1", "l2", "elasticnet", "none"}) – The norm used in penalization. Defaults to “l2”.
C (float) – Inverse of regularization strength. Must be a positive float. Defaults to 1.0.
multi_class ({"auto", "ovr", "multinomial"}) – If the option chosen is “ovr”, then a binary problem is fit for each label. For “multinomial” the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. “multinomial” is unavailable when solver=“liblinear”. “auto” selects “ovr” if the data is binary, or if solver=“liblinear”, and otherwise selects “multinomial”. Defaults to “auto”.
solver ({"newton-cg", "lbfgs", "liblinear", "sag", "saga"}) –
Algorithm to use in the optimization problem. For small datasets, “liblinear” is a good choice, whereas “sag” and “saga” are faster for large ones. For multiclass problems, only “newton-cg”, “sag”, “saga” and “lbfgs” handle multinomial loss; “liblinear” is limited to one-versus-rest schemes.
“newton-cg”, “lbfgs”, “sag” and “saga” handle L2 or no penalty
“liblinear” and “saga” also handle L1 penalty
“saga” also supports “elasticnet” penalty
“liblinear” does not support setting penalty=“none”
Defaults to “lbfgs”.
n_jobs (int) – Number of parallel jobs used to fit the model. -1 uses all available processors. Note that creating thread contention will significantly slow down the algorithm. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
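The solver, penalty, and multi_class rules above are easy to misconfigure, so it can help to check them before fitting. The helpers below are a hypothetical sketch that transcribes those rules; `SUPPORTED_PENALTIES`, `check_config`, and `resolve_multi_class` are not part of evalml or scikit-learn, and the underlying estimator remains the authority on what it accepts.

```python
# Compatibility rules transcribed from the parameter descriptions above.
SUPPORTED_PENALTIES = {
    "newton-cg": {"l2", "none"},
    "lbfgs": {"l2", "none"},
    "sag": {"l2", "none"},
    "saga": {"l2", "none", "l1", "elasticnet"},
    "liblinear": {"l2", "l1"},
}


def check_config(penalty="l2", solver="lbfgs"):
    """Raise early instead of waiting for the estimator to fail at fit time."""
    if penalty not in SUPPORTED_PENALTIES[solver]:
        raise ValueError(f"solver={solver!r} does not support penalty={penalty!r}")


def resolve_multi_class(multi_class, solver, n_classes):
    """Return the scheme actually used, following the "auto" rule above."""
    if multi_class == "multinomial" and solver == "liblinear":
        raise ValueError('"multinomial" is unavailable when solver="liblinear"')
    if multi_class != "auto":
        return multi_class
    # "auto" picks "ovr" for binary data or liblinear, else "multinomial".
    if n_classes <= 2 or solver == "liblinear":
        return "ovr"
    return "multinomial"


check_config(penalty="elasticnet", solver="saga")
print(resolve_multi_class("auto", "lbfgs", n_classes=3))  # multinomial
```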
Attributes
hyperparameter_ranges
{ “penalty”: [“l2”], “C”: Real(0.01, 10),}
model_family
ModelFamily.LINEAR_MODEL
modifies_features
True
modifies_target
False
name
Logistic Regression Classifier
predict_uses_y
False
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters.
feature_importance – Returns importance associated with each feature.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
MulticlassClassificationPipeline
(component_graph, parameters=None, custom_name=None, random_seed=0)[source]¶ Pipeline subclass for all multiclass classification pipelines.
- Parameters
component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”].
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None.
custom_name (str) – Custom name for the pipeline. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
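The duplicate-naming rule for list-based component graphs can be restated in a few lines. This is a hypothetical sketch (`deduplicate_names` is not an evalml function) that reproduces the example above: the first occurrence keeps its name and each later repeat gets its list index appended.

```python
def deduplicate_names(component_names):
    """Append the list index to every repeat of a name; the first stays as-is."""
    seen = set()
    result = []
    for index, name in enumerate(component_names):
        result.append(f"{name}_{index}" if name in seen else name)
        seen.add(name)
    return result


names = deduplicate_names(
    ["Imputer", "One Hot Encoder", "Imputer", "Logistic Regression Classifier"]
)
print(names)  # ['Imputer', 'One Hot Encoder', 'Imputer_2', 'Logistic Regression Classifier']
```

These generated names are the keys to use in the parameters dictionary, for example {"Imputer_2": {...}}.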
Attributes
problem_type
ProblemTypes.MULTICLASS
Methods
can_tune_threshold_with_objective – Determine whether the threshold of a binary classification pipeline can be tuned.
classes_ – Gets the class names for the problem.
clone – Constructs a new pipeline with the same components, parameters, and random state.
compute_estimator_features – Transforms the data by applying all pre-processing components.
custom_name – Custom name of the pipeline.
describe – Outputs pipeline details including component parameters.
feature_importance – Importance associated with each feature. Features dropped by the feature selection are excluded.
fit – Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then are mapped to values between 0 and n_classes-1.
get_component – Returns component by name.
get_hyperparameter_ranges – Returns hyperparameter ranges from all components as a dictionary.
graph – Generate an image representing the pipeline graph.
graph_feature_importance – Generate a bar graph of the pipeline’s feature importance.
inverse_transform – Apply component inverse_transform methods to estimator predictions in reverse order.
load – Loads pipeline at file path.
model_family – Returns model family of this pipeline.
name – Name of the pipeline.
new – Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
parameters – Parameter dictionary for this pipeline.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves pipeline at file path.
score – Evaluate model performance on objectives.
summary – A short summary of the pipeline structure, describing the list of components used.
transform – Transform the input.
-
can_tune_threshold_with_objective
(self, objective)¶ Determine whether the threshold of a binary classification pipeline can be tuned.
- Parameters
objective – Primary AutoMLSearch objective.
- Returns
True if the threshold of this pipeline can be tuned with the given objective, otherwise False.
- Return type
bool
-
property
classes_
(self)¶ Gets the class names for the problem.
-
clone
(self)¶ Constructs a new pipeline with the same components, parameters, and random state.
- Returns
A new instance of this pipeline with identical components, parameters, and random state.
-
compute_estimator_features
(self, X, y=None, X_train=None, y_train=None)¶ Transforms the data by applying all pre-processing components.
- Parameters
X (pd.DataFrame) – Input data to the pipeline to transform.
y (pd.Series or None) – Targets corresponding to X. Optional.
X_train (pd.DataFrame or np.ndarray or None) – Training data. Only used for time series.
y_train (pd.Series or None) – Training labels. Only used for time series.
- Returns
New transformed features.
- Return type
pd.DataFrame
-
static
create_objectives
(objectives)¶
-
property
custom_name
(self)¶ Custom name of the pipeline.
-
describe
(self, return_dict=False)¶ Outputs pipeline details including component parameters
- Parameters
return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.
- Returns
Dictionary of all component parameters if return_dict is True, else None
- Return type
dict or None
-
property
feature_importance
(self)¶ Importance associated with each feature. Features dropped by the feature selection are excluded.
- Returns
pd.DataFrame including feature names and their corresponding importance
-
fit
(self, X, y)¶ Build a classification model. For string and categorical targets, classes are sorted by sorted(set(y)) and then are mapped to values between 0 and n_classes-1.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, np.ndarray) – The target training labels of length [n_samples]
- Returns
self
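The class ordering used by fit can be reproduced directly from its description. The helper `encode_targets` below is hypothetical (it is not evalml's encoder) and only shows how sorted(set(y)) determines which integer each class receives.

```python
def encode_targets(y):
    """Map labels to 0..n_classes-1 using the order given by sorted(set(y))."""
    classes = sorted(set(y))
    mapping = {label: code for code, label in enumerate(classes)}
    return classes, [mapping[label] for label in y]


classes, encoded = encode_targets(["cat", "dog", "bird", "dog"])
print(classes)  # ['bird', 'cat', 'dog']
print(encoded)  # [1, 2, 0, 2]
```

Because the order comes from sorting rather than from first appearance, the column order of predict_proba output follows the same sorted class list.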
-
get_component
(self, name)¶ Returns component by name
- Parameters
name (str) – Name of component
- Returns
Component to return
- Return type
Component
-
get_hyperparameter_ranges
(self, custom_hyperparameters)¶ Returns hyperparameter ranges from all components as a dictionary.
- Parameters
custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.
- Returns
Dictionary of hyperparameter ranges for each component in the pipeline.
- Return type
dict
-
graph
(self, filepath=None)¶ Generate an image representing the pipeline graph.
- Parameters
filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
- Returns
Graph object that can be directly displayed in Jupyter notebooks.
- Return type
graphviz.Digraph
-
graph_feature_importance
(self, importance_threshold=0)¶ Generate a bar graph of the pipeline’s feature importance
- Parameters
importance_threshold (float, optional) – If provided, only features whose importance has an absolute value larger than importance_threshold are graphed. Defaults to 0.
- Returns
plotly.Figure, a bar graph showing features and their corresponding importance
-
inverse_transform
(self, y)¶ Apply component inverse_transform methods to estimator predictions in reverse order.
Components that implement inverse_transform are PolynomialDetrender, LabelEncoder (tbd).
- Parameters
y (pd.Series) – Final component features
-
static
load
(file_path)¶ Loads pipeline at file path
- Parameters
file_path (str) – location to load file
- Returns
PipelineBase object
-
property
model_family
(self)¶ Returns model family of this pipeline.
-
property
name
(self)¶ Name of the pipeline.
-
new
(self, parameters, random_seed=0)¶ Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
Not to be confused with Python’s __new__ method.
- Parameters
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.
random_seed (int) – Seed for the random number generator. Defaults to 0.
- Returns
A new instance of this pipeline with identical components.
-
property
parameters
(self)¶ Parameter dictionary for this pipeline.
- Returns
Dictionary of all component parameters.
- Return type
dict
-
predict
(self, X, objective=None, X_train=None, y_train=None)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features]
objective (Object or string) – The objective to use to make predictions
X_train (pd.DataFrame or np.ndarray or None) – Training data. Ignored. Only used for time series.
y_train (pd.Series or None) – Training labels. Ignored. Only used for time series.
- Returns
Estimated labels
- Return type
pd.Series
-
predict_proba
(self, X, X_train=None, y_train=None)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]
X_train (pd.DataFrame or np.ndarray or None) – Training data. Ignored. Only used for time series.
y_train (pd.Series or None) – Training labels. Ignored. Only used for time series.
- Returns
Probability estimates
- Return type
pd.DataFrame
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves pipeline at file path
- Parameters
file_path (str) – location to save file
pickle_protocol (int) – the pickle data stream format.
- Returns
None
-
score
(self, X, y, objectives, X_train=None, y_train=None)¶ Evaluate model performance on objectives
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features]
y (pd.Series, or np.ndarray) – True labels of length [n_samples]
objectives (list) – List of objectives to score
X_train (pd.DataFrame or np.ndarray) – Training data. Ignored. Only used for time series.
y_train (pd.Series) – Training labels. Ignored. Only used for time series.
- Returns
Ordered dictionary of objective scores
- Return type
dict
-
property
summary
(self)¶ A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
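The summary string in the example puts the estimator first, followed by "w/" and the preceding components joined with " + ". The sketch below reproduces that format under two assumptions: the helper `summarize` is hypothetical, and it treats the last component in a linear list as the estimator and returns the estimator name alone when there are no other components.

```python
def summarize(component_names):
    """Build an 'Estimator w/ A + B' string; the estimator is assumed to be last."""
    *transformers, estimator = component_names
    if not transformers:
        return estimator
    return f"{estimator} w/ {' + '.join(transformers)}"


print(summarize(["Simple Imputer", "One Hot Encoder", "Logistic Regression Classifier"]))
# Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
```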
-
transform
(self, X, y=None)¶ Transform the input.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples]. Defaults to None.
- Returns
Transformed output.
- Return type
pd.DataFrame
-
class
evalml.pipelines.
OneHotEncoder
(top_n=10, features_to_encode=None, categories=None, drop='if_binary', handle_unknown='ignore', handle_missing='error', random_seed=0, **kwargs)[source]¶ A transformer that encodes categorical features in a one-hot numeric array.
- Parameters
top_n (int) – Number of categories per column to encode. If None, all categories will be encoded. Otherwise, the n most frequent will be encoded and all others will be dropped. Defaults to 10.
features_to_encode (list[str]) – List of columns to encode. All other columns will remain untouched. If None, all appropriate columns will be encoded. Defaults to None.
categories (list) – A two dimensional list of categories, where categories[i] is a list of the categories for the column at index i. This can also be None, or “auto” if top_n is not None. Defaults to None.
drop (string, list) – Method (“first” or “if_binary”) to use to drop one category per feature. Can also be a list specifying which categories to drop for each feature. Defaults to ‘if_binary’.
handle_unknown (string) – Whether to ignore or error for unknown categories for a feature encountered during fit or transform. If either top_n or categories is used to limit the number of categories per column, this must be “ignore”. Defaults to “ignore”.
handle_missing (string) – Options for how to handle missing (NaN) values encountered during fit or transform. If this is set to “as_category” and NaN values are within the n most frequent, “nan” values will be encoded as their own column. If this is set to “error”, any missing values encountered will raise an error. Defaults to “error”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
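The effect of top_n can be sketched for a single column. This plain-Python illustration is not evalml's encoder: `top_n_categories` and `one_hot_row` are hypothetical, missing values are simply skipped, ties are broken by first appearance (an assumption), and the drop option is not modeled. Categories outside the top n produce an all-zero row here, which mirrors the handle_unknown=“ignore” behavior described above.

```python
from collections import Counter


def top_n_categories(values, top_n=10):
    """Return the categories that would get their own column, most frequent first."""
    counts = Counter(v for v in values if v is not None)
    if top_n is None:
        # Encode every category, in order of first appearance.
        return list(counts)
    # most_common keeps first-seen order among equal counts.
    return [category for category, _ in counts.most_common(top_n)]


def one_hot_row(value, categories):
    """Encode one value; categories that were not kept yield all zeros."""
    return [1 if value == category else 0 for category in categories]


column = ["a", "b", "a", "c", "a", "b", None, "d"]
kept = top_n_categories(column, top_n=2)
print(kept)  # ['a', 'b']
print([one_hot_row(v, kept) for v in ["a", "c"]])  # [[1, 0], [0, 0]]
```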
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
One Hot Encoder
Methods
categories – Returns a list of the unique categories to be encoded for the particular feature, in order.
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters.
fit – Fits component to data.
fit_transform – Fits on X and transforms X.
get_feature_names – Return feature names for the categorical features after fitting.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
save – Saves component at file path.
transform – One-hot encode the input data.
-
categories
(self, feature_name)[source]¶ Returns a list of the unique categories to be encoded for the particular feature, in order.
- Parameters
feature_name (str) – the name of any feature provided to one-hot encoder during fit
- Returns
the unique categories, in the same dtype as they were provided during fit
- Return type
np.ndarray
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
get_feature_names
(self)[source]¶ Return feature names for the categorical features after fitting.
Feature names are formatted as {column name}_{category name}. In the event of a duplicate name, an integer will be added at the end of the feature name to distinguish it.
For example, consider a dataframe with a column called “A” and category “x_y” and another column called “A_x” with “y”. In this example, the feature names would be “A_x_y” and “A_x_y_1”.
- Returns
The feature names after encoding, provided in the same order as input_features.
- Return type
np.ndarray
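The naming and de-duplication rule can be checked against the “A”/“A_x” example above. The function `one_hot_feature_names` and its list-of-(column, categories) input are hypothetical, written only to illustrate the described scheme: names are {column name}_{category name}, and a collision gets an integer suffix.

```python
def one_hot_feature_names(column_categories):
    """Build '{column}_{category}' names, adding _1, _2, ... on collisions."""
    names = []
    used = set()
    for column, categories in column_categories:
        for category in categories:
            base = f"{column}_{category}"
            name, suffix = base, 1
            # Keep incrementing the suffix until the name is unique.
            while name in used:
                name = f"{base}_{suffix}"
                suffix += 1
            used.add(name)
            names.append(name)
    return names


print(one_hot_feature_names([("A", ["x_y"]), ("A_x", ["y"])]))  # ['A_x_y', 'A_x_y_1']
```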
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
PerColumnImputer
(impute_strategies=None, default_impute_strategy='most_frequent', random_seed=0, **kwargs)[source]¶ Imputes missing data according to a specified imputation strategy per column.
- Parameters
impute_strategies (dict) – Column and {“impute_strategy”: strategy, “fill_value”: value} pairings. Valid values for impute strategy include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types. Defaults to None, which uses “most_frequent” for all columns. When impute_strategy == “constant”, fill_value is used to replace missing data; if fill_value is None, 0 is used for numerical data and “missing_value” for strings or object data types.
default_impute_strategy (str) – Impute strategy to fall back on when none is provided for a certain column. Valid values include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types. Defaults to “most_frequent”.
random_seed (int) – Seed for the random number generator. Defaults to 0.
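How per-column strategies and the default fall-back combine can be shown on a dict of plain lists. This is a hypothetical sketch, not evalml's implementation: `impute_column` and `per_column_impute` are invented names, None stands in for missing values, and "numerical" is approximated as "every present value is an int or float".

```python
from collections import Counter
from statistics import mean, median


def impute_column(values, strategy="most_frequent", fill_value=None):
    """Replace None entries in one column according to a single strategy."""
    present = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    elif strategy == "most_frequent":
        fill = Counter(present).most_common(1)[0][0]
    elif strategy == "constant":
        if fill_value is not None:
            fill = fill_value
        elif all(isinstance(v, (int, float)) for v in present):
            fill = 0  # numerical column
        else:
            fill = "missing_value"  # string/object column
    else:
        raise ValueError(f"unknown impute strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


def per_column_impute(data, impute_strategies=None, default_impute_strategy="most_frequent"):
    """Impute each column, falling back to the default strategy when none is given."""
    impute_strategies = impute_strategies or {}
    imputed = {}
    for column, values in data.items():
        config = impute_strategies.get(column, {"impute_strategy": default_impute_strategy})
        imputed[column] = impute_column(values, config["impute_strategy"], config.get("fill_value"))
    return imputed


data = {
    "age": [30, None, 40, 50],
    "city": ["NY", None, "SF", "NY"],
    "score": [1.0, None, 3.0, 3.0],
}
strategies = {"age": {"impute_strategy": "mean"}, "city": {"impute_strategy": "constant"}}
result = per_column_impute(data, strategies)
print(result)
```

Here "score" has no entry in strategies, so it falls back to default_impute_strategy (“most_frequent”).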
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Per Column Imputer
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters.
fit – Fits imputers on input data.
fit_transform – Fits on X and transforms X.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
parameters – Returns the parameters which were used to initialize the component.
save – Saves component at file path.
transform – Transforms input data by imputing missing values.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
Prints the description; returns it as a dictionary if return_dict is True, otherwise None
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits imputers on input data
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features] to fit.
y (pd.Series, optional) – The target training data of length [n_samples]. Ignored.
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None)[source]¶ Transforms input data by imputing missing values.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features] to transform.
y (pd.Series, optional) – The target training data of length [n_samples]. Ignored.
- Returns
Transformed X
- Return type
pd.DataFrame
-
class
evalml.pipelines.
PipelineBase
(component_graph, parameters=None, custom_name=None, random_seed=0)[source]¶ Machine learning pipeline made out of transformers and an Estimator.
- Parameters
component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”].
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None.
custom_name (str) – Custom name for the pipeline. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
problem_type
None
Methods
can_tune_threshold_with_objective – Determine whether the threshold of a binary classification pipeline can be tuned.
clone – Constructs a new pipeline with the same components, parameters, and random state.
compute_estimator_features – Transforms the data by applying all pre-processing components.
custom_name – Custom name of the pipeline.
describe – Outputs pipeline details including component parameters.
feature_importance – Importance associated with each feature. Features dropped by the feature selection are excluded.
fit – Build a model.
get_component – Returns component by name.
get_hyperparameter_ranges – Returns hyperparameter ranges from all components as a dictionary.
graph – Generate an image representing the pipeline graph.
graph_feature_importance – Generate a bar graph of the pipeline’s feature importance.
inverse_transform – Apply component inverse_transform methods to estimator predictions in reverse order.
load – Loads pipeline at file path.
model_family – Returns model family of this pipeline.
name – Name of the pipeline.
new – Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
parameters – Parameter dictionary for this pipeline.
predict – Make predictions using selected features.
save – Saves pipeline at file path.
score – Evaluate model performance on current and additional objectives.
summary – A short summary of the pipeline structure, describing the list of components used.
transform – Transform the input.
-
can_tune_threshold_with_objective
(self, objective) Determine whether the threshold of a binary classification pipeline can be tuned.
- Parameters
objective – Primary AutoMLSearch objective.
-
clone
(self) Constructs a new pipeline with the same components, parameters, and random state.
- Returns
A new instance of this pipeline with identical components, parameters, and random state.
-
compute_estimator_features
(self, X, y=None, X_train=None, y_train=None) Transforms the data by applying all pre-processing components.
- Parameters
X (pd.DataFrame) – Input data to the pipeline to transform.
y (pd.Series or None) – Targets corresponding to X. Optional.
X_train (pd.DataFrame or np.ndarray or None) – Training data. Only used for time series.
y_train (pd.Series or None) – Training labels. Only used for time series.
- Returns
New transformed features.
- Return type
pd.DataFrame
-
property
custom_name
(self) Custom name of the pipeline.
-
describe
(self, return_dict=False) Outputs pipeline details including component parameters.
- Parameters
return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.
- Returns
Dictionary of all component parameters if return_dict is True, else None.
- Return type
dict or None
-
property
feature_importance
(self) Importance associated with each feature. Features dropped by the feature selection are excluded.
- Returns
pd.DataFrame including feature names and their corresponding importance
-
abstract
fit
(self, X, y) Build a model.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features].
y (pd.Series or np.ndarray) – The target training data of length [n_samples].
- Returns
self
-
get_component
(self, name) Returns component by name.
- Parameters
name (str) – Name of component
- Returns
Component to return
- Return type
Component
-
get_hyperparameter_ranges
(self, custom_hyperparameters) Returns hyperparameter ranges from all components as a dictionary.
- Parameters
custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.
- Returns
Dictionary of hyperparameter ranges for each component in the pipeline.
- Return type
dict
-
graph
(self, filepath=None) Generate an image representing the pipeline graph.
- Parameters
filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
- Returns
Graph object that can be directly displayed in Jupyter notebooks.
- Return type
graphviz.Digraph
-
graph_feature_importance
(self, importance_threshold=0) Generate a bar graph of the pipeline’s feature importance.
- Parameters
importance_threshold (float, optional) – If provided, graph only features whose feature importance has an absolute value larger than importance_threshold. Defaults to 0.
- Returns
plotly.Figure, a bar graph showing features and their corresponding importance
-
inverse_transform
(self, y) Apply component inverse_transform methods to estimator predictions in reverse order.
Components that implement inverse_transform include PolynomialDetrender; support in LabelEncoder is still to be determined.
- Parameters
y (pd.Series) – Estimator predictions to inverse-transform.
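The reverse-order rule matters whenever more than one component transforms the target. The sketch below uses two hypothetical toy transformers, `Offset` and `Scale` (not evalml components), operating on plain lists. Undoing them last-to-first recovers the original target. Undoing them in pipeline order would not.

```python
class Offset:
    """Toy target transformer: subtracts a constant."""

    def __init__(self, offset):
        self.offset = offset

    def transform(self, y):
        return [v - self.offset for v in y]

    def inverse_transform(self, y):
        return [v + self.offset for v in y]


class Scale:
    """Toy target transformer: divides by a constant factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, y):
        return [v / self.factor for v in y]

    def inverse_transform(self, y):
        return [v * self.factor for v in y]


def transform_target(components, y):
    """Apply each component's transform in pipeline order."""
    for component in components:
        y = component.transform(y)
    return y


def inverse_transform_predictions(components, predictions):
    """Undo target transforms, walking the components last-to-first."""
    for component in reversed(components):
        if hasattr(component, "inverse_transform"):
            predictions = component.inverse_transform(predictions)
    return predictions


components = [Offset(10), Scale(2)]
y_model_scale = transform_target(components, [30, 50])
print(y_model_scale)  # [10.0, 20.0]
print(inverse_transform_predictions(components, y_model_scale))  # [30.0, 50.0]
```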
-
static
load
(file_path) Loads pipeline at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
PipelineBase object
-
property
model_family
(self) Returns model family of this pipeline.
-
property
name
(self) Name of the pipeline.
-
new
(self, parameters, random_seed=0) Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
Not to be confused with Python’s __new__ method.
- Parameters
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.
random_seed (int) – Seed for the random number generator. Defaults to 0.
- Returns
A new instance of this pipeline with identical components.
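The difference between clone and new can be seen with a minimal stand-in class. `ToyPipeline` is hypothetical and holds only a component graph, parameters, and a seed. The sketch assumes nothing about evalml internals beyond the two documented contracts: clone keeps parameters and random state, while new keeps only the graph.

```python
import copy


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: graph, parameters, seed only."""

    def __init__(self, component_graph, parameters=None, random_seed=0):
        self.component_graph = list(component_graph)
        # Deep-copy so edits to one pipeline never leak into another.
        self.parameters = copy.deepcopy(parameters or {})
        self.random_seed = random_seed

    def clone(self):
        # Same graph, same parameters, same random state.
        return ToyPipeline(self.component_graph, self.parameters, self.random_seed)

    def new(self, parameters, random_seed=0):
        # Same graph; parameters and seed come from the caller.
        return ToyPipeline(self.component_graph, parameters, random_seed)


original = ToyPipeline(
    ["Imputer", "Random Forest Regressor"],
    {"Random Forest Regressor": {"n_estimators": 100}},
    random_seed=7,
)
cloned = original.clone()
fresh = original.new({"Random Forest Regressor": {"n_estimators": 50}})
print(cloned.random_seed, cloned.parameters)
# 7 {'Random Forest Regressor': {'n_estimators': 100}}
print(fresh.random_seed, fresh.parameters)
# 0 {'Random Forest Regressor': {'n_estimators': 50}}
```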
-
property
parameters
(self) Parameter dictionary for this pipeline.
- Returns
Dictionary of all component parameters.
- Return type
dict
-
predict
(self, X, objective=None, X_train=None, y_train=None) Make predictions using selected features.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
objective (ObjectiveBase or str, optional) – The objective to use to make predictions.
X_train (pd.DataFrame or np.ndarray or None) – Training data. Only used for time series; ignored otherwise.
y_train (pd.Series or None) – Training labels. Only used for time series; ignored otherwise.
- Returns
Predicted values.
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves pipeline at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- Returns
None
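The pickle_protocol argument selects the pickle data stream format. The round trip below sketches the same save/load pattern. It swaps in the standard-library `pickle` module so it runs without extra dependencies; cloudpickle's `dump` accepts a compatible `protocol` argument. The `save`/`load` helpers and the dictionary being saved are hypothetical.

```python
import os
import pickle
import tempfile


def save(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize obj to file_path with the requested pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load(file_path):
    """Deserialize an object saved by save(). Only load trusted files:
    unpickling can execute arbitrary code."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


pipeline_state = {"name": "Random Forest Regressor w/ Imputer", "random_seed": 0}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pipeline.pkl")
    save(pipeline_state, path, pickle_protocol=4)
    restored = load(path)
print(restored == pipeline_state)  # True
```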
-
abstract
score
(self, X, y, objectives, X_train=None, y_train=None) Evaluate model performance on current and additional objectives.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series or np.ndarray) – True labels of length [n_samples].
objectives (list) – Non-empty list of objectives to score on.
X_train (pd.DataFrame or np.ndarray or None) – Training data. Only used for time series; ignored otherwise.
y_train (pd.Series or None) – Training labels. Only used for time series; ignored otherwise.
- Returns
Ordered dictionary of objective scores.
- Return type
dict
-
property
summary
(self) A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
-
class
evalml.pipelines.
ProphetRegressor
(date_index=None, changepoint_prior_scale=0.05, seasonality_prior_scale=10, holidays_prior_scale=10, seasonality_mode='additive', random_seed=0, stan_backend='CMDSTANPY', **kwargs) Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
More information here: https://facebook.github.io/prophet/
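The additive model described above can be written out explicitly. Here g(t) is the trend, s(t) the yearly, weekly, and daily seasonality, h(t) the holiday effects, and ε_t the noise term, following the Prophet documentation linked above. The second line sketches what seasonality_mode="multiplicative" changes: in that mode the seasonal and, by default, holiday terms scale with the trend instead of being added to it.

```latex
\begin{aligned}
\text{additive:}\quad       & y(t) = g(t) + s(t) + h(t) + \varepsilon_t \\
\text{multiplicative:}\quad & y(t) = g(t)\,\bigl(1 + s(t) + h(t)\bigr) + \varepsilon_t
\end{aligned}
```

In Prophet, changepoint_prior_scale controls how flexible the trend g(t) may be. seasonality_prior_scale and holidays_prior_scale control how strongly s(t) and h(t) can fit the data.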
Attributes
hyperparameter_ranges
{ “changepoint_prior_scale”: Real(0.001, 0.5), “seasonality_prior_scale”: Real(0.01, 10), “holidays_prior_scale”: Real(0.01, 10), “seasonality_mode”: [“additive”, “multiplicative”],}
model_family
ModelFamily.PROPHET
modifies_features
True
modifies_target
False
name
Prophet Regressor
predict_uses_y
False
supported_problem_types
[ProblemTypes.TIME_SERIES_REGRESSION]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters.
feature_importance – Returns an array of zeros of length 1, since feature importance is not defined for the Prophet regressor.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone
(self) Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls) Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False) Describe a component and its parameters.
- Parameters
print_name (bool, optional) – Whether to print the name of the component. Defaults to False.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}. Defaults to False.
- Returns
The description as a dictionary if return_dict is True, else None. The description is printed in either case.
- Return type
None or dict
-
property
feature_importance
(self) Returns an array of zeros of length 1, since feature importance is not defined for the Prophet regressor.
-
fit
(self, X, y=None) Fits component to data.
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path) Loads component at file path.
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
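One way a needs_fitting flag can be used is as a guard in predict. The classes below are hypothetical toys, and `NotFittedError` is an invented name for this sketch, not evalml's exception. The second class shows the documented override: a component whose fit would do nothing returns False and skips the check.

```python
class NotFittedError(Exception):
    """Hypothetical error raised when predicting before fitting."""


class ToyMeanRegressor:
    """Toy estimator that must be fit before it can predict."""

    def __init__(self):
        self._is_fitted = False
        self._mean = None

    def needs_fitting(self):
        return True

    def fit(self, X, y):
        self._mean = sum(y) / len(y)
        self._is_fitted = True
        return self

    def predict(self, X):
        # Guard: refuse to predict until fit has been called.
        if self.needs_fitting() and not self._is_fitted:
            raise NotFittedError("call fit before predict")
        return [self._mean] * len(X)


class ToyConstantPredictor(ToyMeanRegressor):
    """Toy component whose fit would do nothing, so it opts out of the check."""

    def __init__(self, value):
        super().__init__()
        self._mean = value

    def needs_fitting(self):
        return False


print(ToyMeanRegressor().fit([[0], [1]], [1.0, 3.0]).predict([[5], [6]]))  # [2.0, 2.0]
print(ToyConstantPredictor(4.0).predict([[0]]))  # [4.0]
```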
-
property
parameters
(self) Returns the parameters which were used to initialize the component.
-
predict
(self, X, y=None) Make predictions using selected features.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series, optional) – Target data. Ignored.
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X) Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features.
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path.
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
RandomForestClassifier
(n_estimators=100, max_depth=6, n_jobs=-1, random_seed=0, **kwargs) Random Forest Classifier.
- Parameters
n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processors. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
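The n_jobs=-1 default follows the joblib convention used by scikit-learn estimators, and is presumably passed through to the underlying random forest: negative values count back from the number of available CPUs. `resolve_n_jobs` is a hypothetical function that sketches only that arithmetic. It is not an evalml or joblib API, and real joblib behaviour can also depend on the active parallel backend.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Translate an n_jobs setting into a worker count, joblib-style."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1  # cpu_count() may return None
    if n_jobs is None:
        return 1  # sequential by default
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, ...; never fewer than one worker.
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


print(resolve_n_jobs(-1, n_cpus=8))  # 8
print(resolve_n_jobs(-2, n_cpus=8))  # 7
```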
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 10),}
model_family
ModelFamily.RANDOM_FOREST
modifies_features
True
modifies_target
False
name
Random Forest Classifier
predict_uses_y
False
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
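The hyperparameter_ranges attribute describes a search space rather than concrete values. The sketch below draws one candidate configuration from such a space using stdlib `random`. `Integer` and `Real` here are hypothetical stand-ins modelled on scikit-optimize's inclusive ranges, and a list is treated as a categorical choice. This is plain random sampling, not the tuner evalml's AutoML search actually uses.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Integer:
    """Stand-in for an inclusive integer range (cf. skopt.space.Integer)."""
    low: int
    high: int

    def sample(self, rng):
        return rng.randint(self.low, self.high)  # inclusive on both ends


@dataclass(frozen=True)
class Real:
    """Stand-in for a continuous range (cf. skopt.space.Real)."""
    low: float
    high: float

    def sample(self, rng):
        return rng.uniform(self.low, self.high)


def sample_parameters(hyperparameter_ranges, seed=0):
    """Draw one value per hyperparameter; lists are categorical choices."""
    rng = random.Random(seed)
    sample = {}
    for name, space in hyperparameter_ranges.items():
        if isinstance(space, list):
            sample[name] = rng.choice(space)
        else:
            sample[name] = space.sample(rng)
    return sample


rf_classifier_ranges = {"n_estimators": Integer(10, 1000), "max_depth": Integer(1, 10)}
print(sample_parameters(rf_classifier_ranges, seed=0))
```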
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters.
feature_importance – Returns importance associated with each feature.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone
(self) Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls) Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
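The convention Component.default_parameters == Component().parameters holds automatically if the defaults are read straight from the constructor signature. The sketch shows one way to do that with `inspect.signature`. `ToyComponent` is hypothetical. It writes default_parameters as a classmethod and keeps random_seed outside the parameters dictionary as a design choice of this toy only.

```python
import inspect


class ToyComponent:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, n_estimators=100, max_depth=6, n_jobs=-1, random_seed=0):
        self._parameters = {
            "n_estimators": n_estimators,
            "max_depth": max_depth,
            "n_jobs": n_jobs,
        }
        self.random_seed = random_seed

    @property
    def parameters(self):
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Read defaults from __init__ so they cannot drift from the constructor.
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name not in ("self", "random_seed")
            and param.default is not inspect.Parameter.empty
        }


print(ToyComponent.default_parameters() == ToyComponent().parameters)  # True
```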
-
describe
(self, print_name=False, return_dict=False) Describe a component and its parameters.
- Parameters
print_name (bool, optional) – Whether to print the name of the component. Defaults to False.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}. Defaults to False.
- Returns
The description as a dictionary if return_dict is True, else None. The description is printed in either case.
- Return type
None or dict
-
property
feature_importance
(self) Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None) Fits component to data.
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path) Loads component at file path.
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self) Returns the parameters which were used to initialize the component.
-
predict
(self, X) Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X) Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features.
- Returns
Probability estimates, one column per class.
- Return type
pd.DataFrame
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path.
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
RandomForestRegressor
(n_estimators=100, max_depth=6, n_jobs=-1, random_seed=0, **kwargs) Random Forest Regressor.
- Parameters
n_estimators (int) – The number of trees in the forest. Defaults to 100.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processors. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “n_estimators”: Integer(10, 1000), “max_depth”: Integer(1, 32),}
model_family
ModelFamily.RANDOM_FOREST
modifies_features
True
modifies_target
False
name
Random Forest Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters.
feature_importance – Returns importance associated with each feature.
fit – Fits component to data.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
parameters – Returns the parameters which were used to initialize the component.
predict – Make predictions using selected features.
predict_proba – Make probability estimates for labels.
save – Saves component at file path.
-
clone
(self) Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls) Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False) Describe a component and its parameters.
- Parameters
print_name (bool, optional) – Whether to print the name of the component. Defaults to False.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}. Defaults to False.
- Returns
The description as a dictionary if return_dict is True, else None. The description is printed in either case.
- Return type
None or dict
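The print_name/return_dict contract can be illustrated with a toy class. `ToyRegressor` is hypothetical, and the printed layout is an illustrative assumption, not evalml's exact output. The part that mirrors the documentation is the return value: a {"name": ..., "parameters": ...} dictionary when return_dict is True, otherwise None.

```python
class ToyRegressor:
    """Hypothetical component illustrating the describe() contract."""

    name = "Random Forest Regressor"

    def __init__(self, parameters):
        self.parameters = parameters

    def describe(self, print_name=False, return_dict=False):
        if print_name:
            print(self.name)
            print("=" * len(self.name))
        # The parameters are printed in either case.
        for key, value in self.parameters.items():
            print(f"  * {key} : {value}")
        if return_dict:
            return {"name": self.name, "parameters": self.parameters}
        return None


component = ToyRegressor({"n_estimators": 100, "max_depth": 6})
print(component.describe(print_name=True, return_dict=True))
```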
-
property
feature_importance
(self) Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None) Fits component to data.
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path) Loads component at file path.
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self) Returns the parameters which were used to initialize the component.
-
predict
(self, X) Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X) Make probability estimates for labels.
- Parameters
X (pd.DataFrame or np.ndarray) – Features.
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path.
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
RegressionPipeline
(component_graph, parameters=None, custom_name=None, random_seed=0) Pipeline subclass for all regression pipelines.
- Parameters
component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”].
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None.
custom_name (str) – Custom name for the pipeline. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
problem_type
ProblemTypes.REGRESSION
Methods
can_tune_threshold_with_objective – Determine whether the threshold of a binary classification pipeline can be tuned.
clone – Constructs a new pipeline with the same components, parameters, and random state.
compute_estimator_features – Transforms the data by applying all pre-processing components.
custom_name – Custom name of the pipeline.
describe – Outputs pipeline details including component parameters.
feature_importance – Importance associated with each feature. Features dropped by the feature selection are excluded.
fit – Build a regression model.
get_component – Returns component by name.
get_hyperparameter_ranges – Returns hyperparameter ranges from all components as a dictionary.
graph – Generate an image representing the pipeline graph.
graph_feature_importance – Generate a bar graph of the pipeline’s feature importance.
inverse_transform – Apply component inverse_transform methods to estimator predictions in reverse order.
load – Loads pipeline at file path.
model_family – Returns model family of this pipeline.
name – Name of the pipeline.
new – Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
parameters – Parameter dictionary for this pipeline.
predict – Make predictions using selected features.
save – Saves pipeline at file path.
score – Evaluate model performance on current and additional objectives.
summary – A short summary of the pipeline structure, describing the list of components used.
transform – Transform the input.
-
can_tune_threshold_with_objective
(self, objective) Determine whether the threshold of a binary classification pipeline can be tuned.
- Parameters
objective – Primary AutoMLSearch objective.
-
clone
(self) Constructs a new pipeline with the same components, parameters, and random state.
- Returns
A new instance of this pipeline with identical components, parameters, and random state.
-
compute_estimator_features
(self, X, y=None, X_train=None, y_train=None) Transforms the data by applying all pre-processing components.
- Parameters
X (pd.DataFrame) – Input data to the pipeline to transform.
y (pd.Series or None) – Targets corresponding to X. Optional.
X_train (pd.DataFrame or np.ndarray or None) – Training data. Only used for time series.
y_train (pd.Series or None) – Training labels. Only used for time series.
- Returns
New transformed features.
- Return type
pd.DataFrame
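compute_estimator_features returns the data exactly as the final estimator would see it: every component except the last has been applied. The sketch below models a pipeline as a plain list of already-fitted toy components on lists of rows. All three component classes are hypothetical stand-ins, not evalml components.

```python
class FillMissing:
    """Toy imputer: replaces None with a fixed value."""

    def __init__(self, fill_value):
        self.fill_value = fill_value

    def transform(self, X):
        return [[self.fill_value if v is None else v for v in row] for row in X]


class DropFirstColumn:
    """Toy feature selector: removes column 0."""

    def transform(self, X):
        return [row[1:] for row in X]


class MeanEstimator:
    """Toy estimator standing in for the final pipeline component."""

    def predict(self, X):
        return [sum(row) / len(row) for row in X]


def compute_estimator_features(components, X):
    """Run X through every component except the final estimator."""
    *transformers, _estimator = components
    for transformer in transformers:
        X = transformer.transform(X)
    return X


pipeline = [FillMissing(0.0), DropFirstColumn(), MeanEstimator()]
X = [[1.0, None, 3.0], [4.0, 5.0, None]]
features = compute_estimator_features(pipeline, X)
print(features)  # [[0.0, 3.0], [5.0, 0.0]]
print(pipeline[-1].predict(features))  # [1.5, 2.5]
```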
-
static
create_objectives
(objectives)
-
property
custom_name
(self) Custom name of the pipeline.
-
describe
(self, return_dict=False) Outputs pipeline details including component parameters.
- Parameters
return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.
- Returns
Dictionary of all component parameters if return_dict is True, else None.
- Return type
dict or None
-
property
feature_importance
(self) Importance associated with each feature. Features dropped by the feature selection are excluded.
- Returns
pd.DataFrame including feature names and their corresponding importance
-
fit
(self, X, y) Build a regression model.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (pd.Series, np.ndarray) – The target training data of length [n_samples]
- Returns
self
-
get_component
(self, name) Returns component by name.
- Parameters
name (str) – Name of component
- Returns
Component to return
- Return type
Component
-
get_hyperparameter_ranges
(self, custom_hyperparameters) Returns hyperparameter ranges from all components as a dictionary.
- Parameters
custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.
- Returns
Dictionary of hyperparameter ranges for each component in the pipeline.
- Return type
dict
-
graph
(self, filepath=None) Generate an image representing the pipeline graph.
- Parameters
filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
- Returns
Graph object that can be directly displayed in Jupyter notebooks.
- Return type
graphviz.Digraph
-
graph_feature_importance
(self, importance_threshold=0) Generate a bar graph of the pipeline’s feature importance.
- Parameters
importance_threshold (float, optional) – If provided, graph only features whose feature importance has an absolute value larger than importance_threshold. Defaults to 0.
- Returns
plotly.Figure, a bar graph showing features and their corresponding importance
-
inverse_transform
(self, y) Apply component inverse_transform methods to estimator predictions in reverse order.
Components that implement inverse_transform include PolynomialDetrender; support in LabelEncoder is still to be determined.
- Parameters
y (pd.Series) – Estimator predictions to inverse-transform.
-
static
load
(file_path) Loads pipeline at file path.
- Parameters
file_path (str) – Location to load file.
- Returns
PipelineBase object
-
property
model_family
(self) Returns model family of this pipeline.
-
property
name
(self) Name of the pipeline.
-
new
(self, parameters, random_seed=0) Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
Not to be confused with Python’s __new__ method.
- Parameters
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.
random_seed (int) – Seed for the random number generator. Defaults to 0.
- Returns
A new instance of this pipeline with identical components.
-
property
parameters
(self) Parameter dictionary for this pipeline.
- Returns
Dictionary of all component parameters.
- Return type
dict
-
predict
(self, X, objective=None, X_train=None, y_train=None) Make predictions using selected features.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
objective (ObjectiveBase or str, optional) – The objective to use to make predictions.
X_train (pd.DataFrame or np.ndarray or None) – Training data. Only used for time series; ignored otherwise.
y_train (pd.Series or None) – Training labels. Only used for time series; ignored otherwise.
- Returns
Predicted values.
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves pipeline at file path.
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
score
(self, X, y, objectives, X_train=None, y_train=None) Evaluate model performance on current and additional objectives.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series or np.ndarray) – True values of length [n_samples].
objectives (list) – Non-empty list of objectives to score on.
X_train (pd.DataFrame or np.ndarray or None) – Training data. Only used for time series; ignored otherwise.
y_train (pd.Series or None) – Training labels. Only used for time series; ignored otherwise.
- Returns
Ordered dictionary of objective scores
- Return type
dict
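The "ordered dictionary of objective scores" can be sketched for two common regression objectives. Objectives are modelled here as hypothetical (name, function) pairs rather than evalml objective objects, and MSE and R² are computed by hand. The result keeps the order in which objectives were requested, and an empty list is rejected as the documentation requires.

```python
from collections import OrderedDict


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def score(y_true, y_pred, objectives):
    """Score predictions on each (name, fn) objective, preserving order."""
    if not objectives:
        raise ValueError("objectives must be a non-empty list")
    return OrderedDict((name, fn(y_true, y_pred)) for name, fn in objectives)


scores = score([3.0, 5.0, 7.0], [2.5, 5.0, 7.5], [("MSE", mse), ("R2", r2)])
print(dict(scores))
```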
-
property
summary
(self) A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
-
transform
(self, X, y=None) Transform the input.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series, optional) – The target data of length [n_samples]. Defaults to None.
- Returns
Transformed output.
- Return type
pd.DataFrame
-
class
evalml.pipelines.
RFClassifierSelectFromModel
(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold=-np.inf, n_jobs=-1, random_seed=0, **kwargs) Selects top features based on importance weights using a Random Forest classifier.
- Parameters
number_features (int) – The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None.
n_estimators (int) – The number of trees in the forest. Defaults to 10.
max_depth (int or None) – Maximum tree depth for base learners. Defaults to None.
percent_features (float) – Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5.
threshold (string or float) – The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept while the others are discarded. If “median”, then the threshold value is the median of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. Defaults to -np.inf.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processors. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
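The threshold rules above (a float, "median", "mean", or a scaled reference such as "1.25*mean") can be sketched in plain Python. The parsing mirrors scikit-learn's SelectFromModel threshold strings. `resolve_threshold` and `select_features` are hypothetical helpers, and the importances are passed in directly instead of being learned by a random forest.

```python
import statistics


def resolve_threshold(threshold, importances):
    """Turn a numeric or string threshold into a numeric cutoff."""
    if isinstance(threshold, str):
        scale, reference = 1.0, threshold.strip()
        if "*" in reference:
            # e.g. "1.25*mean" -> scale 1.25, reference "mean"
            factor, reference = reference.split("*", 1)
            scale, reference = float(factor), reference.strip()
        if reference == "median":
            base = statistics.median(importances)
        elif reference == "mean":
            base = statistics.mean(importances)
        else:
            raise ValueError(f"Unknown threshold reference: {reference!r}")
        return scale * base
    return float(threshold)


def select_features(names, importances, threshold=float("-inf")):
    """Keep features whose importance is >= the resolved cutoff."""
    cutoff = resolve_threshold(threshold, importances)
    return [name for name, imp in zip(names, importances) if imp >= cutoff]


names = ["a", "b", "c", "d"]
importances = [0.1, 0.2, 0.3, 0.4]
print(select_features(names, importances, "1.25*mean"))  # ['d']
```

With the default of -inf every feature passes the threshold, so any reduction in features comes from the number_features/percent_features limits instead.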
Attributes
hyperparameter_ranges
{ “percent_features”: Real(0.01, 1), “threshold”: [“mean”, -np.inf],}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
RF Classifier Select From Model
Methods
clone – Constructs a new component with the same parameters and random state.
default_parameters – Returns the default parameters for this component.
describe – Describe a component and its parameters.
fit – Fits component to data.
fit_transform – Fits on X and transforms X.
get_names – Get names of selected features.
load – Loads component at file path.
needs_fitting – Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
parameters – Returns the parameters which were used to initialize the component.
save – Saves component at file path.
transform – Transforms input data by selecting features. If the component_obj does not have a transform method, raises a MethodPropertyNotFoundError exception.
-
clone
(self) Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls) Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
Default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False) Describe a component and its parameters.
- Parameters
print_name (bool, optional) – Whether to print the name of the component. Defaults to False.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}. Defaults to False.
- Returns
The description as a dictionary if return_dict is True, else None. The description is printed in either case.
- Return type
None or dict
-
fit
(self, X, y=None) Fits component to data.
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None) Fits on X and transforms X.
- Parameters
X (pd.DataFrame) – Data to fit and transform.
y (pd.Series, optional) – Target data.
- Returns
Transformed X
- Return type
pd.DataFrame
-
get_names
(self) Get names of selected features.
- Returns
List of the names of features selected
- Return type
list[str]
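After fitting, a selector only needs to remember which columns it kept. get_names reports those columns, and transform drops the rest. `ToySelector` is a hypothetical sketch that stores a boolean support mask, similar in spirit to scikit-learn's `get_support()`. It takes importances as an argument instead of training a random forest, and represents data as a dict of column name to values.

```python
class ToySelector:
    """Hypothetical selector: keeps columns whose importance >= cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self._names = None
        self._support = None  # boolean mask, one entry per column

    def fit(self, columns, importances):
        self._names = list(columns)
        self._support = [imp >= self.cutoff for imp in importances]
        return self

    def get_names(self):
        if self._support is None:
            raise RuntimeError("call fit before get_names")
        return [name for name, keep in zip(self._names, self._support) if keep]

    def transform(self, columns):
        kept = set(self.get_names())
        return {name: values for name, values in columns.items() if name in kept}

    def fit_transform(self, columns, importances):
        return self.fit(columns, importances).transform(columns)


data = {"age": [31, 45], "zip": [10001, 94105], "income": [52.0, 88.5]}
selector = ToySelector(cutoff=0.3)
reduced = selector.fit_transform(data, [0.5, 0.1, 0.4])
print(selector.get_names())  # ['age', 'income']
```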
-
static
load
(file_path) Loads component at file path.
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self) Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self) Returns the parameters which were used to initialize the component.
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves component at file path.
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
transform
(self, X, y=None) Transforms input data by selecting features. If the component_obj does not have a transform method, raises a MethodPropertyNotFoundError exception.
- Parameters
X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data. Ignored.
- Returns
Transformed X
- Return type
pd.DataFrame
-
class
evalml.pipelines.
RFRegressorSelectFromModel
(number_features=None, n_estimators=10, max_depth=None, percent_features=0.5, threshold=-np.inf, n_jobs=-1, random_seed=0, **kwargs) Selects top features based on importance weights using a Random Forest regressor.
- Parameters
number_features (int) – The maximum number of features to select. If both percent_features and number_features are specified, take the greater number of features. Defaults to None.
n_estimators (int) – The number of trees in the forest. Defaults to 10.
max_depth (int or None) – Maximum tree depth for base learners. Defaults to None.
percent_features (float) – Percentage of features to use. If both percent_features and number_features are specified, take the greater number of features. Defaults to 0.5.
threshold (string or float) – The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept while the others are discarded. If “median”, then the threshold value is the median of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. Defaults to -np.inf.
n_jobs (int or None) – Number of jobs to run in parallel. -1 uses all processors. Defaults to -1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “percent_features”: Real(0.01, 1), “threshold”: [“mean”, -np.inf],}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
RF Regressor Select From Model
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Get names of selected features.
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
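The convention Component.default_parameters == Component().parameters, together with clone, can be illustrated with a toy class. `ToyComponent` is hypothetical and only mimics the documented behavior; evalml exposes default_parameters as a class-level property, which this sketch approximates with a classmethod.

```python
class ToyComponent:
    """Hypothetical component that records the parameters it was constructed with."""

    def __init__(self, percent_features=0.5, threshold="mean", random_seed=0):
        self._parameters = {"percent_features": percent_features, "threshold": threshold}
        self.random_seed = random_seed

    @property
    def parameters(self):
        # Return a copy so callers cannot mutate the stored parameters.
        return dict(self._parameters)

    @classmethod
    def default_parameters(cls):
        # Convention: the defaults are whatever a no-argument constructor records.
        return cls().parameters

    def clone(self):
        # New instance with identical parameters and random seed.
        return type(self)(random_seed=self.random_seed, **self.parameters)


component = ToyComponent(percent_features=0.3)
duplicate = component.clone()
print(ToyComponent.default_parameters())
print(duplicate.parameters == component.parameters, duplicate is component)
```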
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
get_names
(self)¶ Get names of selected features.
- Returns
List of the names of features selected
- Return type
list[str]
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
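save and load serialize a component with cloudpickle, and pickle_protocol selects the pickle data stream format. The sketch below shows the same round trip using the standard-library pickle module as a stand-in (cloudpickle's dump accepts the same protocol argument); `save_object` and `load_object` are hypothetical helpers, and a plain parameters dict stands in for a component.

```python
import os
import pickle
import tempfile


def save_object(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Write obj to file_path using the requested pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load_object(file_path):
    """Read an object previously written by save_object."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


params = {"number_features": None, "n_estimators": 10, "percent_features": 0.5}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "component.pkl")
    save_object(params, path, pickle_protocol=4)
    restored = load_object(path)
print(restored == params)
```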
-
transform
(self, X, y=None)¶ Transforms input data by selecting features. If the component_obj does not have a transform method, will raise a MethodPropertyNotFoundError exception.
- Parameters
X (pd.DataFrame) – Data to transform.
y (pd.Series, optional) – Target data. Ignored.
- Returns
Transformed X
- Return type
pd.DataFrame
-
class
evalml.pipelines.
SimpleImputer
(impute_strategy='most_frequent', fill_value=None, random_seed=0, **kwargs)[source]¶ Imputes missing data according to a specified imputation strategy.
- Parameters
impute_strategy (string) – Impute strategy to use. Valid values include “mean”, “median”, “most_frequent”, “constant” for numerical data, and “most_frequent”, “constant” for object data types.
fill_value (string) – When impute_strategy == “constant”, fill_value is used to replace missing data. Defaults to 0 when imputing numerical data and “missing_value” for strings or object data types.
random_seed (int) – Seed for the random number generator. Defaults to 0.
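The strategies above can be sketched for a single column in plain Python. This is not evalml's implementation: `impute_column` is a hypothetical helper, None and float NaN are both treated as missing (as the fit method describes), and ties for "most_frequent" are broken by the smallest value, which is an assumption borrowed from scikit-learn's SimpleImputer.

```python
import math
from collections import Counter
from statistics import mean, median


def _is_missing(value):
    # None and float NaN are treated as the same kind of missing value.
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_column(values, impute_strategy="most_frequent", fill_value=None):
    observed = [v for v in values if not _is_missing(v)]
    if impute_strategy == "mean":
        fill = mean(observed)
    elif impute_strategy == "median":
        fill = median(observed)
    elif impute_strategy == "most_frequent":
        counts = Counter(observed)
        top = max(counts.values())
        # Break ties by the smallest value (assumption, mirrors scikit-learn).
        fill = min(v for v, c in counts.items() if c == top)
    elif impute_strategy == "constant":
        if fill_value is None:
            numeric = all(
                isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
            )
            # Documented defaults: 0 for numerical data, "missing_value" otherwise.
            fill = 0 if numeric else "missing_value"
        else:
            fill = fill_value
    else:
        raise ValueError(f"Unknown impute_strategy: {impute_strategy!r}")
    return [fill if _is_missing(v) else v for v in values]


print(impute_column([1.0, None, 3.0, float("nan")], "mean"))
print(impute_column(["b", None, "a", "b", "a"], "most_frequent"))
print(impute_column(["x", None], "constant"))
```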
Attributes
hyperparameter_ranges
{ “impute_strategy”: [“mean”, “median”, “most_frequent”]}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Simple Imputer
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms input by imputing missing values. ‘None’ and np.nan values are treated as the same.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)[source]¶ Fits imputer to data. ‘None’ values are converted to np.nan before imputation and are treated as the same.
- Parameters
X (pd.DataFrame or np.ndarray) – the input training data of shape [n_samples, n_features]
y (pd.Series, optional) – the target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series, optional) – Target data.
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
SklearnStackedEnsembleClassifier
(input_pipelines=None, final_estimator=None, cv=None, n_jobs=-1, random_seed=0, **kwargs)[source]¶ Scikit-learn Stacked Ensemble Classifier.
- Parameters
input_pipelines (list(PipelineBase or subclass obj)) – List of pipeline instances to use as the base estimators. This must not be None or an empty list or else EnsembleMissingPipelinesError will be raised.
final_estimator (Estimator or subclass) – The classifier used to combine the base estimators. If None, uses LogisticRegressionClassifier.
cv (int, cross-validation generator or an iterable) –
Determines the cross-validation splitting strategy used to train final_estimator. For int/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. Defaults to None. Possible inputs for cv are:
None: 3-fold cross validation
int: the number of folds in a (Stratified) KFold
A scikit-learn cross-validation generator object
An iterable yielding (train, test) splits
n_jobs (int or None) – Integer describing the level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Defaults to -1. Note: multi-process errors may be thrown for values of n_jobs != 1. If this happens, use n_jobs=1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
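The n_jobs arithmetic above can be written out directly. `effective_n_jobs` is a hypothetical helper; joblib, which scikit-learn uses for this parameter, applies the same formula, and the floor at one worker is borrowed from joblib rather than stated in this document.

```python
def effective_n_jobs(n_jobs, n_cpus):
    """Number of parallel workers implied by an n_jobs setting."""
    if n_jobs is None or n_jobs == 1:
        return 1  # None and 1 are equivalent: no parallelism
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, and so on.
        # The max(..., 1) floor is an assumption borrowed from joblib.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


for setting in (None, 1, 4, -1, -2):
    print(setting, "->", effective_n_jobs(setting, n_cpus=8))
```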
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.ENSEMBLE
modifies_features
True
modifies_target
False
name
Sklearn Stacked Ensemble Classifier
predict_uses_y
False
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for stacked ensemble classes.
Describe a component and its parameters
Not implemented for SklearnStackedEnsembleClassifier and SklearnStackedEnsembleRegressor
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for stacked ensemble classes.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Not implemented for SklearnStackedEnsembleClassifier and SklearnStackedEnsembleRegressor
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
SklearnStackedEnsembleRegressor
(input_pipelines=None, final_estimator=None, cv=None, n_jobs=-1, random_seed=0, **kwargs)[source]¶ Scikit-learn Stacked Ensemble Regressor.
- Parameters
input_pipelines (list(PipelineBase or subclass obj)) – List of pipeline instances to use as the base estimators. This must not be None or an empty list or else EnsembleMissingPipelinesError will be raised.
final_estimator (Estimator or subclass) – The regressor used to combine the base estimators. If None, uses LinearRegressor.
cv (int, cross-validation generator or an iterable) –
Determines the cross-validation splitting strategy used to train final_estimator. For int/None inputs, KFold is used. Defaults to None. Possible inputs for cv are:
None: 3-fold cross validation
int: the number of folds in a KFold
A scikit-learn cross-validation generator object
An iterable yielding (train, test) splits
n_jobs (int or None) – Integer describing the level of parallelism used for pipelines. None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Defaults to -1. Note: multi-process errors may be thrown for values of n_jobs != 1. If this happens, use n_jobs=1.
random_seed (int) – Seed for the random number generator. Defaults to 0.
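With cv=None the regressor uses 3-fold KFold. The sketch below shows how unshuffled K-fold splitting partitions sample indices, following the rule scikit-learn's KFold uses when shuffle=False (the first n_samples % n_splits folds get one extra sample). In scikit-learn's stacking estimators, each sample is held out exactly once, so every sample receives one out-of-fold prediction for training final_estimator. `kfold_indices` is a hypothetical helper.

```python
def kfold_indices(n_samples, n_splits=3):
    """Yield (train, test) index lists for contiguous, unshuffled K-fold splits."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train, test in kfold_indices(7):
    print("train:", train, "test:", test)
```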
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.ENSEMBLE
modifies_features
True
modifies_target
False
name
Sklearn Stacked Ensemble Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for stacked ensemble classes.
Describe a component and its parameters
Not implemented for SklearnStackedEnsembleClassifier and SklearnStackedEnsembleRegressor
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for stacked ensemble classes.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Not implemented for SklearnStackedEnsembleClassifier and SklearnStackedEnsembleRegressor
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
StandardScaler
(random_seed=0, **kwargs)[source]¶ A transformer that standardizes input features by removing the mean and scaling to unit variance.
- Parameters
random_seed (int) – Seed for the random number generator. Defaults to 0.
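Standardization maps each value x to (x - mean) / std, using statistics learned during fit and reused by transform. The sketch below works on one column in plain Python; `ColumnStandardizer` is hypothetical, and the use of the population standard deviation and the zero-variance guard mirror scikit-learn's StandardScaler rather than anything stated in this document.

```python
from statistics import mean, pstdev


class ColumnStandardizer:
    """Hypothetical single-column scaler: fit learns mean/std, transform applies them."""

    def fit(self, values):
        self.mean_ = mean(values)
        std = pstdev(values)  # population standard deviation (ddof=0)
        # Leave zero-variance columns unscaled instead of dividing by zero.
        self.scale_ = std if std > 0 else 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.scale_ for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


scaler = ColumnStandardizer().fit([0.0, 2.0])
print(scaler.transform([3.0]))  # new data is scaled with the training mean and std
print(ColumnStandardizer().fit_transform([1.0, 2.0, 3.0]))
```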
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Standard Scaler
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
SVMClassifier
(C=1.0, kernel='rbf', gamma='auto', probability=True, random_seed=0, **kwargs)[source]¶ Support Vector Machine Classifier.
- Parameters
C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.
kernel ({"poly", "rbf", "sigmoid"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.
gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. Defaults to “auto”. If “scale”, uses 1 / (n_features * X.var()) as the value of gamma. If “auto”, uses 1 / n_features.
probability (boolean) – Whether to enable probability estimates. Defaults to True.
random_seed (int) – Seed for the random number generator. Defaults to 0.
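The gamma rules above can be checked by hand. In the sketch below, `resolve_gamma` and `kernel` are hypothetical helpers; X.var() is taken as the population variance over every entry of X (NumPy's default), and coef0=0.0 and degree=3 are scikit-learn's SVC defaults, which are not listed among this component's parameters.

```python
import math


def resolve_gamma(X, gamma="auto"):
    """Return the numeric gamma implied by "scale", "auto", or a float."""
    n_features = len(X[0])
    if gamma == "auto":
        return 1.0 / n_features
    if gamma == "scale":
        flat = [v for row in X for v in row]
        mu = sum(flat) / len(flat)
        variance = sum((v - mu) ** 2 for v in flat) / len(flat)  # X.var() over all entries
        return 1.0 / (n_features * variance)
    return float(gamma)


def kernel(x, z, kind="rbf", gamma=1.0, coef0=0.0, degree=3):
    """Evaluate one of the documented kernel types for two feature vectors."""
    dot = sum(a * b for a, b in zip(x, z))
    if kind == "rbf":
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))
    if kind == "poly":
        return (gamma * dot + coef0) ** degree
    if kind == "sigmoid":
        return math.tanh(gamma * dot + coef0)
    raise ValueError(f"Unknown kernel: {kind!r}")


X = [[0.0, 4.0], [0.0, 4.0]]
print(resolve_gamma(X, "auto"), resolve_gamma(X, "scale"))
print(kernel([0.0, 0.0], [1.0, 1.0], "rbf", gamma=resolve_gamma(X, "auto")))
```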
Attributes
hyperparameter_ranges
{ “C”: Real(0, 10), “kernel”: [“poly”, “rbf”, “sigmoid”], “gamma”: [“scale”, “auto”],}
model_family
ModelFamily.SVM
modifies_features
True
modifies_target
False
name
SVM Classifier
predict_uses_y
False
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Feature importance only works with linear kernels.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Feature importance only works with linear kernels. If the kernel isn’t linear, we return a numpy array of zeros
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
SVMRegressor
(C=1.0, kernel='rbf', gamma='auto', random_seed=0, **kwargs)[source]¶ Support Vector Machine Regressor.
- Parameters
C (float) – The regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. Defaults to 1.0.
kernel ({"poly", "rbf", "sigmoid"}) – Specifies the kernel type to be used in the algorithm. Defaults to “rbf”.
gamma ({"scale", "auto"} or float) – Kernel coefficient for “rbf”, “poly” and “sigmoid”. Defaults to “auto”. If “scale”, uses 1 / (n_features * X.var()) as the value of gamma. If “auto”, uses 1 / n_features.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
hyperparameter_ranges
{ “C”: Real(0, 10), “kernel”: [“poly”, “rbf”, “sigmoid”], “gamma”: [“scale”, “auto”],}
model_family
ModelFamily.SVM
modifies_features
True
modifies_target
False
name
SVM Regressor
predict_uses_y
False
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Feature importance only works with linear kernels.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
property
feature_importance
(self)¶ Feature importance only works with linear kernels. If the kernel isn’t linear, we return a numpy array of zeros
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
TargetEncoder
(cols=None, smoothing=1.0, handle_unknown='value', handle_missing='value', random_seed=0, **kwargs)[source]¶ A transformer that encodes categorical features into target encodings.
- Parameters
cols (list) – Columns to encode. If None, all string columns will be encoded; otherwise, only the columns provided will be encoded. Defaults to None.
smoothing (float) – The smoothing factor to apply. The larger this value is, the more influence the expected target value has on the resulting target encodings. Must be strictly larger than 0. Defaults to 1.0.
handle_unknown (string) – Determines how to handle unknown categories encountered for a feature. Options are ‘value’, ‘error’, and ‘return_nan’. Defaults to ‘value’, which replaces with the target mean.
handle_missing (string) – Determines how to handle missing values encountered during fit or transform. Options are ‘value’, ‘error’, and ‘return_nan’. Defaults to ‘value’, which replaces with the target mean.
random_seed (int) – Seed for the random number generator. Defaults to 0.
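One common way to realize this kind of smoothing is to blend each category's target mean with the overall target mean, weighting the overall mean by smoothing: (count * category_mean + smoothing * prior) / (count + smoothing). The sketch below uses that blend as an illustrative assumption; it is not necessarily the exact formula used by the encoder evalml wraps. `fit_target_encoding` and `apply_target_encoding` are hypothetical helpers, and the handle_unknown options mirror the ones documented above.

```python
import math


def fit_target_encoding(categories, target, smoothing=1.0):
    """Learn a smoothed mean-target value per category (assumed blending formula)."""
    if smoothing <= 0:
        raise ValueError("smoothing must be strictly larger than 0")
    prior = sum(target) / len(target)  # overall target mean
    sums, counts = {}, {}
    for cat, y in zip(categories, target):
        sums[cat] = sums.get(cat, 0.0) + y
        counts[cat] = counts.get(cat, 0) + 1
    # sums[cat] equals count * category_mean, so this is the blend described above.
    mapping = {
        cat: (sums[cat] + smoothing * prior) / (counts[cat] + smoothing)
        for cat in counts
    }
    return mapping, prior


def apply_target_encoding(categories, mapping, prior, handle_unknown="value"):
    """Encode categories, handling ones not seen during fit."""
    encoded = []
    for cat in categories:
        if cat in mapping:
            encoded.append(mapping[cat])
        elif handle_unknown == "value":
            encoded.append(prior)  # unseen category -> overall target mean
        elif handle_unknown == "return_nan":
            encoded.append(math.nan)
        else:
            raise ValueError(f"Unknown category: {cat!r}")
    return encoded


mapping, prior = fit_target_encoding(["a", "a", "b"], [1, 0, 1])
print(apply_target_encoding(["a", "b", "c"], mapping, prior))
```

Raising smoothing pulls every encoding toward the overall target mean, which matches the behavior described for the smoothing parameter.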
Attributes
hyperparameter_ranges
{}
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
name
Target Encoder
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Return feature names for the input features after fitting.
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – whether to print name of component
return_dict (bool, optional) – whether to return description as dictionary in the format {“name”: name, “parameters”: parameters}
- Returns
prints and returns dictionary
- Return type
None or dict
-
fit
(self, X, y)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
get_feature_names
(self)[source]¶ Return feature names for the input features after fitting.
- Returns
The feature names after encoding
- Return type
np.array
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importances. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
TimeSeriesBinaryClassificationPipeline
(component_graph, parameters=None, custom_name=None, random_seed=0)[source]¶ Pipeline base class for time series binary classification problems.
- Parameters
component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”]
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary {} implies using all default values for component parameters. Pipeline-level parameters such as date_index, gap, and max_delay must be specified with the “pipeline” key. For example: Pipeline(parameters={“pipeline”: {“date_index”: “Date”, “max_delay”: 4, “gap”: 2}}).
custom_name (str) – Custom name for the pipeline. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
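The renaming rule for duplicate components in a list can be reproduced in a few lines. `unique_component_names` is a hypothetical helper that appends the zero-based list index to every repeated name, which reproduces the example given for component_graph; it is not evalml's own code.

```python
def unique_component_names(component_names):
    """Rename repeated components by appending their zero-based list index."""
    seen = set()
    unique = []
    for index, name in enumerate(component_names):
        unique.append(f"{name}_{index}" if name in seen else name)
        seen.add(name)
    return unique


graph = ["Imputer", "One Hot Encoder", "Imputer", "Logistic Regression Classifier"]
print(unique_component_names(graph))
```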
Attributes
problem_type
None
Methods
can_tune_threshold_with_objective – Determine whether the threshold of a binary classification pipeline can be tuned.
classes_ – Gets the class names for the problem.
clone – Constructs a new pipeline with the same components, parameters, and random state.
compute_estimator_features – Transforms the data by applying all pre-processing components.
custom_name – Custom name of the pipeline.
describe – Outputs pipeline details including component parameters.
feature_importance – Importance associated with each feature. Features dropped by the feature selection are excluded.
fit – Fit a time series classification pipeline.
get_component – Returns component by name.
get_hyperparameter_ranges – Returns hyperparameter ranges from all components as a dictionary.
graph – Generate an image representing the pipeline graph.
graph_feature_importance – Generate a bar graph of the pipeline’s feature importance.
inverse_transform – Apply component inverse_transform methods to estimator predictions in reverse order.
load – Loads pipeline at file path.
model_family – Returns model family of this pipeline.
name – Name of the pipeline.
new – Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
optimize_threshold – Optimize the pipeline threshold given the objective to use. Only used for binary problems with objectives whose thresholds can be tuned.
parameters – Parameter dictionary for this pipeline.
predict – Predict on future data where target is not known.
predict_in_sample – Predict on future data where the target is known, e.g. cross validation.
predict_proba – Predict on future data where the target is unknown.
predict_proba_in_sample – Predict on future data where the target is known, e.g. cross validation.
save – Saves pipeline at file path.
score – Evaluate model performance on current and additional objectives.
summary – A short summary of the pipeline structure, describing the list of components used.
threshold – Threshold used to make a prediction. Defaults to None.
transform – Transform the input.
-
can_tune_threshold_with_objective
(self, objective)¶ Determine whether the threshold of a binary classification pipeline can be tuned.
- Parameters
objective (ObjectiveBase) – Primary AutoMLSearch objective.
- Returns
True if the pipeline threshold can be tuned.
- Return type
bool
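A rough sketch of the kind of check can_tune_threshold_with_objective performs, assuming tuning is allowed only when the pipeline solves a binary problem and the objective exposes a `can_optimize_threshold` flag. The exact conditions evalml checks may differ; the problem-type strings and the `Objective` dataclass here are illustrative stand-ins, not evalml classes.

```python
from dataclasses import dataclass


# Illustrative stand-in for an objective; real evalml objectives derive
# from ObjectiveBase.
@dataclass
class Objective:
    name: str
    can_optimize_threshold: bool


# Hypothetical labels for problem types whose threshold may be tuned.
BINARY_PROBLEM_TYPES = {"binary", "time series binary"}


def can_tune_threshold(problem_type, objective):
    """Return True only for binary problems with a threshold-tunable objective."""
    return problem_type in BINARY_PROBLEM_TYPES and objective.can_optimize_threshold


f1 = Objective("F1", can_optimize_threshold=True)
auc = Objective("AUC", can_optimize_threshold=False)
print(can_tune_threshold("time series binary", f1))      # True
print(can_tune_threshold("time series binary", auc))     # False
print(can_tune_threshold("time series multiclass", f1))  # False
```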
-
property
classes_
(self)¶ Gets the class names for the problem.
-
clone
(self)¶ Constructs a new pipeline with the same components, parameters, and random state.
- Returns
A new instance of this pipeline with identical components, parameters, and random state.
-
compute_estimator_features
(self, X, y=None, X_train=None, y_train=None)¶ Transforms the data by applying all pre-processing components.
- Parameters
X (pd.DataFrame) – Input data to the pipeline to transform.
y (pd.Series) – Targets corresponding to X. Defaults to None.
X_train (pd.DataFrame) – Training data used to generate features from past observations.
y_train (pd.Series) – Training targets used to generate features from past observations.
- Returns
New transformed features.
- Return type
pd.DataFrame
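compute_estimator_features accepts X_train and y_train so that features derived from past observations, such as delayed (lagged) values, can be computed for the first rows of X. A minimal pure-Python sketch of that idea, assuming a single numeric series and a hypothetical `max_delay` number of lags; it is not evalml's DelayedFeatureTransformer.

```python
def lagged_features(train_values, new_values, max_delay):
    """Build lag-1..lag-max_delay features for each value in new_values.

    The tail of train_values supplies history for the first rows, which is
    why a time series pipeline needs the training data at prediction time.
    """
    history = list(train_values) + list(new_values)
    offset = len(train_values)
    rows = []
    for i in range(offset, len(history)):
        # Lag k is the value k steps before position i, or None if unavailable.
        rows.append([history[i - k] if i - k >= 0 else None
                     for k in range(1, max_delay + 1)])
    return rows


print(lagged_features([1, 2, 3], [4, 5], max_delay=2))
# [[3, 2], [4, 3]]
```

Without the training values, the first rows of the new data would have no history and every lag would be missing.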
-
static
create_objectives
(objectives)¶
-
property
custom_name
(self)¶ Custom name of the pipeline.
-
describe
(self, return_dict=False)¶ Outputs pipeline details including component parameters.
- Parameters
return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.
- Returns
Dictionary of all component parameters if return_dict is True, else None.
- Return type
dict
-
property
feature_importance
(self)¶ Importance associated with each feature. Features dropped by the feature selection are excluded.
- Returns
pd.DataFrame including feature names and their corresponding importance
-
fit
(self, X, y)¶ Fit a time series classification pipeline.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features].
y (pd.Series, np.ndarray) – The training targets of length [n_samples].
- Returns
self
-
get_component
(self, name)¶ Returns component by name.
- Parameters
name (str) – Name of the component.
- Returns
The component with the given name.
- Return type
Component
-
get_hyperparameter_ranges
(self, custom_hyperparameters)¶ Returns hyperparameter ranges from all components as a dictionary.
- Parameters
custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.
- Returns
Dictionary of hyperparameter ranges for each component in the pipeline.
- Return type
dict
-
graph
(self, filepath=None)¶ Generate an image representing the pipeline graph.
- Parameters
filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
- Returns
Graph object that can be directly displayed in Jupyter notebooks.
- Return type
graphviz.Digraph
-
graph_feature_importance
(self, importance_threshold=0)¶ Generate a bar graph of the pipeline’s feature importance.
- Parameters
importance_threshold (float, optional) – If provided, graph only features whose feature importance has an absolute value larger than importance_threshold. Defaults to zero.
- Returns
A bar graph showing features and their corresponding importance.
- Return type
plotly.Figure
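The importance_threshold filter keeps features whose absolute importance is larger than the threshold. A minimal sketch of that filtering step on a plain list of (feature, importance) pairs, following the wording of the parameter description; the feature names and values are made up, the rejection of negative thresholds is this sketch's own choice, and the plotly rendering is omitted.

```python
def filter_importance(importances, importance_threshold=0.0):
    """Keep (feature, importance) pairs with abs(importance) > threshold,
    sorted by absolute importance, largest first."""
    if importance_threshold < 0:
        raise ValueError("importance_threshold must be non-negative")
    kept = [(name, value) for name, value in importances
            if abs(value) > importance_threshold]
    return sorted(kept, key=lambda pair: abs(pair[1]), reverse=True)


example = [("lag_1", 0.42), ("day_of_week", -0.15), ("noise", 0.01)]
print(filter_importance(example, importance_threshold=0.05))
# [('lag_1', 0.42), ('day_of_week', -0.15)]
```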
-
inverse_transform
(self, y)¶ Apply component inverse_transform methods to estimator predictions in reverse order.
Components that implement inverse_transform include PolynomialDetrender; LabelEncoder support is to be determined.
- Parameters
y (pd.Series) – Final component features.
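inverse_transform walks the components in reverse order so that the last transformation applied to the target is undone first. A minimal sketch, assuming two hypothetical target transformers (a shift and a scale), each with transform and inverse_transform methods; these classes are illustrative and are not evalml components.

```python
class Shift:
    """Hypothetical transformer that subtracts a constant."""
    def __init__(self, amount):
        self.amount = amount

    def transform(self, y):
        return [v - self.amount for v in y]

    def inverse_transform(self, y):
        return [v + self.amount for v in y]


class Scale:
    """Hypothetical transformer that divides by a constant."""
    def __init__(self, factor):
        self.factor = factor

    def transform(self, y):
        return [v / self.factor for v in y]

    def inverse_transform(self, y):
        return [v * self.factor for v in y]


def apply_transforms(components, y):
    for component in components:
        y = component.transform(y)
    return y


def inverse_all(components, y):
    # Undo in reverse order: the last transform applied is inverted first.
    for component in reversed(components):
        y = component.inverse_transform(y)
    return y


components = [Shift(10), Scale(2)]
transformed = apply_transforms(components, [12, 14, 20])  # [1.0, 2.0, 5.0]
print(inverse_all(components, transformed))               # [12.0, 14.0, 20.0]
```

Inverting in forward order instead (shift back first, then rescale) would give [22.0, 24.0, 30.0], which is why the order matters.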
-
static
load
(file_path)¶ Loads a pipeline from a file path.
- Parameters
file_path (str) – Location of the file to load.
- Returns
PipelineBase object.
-
property
model_family
(self)¶ Returns model family of this pipeline.
-
property
name
(self)¶ Name of the pipeline.
-
new
(self, parameters, random_seed=0)¶ Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
Not to be confused with Python’s __new__ method.
- Parameters
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.
random_seed (int) – Seed for the random number generator. Defaults to 0.
- Returns
A new instance of this pipeline with identical components.
-
optimize_threshold
(self, X, y, y_pred_proba, objective)¶ Optimize the pipeline threshold given the objective to use. Only used for binary problems with objectives whose thresholds can be tuned.
- Parameters
X (pd.DataFrame) – Input features.
y (pd.Series) – Input target values.
y_pred_proba (pd.Series) – The predicted probabilities of the target output by the pipeline.
objective (ObjectiveBase) – The objective to threshold with. Must have a tunable threshold.
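optimize_threshold picks the decision threshold that scores best on the given objective. evalml delegates this search to the objective itself; the sketch below swaps in a plain grid search over candidate thresholds, with F1 as the objective, purely to show the idea. The function names are hypothetical, and predicting the positive class when the probability exceeds the threshold is an assumption of this sketch.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (truthy = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def optimize_threshold_grid(y_true, y_pred_proba, steps=101):
    """Return (threshold, score) maximizing F1 over an even grid on [0, 1]."""
    best_threshold, best_score = 0.5, -1.0
    for i in range(steps):
        threshold = i / (steps - 1)
        # Positive prediction when the probability exceeds the threshold.
        y_pred = [p > threshold for p in y_pred_proba]
        score = f1_score(y_true, y_pred)
        if score > best_score:  # keep the lowest threshold among ties
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


print(optimize_threshold_grid([0, 0, 1, 1, 1], [0.1, 0.4, 0.35, 0.8, 0.9]))
# threshold 0.1, F1 = 6/7
```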
-
property
parameters
(self)¶ Parameter dictionary for this pipeline.
- Returns
Dictionary of all component parameters.
- Return type
dict
-
predict
(self, X, objective=None, X_train=None, y_train=None)¶ Predict on future data where target is not known.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
objective (ObjectiveBase, str, None) – The objective to use to make predictions. In classification problems it is used to threshold the predictions. Defaults to None.
X_train (pd.DataFrame or np.ndarray or None) – Data the pipeline was trained on, used to generate features from past observations.
y_train (pd.Series or None) – Targets used to train the pipeline, used to generate features from past observations.
-
predict_in_sample
(self, X, y, X_train, y_train, objective=None)[source]¶ Predict on future data where the target is known, e.g. cross validation.
- Parameters
X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].
y (pd.Series, np.ndarray) – Future target of shape [n_samples].
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
objective (ObjectiveBase, str, None) – Objective used to threshold predicted probabilities, optional.
- Returns
Estimated labels.
- Return type
pd.Series
-
predict_proba
(self, X, X_train=None, y_train=None)¶ Predict on future data where the target is unknown.
- Parameters
X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
- Returns
Estimated probabilities.
- Return type
pd.DataFrame
-
predict_proba_in_sample
(self, X_holdout, y_holdout, X_train, y_train)¶ Predict on future data where the target is known, e.g. cross validation.
- Parameters
X_holdout (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].
y_holdout (pd.Series, np.ndarray) – Future target of shape [n_samples].
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
- Returns
Estimated probabilities.
- Return type
pd.DataFrame
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves the pipeline to a file path.
- Parameters
file_path (str) – Location to save the file.
pickle_protocol (int) – The pickle data stream format. Defaults to cloudpickle.DEFAULT_PROTOCOL.
- Returns
None
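save and load serialize the whole pipeline object with cloudpickle, and pickle_protocol selects the pickle format version. The round trip below uses the standard-library pickle module as a stand-in (cloudpickle follows the same dump/load pattern), and a plain dict stands in for a fitted pipeline; the `save` and `load` functions here are illustrative, not the evalml methods.

```python
import os
import pickle
import tempfile


def save(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize obj to file_path (stdlib pickle standing in for cloudpickle)."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load(file_path):
    """Deserialize and return the object stored at file_path."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for a fitted pipeline object.
state = {"component_graph": ["Imputer", "Logistic Regression Classifier"],
         "parameters": {"pipeline": {"gap": 2}}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pipeline.pkl")
    save(state, path, pickle_protocol=pickle.HIGHEST_PROTOCOL)
    restored = load(path)

print(restored == state)  # True
```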
-
score
(self, X, y, objectives, X_train=None, y_train=None)¶ Evaluate model performance on current and additional objectives.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – True labels of length [n_samples].
objectives (list) – Non-empty list of objectives to score on.
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
- Returns
Ordered dictionary of objective scores.
- Return type
dict
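score evaluates every requested objective and returns the results keyed by objective name, in the order given. A minimal sketch of that bookkeeping, with two hypothetical objective functions (accuracy and error rate) standing in for evalml objective instances. Since Python 3.7 a plain dict also preserves insertion order, but OrderedDict makes the intent explicit.

```python
from collections import OrderedDict


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(y_true, y_pred):
    return 1.0 - accuracy(y_true, y_pred)


def score(y_true, y_pred, objectives):
    """Map each objective's name to its score, preserving the given order."""
    if not objectives:
        raise ValueError("objectives must be a non-empty list")
    return OrderedDict((fn.__name__, fn(y_true, y_pred)) for fn in objectives)


scores = score([1, 0, 1, 1], [1, 0, 0, 1], [accuracy, error_rate])
print(dict(scores))  # {'accuracy': 0.75, 'error_rate': 0.25}
```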
-
property
summary
(self)¶ A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
-
property
threshold
(self)¶ Threshold used to make a prediction. Defaults to None.
-
transform
(self, X, y=None)¶ Transform the input.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples]. Defaults to None.
- Returns
Transformed output.
- Return type
pd.DataFrame
-
class
evalml.pipelines.
TimeSeriesClassificationPipeline
(component_graph, parameters=None, custom_name=None, random_seed=0)[source]¶ Pipeline base class for time series classification problems.
- Parameters
component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”].
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary {} implies using all default values for component parameters. Pipeline-level parameters such as date_index, gap, and max_delay must be specified with the “pipeline” key. For example: Pipeline(parameters={“pipeline”: {“date_index”: “Date”, “max_delay”: 4, “gap”: 2}}).
custom_name (str) – Custom name for the pipeline. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
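The parameters argument nests component settings under component names and pipeline-level settings under the “pipeline” key. The sketch below builds such a dictionary and separates the two levels. The component name, its "C" value, and the `split_parameters` helper are illustrative assumptions; the components a real pipeline accepts depend on its component_graph.

```python
def split_parameters(parameters):
    """Separate pipeline-level settings from per-component settings.

    Hypothetical helper; None is treated like an empty dictionary.
    """
    parameters = parameters or {}
    pipeline_level = dict(parameters.get("pipeline", {}))
    component_level = {name: dict(values) for name, values in parameters.items()
                       if name != "pipeline"}
    return pipeline_level, component_level


parameters = {
    "pipeline": {"date_index": "Date", "max_delay": 4, "gap": 2},
    "Logistic Regression Classifier": {"C": 0.5},
}
pipeline_level, component_level = split_parameters(parameters)
print(pipeline_level)   # {'date_index': 'Date', 'max_delay': 4, 'gap': 2}
print(component_level)  # {'Logistic Regression Classifier': {'C': 0.5}}
```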
Attributes
problem_type
None
Methods
can_tune_threshold_with_objective – Determine whether the threshold of a binary classification pipeline can be tuned.
classes_ – Gets the class names for the problem.
clone – Constructs a new pipeline with the same components, parameters, and random state.
compute_estimator_features – Transforms the data by applying all pre-processing components.
custom_name – Custom name of the pipeline.
describe – Outputs pipeline details including component parameters.
feature_importance – Importance associated with each feature. Features dropped by the feature selection are excluded.
fit – Fit a time series classification pipeline.
get_component – Returns component by name.
get_hyperparameter_ranges – Returns hyperparameter ranges from all components as a dictionary.
graph – Generate an image representing the pipeline graph.
graph_feature_importance – Generate a bar graph of the pipeline’s feature importance.
inverse_transform – Apply component inverse_transform methods to estimator predictions in reverse order.
load – Loads pipeline at file path.
model_family – Returns model family of this pipeline.
name – Name of the pipeline.
new – Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
parameters – Parameter dictionary for this pipeline.
predict – Predict on future data where target is not known.
predict_in_sample – Predict on future data where the target is known, e.g. cross validation.
predict_proba – Predict on future data where the target is unknown.
predict_proba_in_sample – Predict on future data where the target is known, e.g. cross validation.
save – Saves pipeline at file path.
score – Evaluate model performance on current and additional objectives.
summary – A short summary of the pipeline structure, describing the list of components used.
transform – Transform the input.
-
can_tune_threshold_with_objective
(self, objective)¶ Determine whether the threshold of a binary classification pipeline can be tuned.
- Parameters
objective (ObjectiveBase) – Primary AutoMLSearch objective.
- Returns
True if the pipeline threshold can be tuned.
- Return type
bool
-
property
classes_
(self)¶ Gets the class names for the problem.
-
clone
(self)¶ Constructs a new pipeline with the same components, parameters, and random state.
- Returns
A new instance of this pipeline with identical components, parameters, and random state.
-
compute_estimator_features
(self, X, y=None, X_train=None, y_train=None)¶ Transforms the data by applying all pre-processing components.
- Parameters
X (pd.DataFrame) – Input data to the pipeline to transform.
y (pd.Series) – Targets corresponding to X. Defaults to None.
X_train (pd.DataFrame) – Training data used to generate features from past observations.
y_train (pd.Series) – Training targets used to generate features from past observations.
- Returns
New transformed features.
- Return type
pd.DataFrame
-
static
create_objectives
(objectives)¶
-
property
custom_name
(self)¶ Custom name of the pipeline.
-
describe
(self, return_dict=False)¶ Outputs pipeline details including component parameters.
- Parameters
return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.
- Returns
Dictionary of all component parameters if return_dict is True, else None.
- Return type
dict
-
property
feature_importance
(self)¶ Importance associated with each feature. Features dropped by the feature selection are excluded.
- Returns
pd.DataFrame including feature names and their corresponding importance
-
fit
(self, X, y)[source]¶ Fit a time series classification pipeline.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features].
y (pd.Series, np.ndarray) – The training targets of length [n_samples].
- Returns
self
-
get_component
(self, name)¶ Returns component by name.
- Parameters
name (str) – Name of the component.
- Returns
The component with the given name.
- Return type
Component
-
get_hyperparameter_ranges
(self, custom_hyperparameters)¶ Returns hyperparameter ranges from all components as a dictionary.
- Parameters
custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.
- Returns
Dictionary of hyperparameter ranges for each component in the pipeline.
- Return type
dict
-
graph
(self, filepath=None)¶ Generate an image representing the pipeline graph.
- Parameters
filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
- Returns
Graph object that can be directly displayed in Jupyter notebooks.
- Return type
graphviz.Digraph
-
graph_feature_importance
(self, importance_threshold=0)¶ Generate a bar graph of the pipeline’s feature importance.
- Parameters
importance_threshold (float, optional) – If provided, graph only features whose feature importance has an absolute value larger than importance_threshold. Defaults to zero.
- Returns
A bar graph showing features and their corresponding importance.
- Return type
plotly.Figure
-
inverse_transform
(self, y)¶ Apply component inverse_transform methods to estimator predictions in reverse order.
Components that implement inverse_transform include PolynomialDetrender; LabelEncoder support is to be determined.
- Parameters
y (pd.Series) – Final component features.
-
static
load
(file_path)¶ Loads a pipeline from a file path.
- Parameters
file_path (str) – Location of the file to load.
- Returns
PipelineBase object.
-
property
model_family
(self)¶ Returns model family of this pipeline.
-
property
name
(self)¶ Name of the pipeline.
-
new
(self, parameters, random_seed=0)¶ Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
Not to be confused with Python’s __new__ method.
- Parameters
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.
random_seed (int) – Seed for the random number generator. Defaults to 0.
- Returns
A new instance of this pipeline with identical components.
-
property
parameters
(self)¶ Parameter dictionary for this pipeline.
- Returns
Dictionary of all component parameters.
- Return type
dict
-
predict
(self, X, objective=None, X_train=None, y_train=None)¶ Predict on future data where target is not known.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
objective (ObjectiveBase, str, None) – The objective to use to make predictions. In classification problems it is used to threshold the predictions. Defaults to None.
X_train (pd.DataFrame or np.ndarray or None) – Data the pipeline was trained on, used to generate features from past observations.
y_train (pd.Series or None) – Targets used to train the pipeline, used to generate features from past observations.
-
predict_in_sample
(self, X, y, X_train, y_train, objective=None)[source]¶ Predict on future data where the target is known, e.g. cross validation.
- Parameters
X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].
y (pd.Series, np.ndarray) – Future target of shape [n_samples].
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
objective (ObjectiveBase, str, None) – Objective used to threshold predicted probabilities, optional.
- Returns
Estimated labels.
- Return type
pd.Series
-
predict_proba
(self, X, X_train=None, y_train=None)[source]¶ Predict on future data where the target is unknown.
- Parameters
X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
- Returns
Estimated probabilities.
- Return type
pd.DataFrame
-
predict_proba_in_sample
(self, X_holdout, y_holdout, X_train, y_train)[source]¶ Predict on future data where the target is known, e.g. cross validation.
- Parameters
X_holdout (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].
y_holdout (pd.Series, np.ndarray) – Future target of shape [n_samples].
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
- Returns
Estimated probabilities.
- Return type
pd.DataFrame
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves the pipeline to a file path.
- Parameters
file_path (str) – Location to save the file.
pickle_protocol (int) – The pickle data stream format. Defaults to cloudpickle.DEFAULT_PROTOCOL.
- Returns
None
-
score
(self, X, y, objectives, X_train=None, y_train=None)[source]¶ Evaluate model performance on current and additional objectives.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – True labels of length [n_samples].
objectives (list) – Non-empty list of objectives to score on.
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
- Returns
Ordered dictionary of objective scores.
- Return type
dict
-
property
summary
(self)¶ A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
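The summary string follows the pattern in the example: the estimator name, then “w/”, then the remaining component names joined by “ + ”. A minimal sketch of that formatting, assuming the estimator is the last component in an ordered list of at least one name; the `summarize` helper is hypothetical.

```python
def summarize(component_names):
    """Format '<estimator> w/ <t1> + <t2>' from an ordered component list.

    Assumes the last name is the estimator; expects at least one name.
    """
    *transformers, estimator = component_names
    if not transformers:
        return estimator
    return f"{estimator} w/ {' + '.join(transformers)}"


print(summarize(["Simple Imputer", "One Hot Encoder", "Logistic Regression Classifier"]))
# Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
```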
-
transform
(self, X, y=None)¶ Transform the input.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples]. Defaults to None.
- Returns
Transformed output.
- Return type
pd.DataFrame
-
class
evalml.pipelines.
TimeSeriesMulticlassClassificationPipeline
(component_graph, parameters=None, custom_name=None, random_seed=0)[source]¶ Pipeline base class for time series multiclass classification problems.
- Parameters
component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”].
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary {} implies using all default values for component parameters. Pipeline-level parameters such as date_index, gap, and max_delay must be specified with the “pipeline” key. For example: Pipeline(parameters={“pipeline”: {“date_index”: “Date”, “max_delay”: 4, “gap”: 2}}).
custom_name (str) – Custom name for the pipeline. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
problem_type
ProblemTypes.TIME_SERIES_MULTICLASS
Methods
can_tune_threshold_with_objective – Determine whether the threshold of a binary classification pipeline can be tuned.
classes_ – Gets the class names for the problem.
clone – Constructs a new pipeline with the same components, parameters, and random state.
compute_estimator_features – Transforms the data by applying all pre-processing components.
custom_name – Custom name of the pipeline.
describe – Outputs pipeline details including component parameters.
feature_importance – Importance associated with each feature. Features dropped by the feature selection are excluded.
fit – Fit a time series classification pipeline.
get_component – Returns component by name.
get_hyperparameter_ranges – Returns hyperparameter ranges from all components as a dictionary.
graph – Generate an image representing the pipeline graph.
graph_feature_importance – Generate a bar graph of the pipeline’s feature importance.
inverse_transform – Apply component inverse_transform methods to estimator predictions in reverse order.
load – Loads pipeline at file path.
model_family – Returns model family of this pipeline.
name – Name of the pipeline.
new – Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
parameters – Parameter dictionary for this pipeline.
predict – Predict on future data where target is not known.
predict_in_sample – Predict on future data where the target is known, e.g. cross validation.
predict_proba – Predict on future data where the target is unknown.
predict_proba_in_sample – Predict on future data where the target is known, e.g. cross validation.
save – Saves pipeline at file path.
score – Evaluate model performance on current and additional objectives.
summary – A short summary of the pipeline structure, describing the list of components used.
transform – Transform the input.
-
can_tune_threshold_with_objective
(self, objective)¶ Determine whether the threshold of a binary classification pipeline can be tuned.
- Parameters
objective (ObjectiveBase) – Primary AutoMLSearch objective.
- Returns
True if the pipeline threshold can be tuned.
- Return type
bool
-
property
classes_
(self)¶ Gets the class names for the problem.
-
clone
(self)¶ Constructs a new pipeline with the same components, parameters, and random state.
- Returns
A new instance of this pipeline with identical components, parameters, and random state.
-
compute_estimator_features
(self, X, y=None, X_train=None, y_train=None)¶ Transforms the data by applying all pre-processing components.
- Parameters
X (pd.DataFrame) – Input data to the pipeline to transform.
y (pd.Series) – Targets corresponding to X. Defaults to None.
X_train (pd.DataFrame) – Training data used to generate features from past observations.
y_train (pd.Series) – Training targets used to generate features from past observations.
- Returns
New transformed features.
- Return type
pd.DataFrame
-
static
create_objectives
(objectives)¶
-
property
custom_name
(self)¶ Custom name of the pipeline.
-
describe
(self, return_dict=False)¶ Outputs pipeline details including component parameters.
- Parameters
return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.
- Returns
Dictionary of all component parameters if return_dict is True, else None.
- Return type
dict
-
property
feature_importance
(self)¶ Importance associated with each feature. Features dropped by the feature selection are excluded.
- Returns
pd.DataFrame including feature names and their corresponding importance
-
fit
(self, X, y)¶ Fit a time series classification pipeline.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features].
y (pd.Series, np.ndarray) – The training targets of length [n_samples].
- Returns
self
-
get_component
(self, name)¶ Returns component by name.
- Parameters
name (str) – Name of the component.
- Returns
The component with the given name.
- Return type
Component
-
get_hyperparameter_ranges
(self, custom_hyperparameters)¶ Returns hyperparameter ranges from all components as a dictionary.
- Parameters
custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.
- Returns
Dictionary of hyperparameter ranges for each component in the pipeline.
- Return type
dict
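get_hyperparameter_ranges gathers each component's search space into one dictionary keyed by component name, with custom_hyperparameters taking precedence. A minimal sketch of that merge using plain lists and tuples as ranges; the component names, range values, and the per-parameter override behaviour shown are assumptions for illustration, not evalml's actual defaults or merge logic.

```python
def merge_hyperparameter_ranges(default_ranges, custom_hyperparameters=None):
    """Combine per-component default ranges with user overrides.

    Hypothetical helper; the defaults dictionary is not modified.
    """
    custom_hyperparameters = custom_hyperparameters or {}
    merged = {}
    for component, ranges in default_ranges.items():
        merged[component] = dict(ranges)
        # Custom values replace the defaults for the same component/parameter.
        merged[component].update(custom_hyperparameters.get(component, {}))
    return merged


defaults = {
    "Imputer": {"numeric_impute_strategy": ["mean", "median"]},
    "Random Forest Classifier": {"n_estimators": (10, 1000), "max_depth": (1, 10)},
}
custom = {"Random Forest Classifier": {"max_depth": (3, 5)}}
ranges = merge_hyperparameter_ranges(defaults, custom)
print(ranges["Random Forest Classifier"])
# {'n_estimators': (10, 1000), 'max_depth': (3, 5)}
```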
-
graph
(self, filepath=None)¶ Generate an image representing the pipeline graph.
- Parameters
filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
- Returns
Graph object that can be directly displayed in Jupyter notebooks.
- Return type
graphviz.Digraph
-
graph_feature_importance
(self, importance_threshold=0)¶ Generate a bar graph of the pipeline’s feature importance.
- Parameters
importance_threshold (float, optional) – If provided, graph only features whose feature importance has an absolute value larger than importance_threshold. Defaults to zero.
- Returns
A bar graph showing features and their corresponding importance.
- Return type
plotly.Figure
-
inverse_transform
(self, y)¶ Apply component inverse_transform methods to estimator predictions in reverse order.
Components that implement inverse_transform include PolynomialDetrender; LabelEncoder support is to be determined.
- Parameters
y (pd.Series) – Final component features.
-
static
load
(file_path)¶ Loads a pipeline from a file path.
- Parameters
file_path (str) – Location of the file to load.
- Returns
PipelineBase object.
-
property
model_family
(self)¶ Returns model family of this pipeline.
-
property
name
(self)¶ Name of the pipeline.
-
new
(self, parameters, random_seed=0)¶ Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
Not to be confused with Python’s __new__ method.
- Parameters
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.
random_seed (int) – Seed for the random number generator. Defaults to 0.
- Returns
A new instance of this pipeline with identical components.
-
property
parameters
(self)¶ Parameter dictionary for this pipeline.
- Returns
Dictionary of all component parameters.
- Return type
dict
-
predict
(self, X, objective=None, X_train=None, y_train=None)¶ Predict on future data where target is not known.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
objective (ObjectiveBase, str, None) – The objective to use to make predictions. In classification problems it is used to threshold the predictions. Defaults to None.
X_train (pd.DataFrame or np.ndarray or None) – Data the pipeline was trained on, used to generate features from past observations.
y_train (pd.Series or None) – Targets used to train the pipeline, used to generate features from past observations.
-
predict_in_sample
(self, X, y, X_train, y_train, objective=None)¶ Predict on future data where the target is known, e.g. cross validation.
- Parameters
X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].
y (pd.Series, np.ndarray) – Future target of shape [n_samples].
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
objective (ObjectiveBase, str, None) – Objective used to threshold predicted probabilities, optional.
- Returns
Estimated labels.
- Return type
pd.Series
-
predict_proba
(self, X, X_train=None, y_train=None)¶ Predict on future data where the target is unknown.
- Parameters
X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
- Returns
Estimated probabilities.
- Return type
pd.DataFrame
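For a multiclass pipeline, each row of the probability output holds one value per class, and the predicted label is the class with the highest probability. A minimal sketch of that argmax step on plain lists, assuming the column order matches the classes_ list; evalml itself works with pandas objects, which this sketch does not use, and the class labels are made up.

```python
def labels_from_proba(proba_rows, classes):
    """Pick the highest-probability class for each row."""
    labels = []
    for row in proba_rows:
        if len(row) != len(classes):
            raise ValueError("each row needs one probability per class")
        # Index of the largest probability in this row.
        best_index = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best_index])
    return labels


classes = ["low", "medium", "high"]
proba = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
print(labels_from_proba(proba, classes))  # ['low', 'high', 'medium']
```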
-
predict_proba_in_sample
(self, X_holdout, y_holdout, X_train, y_train)¶ Predict on future data where the target is known, e.g. cross validation.
- Parameters
X_holdout (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].
y_holdout (pd.Series, np.ndarray) – Future target of shape [n_samples].
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
- Returns
Estimated probabilities.
- Return type
pd.DataFrame
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves the pipeline to a file path.
- Parameters
file_path (str) – Location to save the file.
pickle_protocol (int) – The pickle data stream format. Defaults to cloudpickle.DEFAULT_PROTOCOL.
- Returns
None
-
score
(self, X, y, objectives, X_train=None, y_train=None)¶ Evaluate model performance on current and additional objectives.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – True labels of length [n_samples].
objectives (list) – Non-empty list of objectives to score on.
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
- Returns
Ordered dictionary of objective scores.
- Return type
dict
-
property
summary
(self)¶ A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
-
transform
(self, X, y=None)¶ Transform the input.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples]. Defaults to None.
- Returns
Transformed output.
- Return type
pd.DataFrame
-
class
evalml.pipelines.
TimeSeriesRegressionPipeline
(component_graph, parameters=None, custom_name=None, random_seed=0)[source]¶ Pipeline base class for time series regression problems.
- Parameters
component_graph (list or dict) – List of components in order. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component’s index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names [“Imputer”, “One Hot Encoder”, “Imputer_2”, “Logistic Regression Classifier”].
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary {} implies using all default values for component parameters. Pipeline-level parameters such as date_index, gap, and max_delay must be specified with the “pipeline” key. For example: Pipeline(parameters={“pipeline”: {“date_index”: “Date”, “max_delay”: 4, “gap”: 2}}).
custom_name (str) – Custom name for the pipeline. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
problem_type
ProblemTypes.TIME_SERIES_REGRESSION
Methods
can_tune_threshold_with_objective – Determine whether the threshold of a binary classification pipeline can be tuned.
clone – Constructs a new pipeline with the same components, parameters, and random state.
compute_estimator_features – Transforms the data by applying all pre-processing components.
custom_name – Custom name of the pipeline.
describe – Outputs pipeline details including component parameters.
feature_importance – Importance associated with each feature. Features dropped by the feature selection are excluded.
fit – Fit a time series pipeline.
get_component – Returns component by name.
get_hyperparameter_ranges – Returns hyperparameter ranges from all components as a dictionary.
graph – Generate an image representing the pipeline graph.
graph_feature_importance – Generate a bar graph of the pipeline’s feature importance.
inverse_transform – Apply component inverse_transform methods to estimator predictions in reverse order.
load – Loads pipeline at file path.
model_family – Returns model family of this pipeline.
name – Name of the pipeline.
new – Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
parameters – Parameter dictionary for this pipeline.
predict – Predict on future data where target is not known.
predict_in_sample – Predict on future data where the target is known, e.g. cross validation.
save – Saves pipeline at file path.
score – Evaluate model performance on current and additional objectives.
summary – A short summary of the pipeline structure, describing the list of components used.
transform – Transform the input.
-
can_tune_threshold_with_objective
(self, objective)¶ Determine whether the threshold of a binary classification pipeline can be tuned.
- Parameters
objective – Primary AutoMLSearch objective.
-
clone
(self)¶ Constructs a new pipeline with the same components, parameters, and random state.
- Returns
A new instance of this pipeline with identical components, parameters, and random state.
-
compute_estimator_features
(self, X, y=None, X_train=None, y_train=None)¶ Transforms the data by applying all pre-processing components.
- Parameters
X (pd.DataFrame) – Input data to the pipeline to transform.
y (pd.Series) – Targets corresponding to the pipeline targets.
X_train (pd.DataFrame) – Training data used to generate features from past observations.
y_train (pd.Series) – Training targets used to generate features from past observations.
- Returns
New transformed features.
- Return type
pd.DataFrame
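X_train and y_train are needed because delayed (lagged) features for the first rows of X reach back into the training period. The sketch below is a simplified, hypothetical illustration in plain Python: lists instead of DataFrames, target lags only, and no gap handling. The function delayed_target_features and the column names target_delay_<d> are assumptions, not necessarily what evalml's delayed feature transformer produces.

```python
def delayed_target_features(y_train, y, max_delay):
    """For each row of y, collect the target values 1..max_delay steps earlier.

    y_train is prepended so that the earliest rows of y can still look back
    into the training period; positions before the start of y_train are None.
    """
    history = list(y_train) + list(y)
    offset = len(y_train)
    rows = []
    for i in range(len(y)):
        t = offset + i  # position of this row in the combined history
        row = {}
        for d in range(1, max_delay + 1):
            row[f"target_delay_{d}"] = history[t - d] if t - d >= 0 else None
        rows.append(row)
    return rows


rows = delayed_target_features(y_train=[10, 11, 12], y=[13, 14], max_delay=2)
print(rows)
# [{'target_delay_1': 12, 'target_delay_2': 11}, {'target_delay_1': 13, 'target_delay_2': 12}]
```

Without y_train, the first row would have no history at all. Note that the second row uses the first holdout target as a feature, which is only possible when the target is known, as in predict_in_sample.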
-
static
create_objectives
(objectives)¶
-
property
custom_name
(self)¶ Custom name of the pipeline.
-
describe
(self, return_dict=False)¶ Outputs pipeline details including component parameters
- Parameters
return_dict (bool) – If True, return dictionary of information about pipeline. Defaults to False.
- Returns
Dictionary of all component parameters if return_dict is True, else None
- Return type
dict
-
property
feature_importance
(self)¶ Importance associated with each feature. Features dropped by the feature selection are excluded.
- Returns
pd.DataFrame including feature names and their corresponding importance
-
fit
(self, X, y)¶ Fit a time series pipeline.
- Parameters
X (pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features].
y (pd.Series, np.ndarray) – The training targets of length [n_samples].
- Returns
self
-
get_component
(self, name)¶ Returns component by name
- Parameters
name (str) – Name of component
- Returns
Component to return
- Return type
Component
-
get_hyperparameter_ranges
(self, custom_hyperparameters)¶ Returns hyperparameter ranges from all components as a dictionary.
- Parameters
custom_hyperparameters (dict) – Custom hyperparameters for the pipeline.
- Returns
Dictionary of hyperparameter ranges for each component in the pipeline.
- Return type
dict
-
graph
(self, filepath=None)¶ Generate an image representing the pipeline graph.
- Parameters
filepath (str, optional) – Path to where the graph should be saved. If set to None (as by default), the graph will not be saved.
- Returns
Graph object that can be directly displayed in Jupyter notebooks.
- Return type
graphviz.Digraph
-
graph_feature_importance
(self, importance_threshold=0)¶ Generate a bar graph of the pipeline’s feature importance
- Parameters
importance_threshold (float, optional) – If provided, graph features with a feature importance whose absolute value is larger than importance_threshold. Defaults to zero.
- Returns
plotly.Figure, a bar graph showing features and their corresponding importance
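A rough sketch of the filtering that importance_threshold implies: keep features whose absolute importance is larger than the threshold, then order them for a bar chart. A plain dict stands in for the feature_importance DataFrame, the plotly figure is omitted, and the feature names and scores are made up. The function filter_importances is hypothetical, and rejecting negative thresholds is a choice of this sketch.

```python
def filter_importances(importances, importance_threshold=0.0):
    """Keep features whose absolute importance is larger than the threshold,
    sorted from most to least important by absolute value."""
    if importance_threshold < 0:
        # This sketch rejects negative thresholds.
        raise ValueError("importance_threshold must be non-negative")
    kept = {
        name: value
        for name, value in importances.items()
        if abs(value) > importance_threshold
    }
    return sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True)


importances = {"lag_1": 0.42, "month": -0.05, "day_of_week": 0.18, "noise": 0.01}
print(filter_importances(importances, importance_threshold=0.1))
# [('lag_1', 0.42), ('day_of_week', 0.18)]
```

Sorting by absolute value keeps strongly negative importances near the top, consistent with the threshold being applied to the absolute value.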
-
inverse_transform
(self, y)¶ Apply component inverse_transform methods to estimator predictions in reverse order.
Components that implement inverse_transform include PolynomialDetrender and LabelEncoder (tbd).
- Parameters
y (pd.Series) – Final component features
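Reverse order matters because each target transformation has to be undone in the opposite order from the one in which it was applied. AddConstant and Scale below are hypothetical toy stand-ins for target-modifying components such as a detrender, and inverse_transform_predictions is a hypothetical helper mirroring the documented behavior.

```python
class AddConstant:
    """Toy target transformer: shifts values by a constant."""

    def __init__(self, constant):
        self.constant = constant

    def transform(self, y):
        return [v + self.constant for v in y]

    def inverse_transform(self, y):
        return [v - self.constant for v in y]


class Scale:
    """Toy target transformer: multiplies values by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, y):
        return [v * self.factor for v in y]

    def inverse_transform(self, y):
        return [v / self.factor for v in y]


def inverse_transform_predictions(components, predictions):
    """Undo target transformations, last-applied first; skip components without an inverse."""
    for component in reversed(components):
        if hasattr(component, "inverse_transform"):
            predictions = component.inverse_transform(predictions)
    return predictions


components = [AddConstant(1), Scale(2)]
y = [1.0, 2.0]
transformed = components[1].transform(components[0].transform(y))  # [4.0, 6.0]
print(inverse_transform_predictions(components, transformed))
# [1.0, 2.0]
```

Undoing the steps in forward order instead (subtract 1, then divide by 2) would give [1.5, 2.5], not the original values.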
-
static
load
(file_path)¶ Loads pipeline at file path
- Parameters
file_path (str) – Location to load file from.
- Returns
PipelineBase object
-
property
model_family
(self)¶ Returns model family of this pipeline.
-
property
name
(self)¶ Name of the pipeline.
-
new
(self, parameters, random_seed=0)¶ Constructs a new instance of the pipeline with the same component graph but with a different set of parameters.
Not to be confused with Python’s __new__ method.
- Parameters
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.
random_seed (int) – Seed for the random number generator. Defaults to 0.
- Returns
A new instance of this pipeline with identical components.
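The difference between clone and new can be summarized with a toy class. ToyPipeline is hypothetical and models only the documented contract: clone keeps the component graph, parameters, and random seed, while new keeps the component graph but takes fresh parameters and a random seed that defaults to 0.

```python
import copy


class ToyPipeline:
    """Hypothetical stand-in that models only the clone/new contract."""

    def __init__(self, component_graph, parameters=None, random_seed=0):
        self.component_graph = list(component_graph)
        # Deep copy so instances never share mutable parameter dicts.
        self.parameters = copy.deepcopy(parameters or {})
        self.random_seed = random_seed

    def new(self, parameters, random_seed=0):
        # Same component graph, different parameters and seed.
        return self.__class__(self.component_graph, parameters=parameters, random_seed=random_seed)

    def clone(self):
        # Same component graph, parameters, and seed.
        return self.new(self.parameters, random_seed=self.random_seed)


original = ToyPipeline(
    ["Imputer", "XGBoost Regressor"],
    parameters={"XGBoost Regressor": {"max_depth": 4}},
    random_seed=42,
)
copy_of_original = original.clone()
variant = original.new({"XGBoost Regressor": {"max_depth": 8}})
print(copy_of_original.random_seed, variant.random_seed)
# 42 0
```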
-
property
parameters
(self)¶ Parameter dictionary for this pipeline.
- Returns
Dictionary of all component parameters.
- Return type
dict
-
predict
(self, X, objective=None, X_train=None, y_train=None)¶ Predict on future data where target is not known.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
objective (ObjectiveBase, str, or None) – The objective to use to make predictions. Defaults to None.
X_train (pd.DataFrame or np.ndarray or None) – Training data.
y_train (pd.Series or None) – Training targets.
-
predict_in_sample
(self, X, y, X_train, y_train, objective=None)¶ Predict on future data where the target is known, e.g. cross validation.
- Parameters
X (pd.DataFrame or np.ndarray) – Future data of shape [n_samples, n_features].
y (pd.Series, np.ndarray) – Future target of shape [n_samples].
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train]
objective (ObjectiveBase, str, None) – Objective used to threshold predicted probabilities, optional.
- Returns
Estimated labels
- Return type
pd.Series
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves pipeline at file path
- Parameters
file_path (str) – Location to save file.
pickle_protocol (int) – The pickle data stream format.
- Returns
None
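save and load are pickle-based, with cloudpickle's protocol as the default. The round trip below uses the standard-library pickle module as a stand-in for cloudpickle (cloudpickle can additionally serialize objects such as lambdas and locally defined classes that plain pickle cannot). The save_object and load_object helpers and the dict payload are hypothetical.

```python
import os
import pickle
import tempfile


def save_object(obj, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize obj to file_path with the given pickle protocol."""
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)


def load_object(file_path):
    """Deserialize and return the object stored at file_path."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


payload = {"component_graph": ["Imputer", "XGBoost Regressor"], "random_seed": 0}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.pkl")
    save_object(payload, path)
    restored = load_object(path)
print(restored == payload)
# True
```

Unpickling can execute arbitrary code, so only load files from trusted sources.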
-
score
(self, X, y, objectives, X_train=None, y_train=None)[source]¶ Evaluate model performance on current and additional objectives.
- Parameters
X (pd.DataFrame or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – True labels of length [n_samples].
objectives (list) – Non-empty list of objectives to score on.
X_train (pd.DataFrame, np.ndarray) – Data the pipeline was trained on of shape [n_samples_train, n_features].
y_train (pd.Series, np.ndarray) – Targets used to train the pipeline of shape [n_samples_train].
- Returns
Ordered dictionary of objective scores.
- Return type
dict
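score returns scores keyed by objective, in the order the objectives were given. The minimal sketch below shows only that shape. The two hand-written regression metrics and the score_predictions helper are hypothetical stand-ins for evalml objective objects, which the real method also accepts by name.

```python
from collections import OrderedDict


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def score_predictions(y_true, y_pred, objectives):
    """Return an OrderedDict mapping objective name -> score, in the given order."""
    if not objectives:
        raise ValueError("objectives must be a non-empty list")
    scores = OrderedDict()
    for name, objective in objectives:
        scores[name] = objective(y_true, y_pred)
    return scores


scores = score_predictions(
    y_true=[3.0, 5.0, 7.0],
    y_pred=[2.0, 5.0, 9.0],
    objectives=[("MSE", mean_squared_error), ("MAE", mean_absolute_error)],
)
print(list(scores.items()))
# [('MSE', 1.6666666666666667), ('MAE', 1.0)]
```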
-
property
summary
(self)¶ A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder
-
transform
(self, X, y=None)¶ Transform the input.
- Parameters
X (pd.DataFrame, or np.ndarray) – Data of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples]. Defaults to None.
- Returns
Transformed output.
- Return type
pd.DataFrame
-
class
evalml.pipelines.
Transformer
(parameters=None, component_obj=None, random_seed=0, **kwargs)[source]¶ A component that transforms data and may or may not need fitting. These components are used before an estimator.
To implement a new Transformer, define your own class which is a subclass of Transformer, including a name and a list of acceptable ranges for any parameters to be tuned during the automl search (hyperparameters). Define an __init__ method which sets up any necessary state and objects. Make sure your __init__ only uses standard keyword arguments and calls super().__init__() with a parameters dict. You may also override the fit, transform, fit_transform and other methods in this class if appropriate.
To see some examples, check out the definitions of any Transformer component.
- Parameters
parameters (dict) – Dictionary of parameters for the component. Defaults to None.
component_obj (obj) – Third-party objects useful in component implementation. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
Attributes
model_family
ModelFamily.NONE
modifies_features
True
modifies_target
False
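The recipe for a new Transformer can be sketched as a subclass with a name, hyperparameter_ranges, an __init__ that takes standard keyword arguments and passes a parameters dict to super().__init__(), and overridden fit and transform. To keep the example self-contained, MinimalTransformer is a hypothetical stand-in for the Transformer base class that implements only what the example needs. FillWithTrainingMean is also hypothetical, and lists of dicts stand in for DataFrames. With evalml installed you would subclass evalml.pipelines.Transformer instead.

```python
class MinimalTransformer:
    """Hypothetical stand-in for evalml's Transformer base class (sketch only)."""

    name = "Minimal Transformer"
    hyperparameter_ranges = {}

    def __init__(self, parameters=None, component_obj=None, random_seed=0, **kwargs):
        self._parameters = dict(parameters or {})
        self._component_obj = component_obj
        self.random_seed = random_seed

    @property
    def parameters(self):
        """Return a copy of the parameters used to initialize the component."""
        return dict(self._parameters)

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        raise NotImplementedError

    def fit_transform(self, X, y=None):
        # Fits on X, then transforms X.
        return self.fit(X, y).transform(X, y)


class FillWithTrainingMean(MinimalTransformer):
    """Hypothetical transformer: fills missing values in one column with the training mean."""

    name = "Fill With Training Mean"
    hyperparameter_ranges = {}  # no tunable hyperparameters in this sketch

    def __init__(self, column="value", random_seed=0, **kwargs):
        # Standard keyword arguments only, collected into a parameters dict.
        parameters = {"column": column}
        parameters.update(kwargs)
        super().__init__(parameters=parameters, component_obj=None, random_seed=random_seed)
        self._mean = None

    def fit(self, X, y=None):
        column = self.parameters["column"]
        observed = [row[column] for row in X if row[column] is not None]
        self._mean = sum(observed) / len(observed)
        return self

    def transform(self, X, y=None):
        column = self.parameters["column"]
        return [
            {**row, column: self._mean if row[column] is None else row[column]}
            for row in X
        ]


X = [{"value": 1.0}, {"value": None}, {"value": 3.0}]
print(FillWithTrainingMean().fit_transform(X))
# [{'value': 1.0}, {'value': 2.0}, {'value': 3.0}]
```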
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Fits component to data
Fits on X and transforms X
Loads component at file path
Returns string name of this component
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
Returns the parameters which were used to initialize the component
Saves component at file path
Transforms data X.
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
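One way to honor the convention Component.default_parameters == Component().parameters is to read the keyword defaults directly from __init__, so the two cannot drift apart. This is a sketch under that assumption, using the standard inspect module. ToyComponent is hypothetical, excludes random_seed because this sketch does not store it in parameters, and exposes default_parameters as a classmethod rather than a class-level property.

```python
import inspect


class ToyComponent:
    """Hypothetical component whose parameters mirror its __init__ keywords."""

    def __init__(self, eta=0.1, max_depth=6, n_estimators=100, random_seed=0):
        # random_seed is kept separately and not stored in parameters (sketch choice).
        self.parameters = {"eta": eta, "max_depth": max_depth, "n_estimators": n_estimators}
        self.random_seed = random_seed

    @classmethod
    def default_parameters(cls):
        """Collect keyword defaults from __init__, skipping self and random_seed."""
        signature = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in signature.parameters.items()
            if name not in ("self", "random_seed")
            and param.default is not inspect.Parameter.empty
        }


print(ToyComponent.default_parameters())
# {'eta': 0.1, 'max_depth': 6, 'n_estimators': 100}
print(ToyComponent.default_parameters() == ToyComponent().parameters)
# True
```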
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – Whether to print the name of the component. Defaults to False.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}. Defaults to False.
- Returns
Prints the component description and, if return_dict is True, returns it as a dictionary.
- Return type
None or dict
-
fit
(self, X, y=None)¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
fit_transform
(self, X, y=None)[source]¶ Fits on X and transforms X
- Parameters
X (pd.DataFrame) – Data to fit and transform
y (pd.Series) – Target data
- Returns
Transformed X
- Return type
pd.DataFrame
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
property
name
(cls)¶ Returns string name of this component
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
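needs_fitting lets a component opt out of the fit-before-use check. The hypothetical sketch below shows how such a check might gate transform, together with a stateless component that overrides needs_fitting to return False because its fit does nothing. The NotFittedError class, the _is_fitted flag, and both component classes are assumptions for illustration.

```python
class NotFittedError(Exception):
    """Raised when a component that needs fitting is used before fit."""


class GuardedComponent:
    def __init__(self):
        self._is_fitted = False

    def needs_fitting(self):
        return True

    def fit(self, X, y=None):
        self._is_fitted = True
        return self

    def transform(self, X, y=None):
        # Only enforce the fit-first rule for components that need it.
        if self.needs_fitting() and not self._is_fitted:
            raise NotFittedError(f"{type(self).__name__} must be fit before calling transform.")
        return self._transform(X)

    def _transform(self, X):
        return X


class Lowercase(GuardedComponent):
    """Stateless transformer: fit does nothing, so fitting is not required."""

    def needs_fitting(self):
        return False

    def _transform(self, X):
        return [s.lower() for s in X]


print(Lowercase().transform(["ABC", "Def"]))
# ['abc', 'def']
try:
    GuardedComponent().transform(["x"])
except NotFittedError as err:
    print(err)
# GuardedComponent must be fit before calling transform.
```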
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
XGBoostClassifier
(eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, random_seed=0, eval_metric='logloss', n_jobs=12, **kwargs)[source]¶ XGBoost Classifier.
- Parameters
eta (float) – Boosting learning rate. Defaults to 0.1.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0
n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.
random_seed (int) – Seed for the random number generator. Defaults to 0.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to 12.
Attributes
hyperparameter_ranges
{ “eta”: Real(0.000001, 1), “max_depth”: Integer(1, 10), “min_child_weight”: Real(1, 10), “n_estimators”: Integer(1, 1000),}
model_family
ModelFamily.XGBOOST
modifies_features
True
modifies_target
False
name
XGBoost Classifier
predict_uses_y
False
SEED_MAX
None
SEED_MIN
None
supported_problem_types
[ ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.TIME_SERIES_BINARY, ProblemTypes.TIME_SERIES_MULTICLASS,]
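hyperparameter_ranges tells AutoML search which values it may try. Real and Integer come from scikit-optimize (skopt.space). The sketch below avoids that dependency by representing each range as a hypothetical (kind, low, high) tuple and drawing one candidate configuration uniformly at random with a fixed seed. The sample_parameters helper is hypothetical, and evalml's tuners may use other search strategies than uniform sampling.

```python
import random

# The XGBoost Classifier ranges listed above, as (kind, low, high) tuples.
XGBOOST_CLASSIFIER_RANGES = {
    "eta": ("real", 0.000001, 1.0),
    "max_depth": ("integer", 1, 10),
    "min_child_weight": ("real", 1.0, 10.0),
    "n_estimators": ("integer", 1, 1000),
}


def sample_parameters(ranges, seed=0):
    """Draw one configuration uniformly from each range (inclusive bounds)."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    sampled = {}
    for name, (kind, low, high) in ranges.items():
        if kind == "integer":
            sampled[name] = rng.randint(low, high)
        else:
            sampled[name] = rng.uniform(low, high)
    return sampled


params = sample_parameters(XGBOOST_CLASSIFIER_RANGES, seed=0)
print(params)
```

The XGBoost Regressor ranges differ only in allowing max_depth up to 20, so the same approach applies there with that bound changed.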
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – Whether to print the name of the component. Defaults to False.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}. Defaults to False.
- Returns
Prints the component description and, if return_dict is True, returns it as a dictionary.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)[source]¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.DataFrame
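For a classifier, predict_proba returns one column of probabilities per class, and a label prediction typically picks the most probable class. The plain-Python sketch below shows that relationship on made-up probabilities. It is a generic illustration with a hypothetical helper predict_from_proba, not evalml's code path; binary classification pipelines, for instance, can instead apply a tuned decision threshold.

```python
def predict_from_proba(proba_rows, classes):
    """Pick the class with the highest probability in each row."""
    predictions = []
    for row in proba_rows:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of probabilities should sum to 1")
        best_index = max(range(len(row)), key=lambda i: row[i])
        predictions.append(classes[best_index])
    return predictions


proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]
print(predict_from_proba(proba, classes=["low", "medium", "high"]))
# ['low', 'high']
```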
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None
-
class
evalml.pipelines.
XGBoostRegressor
(eta=0.1, max_depth=6, min_child_weight=1, n_estimators=100, random_seed=0, n_jobs=12, **kwargs)[source]¶ XGBoost Regressor.
- Parameters
eta (float) – Boosting learning rate. Defaults to 0.1.
max_depth (int) – Maximum tree depth for base learners. Defaults to 6.
min_child_weight (float) – Minimum sum of instance weight (hessian) needed in a child. Defaults to 1.0
n_estimators (int) – Number of gradient boosted trees. Equivalent to number of boosting rounds. Defaults to 100.
random_seed (int) – Seed for the random number generator. Defaults to 0.
n_jobs (int) – Number of parallel threads used to run xgboost. Note that creating thread contention will significantly slow down the algorithm. Defaults to 12.
Attributes
hyperparameter_ranges
{ “eta”: Real(0.000001, 1), “max_depth”: Integer(1, 20), “min_child_weight”: Real(1, 10), “n_estimators”: Integer(1, 1000),}
model_family
ModelFamily.XGBOOST
modifies_features
True
modifies_target
False
name
XGBoost Regressor
predict_uses_y
False
SEED_MAX
None
SEED_MIN
None
supported_problem_types
[ ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION,]
Methods
Constructs a new component with the same parameters and random state.
Returns the default parameters for this component.
Describe a component and its parameters
Returns importance associated with each feature.
Fits component to data
Loads component at file path
Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance.
Returns the parameters which were used to initialize the component
Make predictions using selected features.
Make probability estimates for labels.
Saves component at file path
-
clone
(self)¶ Constructs a new component with the same parameters and random state.
- Returns
A new instance of this component with identical parameters and random state.
-
default_parameters
(cls)¶ Returns the default parameters for this component.
Our convention is that Component.default_parameters == Component().parameters.
- Returns
default parameters for this component.
- Return type
dict
-
describe
(self, print_name=False, return_dict=False)¶ Describe a component and its parameters
- Parameters
print_name (bool, optional) – Whether to print the name of the component. Defaults to False.
return_dict (bool, optional) – Whether to return the description as a dictionary in the format {“name”: name, “parameters”: parameters}. Defaults to False.
- Returns
Prints the component description and, if return_dict is True, returns it as a dictionary.
- Return type
None or dict
-
property
feature_importance
(self)¶ Returns importance associated with each feature.
- Returns
Importance associated with each feature
- Return type
np.ndarray
-
fit
(self, X, y=None)[source]¶ Fits component to data
- Parameters
X (list, pd.DataFrame or np.ndarray) – The input training data of shape [n_samples, n_features]
y (list, pd.Series, np.ndarray, optional) – The target training data of length [n_samples]
- Returns
self
-
static
load
(file_path)¶ Loads component at file path
- Parameters
file_path (str) – Location to load file
- Returns
ComponentBase object
-
needs_fitting
(self)¶ Returns boolean determining if component needs fitting before calling predict, predict_proba, transform, or feature_importance. This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.
-
property
parameters
(self)¶ Returns the parameters which were used to initialize the component
-
predict
(self, X)[source]¶ Make predictions using selected features.
- Parameters
X (pd.DataFrame, np.ndarray) – Data of shape [n_samples, n_features]
- Returns
Predicted values
- Return type
pd.Series
-
predict_proba
(self, X)¶ Make probability estimates for labels.
- Parameters
X (pd.DataFrame, or np.ndarray) – Features
- Returns
Probability estimates
- Return type
pd.DataFrame
-
save
(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)¶ Saves component at file path
- Parameters
file_path (str) – Location to save file
pickle_protocol (int) – The pickle data stream format.
- Returns
None