automl_algorithm

Package Contents

Classes Summary

`AutoMLAlgorithm`	Base class for the AutoML algorithms which power EvalML.
`EvalMLAlgorithm`	An automl algorithm that consists of two modes: fast and long, where fast is a subset of long.
`IterativeAlgorithm`	An automl algorithm which first fits a base round of pipelines with default parameters, then does a round of parameter tuning on each pipeline in order of performance.

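Exceptions Summary

`AutoMLAlgorithmException`	Exception raised when an error is encountered during the computation of the automl algorithm.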

Contents

class evalml.automl.automl_algorithm.AutoMLAlgorithm(allowed_pipelines=None, custom_hyperparameters=None, max_iterations=None, tuner_class=None, random_seed=0)

Base class for the AutoML algorithms which power EvalML.

This class represents an automated machine learning (AutoML) algorithm. It encapsulates the decision-making logic behind an AutoML search: which pipelines to evaluate next, and which set of parameters to configure each pipeline with.

To use this interface, you must define a next_batch method which returns the next group of pipelines to evaluate on the training data. That method may access state and results recorded from the previous batches, although that information is not tracked in a general way in this base class. Overriding add_result is a convenient way to record pipeline evaluation info if necessary.
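The contract described above can be sketched without EvalML installed. The following is a minimal illustration, not EvalML's implementation: `SketchAlgorithm` and `BestFirstAlgorithm` are hypothetical stand-ins, pipelines are represented by plain strings, and the counter bookkeeping is an assumption about how `batch_number` and `pipeline_number` might be maintained.

```python
from abc import ABC, abstractmethod


class SketchAlgorithm(ABC):
    """Hypothetical stand-in for the base class (not EvalML's code)."""

    def __init__(self, allowed_pipelines=None, max_iterations=None):
        self.allowed_pipelines = list(allowed_pipelines or [])
        self.max_iterations = max_iterations
        self._batch_number = 0
        self._pipeline_number = 0

    @abstractmethod
    def next_batch(self):
        """Return the next list of pipelines to evaluate."""

    def add_result(self, score_to_minimize, pipeline, trained_pipeline_results):
        """Hook for recording results; this base version records nothing."""

    @property
    def batch_number(self):
        return self._batch_number

    @property
    def pipeline_number(self):
        return self._pipeline_number


class BestFirstAlgorithm(SketchAlgorithm):
    """Proposes every allowed pipeline once, then re-proposes the best one."""

    def __init__(self, allowed_pipelines=None, max_iterations=None):
        super().__init__(allowed_pipelines, max_iterations)
        self._results = []  # (score_to_minimize, pipeline) pairs

    def add_result(self, score_to_minimize, pipeline, trained_pipeline_results):
        # Overridden to keep the state that next_batch relies on.
        self._results.append((score_to_minimize, pipeline))

    def next_batch(self):
        if self._batch_number == 0:
            batch = list(self.allowed_pipelines)
        elif self._results:
            # Lower score_to_minimize is better, so min() picks the best.
            batch = [min(self._results, key=lambda r: r[0])[1]]
        else:
            batch = []
        self._batch_number += 1
        self._pipeline_number += len(batch)
        return batch


algo = BestFirstAlgorithm(allowed_pipelines=["linear", "forest"])
first = algo.next_batch()
algo.add_result(0.4, "linear", {})
algo.add_result(0.2, "forest", {})
second = algo.next_batch()
```

Here the subclass overrides `add_result` to record scores, which is what lets the second call to `next_batch` depend on the outcome of the first batch.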

Parameters

allowed_pipelines (list(class)) – A list of PipelineBase subclasses indicating the pipelines allowed in the search. The default of None indicates all pipelines for this problem type are allowed.
custom_hyperparameters (dict) – Custom hyperparameter ranges specified for pipelines to iterate over.
max_iterations (int) – The maximum number of iterations to be evaluated.
tuner_class (class) – A subclass of Tuner, to be used to find parameters for each pipeline. The default of None indicates the SKOptTuner will be used.
random_seed (int) – Seed for the random number generator. Defaults to 0.

Methods

`add_result`	Register results from evaluating a pipeline
`batch_number`	Returns the number of batches which have been recommended so far.
`next_batch`	Get the next batch of pipelines to evaluate
`pipeline_number`	Returns the number of pipelines which have been recommended so far.

add_result(self, score_to_minimize, pipeline, trained_pipeline_results)

Register results from evaluating a pipeline.

Parameters

score_to_minimize (float) – The score obtained by this pipeline on the primary objective, converted so that lower values indicate better pipelines.
pipeline (PipelineBase) – The trained pipeline object which was used to compute the score.
trained_pipeline_results (dict) – Results from training a pipeline.
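As an illustration of what "converted so that lower values indicate better pipelines" can mean, the sketch below negates scores from objectives where a higher value is better. The helper `to_score_to_minimize` and its `greater_is_better` flag are hypothetical names; the source does not state how EvalML performs the conversion, and negation is only one common convention.

```python
def to_score_to_minimize(score, greater_is_better):
    """Return a value where lower always means a better pipeline.

    Hypothetical helper: negates scores for objectives where higher is better.
    """
    return -score if greater_is_better else score


# An accuracy-like objective (higher is better) is negated.
accuracy_like = to_score_to_minimize(0.9, greater_is_better=True)

# A loss-like objective (lower is better) is passed through unchanged.
loss_like = to_score_to_minimize(0.35, greater_is_better=False)
```

After conversion, scores from either kind of objective can be compared with a plain `<`, which is what allows an algorithm to rank pipelines without knowing the objective's direction.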

property batch_number(self): Returns the number of batches which have been recommended so far.

abstract next_batch(self)

Get the next batch of pipelines to evaluate.

Returns: a list of instances of PipelineBase subclasses, ready to be trained and evaluated.
Return type: list(PipelineBase)

property pipeline_number(self): Returns the number of pipelines which have been recommended so far.

exception evalml.automl.automl_algorithm.AutoMLAlgorithmException: Exception raised when an error is encountered during the computation of the automl algorithm.

class evalml.automl.automl_algorithm.EvalMLAlgorithm(X, y, problem_type, sampler_name, tuner_class=None, random_seed=0, pipeline_params=None, custom_hyperparameters=None, n_jobs=-1, text_in_ensembling=None, top_n=3, num_long_explore_pipelines=50, num_long_pipelines_per_batch=10)

An automl algorithm that consists of two modes: fast and long, where fast is a subset of long.

Naive pipelines:
    1. Run the baseline with the default preprocessing pipeline.
    2. Run a naive linear model with the default preprocessing pipeline.
    3. Run a basic random forest (RF) pipeline with the default preprocessing pipeline.

Naive pipelines with feature selection:
    1. Subsequent pipelines use the selected features through a SelectedColumns transformer.

At this point there is a single pipeline candidate for preprocessing and feature selection.

Pipelines with preprocessing components:
    1. Scan the rest of the estimators (our current batch 1).

First ensembling run.

Fast mode ends here. Long mode begins.

Run the top 3 estimators:
    1. Generate 50 random parameter sets for each estimator. Run all 150 pipelines in one batch.

Second ensembling run.

Repeat the following until a stopping criterion is met:
    1. For each of the previous top 3 estimators, sample 10 parameter sets from the tuner. Run all 30 pipelines in one batch.
    2. Run ensembling.
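The batch sizes quoted for long mode follow from simple multiplication. The sketch below assumes that the figures 3, 50, and 10 correspond to the constructor defaults `top_n`, `num_long_explore_pipelines`, and `num_long_pipelines_per_batch`. That mapping is inferred from the matching default values in the signature and is not stated explicitly in this reference. The function `long_mode_batch_sizes` is a hypothetical helper.

```python
def long_mode_batch_sizes(top_n=3,
                          num_long_explore_pipelines=50,
                          num_long_pipelines_per_batch=10):
    """Return (explore_batch_size, tuning_batch_size) for long mode.

    Hypothetical helper; the parameter-to-step mapping is an assumption.
    """
    # Exploration: random parameter sets for each of the top estimators.
    explore = top_n * num_long_explore_pipelines
    # Each repeated tuning batch: tuner samples for each top estimator.
    tune = top_n * num_long_pipelines_per_batch
    return explore, tune


explore_size, tune_size = long_mode_batch_sizes()
```

With the defaults this gives 150 pipelines in the exploration batch and 30 in each repeated tuning batch, matching the numbers in the description.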

Methods

`add_result`	Register results from evaluating a pipeline. In batch number 2, the column names selected by the feature selector are stored for use in a column selector. Information about the best pipeline is also updated here.
`batch_number`	Returns the number of batches which have been recommended so far.
`next_batch`	Get the next batch of pipelines to evaluate
`pipeline_number`	Returns the number of pipelines which have been recommended so far.

add_result(self, score_to_minimize, pipeline, trained_pipeline_results)

Register results from evaluating a pipeline. In batch number 2, the column names selected by the feature selector are stored for use in a column selector. Information about the best pipeline is also updated here.

Parameters

score_to_minimize (float) – The score obtained by this pipeline on the primary objective, converted so that lower values indicate better pipelines.
pipeline (PipelineBase) – The trained pipeline object which was used to compute the score.
trained_pipeline_results (dict) – Results from training a pipeline.

property batch_number(self): Returns the number of batches which have been recommended so far.

next_batch(self)

Get the next batch of pipelines to evaluate.

Returns: a list of instances of PipelineBase subclasses, ready to be trained and evaluated.
Return type: list(PipelineBase)

property pipeline_number(self): Returns the number of pipelines which have been recommended so far.

class evalml.automl.automl_algorithm.IterativeAlgorithm(allowed_pipelines=None, max_iterations=None, tuner_class=None, random_seed=0, pipelines_per_batch=5, n_jobs=-1, number_features=None, ensembling=False, text_in_ensembling=False, pipeline_params=None, custom_hyperparameters=None, _estimator_family_order=None)

An automl algorithm which first fits a base round of pipelines with default parameters, then does a round of parameter tuning on each pipeline in order of performance.

Parameters

allowed_pipelines (list(class)) – A list of PipelineBase instances indicating the pipelines allowed in the search. The default of None indicates all pipelines for this problem type are allowed.
max_iterations (int) – The maximum number of iterations to be evaluated.
tuner_class (class) – A subclass of Tuner, to be used to find parameters for each pipeline. The default of None indicates the SKOptTuner will be used.
random_seed (int) – Seed for the random number generator. Defaults to 0.
pipelines_per_batch (int) – The number of pipelines to be evaluated in each batch, after the first batch. Defaults to 5.
n_jobs (int or None) – Integer describing the level of parallelism used for pipelines. Defaults to -1.
number_features (int) – The number of columns in the input features. Defaults to None.
ensembling (boolean) – If True, runs ensembling in a separate batch after every allowed pipeline class has been iterated over. Defaults to False.
text_in_ensembling (boolean) – If True and ensembling is True, then n_jobs will be set to 1 to avoid downstream sklearn stacking issues related to nltk. Defaults to False.
pipeline_params (dict or None) – Pipeline-level parameters that should be passed to the proposed pipelines. Defaults to None.
custom_hyperparameters (dict or None) – Custom hyperparameter ranges specified for pipelines to iterate over. Defaults to None.
_estimator_family_order (list(ModelFamily) or None) – Specifies the sort order for the first batch. Defaults to None, which uses _ESTIMATOR_FAMILY_ORDER.
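The two-phase schedule (a default-parameter round, then tuning each pipeline in order of performance) can be sketched as below. This is an illustration under stated assumptions, not EvalML's implementation: `iterative_schedule` is a hypothetical function, pipelines are represented by strings, tuned proposals are placeholder labels rather than real tuner output, and only one tuning pass over the pipelines is shown.

```python
def iterative_schedule(first_batch_scores, pipelines_per_batch=5):
    """Return the batches proposed for one pass of the schedule.

    first_batch_scores maps a pipeline name to the score_to_minimize it
    obtained in the first batch (lower is better). Hypothetical helper.
    """
    # Batch 0: every pipeline, each with its default parameters.
    batches = [list(first_batch_scores)]

    # Rank pipelines from best (lowest score) to worst.
    ranked = sorted(first_batch_scores, key=first_batch_scores.get)

    # One tuning batch per pipeline, visited in order of performance.
    for name in ranked:
        batches.append([f"{name}#tuned{i}" for i in range(pipelines_per_batch)])
    return batches


batches = iterative_schedule(
    {"linear": 0.4, "forest": 0.2, "boosting": 0.3},
    pipelines_per_batch=2,
)
```

In this sketch the first batch contains one entry per pipeline regardless of `pipelines_per_batch`, consistent with the parameter description, which applies that size only to batches after the first.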

Methods

`add_result`	Register results from evaluating a pipeline
`batch_number`	Returns the number of batches which have been recommended so far.
`next_batch`	Get the next batch of pipelines to evaluate
`pipeline_number`	Returns the number of pipelines which have been recommended so far.

add_result(self, score_to_minimize, pipeline, trained_pipeline_results)

Register results from evaluating a pipeline.

Parameters

score_to_minimize (float) – The score obtained by this pipeline on the primary objective, converted so that lower values indicate better pipelines.
pipeline (PipelineBase) – The trained pipeline object which was used to compute the score.
trained_pipeline_results (dict) – Results from training a pipeline.

property batch_number(self): Returns the number of batches which have been recommended so far.

next_batch(self)

Get the next batch of pipelines to evaluate.

Returns: a list of instances of PipelineBase subclasses, ready to be trained and evaluated.
Return type: list(PipelineBase)

property pipeline_number(self): Returns the number of pipelines which have been recommended so far.
