automl_algorithm

Base class for the AutoML algorithms which power EvalML.

Module Contents

Classes Summary

AutoMLAlgorithm

Base class for the AutoML algorithms which power EvalML.

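Exceptions Summary

AutoMLAlgorithmException

Exception raised when an error is encountered during the computation of the automl algorithm.

Contents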

class evalml.automl.automl_algorithm.automl_algorithm.AutoMLAlgorithm(allowed_pipelines=None, allowed_model_families=None, excluded_model_families=None, allowed_component_graphs=None, search_parameters=None, tuner_class=None, text_in_ensembling=False, random_seed=0, n_jobs=-1)

Base class for the AutoML algorithms which power EvalML.

This class represents an automated machine learning (AutoML) algorithm. It encapsulates the decision-making logic behind an AutoML search: which pipelines to evaluate next, and which set of parameters to configure each pipeline with.

To use this interface, you must define a next_batch method which returns the next group of pipelines to evaluate on the training data. num_pipelines_per_batch is also abstract and must be implemented. next_batch may access state and results recorded from previous batches, although that information is not tracked in a general way in this base class. Overriding add_result is a convenient way to record pipeline evaluation info if necessary.
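The subclassing pattern described above can be sketched without installing EvalML. The block below is a minimal illustration, not EvalML's implementation: AlgorithmSketch and OneEachAlgorithm are hypothetical names, the stand-in base class mirrors only the method names documented on this page, and pipelines are represented by plain strings.

```python
from abc import ABC, abstractmethod


class AlgorithmSketch(ABC):
    """Hypothetical stand-in for the base class; not the EvalML implementation."""

    def __init__(self, allowed_pipelines=None):
        self.allowed_pipelines = list(allowed_pipelines or [])
        self._batch_number = 0
        self._pipeline_number = 0

    @abstractmethod
    def next_batch(self):
        """Return the next group of pipelines to evaluate."""

    @abstractmethod
    def num_pipelines_per_batch(self, batch_number):
        """Return the number of pipelines in the given batch."""

    def add_result(self, score_to_minimize, pipeline, trained_pipeline_results):
        """Hook that subclasses may override to record evaluation info."""

    @property
    def batch_number(self):
        return self._batch_number

    @property
    def pipeline_number(self):
        return self._pipeline_number


class OneEachAlgorithm(AlgorithmSketch):
    """Recommends every allowed pipeline once per batch."""

    def num_pipelines_per_batch(self, batch_number):
        return len(self.allowed_pipelines)

    def next_batch(self):
        batch = list(self.allowed_pipelines)
        # Update the counters exposed through the read-only properties.
        self._batch_number += 1
        self._pipeline_number += len(batch)
        return batch


algo = OneEachAlgorithm(allowed_pipelines=["linear", "random_forest"])
first_batch = algo.next_batch()
```

Because both abstract methods are defined, OneEachAlgorithm can be instantiated. Leaving either one out would make Python raise a TypeError at construction time.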

Parameters
  • allowed_pipelines (list(class)) – A list of PipelineBase subclasses indicating the pipelines allowed in the search. The default of None indicates all pipelines for this problem type are allowed.

  • search_parameters (dict) – Search parameter ranges specified for pipelines to iterate over.

  • tuner_class (class) – A subclass of Tuner, to be used to find parameters for each pipeline. The default of None indicates the SKOptTuner will be used.

  • text_in_ensembling (boolean) – If True and ensembling is True, then n_jobs will be set to 1 to avoid downstream sklearn stacking issues related to nltk. Defaults to False.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
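The interaction between text_in_ensembling and n_jobs can be illustrated with a small sketch. resolve_n_jobs is a hypothetical helper written for this example, and the ensembling flag is passed in explicitly because it is not part of the constructor signature shown above. How EvalML applies the rule internally may differ.

```python
def resolve_n_jobs(n_jobs, text_in_ensembling, ensembling):
    """Hypothetical helper: fall back to one job when text features meet ensembling."""
    if text_in_ensembling and ensembling:
        return 1
    return n_jobs


# With no text features involved, the requested value passes through unchanged.
default_jobs = resolve_n_jobs(-1, text_in_ensembling=False, ensembling=True)
# With text features and ensembling both enabled, the value is forced to 1.
text_jobs = resolve_n_jobs(-1, text_in_ensembling=True, ensembling=True)
```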

Methods

add_result

Register results from evaluating a pipeline.

batch_number

Returns the number of batches which have been recommended so far.

default_max_batches

Returns the maximum number of batches AutoMLSearch should run by default.

next_batch

Get the next batch of pipelines to evaluate.

num_pipelines_per_batch

Return the number of pipelines in the nth batch.

pipeline_number

Returns the number of pipelines which have been recommended so far.

add_result(self, score_to_minimize, pipeline, trained_pipeline_results)

Register results from evaluating a pipeline.

Parameters
  • score_to_minimize (float) – The score obtained by this pipeline on the primary objective, converted so that lower values indicate better pipelines.

  • pipeline (PipelineBase) – The trained pipeline object which was used to compute the score.

  • trained_pipeline_results (dict) – Results from training a pipeline.

Raises

PipelineNotFoundError – If pipeline is not allowed in search.
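One way an add_result override might record scores is sketched below. Everything here is an assumption made for illustration: ResultRecorder and to_score_to_minimize are hypothetical names, PipelineNotFoundError is a local stand-in rather than the EvalML exception, pipelines are identified by name strings, and negating a higher-is-better score is only one possible way to obtain a value where lower is better.

```python
class PipelineNotFoundError(Exception):
    """Local stand-in for the error named under Raises."""


class ResultRecorder:
    """Hypothetical recorder; pipelines are identified by name strings."""

    def __init__(self, allowed_pipeline_names):
        self.allowed = set(allowed_pipeline_names)
        self.scores = {}  # pipeline name -> list of scores (lower is better)

    def add_result(self, score_to_minimize, pipeline, trained_pipeline_results):
        if pipeline not in self.allowed:
            raise PipelineNotFoundError(f"{pipeline!r} is not allowed in search")
        self.scores.setdefault(pipeline, []).append(score_to_minimize)

    def best_pipeline(self):
        # The best pipeline is the one whose lowest recorded score is smallest.
        return min(self.scores, key=lambda name: min(self.scores[name]))


def to_score_to_minimize(score, greater_is_better):
    """Negate scores for objectives where higher is better (assumed convention)."""
    return -score if greater_is_better else score


recorder = ResultRecorder(["linear", "random_forest"])
recorder.add_result(to_score_to_minimize(0.81, True), "linear", {})
recorder.add_result(to_score_to_minimize(0.93, True), "random_forest", {})
best = recorder.best_pipeline()
```

Storing every score per pipeline, rather than only the best one, keeps the full history available to a later next_batch call.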

property batch_number(self)

Returns the number of batches which have been recommended so far.

property default_max_batches(self)

Returns the maximum number of batches AutoMLSearch should run by default.

abstract next_batch(self)

Get the next batch of pipelines to evaluate.

Returns

A list of instances of PipelineBase subclasses, ready to be trained and evaluated.

Return type

list[PipelineBase]
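A caller might consume next_batch in a loop like the one sketched below. This is not how AutoMLSearch is implemented. CountingAlgorithm and run_search are hypothetical, default_max_batches is a plain class attribute here rather than a property, pipelines are strings, and the evaluation function is a toy stand-in.

```python
class CountingAlgorithm:
    """Hypothetical algorithm exposing the method names documented on this page."""

    default_max_batches = 2  # plain attribute in this sketch

    def __init__(self, names):
        self.names = names
        self.batch_number = 0
        self.results = []

    def next_batch(self):
        self.batch_number += 1
        return [f"{name}_b{self.batch_number}" for name in self.names]

    def add_result(self, score_to_minimize, pipeline, trained_pipeline_results):
        self.results.append((score_to_minimize, pipeline))


def run_search(algorithm, evaluate):
    """Request batches until the algorithm's default batch budget is used up."""
    while algorithm.batch_number < algorithm.default_max_batches:
        for pipeline in algorithm.next_batch():
            score = evaluate(pipeline)
            # Feed each result back so later batches can depend on it.
            algorithm.add_result(score, pipeline, {"score": score})
    # Lower scores are better, so the minimum tuple is the best result.
    return min(algorithm.results)


# Toy evaluation: the "score" is simply the length of the pipeline name.
best_score, best_name = run_search(CountingAlgorithm(["lr", "forest"]), evaluate=len)
```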

abstract num_pipelines_per_batch(self, batch_number)

Return the number of pipelines in the nth batch.

Parameters

batch_number (int) – The batch for which to calculate the number of pipelines.

Returns

The number of pipelines in the given batch.

Return type

int
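As an illustration of what an implementation could return, the sketch below uses an assumed schedule: the first batch evaluates each allowed pipeline once, and every later batch has a fixed size. The zero-based batch index, the default of 5, and the standalone function form are assumptions for this example only. Concrete EvalML algorithms define their own schedules.

```python
def num_pipelines_per_batch(batch_number, n_allowed_pipelines, pipelines_per_batch=5):
    """Hypothetical schedule: one pass over all pipelines, then fixed-size batches."""
    if batch_number < 0:
        raise ValueError("batch_number must be non-negative")
    if batch_number == 0:
        # First batch: one evaluation per allowed pipeline.
        return n_allowed_pipelines
    return pipelines_per_batch


# Batch sizes for the first four batches with three allowed pipelines.
sizes = [num_pipelines_per_batch(n, n_allowed_pipelines=3) for n in range(4)]
total = sum(sizes)
```

Summing the per-batch sizes gives the total number of pipelines a search would evaluate for a given batch budget.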

property pipeline_number(self)

Returns the number of pipelines which have been recommended so far.

exception evalml.automl.automl_algorithm.automl_algorithm.AutoMLAlgorithmException

Exception raised when an error is encountered during the computation of the automl algorithm.