evalml_algorithm

Module Contents

Classes Summary

EvalMLAlgorithm

An automl algorithm that consists of two modes: fast and long, where fast is a subset of long.

Contents

class evalml.automl.automl_algorithm.evalml_algorithm.EvalMLAlgorithm(X, y, problem_type, sampler_name, tuner_class=None, random_seed=0, pipeline_params=None, custom_hyperparameters=None, n_jobs=-1, text_in_ensembling=None, top_n=3, num_long_explore_pipelines=50, num_long_pipelines_per_batch=10)

An automl algorithm that consists of two modes: fast and long, where fast is a subset of long.

Naive pipelines:
1. Run the baseline with the default preprocessing pipeline.
2. Run a naive linear model with the default preprocessing pipeline.
3. Run a basic RF pipeline with the default preprocessing pipeline.

Naive pipelines with feature selection:
1. Subsequent pipelines use the selected features with a SelectedColumns transformer.

At this point we have a single pipeline candidate for preprocessing and feature selection.

Pipelines with preprocessing components:
1. Scan the rest of the estimators (our current batch 1).

First ensembling run.

Fast mode ends here. Begin long mode.

Run top 3 estimators:
1. Generate 50 random parameter sets for each estimator. Run all 150 in one batch.

Second ensembling run.

Repeat these indefinitely until a stopping criterion is met:
1. For each of the previous top 3 estimators, sample 10 parameter sets from the tuner. Run all 30 in one batch.
2. Run ensembling.
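The long-mode batch sizes follow directly from the constructor defaults (`top_n=3`, `num_long_explore_pipelines=50`, `num_long_pipelines_per_batch=10`). The sketch below only reproduces that arithmetic. The function name and the stage labels are hypothetical and are not part of evalml, and the size of an ensembling batch is left as `None` because the description does not state it.

```python
from itertools import islice


def long_mode_batch_sizes(top_n=3, num_long_explore_pipelines=50,
                          num_long_pipelines_per_batch=10):
    """Yield (stage, pipeline_count) pairs for long mode (illustrative only)."""
    # Exploration: random parameter sets for each top estimator, run as one batch.
    yield ("explore", top_n * num_long_explore_pipelines)
    # Ensembling batch size is not given in the description, so it stays None.
    yield ("ensemble", None)
    # Tuning and ensembling then alternate until a stopping criterion is met;
    # the caller decides when to stop consuming the generator.
    while True:
        yield ("tune", top_n * num_long_pipelines_per_batch)
        yield ("ensemble", None)


schedule = list(islice(long_mode_batch_sizes(), 6))
print(schedule)
```

With the defaults, the exploration batch holds 3 × 50 = 150 pipelines and each tuning batch holds 3 × 10 = 30.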

Methods

`add_result`	Register results from evaluating a pipeline. In batch number 2, the selected column names from the feature selector are taken to be used in a column selector. Information regarding the best pipeline is updated here as well.
`batch_number`	Returns the number of batches which have been recommended so far.
`next_batch`	Get the next batch of pipelines to evaluate.
`pipeline_number`	Returns the number of pipelines which have been recommended so far.

add_result(self, score_to_minimize, pipeline, trained_pipeline_results)

Register results from evaluating a pipeline. In batch number 2, the selected column names from the feature selector are taken to be used in a column selector. Information regarding the best pipeline is updated here as well.

Parameters

score_to_minimize (float) – The score obtained by this pipeline on the primary objective, converted so that lower values indicate better pipelines.
pipeline (PipelineBase) – The trained pipeline object which was used to compute the score.
trained_pipeline_results (dict) – Results from training a pipeline.
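One common way to obtain a score where lower is better is to negate scores from objectives where higher is better. The helper below is a minimal sketch of that idea. Its name and the `greater_is_better` flag are assumptions made for illustration, and this page does not confirm how the conversion is actually performed.

```python
def to_score_to_minimize(score, greater_is_better):
    """Return a value where lower is better, whatever the objective's direction."""
    # Hypothetical helper: negate "higher is better" scores, pass others through.
    return -score if greater_is_better else score


# An accuracy-like score (higher is better) becomes negative.
print(to_score_to_minimize(0.92, greater_is_better=True))
# A loss-like score (lower is better) is left unchanged.
print(to_score_to_minimize(0.31, greater_is_better=False))
```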

property batch_number(self)

Returns the number of batches which have been recommended so far.

next_batch(self)

Get the next batch of pipelines to evaluate.

Returns: a list of instances of PipelineBase subclasses, ready to be trained and evaluated.
Return type: list(PipelineBase)
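The methods and properties above suggest a request-and-report loop: ask for a batch, evaluate each pipeline, and hand each result back. The sketch below shows that loop against a hypothetical stand-in class so that it runs without evalml. `FakeAlgorithm`, `run_search`, and `fake_evaluate` are invented for illustration, strings stand in for `PipelineBase` instances, and the contents of the results dict are a placeholder.

```python
class FakeAlgorithm:
    """Hypothetical stand-in that mirrors the documented method names only."""

    def __init__(self, batches):
        self._batches = list(batches)
        self._batch_number = 0
        self._pipeline_number = 0
        self.results = []

    @property
    def batch_number(self):
        # Number of batches recommended so far.
        return self._batch_number

    @property
    def pipeline_number(self):
        # Number of pipelines recommended so far.
        return self._pipeline_number

    def next_batch(self):
        batch = self._batches[self._batch_number]
        self._batch_number += 1
        self._pipeline_number += len(batch)
        return batch

    def add_result(self, score_to_minimize, pipeline, trained_pipeline_results):
        self.results.append((score_to_minimize, pipeline))


def run_search(algorithm, evaluate, max_batches):
    """Request batches, report every result back, and track the lowest score."""
    best = None
    for _ in range(max_batches):
        for pipeline in algorithm.next_batch():
            score, details = evaluate(pipeline)
            algorithm.add_result(score, pipeline, details)
            if best is None or score < best[0]:
                best = (score, pipeline)
    return best


def fake_evaluate(pipeline):
    # Placeholder scoring: the name length stands in for a real objective score.
    return float(len(pipeline)), {"note": "placeholder results"}


algo = FakeAlgorithm([["baseline", "linear", "rf"], ["xgboost", "catboost"]])
best = run_search(algo, fake_evaluate, max_batches=2)
print(best, algo.batch_number, algo.pipeline_number)
```

Here `max_batches` plays the role of the stopping criterion. A real search would typically stop on time or score-based conditions instead.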

property pipeline_number(self)

Returns the number of pipelines which have been recommended so far.
