engine

Package Contents

Classes Summary

CFEngine

The concurrent.futures (CF) engine

DaskEngine

The dask engine

EngineBase

Helper class that provides a standard way to create an ABC using inheritance.

EngineComputation

Wrapper around the result of a (possibly asynchronous) engine computation.

SequentialEngine

The default engine for the AutoML search. Trains and scores pipelines locally and sequentially.

Functions

evaluate_pipeline

Function submitted to the submit_evaluation_job engine method.

train_and_score_pipeline

Given a pipeline, config and data, train and score the pipeline and return the CV or TV scores.

train_pipeline

Train a pipeline and tune the threshold if necessary.

Contents

class evalml.automl.engine.CFEngine(client)

The concurrent.futures (CF) engine

Methods

setup_job_log

submit_evaluation_job

Send evaluation job to cluster.

submit_scoring_job

Send scoring job to cluster.

submit_training_job

Send training job to cluster.

static setup_job_log()

submit_evaluation_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation

Send evaluation job to cluster.

Parameters
  • automl_config – structure containing data passed from AutoMLSearch instance

  • pipeline (pipeline.PipelineBase) – pipeline to evaluate

  • X (pd.DataFrame) – input data for modeling

  • y (pd.Series) – target data for modeling

Returns

an object wrapping a reference to a future-like computation occurring in the resource pool

Return type

CFComputation

submit_scoring_job(self, automl_config, pipeline, X, y, objectives) → evalml.automl.engine.engine_base.EngineComputation

Send scoring job to cluster.

Parameters
  • automl_config – structure containing data passed from AutoMLSearch instance

  • pipeline (pipeline.PipelineBase) – pipeline to score

  • X (pd.DataFrame) – input data for modeling

  • y (pd.Series) – target data for modeling

  • objectives – objectives to score the pipeline on

Returns

an object wrapping a reference to a future-like computation occurring in the resource pool

Return type

CFComputation

submit_training_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation

Send training job to cluster.

Parameters
  • automl_config – structure containing data passed from AutoMLSearch instance

  • pipeline (pipeline.PipelineBase) – pipeline to train

  • X (pd.DataFrame) – input data for modeling

  • y (pd.Series) – target data for modeling

Returns

an object wrapping a reference to a future-like computation occurring in the resource pool

Return type

CFComputation
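
The submit-and-wrap pattern described for the three methods above can be sketched with the standard library alone. This is not evalml's implementation: `FutureComputation`, `submit_job` and `fake_train` are hypothetical names used only for illustration, and a `ThreadPoolExecutor` stands in for the resource pool.

```python
from concurrent.futures import ThreadPoolExecutor


class FutureComputation:
    """Hypothetical wrapper around a concurrent.futures.Future."""

    def __init__(self, future):
        self._future = future

    def done(self):
        return self._future.done()

    def get_result(self):
        # Blocks until the job finishes; re-raises any exception from the job.
        return self._future.result()

    def cancel(self):
        return self._future.cancel()


def submit_job(pool, fn, *args):
    """Submit fn(*args) to the pool and wrap the resulting future."""
    return FutureComputation(pool.submit(fn, *args))


def fake_train(pipeline_name, n_rows):
    # Stand-in for real training work (hypothetical).
    return f"{pipeline_name} trained on {n_rows} rows"


with ThreadPoolExecutor(max_workers=2) as pool:
    computation = submit_job(pool, fake_train, "baseline", 100)
    result = computation.get_result()

print(result)
```

The caller only ever sees the wrapper, so the same calling code could work with a different pool type.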

class evalml.automl.engine.DaskEngine(client)

The dask engine

Methods

send_data_to_cluster

Send data to the cluster.

setup_job_log

submit_evaluation_job

Send evaluation job to cluster.

submit_scoring_job

Send scoring job to cluster.

submit_training_job

Send training job to cluster.

send_data_to_cluster(self, X, y)

Send data to the cluster.

The implementation uses caching so the data is only sent once. This follows dask best practices.

Parameters
  • X (pd.DataFrame) – input data for modeling

  • y (pd.Series) – target data for modeling

Returns

the modeling data

Return type

dask.Future
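
The "send once, then reuse" behaviour can be illustrated without a cluster. The sketch below is an assumption about how such a cache might look, not the DaskEngine code: `DataCache` and `fake_scatter` are hypothetical, and `fake_scatter` stands in for whatever call actually transfers data to the workers.

```python
import hashlib
import pickle


class DataCache:
    """Hypothetical cache that transfers each distinct dataset only once."""

    def __init__(self, scatter):
        self._scatter = scatter  # callable that ships data to the cluster
        self._cache = {}         # content hash -> remote handle

    def send(self, X, y):
        # Key on the content, so equal data maps to the same entry.
        key = hashlib.sha256(pickle.dumps((X, y))).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._scatter(X, y)
        return self._cache[key]


transfers = []


def fake_scatter(X, y):
    # Stand-in for a real transfer; records each call (hypothetical).
    transfers.append((X, y))
    return ("remote-handle", len(transfers))


cache = DataCache(fake_scatter)
first = cache.send([[1, 2], [3, 4]], [0, 1])
second = cache.send([[1, 2], [3, 4]], [0, 1])  # same content: served from cache
third = cache.send([[5, 6]], [1])              # new content: transferred

print(len(transfers))
```

A real implementation would need a hashing scheme suited to DataFrames; pickling plain lists is used here only to keep the sketch short.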

static setup_job_log()

submit_evaluation_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation

Send evaluation job to cluster.

Parameters
  • automl_config – structure containing data passed from AutoMLSearch instance

  • pipeline (pipeline.PipelineBase) – pipeline to evaluate

  • X (pd.DataFrame) – input data for modeling

  • y (pd.Series) – target data for modeling

Returns

an object wrapping a reference to a future-like computation occurring in the dask cluster

Return type

DaskComputation

submit_scoring_job(self, automl_config, pipeline, X, y, objectives) → evalml.automl.engine.engine_base.EngineComputation

Send scoring job to cluster.

Parameters
  • automl_config – structure containing data passed from AutoMLSearch instance

  • pipeline (pipeline.PipelineBase) – pipeline to score

  • X (pd.DataFrame) – input data for modeling

  • y (pd.Series) – target data for modeling

  • objectives – objectives to score the pipeline on

Returns

an object wrapping a reference to a future-like computation occurring in the dask cluster

Return type

DaskComputation

submit_training_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation

Send training job to cluster.

Parameters
  • automl_config – structure containing data passed from AutoMLSearch instance

  • pipeline (pipeline.PipelineBase) – pipeline to train

  • X (pd.DataFrame) – input data for modeling

  • y (pd.Series) – target data for modeling

Returns

an object wrapping a reference to a future-like computation occurring in the dask cluster

Return type

DaskComputation

class evalml.automl.engine.EngineBase

Helper class that provides a standard way to create an ABC using inheritance.

Methods

setup_job_log

submit_evaluation_job

Submit job for pipeline evaluation during AutoMLSearch.

submit_scoring_job

Submit job for pipeline scoring.

submit_training_job

Submit job for pipeline training.

static setup_job_log()

abstract submit_evaluation_job(self, automl_config, pipeline, X, y)

Submit job for pipeline evaluation during AutoMLSearch.

abstract submit_scoring_job(self, automl_config, pipeline, X, y, objectives)

Submit job for pipeline scoring.

abstract submit_training_job(self, automl_config, pipeline, X, y)

Submit job for pipeline training.
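
Because the submit methods are abstract, a subclass has to override them before it can be instantiated. The sketch below shows that mechanism with Python's `abc` module. `MiniEngineBase`, `IncompleteEngine` and `InlineEngine` are hypothetical, reduced stand-ins, not evalml classes.

```python
from abc import ABC, abstractmethod


class MiniEngineBase(ABC):
    """Hypothetical, reduced stand-in for an engine base class."""

    @abstractmethod
    def submit_training_job(self, automl_config, pipeline, X, y):
        """Submit job for pipeline training."""


class IncompleteEngine(MiniEngineBase):
    pass  # does not override the abstract method


class InlineEngine(MiniEngineBase):
    def submit_training_job(self, automl_config, pipeline, X, y):
        # Runs immediately in the calling process.
        return f"trained {pipeline} on {len(X)} rows"


try:
    IncompleteEngine()
    instantiated = True
except TypeError:
    # Abstract methods left unimplemented prevent instantiation.
    instantiated = False

engine = InlineEngine()
message = engine.submit_training_job(None, "pipeline-a", [[1], [2]], [0, 1])
print(instantiated, message)
```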

class evalml.automl.engine.EngineComputation

Wrapper around the result of a (possibly asynchronous) engine computation.

Methods

cancel

Cancel the computation.

done

Whether the computation is done.

get_result

Gets the computation result.

abstract cancel(self)

Cancel the computation.

abstract done(self)

Whether the computation is done.

abstract get_result(self)

Gets the computation result. Will block until the computation is finished.

Raises Exception: If computation fails. Returns traceback.
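
A caller typically polls `done` and only then asks for the result, handling a failed job where the result is fetched. The sketch below shows that loop using standard-library futures as stand-ins: `Future.done()` and `Future.result()` play the roles of `done` and `get_result` here, and `job` is a hypothetical workload.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def job(n):
    # Hypothetical workload: fails for negative input.
    if n < 0:
        raise ValueError("negative input")
    time.sleep(0.01 * n)
    return n * n


results, errors = [], []
with ThreadPoolExecutor(max_workers=3) as pool:
    pending = [pool.submit(job, n) for n in (3, 1, -1, 2)]
    while pending:
        for computation in list(pending):
            if computation.done():  # non-blocking check
                pending.remove(computation)
                try:
                    # Returns immediately because the job is already done.
                    results.append(computation.result())
                except ValueError as exc:
                    # A failed job surfaces its exception here.
                    errors.append(str(exc))
        time.sleep(0.001)

print(sorted(results), errors)
```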

evalml.automl.engine.evaluate_pipeline(pipeline, automl_config, X, y, logger)

Function submitted to the submit_evaluation_job engine method.

Parameters
  • pipeline (PipelineBase) – The pipeline to score

  • automl_config (AutoMLConfig) – The AutoMLSearch object, used to access config and the error callback

  • X (pd.DataFrame) – Training features

  • y (pd.Series) – Training target

  • logger – Logger used to record messages for the job

Returns

First - A dict containing cv_score_mean, cv_scores, training_time and a cv_data structure with details.

Second - The pipeline class we trained and scored.

Third - The job logger instance with all the recorded messages.

Return type

tuple of three items

class evalml.automl.engine.SequentialEngine

The default engine for the AutoML search. Trains and scores pipelines locally and sequentially.

Methods

setup_job_log

submit_evaluation_job

Submit job for pipeline evaluation during AutoMLSearch.

submit_scoring_job

Submit job for pipeline scoring.

submit_training_job

Submit job for pipeline training.

static setup_job_log()

submit_evaluation_job(self, automl_config, pipeline, X, y)

Submit job for pipeline evaluation during AutoMLSearch.

submit_scoring_job(self, automl_config, pipeline, X, y, objectives)

Submit job for pipeline scoring.

submit_training_job(self, automl_config, pipeline, X, y)

Submit job for pipeline training.
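
One way a local, sequential engine can offer the same computation interface is to defer the work until the result is requested. The sketch below illustrates that idea. It is an assumption about the design rather than a description of evalml's code, and `DeferredComputation` and `score` are hypothetical names.

```python
class DeferredComputation:
    """Hypothetical computation that runs in the caller's process."""

    def __init__(self, work, **kwargs):
        self.work = work
        self.kwargs = kwargs

    def done(self):
        # Nothing runs in the background, so there is nothing to wait for.
        return True

    def get_result(self):
        # The work happens here, synchronously.
        return self.work(**self.kwargs)

    def cancel(self):
        # Nothing to cancel.
        pass


calls = []


def score(pipeline, rows):
    # Stand-in for real scoring work (hypothetical).
    calls.append(pipeline)
    return {"pipeline": pipeline, "rows": rows}


computation = DeferredComputation(score, pipeline="pipeline-a", rows=10)
ran_before = list(calls)  # still empty: submitting did not run the work
result = computation.get_result()
print(ran_before, calls, result)
```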

evalml.automl.engine.train_and_score_pipeline(pipeline, automl_config, full_X_train, full_y_train, logger)

Given a pipeline, config and data, train and score the pipeline and return the CV or TV scores.

Parameters
  • pipeline (PipelineBase) – The pipeline to score

  • automl_config (AutoMLSearch) – The AutoMLSearch object, used to access config and the error callback

  • full_X_train (pd.DataFrame) – Training features

  • full_y_train (pd.Series) – Training target

  • logger – Logger used to record messages for the job

Returns

First - A dict containing cv_score_mean, cv_scores, training_time and a cv_data structure with details.

Second - The pipeline class we trained and scored.

Third - The job logger instance with all the recorded messages.

Return type

tuple of three items
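
To make the shape of the first tuple item concrete, the sketch below builds a dict with the four keys named above from a list of per-fold scores. `summarize_folds` and the layout of the `cv_data` entries are hypothetical. The actual structure produced by evalml may differ.

```python
import time
from statistics import mean


def summarize_folds(fold_scores, start_time):
    """Hypothetical helper: build a result dict from per-fold scores."""
    return {
        "cv_score_mean": mean(fold_scores),
        "cv_scores": list(fold_scores),
        "training_time": time.time() - start_time,
        # Assumed layout: one entry per fold with its index and score.
        "cv_data": [{"fold": i, "score": s} for i, s in enumerate(fold_scores)],
    }


start = time.time()
summary = summarize_folds([0.5, 0.75, 1.0], start)
print(summary["cv_score_mean"], len(summary["cv_data"]))
```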

evalml.automl.engine.train_pipeline(pipeline, X, y, automl_config, schema=True)

Train a pipeline and tune the threshold if necessary.

Parameters
  • pipeline (PipelineBase) – Pipeline to train.

  • X (pd.DataFrame) – Features to train on.

  • y (pd.Series) – Target to train on.

  • automl_config (AutoMLSearch) – The AutoMLSearch object, used to access config and the error callback

  • schema (bool) – Whether to use the schemas for X and y

Returns

The trained pipeline.

Return type

PipelineBase