engine

Submodules

Package Contents

Classes Summary

`CFEngine`	The concurrent.futures (CF) engine.
`DaskEngine`	The Dask engine.
`EngineBase`	Abstract base class for AutoML engines.
`EngineComputation`	Wrapper around the result of a (possibly asynchronous) engine computation.
`SequentialEngine`	The default engine for the AutoML search. Trains and scores pipelines locally and sequentially.

Functions

`evaluate_pipeline`	Function submitted to the submit_evaluation_job engine method.
`train_and_score_pipeline`	Given a pipeline, config, and data, train and score the pipeline and return the cross-validation (CV) or train-validation (TV) scores.
`train_pipeline`	Train a pipeline and tune the threshold if necessary.

Contents

class evalml.automl.engine.CFEngine(client)

The concurrent.futures (CF) engine.

Methods

`setup_job_log`
`submit_evaluation_job`	Send evaluation job to cluster.
`submit_scoring_job`	Send scoring job to cluster.
`submit_training_job`	Send training job to cluster.

static setup_job_log()

submit_evaluation_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation

Send evaluation job to cluster.

Parameters

automl_config – structure containing data passed from AutoMLSearch instance
pipeline (pipeline.PipelineBase) – pipeline to evaluate
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling

Returns

An object wrapping a reference to a future-like computation occurring in the resource pool.

Return type

CFComputation

submit_scoring_job(self, automl_config, pipeline, X, y, objectives) → evalml.automl.engine.engine_base.EngineComputation

Send scoring job to cluster.

Parameters

automl_config – structure containing data passed from AutoMLSearch instance
pipeline (pipeline.PipelineBase) – pipeline to score
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling
objectives – objectives to score the pipeline on

Returns

An object wrapping a reference to a future-like computation occurring in the resource pool.

Return type

CFComputation

submit_training_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation

Send training job to cluster.

Parameters

automl_config – structure containing data passed from AutoMLSearch instance
pipeline (pipeline.PipelineBase) – pipeline to train
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling

Returns

An object wrapping a reference to a future-like computation occurring in the resource pool.

Return type

CFComputation
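The wrapping pattern described for CFEngine can be sketched with the standard library alone. This is a minimal sketch, not evalml's CFComputation: `FutureComputation`, `fake_evaluate`, and the free function `submit_evaluation_job` below are hypothetical, and a `ThreadPoolExecutor` stands in for whatever pool the `client` argument wraps.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class FutureComputation:
    """Hypothetical wrapper exposing done/get_result/cancel over a Future."""

    def __init__(self, future: Future):
        self.future = future

    def done(self) -> bool:
        # True once the job has finished, failed, or been cancelled.
        return self.future.done()

    def get_result(self):
        # Blocks until the job finishes; re-raises any exception from the job.
        return self.future.result()

    def cancel(self) -> bool:
        # Only succeeds if the job has not started running yet.
        return self.future.cancel()


def fake_evaluate(pipeline_name, X, y):
    # Hypothetical stand-in for an evaluation job: returns a toy summary.
    return {"pipeline": pipeline_name, "n_rows": len(X), "mean_y": sum(y) / len(y)}


def submit_evaluation_job(executor, pipeline_name, X, y):
    # Submit the job to the pool and hand back a wrapper instead of the raw future.
    return FutureComputation(executor.submit(fake_evaluate, pipeline_name, X, y))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        comp = submit_evaluation_job(pool, "toy_pipeline", [[1], [2], [3]], [0, 1, 1])
        print(comp.get_result())
```

Returning a wrapper rather than the raw future keeps the caller's interface (`done`, `get_result`, `cancel`) the same regardless of which pool runs the job.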

class evalml.automl.engine.DaskEngine(client)

The Dask engine.

Methods

`send_data_to_cluster`	Send data to the cluster.
`setup_job_log`
`submit_evaluation_job`	Send evaluation job to cluster.
`submit_scoring_job`	Send scoring job to cluster.
`submit_training_job`	Send training job to cluster.

send_data_to_cluster(self, X, y)

Send data to the cluster.

The implementation caches the data so that it is sent to the cluster only once, following Dask best practices.

Parameters

X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling

Returns

the modeling data

Return type

dask.Future
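The send-once caching described for send_data_to_cluster can be illustrated with a cache keyed by content hashes. This sketch rests on assumptions: hashing pickled bytes with `hashlib` is a simple stand-in (evalml may hash differently, and pickling is not a reliable equality check for every object), and `scatter` is a hypothetical callable standing in for the upload to the Dask cluster.

```python
import hashlib
import pickle
from typing import Any, Callable, Dict, Tuple


def content_hash(obj: Any) -> str:
    # Hash the pickled bytes so equal simple data maps to the same key.
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()


class DataCache:
    """Hypothetical send-once cache keyed by the hashes of X and y."""

    def __init__(self, scatter: Callable[[Any, Any], Tuple[Any, Any]]):
        self.scatter = scatter
        self._cache: Dict[Tuple[str, str], Tuple[Any, Any]] = {}

    def send(self, X, y):
        key = (content_hash(X), content_hash(y))
        if key not in self._cache:
            # Only upload data that has not been sent before.
            self._cache[key] = self.scatter(X, y)
        # Repeat calls with the same data reuse the stored handles.
        return self._cache[key]
```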

static setup_job_log()

submit_evaluation_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation

Send evaluation job to cluster.

Parameters

automl_config – structure containing data passed from AutoMLSearch instance
pipeline (pipeline.PipelineBase) – pipeline to evaluate
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling

Returns

An object wrapping a reference to a future-like computation occurring in the Dask cluster.

Return type

DaskComputation

submit_scoring_job(self, automl_config, pipeline, X, y, objectives) → evalml.automl.engine.engine_base.EngineComputation

Send scoring job to cluster.

Parameters

automl_config – structure containing data passed from AutoMLSearch instance
pipeline (pipeline.PipelineBase) – pipeline to score
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling
objectives – objectives to score the pipeline on

Returns

An object wrapping a reference to a future-like computation occurring in the Dask cluster.

Return type

DaskComputation

submit_training_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation

Send training job to cluster.

Parameters

automl_config – structure containing data passed from AutoMLSearch instance
pipeline (pipeline.PipelineBase) – pipeline to train
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling

Returns

An object wrapping a reference to a future-like computation occurring in the Dask cluster.

Return type

DaskComputation

class evalml.automl.engine.EngineBase

Abstract base class for AutoML engines. Subclasses implement submit_evaluation_job, submit_scoring_job, and submit_training_job.

Methods

`setup_job_log`
`submit_evaluation_job`	Submit job for pipeline evaluation during AutoMLSearch.
`submit_scoring_job`	Submit job for pipeline scoring.
`submit_training_job`	Submit job for pipeline training.

static setup_job_log()

abstract submit_evaluation_job(self, automl_config, pipeline, X, y)

Submit job for pipeline evaluation during AutoMLSearch.

abstract submit_scoring_job(self, automl_config, pipeline, X, y, objectives)

Submit job for pipeline scoring.

abstract submit_training_job(self, automl_config, pipeline, X, y)

Submit job for pipeline training.
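Because EngineBase declares its submit methods abstract, a subclass cannot be instantiated until it overrides all of them. The sketch below demonstrates that rule with a hypothetical `ToyEngineBase` that mirrors the documented signatures instead of importing evalml; the engine bodies are placeholders.

```python
from abc import ABC, abstractmethod


class ToyEngineBase(ABC):
    """Hypothetical mirror of the abstract submit_* interface."""

    @abstractmethod
    def submit_evaluation_job(self, automl_config, pipeline, X, y):
        """Submit job for pipeline evaluation."""

    @abstractmethod
    def submit_training_job(self, automl_config, pipeline, X, y):
        """Submit job for pipeline training."""

    @abstractmethod
    def submit_scoring_job(self, automl_config, pipeline, X, y, objectives):
        """Submit job for pipeline scoring."""


class IncompleteEngine(ToyEngineBase):
    # Missing submit_scoring_job, so this class is still abstract.
    def submit_evaluation_job(self, automl_config, pipeline, X, y):
        return "evaluated"

    def submit_training_job(self, automl_config, pipeline, X, y):
        return "trained"


class EchoEngine(IncompleteEngine):
    # Supplying the last abstract method makes the class instantiable.
    def submit_scoring_job(self, automl_config, pipeline, X, y, objectives):
        return {name: 0.0 for name in objectives}


try:
    IncompleteEngine()
except TypeError as err:
    print("cannot instantiate:", err)

engine = EchoEngine()
print(engine.submit_scoring_job(None, None, [], [], ["log loss"]))
```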

class evalml.automl.engine.EngineComputation

Wrapper around the result of a (possibly asynchronous) engine computation.

Methods

`cancel`	Cancel the computation.
`done`	Whether the computation is done.
`get_result`	Gets the computation result.

abstract cancel(self)

Cancel the computation.

abstract done(self)

Whether the computation is done.

abstract get_result(self)

Gets the computation result. Will block until the computation is finished.

Raises: Exception – If the computation fails, with the traceback of the failure.
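One way to satisfy this interface without background workers is to defer the work until get_result is first called. `ComputationBase` and `LazyComputation` below are hypothetical names; this is a sketch of the documented cancel/done/get_result contract, not evalml's own sequential implementation.

```python
from abc import ABC, abstractmethod


class ComputationBase(ABC):
    """Hypothetical stand-in for the EngineComputation interface."""

    @abstractmethod
    def cancel(self): ...

    @abstractmethod
    def done(self): ...

    @abstractmethod
    def get_result(self): ...


class LazyComputation(ComputationBase):
    """Runs the wrapped function in-process the first time its result is requested."""

    def __init__(self, work, **kwargs):
        self.work = work
        self.kwargs = kwargs
        self._done = False
        self._result = None

    def cancel(self):
        # Nothing runs in the background, so there is nothing to cancel.
        return False

    def done(self):
        return self._done

    def get_result(self):
        # Any exception raised by `work` propagates to the caller with its traceback,
        # and the computation is not marked done.
        if not self._done:
            self._result = self.work(**self.kwargs)
            self._done = True
        return self._result
```

Caching the result means repeated get_result calls do not rerun the job.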

evalml.automl.engine.evaluate_pipeline(pipeline, automl_config, X, y, logger)

Function submitted to the submit_evaluation_job engine method.

Parameters

pipeline (PipelineBase) – The pipeline to score
automl_config (AutoMLConfig) – Configuration passed from the AutoMLSearch instance, used to access config and the error callback
X (pd.DataFrame) – Training features
y (pd.Series) – Training target

Returns

First: a dict containing cv_score_mean, cv_scores, training_time, and a cv_data structure with details.
Second: the pipeline class that was trained and scored.
Third: the job logger instance with all the recorded messages.

Return type

tuple of three items

class evalml.automl.engine.SequentialEngine

The default engine for the AutoML search. Trains and scores pipelines locally and sequentially.

Methods

`setup_job_log`
`submit_evaluation_job`	Submit job for pipeline evaluation during AutoMLSearch.
`submit_scoring_job`	Submit job for pipeline scoring.
`submit_training_job`	Submit job for pipeline training.

static setup_job_log()

submit_evaluation_job(self, automl_config, pipeline, X, y)

Submit job for pipeline evaluation during AutoMLSearch.

submit_scoring_job(self, automl_config, pipeline, X, y, objectives)

Submit job for pipeline scoring.

submit_training_job(self, automl_config, pipeline, X, y)

Submit job for pipeline training.

evalml.automl.engine.train_and_score_pipeline(pipeline, automl_config, full_X_train, full_y_train, logger)

Given a pipeline, config, and data, train and score the pipeline and return the cross-validation (CV) or train-validation (TV) scores.

Parameters

pipeline (PipelineBase) – The pipeline to score
automl_config (AutoMLSearch) – The AutoMLSearch object, used to access config and the error callback
full_X_train (pd.DataFrame) – Training features
full_y_train (pd.Series) – Training target

Returns

First: a dict containing cv_score_mean, cv_scores, training_time, and a cv_data structure with details.
Second: the pipeline class that was trained and scored.
Third: the job logger instance with all the recorded messages.

Return type

tuple of three items
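The shape of the first returned item can be illustrated with a toy k-fold loop. In this sketch the "model" is a hypothetical mean predictor, the score is mean absolute error, and the cv_data entries are invented for illustration; evalml's data splitters, objectives, and exact cv_data fields are not reproduced.

```python
import time
from statistics import mean


def kfold_indices(n, k):
    # Split range(n) into k contiguous validation folds; the rest is training data.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size


def train_and_score(y, k=3):
    """Hypothetical CV loop returning cv_scores, cv_score_mean, training_time, cv_data."""
    start_time = time.time()
    cv_scores, cv_data = [], []
    for fold, (train_idx, val_idx) in enumerate(kfold_indices(len(y), k)):
        # "Train": predict the mean of the training targets.
        prediction = mean(y[i] for i in train_idx)
        # "Score": mean absolute error on the validation fold (lower is better).
        score = mean(abs(y[i] - prediction) for i in val_idx)
        cv_scores.append(score)
        cv_data.append({"fold": fold, "score": score, "n_validation": len(val_idx)})
    return {
        "cv_scores": cv_scores,
        "cv_score_mean": mean(cv_scores),
        "training_time": time.time() - start_time,
        "cv_data": cv_data,
    }
```

A train-validation (TV) split corresponds to scoring on a single held-out set instead of looping over several folds.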

evalml.automl.engine.train_pipeline(pipeline, X, y, automl_config, schema=True)

Train a pipeline and tune the threshold if necessary.

Parameters

pipeline (PipelineBase) – Pipeline to train.
X (pd.DataFrame) – Features to train on.
y (pd.Series) – Target to train on.
automl_config (AutoMLSearch) – The AutoMLSearch object, used to access config and the error callback
schema (bool) – Whether to use the schemas for X and y.

Returns

Trained pipeline.

Return type

PipelineBase
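For a binary classifier, tuning the threshold means picking the probability cutoff that converts predicted probabilities into labels. The sketch below is a hypothetical illustration, not evalml's procedure: it assumes F1 as the objective and brute-forces the cutoff over the observed probabilities of a held-out set.

```python
def f1_score(y_true, y_pred):
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(y_true, probabilities):
    """Return the (threshold, score) pair that maximizes F1 on held-out data."""
    best_threshold, best_score = 0.5, -1.0
    # Each distinct probability is a candidate cutoff: predict 1 when p >= cutoff.
    for threshold in sorted(set(probabilities)):
        y_pred = [1 if p >= threshold else 0 for p in probabilities]
        score = f1_score(y_true, y_pred)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```

Tuning on data the pipeline was not fit on avoids choosing a cutoff that only suits the training set.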
