engine

EvalML Engine classes used to evaluate pipelines in AutoMLSearch.

Package Contents

Classes Summary

CFEngine

The concurrent.futures (CF) engine.

DaskEngine

The dask engine.

EngineBase

Base class for EvalML engines.

EngineComputation

Wrapper around the result of a (possibly asynchronous) engine computation.

SequentialEngine

The default engine for the AutoML search.

Functions

evaluate_pipeline

Function submitted to the submit_evaluation_job engine method.

train_and_score_pipeline

Given a pipeline, config and data, train and score the pipeline and return the CV or TV scores.

train_pipeline

Train a pipeline and tune the threshold if necessary.

Contents

class evalml.automl.engine.CFEngine(client=None)[source]

The concurrent.futures (CF) engine.

Parameters

client (None or CFClient) – If None, creates a threaded pool for processing. Defaults to None.

Methods

close

Function to properly shut down the Engine’s Client’s resources.

is_closed

Property that determines whether the Engine’s Client’s resources are shut down.

setup_job_log

Set up logger for job.

submit_evaluation_job

Send evaluation job to cluster.

submit_scoring_job

Send scoring job to cluster.

submit_training_job

Send training job to cluster.

close(self)[source]

Function to properly shut down the Engine’s Client’s resources.

property is_closed(self)

Property that determines whether the Engine’s Client’s resources are shut down.

static setup_job_log()

Set up logger for job.

submit_evaluation_job(self, automl_config, pipeline, X, y)[source]

Send evaluation job to cluster.

Parameters
  • automl_config – Structure containing data passed from AutoMLSearch instance.

  • pipeline (pipeline.PipelineBase) – Pipeline to evaluate.

  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

Returns

An object wrapping a reference to a future-like computation occurring in the resource pool.

Return type

CFComputation

submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)[source]

Send scoring job to cluster.

Parameters
  • automl_config – Structure containing data passed from AutoMLSearch instance.

  • pipeline (pipeline.PipelineBase) – Pipeline to score.

  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

  • X_train (pd.DataFrame) – Training features. Used for feature engineering in time series.

  • y_train (pd.Series) – Training target. Used for feature engineering in time series.

  • objectives (list[ObjectiveBase]) – Objectives to score on.

Returns

An object wrapping a reference to a future-like computation occurring in the resource pool.

Return type

CFComputation

submit_training_job(self, automl_config, pipeline, X, y)[source]

Send training job to cluster.

Parameters
  • automl_config – Structure containing data passed from AutoMLSearch instance.

  • pipeline (pipeline.PipelineBase) – Pipeline to train.

  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

Returns

An object wrapping a reference to a future-like computation occurring in the resource pool.

Return type

CFComputation
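
The CFEngine methods above share one pattern: submit work to a pool, then hand back an object that wraps the resulting future. The following is a minimal standard-library sketch of that pattern, not EvalML's implementation. The names PoolEngineSketch, FutureComputation, submit_job, and train are hypothetical and exist only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class FutureComputation:
    """Hypothetical wrapper around a concurrent.futures.Future."""

    def __init__(self, future):
        self._future = future

    def done(self):
        return self._future.done()

    def get_result(self):
        # Blocks until the future finishes; re-raises any exception from the job.
        return self._future.result()

    def cancel(self):
        return self._future.cancel()


class PoolEngineSketch:
    """Hypothetical engine that submits jobs to a thread pool."""

    def __init__(self, pool=None):
        # Mirrors the documented default: create a threaded pool when none is given.
        self._pool = pool if pool is not None else ThreadPoolExecutor(max_workers=2)
        self._closed = False

    def submit_job(self, fn, *args):
        return FutureComputation(self._pool.submit(fn, *args))

    def close(self):
        # Wait for running jobs, then release the pool's threads.
        self._pool.shutdown(wait=True)
        self._closed = True

    @property
    def is_closed(self):
        return self._closed


def train(values):
    # Stand-in for real pipeline training.
    return sum(values) / len(values)


engine = PoolEngineSketch()
computation = engine.submit_job(train, [1.0, 2.0, 3.0])
result = computation.get_result()
engine.close()
```

Calling close once all results are collected keeps worker threads from outliving the search; after that call, is_closed reports True.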

class evalml.automl.engine.DaskEngine(cluster=None)[source]

The dask engine.

Parameters

cluster (None or dd.Client) – If None, creates a local, threaded Dask client for processing. Defaults to None.

Methods

close

Closes the underlying cluster.

is_closed

Property that determines whether the Engine’s Client’s resources are shut down.

send_data_to_cluster

Send data to the cluster.

setup_job_log

Set up logger for job.

submit_evaluation_job

Send evaluation job to cluster.

submit_scoring_job

Send scoring job to cluster.

submit_training_job

Send training job to cluster.

close(self)[source]

Closes the underlying cluster.

property is_closed(self)

Property that determines whether the Engine’s Client’s resources are shut down.

send_data_to_cluster(self, X, y)[source]

Send data to the cluster.

The implementation uses caching so the data is only sent once. This follows dask best practices.

Parameters
  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

Returns

The modeling data.

Return type

dask.Future
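
The "send once" caching described for send_data_to_cluster can be illustrated without Dask. This sketch assumes the cache is keyed on a hash of the data's contents; the source only states that caching is used, so the key choice, the CachedSender class, and the fake_send function are all hypothetical.

```python
import hashlib
import pickle


class CachedSender:
    """Hypothetical helper that sends each distinct dataset only once."""

    def __init__(self, send_fn):
        self._send_fn = send_fn  # stand-in for a call that ships data to workers
        self._cache = {}

    def send(self, X, y):
        # Key the cache on a content hash so equal data maps to one remote copy.
        key = hashlib.sha256(pickle.dumps((X, y))).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._send_fn(X, y)
        return self._cache[key]


calls = []


def fake_send(X, y):
    # Records each transfer and returns a handle standing in for a future.
    calls.append((X, y))
    return ("remote-handle", len(calls))


sender = CachedSender(fake_send)
first = sender.send([[1, 2], [3, 4]], [0, 1])
second = sender.send([[1, 2], [3, 4]], [0, 1])  # served from the cache
third = sender.send([[5, 6]], [1])  # new data, so a second transfer happens
```

Here the first two calls share one handle and only two transfers are recorded in total, which is the property that makes repeated job submissions on the same data cheap.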

static setup_job_log()

Set up logger for job.

submit_evaluation_job(self, automl_config, pipeline, X, y)[source]

Send evaluation job to cluster.

Parameters
  • automl_config – Structure containing data passed from AutoMLSearch instance.

  • pipeline (pipeline.PipelineBase) – Pipeline to evaluate.

  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

Returns

An object wrapping a reference to a future-like computation occurring in the dask cluster.

Return type

DaskComputation

submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)[source]

Send scoring job to cluster.

Parameters
  • automl_config – Structure containing data passed from AutoMLSearch instance.

  • pipeline (pipeline.PipelineBase) – Pipeline to score.

  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

  • X_train (pd.DataFrame) – Training features. Used for feature engineering in time series.

  • y_train (pd.Series) – Training target. Used for feature engineering in time series.

  • objectives (list[ObjectiveBase]) – List of objectives to score on.

Returns

An object wrapping a reference to a future-like computation occurring in the dask cluster.

Return type

DaskComputation

submit_training_job(self, automl_config, pipeline, X, y)[source]

Send training job to cluster.

Parameters
  • automl_config – Structure containing data passed from AutoMLSearch instance.

  • pipeline (pipeline.PipelineBase) – Pipeline to train.

  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

Returns

An object wrapping a reference to a future-like computation occurring in the dask cluster.

Return type

DaskComputation

class evalml.automl.engine.EngineBase[source]

Base class for EvalML engines.

Methods

setup_job_log

Set up logger for job.

submit_evaluation_job

Submit job for pipeline evaluation during AutoMLSearch.

submit_scoring_job

Submit job for pipeline scoring.

submit_training_job

Submit job for pipeline training.

static setup_job_log()[source]

Set up logger for job.

abstract submit_evaluation_job(self, automl_config, pipeline, X, y)[source]

Submit job for pipeline evaluation during AutoMLSearch.

abstract submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)[source]

Submit job for pipeline scoring.

abstract submit_training_job(self, automl_config, pipeline, X, y)[source]

Submit job for pipeline training.

class evalml.automl.engine.EngineComputation[source]

Wrapper around the result of a (possibly asynchronous) engine computation.

Methods

cancel

Cancel the computation.

done

Whether the computation is done.

get_result

Gets the computation result. Will block until the computation is finished.

abstract cancel(self)[source]

Cancel the computation.

abstract done(self)[source]

Whether the computation is done.

abstract get_result(self)[source]

Gets the computation result. Will block until the computation is finished.

Raises

Exception – If computation fails. Returns traceback.
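
The three methods above are enough for a caller to poll a batch of computations and collect results as they become ready. The sketch below shows one way such a loop might look. ComputationSketch, CountdownComputation, and collect are hypothetical names, and the toy computation simply reports done after a fixed number of polls rather than doing real work.

```python
from abc import ABC, abstractmethod


class ComputationSketch(ABC):
    """Hypothetical stand-in for the three-method interface described above."""

    @abstractmethod
    def cancel(self): ...

    @abstractmethod
    def done(self): ...

    @abstractmethod
    def get_result(self): ...


class CountdownComputation(ComputationSketch):
    """Toy computation that reports done after a fixed number of polls."""

    def __init__(self, value, polls_needed):
        self._value = value
        self._remaining = polls_needed
        self.cancelled = False

    def cancel(self):
        self.cancelled = True

    def done(self):
        if self._remaining > 0:
            self._remaining -= 1
        return self._remaining == 0

    def get_result(self):
        if self.cancelled:
            raise RuntimeError("computation was cancelled")
        return self._value


def collect(computations):
    """Poll until every computation is done, gathering results as they finish."""
    pending = list(computations)
    results = []
    while pending:
        computation = pending.pop(0)
        if computation.done():
            results.append(computation.get_result())
        else:
            pending.append(computation)  # not ready yet; check again later
    return results


results = collect([CountdownComputation("slow", 3), CountdownComputation("fast", 1)])
```

Because results are gathered in completion order, the "fast" computation is collected before the "slow" one even though it was submitted second.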

evalml.automl.engine.evaluate_pipeline(pipeline, automl_config, X, y, logger)[source]

Function submitted to the submit_evaluation_job engine method.

Parameters
  • pipeline (PipelineBase) – The pipeline to score.

  • automl_config (AutoMLConfig) – The AutoMLSearch object, used to access config and the error callback.

  • X (pd.DataFrame) – Training features.

  • y (pd.Series) – Training target.

  • logger – Logger object to write to.

Returns

A tuple of three items:
  • A dict containing cv_score_mean, cv_scores, training_time, and a cv_data structure with details.

  • The pipeline class that was trained and scored.

  • The job logger instance with all the recorded messages.

Return type

tuple of three items

class evalml.automl.engine.SequentialEngine[source]

The default engine for the AutoML search.

Trains and scores pipelines locally and sequentially.

Methods

close

No-op.

setup_job_log

Set up logger for job.

submit_evaluation_job

Submit a job to evaluate a pipeline.

submit_scoring_job

Submit a job to score a pipeline.

submit_training_job

Submit a job to train a pipeline.

close(self)[source]

No-op.

static setup_job_log()

Set up logger for job.

submit_evaluation_job(self, automl_config, pipeline, X, y)[source]

Submit a job to evaluate a pipeline.

Parameters
  • automl_config – Structure containing data passed from AutoMLSearch instance.

  • pipeline (pipeline.PipelineBase) – Pipeline to evaluate.

  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

Returns

Computation result.

Return type

SequentialComputation

submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)[source]

Submit a job to score a pipeline.

Parameters
  • automl_config – Structure containing data passed from AutoMLSearch instance.

  • pipeline (pipeline.PipelineBase) – Pipeline to score.

  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

  • X_train (pd.DataFrame) – Training features. Used for feature engineering in time series.

  • y_train (pd.Series) – Training target. Used for feature engineering in time series.

  • objectives (list[ObjectiveBase]) – List of objectives to score on.

Returns

Computation result.

Return type

SequentialComputation

submit_training_job(self, automl_config, pipeline, X, y)[source]

Submit a job to train a pipeline.

Parameters
  • automl_config – Structure containing data passed from AutoMLSearch instance.

  • pipeline (pipeline.PipelineBase) – Pipeline to train.

  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

Returns

Computation result.

Return type

SequentialComputation
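
A sequential engine can satisfy the same submit-and-wrap contract as the parallel engines without any background execution. The sketch below assumes the work is deferred until get_result is called; that design, along with the names SequentialComputationSketch, SequentialEngineSketch, submit_job, and score, is an assumption for illustration rather than a description of EvalML's code.

```python
class SequentialComputationSketch:
    """Hypothetical computation that runs its work in the calling process."""

    def __init__(self, work, **kwargs):
        self._work = work
        self._kwargs = kwargs

    def done(self):
        # Nothing runs in the background, so there is never anything to wait for.
        return True

    def get_result(self):
        # The work runs here, in the calling thread.
        return self._work(**self._kwargs)

    def cancel(self):
        # Nothing to cancel.
        pass


class SequentialEngineSketch:
    """Hypothetical engine that wraps each job without starting any workers."""

    def submit_job(self, work, **kwargs):
        return SequentialComputationSketch(work, **kwargs)

    def close(self):
        # No-op, matching the documented behavior of close.
        pass


def score(predictions, targets):
    # Stand-in for pipeline scoring: fraction of matching labels.
    matches = sum(p == t for p, t in zip(predictions, targets))
    return matches / len(targets)


engine = SequentialEngineSketch()
computation = engine.submit_job(score, predictions=[1, 0, 1, 1], targets=[1, 0, 0, 1])
accuracy = computation.get_result()
```

Keeping the same done and get_result surface means calling code does not need to know whether it is talking to a sequential or a parallel engine.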

evalml.automl.engine.train_and_score_pipeline(pipeline, automl_config, full_X_train, full_y_train, logger)[source]

Given a pipeline, config and data, train and score the pipeline and return the CV or TV scores.

Parameters
  • pipeline (PipelineBase) – The pipeline to score.

  • automl_config (AutoMLSearch) – The AutoMLSearch object, used to access config and the error callback.

  • full_X_train (pd.DataFrame) – Training features.

  • full_y_train (pd.Series) – Training target.

  • logger – Logger object to write to.

Raises

Exception – If there are missing target values in the training set after data split.

Returns

A tuple of three items:
  • A dict containing cv_score_mean, cv_scores, training_time, and a cv_data structure with details.

  • The pipeline class that was trained and scored.

  • The job logger instance with all the recorded messages.

Return type

tuple of three items
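
The shape of the results dict can be made concrete with a toy cross-validation loop. This is a sketch under stated assumptions: cross_validate, fit_mean, mean_abs_error, and the per-fold layout of cv_data are hypothetical. Only the four top-level keys come from the return description above.

```python
import time
from statistics import mean


def cross_validate(fit, score, folds):
    """Train and score once per fold, returning a results dict.

    `fit`, `score`, and `folds` are hypothetical stand-ins; the top-level
    dict keys follow the return description above.
    """
    start = time.perf_counter()
    cv_data = []
    for i, (train, valid) in enumerate(folds):
        model = fit(train)
        cv_data.append({"fold": i, "score": score(model, valid)})
    cv_scores = [entry["score"] for entry in cv_data]
    return {
        "cv_data": cv_data,
        "training_time": time.perf_counter() - start,
        "cv_scores": cv_scores,
        "cv_score_mean": mean(cv_scores),
    }


def fit_mean(train):
    # Toy "model": the mean of the training targets.
    return mean(train)


def mean_abs_error(model, valid):
    # Toy score: mean absolute error of the constant prediction.
    return mean(abs(v - model) for v in valid)


# Each fold is a (training targets, validation targets) pair.
folds = [([1.0, 2.0, 3.0], [2.0, 4.0]), ([2.0, 4.0], [1.0, 3.0])]
results = cross_validate(fit_mean, mean_abs_error, folds)
```

Both folds score 1.0 in this toy setup, so cv_score_mean is also 1.0.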

evalml.automl.engine.train_pipeline(pipeline, X, y, automl_config, schema=True)[source]

Train a pipeline and tune the threshold if necessary.

Parameters
  • pipeline (PipelineBase) – Pipeline to train.

  • X (pd.DataFrame) – Features to train on.

  • y (pd.Series) – Target to train on.

  • automl_config (AutoMLSearch) – The AutoMLSearch object, used to access config and the error callback.

  • schema (bool) – Whether to use the schemas for X and y. Defaults to True.

Returns

A trained pipeline instance.

Return type

PipelineBase