dask_engine

Module Contents

Classes Summary

DaskComputation

A Future-like wrapper around jobs created by the DaskEngine.

DaskEngine

The Dask engine, which submits AutoML jobs to a Dask cluster.

Contents

class evalml.automl.engine.dask_engine.DaskComputation(dask_future)[source]

A Future-like wrapper around jobs created by the DaskEngine.

Parameters

dask_future (callable) – The computation to perform.

Methods

cancel

Cancel the current computation.

done

Returns whether the computation is done.

get_result

Gets the computation result.

is_cancelled

Returns whether the computation was cancelled.

cancel(self)[source]

Cancel the current computation.

done(self)[source]

Returns

Whether the computation is done.

Return type

bool

get_result(self)[source]

Gets the computation result. Will block until the computation is finished.

Raises

Exception – If the computation fails. The traceback is included.

property is_cancelled(self)

Returns

Whether the computation was cancelled.

Return type

bool
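The Future-like interface described above can be illustrated with a minimal sketch. It is not evalml's implementation: the class `FutureWrapper`, its `work` attribute, and the helpers `square` and `fail` are hypothetical names, and a standard-library `concurrent.futures` future stands in for a Dask future so the example runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


class FutureWrapper:
    """Hypothetical stand-in for DaskComputation, wrapping a stdlib Future."""

    def __init__(self, future):
        self.work = future

    def cancel(self):
        # Ask the executor to drop the job; returns False if it already started.
        return self.work.cancel()

    def done(self):
        return self.work.done()

    def get_result(self):
        # Blocks until the job finishes and re-raises any exception it raised.
        return self.work.result()

    @property
    def is_cancelled(self):
        return self.work.cancelled()


def square(x):
    return x * x


def fail():
    raise ValueError("job failed")


with ThreadPoolExecutor(max_workers=1) as pool:
    ok = FutureWrapper(pool.submit(square, 7))
    bad = FutureWrapper(pool.submit(fail))

    result = ok.get_result()          # blocks until square(7) finishes
    finished = ok.done()
    cancelled = ok.is_cancelled

    try:
        bad.get_result()
        error_message = None
    except ValueError as exc:         # the job's exception surfaces here
        error_message = str(exc)
```

In this sketch a failed job raises only when `get_result()` is called, which matches the Raises entry documented for `get_result`.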

class evalml.automl.engine.dask_engine.DaskEngine(cluster=None)[source]

The Dask engine, which submits AutoML jobs to a Dask cluster.

Parameters

cluster (None or dd.Client) – If None, creates a local, threaded Dask client for processing. Defaults to None.

Methods

close

Closes the underlying cluster.

is_closed

Property that indicates whether the Engine’s Client’s resources are shut down.

send_data_to_cluster

Send data to the cluster.

setup_job_log

submit_evaluation_job

Send evaluation job to cluster.

submit_scoring_job

Send scoring job to cluster.

submit_training_job

Send training job to cluster.

close(self)[source]

Closes the underlying cluster.

property is_closed(self)

Property that indicates whether the Engine’s Client’s resources are shut down.

send_data_to_cluster(self, X, y)[source]

Send data to the cluster.

The implementation uses caching so the data is only sent once. This follows dask best practices.

Parameters
  • X (pd.DataFrame) – input data for modeling

  • y (pd.Series) – target data for modeling

Returns

the modeling data

Return type

dask.Future
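The send-once caching described for `send_data_to_cluster` might look roughly like the sketch below. It is an illustration of the idea only, not evalml's code: `CachingSender`, `fake_upload`, and the choice of a SHA-256 hash over pickled data as the cache key are all assumptions, and the upload function merely records calls instead of contacting a cluster.

```python
import hashlib
import pickle


class CachingSender:
    """Hypothetical illustration of send-once caching; not evalml code."""

    def __init__(self, upload):
        self._upload = upload   # callable that ships data to the cluster
        self._cache = {}        # content hash -> handle returned by upload

    def send(self, X, y):
        # Key the cache on the data's content so equal data is sent only once.
        key = hashlib.sha256(pickle.dumps((X, y))).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._upload(X, y)
        return self._cache[key]


upload_calls = []


def fake_upload(X, y):
    # Stand-in for a real transfer; records the call and returns a handle.
    upload_calls.append((X, y))
    return ("handle", len(upload_calls))


sender = CachingSender(fake_upload)
first = sender.send([[1, 2], [3, 4]], [0, 1])
second = sender.send([[1, 2], [3, 4]], [0, 1])   # served from the cache
```

Keying on content rather than object identity means a second call with an equal copy of the data reuses the existing handle instead of transferring the data again.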

static setup_job_log()

submit_evaluation_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation[source]

Send evaluation job to cluster.

Parameters
  • automl_config – structure containing data passed from AutoMLSearch instance

  • pipeline (pipeline.PipelineBase) – pipeline to evaluate

  • X (pd.DataFrame) – input data for modeling

  • y (pd.Series) – target data for modeling

Returns

an object wrapping a reference to a future-like computation occurring in the dask cluster

Return type

DaskComputation

submit_scoring_job(self, automl_config, pipeline, X, y, objectives) → evalml.automl.engine.engine_base.EngineComputation[source]

Send scoring job to cluster.

Parameters
  • automl_config – structure containing data passed from AutoMLSearch instance

  • pipeline (pipeline.PipelineBase) – pipeline to score

  • X (pd.DataFrame) – input data for modeling

  • y (pd.Series) – target data for modeling

  • objectives (list) – objectives to score the pipeline on

Returns

an object wrapping a reference to a future-like computation occurring in the dask cluster

Return type

DaskComputation

submit_training_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation[source]

Send training job to cluster.

Parameters
  • automl_config – structure containing data passed from AutoMLSearch instance

  • pipeline (pipeline.PipelineBase) – pipeline to train

  • X (pd.DataFrame) – input data for modeling

  • y (pd.Series) – target data for modeling

Returns

an object wrapping a reference to a future-like computation occurring in the dask cluster

Return type

DaskComputation
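Because each `submit_*` method returns a future-like object rather than a finished result, a caller can submit several jobs and collect them as they complete. The sketch below shows that pattern under stated assumptions: a standard-library thread pool stands in for the Dask cluster, the function `train` and the pipeline names are hypothetical, and the stdlib `done()` and `result()` calls play the roles of `DaskComputation.done()` and `DaskComputation.get_result()`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def train(name):
    # Placeholder for real training work on a cluster worker.
    time.sleep(0.01)
    return f"{name}-fitted"


pool = ThreadPoolExecutor(max_workers=2)
pending = {name: pool.submit(train, name) for name in ("rf", "xgb", "lr")}
results = {}

# Poll without blocking; collect each result as soon as its job finishes.
while pending:
    for name in list(pending):
        if pending[name].done():
            results[name] = pending.pop(name).result()
    time.sleep(0.005)

pool.shutdown()
```

Polling `done()` keeps the caller free to submit further work between checks, whereas calling `get_result()` directly would block on one job at a time.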