dask_engine

A Future-like wrapper around jobs created by the DaskEngine.

Module Contents

Classes Summary

DaskComputation

A Future-like wrapper around jobs created by the DaskEngine.

DaskEngine

The dask engine.

Contents

class evalml.automl.engine.dask_engine.DaskComputation(dask_future)

A Future-like wrapper around jobs created by the DaskEngine.

Parameters

dask_future (callable) – Computation to do.

Methods

cancel

Cancel the current computation.

done

Returns whether the computation is done.

get_result

Gets the computation result. Will block until the computation is finished.

is_cancelled

Returns whether computation was cancelled.

cancel(self)

Cancel the current computation.

done(self)

Returns whether the computation is done.

get_result(self)

Gets the computation result. Will block until the computation is finished.

Raises

Exception – If the computation fails. The raised exception includes the traceback.

Returns

Computation results.

property is_cancelled(self)

Returns whether computation was cancelled.
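
The Future-like interface described above can be sketched with the standard library alone. This is a minimal illustration, not evalml's implementation: `concurrent.futures.Future` stands in for a dask future, and the names `SketchComputation` and `run_demo` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class SketchComputation:
    """Hypothetical Future-like wrapper (a stand-in for DaskComputation)."""

    def __init__(self, future):
        # Assumption: a concurrent.futures.Future stands in for the dask future.
        self.work = future

    def done(self):
        """Return whether the computation has finished."""
        return self.work.done()

    def get_result(self):
        """Block until the computation finishes, then return its result.

        If the job raised, the same exception is re-raised here.
        """
        return self.work.result()

    def cancel(self):
        """Request cancellation (a concurrent.futures job that already started cannot be cancelled)."""
        return self.work.cancel()

    @property
    def is_cancelled(self):
        """Return whether the computation was cancelled."""
        return self.work.cancelled()


def run_demo():
    """Run one successful job and one failing job through the wrapper."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        ok = SketchComputation(pool.submit(pow, 2, 10))
        bad = SketchComputation(pool.submit(int, "not a number"))
        value = ok.get_result()
        try:
            bad.get_result()
            error = None
        except ValueError as exc:
            # The failure surfaces only when the result is requested.
            error = exc
        return value, ok.done(), ok.is_cancelled, error
```

In this sketch a failed job does not raise at submission time; the error appears when `get_result` is called, which matches the blocking behavior documented for `get_result` above.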

class evalml.automl.engine.dask_engine.DaskEngine(cluster=None)

The dask engine.

Parameters

cluster (None or dd.Client) – If None, creates a local, threaded Dask client for processing. Defaults to None.

Methods

close

Closes the underlying cluster.

is_closed

Property that indicates whether the engine’s client resources have been shut down.

send_data_to_cluster

Send data to the cluster.

setup_job_log

Set up logger for job.

submit_evaluation_job

Send evaluation job to cluster.

submit_scoring_job

Send scoring job to cluster.

submit_training_job

Send training job to cluster.

close(self)

Closes the underlying cluster.

property is_closed(self)

Property that indicates whether the engine’s client resources have been shut down.

send_data_to_cluster(self, X, y)

Send data to the cluster.

The implementation uses caching so the data is only sent once. This follows dask best practices.

Parameters
  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

Returns

The modeling data.

Return type

dask.Future
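
The "send once, then reuse" idea behind this method can be sketched as a cache keyed by a hash of the data. The code below is an assumption-laden illustration, not the library's code: `SketchDataCache`, `fake_send`, and the pickle-based hashing are all hypothetical, and plain lists stand in for the DataFrame and Series.

```python
import hashlib
import pickle


class SketchDataCache:
    """Hypothetical cache: each distinct (X, y) pair is sent only once."""

    def __init__(self, send):
        # `send` is a hypothetical callable that ships data and returns a handle.
        self._send = send
        self._cache = {}

    @staticmethod
    def _key(X, y):
        # Assumption: hashing the pickled bytes is enough to identify equal inputs.
        digest = hashlib.sha256()
        digest.update(pickle.dumps(X))
        digest.update(pickle.dumps(y))
        return digest.hexdigest()

    def send_data(self, X, y):
        """Return the cached handle, sending the data only on a cache miss."""
        key = self._key(X, y)
        if key not in self._cache:
            self._cache[key] = self._send(X, y)
        return self._cache[key]


sent = []


def fake_send(X, y):
    # Hypothetical stand-in for a transfer to the cluster; records each call.
    sent.append((X, y))
    return ("handle", len(sent))


cache = SketchDataCache(fake_send)
first = cache.send_data([[1, 2], [3, 4]], [0, 1])
second = cache.send_data([[1, 2], [3, 4]], [0, 1])  # same data: served from the cache
third = cache.send_data([[5, 6]], [1])              # new data: sent again
```

Here `fake_send` runs twice for three calls, because the second call reuses the handle created by the first.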

static setup_job_log()

Set up logger for job.

submit_evaluation_job(self, automl_config, pipeline, X, y)

Send evaluation job to cluster.

Parameters
  • automl_config – Structure containing data passed from AutoMLSearch instance.

  • pipeline (pipeline.PipelineBase) – Pipeline to evaluate.

  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

Returns

An object wrapping a reference to a future-like computation occurring in the dask cluster.

Return type

DaskComputation

submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)

Send scoring job to cluster.

Parameters
  • automl_config – Structure containing data passed from AutoMLSearch instance.

  • pipeline (pipeline.PipelineBase) – Pipeline to score.

  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

  • objectives (list[ObjectiveBase]) – List of objectives to score on.

  • X_train (pd.DataFrame) – Training features. Used for feature engineering in time series. Defaults to None.

  • y_train (pd.Series) – Training target. Used for feature engineering in time series. Defaults to None.

Returns

An object wrapping a reference to a future-like computation occurring in the dask cluster.

Return type

DaskComputation

submit_training_job(self, automl_config, pipeline, X, y)

Send training job to cluster.

Parameters
  • automl_config – Structure containing data passed from AutoMLSearch instance.

  • pipeline (pipeline.PipelineBase) – Pipeline to train.

  • X (pd.DataFrame) – Input data for modeling.

  • y (pd.Series) – Target data for modeling.

Returns

An object wrapping a reference to a future-like computation occurring in the dask cluster.

Return type

DaskComputation
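
The submit methods share one pattern: schedule the work, return a future-like handle immediately, and collect results later. A rough sketch of that pattern follows, using a thread pool as a stand-in for the dask cluster. `SketchEngine`, `submit_job`, and `fake_train` are hypothetical names, not part of evalml.

```python
from concurrent.futures import ThreadPoolExecutor


class SketchEngine:
    """Hypothetical engine: submits jobs to a pool and tracks shutdown."""

    def __init__(self, max_workers=2):
        # Assumption: a local thread pool stands in for the dask client.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._closed = False

    def submit_job(self, func, *args):
        """Schedule func(*args) and return a future without waiting for it."""
        return self._pool.submit(func, *args)

    def close(self):
        """Shut the pool down, waiting for running jobs to finish."""
        self._pool.shutdown(wait=True)
        self._closed = True

    @property
    def is_closed(self):
        """Return whether close() has been called."""
        return self._closed


def fake_train(pipeline_name, X, y):
    # Hypothetical stand-in for fitting a pipeline on (X, y).
    return f"{pipeline_name} fitted on {len(X)} rows"


engine = SketchEngine()
jobs = [
    engine.submit_job(fake_train, name, [[1], [2], [3]], [0, 1, 0])
    for name in ("pipeline_a", "pipeline_b")
]
# Results are gathered in submission order; each call blocks until that job is done.
results = [job.result() for job in jobs]
engine.close()
```

Submitting all jobs before collecting any result lets them run concurrently, which is the main reason for returning a handle rather than the result itself.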