dask_engine

A Future-like wrapper around jobs created by the DaskEngine.

Module Contents

Classes Summary

`DaskComputation`	A Future-like wrapper around jobs created by the DaskEngine.
`DaskEngine`	The dask engine.

Contents

class evalml.automl.engine.dask_engine.DaskComputation(dask_future)

A Future-like wrapper around jobs created by the DaskEngine.

Parameters: dask_future (dask.distributed.Future) – The Dask future representing the computation to wrap.

Methods

`cancel`	Cancel the current computation.
`done`	Returns whether the computation is done.
`get_result`	Gets the computation result. Will block until the computation is finished.
`is_cancelled`	Returns whether computation was cancelled.

cancel(self): Cancel the current computation.

done(self): Returns whether the computation is done.

get_result(self)

Gets the computation result. Will block until the computation is finished.

Raises: Exception – If the computation fails. The exception raised by the job is re-raised along with its traceback.
Returns: Computation results.

property is_cancelled(self): Returns whether computation was cancelled.
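To illustrate the Future-like wrapper pattern, the following is a minimal sketch that wraps a standard-library `concurrent.futures.Future` instead of a Dask future. The class `FutureLikeComputation` and the toy job `square_job` are hypothetical; this is not evalml's implementation, only an analogue exposing the same four documented members (`cancel`, `done`, `get_result`, `is_cancelled`).

```python
from concurrent.futures import Future, ThreadPoolExecutor


class FutureLikeComputation:
    """Hypothetical stand-in for DaskComputation that wraps a stdlib Future."""

    def __init__(self, future: Future):
        self.work = future

    def done(self) -> bool:
        # Non-blocking check: True once the job has finished, failed or been cancelled.
        return self.work.done()

    def get_result(self):
        # Blocks until the job finishes; re-raises the job's exception on failure.
        return self.work.result()

    def cancel(self) -> bool:
        # With stdlib futures, cancelling only succeeds for jobs not yet running.
        return self.work.cancel()

    @property
    def is_cancelled(self) -> bool:
        return self.work.cancelled()


def square_job(n):
    # Hypothetical job that fails on negative input.
    if n < 0:
        raise ValueError("negative input")
    return n * n


with ThreadPoolExecutor(max_workers=2) as pool:
    ok = FutureLikeComputation(pool.submit(square_job, 4))
    bad = FutureLikeComputation(pool.submit(square_job, -1))
    print(ok.get_result())  # 16
    try:
        bad.get_result()
    except ValueError as err:
        print("job failed:", err)
```

Keeping the underlying future in a single attribute means every method is a thin delegation, so the wrapper adds a stable interface without changing the future's blocking or cancellation semantics.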

class evalml.automl.engine.dask_engine.DaskEngine(cluster=None)

The Dask engine, which sends training, scoring, and evaluation jobs to a Dask cluster.

Parameters: cluster (None or dd.Client) – If None, creates a local, threaded Dask client for processing. Defaults to None.

Methods

`close`	Closes the underlying cluster.
`is_closed`	Property that determines whether the Engine's Client's resources are shut down.
`send_data_to_cluster`	Send data to the cluster.
`setup_job_log`	Set up logger for job.
`submit_evaluation_job`	Send evaluation job to cluster.
`submit_scoring_job`	Send scoring job to cluster.
`submit_training_job`	Send training job to cluster.

close(self): Closes the underlying cluster.

property is_closed(self): Property that determines whether the Engine’s Client’s resources are shut down.
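The engine lifecycle (a local, threaded backend created by default when no cluster is given, plus `close` and `is_closed`) can be sketched with a `ThreadPoolExecutor` standing in for the Dask client. `ThreadedEngineSketch` is a hypothetical name, and the sketch assumes that closing simply shuts down the worker pool; it is not evalml's code.

```python
from concurrent.futures import ThreadPoolExecutor


class ThreadedEngineSketch:
    """Hypothetical engine that owns a local thread pool unless one is supplied."""

    def __init__(self, executor=None):
        # Mirrors the documented default: with no backend given, create a local threaded one.
        self.executor = executor if executor is not None else ThreadPoolExecutor()
        self._closed = False

    def close(self):
        # Release the worker threads; calling close() twice is harmless.
        if not self._closed:
            self.executor.shutdown(wait=True)
            self._closed = True

    @property
    def is_closed(self) -> bool:
        return self._closed


engine = ThreadedEngineSketch()
print(engine.executor.submit(sum, [1, 2, 3]).result())  # 6
print(engine.is_closed)  # False
engine.close()
print(engine.is_closed)  # True
```

After `close()`, the stdlib executor refuses new work with a `RuntimeError`, which is the kind of state a property like `is_closed` lets callers check before submitting.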

send_data_to_cluster(self, X, y)

Send data to the cluster.

The data is cached, so each dataset is sent to the cluster only once. This follows Dask best practices.

Parameters

X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.

Returns

The modeling data.

Return type

dask.Future
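The send-once caching described above can be sketched as a dictionary keyed on a content hash of `(X, y)`. In Dask, the send step would typically be `Client.scatter`; here a hypothetical `fake_scatter` function stands in for it. Hashing the pickled data with SHA-256 is an assumption made for illustration, not necessarily how evalml builds its cache key.

```python
import hashlib
import pickle


class DataCacheSketch:
    """Hypothetical illustration of send-once caching for (X, y) pairs."""

    def __init__(self, send):
        self._send = send  # callable that ships data to workers and returns a handle
        self._cache = {}   # content hash -> handle returned by send

    def send_data(self, X, y):
        # Key on the content of the data, so an equal dataset is not re-sent.
        key = hashlib.sha256(pickle.dumps((X, y))).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._send((X, y))
        return self._cache[key]


sent = []


def fake_scatter(data):
    # Stand-in for shipping data to workers; the handle is just a counter here.
    sent.append(data)
    return len(sent)


cache = DataCacheSketch(send=fake_scatter)
h1 = cache.send_data([[1, 2], [3, 4]], [0, 1])
h2 = cache.send_data([[1, 2], [3, 4]], [0, 1])  # cache hit: nothing is re-sent
h3 = cache.send_data([[5, 6]], [1])
print(h1, h2, h3, len(sent))  # 1 1 2 2
```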

static setup_job_log(): Set up logger for job.
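The description above does not say what `setup_job_log` returns. One plausible design, sketched here under that assumption, gives each job its own buffer of log messages that is replayed into a regular `logging.Logger` once the job's result is back on the client, since handlers configured in the client process may not see records emitted in remote worker processes. `BufferedJobLog` and this `setup_job_log` function are hypothetical.

```python
import logging


class BufferedJobLog:
    """Hypothetical job logger: records messages during the job, replays them later."""

    def __init__(self):
        self.records = []  # (level, message) pairs collected while the job runs

    def info(self, msg):
        self.records.append((logging.INFO, msg))

    def warning(self, msg):
        self.records.append((logging.WARNING, msg))

    def replay(self, logger: logging.Logger):
        # Forward the buffered messages, in order, to a real logger on the client.
        for level, msg in self.records:
            logger.log(level, msg)


def setup_job_log():
    # Each job gets a fresh buffer, so logs from concurrent jobs do not interleave.
    return BufferedJobLog()


job_log = setup_job_log()
job_log.info("Training pipeline 1")
job_log.warning("Fold 2 contained a single class")
job_log.replay(logging.getLogger("automl_demo"))
```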

submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)

Send evaluation job to cluster.

Parameters

automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to evaluate.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
X_holdout (pd.DataFrame) – Holdout input data for holdout scoring. Defaults to None.
y_holdout (pd.Series) – Holdout target data for holdout scoring. Defaults to None.

Returns

An object wrapping a reference to a future-like computation occurring in the Dask cluster.

Return type

DaskComputation
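The optional holdout arguments can be illustrated with a toy evaluation job: it always scores on `(X, y)` and adds a holdout score only when both `X_holdout` and `y_holdout` are supplied. The functions `evaluate`, `accuracy`, and `predict` are hypothetical, and a stdlib thread pool stands in for the Dask cluster; evalml's actual evaluation logic (such as cross-validation) is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def accuracy(predict, X, y):
    # Fraction of rows whose prediction matches the target.
    return mean(1.0 if predict(row) == target else 0.0 for row, target in zip(X, y))


def evaluate(predict, X, y, X_holdout=None, y_holdout=None):
    """Hypothetical evaluation job with optional holdout scoring."""
    result = {"score": accuracy(predict, X, y)}
    # Holdout scoring only runs when both holdout arguments are supplied.
    if X_holdout is not None and y_holdout is not None:
        result["holdout_score"] = accuracy(predict, X_holdout, y_holdout)
    return result


def predict(row):
    # Toy "pipeline": predicts 1 when the first feature is positive.
    return 1 if row[0] > 0 else 0


X, y = [[1.0], [-2.0], [3.0]], [1, 0, 0]
X_holdout, y_holdout = [[0.5], [-1.0]], [1, 0]

with ThreadPoolExecutor() as pool:
    without_holdout = pool.submit(evaluate, predict, X, y).result()
    with_holdout = pool.submit(evaluate, predict, X, y, X_holdout, y_holdout).result()
print(without_holdout)  # only the "score" key
print(with_holdout)     # "score" plus "holdout_score"
```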

submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)

Send scoring job to cluster.

Parameters

automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to score.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
objectives (list[ObjectiveBase]) – List of objectives to score on.
X_train (pd.DataFrame) – Training features. Used for feature engineering in time series. Defaults to None.
y_train (pd.Series) – Training target. Used for feature engineering in time series. Defaults to None.

Returns

An object wrapping a reference to a future-like computation occurring in the Dask cluster.

Return type

DaskComputation
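A scoring job evaluates one pipeline's predictions against several objectives in a single pass. The sketch below models objectives as hypothetical `(name, function)` pairs and returns a name-to-score mapping; it does not use evalml's `ObjectiveBase` API, and it omits the `X_train`/`y_train` arguments that only matter for time series.

```python
from concurrent.futures import ThreadPoolExecutor


def score_job(predict, X, y, objectives):
    """Hypothetical scoring job: predict on X once, then score against y per objective."""
    predictions = [predict(row) for row in X]
    # Each objective is a (name, function) pair; the result maps name -> score.
    return {name: fn(y, predictions) for name, fn in objectives}


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_count(y_true, y_pred):
    return sum(t != p for t, p in zip(y_true, y_pred))


def predict(row):
    # Toy "pipeline": predicts 1 when the first feature is positive.
    return 1 if row[0] > 0 else 0


X, y = [[2.0], [-1.0], [0.5], [3.0]], [1, 0, 0, 1]
objectives = [("accuracy", accuracy), ("errors", error_count)]
with ThreadPoolExecutor() as pool:
    scores = pool.submit(score_job, predict, X, y, objectives).result()
print(scores)  # {'accuracy': 0.75, 'errors': 1}
```

Computing predictions once and reusing them for every objective avoids re-running the pipeline per metric.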

submit_training_job(self, automl_config, pipeline, X, y)

Send training job to cluster.

Parameters

automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to train.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.

Returns

An object wrapping a reference to a future-like computation occurring in the Dask cluster.

Return type

DaskComputation
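Since the submit methods return a future-like object rather than a finished result, a caller can submit several jobs up front and then poll them with `done()`. The loop below sketches that pattern with stdlib futures standing in for `DaskComputation` objects (their `result()` plays the role of `get_result()`); the `train` function and its delays are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def train(name, seconds):
    """Hypothetical training job: sleeps to simulate fitting, then returns a label."""
    time.sleep(seconds)
    return f"{name} fitted"


with ThreadPoolExecutor(max_workers=3) as pool:
    # Submit every job first; submission returns immediately.
    pending = {
        name: pool.submit(train, name, delay)
        for name, delay in [("rf", 0.03), ("lr", 0.01), ("xgb", 0.02)]
    }
    finished = {}
    # Poll with done() so each result is collected as soon as its job completes.
    while pending:
        for name, fut in list(pending.items()):
            if fut.done():
                finished[name] = fut.result()
                del pending[name]
        time.sleep(0.005)

print(sorted(finished.items()))  # [('lr', 'lr fitted'), ('rf', 'rf fitted'), ('xgb', 'xgb fitted')]
```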