dask_engine
A Future-like wrapper around jobs created by the DaskEngine.
Module Contents
Classes Summary
DaskComputation | A Future-like wrapper around jobs created by the DaskEngine.
DaskEngine | The dask engine.
Contents
- class evalml.automl.engine.dask_engine.DaskComputation(dask_future)
A Future-like wrapper around jobs created by the DaskEngine.
- Parameters
dask_future (dask.Future) – Handle to the computation running in the Dask cluster.
Methods
cancel | Cancel the current computation.
done | Returns whether the computation is done.
get_result | Gets the computation result. Will block until the computation is finished.
is_cancelled | Returns whether the computation was cancelled.
- get_result(self)
Gets the computation result. Will block until the computation is finished.
- Raises
Exception – If the computation fails. The exception raised by the job is re-raised with its traceback.
- Returns
Computation results.
- property is_cancelled(self)
Returns whether the computation was cancelled.
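The wrapper's interface (done, get_result, cancel, is_cancelled) can be sketched with the standard library. This is a minimal illustration, not evalml's implementation: it assumes a `concurrent.futures.Future` as a stand-in for the Dask future, and the class name `FutureLikeComputation` is hypothetical. Cancellation semantics in Dask differ from this stand-in.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class FutureLikeComputation:
    """Hypothetical stand-in for DaskComputation, wrapping a concurrent.futures.Future."""

    def __init__(self, future: Future):
        self._future = future  # the underlying future-like object

    def done(self) -> bool:
        # True once the job has finished, failed, or been cancelled.
        return self._future.done()

    def get_result(self):
        # Blocks until finished; re-raises any exception raised by the job.
        return self._future.result()

    def cancel(self) -> bool:
        # In this stand-in, cancellation only succeeds if the job has not started.
        return self._future.cancel()

    @property
    def is_cancelled(self) -> bool:
        return self._future.cancelled()


with ThreadPoolExecutor(max_workers=1) as pool:
    computation = FutureLikeComputation(pool.submit(sum, [1, 2, 3]))
    print(computation.get_result())  # 6
    print(computation.done())        # True
    print(computation.is_cancelled)  # False
```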
- class evalml.automl.engine.dask_engine.DaskEngine(cluster=None)
The Dask engine, which sends AutoMLSearch training, scoring, and evaluation jobs to a Dask cluster.
- Parameters
cluster (None or dd.Client) – If None, creates a local, threaded Dask client for processing. Defaults to None.
Methods
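close | Closes the underlying cluster.
is_closed | Property that determines whether the Engine's Client's resources are shut down.
send_data_to_cluster | Send data to the cluster.
setup_job_log | Set up logger for job.
submit_evaluation_job | Send evaluation job to cluster.
submit_scoring_job | Send scoring job to cluster.
submit_training_job | Send training job to cluster.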
- property is_closed(self)
Property that determines whether the Engine’s Client’s resources are shut down.
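The constructor default (`cluster=None` creates a local, threaded backend) and the close/is_closed pair can be sketched together. This is a hypothetical illustration, not evalml's code: a `ThreadPoolExecutor` stands in for a local threaded Dask client, and `LocalEngine` with its `client` attribute is an assumed name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


class LocalEngine:
    """Hypothetical engine that creates a thread pool unless one is supplied."""

    def __init__(self, cluster: Optional[ThreadPoolExecutor] = None):
        # Mirror the documented default: build a local, threaded backend when none is given.
        self.client = cluster if cluster is not None else ThreadPoolExecutor()
        self._closed = False

    def close(self) -> None:
        # Release worker threads after queued jobs finish.
        self.client.shutdown(wait=True)
        self._closed = True

    @property
    def is_closed(self) -> bool:
        return self._closed


engine = LocalEngine()
print(engine.client.submit(pow, 2, 10).result())  # 1024
engine.close()
print(engine.is_closed)  # True
```

In this sketch `close` also shuts down a caller-supplied pool; whether an engine should own a cluster it did not create is a design choice.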
- send_data_to_cluster(self, X, y)
Send data to the cluster.
The data is cached, so repeated calls with the same data send it to the cluster only once. This follows Dask best practices.
- Parameters
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
The modeling data.
- Return type
dask.Future
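The caching idea can be sketched as a dictionary keyed on a content hash of `X` and `y`, so a second call with identical data reuses the earlier handle instead of transferring the data again. This is an assumption-laden illustration, not evalml's implementation: `DataCache`, the `send` callable, and hashing with `pickle` plus `hashlib` are hypothetical stand-ins. In Dask the transfer step would typically be `Client.scatter`.

```python
import hashlib
import pickle
from typing import Any, Callable, Dict, Tuple


class DataCache:
    """Hypothetical cache that sends each distinct (X, y) pair only once."""

    def __init__(self, send: Callable[[Any, Any], Any]):
        self._send = send  # performs the (expensive) transfer to the cluster
        self._cache: Dict[Tuple[str, str], Any] = {}

    @staticmethod
    def _digest(obj: Any) -> str:
        # Content hash: data that pickles to the same bytes gets the same key.
        return hashlib.sha256(pickle.dumps(obj)).hexdigest()

    def send_data_to_cluster(self, X: Any, y: Any) -> Any:
        key = (self._digest(X), self._digest(y))
        if key not in self._cache:
            self._cache[key] = self._send(X, y)
        return self._cache[key]


sends = []


def fake_send(X, y):
    # Stand-in for a real transfer; records each call and returns a handle.
    sends.append((X, y))
    return f"handle-{len(sends)}"


cache = DataCache(fake_send)
X, y = [[1, 2], [3, 4]], [0, 1]
print(cache.send_data_to_cluster(X, y))                 # handle-1
print(cache.send_data_to_cluster([[1, 2], [3, 4]], y))  # handle-1 (cached)
print(len(sends))                                       # 1
```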
- static setup_job_log()
Set up logger for job.
- submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)
Send evaluation job to cluster.
- Parameters
automl_config – Structure containing data passed from the AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to evaluate.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
X_holdout (pd.DataFrame) – Holdout input data for holdout scoring. Defaults to None.
y_holdout (pd.Series) – Holdout target data for holdout scoring. Defaults to None.
- Returns
An object wrapping a reference to a future-like computation occurring in the Dask cluster.
- Return type
DaskComputation
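An evaluation job bundles fitting and scoring into one unit of work sent to the cluster, with holdout scoring only when `X_holdout` and `y_holdout` are provided. A minimal sketch, using a `ThreadPoolExecutor` in place of the Dask client and a toy mean predictor in place of a pipeline; `evaluate`, `MeanPipeline`, and the result keys are hypothetical and do not reflect evalml's actual evaluation logic.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import List, Optional


class MeanPipeline:
    """Hypothetical toy model: predicts the mean of the training target."""

    def fit(self, X, y):
        self.value = mean(y)
        return self

    def predict(self, X):
        return [self.value] * len(X)


def mae(y_true: List[float], y_pred: List[float]) -> float:
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def evaluate(pipeline, X, y, X_holdout: Optional[list] = None, y_holdout: Optional[list] = None) -> dict:
    # Fit on the modeling data and report the training error.
    pipeline.fit(X, y)
    result = {"train_mae": mae(y, pipeline.predict(X))}
    # Score on holdout data only when both holdout arguments are supplied.
    if X_holdout is not None and y_holdout is not None:
        result["holdout_mae"] = mae(y_holdout, pipeline.predict(X_holdout))
    return result


with ThreadPoolExecutor() as client:
    future = client.submit(evaluate, MeanPipeline(), [[0], [1]], [1.0, 3.0], [[2]], [4.0])
    print(future.result())  # {'train_mae': 1.0, 'holdout_mae': 2.0}
```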
- submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)
Send scoring job to cluster.
- Parameters
automl_config – Structure containing data passed from the AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to score.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
objectives (list[ObjectiveBase]) – List of objectives to score on.
X_train (pd.DataFrame) – Training features. Used for feature engineering in time series. Defaults to None.
y_train (pd.Series) – Training target. Used for feature engineering in time series. Defaults to None.
- Returns
An object wrapping a reference to a future-like computation occurring in the Dask cluster.
- Return type
DaskComputation
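A scoring job evaluates an already trained pipeline against every entry in `objectives` and returns the scores together. A minimal sketch, with plain functions standing in for `ObjectiveBase` instances and a `ThreadPoolExecutor` standing in for the Dask client; `score_job`, the objective functions, and the dict-of-scores result are hypothetical. `X_train` and `y_train` are omitted because they only matter for time series feature engineering.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def max_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def score_job(predict: Callable, X, y, objectives: List[Callable]) -> Dict[str, float]:
    # Predict once, then evaluate every objective on the same predictions.
    y_pred = predict(X)
    return {objective.__name__: objective(y, y_pred) for objective in objectives}


def predict(X):
    # Hypothetical "trained pipeline": doubles the single feature.
    return [2 * row[0] for row in X]


with ThreadPoolExecutor() as client:
    future = client.submit(
        score_job, predict, [[1], [2], [3], [4]], [2.0, 5.0, 6.0, 10.0], [mean_absolute_error, max_error]
    )
    print(future.result())  # {'mean_absolute_error': 0.75, 'max_error': 2.0}
```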
- submit_training_job(self, automl_config, pipeline, X, y)
Send training job to cluster.
- Parameters
automl_config – Structure containing data passed from the AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to train.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
An object wrapping a reference to a future-like computation occurring in the Dask cluster.
- Return type
DaskComputation
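Because each submission returns a future-like handle instead of a finished result, a caller can send several training jobs at once and collect them as they finish. A minimal sketch of that pattern, assuming a `ThreadPoolExecutor` as a stand-in for the Dask client; the `train` function and pipeline names are hypothetical, and "fitting" is reduced to computing the target mean.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from statistics import mean


def train(pipeline_name: str, X, y):
    # Hypothetical training job: "fits" by storing the target mean.
    return pipeline_name, mean(y)


X, y = [[0], [1], [2]], [1.0, 2.0, 3.0]
with ThreadPoolExecutor(max_workers=2) as client:
    futures = [client.submit(train, name, X, y) for name in ("pipeline_a", "pipeline_b")]
    # Collect each trained result as soon as its job finishes, in completion order.
    trained = dict(f.result() for f in as_completed(futures))

print(sorted(trained.items()))  # [('pipeline_a', 2.0), ('pipeline_b', 2.0)]
```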