engine
Submodules
Package Contents
Classes Summary
CFEngine | The concurrent.futures (CF) engine.
DaskEngine | The dask engine.
EngineBase | Helper class that provides a standard way to create an ABC using inheritance.
EngineComputation | Wrapper around the result of a (possibly asynchronous) engine computation.
SequentialEngine | The default engine for the AutoML search. Trains and scores pipelines locally and sequentially.
Functions
evaluate_pipeline | Function submitted to the submit_evaluation_job engine method.
train_and_score_pipeline | Given a pipeline, config and data, train and score the pipeline and return the CV or TV scores.
train_pipeline | Train a pipeline and tune the threshold if necessary.
Contents
-
class evalml.automl.engine.CFEngine(client)
The concurrent.futures (CF) engine.
Methods
submit_evaluation_job | Send evaluation job to cluster.
submit_scoring_job | Send scoring job to cluster.
submit_training_job | Send training job to cluster.
-
static setup_job_log()
-
submit_evaluation_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation
Send evaluation job to cluster.
- Parameters
automl_config – structure containing data passed from AutoMLSearch instance
pipeline (pipeline.PipelineBase) – pipeline to evaluate
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling
- Returns
an object wrapping a reference to a future-like computation occurring in the resource pool
- Return type
CFComputation
-
submit_scoring_job(self, automl_config, pipeline, X, y, objectives) → evalml.automl.engine.engine_base.EngineComputation
Send scoring job to cluster.
- Parameters
automl_config – structure containing data passed from AutoMLSearch instance
pipeline (pipeline.PipelineBase) – pipeline to score
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling
objectives – objectives to score the pipeline on
- Returns
an object wrapping a reference to a future-like computation occurring in the resource pool
- Return type
CFComputation
-
submit_training_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation
Send training job to cluster.
- Parameters
automl_config – structure containing data passed from AutoMLSearch instance
pipeline (pipeline.PipelineBase) – pipeline to train
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling
- Returns
an object wrapping a reference to a future-like computation occurring in the resource pool
- Return type
CFComputation
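The pattern described above, submitting a job to a resource pool and getting back an object that wraps a future-like computation, can be sketched with the standard library alone. This is a minimal illustration, not evalml's implementation: `ToyComputation`, `ToyEngine` and `mean_score` are hypothetical names, and the job is a stand-in for training and scoring a pipeline.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ToyComputation:
    """Hypothetical wrapper around a concurrent.futures.Future."""

    def __init__(self, future: Future):
        self._future = future

    def done(self) -> bool:
        # True once the job has finished (or was cancelled).
        return self._future.done()

    def get_result(self):
        # Blocks until the job finishes, then returns its value
        # or re-raises the exception raised inside the job.
        return self._future.result()

    def cancel(self) -> bool:
        # Only succeeds if the job has not started running yet.
        return self._future.cancel()


class ToyEngine:
    """Hypothetical engine that submits jobs to an executor pool."""

    def __init__(self, pool: ThreadPoolExecutor):
        self._pool = pool

    def submit_evaluation_job(self, job, *args) -> ToyComputation:
        # Hand the job to the pool and wrap the returned future.
        return ToyComputation(self._pool.submit(job, *args))


def mean_score(scores):
    # Stand-in for the real work of training and scoring a pipeline.
    return sum(scores) / len(scores)


with ThreadPoolExecutor(max_workers=2) as pool:
    engine = ToyEngine(pool)
    computation = engine.submit_evaluation_job(mean_score, [0.5, 1.0, 1.5])
    result = computation.get_result()
```

The caller only ever sees the wrapper, so the same calling code could work whether the pool is made of threads, processes or remote workers.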
-
static
-
class evalml.automl.engine.DaskEngine(client)
The dask engine.
Methods
send_data_to_cluster | Send data to the cluster.
submit_evaluation_job | Send evaluation job to cluster.
submit_scoring_job | Send scoring job to cluster.
submit_training_job | Send training job to cluster.
-
send_data_to_cluster(self, X, y)
Send data to the cluster.
The implementation uses caching so the data is only sent once. This follows dask best practices.
- Parameters
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling
- Returns
the modeling data
- Return type
dask.Future
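The "send once, then reuse" caching idea can be sketched without a cluster. The code below is an illustration under stated assumptions, not the DaskEngine implementation: `ToyDataCache` and `fake_send` are hypothetical, the cache key is a content hash of the pickled data, and with a real dask cluster the `send` callable might wrap something like `Client.scatter`.

```python
import hashlib
import pickle


class ToyDataCache:
    """Hypothetical cache that 'sends' each distinct dataset only once."""

    def __init__(self, send):
        self._send = send   # callable that ships data to the cluster
        self._cache = {}    # content hash -> handle returned by send

    def send_data(self, X, y):
        # Key on the data's content so identical data maps to one entry.
        key = hashlib.sha256(pickle.dumps((X, y))).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._send(X, y)
        return self._cache[key]


sent = []


def fake_send(X, y):
    # Stand-in for a network transfer; records each call it receives.
    sent.append((X, y))
    return f"handle-{len(sent)}"


cache = ToyDataCache(fake_send)
X, y = [[1, 2], [3, 4]], [0, 1]
first = cache.send_data(X, y)
second = cache.send_data(X, y)  # served from the cache, no second transfer
```

Hashing the full payload costs time on large data, so a real implementation may choose a cheaper key; that trade-off is outside what this page documents.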
-
static setup_job_log()
-
submit_evaluation_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation
Send evaluation job to cluster.
- Parameters
automl_config – structure containing data passed from AutoMLSearch instance
pipeline (pipeline.PipelineBase) – pipeline to evaluate
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling
- Returns
an object wrapping a reference to a future-like computation occurring in the dask cluster
- Return type
DaskComputation
-
submit_scoring_job(self, automl_config, pipeline, X, y, objectives) → evalml.automl.engine.engine_base.EngineComputation
Send scoring job to cluster.
- Parameters
automl_config – structure containing data passed from AutoMLSearch instance
pipeline (pipeline.PipelineBase) – pipeline to score
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling
objectives – objectives to score the pipeline on
- Returns
an object wrapping a reference to a future-like computation occurring in the dask cluster
- Return type
DaskComputation
-
submit_training_job(self, automl_config, pipeline, X, y) → evalml.automl.engine.engine_base.EngineComputation
Send training job to cluster.
- Parameters
automl_config – structure containing data passed from AutoMLSearch instance
pipeline (pipeline.PipelineBase) – pipeline to train
X (pd.DataFrame) – input data for modeling
y (pd.Series) – target data for modeling
- Returns
an object wrapping a reference to a future-like computation occurring in the dask cluster
- Return type
DaskComputation
-
-
class evalml.automl.engine.EngineBase
Helper class that provides a standard way to create an ABC using inheritance.
Methods
submit_evaluation_job | Submit job for pipeline evaluation during AutoMLSearch.
submit_scoring_job | Submit job for pipeline scoring.
submit_training_job | Submit job for pipeline training.
-
abstract submit_evaluation_job(self, automl_config, pipeline, X, y)
Submit job for pipeline evaluation during AutoMLSearch.
-
abstract
-
class evalml.automl.engine.EngineComputation
Wrapper around the result of a (possibly asynchronous) engine computation.
Methods
cancel | Cancel the computation.
done | Whether the computation is done.
get_result | Gets the computation result.
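An interface like this is commonly expressed as an abstract base class whose subclasses must supply all three operations. The sketch below assumes that shape. `ToyComputationBase` and `FinishedComputation` are hypothetical names, and the method names simply mirror the three operations summarized above.

```python
from abc import ABC, abstractmethod


class ToyComputationBase(ABC):
    """Hypothetical interface mirroring cancel / done / get_result."""

    @abstractmethod
    def get_result(self):
        """Return the computation's result."""

    @abstractmethod
    def done(self):
        """Return whether the computation has finished."""

    @abstractmethod
    def cancel(self):
        """Try to cancel the computation."""


class FinishedComputation(ToyComputationBase):
    """Trivial concrete subclass holding an already-computed value."""

    def __init__(self, value):
        self._value = value

    def get_result(self):
        return self._value

    def done(self):
        return True

    def cancel(self):
        # Nothing left to cancel once the value exists.
        return False


# The abstract class itself cannot be instantiated.
try:
    ToyComputationBase()
    instantiated = True
except TypeError:
    instantiated = False

finished = FinishedComputation(42)
```

Declaring the methods abstract means a subclass that forgets one of them fails at construction time rather than midway through a search.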
-
evalml.automl.engine.evaluate_pipeline(pipeline, automl_config, X, y, logger)
Function submitted to the submit_evaluation_job engine method.
- Parameters
pipeline (PipelineBase) – The pipeline to score
automl_config (AutoMLConfig) – The AutoMLSearch object, used to access config and the error callback
X (pd.DataFrame) – Training features
y (pd.Series) – Training target
- Returns
First, a dict containing cv_score_mean, cv_scores, training_time and a cv_data structure with details. Second, the pipeline class we trained and scored. Third, the job logger instance with all the recorded messages.
- Return type
tuple of three items
-
class evalml.automl.engine.SequentialEngine
The default engine for the AutoML search. Trains and scores pipelines locally and sequentially.
Methods
submit_evaluation_job | Submit job for pipeline evaluation during AutoMLSearch.
submit_scoring_job | Submit job for pipeline scoring.
submit_training_job | Submit job for pipeline training.
-
static setup_job_log()
-
submit_evaluation_job(self, automl_config, pipeline, X, y)
Submit job for pipeline evaluation during AutoMLSearch.
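One plausible way for a local, sequential engine to share the submit-then-collect interface of the parallel engines is to store the work at submission and run it in the calling process when the result is requested. Whether SequentialEngine defers work in exactly this way is an assumption. `ToySequentialComputation`, `ToySequentialEngine` and `score` are hypothetical names.

```python
class ToySequentialComputation:
    """Hypothetical computation that runs in-process, on demand."""

    def __init__(self, work, **kwargs):
        self._work = work
        self._kwargs = kwargs

    def done(self):
        # Nothing runs in the background, so there is nothing to wait for.
        return True

    def get_result(self):
        # The work executes here, in the caller's process.
        return self._work(**self._kwargs)

    def cancel(self):
        # No background job exists to cancel.
        return None


class ToySequentialEngine:
    """Hypothetical engine that wraps each job for in-process execution."""

    def submit_evaluation_job(self, work, **kwargs):
        return ToySequentialComputation(work, **kwargs)


calls = []


def score(name, value):
    # Stand-in for evaluating one pipeline; records the execution order.
    calls.append(name)
    return {"pipeline": name, "score": value}


engine = ToySequentialEngine()
jobs = [
    engine.submit_evaluation_job(score, name=name, value=value)
    for name, value in [("a", 0.1), ("b", 0.2)]
]
results = [job.get_result() for job in jobs]  # runs one job at a time
```

Because every job runs in the caller's process, results arrive strictly in the order they are requested.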
-
static
-
evalml.automl.engine.train_and_score_pipeline(pipeline, automl_config, full_X_train, full_y_train, logger)
Given a pipeline, config and data, train and score the pipeline and return the cross-validation (CV) or training-validation (TV) scores.
- Parameters
pipeline (PipelineBase) – The pipeline to score
automl_config (AutoMLSearch) – The AutoMLSearch object, used to access config and the error callback
full_X_train (pd.DataFrame) – Training features
full_y_train (pd.Series) – Training target
- Returns
First, a dict containing cv_score_mean, cv_scores, training_time and a cv_data structure with details. Second, the pipeline class we trained and scored. Third, the job logger instance with all the recorded messages.
- Return type
tuple of three items
-
evalml.automl.engine.train_pipeline(pipeline, X, y, automl_config, schema=True)
Train a pipeline and tune the threshold if necessary.
- Parameters
pipeline (PipelineBase) – Pipeline to train.
X (pd.DataFrame) – Features to train on.
y (pd.Series) – Target to train on.
automl_config (AutoMLSearch) – The AutoMLSearch object, used to access config and the error callback
schema (bool) – Whether to use the schemas for X and y
- Returns
The trained pipeline.
- Return type
PipelineBase