engine
EvalML Engine classes used to evaluate pipelines in AutoMLSearch.
Package Contents
Classes Summary
CFEngine | The concurrent.futures (CF) engine.
DaskEngine | The dask engine.
EngineBase | Base class for EvalML engines.
EngineComputation | Wrapper around the result of a (possibly asynchronous) engine computation.
SequentialEngine | The default engine for the AutoML search.
Functions
evaluate_pipeline | Function submitted to the submit_evaluation_job engine method.
train_and_score_pipeline | Given a pipeline, config and data, train and score the pipeline and return the CV or TV scores.
train_pipeline | Train a pipeline and tune the threshold if necessary.
Contents
- class evalml.automl.engine.CFEngine(client=None)
The concurrent.futures (CF) engine.
- Parameters
client (None or CFClient) – If None, creates a threaded pool for processing. Defaults to None.
Methods
close | Function to properly shut down the Engine’s Client’s resources.
is_closed | Property that determines whether the Engine’s Client’s resources are shut down.
setup_job_log | Set up logger for job.
submit_evaluation_job | Send evaluation job to cluster.
submit_scoring_job | Send scoring job to cluster.
submit_training_job | Send training job to cluster.
- property is_closed
Property that determines whether the Engine’s Client’s resources are shut down.
- static setup_job_log()
Set up logger for job.
- submit_evaluation_job(self, automl_config, pipeline, X, y)
Send evaluation job to cluster.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to evaluate.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
An object wrapping a reference to a future-like computation occurring in the resource pool.
- Return type
CFComputation
- submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)
Send scoring job to cluster.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to score.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
X_train (pd.DataFrame) – Training features. Used for feature engineering in time series.
y_train (pd.Series) – Training target. Used for feature engineering in time series.
objectives (list[ObjectiveBase]) – Objectives to score on.
- Returns
An object wrapping a reference to a future-like computation occurring in the resource pool.
- Return type
CFComputation
- submit_training_job(self, automl_config, pipeline, X, y)
Send training job to cluster.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to train.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
An object wrapping a reference to a future-like computation occurring in the resource pool.
- Return type
CFComputation
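The CFEngine methods above share one pattern: hand a job to a pool and return an object that wraps a future. The sketch below illustrates that pattern using only the standard library. It is not EvalML's implementation; `SketchCFEngine`, `SketchComputation`, `submit`, and `fake_train` are hypothetical names introduced for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class SketchComputation:
    """Hypothetical wrapper around a future-like computation."""

    def __init__(self, future: Future):
        self._future = future

    def done(self) -> bool:
        return self._future.done()

    def get_result(self):
        # Blocks until the underlying future has finished.
        return self._future.result()

    def cancel(self) -> bool:
        return self._future.cancel()


class SketchCFEngine:
    """Hypothetical engine that submits jobs to a thread pool."""

    def __init__(self, pool=None):
        # Mirrors the documented default: create a threaded pool when none is given.
        self._pool = pool if pool is not None else ThreadPoolExecutor(max_workers=2)
        self._closed = False

    def submit(self, fn, *args, **kwargs) -> SketchComputation:
        return SketchComputation(self._pool.submit(fn, *args, **kwargs))

    @property
    def is_closed(self) -> bool:
        return self._closed

    def close(self) -> None:
        # Wait for running jobs, then release the pool's resources.
        self._pool.shutdown(wait=True)
        self._closed = True


def fake_train(pipeline_name, n_rows):
    # Placeholder for real training work.
    return f"{pipeline_name} trained on {n_rows} rows"


engine = SketchCFEngine()
job = engine.submit(fake_train, "baseline", 100)
result = job.get_result()
engine.close()
```

Returning a wrapper rather than the raw future lets callers treat results from different backends uniformly.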
- class evalml.automl.engine.DaskEngine(cluster=None)
The dask engine.
- Parameters
cluster (None or dd.Client) – If None, creates a local, threaded Dask client for processing. Defaults to None.
Methods
close | Closes the underlying cluster.
is_closed | Property that determines whether the Engine’s Client’s resources are shut down.
send_data_to_cluster | Send data to the cluster.
setup_job_log | Set up logger for job.
submit_evaluation_job | Send evaluation job to cluster.
submit_scoring_job | Send scoring job to cluster.
submit_training_job | Send training job to cluster.
- property is_closed
Property that determines whether the Engine’s Client’s resources are shut down.
- send_data_to_cluster(self, X, y)
Send data to the cluster.
The implementation uses caching so the data is only sent once. This follows dask best practices.
- Parameters
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
The modeling data.
- Return type
dask.Future
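The "send once" caching described for send_data_to_cluster can be illustrated without Dask. The sketch below keys a cache on a hash of the data so that repeated calls reuse the first handle. How EvalML actually computes its cache key is not stated on this page; `SketchDataCache`, `fake_scatter`, and the pickle-based hash are assumptions made for illustration.

```python
import hashlib
import pickle


class SketchDataCache:
    """Hypothetical cache that ships each distinct (X, y) pair only once."""

    def __init__(self, send_fn):
        self._send_fn = send_fn
        self._cache = {}
        self.send_count = 0

    def send(self, X, y):
        # Identify the data by a hash of its serialized form.
        key = hashlib.sha256(pickle.dumps((X, y))).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._send_fn(X, y)
            self.send_count += 1
        return self._cache[key]


def fake_scatter(X, y):
    # Stand-in for shipping data to workers; returns a handle to the data.
    return {"X": X, "y": y}


cache = SketchDataCache(fake_scatter)
X = [[1, 2], [3, 4]]
y = [0, 1]
first = cache.send(X, y)
second = cache.send(X, y)  # served from the cache, nothing is re-sent
```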
- static setup_job_log()
Set up logger for job.
- submit_evaluation_job(self, automl_config, pipeline, X, y)
Send evaluation job to cluster.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to evaluate.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
An object wrapping a reference to a future-like computation occurring in the dask cluster.
- Return type
DaskComputation
- submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)
Send scoring job to cluster.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to score.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
X_train (pd.DataFrame) – Training features. Used for feature engineering in time series.
y_train (pd.Series) – Training target. Used for feature engineering in time series.
objectives (list[ObjectiveBase]) – List of objectives to score on.
- Returns
An object wrapping a reference to a future-like computation occurring in the dask cluster.
- Return type
DaskComputation
- submit_training_job(self, automl_config, pipeline, X, y)
Send training job to cluster.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to train.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
An object wrapping a reference to a future-like computation occurring in the dask cluster.
- Return type
DaskComputation
- class evalml.automl.engine.EngineBase
Base class for EvalML engines.
Methods
setup_job_log | Set up logger for job.
submit_evaluation_job | Submit job for pipeline evaluation during AutoMLSearch.
submit_scoring_job | Submit job for pipeline scoring.
submit_training_job | Submit job for pipeline training.
- abstract submit_evaluation_job(self, automl_config, pipeline, X, y)
Submit job for pipeline evaluation during AutoMLSearch.
- class evalml.automl.engine.EngineComputation
Wrapper around the result of a (possibly asynchronous) engine computation.
Methods
cancel | Cancel the computation.
done | Whether the computation is done.
get_result | Gets the computation result. Will block until the computation is finished.
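The three operations listed above (cancel, check for completion, fetch the result) form a small interface that each engine's computation type can implement. The sketch below shows one way to express such an interface as an abstract base class. `SketchEngineComputation` and `ImmediateComputation` are hypothetical names, and the method names are assumed to match the descriptions above.

```python
from abc import ABC, abstractmethod


class SketchEngineComputation(ABC):
    """Hypothetical interface for a (possibly asynchronous) computation."""

    @abstractmethod
    def get_result(self):
        """Return the result, blocking until it is available."""

    @abstractmethod
    def done(self):
        """Return True once the computation has finished."""

    @abstractmethod
    def cancel(self):
        """Attempt to cancel the computation."""


class ImmediateComputation(SketchEngineComputation):
    """Trivial implementation whose result is already known."""

    def __init__(self, value):
        self._value = value

    def get_result(self):
        return self._value

    def done(self):
        return True

    def cancel(self):
        # Nothing is pending, so there is nothing to cancel.
        return False


comp = ImmediateComputation(42)

# The abstract class itself cannot be instantiated.
try:
    SketchEngineComputation()
    abstract_instantiated = True
except TypeError:
    abstract_instantiated = False
```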
- evalml.automl.engine.evaluate_pipeline(pipeline, automl_config, X, y, logger)
Function submitted to the submit_evaluation_job engine method.
- Parameters
pipeline (PipelineBase) – The pipeline to score.
automl_config (AutoMLConfig) – The AutoMLSearch object, used to access config and the error callback.
X (pd.DataFrame) – Training features.
y (pd.Series) – Training target.
logger – Logger object to write to.
- Returns
First: a dict containing cv_score_mean, cv_scores, training_time and a cv_data structure with details.
Second: the pipeline class we trained and scored.
Third: the job logger instance with all the recorded messages.
- Return type
tuple of three items
- class evalml.automl.engine.SequentialEngine
The default engine for the AutoML search.
Trains and scores pipelines locally and sequentially.
Methods
close | No-op.
setup_job_log | Set up logger for job.
submit_evaluation_job | Submit a job to evaluate a pipeline.
submit_scoring_job | Submit a job to score a pipeline.
submit_training_job | Submit a job to train a pipeline.
- static setup_job_log()
Set up logger for job.
- submit_evaluation_job(self, automl_config, pipeline, X, y)
Submit a job to evaluate a pipeline.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to evaluate.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
Computation result.
- Return type
SequentialComputation
- submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)
Submit a job to score a pipeline.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to score.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
X_train (pd.DataFrame) – Training features. Used for feature engineering in time series.
y_train (pd.Series) – Training target. Used for feature engineering in time series.
objectives (list[ObjectiveBase]) – List of objectives to score on.
- Returns
Computation result.
- Return type
SequentialComputation
- submit_training_job(self, automl_config, pipeline, X, y)
Submit a job to train a pipeline.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to train.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
Computation result.
- Return type
SequentialComputation
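A sequential engine can expose the same submit-then-fetch interface as the parallel engines while doing all work in the calling process. The sketch below shows one plausible design, in which the work runs when the result is requested. This page does not say when SequentialEngine actually executes a job, so the deferred execution here is an assumption, and `SketchSequentialEngine`, `SketchSequentialComputation`, and `fake_score` are hypothetical names.

```python
class SketchSequentialComputation:
    """Hypothetical computation that runs in the calling process."""

    def __init__(self, work, **kwargs):
        self._work = work
        self._kwargs = kwargs

    def done(self):
        # Nothing runs in the background, so there is never pending work to wait on.
        return True

    def get_result(self):
        # Assumption: the work is executed here, when the result is requested.
        return self._work(**self._kwargs)

    def cancel(self):
        # Nothing is running in the background, so cancelling does nothing.
        return None


class SketchSequentialEngine:
    """Hypothetical engine that evaluates jobs one at a time, locally."""

    def submit(self, work, **kwargs):
        return SketchSequentialComputation(work, **kwargs)

    def close(self):
        # No pool or cluster to release.
        return None


calls = []


def fake_score(name):
    # Placeholder for scoring a pipeline; records the order of execution.
    calls.append(name)
    return {"pipeline": name, "score": 0.5}


engine = SketchSequentialEngine()
jobs = [engine.submit(fake_score, name=n) for n in ("a", "b", "c")]
results = [job.get_result() for job in jobs]
```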
- evalml.automl.engine.train_and_score_pipeline(pipeline, automl_config, full_X_train, full_y_train, logger)
Given a pipeline, config and data, train and score the pipeline and return the cross-validation (CV) or training-validation (TV) scores.
- Parameters
pipeline (PipelineBase) – The pipeline to score.
automl_config (AutoMLSearch) – The AutoMLSearch object, used to access config and the error callback.
full_X_train (pd.DataFrame) – Training features.
full_y_train (pd.Series) – Training target.
logger – Logger object to write to.
- Raises
Exception – If there are missing target values in the training set after data split.
- Returns
First: a dict containing cv_score_mean, cv_scores, training_time and a cv_data structure with details.
Second: the pipeline class we trained and scored.
Third: the job logger instance with all the recorded messages.
- Return type
tuple of three items
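To make the shape of the first returned item concrete, the sketch below builds a dict with the four documented keys from a list of folds. It covers only the scores dict, not the pipeline or logger, and it is not EvalML's implementation. `sketch_train_and_score`, `fake_fit_and_score`, and the layout of each `cv_data` entry are assumptions made for illustration.

```python
import time
from statistics import mean


def sketch_train_and_score(fit_and_score, folds):
    """Score every fold and collect the results into one dict."""
    start = time.perf_counter()
    cv_data = []
    for fold_number, (train_idx, valid_idx) in enumerate(folds):
        score = fit_and_score(train_idx, valid_idx)
        cv_data.append({"fold": fold_number, "score": score})
    cv_scores = [entry["score"] for entry in cv_data]
    return {
        "cv_data": cv_data,
        "training_time": time.perf_counter() - start,
        "cv_scores": cv_scores,
        "cv_score_mean": mean(cv_scores),
    }


def fake_fit_and_score(train_idx, valid_idx):
    # Hypothetical scorer: returns a deterministic number instead of fitting a model.
    return sum(valid_idx) / 16


# Three folds over six rows, given as (training indices, validation indices).
folds = [([0, 1, 2, 3], [4, 5]), ([2, 3, 4, 5], [0, 1]), ([0, 1, 4, 5], [2, 3])]
results = sketch_train_and_score(fake_fit_and_score, folds)
```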
- evalml.automl.engine.train_pipeline(pipeline, X, y, automl_config, schema=True)
Train a pipeline and tune the threshold if necessary.
- Parameters
pipeline (PipelineBase) – Pipeline to train.
X (pd.DataFrame) – Features to train on.
y (pd.Series) – Target to train on.
automl_config (AutoMLSearch) – The AutoMLSearch object, used to access config and the error callback.
schema (bool) – Whether to use the schemas for X and y. Defaults to True.
- Returns
A trained pipeline instance.
- Return type
pipeline (PipelineBase)
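This page does not describe how the threshold is tuned. As a generic illustration of what threshold tuning means for a binary classifier, the sketch below picks, from a list of candidate thresholds, the one that maximizes F1 on held-out predictions. It is not EvalML's method; `f1_score`, `tune_threshold`, and the sample data are hypothetical.

```python
def f1_score(predictions, labels):
    """F1 for boolean predictions against 0/1 labels."""
    tp = sum(1 for p, l in zip(predictions, labels) if p and l)
    fp = sum(1 for p, l in zip(predictions, labels) if p and not l)
    fn = sum(1 for p, l in zip(predictions, labels) if not p and l)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(probabilities, labels, candidates):
    """Return the candidate threshold with the highest F1."""

    def score(threshold):
        predictions = [p >= threshold for p in probabilities]
        return f1_score(predictions, labels)

    return max(candidates, key=score)


# Predicted positive-class probabilities and true labels for four rows.
probabilities = [0.1, 0.3, 0.35, 0.8]
labels = [0, 0, 1, 1]
best = tune_threshold(probabilities, labels, candidates=[0.2, 0.33, 0.5])
```

With this data, only the 0.33 threshold separates the two classes exactly, so it is the one selected.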