engine
EvalML Engine classes used to evaluate pipelines in AutoMLSearch.
Submodules
Package Contents
Classes Summary
CFEngine – The concurrent.futures (CF) engine.
DaskEngine – The dask engine.
EngineBase – Base class for EvalML engines.
EngineComputation – Wrapper around the result of a (possibly asynchronous) engine computation.
SequentialEngine – The default engine for the AutoML search.
Functions
evaluate_pipeline – Function submitted to the submit_evaluation_job engine method.
train_and_score_pipeline – Given a pipeline, config and data, train and score the pipeline and return the CV or TV scores.
train_pipeline – Train a pipeline and tune the threshold if necessary.
Contents
- class evalml.automl.engine.CFEngine(client=None)
The concurrent.futures (CF) engine.
- Parameters
client (None or CFClient) – If None, creates a threaded pool for processing. Defaults to None.
Methods
close – Shut down the resources held by the engine's client.
is_closed – Property indicating whether the engine's client resources have been shut down.
setup_job_log – Set up logger for job.
submit_evaluation_job – Send evaluation job to cluster.
submit_scoring_job – Send scoring job to cluster.
submit_training_job – Send training job to cluster.
- property is_closed(self)
Property indicating whether the engine's client resources have been shut down.
- static setup_job_log()
Set up logger for job.
- submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)
Send evaluation job to cluster.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to evaluate.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
X_holdout (pd.DataFrame) – Holdout input data for holdout scoring.
y_holdout (pd.Series) – Holdout target data for holdout scoring.
- Returns
An object wrapping a reference to a future-like computation occurring in the resource pool.
- Return type
CFComputation
- submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)
Send scoring job to cluster.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to score.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
X_train (pd.DataFrame) – Training features. Used for feature engineering in time series.
y_train (pd.Series) – Training target. Used for feature engineering in time series.
objectives (list[ObjectiveBase]) – Objectives to score on.
- Returns
An object wrapping a reference to a future-like computation occurring in the resource pool.
- Return type
CFComputation
- submit_training_job(self, automl_config, pipeline, X, y)
Send training job to cluster.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to train.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
An object wrapping a reference to a future-like computation occurring in the resource pool.
- Return type
CFComputation
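The "object wrapping a reference to a future-like computation" can be pictured with the standard library alone. The following is a minimal sketch, not EvalML's implementation: `SketchCFEngine`, `SketchCFComputation`, `submit`, and `fake_train` are hypothetical names, and only `concurrent.futures` calls from the Python standard library are used.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class SketchCFComputation:
    """Hypothetical wrapper around a concurrent.futures.Future."""

    def __init__(self, future: Future):
        self._future = future

    def done(self) -> bool:
        # True once the job has finished, failed, or been cancelled.
        return self._future.done()

    def get_result(self):
        # Blocks until the job finishes; re-raises any exception from the job.
        return self._future.result()

    def cancel(self) -> bool:
        # Only succeeds if the job has not started running yet.
        return self._future.cancel()


class SketchCFEngine:
    """Hypothetical engine that submits jobs to a thread pool."""

    def __init__(self, pool=None):
        # Assumption mirroring the documented default: no client means a threaded pool.
        self._pool = pool if pool is not None else ThreadPoolExecutor(max_workers=2)
        self._closed = False

    @property
    def is_closed(self) -> bool:
        return self._closed

    def submit(self, fn, *args, **kwargs) -> SketchCFComputation:
        return SketchCFComputation(self._pool.submit(fn, *args, **kwargs))

    def close(self) -> None:
        # Wait for running jobs, then release the pool's threads.
        self._pool.shutdown(wait=True)
        self._closed = True


def fake_train(pipeline_name, n_rows):
    # Stand-in for real training work.
    return f"{pipeline_name} trained on {n_rows} rows"


engine = SketchCFEngine()
computation = engine.submit(fake_train, "baseline", 100)
result = computation.get_result()
engine.close()
```

The caller never touches the `Future` directly; it only sees the wrapper, which is what lets different engines return different computation types behind the same three methods.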
- class evalml.automl.engine.DaskEngine(cluster=None)
The dask engine.
- Parameters
cluster (None or dd.Client) – If None, creates a local, threaded Dask client for processing. Defaults to None.
Methods
close – Closes the underlying cluster.
is_closed – Property indicating whether the engine's client resources have been shut down.
send_data_to_cluster – Send data to the cluster.
setup_job_log – Set up logger for job.
submit_evaluation_job – Send evaluation job to cluster.
submit_scoring_job – Send scoring job to cluster.
submit_training_job – Send training job to cluster.
- property is_closed(self)
Property indicating whether the engine's client resources have been shut down.
- send_data_to_cluster(self, X, y)
Send data to the cluster.
The implementation uses caching so the data is only sent once. This follows dask best practices.
- Parameters
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
The modeling data.
- Return type
dask.Future
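The send-once caching described for send_data_to_cluster can be illustrated without dask. This is a rough sketch under stated assumptions: `SketchDataCache` and `fake_upload` are hypothetical, the cache key is a SHA-256 hash of the pickled data, and none of this is claimed to match how DaskEngine actually keys or stores its cache.

```python
import hashlib
import pickle


class SketchDataCache:
    """Hypothetical cache that 'sends' each distinct (X, y) pair only once."""

    def __init__(self, send_fn):
        self._send_fn = send_fn  # stand-in for a cluster upload call
        self._cache = {}         # content hash -> handle returned by send_fn

    @staticmethod
    def _key(X, y) -> str:
        # Hash the pickled bytes so that identical data maps to the same key.
        return hashlib.sha256(pickle.dumps((X, y))).hexdigest()

    def send(self, X, y):
        key = self._key(X, y)
        if key not in self._cache:
            # First time this data is seen: upload it and remember the handle.
            self._cache[key] = self._send_fn(X, y)
        return self._cache[key]


uploads = []


def fake_upload(X, y):
    # Records each upload so the number of transfers can be inspected.
    uploads.append((X, y))
    return f"handle-{len(uploads)}"


cache = SketchDataCache(fake_upload)
X, y = [[1, 2], [3, 4]], [0, 1]
first = cache.send(X, y)
second = cache.send(X, y)  # served from the cache, no second upload
```

Reusing one handle for repeated submissions avoids shipping the same training data to workers with every job.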
- static setup_job_log()
Set up logger for job.
- submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)
Send evaluation job to cluster.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to evaluate.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
X_holdout (pd.DataFrame) – Holdout input data for holdout scoring.
y_holdout (pd.Series) – Holdout target data for holdout scoring.
- Returns
An object wrapping a reference to a future-like computation occurring in the dask cluster.
- Return type
DaskComputation
- submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)
Send scoring job to cluster.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to score.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
X_train (pd.DataFrame) – Training features. Used for feature engineering in time series.
y_train (pd.Series) – Training target. Used for feature engineering in time series.
objectives (list[ObjectiveBase]) – List of objectives to score on.
- Returns
An object wrapping a reference to a future-like computation occurring in the dask cluster.
- Return type
DaskComputation
- submit_training_job(self, automl_config, pipeline, X, y)
Send training job to cluster.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to train.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
An object wrapping a reference to a future-like computation occurring in the dask cluster.
- Return type
DaskComputation
- class evalml.automl.engine.EngineBase
Base class for EvalML engines.
Methods
setup_job_log – Set up logger for job.
submit_evaluation_job – Submit job for pipeline evaluation during AutoMLSearch.
submit_scoring_job – Submit job for pipeline scoring.
submit_training_job – Submit job for pipeline training.
- abstract submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)
Submit job for pipeline evaluation during AutoMLSearch.
- class evalml.automl.engine.EngineComputation
Wrapper around the result of a (possibly asynchronous) engine computation.
Methods
cancel – Cancel the computation.
done – Whether the computation is done.
get_result – Gets the computation result. Will block until the computation is finished.
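A caller that holds several computation objects typically polls them and collects results as they finish. The sketch below shows one way to do that against an interface with the three methods listed above. `SketchComputation`, `CountdownComputation`, and `collect_results` are hypothetical names for illustration and are not part of EvalML.

```python
import time
from abc import ABC, abstractmethod


class SketchComputation(ABC):
    """Hypothetical interface with the three methods listed above."""

    @abstractmethod
    def get_result(self): ...

    @abstractmethod
    def done(self): ...

    @abstractmethod
    def cancel(self): ...


class CountdownComputation(SketchComputation):
    """Toy computation that reports done only after a fixed number of polls."""

    def __init__(self, value, polls_needed):
        self._value = value
        self._polls_left = polls_needed
        self.cancelled = False

    def done(self):
        if self._polls_left > 0:
            self._polls_left -= 1
            return False
        return True

    def get_result(self):
        # A real implementation would block here; the toy is only asked once done.
        return self._value

    def cancel(self):
        self.cancelled = True


def collect_results(computations, poll_interval=0.0):
    """Poll every pending computation and gather results in completion order."""
    pending = list(computations)
    results = []
    while pending:
        still_pending = []
        for comp in pending:
            if comp.done():
                results.append(comp.get_result())
            else:
                still_pending.append(comp)
        pending = still_pending
        if pending:
            time.sleep(poll_interval)
    return results


jobs = [CountdownComputation("slow", 2), CountdownComputation("fast", 0)]
finished = collect_results(jobs)
```

Because the loop only relies on `done` and `get_result`, the same collection code works whether the computations are backed by threads, a cluster, or nothing at all.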
- evalml.automl.engine.evaluate_pipeline(pipeline, automl_config, X, y, logger, X_holdout=None, y_holdout=None)
Function submitted to the submit_evaluation_job engine method.
- Parameters
pipeline (PipelineBase) – The pipeline to score.
automl_config (AutoMLConfig) – The AutoMLSearch object, used to access config and the error callback.
X (pd.DataFrame) – Training features.
y (pd.Series) – Training target.
logger – Logger object to write to.
X_holdout (pd.DataFrame) – Holdout set features.
y_holdout (pd.Series) – Holdout set target.
- Returns
A tuple of three items: first, a dict containing cv_score_mean, cv_scores, training_time and a cv_data structure with details; second, the pipeline class that was trained and scored; third, the job logger instance with all the recorded messages.
- Return type
tuple of three items
- class evalml.automl.engine.SequentialEngine
The default engine for the AutoML search.
Trains and scores pipelines locally and sequentially.
Methods
close – No-op.
setup_job_log – Set up logger for job.
submit_evaluation_job – Submit a job to evaluate a pipeline.
submit_scoring_job – Submit a job to score a pipeline.
submit_training_job – Submit a job to train a pipeline.
- static setup_job_log()
Set up logger for job.
- submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)
Submit a job to evaluate a pipeline.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to evaluate.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
X_holdout (pd.DataFrame) – Holdout input data for holdout scoring.
y_holdout (pd.Series) – Holdout target data for holdout scoring.
- Returns
Computation result.
- Return type
SequentialComputation
- submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)
Submit a job to score a pipeline.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to score.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
X_train (pd.DataFrame) – Training features. Used for feature engineering in time series.
y_train (pd.Series) – Training target. Used for feature engineering in time series.
objectives (list[ObjectiveBase]) – List of objectives to score on.
- Returns
Computation result.
- Return type
SequentialComputation
- submit_training_job(self, automl_config, pipeline, X, y)
Submit a job to train a pipeline.
- Parameters
automl_config – Structure containing data passed from AutoMLSearch instance.
pipeline (pipeline.PipelineBase) – Pipeline to train.
X (pd.DataFrame) – Input data for modeling.
y (pd.Series) – Target data for modeling.
- Returns
Computation result.
- Return type
SequentialComputation
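A local, sequential engine can expose the same submit-then-get-result shape as the pooled engines while running everything in the calling thread. The sketch below shows one possible structure, in which the work is deferred until the result is requested. Whether EvalML runs the job at submission or at retrieval is not stated in this reference, and `SketchSequentialEngine`, `SketchSequentialComputation`, `submit`, and `fake_score` are hypothetical names.

```python
class SketchSequentialComputation:
    """Hypothetical computation that runs its work when the result is requested."""

    def __init__(self, work, **kwargs):
        self._work = work
        self._kwargs = kwargs

    def done(self):
        # Nothing runs in the background, so the result can always be requested.
        return True

    def get_result(self):
        # The work executes here, in the caller's thread.
        return self._work(**self._kwargs)

    def cancel(self):
        # Nothing to cancel for local, in-thread work.
        pass


class SketchSequentialEngine:
    """Hypothetical engine that wraps each job without any worker pool."""

    def submit(self, work, **kwargs):
        return SketchSequentialComputation(work, **kwargs)

    def close(self):
        # No resources to release.
        pass


calls = []


def fake_score(pipeline_name, objective):
    # Stand-in for real scoring work; records that it was invoked.
    calls.append(pipeline_name)
    return {objective: 0.5}


engine = SketchSequentialEngine()
computation = engine.submit(fake_score, pipeline_name="baseline", objective="log_loss")
ran_before_get = len(calls)
scores = computation.get_result()
engine.close()
```

Keeping the interface identical to the asynchronous engines means the search loop does not need a separate code path for the local case.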
- evalml.automl.engine.train_and_score_pipeline(pipeline, automl_config, full_X_train, full_y_train, logger, X_holdout=None, y_holdout=None)
Given a pipeline, config and data, train and score the pipeline and return the CV or TV scores.
- Parameters
pipeline (PipelineBase) – The pipeline to score.
automl_config (AutoMLSearch) – The AutoMLSearch object, used to access config and the error callback.
full_X_train (pd.DataFrame) – Training features.
full_y_train (pd.Series) – Training target.
logger – Logger object to write to.
X_holdout (pd.DataFrame) – Holdout set features.
y_holdout (pd.Series) – Holdout set target.
- Raises
Exception – If there are missing target values in the training set after data split.
- Returns
A tuple of three items: first, a dict containing cv_score_mean, cv_scores, training_time and a cv_data structure with details; second, the pipeline class that was trained and scored; third, the job logger instance with all the recorded messages.
- Return type
tuple of three items
- evalml.automl.engine.train_pipeline(pipeline, X, y, automl_config, schema=True, get_hashes=False)
Train a pipeline and tune the threshold if necessary.
- Parameters
pipeline (PipelineBase) – Pipeline to train.
X (pd.DataFrame) – Features to train on.
y (pd.Series) – Target to train on.
automl_config (AutoMLSearch) – The AutoMLSearch object, used to access config and the error callback.
schema (bool) – Whether to use the schemas for X and y. Defaults to True.
get_hashes (bool) – Whether to return the hashes of the data used to train (and potentially threshold). Defaults to False.
- Returns
A trained pipeline instance. If get_hashes is True, the hash of the input data indices is also returned.
- Return type
pipeline (PipelineBase)