engine
==============================

.. py:module:: evalml.automl.engine

.. autoapi-nested-parse::

   EvalML Engine classes used to evaluate pipelines in AutoMLSearch.


Submodules
----------
.. toctree::
   :titlesonly:
   :maxdepth: 1

   cf_engine/index.rst
   dask_engine/index.rst
   engine_base/index.rst
   sequential_engine/index.rst


Package Contents
----------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.automl.engine.CFEngine
   evalml.automl.engine.DaskEngine
   evalml.automl.engine.EngineBase
   evalml.automl.engine.EngineComputation
   evalml.automl.engine.SequentialEngine


Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   evalml.automl.engine.evaluate_pipeline
   evalml.automl.engine.train_and_score_pipeline
   evalml.automl.engine.train_pipeline


Contents
~~~~~~~~~~~~~~~~~~~
.. py:class:: CFEngine(client=None)


   The concurrent.futures (CF) engine.

   :param client: If None, creates a threaded pool for processing. Defaults to None.
   :type client: None or CFClient


   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.CFEngine.close
      evalml.automl.engine.CFEngine.is_closed
      evalml.automl.engine.CFEngine.setup_job_log
      evalml.automl.engine.CFEngine.submit_evaluation_job
      evalml.automl.engine.CFEngine.submit_scoring_job
      evalml.automl.engine.CFEngine.submit_training_job

   .. py:method:: close(self)

      Shut down the resources held by the engine's client.


   .. py:method:: is_closed(self)
      :property:

      Whether the resources held by the engine's client have been shut down.


   .. py:method:: setup_job_log()
      :staticmethod:

      Set up logger for job.


   .. py:method:: submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)

      Send evaluation job to cluster.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to evaluate.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series
      :param X_holdout: Holdout input data for holdout scoring.
      :type X_holdout: pd.DataFrame
      :param y_holdout: Holdout target data for holdout scoring.
      :type y_holdout: pd.Series

      :returns: An object wrapping a reference to a future-like computation
                occurring in the resource pool.
      :rtype: CFComputation


   .. py:method:: submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)

      Send scoring job to cluster.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to score.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series
      :param X_train: Training features. Used for feature engineering in time series.
      :type X_train: pd.DataFrame
      :param y_train: Training target. Used for feature engineering in time series.
      :type y_train: pd.Series
      :param objectives: Objectives to score on.
      :type objectives: list[ObjectiveBase]

      :returns: An object wrapping a reference to a future-like computation
                occurring in the resource pool.
      :rtype: CFComputation


   .. py:method:: submit_training_job(self, automl_config, pipeline, X, y)

      Send training job to cluster.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to train.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series

      :returns: An object wrapping a reference to a future-like computation
                occurring in the resource pool.
      :rtype: CFComputation
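
The submit-and-wrap pattern described for ``CFEngine`` can be sketched with the standard library alone. This is a minimal illustration and not EvalML's implementation: ``FutureComputation``, ``submit_job`` and ``toy_training_job`` are hypothetical names, and a ``ThreadPoolExecutor`` stands in for whatever pool the client provides.

```python
from concurrent.futures import ThreadPoolExecutor


class FutureComputation:
    """Hypothetical wrapper exposing done/get_result/cancel over a Future."""

    def __init__(self, future):
        self._future = future

    def done(self):
        return self._future.done()

    def get_result(self):
        # Blocks until the pooled work has finished.
        return self._future.result()

    def cancel(self):
        return self._future.cancel()


def submit_job(pool, work, **kwargs):
    """Submit ``work(**kwargs)`` to the pool and wrap the returned Future."""
    return FutureComputation(pool.submit(work, **kwargs))


def toy_training_job(n_rows):
    # Stand-in for real training work; returns a small summary dict.
    return {"rows_seen": n_rows}


with ThreadPoolExecutor(max_workers=2) as pool:
    computation = submit_job(pool, toy_training_job, n_rows=100)
    result = computation.get_result()
```

The caller only ever touches the wrapper, so the same calling code could work with a different pool type behind it.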


.. py:class:: DaskEngine(cluster=None)


   The dask engine.

   :param cluster: If None, creates a local, threaded Dask client for processing.
                   Defaults to None.
   :type cluster: None or dd.Client


   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.DaskEngine.close
      evalml.automl.engine.DaskEngine.is_closed
      evalml.automl.engine.DaskEngine.send_data_to_cluster
      evalml.automl.engine.DaskEngine.setup_job_log
      evalml.automl.engine.DaskEngine.submit_evaluation_job
      evalml.automl.engine.DaskEngine.submit_scoring_job
      evalml.automl.engine.DaskEngine.submit_training_job

   .. py:method:: close(self)

      Closes the underlying cluster.


   .. py:method:: is_closed(self)
      :property:

      Whether the resources held by the engine's client have been shut down.


   .. py:method:: send_data_to_cluster(self, X, y)

      Send data to the cluster.

      The implementation uses caching so the data is only sent once. This follows
      dask best practices.

      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series

      :returns: The modeling data.
      :rtype: dask.Future


   .. py:method:: setup_job_log()
      :staticmethod:

      Set up logger for job.


   .. py:method:: submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)

      Send evaluation job to cluster.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to evaluate.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series
      :param X_holdout: Holdout input data for holdout scoring.
      :type X_holdout: pd.DataFrame
      :param y_holdout: Holdout target data for holdout scoring.
      :type y_holdout: pd.Series

      :returns: An object wrapping a reference to a future-like computation
                occurring in the dask cluster.
      :rtype: DaskComputation


   .. py:method:: submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)

      Send scoring job to cluster.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to score.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series
      :param X_train: Training features. Used for feature engineering in time series.
      :type X_train: pd.DataFrame
      :param y_train: Training target. Used for feature engineering in time series.
      :type y_train: pd.Series
      :param objectives: List of objectives to score on.
      :type objectives: list[ObjectiveBase]

      :returns: An object wrapping a reference to a future-like computation
                occurring in the dask cluster.
      :rtype: DaskComputation


   .. py:method:: submit_training_job(self, automl_config, pipeline, X, y)

      Send training job to cluster.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to train.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series

      :returns: An object wrapping a reference to a future-like computation
                occurring in the dask cluster.
      :rtype: DaskComputation
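
The "send once, then reuse" caching that ``send_data_to_cluster`` describes can be illustrated without a cluster. The sketch below is an assumption-laden toy: ``DataCache`` and ``fake_send`` are hypothetical, the cache key is the object's identity, and no Dask API is called. The key EvalML actually uses is not stated in this reference.

```python
class DataCache:
    """Hypothetical cache that ships each distinct object to workers only once."""

    def __init__(self, send):
        self._send = send      # callable that performs the actual transfer
        self._cache = {}       # maps id(data) -> (data, remote handle)
        self.send_count = 0

    def get_or_send(self, data):
        key = id(data)
        if key not in self._cache:
            self.send_count += 1
            # Keep a reference to ``data`` so its id cannot be reused
            # by another object while the entry is cached.
            self._cache[key] = (data, self._send(data))
        return self._cache[key][1]


def fake_send(data):
    # Stand-in for a network transfer; returns a made-up handle.
    return ("remote-handle", len(data))


cache = DataCache(fake_send)
X = [[1, 2], [3, 4]]
first = cache.get_or_send(X)
second = cache.get_or_send(X)
```

Every job submitted with the same data then receives the same handle, so the transfer cost is paid once rather than per job.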


.. py:class:: EngineBase


   Base class for EvalML engines.


   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.EngineBase.setup_job_log
      evalml.automl.engine.EngineBase.submit_evaluation_job
      evalml.automl.engine.EngineBase.submit_scoring_job
      evalml.automl.engine.EngineBase.submit_training_job

   .. py:method:: setup_job_log()
      :staticmethod:

      Set up logger for job.


   .. py:method:: submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)
      :abstractmethod:

      Submit job for pipeline evaluation during AutoMLSearch.


   .. py:method:: submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)
      :abstractmethod:

      Submit job for pipeline scoring.


   .. py:method:: submit_training_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)
      :abstractmethod:

      Submit job for pipeline training.


.. py:class:: EngineComputation


   Wrapper around the result of a (possibly asynchronous) engine computation.


   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.EngineComputation.cancel
      evalml.automl.engine.EngineComputation.done
      evalml.automl.engine.EngineComputation.get_result

   .. py:method:: cancel(self)
      :abstractmethod:

      Cancel the computation.


   .. py:method:: done(self)
      :abstractmethod:

      Whether the computation is done.


   .. py:method:: get_result(self)
      :abstractmethod:

      Gets the computation result. Will block until the computation is finished.

      :raises Exception: If computation fails. Returns traceback.
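
The blocking and error-raising behavior described for ``get_result`` mirrors how a standard-library ``Future`` behaves. In the sketch below, ``failing_job`` is a hypothetical stand-in for a computation that fails. It shows the general pattern rather than EvalML's own code.

```python
from concurrent.futures import ThreadPoolExecutor


def failing_job():
    raise ValueError("pipeline failed to fit")


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(failing_job)
    try:
        # Blocks until the job finishes, then re-raises the worker's exception.
        future.result()
        error_message = None
    except ValueError as exc:
        error_message = str(exc)
```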


.. py:function:: evaluate_pipeline(pipeline, automl_config, X, y, logger, X_holdout=None, y_holdout=None)

   Function submitted to the submit_evaluation_job engine method.

   :param pipeline: The pipeline to score.
   :type pipeline: PipelineBase
   :param automl_config: The AutoMLSearch object, used to access config and the error callback.
   :type automl_config: AutoMLConfig
   :param X: Training features.
   :type X: pd.DataFrame
   :param y: Training target.
   :type y: pd.Series
   :param logger: Logger object to write to.
   :param X_holdout: Holdout set features.
   :type X_holdout: pd.DataFrame
   :param y_holdout: Holdout set target.
   :type y_holdout: pd.Series

   :returns: First, a dict containing cv_score_mean, cv_scores, training_time and a cv_data structure with details.
             Second, the pipeline class we trained and scored.
             Third, the job logger instance with all the recorded messages.
   :rtype: tuple of three items


.. py:class:: SequentialEngine


   The default engine for the AutoML search.

   Trains and scores pipelines locally and sequentially.


   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.SequentialEngine.close
      evalml.automl.engine.SequentialEngine.setup_job_log
      evalml.automl.engine.SequentialEngine.submit_evaluation_job
      evalml.automl.engine.SequentialEngine.submit_scoring_job
      evalml.automl.engine.SequentialEngine.submit_training_job

   .. py:method:: close(self)

      No-op.


   .. py:method:: setup_job_log()
      :staticmethod:

      Set up logger for job.


   .. py:method:: submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)

      Submit a job to evaluate a pipeline.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to evaluate.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series
      :param X_holdout: Holdout input data for holdout scoring.
      :type X_holdout: pd.DataFrame
      :param y_holdout: Holdout target data for holdout scoring.
      :type y_holdout: pd.Series

      :returns: Computation result.
      :rtype: SequentialComputation


   .. py:method:: submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)

      Submit a job to score a pipeline.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to score.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series
      :param X_train: Training features. Used for feature engineering in time series.
      :type X_train: pd.DataFrame
      :param y_train: Training target. Used for feature engineering in time series.
      :type y_train: pd.Series
      :param objectives: List of objectives to score on.
      :type objectives: list[ObjectiveBase]

      :returns: Computation result.
      :rtype: SequentialComputation


   .. py:method:: submit_training_job(self, automl_config, pipeline, X, y)

      Submit a job to train a pipeline.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to train.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series

      :returns: Computation result.
      :rtype: SequentialComputation
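
Local, sequential execution can be modeled as a computation object that runs its work only when the result is requested. This is a rough sketch under that assumption: ``LazyComputation`` and ``toy_job`` are hypothetical names, and the real ``SequentialComputation`` may differ in detail.

```python
class LazyComputation:
    """Hypothetical computation that runs its work when the result is requested."""

    def __init__(self, work, **kwargs):
        self.work = work
        self.kwargs = kwargs

    def done(self):
        # Nothing runs in the background, so the caller never has to wait.
        return True

    def get_result(self):
        return self.work(**self.kwargs)

    def cancel(self):
        # There is no background task to cancel.
        pass


calls = []


def toy_job(name):
    calls.append(name)
    return name.upper()


jobs = [LazyComputation(toy_job, name=n) for n in ("a", "b", "c")]
results = [job.get_result() for job in jobs]
```

Because each job runs in the calling process, the jobs execute one at a time in the order their results are requested.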


.. py:function:: train_and_score_pipeline(pipeline, automl_config, full_X_train, full_y_train, logger, X_holdout=None, y_holdout=None)

   Given a pipeline, config and data, train and score the pipeline and return the CV or TV scores.

   :param pipeline: The pipeline to score.
   :type pipeline: PipelineBase
   :param automl_config: The AutoMLSearch object, used to access config and the error callback.
   :type automl_config: AutoMLSearch
   :param full_X_train: Training features.
   :type full_X_train: pd.DataFrame
   :param full_y_train: Training target.
   :type full_y_train: pd.Series
   :param logger: Logger object to write to.
   :param X_holdout: Holdout set features.
   :type X_holdout: pd.DataFrame
   :param y_holdout: Holdout set target.
   :type y_holdout: pd.Series

   :raises Exception: If there are missing target values in the training set after data split.

   :returns: First, a dict containing cv_score_mean, cv_scores, training_time and a cv_data structure with details.
             Second, the pipeline class we trained and scored.
             Third, the job logger instance with all the recorded messages.
   :rtype: tuple of three items
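
To make the first item of the returned tuple more concrete, here is a toy function that assembles a dict with the four keys named above. Everything else is an assumption for illustration: ``summarize_folds`` is hypothetical, and the types of ``cv_scores`` and ``cv_data`` in EvalML may differ from the plain lists used here.

```python
import time
from statistics import mean


def summarize_folds(fold_scores, start_time):
    """Assemble a results dict with the keys described above (toy version)."""
    return {
        "cv_score_mean": mean(fold_scores),
        "cv_scores": list(fold_scores),
        "training_time": time.time() - start_time,
        # Hypothetical per-fold detail records.
        "cv_data": [{"fold": i, "score": s} for i, s in enumerate(fold_scores)],
    }


start = time.time()
summary = summarize_folds([0.5, 0.75, 1.0], start)
```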


.. py:function:: train_pipeline(pipeline, X, y, automl_config, schema=True, get_hashes=False)

   Train a pipeline and tune the threshold if necessary.

   :param pipeline: Pipeline to train.
   :type pipeline: PipelineBase
   :param X: Features to train on.
   :type X: pd.DataFrame
   :param y: Target to train on.
   :type y: pd.Series
   :param automl_config: The AutoMLSearch object, used to access config and the error callback.
   :type automl_config: AutoMLSearch
   :param schema: Whether to use the schemas for X and y. Defaults to True.
   :type schema: bool
   :param get_hashes: Whether to return the hashes of the data used to train (and potentially to tune the threshold). Defaults to False.
   :type get_hashes: bool

   :returns: A trained pipeline instance. When ``get_hashes`` is True, the hash of the input data indices
             is also returned.
   :rtype: PipelineBase