engine
==============================

.. py:module:: evalml.automl.engine

.. autoapi-nested-parse::

   EvalML Engine classes used to evaluate pipelines in AutoMLSearch.


Submodules
----------

.. toctree::
   :titlesonly:
   :maxdepth: 1

   cf_engine/index.rst
   dask_engine/index.rst
   engine_base/index.rst
   sequential_engine/index.rst


Package Contents
----------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.automl.engine.CFEngine
   evalml.automl.engine.DaskEngine
   evalml.automl.engine.EngineBase
   evalml.automl.engine.EngineComputation
   evalml.automl.engine.SequentialEngine


Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   evalml.automl.engine.evaluate_pipeline
   evalml.automl.engine.train_and_score_pipeline
   evalml.automl.engine.train_pipeline


Contents
~~~~~~~~~~~~~~~~~~~

.. py:class:: CFEngine(client=None)

   The concurrent.futures (CF) engine.

   :param client: If None, creates a threaded pool for processing. Defaults to None.
   :type client: None or CFClient

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.CFEngine.close
      evalml.automl.engine.CFEngine.is_closed
      evalml.automl.engine.CFEngine.setup_job_log
      evalml.automl.engine.CFEngine.submit_evaluation_job
      evalml.automl.engine.CFEngine.submit_scoring_job
      evalml.automl.engine.CFEngine.submit_training_job

   .. py:method:: close(self)

      Properly shut down the resources of the Engine's Client.

   .. py:method:: is_closed(self)
      :property:

      Whether the resources of the Engine's Client have been shut down.

   .. py:method:: setup_job_log()
      :staticmethod:

      Set up logger for job.

   .. py:method:: submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)

      Send evaluation job to cluster.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to evaluate.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series
      :param X_holdout: Holdout input data for holdout scoring.
      :type X_holdout: pd.DataFrame
      :param y_holdout: Holdout target data for holdout scoring.
      :type y_holdout: pd.Series

      :returns: An object wrapping a reference to a future-like computation occurring in the resource pool.
      :rtype: CFComputation

   .. py:method:: submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)

      Send scoring job to cluster.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to score.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series
      :param objectives: Objectives to score on.
      :type objectives: list[ObjectiveBase]
      :param X_train: Training features. Used for feature engineering in time series.
      :type X_train: pd.DataFrame
      :param y_train: Training target. Used for feature engineering in time series.
      :type y_train: pd.Series

      :returns: An object wrapping a reference to a future-like computation occurring in the resource pool.
      :rtype: CFComputation

   .. py:method:: submit_training_job(self, automl_config, pipeline, X, y)

      Send training job to cluster.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to train.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series

      :returns: An object wrapping a reference to a future-like computation occurring in the resource pool.
      :rtype: CFComputation


.. py:class:: DaskEngine(cluster=None)

   The dask engine.

   :param cluster: If None, creates a local, threaded Dask client for processing. Defaults to None.
   :type cluster: None or dd.Client

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.DaskEngine.close
      evalml.automl.engine.DaskEngine.is_closed
      evalml.automl.engine.DaskEngine.send_data_to_cluster
      evalml.automl.engine.DaskEngine.setup_job_log
      evalml.automl.engine.DaskEngine.submit_evaluation_job
      evalml.automl.engine.DaskEngine.submit_scoring_job
      evalml.automl.engine.DaskEngine.submit_training_job

   .. py:method:: close(self)

      Closes the underlying cluster.

   .. py:method:: is_closed(self)
      :property:

      Whether the resources of the Engine's Client have been shut down.

   .. py:method:: send_data_to_cluster(self, X, y)

      Send data to the cluster.

      The implementation uses caching so the data is only sent once. This follows dask best practices.

      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series

      :returns: The modeling data.
      :rtype: dask.Future

   .. py:method:: setup_job_log()
      :staticmethod:

      Set up logger for job.

   .. py:method:: submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)

      Send evaluation job to cluster.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to evaluate.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series
      :param X_holdout: Holdout input data for holdout scoring.
      :type X_holdout: pd.DataFrame
      :param y_holdout: Holdout target data for holdout scoring.
      :type y_holdout: pd.Series

      :returns: An object wrapping a reference to a future-like computation occurring in the dask cluster.
      :rtype: DaskComputation

   .. py:method:: submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)

      Send scoring job to cluster.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to score.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series
      :param objectives: List of objectives to score on.
      :type objectives: list[ObjectiveBase]
      :param X_train: Training features. Used for feature engineering in time series.
      :type X_train: pd.DataFrame
      :param y_train: Training target. Used for feature engineering in time series.
      :type y_train: pd.Series

      :returns: An object wrapping a reference to a future-like computation occurring in the dask cluster.
      :rtype: DaskComputation

   .. py:method:: submit_training_job(self, automl_config, pipeline, X, y)

      Send training job to cluster.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to train.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series

      :returns: An object wrapping a reference to a future-like computation occurring in the dask cluster.
      :rtype: DaskComputation


.. py:class:: EngineBase

   Base class for EvalML engines.

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.EngineBase.setup_job_log
      evalml.automl.engine.EngineBase.submit_evaluation_job
      evalml.automl.engine.EngineBase.submit_scoring_job
      evalml.automl.engine.EngineBase.submit_training_job

   .. py:method:: setup_job_log()
      :staticmethod:

      Set up logger for job.

   .. py:method:: submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)
      :abstractmethod:

      Submit job for pipeline evaluation during AutoMLSearch.

   .. py:method:: submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)
      :abstractmethod:

      Submit job for pipeline scoring.

   .. py:method:: submit_training_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)
      :abstractmethod:

      Submit job for pipeline training.


.. py:class:: EngineComputation

   Wrapper around the result of a (possibly asynchronous) engine computation.

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.EngineComputation.cancel
      evalml.automl.engine.EngineComputation.done
      evalml.automl.engine.EngineComputation.get_result

   .. py:method:: cancel(self)
      :abstractmethod:

      Cancel the computation.

   .. py:method:: done(self)
      :abstractmethod:

      Whether the computation is done.

   .. py:method:: get_result(self)
      :abstractmethod:

      Gets the computation result. Will block until the computation is finished.

      :raises Exception: If computation fails. Returns traceback.


.. py:function:: evaluate_pipeline(pipeline, automl_config, X, y, logger, X_holdout=None, y_holdout=None)

   Function submitted to the submit_evaluation_job engine method.

   :param pipeline: The pipeline to score.
   :type pipeline: PipelineBase
   :param automl_config: The AutoMLSearch object, used to access config and the error callback.
   :type automl_config: AutoMLConfig
   :param X: Training features.
   :type X: pd.DataFrame
   :param y: Training target.
   :type y: pd.Series
   :param logger: Logger object to write to.
   :param X_holdout: Holdout set features.
   :type X_holdout: pd.DataFrame
   :param y_holdout: Holdout set target.
   :type y_holdout: pd.Series

   :returns: First - A dict containing cv_score_mean, cv_scores, training_time and a cv_data structure with details.
             Second - The pipeline class we trained and scored.
             Third - The job logger instance with all the recorded messages.
   :rtype: tuple of three items


.. py:class:: SequentialEngine

   The default engine for the AutoML search.

   Trains and scores pipelines locally and sequentially.

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.SequentialEngine.close
      evalml.automl.engine.SequentialEngine.setup_job_log
      evalml.automl.engine.SequentialEngine.submit_evaluation_job
      evalml.automl.engine.SequentialEngine.submit_scoring_job
      evalml.automl.engine.SequentialEngine.submit_training_job

   .. py:method:: close(self)

      No-op.

   .. py:method:: setup_job_log()
      :staticmethod:

      Set up logger for job.

   .. py:method:: submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)

      Submit a job to evaluate a pipeline.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to evaluate.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series
      :param X_holdout: Holdout input data for holdout scoring.
      :type X_holdout: pd.DataFrame
      :param y_holdout: Holdout target data for holdout scoring.
      :type y_holdout: pd.Series

      :returns: Computation result.
      :rtype: SequentialComputation

   .. py:method:: submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)

      Submit a job to score a pipeline.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to score.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series
      :param objectives: List of objectives to score on.
      :type objectives: list[ObjectiveBase]
      :param X_train: Training features. Used for feature engineering in time series.
      :type X_train: pd.DataFrame
      :param y_train: Training target. Used for feature engineering in time series.
      :type y_train: pd.Series

      :returns: Computation result.
      :rtype: SequentialComputation

   .. py:method:: submit_training_job(self, automl_config, pipeline, X, y)

      Submit a job to train a pipeline.

      :param automl_config: Structure containing data passed from AutoMLSearch instance.
      :param pipeline: Pipeline to train.
      :type pipeline: pipeline.PipelineBase
      :param X: Input data for modeling.
      :type X: pd.DataFrame
      :param y: Target data for modeling.
      :type y: pd.Series

      :returns: Computation result.
      :rtype: SequentialComputation


.. py:function:: train_and_score_pipeline(pipeline, automl_config, full_X_train, full_y_train, logger, X_holdout=None, y_holdout=None)

   Given a pipeline, config and data, train and score the pipeline and return the CV or TV scores.

   :param pipeline: The pipeline to score.
   :type pipeline: PipelineBase
   :param automl_config: The AutoMLSearch object, used to access config and the error callback.
   :type automl_config: AutoMLSearch
   :param full_X_train: Training features.
   :type full_X_train: pd.DataFrame
   :param full_y_train: Training target.
   :type full_y_train: pd.Series
   :param logger: Logger object to write to.
   :param X_holdout: Holdout set features.
   :type X_holdout: pd.DataFrame
   :param y_holdout: Holdout set target.
   :type y_holdout: pd.Series

   :raises Exception: If there are missing target values in the training set after data split.

   :returns: First - A dict containing cv_score_mean, cv_scores, training_time and a cv_data structure with details.
             Second - The pipeline class we trained and scored.
             Third - The job logger instance with all the recorded messages.
   :rtype: tuple of three items


.. py:function:: train_pipeline(pipeline, X, y, automl_config, schema=True, get_hashes=False)

   Train a pipeline and tune the threshold if necessary.

   :param pipeline: Pipeline to train.
   :type pipeline: PipelineBase
   :param X: Features to train on.
   :type X: pd.DataFrame
   :param y: Target to train on.
   :type y: pd.Series
   :param automl_config: The AutoMLSearch object, used to access config and the error callback.
   :type automl_config: AutoMLSearch
   :param schema: Whether to use the schemas for X and y. Defaults to True.
   :type schema: bool
   :param get_hashes: Whether to return the hashes of the data used to train (and potentially threshold). Defaults to False.
   :type get_hashes: bool

   :returns: A trained pipeline instance. If ``get_hashes`` is True, the hash of the input data indices is also returned.
   :rtype: PipelineBase
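
The engines on this page share one contract: a ``submit_*`` method hands work to some resource pool and returns a computation object exposing ``done``, ``cancel`` and ``get_result``. The following is a minimal sketch of that contract built only on the standard library's ``concurrent.futures``, so it runs without EvalML installed. ``ToyComputation``, ``ToyEngine`` and ``fit_mean`` are hypothetical names used for illustration; they are not part of ``evalml``, and the simplified ``submit_training_job`` signature here (a plain callable instead of ``automl_config`` and ``pipeline``) is an assumption made to keep the example self-contained.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ToyComputation:
    """Hypothetical stand-in for an EngineComputation subclass."""

    def __init__(self, future: Future):
        self._future = future

    def done(self) -> bool:
        # Whether the computation is done.
        return self._future.done()

    def cancel(self) -> bool:
        # Attempt to cancel the computation; returns False if it already ran.
        return self._future.cancel()

    def get_result(self):
        # Blocks until the job finishes; re-raises any exception from the job.
        return self._future.result()


class ToyEngine:
    """Hypothetical engine that runs jobs on a thread pool."""

    def __init__(self, max_workers: int = 2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._closed = False

    def submit_training_job(self, train_fn, X, y) -> ToyComputation:
        # Assumption: a plain callable replaces automl_config and pipeline.
        return ToyComputation(self._pool.submit(train_fn, X, y))

    @property
    def is_closed(self) -> bool:
        return self._closed

    def close(self) -> None:
        # Release the pool's resources once all submitted jobs have finished.
        self._pool.shutdown(wait=True)
        self._closed = True


def fit_mean(X, y):
    """Hypothetical 'training' function: returns the mean of the target."""
    return sum(y) / len(y)


engine = ToyEngine()
computation = engine.submit_training_job(fit_mean, [[1], [2], [3]], [2.0, 4.0, 6.0])
result = computation.get_result()  # blocks until the job has finished
engine.close()
```

Returning a wrapper rather than the raw future lets calling code treat sequential, thread-pool and cluster-backed engines the same way, which is the role ``EngineComputation`` is described as playing above.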