engine_base
==========================================

.. py:module:: evalml.automl.engine.engine_base

.. autoapi-nested-parse::

   Base class for EvalML engines.


Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.automl.engine.engine_base.EngineBase
   evalml.automl.engine.engine_base.EngineComputation
   evalml.automl.engine.engine_base.JobLogger


Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   evalml.automl.engine.engine_base.evaluate_pipeline
   evalml.automl.engine.engine_base.score_pipeline
   evalml.automl.engine.engine_base.train_and_score_pipeline
   evalml.automl.engine.engine_base.train_pipeline


Contents
~~~~~~~~~~~~~~~~~~~

.. py:class:: EngineBase

   Base class for EvalML engines.

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.engine_base.EngineBase.setup_job_log
      evalml.automl.engine.engine_base.EngineBase.submit_evaluation_job
      evalml.automl.engine.engine_base.EngineBase.submit_scoring_job
      evalml.automl.engine.engine_base.EngineBase.submit_training_job

   .. py:method:: setup_job_log()
      :staticmethod:

      Set up a logger for a job.

   .. py:method:: submit_evaluation_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)
      :abstractmethod:

      Submit a job for pipeline evaluation during AutoMLSearch.

   .. py:method:: submit_scoring_job(self, automl_config, pipeline, X, y, objectives, X_train=None, y_train=None)
      :abstractmethod:

      Submit a job for pipeline scoring.

   .. py:method:: submit_training_job(self, automl_config, pipeline, X, y, X_holdout=None, y_holdout=None)
      :abstractmethod:

      Submit a job for pipeline training.


.. py:class:: EngineComputation

   Wrapper around the result of a (possibly asynchronous) engine computation.

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.engine_base.EngineComputation.cancel
      evalml.automl.engine.engine_base.EngineComputation.done
      evalml.automl.engine.engine_base.EngineComputation.get_result

   .. py:method:: cancel(self)
      :abstractmethod:

      Cancel the computation.

   .. py:method:: done(self)
      :abstractmethod:

      Return whether the computation is done.

   .. py:method:: get_result(self)
      :abstractmethod:

      Get the computation result. Blocks until the computation is finished.

      :raises Exception: If the computation fails, with the traceback of the failure.


.. py:function:: evaluate_pipeline(pipeline, automl_config, X, y, logger, X_holdout=None, y_holdout=None)

   Function submitted to the ``submit_evaluation_job`` engine method.

   :param pipeline: The pipeline to score.
   :type pipeline: PipelineBase
   :param automl_config: The AutoMLSearch object, used to access config and the error callback.
   :type automl_config: AutoMLConfig
   :param X: Training features.
   :type X: pd.DataFrame
   :param y: Training target.
   :type y: pd.Series
   :param logger: Logger object to write to.
   :param X_holdout: Holdout set features.
   :type X_holdout: pd.DataFrame
   :param y_holdout: Holdout set target.
   :type y_holdout: pd.DataFrame
   :returns: A three-item tuple. First, a dict containing ``cv_score_mean``, ``cv_scores``, ``training_time`` and a ``cv_data`` structure with details. Second, the pipeline class that was trained and scored. Third, the job logger instance with all the recorded messages.
   :rtype: tuple of three items


.. py:class:: JobLogger

   Mimic the behavior of a Python ``logging.Logger``, but store all messages rather than logging them.

   This is used during engine jobs so that log messages are written to the log only after the job completes, which keeps all of the messages for a single job grouped together in the log.

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.automl.engine.engine_base.JobLogger.debug
      evalml.automl.engine.engine_base.JobLogger.error
      evalml.automl.engine.engine_base.JobLogger.info
      evalml.automl.engine.engine_base.JobLogger.warning
      evalml.automl.engine.engine_base.JobLogger.write_to_logger

   .. py:method:: debug(self, msg)

      Store a message at the debug level.

   .. py:method:: error(self, msg)

      Store a message at the error level.

   .. py:method:: info(self, msg)

      Store a message at the info level.

   .. py:method:: warning(self, msg)

      Store a message at the warning level.

   .. py:method:: write_to_logger(self, logger)

      Write all the stored messages to the logger in first in, first out (FIFO) order.


.. py:function:: score_pipeline(pipeline, X, y, objectives, X_train=None, y_train=None, X_schema=None, y_schema=None)

   Wrap the ``pipeline.score`` method to make it easy to score pipelines with Dask.

   :param pipeline: The pipeline to score.
   :type pipeline: PipelineBase
   :param X: Features to score on.
   :type X: pd.DataFrame
   :param y: Target used to calculate scores.
   :type y: pd.Series
   :param objectives: List of objectives to score on.
   :type objectives: list[ObjectiveBase]
   :param X_train: Training features. Used for feature engineering in time series.
   :type X_train: pd.DataFrame
   :param y_train: Training target. Used for feature engineering in time series.
   :type y_train: pd.Series
   :param X_schema: Schema for the features. Defaults to None.
   :type X_schema: ww.TableSchema
   :param y_schema: Schema for the target. Defaults to None.
   :type y_schema: ww.ColumnSchema
   :returns: Dictionary containing the pipeline scores.
   :rtype: dict


.. py:function:: train_and_score_pipeline(pipeline, automl_config, full_X_train, full_y_train, logger, X_holdout=None, y_holdout=None)

   Given a pipeline, config and data, train and score the pipeline and return the CV or TV scores.

   :param pipeline: The pipeline to score.
   :type pipeline: PipelineBase
   :param automl_config: The AutoMLSearch object, used to access config and the error callback.
   :type automl_config: AutoMLSearch
   :param full_X_train: Training features.
   :type full_X_train: pd.DataFrame
   :param full_y_train: Training target.
   :type full_y_train: pd.Series
   :param logger: Logger object to write to.
   :param X_holdout: Holdout set features.
   :type X_holdout: pd.DataFrame
   :param y_holdout: Holdout set target.
   :type y_holdout: pd.DataFrame
   :raises Exception: If there are missing target values in the training set after the data split.
   :returns: A three-item tuple. First, a dict containing ``cv_score_mean``, ``cv_scores``, ``training_time`` and a ``cv_data`` structure with details. Second, the pipeline class that was trained and scored. Third, the job logger instance with all the recorded messages.
   :rtype: tuple of three items


.. py:function:: train_pipeline(pipeline, X, y, automl_config, schema=True, get_hashes=False)

   Train a pipeline and tune the threshold if necessary.

   :param pipeline: Pipeline to train.
   :type pipeline: PipelineBase
   :param X: Features to train on.
   :type X: pd.DataFrame
   :param y: Target to train on.
   :type y: pd.Series
   :param automl_config: The AutoMLSearch object, used to access config and the error callback.
   :type automl_config: AutoMLSearch
   :param schema: Whether to use the schemas for X and y. Defaults to True.
   :type schema: bool
   :param get_hashes: Whether to return the hashes of the data used to train (and potentially to tune the threshold). Defaults to False.
   :type get_hashes: bool
   :returns: A trained pipeline instance. When ``get_hashes`` is True, the hash of the input data indices is also returned.
   :rtype: PipelineBase (plus the hash when ``get_hashes`` is True)
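
The ``JobLogger`` description amounts to a buffer-and-replay pattern: store each message with its level while the job runs, then replay everything in FIFO order into a real logger. The following is a minimal sketch of that pattern using Python's standard ``logging`` module. It is not evalml's implementation, and ``BufferedJobLogger`` and ``ListHandler`` are hypothetical names introduced only for this illustration.

```python
import logging
from typing import List, Tuple


class BufferedJobLogger:
    """Hypothetical stand-in for JobLogger: stores (level, msg) pairs instead of logging them."""

    def __init__(self) -> None:
        self.logs: List[Tuple[str, str]] = []

    def debug(self, msg: str) -> None:
        self.logs.append(("debug", msg))

    def info(self, msg: str) -> None:
        self.logs.append(("info", msg))

    def warning(self, msg: str) -> None:
        self.logs.append(("warning", msg))

    def error(self, msg: str) -> None:
        self.logs.append(("error", msg))

    def write_to_logger(self, logger: logging.Logger) -> None:
        # Replay stored messages in the order they were recorded (FIFO),
        # calling logger.debug / logger.info / ... by level name.
        for level, msg in self.logs:
            getattr(logger, level)(msg)


class ListHandler(logging.Handler):
    """Collects emitted records in memory so the replay order can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


# Messages are only buffered while the "job" runs.
job_logger = BufferedJobLogger()
job_logger.debug("Starting job")
job_logger.info("Training pipeline")
job_logger.warning("Threshold tuning skipped")
job_logger.error("Scoring failed")

# After the job, replay everything into a real logger in one block.
target = logging.getLogger("engine_demo")
target.setLevel(logging.DEBUG)
handler = ListHandler()
target.addHandler(handler)
target.propagate = False  # keep the demo output out of the root logger
job_logger.write_to_logger(target)

replayed = [(r.levelname, r.getMessage()) for r in handler.records]
print(replayed)
```

Nothing reaches the real logger until ``write_to_logger`` runs. Because of that, the messages from one job arrive as a contiguous block even if several jobs ran concurrently.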
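
``EngineComputation`` leaves ``cancel``, ``done`` and ``get_result`` abstract. One plausible way to back them, assumed here rather than taken from evalml, is to wrap a ``concurrent.futures.Future``. ``Future.result()`` blocks until the work finishes and re-raises any exception raised inside the job, which matches the documented behavior of ``get_result``. ``Computation``, ``FutureComputation``, ``submit``, ``fake_score`` and ``failing_job`` are hypothetical names used only for this sketch.

```python
from abc import ABC, abstractmethod
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class Computation(ABC):
    """Hypothetical mirror of the three abstract methods documented for EngineComputation."""

    @abstractmethod
    def done(self) -> bool:
        """Return whether the computation is done."""

    @abstractmethod
    def get_result(self) -> Any:
        """Block until finished, then return the result or raise the job's exception."""

    @abstractmethod
    def cancel(self) -> None:
        """Cancel the computation."""


class FutureComputation(Computation):
    """Wraps a concurrent.futures.Future so callers never touch the executor directly."""

    def __init__(self, future: Future) -> None:
        self._future = future

    def done(self) -> bool:
        return self._future.done()

    def get_result(self) -> Any:
        # Future.result() blocks and re-raises any exception raised inside the job.
        return self._future.result()

    def cancel(self) -> None:
        # Future.cancel() only succeeds if the job has not started running yet.
        self._future.cancel()


def submit(executor: ThreadPoolExecutor, fn: Callable[..., Any], *args: Any) -> Computation:
    """Hypothetical submit helper, in the spirit of the EngineBase.submit_*_job methods."""
    return FutureComputation(executor.submit(fn, *args))


def fake_score(x: int) -> dict:
    # Stand-in for a real scoring job; returns a made-up score dict.
    return {"cv_score_mean": x / 10}


def failing_job() -> None:
    raise ValueError("missing target values")


with ThreadPoolExecutor(max_workers=2) as pool:
    ok = submit(pool, fake_score, 7)
    bad = submit(pool, failing_job)
    ok_result = ok.get_result()
    try:
        bad.get_result()
        bad_error = None
    except ValueError as exc:
        bad_error = str(exc)
    finished = ok.done() and bad.done()

print(ok_result, bad_error, finished)
```

A purely sequential engine could satisfy the same interface by running the job immediately and always returning ``True`` from ``done``. Either way, the wrapper keeps the caller independent of how the job is executed.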