utils

Utility methods for EvalML pipelines.

Module Contents

Functions

generate_pipeline_code

Creates and returns a string that contains the Python imports and code required for running the EvalML pipeline.

make_pipeline

Given input data, target data, an estimator class, and the problem type, generates a pipeline instance with a preprocessing chain recommended based on the inputs. The pipeline's class will be the pipeline base class appropriate for the specified problem_type, or a subclass of it.

make_timeseries_baseline_pipeline

Makes a baseline pipeline for time series problems (regression, binary, or multiclass).

Attributes Summary

logger

Contents

evalml.pipelines.utils.generate_pipeline_code(element)[source]

Creates and returns a string that contains the Python imports and code required for running the EvalML pipeline.

Parameters

element (pipeline instance) – The pipeline instance for which to generate Python code as a string.

Returns

String representation of Python code that can be run separately in order to recreate the pipeline instance. Does not include code for custom component implementation.

Return type

str

Raises

ValueError – If element is not a pipeline, or if the pipeline is nonlinear.
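
As a hedged illustration of how the returned string might be used, the sketch below writes the generated code to a file so that it can be run separately. The helper save_pipeline_code and its generate argument are hypothetical names introduced for this example and are not part of EvalML; only generate_pipeline_code comes from this module.

```python
from pathlib import Path


def save_pipeline_code(pipeline, path, generate=None):
    """Write the code string for `pipeline` to `path` and return it.

    Hypothetical helper, not part of EvalML. `generate` defaults to
    evalml.pipelines.utils.generate_pipeline_code and is injectable so the
    helper can be exercised without EvalML installed.
    """
    if generate is None:
        # Imported lazily; this branch requires EvalML to be installed.
        from evalml.pipelines.utils import generate_pipeline_code as generate
    # Per the documentation above, a ValueError is raised if `pipeline` is
    # not a pipeline or is nonlinear; it is left to propagate to the caller.
    code = generate(pipeline)
    Path(path).write_text(code)
    return code
```

With EvalML installed, save_pipeline_code(pipeline, "recreate_pipeline.py") would fall through to the real function. Because the generated code does not include custom component implementations, those would need to be importable wherever the saved file is run.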

evalml.pipelines.utils.logger

evalml.pipelines.utils.make_pipeline(X, y, estimator, problem_type, parameters=None, sampler_name=None, extra_components=None)[source]

Given input data, target data, an estimator class, and the problem type, generates a pipeline instance with a preprocessing chain recommended based on the inputs. The pipeline's class will be the pipeline base class appropriate for the specified problem_type, or a subclass of it.

Parameters
  • X (pd.DataFrame) – The input data of shape [n_samples, n_features].

  • y (pd.Series) – The target data of length [n_samples].

  • estimator (Estimator) – Estimator for pipeline.

  • problem_type (ProblemTypes or str) – Problem type for pipeline to generate.

  • parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.

  • sampler_name (str) – The name of the sampler component to add to the pipeline. Only used in classification problems. Defaults to None.

  • extra_components (list[ComponentBase]) – List of extra components to be added after preprocessing components. Defaults to None.

Returns

PipelineBase instance with dynamically generated preprocessing components and specified estimator.

Return type

PipelineBase object

Raises

ValueError – If estimator is not valid for the given problem type, or sampling is not supported for the given problem type.
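
The shape of the parameters argument can be sketched as follows. The component names ("Imputer", "Random Forest Classifier") and their parameter names are assumptions about EvalML's built-in components and may differ between versions; example_parameters and build_example_pipeline are hypothetical helper names used only for this illustration.

```python
def example_parameters():
    """Return a parameters dict: component name -> that component's parameters."""
    return {
        # Assumed component and parameter names (not confirmed by this page).
        "Imputer": {"numeric_impute_strategy": "median"},
        "Random Forest Classifier": {"n_estimators": 50},
    }


def build_example_pipeline(X, y):
    """Call make_pipeline for a binary problem (requires EvalML)."""
    # Imported lazily so example_parameters() works without EvalML installed.
    from evalml.pipelines.components import RandomForestClassifier
    from evalml.pipelines.utils import make_pipeline

    return make_pipeline(
        X,
        y,
        RandomForestClassifier,  # an estimator class, not an instance
        "binary",  # problem_type given as a str
        parameters=example_parameters(),
    )
```

Here X would be a pd.DataFrame and y a pd.Series, as described in the parameter list. Passing parameters=None or an empty dictionary instead would leave every component at its default values.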

evalml.pipelines.utils.make_timeseries_baseline_pipeline(problem_type, gap, forecast_horizon)[source]

Makes a baseline pipeline for time series problems (regression, binary, or multiclass).

Parameters
  • problem_type – One of TIME_SERIES_REGRESSION, TIME_SERIES_MULTICLASS, or TIME_SERIES_BINARY.

  • gap (int) – Non-negative gap parameter.

  • forecast_horizon (int) – Positive forecast_horizon parameter.

Returns

TimeSeriesPipelineBase, a time series pipeline corresponding to the problem type.
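
The documented constraints on gap and forecast_horizon can be checked before the call, as in the minimal sketch below. The helpers check_baseline_args and build_baseline are hypothetical and not part of EvalML, and whether EvalML performs equivalent validation itself is not stated here. The three-argument call matches the signature shown above and may need adjusting for other EvalML versions.

```python
def check_baseline_args(gap, forecast_horizon):
    """Validate the documented constraints on gap and forecast_horizon.

    Hypothetical helper, not part of EvalML.
    """
    if not isinstance(gap, int) or gap < 0:
        raise ValueError(f"gap must be a non-negative int, got {gap!r}")
    if not isinstance(forecast_horizon, int) or forecast_horizon <= 0:
        raise ValueError(
            f"forecast_horizon must be a positive int, got {forecast_horizon!r}"
        )


def build_baseline(gap=0, forecast_horizon=1):
    """Build a time series regression baseline pipeline (requires EvalML)."""
    check_baseline_args(gap, forecast_horizon)
    # Imported lazily so the validation helper works without EvalML installed.
    from evalml.pipelines.utils import make_timeseries_baseline_pipeline
    from evalml.problem_types import ProblemTypes

    return make_timeseries_baseline_pipeline(
        ProblemTypes.TIME_SERIES_REGRESSION, gap, forecast_horizon
    )
```

Swapping in ProblemTypes.TIME_SERIES_BINARY or ProblemTypes.TIME_SERIES_MULTICLASS would request the corresponding classification baseline.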