utils

Utility methods for EvalML pipelines.

Module Contents

Functions

`generate_pipeline_code`	Creates and returns a string that contains the Python imports and code required for running the EvalML pipeline.
`make_pipeline`	Given input data, target data, an estimator class and the problem type, generates a pipeline instance with a preprocessing chain recommended based on the inputs. The pipeline will be an instance of the pipeline class appropriate for the specified problem_type.
`make_timeseries_baseline_pipeline`	Make a baseline pipeline for time series problems (regression, binary, or multiclass).

Attributes Summary

logger

Contents

evalml.pipelines.utils.generate_pipeline_code(element)

Creates and returns a string that contains the Python imports and code required for running the EvalML pipeline.

Parameters: element (pipeline instance) – The pipeline instance for which to generate Python code.
Returns: String representation of Python code that can be run separately to recreate the pipeline instance. Does not include code for custom component implementations.
Return type: str
Raises: ValueError – If element is not a pipeline, or if the pipeline is nonlinear.
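
Because the returned string is meant to be run separately, one way to use it is to save it as a script and execute that script. The sketch below illustrates this with a stand-in string so that it runs without EvalML installed. The helper `run_generated_code`, the file name, and the assumption that the generated code binds its result to a variable called `pipeline` are all illustrative and not taken from this page; inspect the real output before relying on any of them.

```python
import pathlib
import runpy
import tempfile

# Stand-in for the string returned by generate_pipeline_code(pipeline).
# Real output would contain evalml imports and a pipeline constructor call;
# a plain dict is used here so the sketch runs without evalml installed.
generated_code = (
    "pipeline = {'component_graph': ['Imputer', 'Random Forest Classifier']}\n"
)


def run_generated_code(code, variable="pipeline"):
    """Write `code` to a temporary script, run it, and return one variable.

    Hypothetical helper; the variable name is an assumption about the
    generated code, not something this reference page guarantees.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = pathlib.Path(tmp) / "recreate_pipeline.py"
        script.write_text(code, encoding="utf-8")
        # run_path executes the file and returns its global namespace.
        namespace = runpy.run_path(str(script))
    return namespace[variable]


recreated = run_generated_code(generated_code)
print(recreated)
```

If the pipeline uses custom components, their class definitions would need to be made available to the script separately, since the generated string does not include them.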

evalml.pipelines.utils.logger

evalml.pipelines.utils.make_pipeline(X, y, estimator, problem_type, parameters=None, sampler_name=None, extra_components=None)

Given input data, target data, an estimator class and the problem type, generates a pipeline instance with a preprocessing chain recommended based on the inputs. The pipeline will be an instance of the pipeline class appropriate for the specified problem_type.

Parameters

X (pd.DataFrame) – The input data of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples].
estimator (Estimator) – Estimator for pipeline.
problem_type (ProblemTypes or str) – Problem type for pipeline to generate.
parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary or None implies using all default values for component parameters.
sampler_name (str) – The name of the sampler component to add to the pipeline. Only used in classification problems. Defaults to None.
extra_components (list[ComponentBase]) – List of extra components to be added after preprocessing components. Defaults to None.

Returns

PipelineBase instance with dynamically generated preprocessing components and specified estimator.

Return type

PipelineBase object

Raises

ValueError – If estimator is not valid for the given problem type, or sampling is not supported for the given problem type.
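
As a rough illustration of the `parameters` argument, the sketch below builds a dictionary keyed by component name and passes it to `make_pipeline` through a small wrapper. The wrapper `build_binary_pipeline` is hypothetical, and the component name "Random Forest Classifier", its `n_estimators` parameter, the `RandomForestClassifier` import path, and the "binary" problem type string are assumptions about the wider EvalML library rather than details stated on this page.

```python
# Keys are component names, values are that component's parameters.
# "Random Forest Classifier" and "n_estimators" are assumed names; other
# components in the pipeline would be configured with the same shape.
parameters = {
    "Random Forest Classifier": {"n_estimators": 50},
}


def build_binary_pipeline(X, y, parameters=None):
    """Call make_pipeline for a binary problem (hypothetical wrapper)."""
    # Imported lazily so the sketch can be loaded without evalml installed.
    from evalml.pipelines.components import RandomForestClassifier
    from evalml.pipelines.utils import make_pipeline

    # The estimator is passed as a class, not an instance.
    return make_pipeline(
        X, y, RandomForestClassifier, "binary", parameters=parameters
    )
```

Calling `build_binary_pipeline(X, y, parameters)` with a pd.DataFrame and pd.Series should return a pipeline instance; passing `parameters=None` leaves every component at its defaults.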

evalml.pipelines.utils.make_timeseries_baseline_pipeline(problem_type, gap, forecast_horizon)

Make a baseline pipeline for time series problems (regression, binary, or multiclass).

Parameters

problem_type (ProblemTypes) – One of TIME_SERIES_REGRESSION, TIME_SERIES_MULTICLASS, or TIME_SERIES_BINARY.
gap (int) – Non-negative gap parameter.
forecast_horizon (int) – Positive forecast_horizon parameter.

Returns

TimeSeriesPipelineBase, a time series pipeline corresponding to the problem type.
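
The argument constraints above can be checked before the call, as in this minimal sketch. Both helpers are hypothetical; this page does not say whether the library itself validates `gap` and `forecast_horizon`, and the `evalml.problem_types.ProblemTypes` import path is an assumption about the wider library.

```python
def check_baseline_args(gap, forecast_horizon):
    """Validate the documented argument constraints (hypothetical helper)."""
    if not isinstance(gap, int) or gap < 0:
        raise ValueError(f"gap must be a non-negative int, got {gap!r}")
    if not isinstance(forecast_horizon, int) or forecast_horizon < 1:
        raise ValueError(
            f"forecast_horizon must be a positive int, got {forecast_horizon!r}"
        )


def build_regression_baseline(gap=0, forecast_horizon=1):
    """Build a time series regression baseline (hypothetical wrapper)."""
    check_baseline_args(gap, forecast_horizon)
    # Imported lazily so the sketch can be loaded without evalml installed.
    from evalml.pipelines.utils import make_timeseries_baseline_pipeline
    from evalml.problem_types import ProblemTypes

    return make_timeseries_baseline_pipeline(
        ProblemTypes.TIME_SERIES_REGRESSION, gap, forecast_horizon
    )
```

Swapping in `ProblemTypes.TIME_SERIES_BINARY` or `ProblemTypes.TIME_SERIES_MULTICLASS` would request the corresponding classification baseline instead.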
