pipeline_base ======================================== .. py:module:: evalml.pipelines.pipeline_base .. autoapi-nested-parse:: Base machine learning pipeline class. Module Contents --------------- Classes Summary ~~~~~~~~~~~~~~~ .. autoapisummary:: evalml.pipelines.pipeline_base.PipelineBase Attributes Summary ~~~~~~~~~~~~~~~~~~~ .. autoapisummary:: evalml.pipelines.pipeline_base.logger Contents ~~~~~~~~~~~~~~~~~~~ .. py:data:: logger .. py:class:: PipelineBase(component_graph, parameters=None, custom_name=None, random_seed=0) Machine learning pipeline. :param component_graph: ComponentGraph instance, list of components in order, or dictionary of components. Accepts strings or ComponentBase subclasses in the list. Note that when duplicate components are specified in a list, the duplicate component names will be modified with the component's index in the list. For example, the component graph [Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names ["Imputer", "One Hot Encoder", "Imputer_2", "Logistic Regression Classifier"]. :type component_graph: ComponentGraph, list, dict :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param custom_name: Custom name for the pipeline. Defaults to None. :type custom_name: str :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int **Attributes** .. list-table:: :widths: 15 85 :header-rows: 0 * - **problem_type** - None **Methods** .. autoapisummary:: :nosignatures: evalml.pipelines.pipeline_base.PipelineBase.can_tune_threshold_with_objective evalml.pipelines.pipeline_base.PipelineBase.clone evalml.pipelines.pipeline_base.PipelineBase.create_objectives evalml.pipelines.pipeline_base.PipelineBase.custom_name evalml.pipelines.pipeline_base.PipelineBase.describe evalml.pipelines.pipeline_base.PipelineBase.feature_importance evalml.pipelines.pipeline_base.PipelineBase.fit evalml.pipelines.pipeline_base.PipelineBase.fit_transform evalml.pipelines.pipeline_base.PipelineBase.get_component evalml.pipelines.pipeline_base.PipelineBase.get_hyperparameter_ranges evalml.pipelines.pipeline_base.PipelineBase.graph evalml.pipelines.pipeline_base.PipelineBase.graph_dict evalml.pipelines.pipeline_base.PipelineBase.graph_feature_importance evalml.pipelines.pipeline_base.PipelineBase.inverse_transform evalml.pipelines.pipeline_base.PipelineBase.load evalml.pipelines.pipeline_base.PipelineBase.model_family evalml.pipelines.pipeline_base.PipelineBase.name evalml.pipelines.pipeline_base.PipelineBase.new evalml.pipelines.pipeline_base.PipelineBase.parameters evalml.pipelines.pipeline_base.PipelineBase.predict evalml.pipelines.pipeline_base.PipelineBase.save evalml.pipelines.pipeline_base.PipelineBase.score evalml.pipelines.pipeline_base.PipelineBase.summary evalml.pipelines.pipeline_base.PipelineBase.transform evalml.pipelines.pipeline_base.PipelineBase.transform_all_but_final .. py:method:: can_tune_threshold_with_objective(self, objective) Determine whether the threshold of a binary classification pipeline can be tuned. :param objective: Primary AutoMLSearch objective. :type objective: ObjectiveBase :returns: True if the pipeline threshold can be tuned. :rtype: bool .. py:method:: clone(self) Constructs a new pipeline with the same components, parameters, and random seed. :returns: A new instance of this pipeline with identical components, parameters, and random seed. .. py:method:: create_objectives(objectives) :staticmethod: Create objective instances from a list of strings or objective classes. .. py:method:: custom_name(self) :property: Custom name of the pipeline. .. py:method:: describe(self, return_dict=False) Outputs pipeline details including component parameters. :param return_dict: If True, return dictionary of information about pipeline. Defaults to False. :type return_dict: bool :returns: Dictionary of all component parameters if return_dict is True, else None. :rtype: dict .. py:method:: feature_importance(self) :property: Importance associated with each feature. Features dropped by the feature selection are excluded. :returns: Feature names and their corresponding importance :rtype: pd.DataFrame .. py:method:: fit(self, X, y) :abstractmethod: Build a model. :param X: The input training data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: The target training data of length [n_samples]. :type y: pd.Series, np.ndarray :returns: self .. py:method:: fit_transform(self, X, y) Fit and transform all components in the component graph, if all components are Transformers. :param X: Input features of shape [n_samples, n_features]. :type X: pd.DataFrame :param y: The target data of length [n_samples]. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame :raises ValueError: If final component is an Estimator. .. py:method:: get_component(self, name) Returns component by name. :param name: Name of component. :type name: str :returns: Component to return :rtype: Component .. py:method:: get_hyperparameter_ranges(self, custom_hyperparameters) Returns hyperparameter ranges from all components as a dictionary. :param custom_hyperparameters: Custom hyperparameters for the pipeline. :type custom_hyperparameters: dict :returns: Dictionary of hyperparameter ranges for each component in the pipeline. :rtype: dict .. py:method:: graph(self, filepath=None) Generate an image representing the pipeline graph. :param filepath: Path to where the graph should be saved. If set to None (as by default), the graph will not be saved. :type filepath: str, optional :returns: Graph object that can be directly displayed in Jupyter notebooks. :rtype: graphviz.Digraph :raises RuntimeError: If graphviz is not installed. :raises ValueError: If path is not writeable. .. py:method:: graph_dict(self) Generates a dictionary with nodes consisting of the component names and parameters, and edges detailing component relationships. This dictionary is JSON serializable in most cases. x_edges specifies from which component feature data is being passed. y_edges specifies from which component target data is being passed. This can be used to build graphs across a variety of visualization tools. Template: {"Nodes": {"component_name": {"Name": class_name, "Parameters": parameters_attributes}, ...}}, "x_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...], "y_edges": [[from_component_name, to_component_name], [from_component_name, to_component_name], ...]} :returns: A dictionary representing the DAG structure. :rtype: dag_dict (dict) .. py:method:: graph_feature_importance(self, importance_threshold=0) Generate a bar graph of the pipeline's feature importance. :param importance_threshold: If provided, graph features with a permutation importance whose absolute value is larger than importance_threshold. Defaults to zero. :type importance_threshold: float, optional :returns: A bar graph showing features and their corresponding importance. :rtype: plotly.Figure :raises ValueError: If importance threshold is not valid. .. py:method:: inverse_transform(self, y) Apply component inverse_transform methods to estimator predictions in reverse order. Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd). :param y: Final component features. :type y: pd.Series :returns: The inverse transform of the target. :rtype: pd.Series .. py:method:: load(file_path: Union[str, io.BytesIO]) :staticmethod: Loads pipeline at file path. :param file_path: load filepath or a BytesIO object. :type file_path: str|BytesIO :returns: PipelineBase object .. py:method:: model_family(self) :property: Returns model family of this pipeline. .. py:method:: name(self) :property: Name of the pipeline. .. py:method:: new(self, parameters, random_seed=0) Constructs a new instance of the pipeline with the same component graph but with a different set of parameters. Not to be confused with python's __new__ method. :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary or None implies using all default values for component parameters. Defaults to None. :type parameters: dict :param random_seed: Seed for the random number generator. Defaults to 0. :type random_seed: int :returns: A new instance of this pipeline with identical components. .. py:method:: parameters(self) :property: Parameter dictionary for this pipeline. :returns: Dictionary of all component parameters. :rtype: dict .. py:method:: predict(self, X, objective=None, X_train=None, y_train=None) Make predictions using selected features. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param objective: The objective to use to make predictions. :type objective: Object or string :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series or None :returns: Predicted values. :rtype: pd.Series .. py:method:: save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL) Saves pipeline at file path. :param file_path: Location to save file. :type file_path: str :param pickle_protocol: The pickle data stream format. :type pickle_protocol: int .. py:method:: score(self, X, y, objectives, X_train=None, y_train=None) :abstractmethod: Evaluate model performance on current and additional objectives. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame or np.ndarray :param y: True labels of length [n_samples]. :type y: pd.Series, np.ndarray :param objectives: Non-empty list of objectives to score on. :type objectives: list :param X_train: Training data. Ignored. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Ignored. Only used for time series. :type y_train: pd.Series or None :returns: Ordered dictionary of objective scores. :rtype: dict .. py:method:: summary(self) :property: A short summary of the pipeline structure, describing the list of components used. Example: Logistic Regression Classifier w/ Simple Imputer + One Hot Encoder :returns: A string describing the pipeline structure. .. py:method:: transform(self, X, y=None) Transform the input. :param X: Data of shape [n_samples, n_features]. :type X: pd.DataFrame, or np.ndarray :param y: The target data of length [n_samples]. Defaults to None. :type y: pd.Series :returns: Transformed output. :rtype: pd.DataFrame .. py:method:: transform_all_but_final(self, X, y=None, X_train=None, y_train=None) Transforms the data by applying all pre-processing components. :param X: Input data to the pipeline to transform. :type X: pd.DataFrame :param y: Targets corresponding to X. Optional. :type y: pd.Series or None :param X_train: Training data. Only used for time series. :type X_train: pd.DataFrame or np.ndarray or None :param y_train: Training labels. Only used for time series. :type y_train: pd.Series or None :returns: New transformed features. :rtype: pd.DataFrame