component_graph
==========================================

.. py:module:: evalml.pipelines.component_graph

.. autoapi-nested-parse::

   Component graph for a pipeline as a directed acyclic graph (DAG).



Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.pipelines.component_graph.ComponentGraph


Attributes Summary
~~~~~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.pipelines.component_graph.logger


Contents
~~~~~~~~~~~~~~~~~~~

.. py:class:: ComponentGraph(component_dict=None, cached_data=None, random_seed=0)

   Component graph for a pipeline as a directed acyclic graph (DAG).

   :param component_dict: A dictionary which specifies the components and edges between components that should be used to create the component graph. Defaults to None.
   :type component_dict: dict
   :param cached_data: A dictionary of nested cached data. If the hashes and components are in this cache, fitting is skipped for these components. Expected to be of format ``{hash1: {component_name: trained_component, ...}, hash2: {...}, ...}``. Defaults to None.
   :type cached_data: dict
   :param random_seed: Seed for the random number generator. Defaults to 0.
   :type random_seed: int

   .. rubric:: Examples

   >>> from evalml.pipelines.component_graph import ComponentGraph
   >>> component_dict = {'Imputer': ['Imputer', 'X', 'y'],
   ...                   'Logistic Regression': ['Logistic Regression Classifier', 'Imputer.x', 'y']}
   >>> component_graph = ComponentGraph(component_dict)
   >>> assert component_graph.compute_order == ['Imputer', 'Logistic Regression']
   ...
   ...
   >>> component_dict = {'Imputer': ['Imputer', 'X', 'y'],
   ...                   'OHE': ['One Hot Encoder', 'Imputer.x', 'y'],
   ...                   'estimator_1': ['Random Forest Classifier', 'OHE.x', 'y'],
   ...                   'estimator_2': ['Decision Tree Classifier', 'OHE.x', 'y'],
   ...                   'final': ['Logistic Regression Classifier', 'estimator_1.x', 'estimator_2.x', 'y']}
   >>> component_graph = ComponentGraph(component_dict)

   The default parameters for every component in the component graph.

   >>> assert component_graph.default_parameters == {
   ...     'Imputer': {'categorical_impute_strategy': 'most_frequent',
   ...                 'numeric_impute_strategy': 'mean',
   ...                 'boolean_impute_strategy': 'most_frequent',
   ...                 'categorical_fill_value': None,
   ...                 'numeric_fill_value': None,
   ...                 'boolean_fill_value': None},
   ...     'One Hot Encoder': {'top_n': 10,
   ...                         'features_to_encode': None,
   ...                         'categories': None,
   ...                         'drop': 'if_binary',
   ...                         'handle_unknown': 'ignore',
   ...                         'handle_missing': 'error'},
   ...     'Random Forest Classifier': {'n_estimators': 100,
   ...                                  'max_depth': 6,
   ...                                  'n_jobs': -1},
   ...     'Decision Tree Classifier': {'criterion': 'gini',
   ...                                  'max_features': 'sqrt',
   ...                                  'max_depth': 6,
   ...                                  'min_samples_split': 2,
   ...                                  'min_weight_fraction_leaf': 0.0},
   ...     'Logistic Regression Classifier': {'penalty': 'l2',
   ...                                        'C': 1.0,
   ...                                        'n_jobs': -1,
   ...                                        'multi_class': 'auto',
   ...                                        'solver': 'lbfgs'}}

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.component_graph.ComponentGraph.compute_order
      evalml.pipelines.component_graph.ComponentGraph.default_parameters
      evalml.pipelines.component_graph.ComponentGraph.describe
      evalml.pipelines.component_graph.ComponentGraph.fit
      evalml.pipelines.component_graph.ComponentGraph.fit_and_transform_all_but_final
      evalml.pipelines.component_graph.ComponentGraph.fit_transform
      evalml.pipelines.component_graph.ComponentGraph.generate_order
      evalml.pipelines.component_graph.ComponentGraph.get_component
      evalml.pipelines.component_graph.ComponentGraph.get_component_input_logical_types
      evalml.pipelines.component_graph.ComponentGraph.get_estimators
      evalml.pipelines.component_graph.ComponentGraph.get_inputs
      evalml.pipelines.component_graph.ComponentGraph.get_last_component
      evalml.pipelines.component_graph.ComponentGraph.graph
      evalml.pipelines.component_graph.ComponentGraph.has_dfs
      evalml.pipelines.component_graph.ComponentGraph.instantiate
      evalml.pipelines.component_graph.ComponentGraph.inverse_transform
      evalml.pipelines.component_graph.ComponentGraph.last_component_input_logical_types
      evalml.pipelines.component_graph.ComponentGraph.predict
      evalml.pipelines.component_graph.ComponentGraph.transform
      evalml.pipelines.component_graph.ComponentGraph.transform_all_but_final


   .. py:method:: compute_order(self)
      :property:

      The order that components will be computed or called in.


   .. py:method:: default_parameters(self)
      :property:

      The default parameter dictionary for this pipeline.

      :returns: Dictionary of all component default parameters.
      :rtype: dict


   .. py:method:: describe(self, return_dict=False)

      Outputs component graph details including component parameters.

      :param return_dict: If True, return dictionary of information about component graph. Defaults to False.
      :type return_dict: bool

      :returns: Dictionary of all component parameters if return_dict is True, else None.
      :rtype: dict

      :raises ValueError: If the component graph is not instantiated.


   .. py:method:: fit(self, X, y)

      Fit each component in the graph.

      :param X: The input training data of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param y: The target training data of length [n_samples].
      :type y: pd.Series

      :returns: self


   .. py:method:: fit_and_transform_all_but_final(self, X, y)

      Fit and transform all components save the final one, usually an estimator.

      :param X: The input training data of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param y: The target training data of length [n_samples].
      :type y: pd.Series

      :returns: Transformed features and target.
      :rtype: Tuple (pd.DataFrame, pd.Series)


   .. py:method:: fit_transform(self, X, y)

      Fit and transform all components in the component graph, if all components are Transformers.

      :param X: Input features of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param y: The target data of length [n_samples].
      :type y: pd.Series

      :returns: Transformed output.
      :rtype: pd.DataFrame

      :raises ValueError: If final component is an Estimator.


   .. py:method:: generate_order(cls, component_dict)
      :classmethod:

      Regenerates the topologically sorted order of the graph.


   .. py:method:: get_component(self, component_name)

      Retrieves a single component object from the graph.

      :param component_name: Name of the component to retrieve.
      :type component_name: str

      :returns: ComponentBase object

      :raises ValueError: If the component is not in the graph.


   .. py:method:: get_component_input_logical_types(self, component_name)

      Get the logical types that are passed to the given component.

      :param component_name: Name of component in the graph.
      :type component_name: str

      :returns: Mapping of feature name to logical type instance.
      :rtype: dict

      :raises ValueError: If the component is not in the graph.
      :raises ValueError: If the component graph has not been fitted.


   .. py:method:: get_estimators(self)

      Gets a list of all the estimator components within this graph.

      :returns: All estimator objects within the graph.
      :rtype: list

      :raises ValueError: If the component graph is not yet instantiated.


   .. py:method:: get_inputs(self, component_name)

      Retrieves all inputs for a given component.

      :param component_name: Name of the component to look up.
      :type component_name: str

      :returns: List of inputs for the component to use.
      :rtype: list[str]

      :raises ValueError: If the component is not in the graph.


   .. py:method:: get_last_component(self)

      Retrieves the component that is computed last in the graph, usually the final estimator.

      :returns: ComponentBase object

      :raises ValueError: If the component graph has no edges.


   .. py:method:: graph(self, name=None, graph_format=None)

      Generate an image representing the component graph.

      :param name: Name of the graph. Defaults to None.
      :type name: str
      :param graph_format: File format to save the graph in. Defaults to None.
      :type graph_format: str

      :returns: Graph object that can be directly displayed in Jupyter notebooks.
      :rtype: graphviz.Digraph

      :raises RuntimeError: If graphviz is not installed.


   .. py:method:: has_dfs(self)
      :property:

      Whether this component graph contains a DFSTransformer or not.


   .. py:method:: instantiate(self, parameters=None)

      Instantiates all uninstantiated components within the graph using the given parameters. An error will be raised if a component is already instantiated but the parameters dict contains arguments for that component.

      :param parameters: Dictionary with component names as keys and dictionary of that component's parameters as values. An empty dictionary {} or None implies using all default values for component parameters. If a component in the component graph is already instantiated, it will not use any of its parameters defined in this dictionary. Defaults to None.
      :type parameters: dict

      :returns: self

      :raises ValueError: If component graph is already instantiated or if a component errored while instantiating.


   .. py:method:: inverse_transform(self, y)

      Apply component inverse_transform methods to estimator predictions in reverse order.

      Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd).

      :param y: Final component features.
      :type y: pd.Series

      :returns: The target with inverse transformation applied.
      :rtype: pd.Series


   .. py:method:: last_component_input_logical_types(self)
      :property:

      Get the logical types that are passed to the last component in the pipeline.

      :returns: Mapping of feature name to logical type instance.
      :rtype: dict

      :raises ValueError: If the component is not in the graph.
      :raises ValueError: If the component graph has not been fitted.


   .. py:method:: predict(self, X)

      Make predictions using selected features.

      :param X: Input features of shape [n_samples, n_features].
      :type X: pd.DataFrame

      :returns: Predicted values.
      :rtype: pd.Series

      :raises ValueError: If final component is not an Estimator.


   .. py:method:: transform(self, X, y=None)

      Transform the input using the component graph.

      :param X: Input features of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param y: The target data of length [n_samples]. Defaults to None.
      :type y: pd.Series

      :returns: Transformed output.
      :rtype: pd.DataFrame

      :raises ValueError: If final component is not a Transformer.


   .. py:method:: transform_all_but_final(self, X, y=None)

      Transform all components save the final one, and gather the data from any number of parents to get all the information that should be fed to the final component.

      :param X: Data of shape [n_samples, n_features].
      :type X: pd.DataFrame
      :param y: The target training data of length [n_samples]. Defaults to None.
      :type y: pd.Series

      :returns: Transformed values.
      :rtype: pd.DataFrame


.. py:data:: logger
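
The topologically sorted order that ``generate_order`` and ``compute_order`` refer to can be illustrated without evalml. The following is a minimal sketch, not evalml's implementation: ``sketch_compute_order`` is a hypothetical helper, it relies on the standard-library ``graphlib`` module (Python 3.9+), and it assumes that an input such as ``'Imputer.x'`` denotes an edge from the component named ``'Imputer'``, as the ``component_dict`` examples above suggest.

.. code-block:: python

   from graphlib import TopologicalSorter


   def sketch_compute_order(component_dict):
       """Return component names in an order that respects their inputs.

       Hypothetical helper for illustration only; not part of evalml.
       """
       sorter = TopologicalSorter()
       for name, (_component_class, *inputs) in component_dict.items():
           # Assumption: "Imputer.x" means "output of the component named
           # Imputer". Plain "X" and "y" are raw data and create no edge.
           parents = [inp.rsplit(".", 1)[0] for inp in inputs if "." in inp]
           sorter.add(name, *parents)
       # static_order() raises graphlib.CycleError if the graph has a cycle.
       return list(sorter.static_order())


   component_dict = {
       "Imputer": ["Imputer", "X", "y"],
       "Logistic Regression": ["Logistic Regression Classifier", "Imputer.x", "y"],
   }
   print(sketch_compute_order(component_dict))
   # prints ['Imputer', 'Logistic Regression']

For a graph with parallel branches, such as the five-component example above, any order that places each component after all of its parents is a valid topological order, so the relative position of ``estimator_1`` and ``estimator_2`` in this sketch need not match evalml's ``compute_order``.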