"""Component graph for a pipeline as a directed acyclic graph (DAG)."""importinspectimportwarningsimportnetworkxasnximportpandasaspdimportwoodworkaswwfromnetworkx.algorithms.dagimporttopological_sortfromnetworkx.exceptionimportNetworkXUnfeasiblefromevalml.exceptions.exceptionsimport(MethodPropertyNotFoundError,MissingComponentError,ParameterNotUsedWarning,PipelineError,PipelineErrorCodeEnum,)fromevalml.pipelines.componentsimport(ComponentBase,DFSTransformer,Estimator,Transformer,)fromevalml.pipelines.components.utilsimporthandle_component_classfromevalml.utilsimport(_schema_is_equal,get_logger,import_or_raise,infer_feature_types,)logger=get_logger(__file__)

class ComponentGraph:
    """Component graph for a pipeline as a directed acyclic graph (DAG).

    Args:
        component_dict (dict): A dictionary which specifies the components and edges between components that should be used to create the component graph. Defaults to None.
        cached_data (dict): A dictionary of nested cached data. If the hashes and components are in this cache, we skip fitting for these components. Expected to be of format
            {hash1: {component_name: trained_component, ...}, hash2: {...}, ...}.
            Defaults to None.
        random_seed (int): Seed for the random number generator. Defaults to 0.

    Examples:
        >>> component_dict = {'Imputer': ['Imputer', 'X', 'y'],
        ...                   'Logistic Regression': ['Logistic Regression Classifier', 'Imputer.x', 'y']}
        >>> component_graph = ComponentGraph(component_dict)
        >>> assert component_graph.compute_order == ['Imputer', 'Logistic Regression']
        ...
        ...
        >>> component_dict = {'Imputer': ['Imputer', 'X', 'y'],
        ...                   'OHE': ['One Hot Encoder', 'Imputer.x', 'y'],
        ...                   'estimator_1': ['Random Forest Classifier', 'OHE.x', 'y'],
        ...                   'estimator_2': ['Decision Tree Classifier', 'OHE.x', 'y'],
        ...                   'final': ['Logistic Regression Classifier', 'estimator_1.x', 'estimator_2.x', 'y']}
        >>> component_graph = ComponentGraph(component_dict)

        The default parameters for every component in the component graph.

        >>> assert component_graph.default_parameters == {
        ...     'Imputer': {'categorical_impute_strategy': 'most_frequent',
        ...                 'numeric_impute_strategy': 'mean',
        ...                 'boolean_impute_strategy': 'most_frequent',
        ...                 'categorical_fill_value': None,
        ...                 'numeric_fill_value': None,
        ...                 'boolean_fill_value': None},
        ...     'One Hot Encoder': {'top_n': 10,
        ...                         'features_to_encode': None,
        ...                         'categories': None,
        ...                         'drop': 'if_binary',
        ...                         'handle_unknown': 'ignore',
        ...                         'handle_missing': 'error'},
        ...     'Random Forest Classifier': {'n_estimators': 100,
        ...                                  'max_depth': 6,
        ...                                  'n_jobs': -1},
        ...     'Decision Tree Classifier': {'criterion': 'gini',
        ...                                  'max_features': 'sqrt',
        ...                                  'max_depth': 6,
        ...                                  'min_samples_split': 2,
        ...                                  'min_weight_fraction_leaf': 0.0},
        ...     'Logistic Regression Classifier': {'penalty': 'l2',
        ...                                        'C': 1.0,
        ...                                        'n_jobs': -1,
        ...                                        'multi_class': 'auto',
        ...                                        'solver': 'lbfgs'}}
    """

    def __init__(self, component_dict=None, cached_data=None, random_seed=0):
        self.random_seed = random_seed
        self.component_dict = component_dict or {}
        if not isinstance(self.component_dict, dict):
            raise ValueError(
                "component_dict must be a dictionary which specifies the components and edges between components",
            )
        self._validate_component_dict()
        self.cached_data = cached_data
        self.component_instances = {}
        self._is_instantiated = False
        for component_name, component_info in self.component_dict.items():
            component_class = handle_component_class(component_info[0])
            self.component_instances[component_name] = component_class
        self._validate_component_dict_edges()
        self.input_feature_names = {}
        self._feature_provenance = {}
        self._feature_logical_types = {}
        self._i = 0
        self._compute_order = self.generate_order(self.component_dict)
        self._input_types = {}

    def _validate_component_dict(self):
        for _, component_inputs in self.component_dict.items():
            if not isinstance(component_inputs, list):
                raise ValueError(
                    "All component information should be passed in as a list",
                )

    def _validate_component_dict_edges(self):
        for _, component_inputs in self.component_dict.items():
            component_inputs = component_inputs[1:]
            has_feature_input = any(
                component_input.endswith(".x") or component_input == "X"
                for component_input in component_inputs
            )
            num_target_inputs = sum(
                component_input.endswith(".y") or component_input == "y"
                for component_input in component_inputs
            )
            if not has_feature_input:
                raise ValueError(
                    "All components must have at least one input feature (.x/X) edge.",
                )
            if num_target_inputs != 1:
                raise ValueError(
                    "All components must have exactly one target (.y/y) edge.",
                )

            def check_all_inputs_have_correct_syntax(edge):
                return not (
                    edge.endswith(".y")
                    or edge == "y"
                    or edge.endswith(".x")
                    or edge == "X"
                )

            if (
                len(
                    list(
                        filter(check_all_inputs_have_correct_syntax, component_inputs),
                    ),
                )
                != 0
            ):
                raise ValueError(
                    "All edges must be specified as either an input feature ('X'/.x) or input target ('y'/.y).",
                )

            target_inputs = [
                component
                for component in component_inputs
                if (component.endswith(".y"))
            ]
            if target_inputs:
                target_component_name = target_inputs[0][:-2]
                target_component_class = self.get_component(target_component_name)
                if not target_component_class.modifies_target:
                    raise ValueError(
                        f"{target_inputs[0]} is not a valid input edge because {target_component_name} does not return a target.",
                    )

    @property
    def compute_order(self):
        """The order that components will be computed or called in."""
        return self._compute_order

    @property
    def default_parameters(self):
        """The default parameter dictionary for this pipeline.

        Returns:
            dict: Dictionary of all component default parameters.
        """
        defaults = {}
        for component in self.component_instances.values():
            if component.default_parameters:
                defaults[component.name] = component.default_parameters
        return defaults

    @property
    def has_dfs(self):
        """Whether this component graph contains a DFSTransformer or not."""
        return any(
            isinstance(c, DFSTransformer) for c in self.component_instances.values()
        )

    def instantiate(self, parameters=None):
        """Instantiates all uninstantiated components within the graph using the given parameters. An error will be raised if a component is already instantiated but the parameters dict contains arguments for that component.

        Args:
            parameters (dict): Dictionary with component names as keys and dictionary of that component's parameters as values.
                An empty dictionary {} or None implies using all default values for component parameters. If a component
                in the component graph is already instantiated, it will not use any of its parameters defined in this dictionary. Defaults to None.

        Returns:
            self

        Raises:
            ValueError: If component graph is already instantiated or if a component errored while instantiating.
        """
        if self._is_instantiated:
            raise ValueError(
                "Cannot reinstantiate a component graph that was previously instantiated",
            )

        parameters = parameters or {}
        param_set = set(s for s in parameters.keys() if s not in ["pipeline"])
        diff = param_set.difference(set(self.component_instances.keys()))
        if len(diff):
            warnings.warn(ParameterNotUsedWarning(diff))
        self._is_instantiated = True
        component_instances = {}
        for component_name, component_class in self.component_instances.items():
            component_parameters = parameters.get(component_name, {})
            if inspect.isclass(component_class):
                try:
                    new_component = component_class(
                        **component_parameters, random_seed=self.random_seed
                    )
                except (ValueError, TypeError) as e:
                    self._is_instantiated = False
                    err = "Error received when instantiating component {} with the following arguments {}".format(
                        component_name,
                        component_parameters,
                    )
                    raise ValueError(err) from e

                component_instances[component_name] = new_component
            elif isinstance(component_class, ComponentBase):
                component_instances[component_name] = component_class
        self.component_instances = component_instances
        return self

    def fit(self, X, y):
        """Fit each component in the graph.

        Args:
            X (pd.DataFrame): The input training data of shape [n_samples, n_features].
            y (pd.Series): The target training data of length [n_samples].

        Returns:
            self
        """
        X = infer_feature_types(X)
        y = infer_feature_types(y)
        self._transform_features(self.compute_order, X, y, fit=True)
        self._feature_provenance = self._get_feature_provenance(X.columns)
        return self

    def fit_transform(self, X, y):
        """Fit and transform all components in the component graph, if all components are Transformers.

        Args:
            X (pd.DataFrame): Input features of shape [n_samples, n_features].
            y (pd.Series): The target data of length [n_samples].

        Returns:
            pd.DataFrame: Transformed output.

        Raises:
            ValueError: If final component is an Estimator.
        """
        final_component_instance = self.get_last_component()
        if isinstance(final_component_instance, Estimator):
            raise ValueError(
                "Cannot call fit_transform() on a component graph because the final component is an Estimator. Use fit_and_transform_all_but_final instead.",
            )
        return self.fit(X, y).transform(X, y)

    def fit_and_transform_all_but_final(self, X, y):
        """Fit and transform all components save the final one, usually an estimator.

        Args:
            X (pd.DataFrame): The input training data of shape [n_samples, n_features].
            y (pd.Series): The target training data of length [n_samples].

        Returns:
            Tuple (pd.DataFrame, pd.Series): Transformed features and target.
        """
        return self._fit_transform_features_helper(True, X, y)

    def transform_all_but_final(self, X, y=None):
        """Transform all components save the final one, and gather the data from any number of parents to get all the information that should be fed to the final component.

        Args:
            X (pd.DataFrame): Data of shape [n_samples, n_features].
            y (pd.Series): The target training data of length [n_samples]. Defaults to None.

        Returns:
            pd.DataFrame: Transformed values.
        """
        features, _ = self._fit_transform_features_helper(False, X, y)
        return features

    def _fit_transform_features_helper(self, needs_fitting, X, y=None):
        """Transform (and possibly fit) all components save the final one, and return the data that should be fed to the final component, usually an estimator.

        Args:
            needs_fitting (boolean): Determines if components should be fit.
            X (pd.DataFrame): Data of shape [n_samples, n_features].
            y (pd.Series): The target training data of length [n_samples]. Defaults to None.

        Returns:
            Tuple: pd.DataFrame, pd.Series: Transformed features and target.
        """
        if len(self.compute_order) <= 1:
            X = infer_feature_types(X)
            self.input_feature_names.update({self.compute_order[0]: list(X.columns)})
            return X, y
        component_outputs = self._transform_features(
            self.compute_order[:-1],
            X,
            y=y,
            fit=needs_fitting,
            evaluate_training_only_components=needs_fitting,
        )
        x_inputs, y_output = self._consolidate_inputs_for_component(
            component_outputs,
            self.compute_order[-1],
            X,
            y,
        )
        if needs_fitting:
            self.input_feature_names.update(
                {self.compute_order[-1]: list(x_inputs.columns)},
            )
        return x_inputs, y_output

    def _consolidate_inputs_for_component(
        self,
        component_outputs,
        component,
        X,
        y=None,
    ):
        # Gather the feature and target inputs for `component` from the raw
        # pipeline inputs ("X"/"y") and from its parents' cached outputs.
        x_inputs = []
        y_input = None
        for parent_input in self.get_inputs(component):
            if parent_input == "y":
                y_input = y
            elif parent_input == "X":
                x_inputs.append(X)
            elif parent_input.endswith(".y"):
                y_input = component_outputs[parent_input]
            elif parent_input.endswith(".x"):
                parent_x = component_outputs[parent_input]
                if isinstance(parent_x, pd.Series):
                    parent_x = parent_x.rename(parent_input)
                x_inputs.append(parent_x)
        x_inputs = ww.concat_columns(x_inputs)
        return x_inputs, y_input

    def transform(self, X, y=None):
        """Transform the input using the component graph.

        Args:
            X (pd.DataFrame): Input features of shape [n_samples, n_features].
            y (pd.Series): The target data of length [n_samples]. Defaults to None.

        Returns:
            pd.DataFrame: Transformed output.

        Raises:
            ValueError: If final component is not a Transformer.
        """
        if len(self.compute_order) == 0:
            return infer_feature_types(X)
        final_component_name = self.compute_order[-1]
        final_component_instance = self.get_last_component()
        if not isinstance(final_component_instance, Transformer):
            raise ValueError(
                "Cannot call transform() on a component graph because the final component is not a Transformer.",
            )

        outputs = self._transform_features(
            self.compute_order,
            X,
            y,
            fit=False,
            evaluate_training_only_components=True,
        )
        output_x = infer_feature_types(outputs.get(f"{final_component_name}.x"))
        output_y = outputs.get(f"{final_component_name}.y", None)
        if output_y is not None:
            return output_x, output_y
        return output_x

    def predict(self, X):
        """Make predictions using selected features.

        Args:
            X (pd.DataFrame): Input features of shape [n_samples, n_features].

        Returns:
            pd.Series: Predicted values.

        Raises:
            ValueError: If final component is not an Estimator.
        """
        if len(self.compute_order) == 0:
            return infer_feature_types(X)
        final_component = self.compute_order[-1]
        final_component_instance = self.get_last_component()
        if not isinstance(final_component_instance, Estimator):
            raise ValueError(
                "Cannot call predict() on a component graph because the final component is not an Estimator.",
            )
        outputs = self._transform_features(
            self.compute_order,
            X,
            evaluate_training_only_components=False,
        )
        return infer_feature_types(outputs.get(f"{final_component}.x"))

    def _return_non_engineered_features(self, X):
        base_features = [
            c
            for c in X.ww.columns
            if X.ww[c].ww.origin == "base" or X.ww[c].ww.origin is None
        ]
        return X.ww[base_features]

    def _transform_features(
        self,
        component_list,
        X,
        y=None,
        fit=False,
        evaluate_training_only_components=False,
    ):
        """Transforms the data by applying the given components.

        Args:
            component_list (list): The list of component names to compute.
            X (pd.DataFrame): Input data to the pipeline to transform.
            y (pd.Series): The target training data of length [n_samples].
            fit (boolean): Whether to fit the components as well as transform the data. Defaults to False.
            evaluate_training_only_components (boolean): Whether to evaluate training-only components (such as the samplers) during transform or predict. Defaults to False.

        Returns:
            dict: Outputs from each component.

        Raises:
            PipelineError: If input data types are different from the input types the pipeline was fitted on.
        """
        X = infer_feature_types(X)
        if not fit:
            X_schema = (
                self._return_non_engineered_features(X).ww.schema
                if self.has_dfs
                else X.ww.schema
            )
            if not _schema_is_equal(X_schema, self._input_types):
                raise PipelineError(
                    "Input X data types are different from the input types the pipeline was fitted on.",
                    code=PipelineErrorCodeEnum.PREDICT_INPUT_SCHEMA_UNEQUAL,
                    details={
                        "input_features_types": X_schema.types,
                        "pipeline_features_types": self._input_types.types,
                    },
                )
        else:
            self._input_types = (
                self._return_non_engineered_features(X).ww.schema
                if self.has_dfs
                else X.ww.schema
            )

        if y is not None:
            y = infer_feature_types(y)

        if len(component_list) == 0:
            return X

        hashes = None
        if self.cached_data is not None:
            hashes = hash(tuple(X.index))

        output_cache = {}
        for component_name in component_list:
            component_instance = self._get_component_from_cache(
                hashes,
                component_name,
                fit,
            )
            if not isinstance(component_instance, ComponentBase):
                raise ValueError(
                    "All components must be instantiated before fitting or predicting",
                )
            x_inputs, y_input = self._consolidate_inputs_for_component(
                output_cache,
                component_name,
                X,
                y,
            )
            self.input_feature_names.update({component_name: list(x_inputs.columns)})
            self._feature_logical_types[component_name] = x_inputs.ww.logical_types
            if isinstance(component_instance, Transformer):
                if fit:
                    if component_instance._is_fitted:
                        output = component_instance.transform(x_inputs, y_input)
                    else:
                        output = component_instance.fit_transform(x_inputs, y_input)
                elif (
                    component_instance.training_only
                    and evaluate_training_only_components is False
                ):
                    output = x_inputs, y_input
                else:
                    output = component_instance.transform(x_inputs, y_input)
                if isinstance(output, tuple):
                    output_x, output_y = output[0], output[1]
                else:
                    output_x = output
                    output_y = None
                output_cache[f"{component_name}.x"] = output_x
                output_cache[f"{component_name}.y"] = output_y
            else:
                if fit and not component_instance._is_fitted:
                    component_instance.fit(x_inputs, y_input)
                if fit and component_name == self.compute_order[-1]:
                    # Don't call predict on the final component during fit
                    output = None
                elif component_name != self.compute_order[-1]:
                    try:
                        output = component_instance.predict_proba(x_inputs)
                        if isinstance(output, pd.DataFrame):
                            if len(output.columns) == 2:
                                # If it is a binary problem, drop the first column since both columns are colinear
                                output = output.ww.drop(output.columns[0])
                            output = output.ww.rename(
                                {
                                    col: f"Col {str(col)} {component_name}.x"
                                    for col in output.columns
                                },
                            )
                    except MethodPropertyNotFoundError:
                        output = component_instance.predict(x_inputs)
                else:
                    output = component_instance.predict(x_inputs)
                output_cache[f"{component_name}.x"] = output
            if self.cached_data is not None and fit:
                self.component_instances[component_name] = component_instance
        return output_cache

    def _get_component_from_cache(self, hashes, component_name, fit):
        """Gets either the stacked ensemble component or the component from component_instances."""
        component_instance = self.get_component(component_name)
        if self.cached_data is not None and fit:
            try:
                component_instance = self.cached_data[hashes][component_name]
            except KeyError:
                pass
        return component_instance

    def _get_feature_provenance(self, input_feature_names):
        """Get the feature provenance for each feature in the input_feature_names.

        The provenance is a mapping from the original feature names in the dataset to a list of
        features that were created from that original feature.

        For example, after fitting a OHE on a feature called 'cats', with categories 'a' and 'b', the
        provenance would have the following entry: {'cats': ['a', 'b']}.

        If a feature is then calculated from feature 'a', e.g. 'a_squared', then the provenance would instead
        be {'cats': ['a', 'a_squared', 'b']}.

        Args:
            input_feature_names (list(str)): Names of the features in the input dataframe.

        Returns:
            dict: Dictionary mapping each feature name to the set of feature names that were created from that feature.
        """
        if not self.compute_order:
            return {}

        # Every feature comes from one of the original features so
        # each one starts with an empty set
        provenance = {col: set([]) for col in input_feature_names}

        transformers = filter(
            lambda c: isinstance(c, Transformer),
            [self.get_component(c) for c in self.compute_order],
        )
        for component_instance in transformers:
            component_provenance = component_instance._get_feature_provenance()
            for component_input, component_output in component_provenance.items():
                # Case 1: The transformer created features from one of the original features
                if component_input in provenance:
                    provenance[component_input] = provenance[component_input].union(
                        set(component_output),
                    )

                # Case 2: The transformer created features from a feature created from an original feature.
                # Add it to the provenance of the original feature it was created from
                else:
                    for in_feature, out_feature in provenance.items():
                        if component_input in out_feature:
                            provenance[in_feature] = out_feature.union(
                                set(component_output),
                            )

        # Get rid of features that are not in the dataset the final estimator uses
        final_estimator_features = set(
            self.input_feature_names.get(self.compute_order[-1], []),
        )
        for feature in provenance:
            provenance[feature] = provenance[feature].intersection(
                final_estimator_features,
            )

        # Delete features that weren't used to create other features
        return {
            feature: children
            for feature, children in provenance.items()
            if len(children)
        }

    def get_component_input_logical_types(self, component_name):
        """Get the logical types that are passed to the given component.

        Args:
            component_name (str): Name of component in the graph.

        Returns:
            Dict - Mapping feature name to logical type instance.

        Raises:
            ValueError: If the component is not in the graph.
            ValueError: If the component graph has not been fitted.
        """
        if not self._feature_logical_types:
            raise ValueError("Component Graph has not been fit.")
        if component_name not in self._feature_logical_types:
            raise ValueError(f"Component {component_name} is not in the graph")
        return self._feature_logical_types[component_name]

    @property
    def last_component_input_logical_types(self):
        """Get the logical types that are passed to the last component in the pipeline.

        Returns:
            Dict - Mapping feature name to logical type instance.

        Raises:
            ValueError: If the component is not in the graph.
            ValueError: If the component graph has not been fitted.
        """
        return self.get_component_input_logical_types(self.compute_order[-1])

    def get_component(self, component_name):
        """Retrieves a single component object from the graph.

        Args:
            component_name (str): Name of the component to retrieve

        Returns:
            ComponentBase object

        Raises:
            ValueError: If the component is not in the graph.
        """
        try:
            return self.component_instances[component_name]
        except KeyError:
            raise ValueError(f"Component {component_name} is not in the graph")

    def get_last_component(self):
        """Retrieves the component that is computed last in the graph, usually the final estimator.

        Returns:
            ComponentBase object

        Raises:
            ValueError: If the component graph has no edges.
        """
        if len(self.compute_order) == 0:
            raise ValueError("Cannot get last component from edgeless graph")
        last_component_name = self.compute_order[-1]
        return self.get_component(last_component_name)

    def get_estimators(self):
        """Gets a list of all the estimator components within this graph.

        Returns:
            list: All estimator objects within the graph.

        Raises:
            ValueError: If the component graph is not yet instantiated.
        """
        if not isinstance(self.get_last_component(), ComponentBase):
            raise ValueError(
                "Cannot get estimators until the component graph is instantiated",
            )
        return [
            component_class
            for component_class in self.component_instances.values()
            if isinstance(component_class, Estimator)
        ]

    def get_inputs(self, component_name):
        """Retrieves all inputs for a given component.

        Args:
            component_name (str): Name of the component to look up.

        Returns:
            list[str]: List of inputs for the component to use.

        Raises:
            ValueError: If the component is not in the graph.
        """
        try:
            component_info = self.component_dict[component_name]
        except KeyError:
            raise ValueError(f"Component {component_name} not in the graph")
        if len(component_info) > 1:
            return component_info[1:]
        return []

    def describe(self, return_dict=False):
        """Outputs component graph details including component parameters.

        Args:
            return_dict (bool): If True, return dictionary of information about component graph. Defaults to False.

        Returns:
            dict: Dictionary of all component parameters if return_dict is True, else None

        Raises:
            ValueError: If the component graph is not instantiated.
        """
        logger = get_logger(f"{__name__}.describe")
        components = {}
        for number, component in enumerate(self.component_instances.values(), 1):
            component_string = str(number) + ". " + component.name
            logger.info(component_string)
            try:
                components.update(
                    {
                        component.name: component.describe(
                            print_name=False,
                            return_dict=return_dict,
                        ),
                    },
                )
            except TypeError:
                raise ValueError(
                    "You need to instantiate ComponentGraph before calling describe()",
                )
        if return_dict:
            return components

    def graph(self, name=None, graph_format=None):
        """Generate an image representing the component graph.

        Args:
            name (str): Name of the graph. Defaults to None.
            graph_format (str): file format to save the graph in. Defaults to None.

        Returns:
            graphviz.Digraph: Graph object that can be directly displayed in Jupyter notebooks.

        Raises:
            RuntimeError: If graphviz is not installed.
        """
        graphviz = import_or_raise(
            "graphviz",
            error_msg="Please install graphviz to visualize pipelines.",
        )

        # Try rendering a dummy graph to see if a working backend is installed
        try:
            graphviz.Digraph().pipe()
        except graphviz.backend.ExecutableNotFound:
            raise RuntimeError(
                "To visualize component graphs, a graphviz backend is required.\n"
                + "Install the backend using one of the following commands:\n"
                + " Mac OS: brew install graphviz\n"
                + " Linux (Ubuntu): sudo apt-get install graphviz\n"
                + " Windows: conda install python-graphviz\n",
            )

        graph = graphviz.Digraph(
            name=name,
            format=graph_format,
            graph_attr={"splines": "true", "overlap": "scale", "rankdir": "LR"},
        )
        for component_name, component_class in self.component_instances.items():
            label = "%s\l" % (component_name)  # noqa: W605
            if isinstance(component_class, ComponentBase):
                # Reformat labels for nodes: cast values as strings, reformat floats to 2 decimal points and remove brackets from dictionary values so Digraph can parse it
                parameters = "\\l".join(
                    [
                        (
                            key + " : " + "{:0.2f}".format(val)
                            if (isinstance(val, float))
                            else key + " : " + str(val).replace("{", "").replace("}", "")
                        )
                        for key, val in component_class.parameters.items()
                    ],
                )  # noqa: W605
                label = "%s |%s\l" % (component_name, parameters)  # noqa: W605
            graph.node(component_name, shape="record", label=label, nodesep="0.03")

        graph.node("X", shape="circle", label="X")
        graph.node("y", shape="circle", label="y")
        x_edges = self._get_edges(self.component_dict, "features")
        y_edges = self._get_edges(self.component_dict, "target")
        for component_name, component_info in self.component_dict.items():
            for parent in component_info[1:]:
                if parent == "X":
                    x_edges.append(("X", component_name))
                elif parent == "y":
                    y_edges.append(("y", component_name))
        for edge in x_edges:
            graph.edge(edge[0], edge[1], color="black")
        for edge in y_edges:
            graph.edge(edge[0], edge[1], style="dotted")
        return graph

    @staticmethod
    def _get_edges(component_dict, edges_to_return="all"):
        """Gets the edges for a component graph.

        Args:
            component_dict (dict): Component dictionary to get edges from.
            edges_to_return (str): The types of edges to return. Defaults to "all".
                - if "all", returns all types of edges.
                - if "features", returns only feature edges
                - if "target", returns only target edges
        """
        edges = []
        for component_name, component_info in component_dict.items():
            for parent in component_info[1:]:
                feature_edge = parent[-2:] == ".x"
                target_edge = parent[-2:] == ".y"
                return_edge = (
                    (edges_to_return == "features" and feature_edge)
                    or (edges_to_return == "target" and target_edge)
                    or (edges_to_return == "all" and (feature_edge or target_edge))
                )
                if parent == "X" or parent == "y":
                    continue
                elif return_edge:
                    parent = parent[:-2]
                    edges.append((parent, component_name))
        return edges

    @classmethod
    def generate_order(cls, component_dict):
        """Regenerate the topologically sorted order of the graph."""
        edges = cls._get_edges(component_dict)
        if len(component_dict) == 1:
            return list(component_dict.keys())
        if len(edges) == 0:
            return []
        digraph = nx.DiGraph()
        digraph.add_nodes_from(list(component_dict.keys()))
        digraph.add_edges_from(edges)
        if not nx.is_weakly_connected(digraph):
            raise ValueError("The given graph is not completely connected")
        try:
            compute_order = list(topological_sort(digraph))
        except NetworkXUnfeasible:
            raise ValueError("The given graph contains a cycle")
        end_components = [
            component
            for component in compute_order
            if len(nx.descendants(digraph, component)) == 0
        ]
        if len(end_components) != 1:
            raise ValueError(
                "The given graph has more than one final (childless) component",
            )
        return compute_order

    def __getitem__(self, index):
        """Get an element in the component graph."""
        if isinstance(index, int):
            return self.get_component(self.compute_order[index])
        else:
            return self.get_component(index)

    def __iter__(self):
        """Iterator for the component graph."""
        self._i = 0
        return self

    def __next__(self):
        """Iterator for graphs, retrieves the components in the graph in order.

        Returns:
            ComponentBase obj: The next component class or instance in the graph
        """
        if self._i < len(self.compute_order):
            self._i += 1
            return self.get_component(self.compute_order[self._i - 1])
        else:
            self._i = 0
            raise StopIteration

    def __eq__(self, other):
        """Test for equality."""
        if not isinstance(other, self.__class__):
            return False
        random_seed_eq = self.random_seed == other.random_seed
        if not random_seed_eq:
            return False
        attributes_to_check = ["component_dict", "compute_order"]
        for attribute in attributes_to_check:
            if getattr(self, attribute) != getattr(other, attribute):
                return False
        return True

    def __repr__(self):
        """String representation of a component graph."""
        component_strs = []
        for (
            component_name,
            component_info,
        ) in self.component_dict.items():
            try:
                component_key = f"'{component_name}': "
                if isinstance(component_info[0], str):
                    component_class = handle_component_class(component_info[0])
                else:
                    component_class = handle_component_class(component_info[0].name)
                component_name = f"'{component_class.name}'"
            except MissingComponentError:
                # Not an EvalML component, use component class name
                component_name = f"{component_info[0].__name__}"

            component_edges_str = ""
            if len(component_info) > 1:
                component_edges_str = ", "
                component_edges_str += ", ".join(
                    [f"'{info}'" for info in component_info[1:]],
                )

            component_str = f"{component_key}[{component_name}{component_edges_str}]"
            component_strs.append(component_str)
        component_dict_str = f"{{{', '.join(component_strs)}}}"
        return component_dict_str

    def _get_parent_y(self, component_name):
        """Helper for inverse_transform method."""
        parents = self.get_inputs(component_name)
        return next(iter(p[:-2] for p in parents if ".y" in p), None)

    def inverse_transform(self, y):
        """Apply component inverse_transform methods to estimator predictions in reverse order.

        Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd).

        Args:
            y (pd.Series): Final component features.

        Returns:
            pd.Series: The target with inverse transformation applied.
        """
        data_to_transform = infer_feature_types(y)
        current_component = self.compute_order[-1]
        has_incoming_y_from_parent = True
        while has_incoming_y_from_parent:
            parent_y = self._get_parent_y(current_component)
            if parent_y:
                component = self.get_component(parent_y)
                if hasattr(component, "inverse_transform"):
                    data_to_transform = component.inverse_transform(data_to_transform)
                # Always step up to the parent so the walk terminates even when
                # the parent does not implement inverse_transform.
                current_component = parent_y
            else:
                has_incoming_y_from_parent = False

        return data_to_transform
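
The ordering logic in `_get_edges` and `generate_order` can be tried out without installing evalml or networkx. The block below is a minimal sketch, not the library's implementation: it swaps networkx for the standard-library `graphlib.TopologicalSorter` (Python 3.9+), the helper names `sketch_edges` and `sketch_compute_order` are hypothetical, and the weak-connectivity check performed by `generate_order` is deliberately left out.

```python
from graphlib import CycleError, TopologicalSorter


def sketch_edges(component_dict):
    """Return (parent, child) pairs for every '<name>.x' or '<name>.y' input.

    Hypothetical helper mirroring _get_edges(component_dict, "all").
    """
    edges = []
    for child, info in component_dict.items():
        for parent in info[1:]:
            # The raw pipeline inputs "X" and "y" are not components,
            # so they do not contribute an edge.
            if parent in ("X", "y"):
                continue
            if parent.endswith((".x", ".y")):
                edges.append((parent[:-2], child))
    return edges


def sketch_compute_order(component_dict):
    """Order component names so that every parent precedes its children.

    Assumption: graphlib replaces networkx here, and the connectivity
    check from generate_order is omitted.
    """
    if len(component_dict) == 1:
        return list(component_dict)
    edges = sketch_edges(component_dict)
    if not edges:
        return []
    predecessors = {name: set() for name in component_dict}
    children = {name: set() for name in component_dict}
    for parent, child in edges:
        predecessors[child].add(parent)
        children.setdefault(parent, set()).add(child)
    try:
        order = list(TopologicalSorter(predecessors).static_order())
    except CycleError as e:
        raise ValueError("The given graph contains a cycle") from e
    # Exactly one component may be childless: the final component.
    finals = [name for name in order if not children.get(name)]
    if len(finals) != 1:
        raise ValueError(
            "The given graph has more than one final (childless) component",
        )
    return order


component_dict = {
    "Imputer": ["Imputer", "X", "y"],
    "Logistic Regression": ["Logistic Regression Classifier", "Imputer.x", "y"],
}
print(sketch_compute_order(component_dict))  # ['Imputer', 'Logistic Regression']
```

When several components share the same parents (such as `estimator_1` and `estimator_2` in the class docstring), more than one valid order exists, so this sketch and networkx may place those siblings in a different relative order.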