component_graph#

Component graph for a pipeline as a directed acyclic graph (DAG).

Module Contents#

Classes Summary#

ComponentGraph

Component graph for a pipeline as a directed acyclic graph (DAG).

Attributes Summary#

logger

Contents#

class evalml.pipelines.component_graph.ComponentGraph(component_dict=None, cached_data=None, random_seed=0)[source]#

Component graph for a pipeline as a directed acyclic graph (DAG).

Parameters

component_dict (dict) – A dictionary which specifies the components and edges between components that should be used to create the component graph. Defaults to None.
cached_data (dict) – A dictionary of nested cached data. If the hashes and components are in this cache, we skip fitting for these components. Expected to be of format {hash1: {component_name: trained_component, …}, hash2: {…}, …}. Defaults to None.
random_seed (int) – Seed for the random number generator. Defaults to 0.
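The nested layout expected for cached_data can be pictured with a small plain-Python lookup. This is only a sketch of the documented format: the helper name `lookup_cached`, the string hash keys, and the placeholder strings standing in for trained components are all assumptions made for illustration, not evalml code.

```python
def lookup_cached(cached_data, data_hash, component_name):
    """Return a cached trained component, or None if it still has to be fitted."""
    if not cached_data:
        # None or {} means nothing is cached.
        return None
    # Outer key: hash; inner key: component name.
    return cached_data.get(data_hash, {}).get(component_name)


# Placeholder strings stand in for trained component objects (hypothetical data).
cached_data = {
    "hash1": {"Imputer": "<trained Imputer>", "OHE": "<trained OHE>"},
    "hash2": {"Imputer": "<another trained Imputer>"},
}

hit = lookup_cached(cached_data, "hash1", "OHE")
miss = lookup_cached(cached_data, "hash2", "OHE")
```

A component is skipped during fitting only when both the hash and the component name are present, which is why the lookup goes through both levels.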

Examples

>>> component_dict = {'Imputer': ['Imputer', 'X', 'y'],
...                   'Logistic Regression': ['Logistic Regression Classifier', 'Imputer.x', 'y']}
>>> component_graph = ComponentGraph(component_dict)
>>> assert component_graph.compute_order == ['Imputer', 'Logistic Regression']
...
...
>>> component_dict = {'Imputer': ['Imputer', 'X', 'y'],
...                   'OHE': ['One Hot Encoder', 'Imputer.x', 'y'],
...                   'estimator_1': ['Random Forest Classifier', 'OHE.x', 'y'],
...                   'estimator_2': ['Decision Tree Classifier', 'OHE.x', 'y'],
...                   'final': ['Logistic Regression Classifier', 'estimator_1.x', 'estimator_2.x', 'y']}
>>> component_graph = ComponentGraph(component_dict)

The default parameters for every component in the component graph.

>>> assert component_graph.default_parameters == {
...     'Imputer': {'categorical_impute_strategy': 'most_frequent',
...                 'numeric_impute_strategy': 'mean',
...                 'boolean_impute_strategy': 'most_frequent',
...                 'categorical_fill_value': None,
...                 'numeric_fill_value': None,
...                 'boolean_fill_value': None},
...     'One Hot Encoder': {'top_n': 10,
...                         'features_to_encode': None,
...                         'categories': None,
...                         'drop': 'if_binary',
...                         'handle_unknown': 'ignore',
...                         'handle_missing': 'error'},
...     'Random Forest Classifier': {'n_estimators': 100,
...                                  'max_depth': 6,
...                                  'n_jobs': -1},
...     'Decision Tree Classifier': {'criterion': 'gini',
...                                  'max_features': 'sqrt',
...                                  'max_depth': 6,
...                                  'min_samples_split': 2,
...                                  'min_weight_fraction_leaf': 0.0},
...     'Logistic Regression Classifier': {'penalty': 'l2',
...                                        'C': 1.0,
...                                        'n_jobs': -1,
...                                        'multi_class': 'auto',
...                                        'solver': 'lbfgs'}}
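To make the edge notation in the examples above concrete, the following plain-Python sketch reads the parents of each node out of a component_dict. It assumes that the first list element names the component, that 'X' and 'y' denote the original inputs, and that an entry such as 'OHE.x' refers to an output of the node 'OHE'. The helper `parent_names` is hypothetical and not part of evalml.

```python
component_dict = {
    "Imputer": ["Imputer", "X", "y"],
    "OHE": ["One Hot Encoder", "Imputer.x", "y"],
    "estimator_1": ["Random Forest Classifier", "OHE.x", "y"],
    "estimator_2": ["Decision Tree Classifier", "OHE.x", "y"],
    "final": ["Logistic Regression Classifier", "estimator_1.x", "estimator_2.x", "y"],
}


def parent_names(component_dict, name):
    """Return the nodes feeding `name`, in declaration order, without duplicates."""
    parents = []
    for source in component_dict[name][1:]:  # skip the component itself
        if "." in source:  # "node.x" style input; plain "X"/"y" have no parent
            parent = source.rsplit(".", 1)[0]
            if parent not in parents:
                parents.append(parent)
    return parents


# Directed edges (parent, child) implied by the dictionary.
edges = [
    (parent, name)
    for name in component_dict
    for parent in parent_names(component_dict, name)
]
```

Here 'final' has two parents, 'estimator_1' and 'estimator_2', while 'Imputer' has none because it consumes only the original inputs.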

Methods

`compute_order`	The order that components will be computed or called in.
`default_parameters`	The default parameter dictionary for this pipeline.
`describe`	Outputs component graph details including component parameters.
`fit`	Fit each component in the graph.
`fit_and_transform_all_but_final`	Fit and transform all components save the final one, usually an estimator.
`fit_transform`	Fit and transform all components in the component graph, if all components are Transformers.
`generate_order`	Regenerates the topologically sorted order of the graph.
`get_component`	Retrieves a single component object from the graph.
`get_component_input_logical_types`	Get the logical types that are passed to the given component.
`get_estimators`	Gets a list of all the estimator components within this graph.
`get_inputs`	Retrieves all inputs for a given component.
`get_last_component`	Retrieves the component that is computed last in the graph, usually the final estimator.
`graph`	Generate an image representing the component graph.
`has_dfs`	Whether this component graph contains a DFSTransformer or not.
`instantiate`	Instantiates all uninstantiated components within the graph using the given parameters. An error will be raised if a component is already instantiated but the parameters dict contains arguments for that component.
`inverse_transform`	Apply component inverse_transform methods to estimator predictions in reverse order.
`last_component_input_logical_types`	Get the logical types that are passed to the last component in the pipeline.
`predict`	Make predictions using selected features.
`transform`	Transform the input using the component graph.
`transform_all_but_final`	Transform all components save the final one, and gather the data from any number of parents to get all the information that should be fed to the final component.

property compute_order(self)#: The order that components will be computed or called in.

property default_parameters(self)#

The default parameter dictionary for this pipeline.

Returns: Dictionary of all component default parameters.
Return type: dict

describe(self, return_dict=False)[source]#

Outputs component graph details including component parameters.

Parameters: return_dict (bool) – If True, return dictionary of information about component graph. Defaults to False.
Returns: Dictionary of all component parameters if return_dict is True, else None
Return type: dict
Raises: ValueError – If the component graph is not instantiated.

fit(self, X, y)[source]#

Fit each component in the graph.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].

Returns

self

fit_and_transform_all_but_final(self, X, y)[source]#

Fit and transform all components save the final one, usually an estimator.

Parameters

X (pd.DataFrame) – The input training data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples].

Returns

Transformed features and target.

Return type

Tuple (pd.DataFrame, pd.Series)

fit_transform(self, X, y)[source]#

Fit and transform all components in the component graph, if all components are Transformers.

Parameters

X (pd.DataFrame) – Input features of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples].

Returns

Transformed output.

Return type

pd.DataFrame

Raises

ValueError – If final component is an Estimator.

classmethod generate_order(cls, component_dict)[source]#: Regenerates the topologically sorted order of the graph.
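As an illustration of what a topologically sorted order means for a component_dict, here is a minimal sketch using Kahn's algorithm in plain Python. It is not evalml's implementation, which may differ; the function name `topological_order` and the 'node.x' parsing rule are assumptions made for this example.

```python
from collections import deque


def topological_order(component_dict):
    """Order nodes so that every parent comes before its children."""
    # Parents of each node, taken from inputs written as "node.x" or "node.y".
    parents = {
        name: {src.rsplit(".", 1)[0] for src in spec[1:] if "." in src}
        for name, spec in component_dict.items()
    }
    unknown = set().union(*parents.values()) - set(component_dict)
    if unknown:
        raise ValueError(f"Inputs reference unknown components: {sorted(unknown)}")

    remaining = {name: len(p) for name, p in parents.items()}
    ready = deque(name for name, count in remaining.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Release children whose parents have now all been placed.
        for child, child_parents in parents.items():
            if node in child_parents:
                remaining[child] -= 1
                if remaining[child] == 0:
                    ready.append(child)
    if len(order) != len(component_dict):
        raise ValueError("The graph contains a cycle, so no valid order exists.")
    return order


order = topological_order({
    "Imputer": ["Imputer", "X", "y"],
    "OHE": ["One Hot Encoder", "Imputer.x", "y"],
    "estimator_1": ["Random Forest Classifier", "OHE.x", "y"],
    "estimator_2": ["Decision Tree Classifier", "OHE.x", "y"],
    "final": ["Logistic Regression Classifier", "estimator_1.x", "estimator_2.x", "y"],
})
```

Because a graph can have several valid topological orders, the tie-breaking used here (dictionary insertion order) is a choice of this sketch only.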

get_component(self, component_name)[source]#

Retrieves a single component object from the graph.

Parameters: component_name (str) – Name of the component to retrieve
Returns: ComponentBase object
Raises: ValueError – If the component is not in the graph.

get_component_input_logical_types(self, component_name)[source]#

Get the logical types that are passed to the given component.

Parameters

component_name (str) – Name of component in the graph

Returns

Dict - Mapping feature name to logical type instance.

Raises

ValueError – If the component is not in the graph.
ValueError – If the component graph has not been fitted.

get_estimators(self)[source]#

Gets a list of all the estimator components within this graph.

Returns: All estimator objects within the graph.
Return type: list
Raises: ValueError – If the component graph is not yet instantiated.

get_inputs(self, component_name)[source]#

Retrieves all inputs for a given component.

Parameters: component_name (str) – Name of the component to look up.
Returns: List of inputs for the component to use.
Return type: list[str]
Raises: ValueError – If the component is not in the graph.

get_last_component(self)[source]#

Retrieves the component that is computed last in the graph, usually the final estimator.

Returns: ComponentBase object
Raises: ValueError – If the component graph has no edges.

graph(self, name=None, graph_format=None)[source]#

Generate an image representing the component graph.

Parameters

name (str) – Name of the graph. Defaults to None.
graph_format (str) – File format to save the graph in. Defaults to None.

Returns

Graph object that can be directly displayed in Jupyter notebooks.

Return type

graphviz.Digraph

Raises

RuntimeError – If graphviz is not installed.

property has_dfs(self)#: Whether this component graph contains a DFSTransformer or not.

instantiate(self, parameters=None)[source]#

Instantiates all uninstantiated components within the graph using the given parameters. An error will be raised if a component is already instantiated but the parameters dict contains arguments for that component.

Parameters: parameters (dict) – Dictionary with component names as keys and dictionary of that component’s parameters as values. An empty dictionary {} or None implies using all default values for component parameters. If a component in the component graph is already instantiated, it will not use any of its parameters defined in this dictionary. Defaults to None.
Returns: self
Raises: ValueError – If component graph is already instantiated or if a component errored while instantiating.
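The shape of the parameters dictionary, and the rule that None or {} falls back to defaults, can be sketched in plain Python as follows. The helper `merge_parameters` and the abbreviated defaults are hypothetical, and the sketch does not reproduce how evalml handles components that are already instantiated.

```python
def merge_parameters(default_parameters, parameters=None):
    """Overlay per-component overrides on top of each component's defaults."""
    parameters = parameters or {}  # None and {} both mean "use all defaults"
    return {
        name: {**defaults, **parameters.get(name, {})}
        for name, defaults in default_parameters.items()
    }


# Abbreviated defaults, keyed by component name (illustrative subset).
default_parameters = {
    "Imputer": {
        "categorical_impute_strategy": "most_frequent",
        "numeric_impute_strategy": "mean",
    },
    "Random Forest Classifier": {"n_estimators": 100, "max_depth": 6, "n_jobs": -1},
}

# Outer keys are component names; inner dicts hold that component's parameters.
overrides = {"Imputer": {"numeric_impute_strategy": "median"}}
merged = merge_parameters(default_parameters, overrides)
```

Only the overridden key changes; every other parameter keeps its default value.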

inverse_transform(self, y)[source]#

Apply component inverse_transform methods to estimator predictions in reverse order.

Components that implement inverse_transform are PolynomialDecomposer, LogTransformer, LabelEncoder (tbd).

Parameters: y (pd.Series) – Final component features.
Returns: The target with inverse transformation applied.
Return type: pd.Series
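The idea of undoing target transformations in reverse order can be sketched with toy steps in plain Python. The classes `ShiftStep`, `LogStep`, and `PassThroughStep` and the function `inverse_transform_all` are hypothetical stand-ins, not evalml components, and skipping steps that lack inverse_transform is an assumption of this sketch.

```python
import math


class ShiftStep:
    """Toy target transformer: adds a constant offset."""

    def __init__(self, offset):
        self.offset = offset

    def transform(self, y):
        return [value + self.offset for value in y]

    def inverse_transform(self, y):
        return [value - self.offset for value in y]


class LogStep:
    """Toy target transformer: log(1 + y) and its inverse."""

    def transform(self, y):
        return [math.log1p(value) for value in y]

    def inverse_transform(self, y):
        return [math.expm1(value) for value in y]


class PassThroughStep:
    """Toy step without inverse_transform; it is skipped on the way back."""

    def transform(self, y):
        return list(y)


def inverse_transform_all(steps, y):
    """Apply each available inverse_transform, last step first."""
    for step in reversed(steps):
        if hasattr(step, "inverse_transform"):
            y = step.inverse_transform(y)
    return y


steps = [ShiftStep(10.0), PassThroughStep(), LogStep()]
original = [1.0, 2.5, 40.0]

transformed = original
for step in steps:
    transformed = step.transform(transformed)

recovered = inverse_transform_all(steps, transformed)
```

Reversing the order matters: the log must be undone before the shift, otherwise the round trip does not return the original target values.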

property last_component_input_logical_types(self)#

Get the logical types that are passed to the last component in the pipeline.

Returns

Dict - Mapping feature name to logical type instance.

Raises

ValueError – If the component is not in the graph.
ValueError – If the component graph has not been fitted.

predict(self, X)[source]#

Make predictions using selected features.

Parameters: X (pd.DataFrame) – Input features of shape [n_samples, n_features].
Returns: Predicted values.
Return type: pd.Series
Raises: ValueError – If final component is not an Estimator.

transform(self, X, y=None)[source]#

Transform the input using the component graph.

Parameters

X (pd.DataFrame) – Input features of shape [n_samples, n_features].
y (pd.Series) – The target data of length [n_samples]. Defaults to None.

Returns

Transformed output.

Return type

pd.DataFrame

Raises

ValueError – If final component is not a Transformer.

transform_all_but_final(self, X, y=None)[source]#

Transform all components save the final one, and gather the data from any number of parents to get all the information that should be fed to the final component.

Parameters

X (pd.DataFrame) – Data of shape [n_samples, n_features].
y (pd.Series) – The target training data of length [n_samples]. Defaults to None.

Returns

Transformed values.

Return type

pd.DataFrame

evalml.pipelines.component_graph.logger#