# Model Understanding

Simply examining a model’s performance metrics is not enough to select a model and promote it for use in a production setting. While developing an ML algorithm, it is important to understand how the model behaves on the data, to examine the key factors influencing its predictions, and to consider where it may be deficient. What “success” means for an ML project depends first and foremost on the user’s domain expertise.

EvalML includes a variety of tools for understanding models, from graphing utilities to methods for explaining predictions.

**Note**: Graphing methods on Jupyter Notebook and Jupyter Lab require ipywidgets to be installed.

**Note**: If graphing on Jupyter Lab, jupyterlab-plotly is required. To download this, make sure you have npm installed.
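As a sketch, the usual setup commands look like the following (assuming pip and a JupyterLab version that uses `jupyter labextension install`; exact commands may vary with your JupyterLab version):

```shell
# Install ipywidgets for interactive graphing in notebooks.
pip install ipywidgets

# For Jupyter Lab, install the plotly renderer extension
# (the build step requires npm to be installed).
jupyter labextension install jupyterlab-plotly
```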

## Graphing Utilities

First, let’s train a pipeline on some data.

```
[1]:
```

```
import evalml


class DTBinaryClassificationPipeline(evalml.pipelines.BinaryClassificationPipeline):
    component_graph = ['Simple Imputer', 'Decision Tree Classifier']


X, y = evalml.demos.load_breast_cancer()
pipeline_dt = DTBinaryClassificationPipeline({})
pipeline_dt.fit(X, y)
```

```
[1]:
```

```
DTBinaryClassificationPipeline(parameters={'Simple Imputer':{'impute_strategy': 'most_frequent', 'fill_value': None}, 'Decision Tree Classifier':{'criterion': 'gini', 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0},})
```

### Tree Visualization

We can visualize the structure of the Decision Tree that was fit to that data, and save it if necessary.

```
[2]:
```

```
from evalml.model_understanding.graphs import visualize_decision_tree
visualize_decision_tree(pipeline_dt.estimator, max_depth=2, rotate=False, filled=True, filepath=None)
```

```
[2]:
```

Let’s replace the Decision Tree Classifier with a Random Forest Classifier.

```
[3]:
```

```
class RFBinaryClassificationPipeline(evalml.pipelines.BinaryClassificationPipeline):
    component_graph = ['Simple Imputer', 'Random Forest Classifier']


pipeline = RFBinaryClassificationPipeline({})
pipeline.fit(X, y)
print(pipeline.score(X, y, objectives=['log loss binary']))
```

```
OrderedDict([('Log Loss Binary', 0.038403828027876195)])
```
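For context, binary log loss (the metric reported above) averages the negative log-likelihood of the true labels, so confident wrong predictions are penalized heavily. Below is a minimal stdlib-only sketch of the metric, not evalml’s implementation (which is vectorized and handles edge cases differently):

```python
import math

def log_loss_binary(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true binary labels.

    y_true: iterable of 0/1 labels; y_prob: predicted P(y=1).
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0).
    """
    total = 0.0
    n = 0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        n += 1
    return total / n

# Confident correct predictions contribute little loss;
# confident wrong ones contribute a lot.
print(log_loss_binary([1, 0], [0.9, 0.1]))  # small
print(log_loss_binary([1, 0], [0.1, 0.9]))  # large
```

A score near zero, like the one printed above, indicates the model assigns high probability to the correct class on the training data.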

### Feature Importance

We can get the importance associated with each feature of the resulting pipeline.

```
[4]:
```

```
pipeline.feature_importance
```

```
[4]:
```

| | feature | importance |
|---|---|---|
| 0 | worst perimeter | 0.176488 |
| 1 | worst concave points | 0.125260 |
| 2 | worst radius | 0.124161 |
| 3 | mean concave points | 0.086443 |
| 4 | worst area | 0.072465 |
| 5 | mean concavity | 0.072320 |
| 6 | mean perimeter | 0.056685 |
| 7 | mean area | 0.049599 |
| 8 | area error | 0.037229 |
| 9 | worst concavity | 0.028181 |
| 10 | mean radius | 0.023294 |
| 11 | radius error | 0.019457 |
| 12 | worst texture | 0.014990 |
| 13 | perimeter error | 0.014103 |
| 14 | mean texture | 0.013618 |
| 15 | worst compactness | 0.011310 |
| 16 | worst smoothness | 0.011139 |
| 17 | worst fractal dimension | 0.008118 |
| 18 | worst symmetry | 0.007818 |
| 19 | mean smoothness | 0.006152 |
| 20 | concave points error | 0.005887 |
| 21 | fractal dimension error | 0.005059 |
| 22 | concavity error | 0.004510 |
| 23 | smoothness error | 0.004493 |
| 24 | texture error | 0.004476 |
| 25 | mean compactness | 0.004050 |
| 26 | compactness error | 0.003559 |
| 27 | mean symmetry | 0.003243 |
| 28 | symmetry error | 0.003124 |
| 29 | mean fractal dimension | 0.002768 |
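A common follow-up is to rank the importances and keep only the top few features for closer inspection. Here is a stdlib-only sketch using a handful of the values from the table above (in evalml itself, `pipeline.feature_importance` already returns this ranking as a DataFrame):

```python
# A few (feature, importance) pairs copied from the table above.
importances = {
    'worst perimeter': 0.176488,
    'worst concave points': 0.125260,
    'worst radius': 0.124161,
    'mean concave points': 0.086443,
    'worst area': 0.072465,
}

# Rank features by importance, highest first, and keep the top three.
top = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
for name, score in top[:3]:
    print(f'{name}: {score:.6f}')
```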

We can also create a bar plot of the feature importances.

```
[5]:
```

```
pipeline.graph_feature_importance()
```