EvalML Components

As described in the overview, each machine learning pipeline consists of individual components that process data before it is passed to an estimator. This section describes each type of component in an EvalML pipeline.

Component Classes

Components can be split into two distinct classes: transformers and estimators.

[1]:
import numpy as np
import pandas as pd
from evalml.pipelines.components import SimpleImputer

X = pd.DataFrame([[1, 2, 3], [1, np.nan, 3]])
display(X)
   0    1  2
0  1  2.0  3
1  1  NaN  3

Transformers take in data as input and output altered data. For example, an imputer takes in data with missing values and outputs the same data with those values filled in using the mean, median, or most frequent value of each column.

A transformer can fit on data and then transform it in two steps by calling .fit() and .transform(), or in one step by calling .fit_transform().

[2]:
imp = SimpleImputer(impute_strategy="mean")
X = imp.fit_transform(X)

display(X)
   0    1  2
0  1  2.0  3
1  1  2.0  3
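
The two-step and one-step calls described above are meant to give the same result. The sketch below illustrates that contract with a hypothetical pure-Python `ToyMeanImputer`. It is not EvalML's `SimpleImputer`; the class name, its `means_` attribute, and the list-of-rows input are assumptions made purely for illustration.

```python
import math


class ToyMeanImputer:
    """Hypothetical transformer: fills missing values with per-column means."""

    def fit(self, rows):
        # Learn one mean per column, ignoring missing (NaN) entries.
        # Assumes every column has at least one observed value.
        self.means_ = []
        for col in zip(*rows):
            observed = [v for v in col if not math.isnan(v)]
            self.means_.append(sum(observed) / len(observed))
        return self

    def transform(self, rows):
        # Replace each NaN with the mean learned for its column.
        return [
            [self.means_[j] if math.isnan(v) else v for j, v in enumerate(row)]
            for row in rows
        ]

    def fit_transform(self, rows):
        # One-step shortcut: fit, then transform the same data.
        return self.fit(rows).transform(rows)


rows = [[1.0, 2.0, 3.0], [1.0, float("nan"), 3.0]]

# Two steps and one step should agree on the same input.
two_step = ToyMeanImputer().fit(rows).transform(rows)
one_step = ToyMeanImputer().fit_transform(rows)
```

Keeping fit and transform separate matters in practice: statistics learned from training data can then be reused to transform new data without refitting.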

An estimator, on the other hand, fits on data (X) and labels (y) so that it can take new data as input and return predicted labels as output. An estimator therefore fits by calling .fit() on data and labels, and predicts by calling .predict() on new data. One example is the LogisticRegressionClassifier. This also shows the role of a transformer: it alters data to make it easier for an estimator to learn and predict.

[3]:
from evalml.pipelines.components import LogisticRegressionClassifier

clf = LogisticRegressionClassifier()

# X is the imputed DataFrame produced by the previous cell
y = [1, 0]

clf.fit(X, y)
clf.predict(X)
[3]:
array([0, 0])
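
To make the estimator interface concrete, here is a minimal sketch of a hypothetical `ToyMajorityClassifier` that follows the same fit-then-predict pattern. It is not how `LogisticRegressionClassifier` works; the class, its `majority_` attribute, and the sample data are assumptions chosen only to show the shape of the calls.

```python
from collections import Counter


class ToyMajorityClassifier:
    """Hypothetical estimator: always predicts the most common training label."""

    def fit(self, X, y):
        # Estimators learn from both the features (X) and the labels (y).
        # This toy model ignores X and only remembers the majority label.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Return one predicted label per input row.
        return [self.majority_ for _ in X]


# Illustrative data: two rows labeled 0 and one row labeled 1.
X_train = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
y_train = [0, 0, 1]

clf = ToyMajorityClassifier().fit(X_train, y_train)
predictions = clf.predict([[7.0, 8.0, 9.0], [1.0, 2.0, 3.0]])
```

Unlike a transformer, whose output is altered data, the output of `predict` here is one label per row of the new data.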

Component Types

Components can be further divided into types that serve different purposes. Below we go over the different types of transformers and estimators.

Transformer Types

Estimator Types