EvalML Components
From the overview, we see how each machine learning pipeline consists of individual components that process data before the data is ultimately sent to an estimator. Below we will describe each type of component in an EvalML pipeline.
Component Classes
Components can be split into two distinct classes: transformers and estimators.
[1]:
import numpy as np
import pandas as pd
from evalml.pipelines.components import SimpleImputer
X = pd.DataFrame([[1, 2, 3], [1, np.nan, 3]])
display(X)
 | 0 | 1 | 2 |
---|---|---|---|
0 | 1 | 2.0 | 3 |
1 | 1 | NaN | 3 |
Transformers take in data as input and output altered data. For example, an imputer takes in data and fills in missing values with the mean, median, or most frequent value of each column.
A transformer can fit on data and then transform it in two steps by calling .fit() and .transform(), or in one step by calling .fit_transform().
[2]:
imp = SimpleImputer(impute_strategy="mean")
X = imp.fit_transform(X)
display(X)
 | 0 | 1 | 2 |
---|---|---|---|
0 | 1 | 2.0 | 3 |
1 | 1 | 2.0 | 3 |
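The two calling styles described above can be illustrated with a toy transformer written in plain Python. This is only a sketch of the fit/transform interface: the class name ToyMeanImputer and its means_ attribute are hypothetical and are not part of EvalML, and the sketch assumes every column has at least one non-missing value.

```python
import math


class ToyMeanImputer:
    """Hypothetical stand-in for a transformer; not EvalML's SimpleImputer."""

    def fit(self, rows):
        # Learn the mean of each column, ignoring missing (NaN) entries.
        n_cols = len(rows[0])
        self.means_ = []
        for j in range(n_cols):
            present = [row[j] for row in rows if not math.isnan(row[j])]
            self.means_.append(sum(present) / len(present))
        return self

    def transform(self, rows):
        # Replace each NaN with the mean learned for its column.
        return [
            [self.means_[j] if math.isnan(v) else v for j, v in enumerate(row)]
            for row in rows
        ]

    def fit_transform(self, rows):
        # One-step shorthand for fit followed by transform on the same data.
        return self.fit(rows).transform(rows)


X_toy = [[1.0, 2.0, 3.0], [1.0, float("nan"), 3.0]]

# Two steps: learn the column means, then fill the gaps.
two_step = ToyMeanImputer().fit(X_toy).transform(X_toy)

# One step: same result on the same data.
one_step = ToyMeanImputer().fit_transform(X_toy)
```

Keeping fit and transform separate matters when the statistics are learned on one dataset and applied to another, such as fitting on training data and transforming held-out data.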
On the other hand, an estimator fits on data (X) and labels (y) in order to take in new data as input and return the predicted label as output. Therefore, an estimator can fit on data and labels by calling .fit() and then predict by calling .predict() on new data. An example of this would be the LogisticRegressionClassifier. We can now see how a transformer alters data to make it easier for an estimator to learn and predict.
[3]:
from evalml.pipelines.components import LogisticRegressionClassifier
clf = LogisticRegressionClassifier()
# X is the imputed DataFrame from the previous cell; y holds one label per row
y = [1, 0]
clf.fit(X, y)
clf.predict(X)
[3]:
array([0, 0])
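To make the fit/predict contract concrete, here is a minimal sketch of an estimator in plain Python. It uses a nearest-centroid rule rather than logistic regression, purely because it is short. The class name ToyNearestCentroidClassifier and its centroids_ attribute are hypothetical and not part of EvalML.

```python
class ToyNearestCentroidClassifier:
    """Hypothetical stand-in for an estimator; not an EvalML component."""

    def fit(self, rows, labels):
        # Group the training rows by label and average each group column-wise.
        grouped = {}
        for row, label in zip(rows, labels):
            grouped.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [sum(col) / len(members) for col in zip(*members)]
            for label, members in grouped.items()
        }
        return self

    def predict(self, rows):
        # Give each row the label of the closest centroid
        # (squared Euclidean distance).
        def sq_dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(self.centroids_, key=lambda label: sq_dist(row, self.centroids_[label]))
            for row in rows
        ]


X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y_train = [0, 0, 1, 1]

# Fit on data and labels, then predict labels for new rows.
clf_toy = ToyNearestCentroidClassifier().fit(X_train, y_train)
predictions = clf_toy.predict([[0.5, 0.5], [5.5, 4.0]])
```

Unlike a transformer, the estimator needs labels at fit time and returns one predicted label per input row instead of an altered copy of the data.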
Component Types
Components can be further divided into types that serve different functions. Below we will go over the different types of transformers and estimators.
Transformer Types
Imputer: fills missing data
Ex: SimpleImputer
Scaler: alters numerical data into different scales
Ex: StandardScaler
Encoder: converts data from one type to another, for example categorical values into numeric columns
Ex: OneHotEncoder
Feature Selection: selects the most useful columns of data
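As a rough illustration of what a scaler and an encoder from the list above do to data, the sketch below implements both ideas in plain Python. The helper names standard_scale and one_hot_encode are hypothetical. They show the general techniques only, and the EvalML components may differ in details such as column naming and the handling of unseen categories.

```python
import statistics


def standard_scale(values):
    # Shift to zero mean, then divide by the population standard deviation.
    # Assumes the values are not all identical (non-zero deviation).
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def one_hot_encode(values):
    # Build one 0/1 column per distinct category, in sorted order.
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


scaled = standard_scale([1.0, 2.0, 3.0])
categories, encoded = one_hot_encode(["red", "blue", "red"])
```

After scaling, the values have zero mean and unit standard deviation. After encoding, each row contains a single 1 marking its category.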
Estimator Types
Regressor: predicts numerical or continuous labels
Ex: LinearRegressor
Classifier: predicts categorical or discrete labels