EvalML Components

Components are the lowest-level building blocks in EvalML. Each component represents a fundamental operation to be applied to data.

All components accept parameters as keyword arguments to their __init__ methods. These parameters can be used to configure behavior.

Each component class definition must include a human-readable name for the component. Additionally, each component class may expose parameters for AutoML search by defining a hyperparameter_ranges attribute that lists those parameters together with the values to search over.
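As a rough illustration of the two class-level attributes described above, here is a minimal sketch in plain Python. ClipOutliersSketch and sample_parameters are hypothetical names invented for this example; the class does not subclass any EvalML base class, and using plain lists of candidate values for hyperparameter_ranges is an assumption made to keep the sketch self-contained.

```python
import random


class ClipOutliersSketch:
    """Hypothetical component; not part of EvalML and not tied to its base classes."""

    # Human-readable name for the component.
    name = "Clip Outliers (sketch)"

    # Parameters exposed to a search, each mapped to candidate values.
    # Plain lists are an assumption made for this sketch.
    hyperparameter_ranges = {
        "lower_quantile": [0.0, 0.01, 0.05],
        "upper_quantile": [0.95, 0.99, 1.0],
    }

    def __init__(self, lower_quantile=0.01, upper_quantile=0.99):
        # Keep the keyword arguments so they can be inspected later.
        self.parameters = {
            "lower_quantile": lower_quantile,
            "upper_quantile": upper_quantile,
        }


def sample_parameters(component_class, rng):
    """Pick one candidate value for each exposed parameter."""
    return {
        key: rng.choice(values)
        for key, values in component_class.hyperparameter_ranges.items()
    }


rng = random.Random(0)
params = sample_parameters(ClipOutliersSketch, rng)
component = ClipOutliersSketch(**params)
print(component.name, component.parameters)
```

A search procedure could call something like sample_parameters repeatedly and build one component instance per sampled configuration.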

EvalML splits components into two categories: transformers and estimators.

Transformers

Transformers subclass the Transformer class, and define a fit method to learn information from training data and a transform method to apply a learned transformation to new data.

For example, an imputer is configured with the impute strategy to follow, such as the mean. The imputer's fit method learns the mean of each column from the training data, and its transform method fills any missing values in new data with that learned mean.

All transformers can execute fit and transform separately or in one step by calling fit_transform. Defining a custom fit_transform method can facilitate useful performance optimizations in some cases.
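The fit / transform / fit_transform contract described above can be sketched with pandas alone. MeanImputerSketch is a hypothetical class written for this illustration; it is not EvalML's SimpleImputer and does not subclass Transformer.

```python
import numpy as np
import pandas as pd


class MeanImputerSketch:
    """Hypothetical stand-in for a transformer; not EvalML's SimpleImputer."""

    def fit(self, X, y=None):
        # Learn one mean per column; pandas skips missing values by default.
        self.means_ = X.mean()
        return self

    def transform(self, X):
        # Fill missing values with the means learned during fit.
        return X.fillna(self.means_)

    def fit_transform(self, X, y=None):
        # Default behavior: fit, then transform the same data.
        return self.fit(X, y).transform(X)


X_train = pd.DataFrame({"a": [1.0, 3.0], "b": [2.0, np.nan]})
X_new = pd.DataFrame({"a": [np.nan], "b": [np.nan]})

imputer = MeanImputerSketch().fit(X_train)
filled = imputer.transform(X_new)
print(filled)
```

Here the means are learned from X_train only, so the missing values in X_new are filled with statistics from the training data rather than from X_new itself.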

[1]:
import numpy as np
import pandas as pd
from evalml.pipelines.components import SimpleImputer

X = pd.DataFrame([[1, 2, 3], [1, np.nan, 3]])
display(X)
   0    1  2
0  1  2.0  3
1  1  NaN  3
[2]:
imp = SimpleImputer(impute_strategy="mean")
X = imp.fit_transform(X)

display(X)
   0    1  2
0  1  2.0  3
1  1  2.0  3

Below is a list of all transformers included with EvalML:

[3]:
from evalml.pipelines.components.utils import all_components, Estimator, Transformer
for component in all_components:
    if issubclass(component, Transformer):
        print(f"Transformer: {component.name}")
Transformer: Text Featurization Component
Transformer: Drop Null Columns Transformer
Transformer: DateTime Featurization Component
Transformer: Select Columns Transformer
Transformer: Drop Columns Transformer
Transformer: Standard Scaler
Transformer: Per Column Imputer
Transformer: Simple Imputer
Transformer: RF Regressor Select From Model
Transformer: RF Classifier Select From Model
Transformer: One Hot Encoder

Estimators

Each estimator wraps an ML algorithm. Estimators subclass the Estimator class, and define a fit method to learn information from training data and a predict method for generating predictions from new data. Classification estimators should also define a predict_proba method for generating predicted probabilities.

Estimator classes each define a model_family attribute indicating what type of model is used.
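To make the estimator interface concrete, the sketch below implements fit, predict, and predict_proba for a trivial majority-class model using pandas only. MajorityClassSketch is a hypothetical class, it does not subclass EvalML's Estimator, and the plain string used for model_family is an assumption made for illustration.

```python
import pandas as pd


class MajorityClassSketch:
    """Hypothetical estimator; not one of EvalML's classes."""

    name = "Majority Class (sketch)"
    model_family = "baseline"  # assumed label; a plain string for illustration

    def fit(self, X, y):
        y = pd.Series(y)
        # Learn the class frequencies observed in the training target.
        self.class_probabilities_ = y.value_counts(normalize=True).sort_index()
        self.majority_class_ = self.class_probabilities_.idxmax()
        return self

    def predict(self, X):
        # Predict the most frequent training class for every row.
        return pd.Series([self.majority_class_] * len(X), index=X.index)

    def predict_proba(self, X):
        # Report the training class frequencies for every row.
        return pd.DataFrame(
            [self.class_probabilities_.to_numpy()] * len(X),
            columns=self.class_probabilities_.index,
            index=X.index,
        )


X_demo = pd.DataFrame({"feature": [10, 20, 30, 40]})
y_demo = [0, 1, 1, 1]

model = MajorityClassSketch().fit(X_demo, y_demo)
predictions = model.predict(X_demo)
probabilities = model.predict_proba(X_demo)
print(predictions.tolist())
print(probabilities)
```

The model ignores the features entirely, which keeps the focus on the three methods a classification estimator is expected to provide.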

Here’s an example of using the LogisticRegressionClassifier estimator to fit and predict on a simple dataset:

[4]:
from evalml.pipelines.components import LogisticRegressionClassifier

clf = LogisticRegressionClassifier()

# X is the imputed DataFrame produced by the SimpleImputer example above
y = [1, 0]

clf.fit(X, y)
clf.predict(X)
[4]:
0    0
1    0
dtype: int64

Below is a list of all estimators included with EvalML:

[5]:
from evalml.pipelines.components.utils import all_components, Estimator, Transformer
for component in all_components:
    if issubclass(component, Estimator):
        print(f"Estimator: {component.name}")
Estimator: Baseline Regressor
Estimator: Extra Trees Regressor
Estimator: XGBoost Regressor
Estimator: CatBoost Regressor
Estimator: Random Forest Regressor
Estimator: Linear Regressor
Estimator: Elastic Net Regressor
Estimator: Baseline Classifier
Estimator: Extra Trees Classifier
Estimator: Elastic Net Classifier
Estimator: CatBoost Classifier
Estimator: XGBoost Classifier
Estimator: Random Forest Classifier
Estimator: Logistic Regression Classifier