API Reference

Demo Datasets

load_fraud

Load credit card fraud dataset.

load_wine

Load wine dataset.

load_breast_cancer

Load breast cancer dataset.

load_diabetes

Load diabetes dataset.

Preprocessing

Utilities to preprocess data before using evalml.

drop_nan_target_rows

Drops rows in X and y where the corresponding target value in y is NaN.
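The row-dropping behaviour described above can be illustrated on plain Python lists. This is a minimal sketch, not evalml's implementation: the function name `drop_nan_target_rows_sketch` is hypothetical, and treating `None` as missing alongside NaN is an assumption.

```python
import math


def drop_nan_target_rows_sketch(X, y):
    """Return copies of X and y without the rows whose target is missing."""

    def is_missing(value):
        # Assumption: None counts as missing in addition to float NaN.
        return value is None or (isinstance(value, float) and math.isnan(value))

    keep = [i for i, target in enumerate(y) if not is_missing(target)]
    return [X[i] for i in keep], [y[i] for i in keep]


X = [[1, 2], [3, 4], [5, 6]]
y = [0.0, float("nan"), 1.0]
X_clean, y_clean = drop_nan_target_rows_sketch(X, y)
```

The feature rows and the target stay aligned because both are filtered with the same index list.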

label_distribution

Get the label distributions

load_data

Load features and labels from file.

number_of_features

Get the number of features for specific dtypes

split_data

Splits data into train and test sets.
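As a rough illustration of a train/test split, the sketch below shuffles row indices and holds out a fraction of them. The function name, the `test_size` and `seed` parameters, and the absence of stratification are all assumptions made for the example; they do not describe evalml's actual signature or behaviour.

```python
import random


def split_data_sketch(X, y, test_size=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of rows as the test set."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = max(1, round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[i] for i in range(10)]
y = list(range(10))
X_train, X_test, y_train, y_test = split_data_sketch(X, y)
```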

AutoML

AutoML Search Classes

AutoClassificationSearch

Automatic pipeline search class for classification problems

AutoRegressionSearch

Automatic pipeline search class for regression problems

AutoSearchBase

Base class for AutoML searches.

AutoML Algorithm Classes

AutoMLAlgorithm

Base class for the automl algorithms which power evalml.

IterativeAlgorithm

An automl algorithm which first fits a base round of pipelines with default parameters, then does a round of parameter tuning on each pipeline in order of performance.
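The two-phase strategy described above can be sketched in a few lines. Everything here is hypothetical and only for illustration: `iterative_search_sketch`, the `evaluate` and `propose` callables, and the "higher score is better" convention are assumptions, not evalml's interface.

```python
def iterative_search_sketch(pipelines, evaluate, propose, n_tuning=2):
    """Run a default-parameter round, then tune pipelines best-first.

    pipelines: dict mapping a pipeline name to its default parameters.
    evaluate:  callable (name, params) -> score, where higher is better.
    propose:   callable (name, trial) -> a new parameter dict to try.
    """
    # Base round: fit every pipeline once with its default parameters.
    results = [(name, params, evaluate(name, params))
               for name, params in pipelines.items()]
    baseline = {name: score for name, _, score in results}

    # Tuning rounds: visit pipelines in order of base-round performance.
    for name in sorted(pipelines, key=baseline.get, reverse=True):
        for trial in range(n_tuning):
            params = propose(name, trial)
            results.append((name, params, evaluate(name, params)))
    return results


# Toy problem: each pipeline has one parameter "x" with a hidden optimum.
optimum = {"a": 5, "b": 2}
results = iterative_search_sketch(
    pipelines={"a": {"x": 0}, "b": {"x": 0}},
    evaluate=lambda name, params: -abs(params["x"] - optimum[name]),
    propose=lambda name, trial: {"x": trial + 1},
)
best = max(results, key=lambda result: result[2])
```

In this toy run pipeline "b" scores better in the base round, so it is tuned before "a".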

Pipelines

Pipeline Base Classes

PipelineBase

Base class for all pipelines.

ClassificationPipeline

Pipeline subclass for all classification pipelines.

BinaryClassificationPipeline

Pipeline subclass for all binary classification pipelines.

MulticlassClassificationPipeline

Pipeline subclass for all multiclass classification pipelines.

RegressionPipeline

Pipeline subclass for all regression pipelines.

Classification Pipelines

CatBoostBinaryClassificationPipeline

CatBoost Pipeline for binary classification.

CatBoostMulticlassClassificationPipeline

CatBoost Pipeline for multiclass classification.

ENBinaryPipeline

Elastic Net Pipeline for binary classification problems

ENMulticlassPipeline

Elastic Net Pipeline for multiclass classification problems

ETBinaryClassificationPipeline

Extra Trees Pipeline for binary classification

ETMulticlassClassificationPipeline

Extra Trees Pipeline for multiclass classification

LogisticRegressionBinaryPipeline

Logistic Regression Pipeline for binary classification

LogisticRegressionMulticlassPipeline

Logistic Regression Pipeline for multiclass classification

RFBinaryClassificationPipeline

Random Forest Pipeline for binary classification

RFMulticlassClassificationPipeline

Random Forest Pipeline for multiclass classification

XGBoostBinaryPipeline

XGBoost Pipeline for binary classification

XGBoostMulticlassPipeline

XGBoost Pipeline for multiclass classification

BaselineBinaryPipeline

Baseline Pipeline for binary classification

BaselineMulticlassPipeline

Baseline Pipeline for multiclass classification

ModeBaselineBinaryPipeline

Mode Baseline Pipeline for binary classification

ModeBaselineMulticlassPipeline

Mode Baseline Pipeline for multiclass classification

Regression Pipelines

RFRegressionPipeline

Random Forest Pipeline for regression problems

CatBoostRegressionPipeline

CatBoost Pipeline for regression problems.

ENRegressionPipeline

Elastic Net Pipeline for regression problems

ETRegressionPipeline

Extra Trees Pipeline for regression problems

LinearRegressionPipeline

Linear Regression Pipeline for regression problems

XGBoostRegressionPipeline

XGBoost Pipeline for regression problems

BaselineRegressionPipeline

Baseline Pipeline for regression problems

MeanBaselineRegressionPipeline

Mean Baseline Pipeline for regression problems

Pipeline Utils

all_pipelines

Returns a complete list of all supported pipeline classes.

get_pipelines

Returns the pipelines allowed for a particular problem type.

list_model_families

Lists the model families available for a particular problem type.

Pipeline Graph Utils

precision_recall_curve

Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve.

graph_precision_recall_curve

Generate and display a precision-recall plot.

roc_curve

Given labels and binary classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.
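To make concrete what "the data representing a ROC curve" consists of, here is a minimal pure-Python sketch that sweeps a threshold over the predicted probabilities and records (false positive rate, true positive rate) pairs. The function names are hypothetical, the return shape is an assumption rather than evalml's, and the code assumes both classes appear in the labels.

```python
def roc_points_sketch(y_true, y_prob):
    """Return (fpr, tpr) points, sweeping the threshold from high to low."""
    positives = sum(1 for label in y_true if label == 1)
    negatives = len(y_true) - positives  # assumes both classes are present
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_prob), reverse=True):
        tp = sum(1 for label, p in zip(y_true, y_prob)
                 if p >= threshold and label == 1)
        fp = sum(1 for label, p in zip(y_true, y_prob)
                 if p >= threshold and label == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_sketch(points):
    """Area under the piecewise-linear curve, by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


points = roc_points_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
area = auc_sketch(points)
```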

graph_roc_curve

Generate and display a Receiver Operating Characteristic (ROC) plot.

confusion_matrix

Confusion matrix for binary and multiclass classification.

normalize_confusion_matrix

Normalizes a confusion matrix.
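A small sketch of building and normalizing a confusion matrix follows. It assumes the common convention of rows for true labels and columns for predicted labels. The function names and the `option` values ("true", "pred", "all") are assumptions for illustration and may not match evalml's parameters.

```python
def confusion_matrix_sketch(y_true, y_pred, labels):
    """Count (true label, predicted label) pairs; rows are true labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def normalize_confusion_matrix_sketch(matrix, option="true"):
    """Scale counts by row sums, column sums, or the grand total."""
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_sums)

    def denominator(i, j):
        return {"true": row_sums[i], "pred": col_sums[j], "all": total}[option]

    # Cells whose denominator is zero are left at 0.0 in this sketch.
    return [[matrix[i][j] / denominator(i, j) if denominator(i, j) else 0.0
             for j in range(n)] for i in range(n)]


counts = confusion_matrix_sketch(
    ["a", "a", "a", "b"], ["a", "b", "a", "b"], labels=["a", "b"])
by_true = normalize_confusion_matrix_sketch(counts, option="true")
```

With the "true" option each row sums to 1, so the diagonal entries read as per-class recall.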

graph_confusion_matrix

Generate and display a confusion matrix plot.

Components

Component Base Classes

Components represent a step in a pipeline.

ComponentBase

Base class for all components

Transformer

A component that transforms data and may or may not need fitting.

Estimator

A component that fits and predicts given data

Transformers

Transformers are components that take in data as input and output transformed data.

OneHotEncoder

One-hot encoder to encode non-numeric data

SimpleImputer

Imputes missing data according to a specified imputation strategy
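The idea of an imputation strategy can be shown on a single column. This is only a sketch: the helper name is hypothetical, missing values are represented as `None`, and the strategy names ("mean", "median", "most_frequent") are assumed for illustration.

```python
from statistics import mean, median, mode


def impute_column_sketch(values, strategy="mean"):
    """Replace None entries with a statistic of the observed entries."""
    observed = [v for v in values if v is not None]
    statistic = {"mean": mean, "median": median, "most_frequent": mode}[strategy]
    fill = statistic(observed)
    return [fill if v is None else v for v in values]


ages = impute_column_sketch([20.0, None, 40.0, 30.0], strategy="mean")
colors = impute_column_sketch(["red", None, "blue", "red"],
                              strategy="most_frequent")
```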

StandardScaler

Standardize features: removes mean and scales to unit variance
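Standardization maps each value x in a feature column to (x - mean) / standard deviation. The sketch below applies this to one column using the population standard deviation; the function name is hypothetical, and mapping constant columns to zeros is a choice made for this example.

```python
from statistics import fmean, pstdev


def standardize_sketch(values):
    """Shift a column to zero mean and scale it to unit variance."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column has no spread; return zeros in this sketch.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


scaled = standardize_sketch([2.0, 4.0, 6.0])
```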

RFRegressorSelectFromModel

Selects top features based on importance weights using a Random Forest regressor

RFClassifierSelectFromModel

Selects top features based on importance weights using a Random Forest classifier

Estimators

Classifiers

Classifiers are components that output a predicted class label.

CatBoostClassifier

CatBoost Classifier, a classifier that uses gradient-boosting on decision trees.

ElasticNetClassifier

Elastic Net Classifier

ExtraTreesClassifier

Extra Trees Classifier

RandomForestClassifier

Random Forest Classifier

LogisticRegressionClassifier

Logistic Regression Classifier

XGBoostClassifier

XGBoost Classifier

BaselineClassifier

Classifier that predicts using the specified strategy.

Regressors

Regressors are components that output a predicted target value.

CatBoostRegressor

CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.

ElasticNetRegressor

Elastic Net Regressor

LinearRegressor

Linear Regressor

ExtraTreesRegressor

Extra Trees Regressor

RandomForestRegressor

Random Forest Regressor

XGBoostRegressor

XGBoost Regressor

BaselineRegressor

Regressor that predicts using the specified strategy.

Objective Functions

Objective Base Classes

ObjectiveBase

Base class for all objectives.

BinaryClassificationObjective

Base class for all binary classification objectives.

MulticlassClassificationObjective

Base class for all multiclass classification objectives.

RegressionObjective

Base class for all regression objectives.

Domain-Specific Objectives

FraudCost

Scores the percentage of the total transaction amount processed that is lost to fraud

LeadScoring

Lead scoring

Classification Objectives

AccuracyBinary

Accuracy score for binary classification

AccuracyMulticlass

Accuracy score for multiclass classification

AUC

AUC score for binary classification

AUCMacro

AUC score for multiclass classification using macro averaging

AUCMicro

AUC score for multiclass classification using micro averaging

AUCWeighted

AUC score for multiclass classification using weighted averaging

BalancedAccuracyBinary

Balanced accuracy score for binary classification

BalancedAccuracyMulticlass

Balanced accuracy score for multiclass classification

F1

F1 score for binary classification

F1Micro

F1 score for multiclass classification using micro averaging

F1Macro

F1 score for multiclass classification using macro averaging

F1Weighted

F1 score for multiclass classification using weighted averaging
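The micro, macro, and weighted variants differ only in how per-class results are combined. The sketch below uses F1 as the example and assumes the usual definitions: micro pools the counts across classes, macro takes an unweighted mean of per-class scores, and weighted takes a mean weighted by class support. The function names are hypothetical.

```python
from collections import Counter


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), defined as 0.0 when the denominator is 0."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def f1_averages_sketch(y_true, y_pred):
    """Compute micro, macro, and weighted F1 for a multiclass problem."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == predicted:
            tp[actual] += 1
        else:
            fp[predicted] += 1  # predicted class gains a false positive
            fn[actual] += 1     # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {c: f1_from_counts(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)
    return {
        "micro": f1_from_counts(sum(tp.values()), sum(fp.values()),
                                sum(fn.values())),
        "macro": sum(per_class.values()) / len(labels),
        "weighted": sum(per_class[c] * support[c] for c in labels) / len(y_true),
    }


scores = f1_averages_sketch([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 2, 2])
```

For single-label multiclass data, micro-averaged F1 equals plain accuracy, which is 4/6 in this example.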

LogLossBinary

Log Loss for binary classification

LogLossMulticlass

Log Loss for multiclass classification

MCCBinary

Matthews correlation coefficient for binary classification
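For reference, the binary Matthews correlation coefficient can be computed directly from the four confusion-matrix counts. The helper below is a hypothetical sketch; returning 0.0 when the denominator is zero is a common convention assumed here.

```python
import math


def mcc_binary_sketch(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention for degenerate confusion matrices
    return (tp * tn - fp * fn) / denominator


mcc = mcc_binary_sketch(tp=6, tn=3, fp=1, fn=2)
```

The score ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction).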

MCCMulticlass

Matthews correlation coefficient for multiclass classification

Precision

Precision score for binary classification

PrecisionMicro

Precision score for multiclass classification using micro averaging

PrecisionMacro

Precision score for multiclass classification using macro averaging

PrecisionWeighted

Precision score for multiclass classification using weighted averaging

Recall

Recall score for binary classification

RecallMicro

Recall score for multiclass classification using micro averaging

RecallMacro

Recall score for multiclass classification using macro averaging

RecallWeighted

Recall score for multiclass classification using weighted averaging

Regression Objectives

R2

Coefficient of determination for regression

MAE

Mean absolute error for regression

MSE

Mean squared error for regression

MeanSquaredLogError

Mean squared log error for regression.

MedianAE

Median absolute error for regression

MaxError

Maximum residual error for regression

ExpVariance

Explained variance score for regression

RootMeanSquaredError

Root mean squared error for regression

RootMeanSquaredLogError

Root mean squared log error for regression.
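Several of the regression objectives above reduce to short formulas over the residuals. The sketch below computes a subset of them using their standard definitions; the function name and the dictionary keys are hypothetical, and the R2 computation assumes the true targets are not all identical.

```python
import math
from statistics import fmean, median


def regression_metrics_sketch(y_true, y_pred):
    """Compute common regression metrics from paired targets and predictions."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(r) for r in residuals]
    mse = fmean([r * r for r in residuals])
    mean_true = fmean(y_true)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # assumed nonzero
    return {
        "MAE": fmean(abs_errors),
        "MSE": mse,
        "RootMeanSquaredError": math.sqrt(mse),
        "MedianAE": median(abs_errors),
        "MaxError": max(abs_errors),
        "R2": 1 - ss_res / ss_tot,
    }


metrics = regression_metrics_sketch([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
```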

Problem Types

ProblemTypes

Enum for type of machine learning problem: BINARY, MULTICLASS, or REGRESSION

handle_problem_types

Handles problem_type by either returning the ProblemTypes or converting from a str
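The pattern of accepting either an enum member or its string name can be sketched as follows. The enum values, the case-insensitive lookup, and the error type are assumptions for illustration; the names carry a `Sketch` suffix to make clear they are not evalml's own classes.

```python
from enum import Enum


class ProblemTypesSketch(Enum):
    # The string values are assumed for this example.
    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def handle_problem_types_sketch(problem_type):
    """Return the enum member for an enum member or its string name."""
    if isinstance(problem_type, ProblemTypesSketch):
        return problem_type
    if isinstance(problem_type, str):
        try:
            return ProblemTypesSketch[problem_type.upper()]
        except KeyError:
            raise ValueError(f"Unknown problem type: {problem_type!r}") from None
    raise ValueError("problem_type must be a str or a ProblemTypesSketch member")


from_string = handle_problem_types_sketch("binary")
from_enum = handle_problem_types_sketch(ProblemTypesSketch.REGRESSION)
```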

Model Family

ModelFamily

Enum for family of machine learning models.

Tuners

Tuner

Defines API for Tuners

SKOptTuner

Bayesian Optimizer

GridSearchTuner

Grid Search Optimizer

RandomSearchTuner

Random Search Optimizer

Data Checks

Data Check Classes

DataCheck

Base class for all data checks.

HighlyNullDataCheck

Checks if there are any highly-null columns in the input.

IDColumnsDataCheck

Check if any of the features are likely to be ID columns.

LabelLeakageDataCheck

Check if any of the features are highly correlated with the target.
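One simple way to flag features that are highly correlated with the target is a Pearson correlation test against a cutoff. The sketch below makes that concrete; the function names, the use of Pearson correlation, and the 0.95 threshold are assumptions for the example, not a description of how evalml performs the check.

```python
import math


def pearson_sketch(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def leaky_features_sketch(columns, target, threshold=0.95):
    """Return names of columns whose |correlation| with the target is high."""
    return [name for name, values in columns.items()
            if abs(pearson_sketch(values, target)) >= threshold]


columns = {
    "leak": [0.0, 1.0, 0.0, 1.0, 1.0],   # a copy of the target
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
}
flagged = leaky_features_sketch(columns, target=[0, 1, 0, 1, 1])
```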

OutliersDataCheck

Checks if there are any outliers in input data by using an Isolation Forest to obtain the anomaly score of each index and then using IQR to determine score anomalies.
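The IQR step of this check can be sketched on its own. The example below takes anomaly scores as given (the Isolation Forest that would produce them is omitted) and flags indices whose score falls outside the interquartile fences. The function name, the 1.5 multiplier, and the quartile method are assumptions for illustration.

```python
from statistics import quantiles


def iqr_outlier_indices_sketch(scores, k=1.5):
    """Return indices whose score lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, score in enumerate(scores)
            if score < lower or score > upper]


# Hypothetical anomaly scores, one per row of the input data.
anomaly_scores = [0.1, 0.12, 0.11, 0.13, 0.09, 0.1, 0.9]
outlier_rows = iqr_outlier_indices_sketch(anomaly_scores)
```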

DataChecks

A collection of data checks.

DefaultDataChecks

A collection of basic data checks that is used by AutoML by default.

Data Check Messages

DataCheckMessage

Base class for all DataCheckMessages.

DataCheckError

DataCheckMessage subclass for errors returned by data checks.

DataCheckWarning

DataCheckMessage subclass for warnings returned by data checks.

Data Check Message Types

DataCheckMessageType

Enum for type of data check message: WARNING or ERROR

Utils

import_or_raise

Attempts to import the requested library by name.

convert_to_seconds

Converts a string describing a length of time to its length in seconds.

get_random_state

Generates a numpy.random.RandomState instance using seed.

get_random_seed

Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator.