API Reference

Demo Datasets

load_fraud

Load credit card fraud dataset.

load_wine

Load wine dataset.

load_breast_cancer

Load breast cancer dataset.

load_diabetes

Load diabetes dataset.

Preprocessing

Utilities to preprocess data before using evalml.

drop_nan_target_rows

Drops rows in X and y when the corresponding target value in y is NaN.
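
The behaviour described above can be illustrated with a minimal pure-Python sketch. This is not evalml's implementation: the function name `drop_nan_target_rows_sketch` is hypothetical, and plain lists stand in for the data structures the library actually accepts.

```python
import math


def drop_nan_target_rows_sketch(X, y):
    """Keep only the rows whose target value is present (hypothetical helper)."""

    def is_missing(value):
        # Treat None and float NaN as missing targets.
        return value is None or (isinstance(value, float) and math.isnan(value))

    kept = [(row, target) for row, target in zip(X, y) if not is_missing(target)]
    X_kept = [row for row, _ in kept]
    y_kept = [target for _, target in kept]
    return X_kept, y_kept


X = [[1, 2], [3, 4], [5, 6]]
y = [0, float("nan"), 1]
X_clean, y_clean = drop_nan_target_rows_sketch(X, y)
```

The second row is removed from both X and y because its target is NaN, so features and targets stay aligned.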

label_distribution

Get the label distributions.
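
As a rough illustration of what a label distribution is, the sketch below computes the fraction of samples per label. The function name is hypothetical and the return type (a dict) is an assumption, not necessarily what evalml returns.

```python
from collections import Counter


def label_distribution_sketch(labels):
    """Return the fraction of samples carrying each label (hypothetical helper)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


dist = label_distribution_sketch(["a", "b", "a", "a"])
```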

load_data

Load features and labels from file.

number_of_features

Get the number of features for specific dtypes.

split_data

Splits data into train and test sets.
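
The idea of a train/test split can be sketched without any dependencies. The function below is a hypothetical stand-in, not evalml's `split_data`; the `test_size` and `seed` parameters and the return order are assumptions made for illustration, and no stratification is attempted.

```python
import random


def split_data_sketch(X, y, test_size=0.2, seed=0):
    """Shuffle row indices and hold out a fraction as the test set."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = int(round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[i] for i in range(10)]
y = list(range(10))
X_train, X_test, y_train, y_test = split_data_sketch(X, y, test_size=0.2, seed=0)
```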

AutoML

AutoML Search Classes

AutoMLSearch

Automated pipeline search.

AutoML Algorithm Classes

AutoMLAlgorithm

Base class for the automl algorithms which power evalml.

IterativeAlgorithm

An automl algorithm which first fits a base round of pipelines with default parameters, then does a round of parameter tuning on each pipeline in order of performance.
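
The two-phase ordering described above can be sketched as follows. This is only an illustration of the batching idea under stated assumptions: `plan_batches` and its parameters are hypothetical names, a higher score is assumed to be better, and the real class's interface is not shown here.

```python
def plan_batches(default_scores, candidates_per_pipeline=2):
    """Return a base batch followed by one tuning batch per pipeline.

    default_scores maps a pipeline name to the score it achieved with
    default parameters (higher is assumed to be better in this sketch).
    """
    base_batch = [(name, "default") for name in default_scores]
    # Tune the best-performing pipelines first.
    ranked = sorted(default_scores, key=default_scores.get, reverse=True)
    tuning_batches = [
        [(name, f"candidate_{i}") for i in range(candidates_per_pipeline)]
        for name in ranked
    ]
    return [base_batch] + tuning_batches


batches = plan_batches({"rf": 0.8, "lr": 0.9, "xgb": 0.7})
```

Here the first batch evaluates all three pipelines with defaults, and the following batches tune `lr`, then `rf`, then `xgb`.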

Pipelines

Pipeline Base Classes

PipelineBase

Base class for all pipelines.

ClassificationPipeline

Pipeline subclass for all classification pipelines.

BinaryClassificationPipeline

Pipeline subclass for all binary classification pipelines.

MulticlassClassificationPipeline

Pipeline subclass for all multiclass classification pipelines.

RegressionPipeline

Pipeline subclass for all regression pipelines.

Classification Pipelines

BaselineBinaryPipeline

Baseline Pipeline for binary classification.

BaselineMulticlassPipeline

Baseline Pipeline for multiclass classification.

ModeBaselineBinaryPipeline

Mode Baseline Pipeline for binary classification.

ModeBaselineMulticlassPipeline

Mode Baseline Pipeline for multiclass classification.

Regression Pipelines

BaselineRegressionPipeline

Baseline Pipeline for regression problems.

MeanBaselineRegressionPipeline

Mean Baseline Pipeline for regression problems.

Pipeline Graph Utils

precision_recall_curve

Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve.
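
To make the notion of precision-recall curve data concrete, the sketch below evaluates precision and recall at every distinct predicted probability. It is a simplified illustration, not the library's routine: the function name and the tuple-based output are assumptions.

```python
def precision_recall_points(y_true, y_prob):
    """Return (threshold, precision, recall) for each distinct probability."""
    points = []
    for threshold in sorted(set(y_prob)):
        predicted = [p >= threshold for p in y_prob]
        tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p)
        fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p)
        fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and not p)
        # tp + fp >= 1 because the threshold is itself one of the scores.
        precision = tp / (tp + fp)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        points.append((threshold, precision, recall))
    return points


points = precision_recall_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```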

graph_precision_recall_curve

Generate and display a precision-recall plot.

roc_curve

Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.

graph_roc_curve

Generate and display a Receiver Operating Characteristic (ROC) plot.

confusion_matrix

Confusion matrix for binary and multiclass classification.

normalize_confusion_matrix

Normalizes a confusion matrix.
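
A confusion matrix and one possible normalization can be sketched in a few lines. The helpers below are hypothetical; rows are taken to be true labels and columns predicted labels, and only row-wise normalization is shown, which is an assumption about one of several reasonable options.

```python
def confusion_matrix_sketch(y_true, y_pred, labels):
    """Count (true label, predicted label) pairs; rows are true labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum so every true class sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


matrix = confusion_matrix_sketch([0, 0, 1, 1, 1, 1], [0, 1, 1, 1, 1, 0], labels=[0, 1])
normalized = normalize_rows(matrix)
```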

graph_confusion_matrix

Generate and display a confusion matrix plot.

calculate_permutation_importance

Calculates permutation importance for features.
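
Permutation importance measures how much a score drops when one feature's values are shuffled. The sketch below illustrates the general technique with a toy model; all names (`permutation_importance_sketch`, `predict`, `accuracy`) are hypothetical and the signature is not the library's.

```python
import random


def permutation_importance_sketch(predict, score, X, y, n_repeats=5, seed=0):
    """Average drop in score after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = score(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted.
            X_perm = [row[:col] + [value] + row[col + 1:] for row, value in zip(X, shuffled)]
            drops.append(baseline - score(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def predict(row):
    # Toy model that only looks at the first feature.
    return int(row[0] > 0.5)


X = [[0.1, 5.0], [0.9, 3.0], [0.2, 8.0], [0.8, 1.0]]
y = [0, 1, 0, 1]
importances = permutation_importance_sketch(predict, accuracy, X, y)
```

Because the toy model ignores the second feature, shuffling it never changes the score and its importance is exactly zero.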

graph_permutation_importance

Generate a bar graph of the pipeline’s permutation importance.

Pipeline Utils

get_estimators

Returns the estimators allowed for a particular problem type.

make_pipeline

Given input data, target data, an estimator class, and the problem type, generates a pipeline class with a preprocessing chain recommended based on the inputs.

Components

Component Base Classes

Components represent a step in a pipeline.

ComponentBase

Base class for all components.

Transformer

A component that transforms data and may or may not need fitting.

Estimator

A component that fits and predicts given data.

Transformers

Transformers are components that take in data as input and output transformed data.

DropColumns

Drops specified columns in input data.

SelectColumns

Selects specified columns in input data.

OneHotEncoder

One-hot encoder to encode non-numeric data.

PerColumnImputer

Imputes missing data according to a specified imputation strategy per column.

SimpleImputer

Imputes missing data according to a specified imputation strategy.

StandardScaler

Standardize features: removes mean and scales to unit variance.
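
Standardization of a single feature can be written out directly. This sketch uses the population variance and a hypothetical `standardize` helper; it illustrates the arithmetic only and is not the component's implementation.

```python
import math


def standardize(column):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(variance)
    # A constant column has zero spread; map it to zeros instead of dividing by 0.
    return [(v - mean) / std if std else 0.0 for v in column]


scaled = standardize([2.0, 4.0, 6.0])
```

After scaling, the column has mean 0 and variance 1.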

RFRegressorSelectFromModel

Selects top features based on importance weights using a Random Forest regressor.

RFClassifierSelectFromModel

Selects top features based on importance weights using a Random Forest classifier.

DropNullColumns

Transformer to drop features whose percentage of NaN values exceeds a specified threshold.
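
The thresholding rule can be sketched with a dict of columns. The function and parameter names below are hypothetical, and `None` stands in for NaN to keep the example dependency-free.

```python
def drop_null_columns_sketch(columns, pct_null_threshold=0.95):
    """Keep columns whose fraction of missing values does not exceed the threshold."""
    kept = {}
    for name, values in columns.items():
        pct_null = sum(v is None for v in values) / len(values)
        if pct_null <= pct_null_threshold:
            kept[name] = values
    return kept


columns = {"a": [1, None, 3, 4], "b": [None, None, None, 4]}
kept = drop_null_columns_sketch(columns, pct_null_threshold=0.5)
```

Column `b` is 75% missing, which exceeds the 0.5 threshold, so only `a` is kept.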

DateTimeFeaturization

Transformer that can automatically featurize DateTime columns.

TextFeaturizer

Transformer that can automatically featurize text columns.

Estimators

Classifiers

Classifiers are components that output a predicted class label.

CatBoostClassifier

CatBoost Classifier, a classifier that uses gradient-boosting on decision trees.

ElasticNetClassifier

Elastic Net Classifier.

ExtraTreesClassifier

Extra Trees Classifier.

RandomForestClassifier

Random Forest Classifier.

LogisticRegressionClassifier

Logistic Regression Classifier.

XGBoostClassifier

XGBoost Classifier.

BaselineClassifier

Classifier that predicts using the specified strategy.

Regressors

Regressors are components that output a predicted target value.

CatBoostRegressor

CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.

ElasticNetRegressor

Elastic Net Regressor.

LinearRegressor

Linear Regressor.

ExtraTreesRegressor

Extra Trees Regressor.

RandomForestRegressor

Random Forest Regressor.

XGBoostRegressor

XGBoost Regressor.

BaselineRegressor

Regressor that predicts using the specified strategy.

Objective Functions

Objective Base Classes

ObjectiveBase

Base class for all objectives.

BinaryClassificationObjective

Base class for all binary classification objectives.

MulticlassClassificationObjective

Base class for all multiclass classification objectives.

RegressionObjective

Base class for all regression objectives.

Domain-Specific Objectives

FraudCost

Scores the percentage of the total transaction amount processed that is lost to fraud.

LeadScoring

Lead scoring.

Classification Objectives

AccuracyBinary

Accuracy score for binary classification.

AccuracyMulticlass

Accuracy score for multiclass classification.

AUC

AUC score for binary classification.

AUCMacro

AUC score for multiclass classification using macro averaging.

AUCMicro

AUC score for multiclass classification using micro averaging.

AUCWeighted

AUC score for multiclass classification using weighted averaging.

BalancedAccuracyBinary

Balanced accuracy score for binary classification.

BalancedAccuracyMulticlass

Balanced accuracy score for multiclass classification.

F1

F1 score for binary classification.

F1Micro

F1 score for multiclass classification using micro averaging.

F1Macro

F1 score for multiclass classification using macro averaging.

F1Weighted

F1 score for multiclass classification using weighted averaging.

LogLossBinary

Log Loss for binary classification.
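
For reference, binary log loss is the negative mean log-likelihood of the true labels under the predicted probabilities. The sketch below is a hand-rolled illustration; the function name and the clipping constant `eps` are assumptions.

```python
import math


def log_loss_binary(y_true, y_prob, eps=1e-15):
    """Negative mean of y*log(p) + (1-y)*log(1-p), with p clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


loss = log_loss_binary([1, 0], [0.5, 0.5])
```

Predicting 0.5 for every sample gives a loss of log(2), roughly 0.693.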

LogLossMulticlass

Log Loss for multiclass classification.

MCCBinary

Matthews correlation coefficient for binary classification.
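
The binary Matthews correlation coefficient can be computed from the four confusion-matrix counts. The helper below is a hypothetical sketch of the standard formula, returning 0.0 when the denominator vanishes.

```python
import math


def mcc_binary(y_true, y_pred):
    """(TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


perfect = mcc_binary([0, 1, 0, 1], [0, 1, 0, 1])
inverted = mcc_binary([0, 1, 0, 1], [1, 0, 1, 0])
```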

MCCMulticlass

Matthews correlation coefficient for multiclass classification.

Precision

Precision score for binary classification.

PrecisionMicro

Precision score for multiclass classification using micro averaging.

PrecisionMacro

Precision score for multiclass classification using macro averaging.

PrecisionWeighted

Precision score for multiclass classification using weighted averaging.

Recall

Recall score for binary classification.

RecallMicro

Recall score for multiclass classification using micro averaging.

RecallMacro

Recall score for multiclass classification using macro averaging.

RecallWeighted

Recall score for multiclass classification using weighted averaging.

Regression Objectives

R2

Coefficient of determination for regression.
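
The coefficient of determination compares the residual sum of squares with the total sum of squares around the mean. The function below is a hypothetical illustration of that formula.

```python
def r2_score_sketch(y_true, y_pred):
    """1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


exact = r2_score_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r2_score_sketch([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A perfect prediction scores 1.0, while always predicting the mean scores 0.0.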

MAE

Mean absolute error for regression.

MSE

Mean squared error for regression.

MeanSquaredLogError

Mean squared log error for regression.

MedianAE

Median absolute error for regression.

MaxError

Maximum residual error for regression.

ExpVariance

Explained variance score for regression.

RootMeanSquaredError

Root mean squared error for regression.

RootMeanSquaredLogError

Root mean squared log error for regression.
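
Root mean squared log error applies the RMSE formula to log-transformed values. The sketch below uses `log1p`, which is the common convention and is assumed here; the function name is hypothetical.

```python
import math


def rmsle_sketch(y_true, y_pred):
    """Square root of the mean squared difference of log(1 + value)."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


zero_error = rmsle_sketch([1.0, 2.0], [1.0, 2.0])
unit_error = rmsle_sketch([0.0], [math.e - 1])
```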

Problem Types

ProblemTypes

Enum for type of machine learning problem: BINARY, MULTICLASS, or REGRESSION.

handle_problem_types

Handles problem_type by either returning the ProblemTypes or converting from a str.
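
The "return the enum or convert from a str" pattern can be sketched as below. The enum and function here are hypothetical stand-ins; in particular, the enum values and the error types are assumptions rather than the library's definitions.

```python
from enum import Enum


class ProblemTypesSketch(Enum):
    # Member names follow the text above; the values are assumed.
    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def handle_problem_types_sketch(problem_type):
    """Pass enum members through unchanged and look strings up by name."""
    if isinstance(problem_type, ProblemTypesSketch):
        return problem_type
    if isinstance(problem_type, str):
        try:
            return ProblemTypesSketch[problem_type.upper()]
        except KeyError:
            raise KeyError(f"Problem type '{problem_type}' does not exist") from None
    raise ValueError("problem_type must be a str or a ProblemTypesSketch member")


from_str = handle_problem_types_sketch("binary")
from_enum = handle_problem_types_sketch(ProblemTypesSketch.REGRESSION)
```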

Model Family

ModelFamily

Enum for family of machine learning models.

handle_model_family

Handles model_family (a str or ModelFamily) by either returning the ModelFamily or converting from a str.

list_model_families

Lists the model families available for a particular problem type.

Tuners

Tuner

Defines API for Tuners.

SKOptTuner

Bayesian Optimizer.

GridSearchTuner

Grid Search Optimizer.

RandomSearchTuner

Random Search Optimizer.

Data Checks

Data Check Classes

DataCheck

Base class for all data checks.

InvalidTargetDataCheck

Checks if the target labels contain missing or invalid data.

HighlyNullDataCheck

Checks if there are any highly-null columns in the input.

IDColumnsDataCheck

Check if any of the features are likely to be ID columns.

LabelLeakageDataCheck

Check if any of the features are highly correlated with the target.
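
One way to flag features that are highly correlated with the target is to threshold the absolute Pearson correlation. The sketch below shows that idea only; the function names, the threshold parameter, and the choice of Pearson correlation are assumptions, not a description of the check's internals.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denominator = math.sqrt(var_x * var_y)
    return cov / denominator if denominator else 0.0


def leaky_features(columns, target, pct_corr_threshold=0.95):
    """Names of columns whose |correlation| with the target meets the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= pct_corr_threshold
    ]


columns = {"copy_of_target": [0, 1, 0, 1], "noise": [1, 1, 0, 0]}
flagged = leaky_features(columns, target=[0, 1, 0, 1])
```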

OutliersDataCheck

Checks if there are any outliers in input data by using an Isolation Forest to obtain the anomaly score of each index and then using IQR to determine score anomalies.

NoVarianceDataCheck

Check if any of the features or labels have no variance.

DataChecks

A collection of data checks.

DefaultDataChecks

A collection of basic data checks that is used by AutoML by default.

Data Check Messages

DataCheckMessage

Base class for all DataCheckMessages.

DataCheckError

DataCheckMessage subclass for errors returned by data checks.

DataCheckWarning

DataCheckMessage subclass for warnings returned by data checks.

Data Check Message Types

DataCheckMessageType

Enum for type of data check message: WARNING or ERROR.

Utils

import_or_raise

Attempts to import the requested library by name.
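
Importing a library by name and raising a clearer error on failure can be sketched with `importlib`. The function name, the `error_msg` parameter, and the message wording are hypothetical.

```python
import importlib


def import_or_raise_sketch(library, error_msg=None):
    """Return the imported module, or raise ImportError with a helpful message."""
    try:
        return importlib.import_module(library)
    except ImportError:
        message = f"Missing optional dependency '{library}'."
        if error_msg:
            message += " " + error_msg
        raise ImportError(message) from None


json_module = import_or_raise_sketch("json")
```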

convert_to_seconds

Converts a string describing a length of time to its length in seconds.
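
A minimal sketch of such a conversion is shown below. The accepted format ("<number> <unit>") and the unit spellings are assumptions made for illustration; the set of strings the library actually accepts may differ.

```python
# Assumed unit spellings; not taken from the library.
UNIT_SECONDS = {
    "s": 1, "sec": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
}


def convert_to_seconds_sketch(input_str):
    """Parse strings such as '10 minutes' into a number of seconds."""
    parts = input_str.strip().split()
    if len(parts) != 2 or parts[1].lower() not in UNIT_SECONDS:
        raise ValueError(f"Cannot parse a duration from '{input_str}'")
    return float(parts[0]) * UNIT_SECONDS[parts[1].lower()]


ten_minutes = convert_to_seconds_sketch("10 minutes")
two_hours = convert_to_seconds_sketch("2 h")
```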

get_random_state

Generates a numpy.random.RandomState instance using seed.

get_random_seed

Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator.