API Reference

Demo Datasets

load_fraud

Load credit card fraud dataset.

load_wine

Load wine dataset.

load_breast_cancer

Load breast cancer dataset.

load_diabetes

Load diabetes dataset.

Preprocessing

load_data

Load features and labels from file(s).

split_data

Splits data into train and test sets.
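
The idea behind a train/test split can be sketched in plain Python. This is an illustration only, not the library's implementation: the function name simple_split and its parameters test_size and seed are hypothetical.

```python
import random


def simple_split(X, y, test_size=0.2, seed=0):
    """Shuffle row indices and split rows into train and test sets.

    Hypothetical helper for illustration; not the library's split_data.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    indices = list(range(len(X)))
    # A seeded generator keeps the shuffle reproducible.
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

Rows and labels are indexed with the same shuffled positions, so each feature row stays paired with its label.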

AutoML

AutoClassificationSearch

Automatic pipeline search class for classification problems

AutoRegressionSearch

Automatic pipeline search class for regression problems

Plotting

AutoClassificationSearch.plot.get_roc_data

Gets data that can be used to create a ROC plot.

AutoClassificationSearch.plot.generate_roc_plot

Generate a cross-validated Receiver Operating Characteristic (ROC) plot for a given pipeline from the data returned by get_roc_data().

AutoClassificationSearch.plot.get_confusion_matrix_data

Gets data that can be used to create a confusion matrix plot.

AutoClassificationSearch.plot.generate_confusion_matrix

Generate confusion matrix plot for a given pipeline using the data returned from get_confusion_matrix_data().

Model Family

ModelFamily

Enum for family of machine learning models.

Components

Transformers

OneHotEncoder

One-hot encoder to encode non-numeric data
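
As a rough sketch of what one-hot encoding does to a single non-numeric column, assuming categories are ordered alphabetically (the helper name one_hot_encode is hypothetical and is not the component's API):

```python
def one_hot_encode(values):
    """Map each category to a 0/1 indicator row.

    Hypothetical helper; categories are sorted so the column order is stable.
    """
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1  # exactly one indicator is set per row
        rows.append(row)
    return categories, rows
```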

RFRegressorSelectFromModel

Selects top features based on importance weights using a Random Forest regressor

RFClassifierSelectFromModel

Selects top features based on importance weights using a Random Forest classifier

SimpleImputer

Imputes missing data according to a specified imputation strategy

StandardScaler

Standardize features: removes mean and scales to unit variance
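
Standardization of one feature column can be sketched as follows. The helper name standardize is hypothetical, and the use of the population standard deviation is an assumption made for this example.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Subtract the mean and divide by the population standard deviation.

    Hypothetical helper; a constant column is mapped to all zeros.
    """
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]
```

After the transform the column has mean 0 and unit variance, which is what "removes mean and scales to unit variance" refers to.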

Estimators

LogisticRegressionClassifier

Logistic Regression Classifier

RandomForestClassifier

Random Forest Classifier

XGBoostClassifier

XGBoost Classifier

LinearRegressor

Linear Regressor

RandomForestRegressor

Random Forest Regressor

Pipelines

Pipelines

PipelineBase

Base class for all pipelines.

RFClassificationPipeline

Random Forest Pipeline for both binary and multiclass classification

XGBoostPipeline

XGBoost Pipeline for both binary and multiclass classification

CatBoostClassificationPipeline

CatBoost Pipeline for both binary and multiclass classification.

LogisticRegressionPipeline

Logistic Regression Pipeline for both binary and multiclass classification

RFRegressionPipeline

Random Forest Pipeline for regression problems

CatBoostRegressionPipeline

CatBoost Pipeline for regression problems.

LinearRegressionPipeline

Linear Regression Pipeline for regression problems

Pipeline Utils

get_pipelines

Returns the pipelines allowed for a particular problem type.

list_model_families

Lists the model families available for a particular problem type

Plotting

PipelineBase.graph([filepath])

Generate an image representing the pipeline graph

PipelineBase.feature_importance_graph([…])

Generate a bar graph of the pipeline’s feature importances

Objective Functions

Domain Specific

FraudCost

Scores the percentage of the total processed transaction amount that is lost to fraud

LeadScoring

Lead scoring

Classification

F1

F1 score for binary classification

F1Micro

F1 score for multiclass classification using micro averaging

F1Macro

F1 score for multiclass classification using macro averaging

F1Weighted

F1 score for multiclass classification using weighted averaging
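
The three averaging modes differ only in how per-class results are combined. A minimal sketch, using a hypothetical helper f1_averages rather than the objective classes listed here:

```python
def f1_averages(y_true, y_pred):
    """Return micro, macro and weighted F1 for a multiclass prediction.

    Hypothetical helper for illustration only.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []  # (f1, support) for each label
    total_tp = total_fp = total_fn = 0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        per_class.append((2 * tp / denom if denom else 0.0, tp + fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    # Macro: unweighted mean of per-class F1.
    macro = sum(f1 for f1, _ in per_class) / len(per_class)
    # Weighted: per-class F1 weighted by the number of true instances.
    weighted = sum(f1 * s for f1, s in per_class) / sum(s for _, s in per_class)
    # Micro: F1 computed from counts pooled over all classes.
    micro_denom = 2 * total_tp + total_fp + total_fn
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0
    return {"micro": micro, "macro": macro, "weighted": weighted}
```

Macro averaging treats every class equally, weighted averaging favours frequent classes, and micro averaging pools all decisions before scoring. The same distinction applies to the Precision, Recall and AUC variants.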

Precision

Precision score for binary classification

PrecisionMicro

Precision score for multiclass classification using micro averaging

PrecisionMacro

Precision score for multiclass classification using macro averaging

PrecisionWeighted

Precision score for multiclass classification using weighted averaging

Recall

Recall score for binary classification

RecallMicro

Recall score for multiclass classification using micro averaging

RecallMacro

Recall score for multiclass classification using macro averaging

RecallWeighted

Recall score for multiclass classification using weighted averaging

AUC

AUC score for binary classification

AUCMicro

AUC score for multiclass classification using micro averaging

AUCMacro

AUC score for multiclass classification using macro averaging

AUCWeighted

AUC score for multiclass classification using weighted averaging

LogLoss

Log Loss for both binary and multiclass classification

MCC

Matthews correlation coefficient for both binary and multiclass classification
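
For the binary case, the Matthews correlation coefficient can be computed directly from the four confusion counts. The sketch below uses a hypothetical function name and returns 0.0 when the denominator is zero, which is a convention chosen for this example.

```python
from math import sqrt


def mcc_binary(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for binary labels.

    Hypothetical helper; `positive` names the label treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # The score ranges from -1 (total disagreement) to +1 (perfect prediction).
    return (tp * tn - fp * fn) / denom if denom else 0.0
```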

ROC

Receiver Operating Characteristic score for binary classification.

ConfusionMatrix

Confusion matrix for classification problems
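
A confusion matrix is a table of counts with one row per true class and one column per predicted class. A minimal sketch, assuming that row/column orientation and a hypothetical helper name:

```python
def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs.

    Hypothetical helper; rows are true labels, columns are predicted labels.
    """
    labels = sorted(set(y_true) | set(y_pred))
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[position[t]][position[p]] += 1
    return labels, matrix
```

Correct predictions land on the diagonal; every off-diagonal cell is a specific kind of mistake.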

Regression

R2

Coefficient of determination for regression

MAE

Mean absolute error for regression

MSE

Mean squared error for regression

MSLE

Mean squared log error for regression

MedianAE

Median absolute error for regression

MaxError

Maximum residual error for regression

ExpVariance

Explained variance score for regression
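
Several of the regression objectives above reduce to short formulas over the residuals. The sketch below covers R2, MAE, MSE, MedianAE and MaxError only; the function name is hypothetical, and a constant target is rejected because R2 is undefined in that case.

```python
from statistics import fmean, median


def regression_metrics(y_true, y_pred):
    """Compute a few common regression scores from paired values.

    Hypothetical helper for illustration only.
    """
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true = fmean(y_true)
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return {
        "R2": 1 - ss_res / ss_tot,
        "MAE": fmean(abs_errors),
        "MSE": ss_res / len(errors),
        "MedianAE": median(abs_errors),
        "MaxError": max(abs_errors),
    }
```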

Problem Types

ProblemTypes

Enum for type of machine learning problem: BINARY, MULTICLASS, or REGRESSION

handle_problem_types

Handles problem_type by either returning the ProblemTypes or converting from a str
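
The "return the enum or convert from a str" behaviour can be sketched with the standard enum module. ProblemKind and to_problem_kind are hypothetical stand-ins written for this example; the string values and the case-insensitive lookup are assumptions, not the library's documented behaviour.

```python
from enum import Enum


class ProblemKind(Enum):
    """Hypothetical stand-in for a problem-type enum."""
    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def to_problem_kind(problem_type):
    """Return the enum member unchanged, or look it up by name from a str."""
    if isinstance(problem_type, ProblemKind):
        return problem_type
    if isinstance(problem_type, str):
        try:
            return ProblemKind[problem_type.upper()]
        except KeyError:
            raise ValueError(f"Unknown problem type: {problem_type!r}") from None
    raise ValueError(f"Unsupported problem type value: {problem_type!r}")
```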

Tuners

Tuner

Defines API for Tuners

SKOptTuner

Bayesian Optimizer

GridSearchTuner

Grid Search Optimizer

RandomSearchTuner

Random Search Optimizer
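
The difference between grid search and random search is how candidate parameter sets are proposed. A minimal sketch over a discrete search space follows; the function names and the example space are hypothetical and do not reflect the Tuner API.

```python
import itertools
import random


def grid_candidates(space):
    """Yield every combination of the listed values, in a fixed order."""
    names = list(space)
    for combo in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


def random_candidates(space, n, seed=0):
    """Draw n parameter sets, sampling each value independently."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n)
    ]
```

Grid search is exhaustive and grows multiplicatively with each parameter, while random search evaluates a fixed budget of samples. A Bayesian optimizer instead uses the scores of earlier candidates to choose later ones, which this sketch does not attempt.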

Guardrails

detect_highly_null

Checks if there are any highly-null columns in a dataframe.
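
A highly-null check amounts to comparing each column's fraction of missing values against a threshold. In this sketch the helper name, the dict-of-lists input, the use of None for missing values and the 0.95 default are all assumptions.

```python
def highly_null_columns(columns, threshold=0.95):
    """Return {column name: null fraction} for columns at or above threshold.

    Hypothetical helper; `columns` maps names to lists, with None as missing.
    """
    flagged = {}
    for name, values in columns.items():
        if not values:
            continue  # skip empty columns rather than divide by zero
        null_fraction = sum(v is None for v in values) / len(values)
        if null_fraction >= threshold:
            flagged[name] = null_fraction
    return flagged
```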

detect_label_leakage

Check if any of the features are highly correlated with the target.
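
One simple way to look for label leakage is to flag features whose Pearson correlation with the target is close to 1 in absolute value. The sketch below assumes numeric columns and a 0.95 threshold; the helper names are hypothetical, and the choice of Pearson correlation is an assumption of this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def leaky_features(features, target, threshold=0.95):
    """Return {feature name: correlation} where |correlation| >= threshold."""
    flagged = {}
    for name, column in features.items():
        r = pearson(column, target)
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged
```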

detect_outliers

Checks for outliers in a dataframe by first using Isolation Forest to obtain an anomaly score for each index and then applying the interquartile range (IQR) to identify anomalous scores.
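
The second, IQR-based step can be sketched on its own, given anomaly scores that have already been computed. The assumptions here are that lower scores mean more anomalous, that only the lower fence is applied, and that the conventional 1.5 multiplier is used; the helper name is hypothetical and the Isolation Forest step is not shown.

```python
from statistics import quantiles


def iqr_low_outliers(scores, k=1.5):
    """Return indices whose score falls below Q1 - k * IQR.

    Hypothetical helper; assumes lower scores indicate anomalies.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    lower_bound = q1 - k * (q3 - q1)
    return [i for i, score in enumerate(scores) if score < lower_bound]
```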

detect_id_columns

Check if any of the features are ID columns.