API Reference¶
Demo Datasets¶
Load credit card fraud dataset.
Load wine dataset.
Load breast cancer dataset.
Load diabetes dataset.
Preprocessing¶
Utilities to preprocess data before using evalml.
Drops rows in X and y where the target y is NaN.
Get the label distributions.
Load features and labels from file.
Get the number of features for each specified dtype.
Splits data into train and test sets.
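The preprocessing steps described above can be illustrated with a small standard-library sketch. This is not evalml's implementation: the function names `drop_nan_target_rows`, `label_distribution`, and `split_data`, their parameters, and the list-based data layout are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def _is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_nan_target_rows(X, y):
    """Keep only the rows whose target value is present (hypothetical helper)."""
    kept = [(row, target) for row, target in zip(X, y) if not _is_missing(target)]
    return [row for row, _ in kept], [target for _, target in kept]


def label_distribution(y):
    """Return each label's share of the targets as a fraction of the total."""
    counts = Counter(y)
    return {label: count / len(y) for label, count in counts.items()}


def split_data(X, y, test_size=0.2, seed=0):
    """Shuffle row indices, then hold out a `test_size` fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_size))
    n_train = len(indices) - n_test
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 1, float("nan"), 1, 0]

X_clean, y_clean = drop_nan_target_rows(X, y)
distribution = label_distribution(y_clean)
X_train, X_test, y_train, y_test = split_data(X_clean, y_clean, test_size=0.25)
```

Dropping rows with a missing target before splitting keeps the feature rows and labels aligned in both the train and test sets.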
AutoML¶
AutoML Search Classes¶
Automatic pipeline search class for classification problems.
Automatic pipeline search class for regression problems.
Base class for AutoML searches.
AutoML Algorithm Classes¶
Base class for the automl algorithms which power evalml.
An automl algorithm which first fits a base round of pipelines with default parameters, then does a round of parameter tuning on each pipeline in order of performance.
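The two-phase strategy described above (a base round with default parameters, then tuning each pipeline in order of performance) can be sketched as follows. This is a simplified illustration, not evalml's algorithm class: `iterative_search`, `toy_score`, the pipeline names, and the search spaces are hypothetical, and plain random sampling stands in for whatever tuner the library actually uses.

```python
import random


def iterative_search(pipeline_defaults, search_spaces, score, tuning_rounds=3, seed=0):
    """Score default parameters first, then tune pipelines from best to worst."""
    rng = random.Random(seed)

    # Base round: evaluate every pipeline with its default parameters.
    best = {
        name: (score(name, params), params)
        for name, params in pipeline_defaults.items()
    }

    # Tuning rounds: visit pipelines in order of base-round score (higher is better).
    ranked = sorted(best, key=lambda name: best[name][0], reverse=True)
    for name in ranked:
        for _ in range(tuning_rounds):
            candidate = {
                param: rng.choice(values)
                for param, values in search_spaces[name].items()
            }
            candidate_score = score(name, candidate)
            if candidate_score > best[name][0]:
                best[name] = (candidate_score, candidate)
    return ranked, best


def toy_score(name, params):
    # Hypothetical objective: peaks at depth 4 for "forest" and at c = 1.0 for "linear".
    if name == "forest":
        return 1.0 - abs(params["depth"] - 4) / 10
    return 0.7 - abs(params["c"] - 1.0) / 10


defaults = {"forest": {"depth": 6}, "linear": {"c": 1.0}}
spaces = {"forest": {"depth": [2, 4, 6, 8]}, "linear": {"c": [0.1, 1.0, 10.0]}}
order, results = iterative_search(defaults, spaces, toy_score)
```

Because a candidate replaces the stored result only when it scores higher, the tuned score for each pipeline is never worse than its base-round score.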
Pipelines¶
Pipeline Base Classes¶
Base class for all pipelines.
Pipeline subclass for all classification pipelines.
Pipeline subclass for all binary classification pipelines.
Pipeline subclass for all multiclass classification pipelines.
Pipeline subclass for all regression pipelines.
Classification Pipelines¶
CatBoost Pipeline for binary classification.
CatBoost Pipeline for multiclass classification.
Elastic Net Pipeline for binary classification problems.
Elastic Net Pipeline for multiclass classification problems.
Extra Trees Pipeline for binary classification.
Extra Trees Pipeline for multiclass classification.
Logistic Regression Pipeline for binary classification.
Logistic Regression Pipeline for multiclass classification.
Random Forest Pipeline for binary classification.
Random Forest Pipeline for multiclass classification.
XGBoost Pipeline for binary classification.
XGBoost Pipeline for multiclass classification.
Baseline Pipeline for binary classification.
Baseline Pipeline for multiclass classification.
Mode Baseline Pipeline for binary classification.
Mode Baseline Pipeline for multiclass classification.
Regression Pipelines¶
Random Forest Pipeline for regression problems.
CatBoost Pipeline for regression problems.
Elastic Net Pipeline for regression problems.
Extra Trees Pipeline for regression problems.
Linear Regression Pipeline for regression problems.
XGBoost Pipeline for regression problems.
Baseline Pipeline for regression problems.
Baseline Pipeline for regression problems.
Pipeline Utils¶
Returns a complete list of all supported pipeline classes.
Returns the pipelines allowed for a particular problem type.
Lists the model types available for a particular problem type.
Pipeline Graph Utils¶
Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve.
Generate and display a precision-recall plot.
Given labels and binary classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.
Generate and display a Receiver Operating Characteristic (ROC) plot.
Confusion matrix for binary and multiclass classification.
Normalizes a confusion matrix.
Generate and display a confusion matrix plot.
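To make the confusion-matrix entries above concrete, here is a minimal standard-library sketch of computing a confusion matrix and normalizing it. It is an illustration only: `confusion_matrix` and `normalize_rows` are local helpers written for this example, and row-wise normalization (dividing by the number of true instances of each class) is just one common choice, assumed here rather than taken from the library.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum so every non-empty row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


y_true = ["a", "a", "b", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c", "c"]

cm = confusion_matrix(y_true, y_pred, labels=["a", "b", "c"])
cm_normalized = normalize_rows(cm)
```

With row-wise normalization, the diagonal of the result holds the per-class recall.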
Components¶
Component Base Classes¶
A component represents a step in a pipeline.
Base class for all components.
A component that transforms data and may or may not need fitting.
A component that fits and predicts given data.
Transformers¶
Transformers are components that take in data as input and output transformed data.
One-hot encoder to encode non-numeric data.
Imputes missing data according to a specified imputation strategy.
Standardize features: removes mean and scales to unit variance.
Selects top features based on importance weights using a Random Forest regressor.
Selects top features based on importance weights using a Random Forest classifier.
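Two of the transformations listed above, imputation and standardization, are easy to illustrate on a single column. The sketch below uses only the standard library and is not the library's transformer code; the helper names `impute` and `standardize` and the strategy names `"mean"`, `"median"`, and `"most_frequent"` are assumptions made for this example.

```python
import math
import statistics


def impute(column, strategy="mean"):
    """Replace None entries with a statistic of the observed values."""
    strategies = {
        "mean": statistics.mean,
        "median": statistics.median,
        "most_frequent": statistics.mode,
    }
    observed = [value for value in column if value is not None]
    fill_value = strategies[strategy](observed)
    return [fill_value if value is None else value for value in column]


def standardize(column):
    """Subtract the mean and divide by the population standard deviation."""
    mean = sum(column) / len(column)
    variance = sum((value - mean) ** 2 for value in column) / len(column)
    std = math.sqrt(variance)
    if std == 0:
        # A constant column has no spread: center it and skip the scaling.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


imputed = impute([1.0, None, 3.0], strategy="mean")
scaled = standardize([2.0, 4.0, 6.0, 8.0])
```

After standardization the column has mean 0 and unit variance, which puts features measured on different scales on a comparable footing.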
Estimators¶
Classifiers¶
Classifiers are components that output a predicted class label.
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees.
Elastic Net Classifier.
Extra Trees Classifier.
Random Forest Classifier.
Logistic Regression Classifier.
XGBoost Classifier.
Classifier that predicts using the specified strategy.
Regressors¶
Regressors are components that output a predicted target value.
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.
Elastic Net Regressor.
Linear Regressor.
Extra Trees Regressor.
Random Forest Regressor.
XGBoost Regressor.
Regressor that predicts using the specified strategy.
Objective Functions¶
Objective Base Classes¶
Base class for all objectives.
Base class for all binary classification objectives.
Base class for all multiclass classification objectives.
Base class for all regression objectives.
Domain-Specific Objectives¶
Scores the percentage of the total processed transaction amount that is lost to fraud.
Lead scoring.
Classification Objectives¶
Accuracy score for binary classification.
Accuracy score for multiclass classification.
AUC score for binary classification.
AUC score for multiclass classification using macro averaging.
AUC score for multiclass classification using micro averaging.
AUC score for multiclass classification using weighted averaging.
Balanced accuracy score for binary classification.
Balanced accuracy score for multiclass classification.
F1 score for binary classification.
F1 score for multiclass classification using micro averaging.
F1 score for multiclass classification using macro averaging.
F1 score for multiclass classification using weighted averaging.
Log Loss for binary classification.
Log Loss for multiclass classification.
Matthews correlation coefficient for binary classification.
Matthews correlation coefficient for multiclass classification.
Precision score for binary classification.
Precision score for multiclass classification using micro averaging.
Precision score for multiclass classification using macro averaging.
Precision score for multiclass classification using weighted averaging.
Recall score for binary classification.
Recall score for multiclass classification using micro averaging.
Recall score for multiclass classification using macro averaging.
Recall score for multiclass classification using weighted averaging.
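Several of the multiclass objectives above differ only in how per-class results are averaged. The sketch below computes F1 under micro, macro, and weighted averaging using the standard definitions; it is an illustration written for this page (the helper names are hypothetical) and makes no claim about how the library computes these scores internally.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Tally true positives, false positives, and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == predicted:
            tp[actual] += 1
        else:
            fp[predicted] += 1
            fn[actual] += 1
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), defined as 0 when the denominator is 0."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def f1_scores(y_true, y_pred, labels):
    tp, fp, fn = per_class_counts(y_true, y_pred)
    per_class = {label: f1_from_counts(tp[label], fp[label], fn[label]) for label in labels}
    support = Counter(y_true)
    return {
        # Micro: pool the counts over all classes, then compute one score.
        "micro": f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # Macro: unweighted mean of the per-class scores.
        "macro": sum(per_class.values()) / len(labels),
        # Weighted: mean of per-class scores weighted by class frequency in y_true.
        "weighted": sum(per_class[label] * support[label] for label in labels) / len(y_true),
    }


y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "a", "b", "a"]
scores = f1_scores(y_true, y_pred, labels=["a", "b"])
```

On imbalanced data the three averages diverge: macro averaging treats every class equally, while micro and weighted averaging are dominated by the frequent classes.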
Regression Objectives¶
Coefficient of determination for regression.
Mean absolute error for regression.
Mean squared error for regression.
Mean squared log error for regression.
Median absolute error for regression.
Maximum residual error for regression.
Explained variance score for regression.
Root mean squared error for regression.
Root mean squared log error for regression.
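The regression objectives listed above follow standard textbook formulas. The sketch below writes out a few of them in plain Python for reference; the function names are local to this example and are not the library's objective classes.

```python
import math


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    return math.sqrt(mean_squared_error(y_true, y_pred))


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_log_error(y_true, y_pred):
    # Compares log(1 + value); only meaningful for non-negative targets and predictions.
    return sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def max_error(y_true, y_pred):
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(y_true) / len(y_true)
    ss_total = sum((t - mean) ** 2 for t in y_true)
    ss_residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_residual / ss_total


y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```

The root variants (RMSE, RMSLE) are the square roots of the corresponding mean squared quantities, which expresses the error in the same units as the target.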
Problem Types¶
Enum for type of machine learning problem: BINARY, MULTICLASS, or REGRESSION.
Handles problem_type by either returning the ProblemTypes value unchanged or converting it from a str.
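The enum-plus-converter pattern described above can be sketched as follows. This is a stand-in written for illustration: the local `ProblemTypes` class mirrors only the three member names given in the text, its string values are assumptions, and `to_problem_type` is a hypothetical name for the conversion helper.

```python
from enum import Enum


class ProblemTypes(Enum):
    """Local stand-in enum; the member values here are assumptions."""

    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def to_problem_type(problem_type):
    """Return a ProblemTypes member, converting from a str when needed."""
    if isinstance(problem_type, ProblemTypes):
        return problem_type
    if isinstance(problem_type, str):
        try:
            # Look the member up by name, ignoring case.
            return ProblemTypes[problem_type.upper()]
        except KeyError:
            raise ValueError(f"Unknown problem type: {problem_type!r}") from None
    raise TypeError(f"Expected ProblemTypes or str, got {type(problem_type).__name__}")


from_string = to_problem_type("binary")
from_enum = to_problem_type(ProblemTypes.REGRESSION)
```

Accepting either form at the API boundary lets callers pass a plain string while the rest of the code works with enum members only.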
Model Family¶
Enum for family of machine learning models.
Tuners¶
Defines API for Tuners.
Bayesian Optimizer.
Grid Search Optimizer.
Random Search Optimizer.
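The difference between grid search and random search, two of the tuner strategies named above, can be shown with a short sketch. It does not use the library's tuner API: `grid_candidates`, `random_candidates`, `best_parameters`, and the toy search space and scoring function are all hypothetical. Bayesian optimization is omitted because it needs a surrogate model that would not fit in a short example.

```python
import itertools
import random


def grid_candidates(space):
    """Yield every combination of the listed parameter values."""
    names = list(space)
    for combination in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, combination))


def random_candidates(space, n_samples, seed=0):
    """Yield `n_samples` parameter sets drawn independently at random."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        yield {name: rng.choice(values) for name, values in space.items()}


def best_parameters(candidates, score):
    """Return the candidate with the highest score."""
    return max(candidates, key=score)


def toy_score(params):
    # Hypothetical objective that peaks at depth = 4, rate = 0.3.
    return -abs(params["depth"] - 4) - abs(params["rate"] - 0.3)


space = {"depth": [2, 4, 8], "rate": [0.1, 0.3]}

grid = list(grid_candidates(space))
grid_best = best_parameters(grid, toy_score)
sampled = list(random_candidates(space, n_samples=4))
```

Grid search cost grows multiplicatively with each added parameter, whereas random search evaluates a fixed number of candidates regardless of how many parameters there are.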
Data Checks¶
Data Check Classes¶
Base class for all data checks.
Checks if there are any highly-null columns in the input.
Check if any of the features are likely to be ID columns.
Check if any of the features are highly correlated with the target.
Checks if there are any outliers in the input data by using an Isolation Forest to obtain an anomaly score for each row (index), then using the interquartile range (IQR) to flag anomalous scores.
A collection of data checks.
A collection of basic data checks that is used by AutoML by default.
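Two of the checks described above can be approximated in a few lines. The sketch below is an illustration under stated assumptions, not the library's data check classes: `highly_null_columns` and `iqr_outliers` are hypothetical helpers, the thresholds are arbitrary defaults, and the IQR step is applied to precomputed anomaly scores rather than to the output of a real Isolation Forest.

```python
import statistics


def highly_null_columns(table, threshold=0.95):
    """Return a warning for each column whose missing fraction meets the threshold."""
    warnings = []
    for name, values in table.items():
        null_fraction = sum(value is None for value in values) / len(values)
        if null_fraction >= threshold:
            warnings.append(f"Column '{name}' is {null_fraction:.0%} null")
    return warnings


def iqr_outliers(scores, k=1.5):
    """Return indices of scores outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, score in enumerate(scores) if score < lower or score > upper]


table = {"a": [None, None, None, 1], "b": [1, 2, 3, 4]}
null_warnings = highly_null_columns(table, threshold=0.5)

# Hypothetical anomaly scores, one per row; the last row stands out.
anomaly_scores = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48, 0.95]
outlier_rows = iqr_outliers(anomaly_scores)
```

Returning messages instead of raising immediately lets a collection of checks run to completion and report every problem at once.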
Data Check Messages¶
Base class for all DataCheckMessages.
DataCheckMessage subclass for errors returned by data checks.
DataCheckMessage subclass for warnings returned by data checks.
Data Check Message Types¶
Enum for type of data check message: WARNING or ERROR.
Utils¶
Attempts to import the requested library by name.
Generates a numpy.random.RandomState instance using seed.
Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator.
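The utilities above follow common patterns that can be sketched with the standard library alone. The helper names `import_by_name`, `make_rng`, and `derive_seed` are hypothetical, and `random.Random` is used here as a stand-in for `numpy.random.RandomState` so that the example has no third-party dependencies.

```python
import importlib
import random


def import_by_name(library, hint=""):
    """Import a module by name, raising a more helpful ImportError on failure."""
    try:
        return importlib.import_module(library)
    except ImportError as error:
        message = f"Could not import '{library}'. {hint}".strip()
        raise ImportError(message) from error


def make_rng(seed):
    """Create a seeded random number generator (stand-in for a RandomState)."""
    return random.Random(seed)


def derive_seed(rng, low=0, high=2**32 - 1):
    """Draw an int from `rng` to use as the seed of another generator."""
    return rng.randint(low, high)


json_module = import_by_name("json")
child_seed = derive_seed(make_rng(0))
```

Deriving child seeds from one parent generator keeps a whole run reproducible from a single top-level seed.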