API Reference
Demo Datasets
Load credit card fraud dataset. |
|
Load wine dataset. |
|
Load breast cancer dataset. |
|
Load diabetes dataset. |
Preprocessing
Utilities to preprocess data before using evalml.
Drops the rows of X and y for which the target y is NaN. |
|
Get the label distributions |
|
Load features and labels from file. |
|
Get the number of features for specific dtypes |
|
Splits data into train and test sets. |
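The NaN-target cleanup described above can be sketched in plain Python. This is only an illustration of the idea, not evalml's implementation: the function name `drop_nan_target_rows` and the list-based data layout are assumptions made for the example.

```python
import math


def drop_nan_target_rows(X, y):
    """Keep only the rows whose target value is not NaN.

    Hypothetical helper: X is a list of feature rows, y a list of targets.
    """
    kept = [
        (row, target)
        for row, target in zip(X, y)
        if not (isinstance(target, float) and math.isnan(target))
    ]
    X_clean = [row for row, _ in kept]
    y_clean = [target for _, target in kept]
    return X_clean, y_clean


X = [[1, 2], [3, 4], [5, 6]]
y = [0.0, float("nan"), 1.0]
X_clean, y_clean = drop_nan_target_rows(X, y)
```

Rows are dropped from X and y together so that features and targets stay aligned.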
AutoML
Automatic pipeline search class for classification problems |
|
Automatic pipeline search class for regression problems |
Pipelines
Pipeline Base Classes
Base class for all pipelines. |
|
Pipeline subclass for all classification pipelines. |
|
Pipeline subclass for all binary classification pipelines. |
|
Pipeline subclass for all multiclass classification pipelines. |
|
Pipeline subclass for all regression pipelines. |
Classification Pipelines
CatBoost Pipeline for binary classification. |
|
CatBoost Pipeline for multiclass classification. |
|
Logistic Regression Pipeline for binary classification |
|
Logistic Regression Pipeline for multiclass classification |
|
Random Forest Pipeline for binary classification |
|
Random Forest Pipeline for multiclass classification |
|
XGBoost Pipeline for binary classification |
|
XGBoost Pipeline for multiclass classification |
Regression Pipelines
Random Forest Pipeline for regression problems |
|
CatBoost Pipeline for regression problems. |
|
Linear Regression Pipeline for regression problems |
|
XGBoost Pipeline for regression problems |
Pipeline Utils
Returns a complete list of all supported pipeline classes. |
|
Returns the pipelines allowed for a particular problem type. |
|
Lists the model types for a particular problem type |
Pipeline Plot Utils
Receiver Operating Characteristic score for binary classification. |
|
Confusion matrix for binary and multiclass classification. |
|
Normalizes a confusion matrix. |
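As a rough illustration of confusion-matrix normalization, the sketch below divides each row by its total, so every row gives the fraction of a true class assigned to each predicted class. Row-wise normalization and the name `normalize_rows` are assumptions for this example; the library may offer other normalization modes.

```python
def normalize_rows(matrix):
    """Divide each row of a confusion matrix by that row's sum.

    Hypothetical helper; rows are assumed to correspond to true labels.
    Rows that sum to zero are returned as all zeros.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


conf_mat = [[8, 2], [1, 9]]
normalized = normalize_rows(conf_mat)
```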
Components
Transformers
Encoders
Encoders convert categorical or non-numerical features into numerical features.
One-hot encoder to encode non-numeric data |
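To make the idea of one-hot encoding concrete, here is a minimal plain-Python sketch. The function `one_hot_encode` is hypothetical and is not the component's API; it only shows how each category becomes its own 0/1 column.

```python
def one_hot_encode(values):
    """Return the sorted categories and one 0/1 indicator row per input value."""
    categories = sorted(set(values))
    encoded = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, encoded


categories, encoded = one_hot_encode(["red", "green", "red", "blue"])
```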
Imputers
Imputers fill in missing data.
Imputes missing data according to a specified imputation strategy |
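The following sketch shows what strategy-based imputation can look like for a single numeric column. The function name `impute_column` and the strategy names `"mean"`, `"median"`, and `"most_frequent"` are assumptions chosen for illustration; consult the component's own documentation for the strategies it actually supports.

```python
import math
import statistics


def _is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_column(values, strategy="mean"):
    """Fill missing entries of one column using the chosen strategy (hypothetical helper)."""
    observed = [v for v in values if not _is_missing(v)]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "most_frequent":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return [fill if _is_missing(v) else v for v in values]


column = [1.0, float("nan"), 3.0, None]
imputed = impute_column(column, strategy="mean")
```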
Scalers
Scalers transform and standardize the range of data.
Standardizes features by removing the mean and scaling to unit variance |
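Standardization can be sketched for a single feature as subtracting the mean and dividing by the standard deviation. The helper `standardize` is hypothetical, and the use of the population standard deviation is an assumption made for this example.

```python
import statistics


def standardize(values):
    """Return values shifted to zero mean and scaled to unit variance.

    Uses the population standard deviation (an assumption of this sketch).
    A constant feature is mapped to all zeros to avoid dividing by zero.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


scaled = standardize([2.0, 4.0, 6.0])
```

After scaling, the feature has mean 0 and population variance 1, which puts features measured in different units on a comparable scale.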
Feature Selectors
Feature selectors select a subset of relevant features for the model.
Selects top features based on importance weights using a Random Forest regressor |
|
Selects top features based on importance weights using a Random Forest classifier |
Estimators
Classifiers
Classifiers are models which can be trained to predict a class label from input data.
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. |
|
Random Forest Classifier |
|
Logistic Regression Classifier |
|
XGBoost Classifier |
Regressors
Regressors are models which can be trained to predict a target value from input data.
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. |
|
Linear Regressor |
|
Random Forest Regressor |
|
XGBoost Regressor |
Objective Functions
Domain-Specific Objectives
Scores the percentage of the total transaction amount processed that is lost to fraud |
|
Lead scoring |
Classification Objectives
Accuracy score for binary classification |
|
Accuracy score for multiclass classification |
|
AUC score for binary classification |
|
AUC score for multiclass classification using macro averaging |
|
AUC score for multiclass classification using micro averaging |
|
AUC score for multiclass classification using weighted averaging |
|
Balanced accuracy score for binary classification |
|
Balanced accuracy score for multiclass classification |
|
F1 score for binary classification |
|
F1 score for multiclass classification using micro averaging |
|
F1 score for multiclass classification using macro averaging |
|
F1 score for multiclass classification using weighted averaging |
|
Log Loss for binary classification |
|
Log Loss for multiclass classification |
|
Matthews correlation coefficient for binary classification |
|
Matthews correlation coefficient for multiclass classification |
|
Precision score for binary classification |
|
Precision score for multiclass classification using micro averaging |
|
Precision score for multiclass classification using macro averaging |
|
Precision score for multiclass classification using weighted averaging |
|
Recall score for binary classification |
|
Recall score for multiclass classification using micro averaging |
|
Recall score for multiclass classification using macro averaging |
|
Recall score for multiclass classification using weighted averaging |
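Several of the multiclass objectives above differ only in how per-class scores are averaged. The sketch below computes precision with micro, macro, and weighted averaging in plain Python to show the difference. The function `precision_scores` is a hypothetical helper written for this illustration, not one of the objective classes.

```python
from collections import Counter


def precision_scores(y_true, y_pred):
    """Return micro-, macro-, and weighted-averaged precision (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)        # number of true samples per class
    predicted = Counter(y_pred)      # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    # Per-class precision: correct predictions / all predictions of that class.
    per_class = {
        label: (true_pos[label] / predicted[label] if predicted[label] else 0.0)
        for label in labels
    }
    # Micro: pool all predictions, then compute one global ratio.
    micro = sum(true_pos.values()) / len(y_pred)
    # Macro: unweighted mean of the per-class scores.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: mean of per-class scores weighted by class support.
    weighted = sum(per_class[label] * support[label] for label in labels) / len(y_true)
    return {"micro": micro, "macro": macro, "weighted": weighted}


y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
scores = precision_scores(y_true, y_pred)
```

Macro averaging treats every class equally, while micro and weighted averaging give more influence to classes with more samples.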
Problem Types
Enum for type of machine learning problem: BINARY, MULTICLASS, or REGRESSION |
Handles problem_type by returning it unchanged if it is already a ProblemTypes value, or by converting it from a str |
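The accept-an-enum-or-a-string pattern described above can be sketched as follows. The enum `ProblemKind` and the function `handle_problem_kind` are hypothetical stand-ins defined only for this example; they mirror the three members named above but are not the library's own definitions.

```python
from enum import Enum


class ProblemKind(Enum):
    """Hypothetical stand-in for a problem-type enum."""
    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def handle_problem_kind(problem_type):
    """Return a ProblemKind, converting from a case-insensitive string if needed."""
    if isinstance(problem_type, ProblemKind):
        return problem_type
    if isinstance(problem_type, str):
        try:
            return ProblemKind[problem_type.upper()]
        except KeyError:
            raise ValueError(f"Unknown problem type: {problem_type!r}") from None
    raise ValueError(f"Cannot interpret {problem_type!r} as a problem type")


from_string = handle_problem_kind("binary")
from_enum = handle_problem_kind(ProblemKind.REGRESSION)
```

Normalizing the argument once at the boundary lets the rest of the code compare enum members instead of raw strings.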
Model Family
Enum for family of machine learning models. |
Tuners
Defines API for Tuners |
|
Bayesian Optimizer |
|
Grid Search Optimizer |
|
Random Search Optimizer |
Guardrails
Checks if there are any highly-null columns in a dataframe. |
|
Checks if any of the features are highly correlated with the target. |
|
Checks if there are any outliers in a dataframe by first using Isolation Forest to obtain an anomaly score for each index, then applying the IQR rule to those scores to flag anomalies. |
|
Checks if any of the features are ID columns. |
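The second half of the outlier check described above, flagging unusual anomaly scores with the interquartile range, can be sketched in plain Python. This example assumes the per-row anomaly scores have already been computed (for instance by an Isolation Forest) and skips that step. The function `iqr_outlier_indices`, the 1.5 multiplier, and the sample scores are assumptions for illustration.

```python
import statistics


def iqr_outlier_indices(scores, k=1.5):
    """Return the indices whose score lies outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper; quartiles use the default method of statistics.quantiles.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, score in enumerate(scores) if score < lower or score > upper]


# Made-up anomaly scores; the last row is far from the others.
anomaly_scores = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.11, -0.60]
outliers = iqr_outlier_indices(anomaly_scores)
```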
Utils
Attempts to import the requested library by name. |
|
Generates a numpy.random.RandomState instance using seed. |
|
Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator. |
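The two random-number utilities above can be approximated with the sketch below, which uses only `numpy.random.RandomState` and its `randint` method. The function names `make_random_state` and `derive_seed` and the default bounds are hypothetical and chosen for this example; they are not the library's signatures.

```python
import numpy as np


def make_random_state(seed):
    """Return seed itself if it is already a RandomState, else build one from it."""
    if isinstance(seed, np.random.RandomState):
        return seed
    return np.random.RandomState(seed)


def derive_seed(random_state, min_bound=0, max_bound=2**31 - 1):
    """Draw an int in [min_bound, max_bound) to seed another random number generator."""
    return int(random_state.randint(min_bound, max_bound))


rs = make_random_state(42)
child_seed = derive_seed(rs)
```

Deriving an integer seed this way lets a component that only accepts integer seeds stay reproducible from a single top-level random state.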