API Reference
Demo Datasets
Load credit card fraud dataset.
Load wine dataset.
Load breast cancer dataset.
Load diabetes dataset.
Preprocessing
Load features and labels from file(s).
Splits data into train and test sets.
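For readers new to the idea, here is a plain-Python sketch of a train/test split. It is not the library's splitter. The function name `train_test_split_sketch` and its `test_size` and `seed` parameters are hypothetical. The sketch does a simple unstratified shuffle on lists, whereas the library's function may stratify by label or work on dataframes.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")
Label = TypeVar("Label")


def train_test_split_sketch(
    X: Sequence[Row],
    y: Sequence[Label],
    test_size: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Row], List[Row], List[Label], List[Label]]:
    """Shuffle row indices, then hold out a `test_size` fraction of them for testing."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_test = max(1, round(len(X) * test_size))  # always keep at least one test row
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


# Usage: ten rows whose label is derived from the row, so pairing can be checked.
X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split_sketch(X, y, test_size=0.2, seed=42)
```

Fixing the seed makes the split reproducible. That matters when several pipelines are compared on the same holdout rows.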
AutoML
Automatic pipeline search class for classification problems
Automatic pipeline search class for regression problems
Plotting
Gets data that can be used to create a ROC plot.
Generate a Receiver Operating Characteristic (ROC) plot for a given pipeline using cross-validation, from the data returned by get_roc_data().
Gets data that can be used to create a confusion matrix plot.
Generate a confusion matrix plot for a given pipeline using the data returned by get_confusion_matrix_data().
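To illustrate what "data that can be used to create a ROC plot" usually contains, the sketch below handles a binary problem. It sweeps a decision threshold over the predicted scores and records the false positive rate, true positive rate, and threshold at each step. It then computes the area under the resulting curve with the trapezoidal rule. The function names are hypothetical, and the actual return format of get_roc_data() may differ.

```python
from typing import List, Sequence, Tuple


def roc_points(
    y_true: Sequence[int], y_score: Sequence[float]
) -> Tuple[List[float], List[float], List[float]]:
    """Return (fpr, tpr, thresholds), sweeping the threshold from high to low."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    # Start at (0, 0): with an infinite threshold nothing is predicted positive.
    fpr, tpr, thresholds = [0.0], [0.0], [float("inf")]
    for threshold in sorted(set(y_score), reverse=True):
        # Predict positive whenever the score is at or above the threshold.
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t != 1)
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
        thresholds.append(threshold)
    return fpr, tpr, thresholds


def auc_trapezoid(fpr: Sequence[float], tpr: Sequence[float]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2 for i in range(1, len(fpr))
    )


fpr, tpr, thresholds = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# fpr == [0.0, 0.0, 0.5, 0.5, 1.0], tpr == [0.0, 0.5, 0.5, 1.0, 1.0]
# auc_trapezoid(fpr, tpr) == 0.75
```

Each (fpr, tpr) pair is one point on the plot. A confusion matrix corresponds to a single one of these thresholds rather than the whole sweep.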
Model Family
Enum for family of machine learning models.
Components
Transformers
One-hot encoder to encode non-numeric data
Selects top features based on importance weights using a Random Forest regressor
Selects top features based on importance weights using a Random Forest classifier
Imputes missing data according to a specified imputation strategy
Standardizes features by removing the mean and scaling to unit variance
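The imputer, one-hot encoder, and standard scaler listed above each apply a per-column transformation. The stdlib-only sketch below shows the core arithmetic of three of them: a "mean" imputation strategy (one possible strategy among several), one-hot encoding, and standardization. These are illustrations, not the library's components. The real components operate on whole dataframes and are fit on training data before transforming new data. All function names here are hypothetical.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def impute_mean(column: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in column]


def one_hot(column: Sequence[str]) -> Dict[str, List[int]]:
    """Map each distinct category to a 0/1 indicator column, in sorted category order."""
    categories = sorted(set(column))
    return {cat: [1 if v == cat else 0 for v in column] for cat in categories}


def standardize(column: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation (ddof=0)."""
    mean = sum(column) / len(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0] * len(column)  # constant column: every value equals the mean
    return [(v - mean) / std for v in column]


ages = impute_mean([20.0, None, 40.0])    # [20.0, 30.0, 40.0]
colors = one_hot(["red", "blue", "red"])  # {"blue": [0, 1, 0], "red": [1, 0, 1]}
scaled = standardize(ages)                # mean 0, population std 1
```

In a real pipeline, the fill value, category list, mean, and standard deviation would be learned from the training split and reused unchanged on the test split. This avoids leaking test-set statistics into training.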
Estimators
Logistic Regression Classifier
Random Forest Classifier
XGBoost Classifier
Linear Regressor
Random Forest Regressor
Pipelines
Pipelines
Base class for all pipelines.
Random Forest Pipeline for both binary and multiclass classification
XGBoost Pipeline for both binary and multiclass classification
CatBoost Pipeline for both binary and multiclass classification.
Logistic Regression Pipeline for both binary and multiclass classification
Random Forest Pipeline for regression problems
CatBoost Pipeline for regression problems.
Linear Regression Pipeline for regression problems
Pipeline Utils
Returns the pipelines allowed for a particular problem type.
Lists the model types for a particular problem type
Plotting
Generate an image representing the pipeline graph |
Generate a bar graph of the pipeline’s feature importances |
Objective Functions
Domain Specific
Score the percentage of money lost to fraud out of the total transaction amount processed
Lead scoring
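The fraud objective above is described as the percentage of money lost to fraud out of the total amount processed. The sketch below computes that quantity under a deliberately simple assumption: every fraudulent transaction the model fails to flag loses its full amount, and flagging costs nothing. A real fraud cost objective may also account for things such as fees on declined legitimate transactions. The library's exact definition and parameters are not reproduced here, and the function name is hypothetical.

```python
from typing import Sequence


def fraud_loss_percentage(
    amounts: Sequence[float],
    is_fraud: Sequence[bool],
    flagged: Sequence[bool],
) -> float:
    """Percentage of the total processed amount lost to fraud the model did not flag.

    Simplifying assumption (not the library's definition): every missed fraudulent
    transaction loses its full amount, and flagging a transaction costs nothing.
    """
    total = sum(amounts)
    if total <= 0:
        raise ValueError("total transaction amount must be positive")
    lost = sum(
        amount
        for amount, fraud, caught in zip(amounts, is_fraud, flagged)
        if fraud and not caught
    )
    return 100.0 * lost / total


# Four transactions worth 500 in total; the 200 fraud is missed, the 50 fraud is caught.
score = fraud_loss_percentage(
    amounts=[100.0, 50.0, 200.0, 150.0],
    is_fraud=[False, True, True, False],
    flagged=[False, True, False, False],
)
# score == 40.0
```

Lower is better for this quantity. Unlike a plain error rate, it weights each missed fraud by its transaction amount.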
Classification
F1 score for binary classification
F1 score for multiclass classification using micro averaging
F1 score for multiclass classification using macro averaging
F1 score for multiclass classification using weighted averaging
Precision score for binary classification
Precision score for multiclass classification using micro averaging
Precision score for multiclass classification using macro averaging
Precision score for multiclass classification using weighted averaging
Recall score for binary classification
Recall score for multiclass classification using micro averaging
Recall score for multiclass classification using macro averaging
Recall score for multiclass classification using weighted averaging
AUC score for binary classification
AUC score for multiclass classification using micro averaging
AUC score for multiclass classification using macro averaging
AUC score for multiclass classification using weighted averaging
Log loss for both binary and multiclass classification
Matthews correlation coefficient for both binary and multiclass classification
Receiver Operating Characteristic score for binary classification.
Confusion matrix for classification problems
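The micro, macro, and weighted variants listed above differ only in how per-class results are combined:

- Micro averaging pools true positives, false positives, and false negatives across all classes, then computes a single score.
- Macro averaging takes the unweighted mean of the per-class scores.
- Weighted averaging weights each class's score by its support, meaning its number of true instances.

The sketch below shows this for F1. The precision and recall variants follow the same pattern. The function names are hypothetical and this is not the library's implementation.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def per_class_counts(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[Hashable, Tuple[int, int, int]]:
    """Return {label: (true positives, false positives, false negatives)}."""
    counts = {}
    for label in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), defined as 0.0 when the denominator is zero."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], average: str = "macro") -> float:
    counts = per_class_counts(y_true, y_pred)
    if average == "micro":
        # Pool the counts over all classes, then compute a single F1.
        tp = sum(c[0] for c in counts.values())
        fp = sum(c[1] for c in counts.values())
        fn = sum(c[2] for c in counts.values())
        return f1_from_counts(tp, fp, fn)
    scores = {label: f1_from_counts(*c) for label, c in counts.items()}
    if average == "macro":
        # Unweighted mean: every class counts equally, however rare it is.
        return sum(scores.values()) / len(scores)
    if average == "weighted":
        # Weight each class's F1 by its support (number of true instances).
        support = Counter(y_true)
        return sum(scores[label] * support[label] for label in scores) / len(y_true)
    raise ValueError(f"unknown average: {average!r}")


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
# Per-class F1: class 0 -> 0.8, class 1 -> 0.5, class 2 -> 2/3
# f1(..., "micro") ≈ 0.667, f1(..., "macro") ≈ 0.656, f1(..., "weighted") ≈ 0.678
```

For single-label multiclass data, micro-averaged F1 equals accuracy: here 4 of 6 predictions are correct. Macro averaging is the variant most sensitive to poor performance on rare classes.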
Regression
Coefficient of determination for regression
Mean absolute error for regression
Mean squared error for regression
Mean squared log error for regression
Median absolute error for regression
Maximum residual error for regression
Explained variance score for regression
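The regression objectives listed above are standard metrics. The stdlib-only sketch below computes all seven from the residuals, following the usual definitions (the same ones scikit-learn's `sklearn.metrics` uses). It does not claim to be how the library computes them. The function name and the dictionary keys are hypothetical. Where a metric is undefined (constant targets, or negative values for the log error), the sketch returns `nan` instead of raising.

```python
import math
import statistics
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_residuals = [abs(r) for r in residuals]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    var_true = ss_tot / n                                 # population variance of targets
    # Mean squared log error is only defined here for non-negative values.
    if min(y_true) >= 0 and min(y_pred) >= 0:
        msle = sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / n
    else:
        msle = math.nan
    return {
        "r2": 1 - ss_res / ss_tot if ss_tot > 0 else math.nan,
        "mae": sum(abs_residuals) / n,
        "mse": ss_res / n,
        "msle": msle,
        "median_ae": statistics.median(abs_residuals),
        "max_error": max(abs_residuals),
        # Like R², but uses the residuals' variance, so a constant bias is not penalized.
        "explained_variance": (
            1 - statistics.pvariance(residuals) / var_true if var_true > 0 else math.nan
        ),
    }


metrics = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
# mae 0.5, mse 0.375, median_ae 0.5, max_error 1.0,
# r2 ≈ 0.9486, explained_variance ≈ 0.9572, msle is nan (negative target)
```

R² and explained variance coincide when the residuals have zero mean. They diverge here because the predictions are biased by -0.25 on average.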
Problem Types
Enum for type of machine learning problem: BINARY, MULTICLASS, or REGRESSION
Handles problem_type by either returning it as-is if it is already a ProblemTypes value, or converting it from a str
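The pattern described above is to accept either an enum member or a string and always hand back the enum. The sketch below illustrates it. `ProblemTypesSketch` and `handle_problem_type_sketch` are hypothetical stand-ins, not the library's `ProblemTypes`. The lowercase string values such as "binary" are an assumption.

```python
from enum import Enum
from typing import Union


class ProblemTypesSketch(Enum):
    """Hypothetical stand-in for the ProblemTypes enum described above."""

    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def handle_problem_type_sketch(
    problem_type: Union[str, ProblemTypesSketch]
) -> ProblemTypesSketch:
    """Return enum members unchanged; convert strings case-insensitively."""
    if isinstance(problem_type, ProblemTypesSketch):
        return problem_type
    if isinstance(problem_type, str):
        try:
            # Enum lookup by value, after normalizing whitespace and case.
            return ProblemTypesSketch(problem_type.strip().lower())
        except ValueError:
            raise ValueError(f"unknown problem type: {problem_type!r}") from None
    raise TypeError("problem_type must be a str or a ProblemTypesSketch member")


print(handle_problem_type_sketch(" Binary "))  # ProblemTypesSketch.BINARY
```

Normalizing at the API boundary lets the rest of the code compare enum members with `is` instead of handling string spellings everywhere.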
Tuners
Defines the API for tuners
Bayesian Optimizer
Grid Search Optimizer
Random Search Optimizer
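A tuner API of the kind listed above typically alternates two steps: propose a set of hyperparameters, then record the score they achieved. The sketch below shows that loop with a grid search tuner and a random search tuner over a small discrete space. The interface (`TunerSketch`, `propose`, `add`, `best`) and the toy objective are hypothetical and not the library's signatures. The Bayesian optimizer is omitted because it needs a surrogate model.

```python
import itertools
import random
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator, List, Optional, Sequence, Tuple

Params = Dict[str, Any]
Space = Dict[str, Sequence[Any]]


class TunerSketch(ABC):
    """Hypothetical tuner interface: propose parameters, then record their score."""

    def __init__(self, space: Space) -> None:
        self.space = space
        self.history: List[Tuple[Params, float]] = []

    @abstractmethod
    def propose(self) -> Optional[Params]:
        """Return the next parameters to try, or None if the space is exhausted."""

    def add(self, params: Params, score: float) -> None:
        self.history.append((params, score))

    def best(self) -> Tuple[Params, float]:
        # Assumes lower is better (for example, a loss).
        return min(self.history, key=lambda item: item[1])


class GridTunerSketch(TunerSketch):
    """Enumerates every combination of the candidate values, in order."""

    def __init__(self, space: Space) -> None:
        super().__init__(space)
        names = list(space)
        combos = itertools.product(*(space[name] for name in names))
        self._grid: Iterator[Params] = (dict(zip(names, combo)) for combo in combos)

    def propose(self) -> Optional[Params]:
        return next(self._grid, None)


class RandomTunerSketch(TunerSketch):
    """Samples each parameter independently and uniformly; never exhausts."""

    def __init__(self, space: Space, seed: int = 0) -> None:
        super().__init__(space)
        self._rng = random.Random(seed)

    def propose(self) -> Optional[Params]:
        return {name: self._rng.choice(list(values)) for name, values in self.space.items()}


def run_search(
    tuner: TunerSketch, objective: Callable[[Params], float], budget: int
) -> Tuple[Params, float]:
    """Propose, evaluate, and record until the budget or the search space runs out."""
    for _ in range(budget):
        params = tuner.propose()
        if params is None:
            break
        tuner.add(params, objective(params))
    return tuner.best()


space = {"max_depth": [2, 4, 6], "learning_rate": [0.1, 0.3]}


def toy_loss(p: Params) -> float:
    return (p["max_depth"] - 4) ** 2 + (p["learning_rate"] - 0.1) ** 2


grid = GridTunerSketch(space)
best_params, best_score = run_search(grid, toy_loss, budget=10)
# Grid exhausts after 6 proposals; best is {"max_depth": 4, "learning_rate": 0.1} with 0.0
```

Grid search is exhaustive but grows multiplicatively with each added parameter. Random search spends a fixed budget regardless of the size of the space, which is why it can stop only when the budget runs out.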
Guardrails
Checks if there are any highly-null columns in a dataframe.
Checks if any of the features are highly correlated with the target.
Checks for outliers in a dataframe by first using an Isolation Forest to obtain an anomaly score for each index, then applying the interquartile range (IQR) rule to those scores to flag anomalies.
Checks if any of the features are ID columns.
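The guardrails above are simple data checks. The stdlib-only sketch below shows one plausible version of each, operating on dict-of-lists tables instead of dataframes. Several parts are assumptions, not the library's behavior:

- All function names, the 0.95 thresholds, and the ID-column heuristic are hypothetical.
- Pearson correlation is used as the measure of "highly correlated".
- `iqr_outlier_indices` applies only the IQR step, to any list of scores. The Isolation Forest scoring step is omitted; scikit-learn's `sklearn.ensemble.IsolationForest` is one way to produce such scores.

```python
import math
import statistics
from typing import Dict, List, Sequence


def highly_null_columns(data: Dict[str, Sequence[object]], threshold: float = 0.95) -> List[str]:
    """Columns whose fraction of missing (None) values is at least `threshold`."""
    return [
        name
        for name, values in data.items()
        if values and sum(v is None for v in values) / len(values) >= threshold
    ]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def target_leakage_columns(
    features: Dict[str, Sequence[float]], target: Sequence[float], threshold: float = 0.95
) -> List[str]:
    """Features whose absolute correlation with the target is at least `threshold`."""
    return [name for name, values in features.items() if abs(pearson(values, target)) >= threshold]


def iqr_outlier_indices(scores: Sequence[float], k: float = 1.5) -> List[int]:
    """Indices whose score lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, s in enumerate(scores) if s < low or s > high]


def id_like_columns(data: Dict[str, Sequence[object]]) -> List[str]:
    """Heuristic: name is 'id' or ends in '_id', or every value is a distinct integer."""
    flagged = []
    for name, values in data.items():
        lowered = name.lower()
        by_name = lowered == "id" or lowered.endswith("_id")
        by_values = (
            len(values) > 1
            and all(isinstance(v, int) for v in values)
            and len(set(values)) == len(values)
        )
        if by_name or by_values:
            flagged.append(name)
    return flagged


print(iqr_outlier_indices([10, 12, 11, 13, 12, 11, 100]))  # [6]
```

Each check only reports column names or row indices. Deciding whether to drop, impute, or keep the flagged data remains a separate step.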