API Reference
Demo Datasets
Loads the credit card fraud dataset.
Loads the wine dataset.
Loads the breast cancer dataset.
Loads the diabetes dataset.
Preprocessing
Loads features and labels from file(s).
Splits data into train and test sets.
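As a rough illustration of what a train/test split does, the sketch below shuffles row indices and holds out a fraction of the rows for testing. The function name `split_rows` and the `test_size` and `seed` parameters are assumptions made for this example; they are not taken from this library's API.

```python
import random


def split_rows(X, y, test_size=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of the rows for testing.

    Hypothetical helper for illustration only.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    indices = list(range(len(X)))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_rows(X, y, test_size=0.2)
```

Every row ends up in exactly one of the two sets, and features stay aligned with their labels because both are indexed by the same shuffled positions.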
Models
Automatic pipeline search for classification problems.
Automatic pipeline search for regression problems.
Model Types
Lists the model types available for a particular problem type.
Pipelines
Returns potential pipelines by model type.
Saves a pipeline to a file path.
Loads a pipeline from a file path.
Random Forest pipeline for both binary and multiclass classification.
XGBoost pipeline for both binary and multiclass classification.
Logistic Regression pipeline for both binary and multiclass classification.
Random Forest pipeline for regression.
Objective Functions
Domain Specific
Scores the percentage of the total transaction amount processed that is lost to fraud.
Lead scoring.
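To make the fraud objective more concrete, here is a minimal sketch of one way such a score could be computed: the amounts of fraudulent transactions that the model failed to flag, as a percentage of all money processed. The function name `fraud_loss_percentage` and the rule that only missed fraud counts as lost money are assumptions for illustration; the library's objective may account for other costs as well.

```python
def fraud_loss_percentage(y_true, y_pred, amounts):
    """Percentage of the total processed amount lost to undetected fraud.

    Hypothetical helper: treats a transaction as lost money only when it
    is fraudulent (label 1) and was predicted as legitimate (label 0).
    """
    total = sum(amounts)
    if total == 0:
        return 0.0
    lost = sum(
        amount
        for truth, pred, amount in zip(y_true, y_pred, amounts)
        if truth == 1 and pred == 0
    )
    return 100.0 * lost / total


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
amounts = [50.0, 100.0, 25.0, 25.0]
loss_pct = fraud_loss_percentage(y_true, y_pred, amounts)
```

Unlike accuracy, a score of this kind weights each mistake by the money at stake, so missing one large fraudulent transaction costs more than missing several small ones.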
Classification
F1 score for binary classification.
F1 score for multiclass classification using micro averaging.
F1 score for multiclass classification using macro averaging.
F1 score for multiclass classification using weighted averaging.
Precision score for binary classification.
Precision score for multiclass classification using micro averaging.
Precision score for multiclass classification using macro averaging.
Precision score for multiclass classification using weighted averaging.
Recall score for binary classification.
Recall score for multiclass classification using micro averaging.
Recall score for multiclass classification using macro averaging.
Recall score for multiclass classification using weighted averaging.
AUC score for binary classification.
AUC score for multiclass classification using micro averaging.
AUC score for multiclass classification using macro averaging.
AUC score for multiclass classification using weighted averaging.
Log loss for both binary and multiclass classification.
Matthews correlation coefficient for both binary and multiclass classification.
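The micro, macro, and weighted variants listed above differ only in how per-class results are combined. The sketch below computes all three for F1 using the standard definitions; the helper names are hypothetical and this is not the library's implementation.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in either list."""
    counts = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_multiclass(y_true, y_pred, average):
    counts = per_class_counts(y_true, y_pred)
    if average == "micro":
        # Pool the counts over all classes, then compute a single F1.
        tp = sum(c[0] for c in counts.values())
        fp = sum(c[1] for c in counts.values())
        fn = sum(c[2] for c in counts.values())
        return f1_from_counts(tp, fp, fn)
    per_class = {label: f1_from_counts(*c) for label, c in counts.items()}
    if average == "macro":
        # Unweighted mean: every class counts equally.
        return sum(per_class.values()) / len(per_class)
    if average == "weighted":
        # Mean weighted by the number of true instances of each class.
        support = Counter(y_true)
        return sum(per_class[k] * support[k] for k in per_class) / len(y_true)
    raise ValueError(f"unknown average: {average!r}")


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
micro = f1_multiclass(y_true, y_pred, "micro")
macro = f1_multiclass(y_true, y_pred, "macro")
weighted = f1_multiclass(y_true, y_pred, "weighted")
```

Macro averaging gives rare classes as much influence as common ones, while micro and weighted averaging follow the class frequencies more closely. The same three schemes apply to precision and recall.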
Regression
Coefficient of determination for regression.
Mean absolute error for regression.
Mean squared error for regression.
Mean squared log error for regression.
Median absolute error for regression.
Maximum residual error for regression.
Explained variance score for regression.
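For reference, the regression objectives above can be written out from their textbook definitions. The following sketch uses only the Python standard library; the function name `regression_metrics` and the dictionary keys are chosen for this example and are not the library's API.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Compute common regression metrics from their standard definitions.

    Assumes y_true is not constant (otherwise R^2 and explained variance
    are undefined) and that all values are non-negative for the log error.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_res = [abs(r) for r in residuals]
    mean_true = statistics.fmean(y_true)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mean_res = statistics.fmean(residuals)
    var_res = statistics.fmean((r - mean_res) ** 2 for r in residuals)
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": statistics.fmean(abs_res),
        "mse": ss_res / n,
        "msle": statistics.fmean(
            (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
        ),
        "median_ae": statistics.median(abs_res),
        "max_error": max(abs_res),
        "explained_variance": 1 - var_res / (ss_tot / n),
    }


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```

R² and explained variance coincide here because the residuals average to zero; they diverge when the predictions are systematically biased.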
Problem Types
Enum for the type of machine learning problem: BINARY, MULTICLASS, or REGRESSION.
Handles problem_type by either returning the ProblemTypes value as is or converting it from a str.
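The enum-plus-handler pattern described above might look roughly like the following. The member names come from the description; the member values, the function name `handle_problem_type`, and the error handling are assumptions for illustration.

```python
from enum import Enum


class ProblemTypes(Enum):
    """Type of machine learning problem (member values are assumed)."""

    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def handle_problem_type(problem_type):
    """Return a ProblemTypes member, converting from a str if needed.

    Hypothetical helper: accepts either an enum member or a
    case-insensitive member name such as "binary".
    """
    if isinstance(problem_type, ProblemTypes):
        return problem_type
    if isinstance(problem_type, str):
        try:
            return ProblemTypes[problem_type.strip().upper()]
        except KeyError:
            raise ValueError(f"unknown problem type: {problem_type!r}") from None
    raise TypeError("problem_type must be a ProblemTypes member or a str")


from_string = handle_problem_type("multiclass")
from_enum = handle_problem_type(ProblemTypes.REGRESSION)
```

Normalizing the argument once at the API boundary lets the rest of the code compare enum members directly instead of matching strings.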
Tuners
Bayesian Optimizer.
Guardrails
Checks if there are any highly-null columns in a dataframe.
Checks if any of the features are highly correlated with the target.
Checks for outliers in a dataframe by first using Isolation Forest to obtain an anomaly score for each index, then applying the interquartile range (IQR) to flag anomalous scores.
Checks if any of the features are ID columns.
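Two of the checks above are simple enough to sketch with the standard library. The first flags columns whose fraction of missing values reaches a threshold; the second applies the IQR rule to a list of anomaly scores. The function names, the thresholds, the dict-of-lists data layout, and the two-sided IQR fence are all assumptions for illustration, and the anomaly scores are taken as given rather than computed with Isolation Forest.

```python
import statistics


def highly_null_columns(columns, threshold=0.5):
    """Return {column name: null fraction} for columns at or above threshold.

    `columns` maps a column name to a list of values; None marks a
    missing value. Hypothetical helper for illustration only.
    """
    flagged = {}
    for name, values in columns.items():
        if not values:
            continue
        fraction = sum(v is None for v in values) / len(values)
        if fraction >= threshold:
            flagged[name] = fraction
    return flagged


def iqr_outlier_indices(scores, k=1.5):
    """Return indices whose score falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, s in enumerate(scores) if s < lower or s > upper]


columns = {
    "age": [31, None, 45, 52],
    "notes": [None, None, None, "ok"],
}
null_report = highly_null_columns(columns, threshold=0.5)

# Assumed anomaly scores, one per row; the last row is far from the rest.
scores = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, -0.40]
outliers = iqr_outlier_indices(scores)
```

Applying IQR to the scores rather than to the raw features means a single one-dimensional rule can flag rows that are unusual across many columns at once.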