API Reference

Demo Datasets

`load_fraud`	Load credit card fraud dataset.
`load_wine`	Load wine dataset.
`load_breast_cancer`	Load breast cancer dataset.
`load_diabetes`	Load diabetes dataset.

`load_data`	Load features and labels from file(s).
`split_data`	Splits data into train and test sets.

`AutoClassificationSearch`	Automatic pipeline search class for classification problems
`AutoRegressionSearch`	Automatic pipeline search class for regression problems

`AutoClassificationSearch.plot.get_roc_data`	Gets data that can be used to create a ROC plot.
`AutoClassificationSearch.plot.generate_roc_plot`	Generate a cross-validated Receiver Operating Characteristic (ROC) plot for a given pipeline using the data returned from get_roc_data().
`AutoClassificationSearch.plot.get_confusion_matrix_data`	Gets data that can be used to create a confusion matrix plot.
`AutoClassificationSearch.plot.generate_confusion_matrix`	Generate confusion matrix plot for a given pipeline using the data returned from get_confusion_matrix_data().

Enum for family of machine learning models.

`OneHotEncoder`	One-hot encoder to encode non-numeric data
`RFRegressorSelectFromModel`	Selects top features based on importance weights using a Random Forest regressor
`RFClassifierSelectFromModel`	Selects top features based on importance weights using a Random Forest classifier
`SimpleImputer`	Imputes missing data according to a specified imputation strategy
`StandardScaler`	Standardize features: removes mean and scales to unit variance
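
As a rough illustration of what imputation and standardization do to a single numeric column, here is a minimal sketch in plain Python. It is not how `SimpleImputer` or `StandardScaler` are implemented: the helper names `impute_mean` and `standardize` are hypothetical, and the sketch assumes a mean imputation strategy and the population standard deviation.

```python
from statistics import mean, pstdev

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def standardize(column):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column has no spread, so it is only centred.
        return [v - mu for v in column]
    return [(v - mu) / sigma for v in column]

raw = [1.0, None, 3.0]          # made-up column with one missing value
filled = impute_mean(raw)       # the gap is filled with the observed mean
scaled = standardize(filled)    # zero mean, unit variance
```

Imputing before scaling matters here: the mean and standard deviation used for scaling are computed on the filled column.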

`LogisticRegressionClassifier`	Logistic Regression Classifier
`RandomForestClassifier`	Random Forest Classifier
`XGBoostClassifier`	XGBoost Classifier
`LinearRegressor`	Linear Regressor
`RandomForestRegressor`	Random Forest Regressor

Base class for all pipelines.

`RFClassificationPipeline`	Random Forest Pipeline for both binary and multiclass classification
`XGBoostPipeline`	XGBoost Pipeline for both binary and multiclass classification
`CatBoostClassificationPipeline`	CatBoost Pipeline for both binary and multiclass classification.
`LogisticRegressionPipeline`	Logistic Regression Pipeline for both binary and multiclass classification
`RFRegressionPipeline`	Random Forest Pipeline for regression problems
`CatBoostRegressionPipeline`	CatBoost Pipeline for regression problems.
`LinearRegressionPipeline`	Linear Regression Pipeline for regression problems

`get_pipelines`	Returns the pipelines allowed for a particular problem type.
`list_model_families`	Lists the model families available for a particular problem type

`PipelineBase.graph`([filepath])	Generate an image representing the pipeline graph
`PipelineBase.feature_importance_graph`([…])	Generate a bar graph of the pipeline’s feature importances

`FraudCost`	Score the percentage of the total transaction amount processed that is lost to fraud
`LeadScoring`	Lead scoring

`F1`	F1 score for binary classification
`F1Micro`	F1 score for multiclass classification using micro averaging
`F1Macro`	F1 score for multiclass classification using macro averaging
`F1Weighted`	F1 score for multiclass classification using weighted averaging
`Precision`	Precision score for binary classification
`PrecisionMicro`	Precision score for multiclass classification using micro averaging
`PrecisionMacro`	Precision score for multiclass classification using macro averaging
`PrecisionWeighted`	Precision score for multiclass classification using weighted averaging
`Recall`	Recall score for binary classification
`RecallMicro`	Recall score for multiclass classification using micro averaging
`RecallMacro`	Recall score for multiclass classification using macro averaging
`RecallWeighted`	Recall score for multiclass classification using weighted averaging
`AUC`	AUC score for binary classification
`AUCMicro`	AUC score for multiclass classification using micro averaging
`AUCMacro`	AUC score for multiclass classification using macro averaging
`AUCWeighted`	AUC score for multiclass classification using weighted averaging
`LogLoss`	Log Loss for both binary and multiclass classification
`MCC`	Matthews correlation coefficient for both binary and multiclass classification
`ROC`	Receiver Operating Characteristic score for binary classification.
`ConfusionMatrix`	Confusion matrix for classification problems
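
The micro, macro, and weighted variants listed above differ only in how per-class results are combined. The following sketch shows the three averaging schemes for F1 in plain Python. The function name `f1_averages` and the sample labels are hypothetical, and this is not the library's implementation.

```python
from collections import Counter

def f1_averages(y_true, y_pred):
    """Return micro-, macro-, and weighted-averaged F1 for multiclass labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)   # number of true samples per class
    return {
        # micro: pool the counts over all classes, then compute one F1
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # macro: unweighted mean of the per-class F1 scores
        "macro": sum(per_class.values()) / len(labels),
        # weighted: mean of per-class F1 weighted by class support
        "weighted": sum(per_class[c] * support[c] for c in labels) / len(y_true),
    }

scores = f1_averages([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 2, 2])
```

The Precision and Recall variants follow the same pattern with a different per-class formula. Macro averaging treats every class equally, while micro and weighted averaging give more influence to frequent classes.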

`R2`	Coefficient of determination for regression
`MAE`	Mean absolute error for regression
`MSE`	Mean squared error for regression
`MSLE`	Mean squared log error for regression
`MedianAE`	Median absolute error for regression
`MaxError`	Maximum residual error for regression
`ExpVariance`	Explained variance score for regression
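
For reference, several of the regression objectives above reduce to a few lines of arithmetic on the residuals. The sketch below uses plain Python. The function name `regression_metrics` and the sample values are hypothetical, and it is meant only to show the definitions, not the library's code.

```python
from statistics import mean, median

def regression_metrics(y_true, y_pred):
    """Compute a handful of standard regression error measures."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)   # total sum of squares
    return {
        "MAE": mean(abs_errors),
        "MSE": ss_res / len(errors),
        "MedianAE": median(abs_errors),
        "MaxError": max(abs_errors),
        "R2": 1 - ss_res / ss_tot,   # undefined when y_true is constant
    }

metrics = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
```

R2 is the only one of these where higher is better. The others measure error, so lower values indicate a closer fit.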

Enum for type of machine learning problem: BINARY, MULTICLASS, or REGRESSION

Handles problem_type by either returning the ProblemTypes or converting from a str

`detect_highly_null`	Checks if there are any highly-null columns in a dataframe.
`detect_label_leakage`	Check if any of the features are highly correlated with the target.
`detect_outliers`	Checks for outliers in a dataframe by first using an Isolation Forest to obtain the anomaly score of each index, then applying the interquartile range (IQR) to flag anomalous scores.
`detect_id_columns`	Check if any of the features are ID columns.
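
The second stage described for `detect_outliers`, applying IQR to a set of anomaly scores, can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the Isolation Forest scoring step is not reproduced, the scores are made-up values, the helper name `iqr_outlier_indices` is hypothetical, and both the conventional 1.5 multiplier and the choice to flag scores on either side of the fences are assumptions.

```python
from statistics import quantiles

def iqr_outlier_indices(scores, k=1.5):
    """Return indices whose score lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, s in enumerate(scores) if s < lower or s > upper]

# Hypothetical anomaly scores, one per row; the last row is far from the rest.
anomaly_scores = [0.10, 0.12, 0.11, 0.09, 0.13, -0.40]
flagged = iqr_outlier_indices(anomaly_scores)
```

Working on the one-dimensional scores rather than the raw features lets a single threshold rule cover dataframes with any number of columns.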