API Reference

Demo Datasets

`load_fraud`	Load credit card fraud dataset.
`load_wine`	Load wine dataset.
`load_breast_cancer`	Load breast cancer dataset.
`load_diabetes`	Load diabetes dataset.

Preprocessing

Utilities to preprocess data before using evalml.

`drop_nan_target_rows`	Drops rows from X and y where the target y is NaN.
`label_distribution`	Get the distribution of labels in the target.
`load_data`	Load features and labels from file.
`number_of_features`	Get the number of features of specific dtypes.
`split_data`	Splits data into train and test sets.
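As a rough illustration of what `drop_nan_target_rows` and `label_distribution` compute, here is a minimal pure-Python sketch. It assumes X is a list of feature rows and y a list of labels, with missing targets stored as `float("nan")`; the `_sketch` functions are hypothetical stand-ins, not evalml's implementation.

```python
import math
from collections import Counter


def drop_nan_target_rows_sketch(X, y):
    """Hypothetical stand-in: drop every row whose target value is NaN."""
    keep = [
        i for i, label in enumerate(y)
        if not (isinstance(label, float) and math.isnan(label))
    ]
    return [X[i] for i in keep], [y[i] for i in keep]


def label_distribution_sketch(y):
    """Hypothetical stand-in: fraction of samples carrying each label."""
    counts = Counter(y)
    return {label: count / len(y) for label, count in counts.items()}


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
y = [0, float("nan"), 1, 0]

X_clean, y_clean = drop_nan_target_rows_sketch(X, y)
print(X_clean)  # [[1.0, 2.0], [5.0, 6.0], [7.0, 8.0]]
print(y_clean)  # [0, 1, 0]
print(label_distribution_sketch(y_clean))
```

Dropping NaN targets before computing the distribution matters: NaN is not equal to itself, so leaving it in would add a spurious label to the counts.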

AutoML

`AutoClassificationSearch`	Automatic pipeline search for classification problems.
`AutoRegressionSearch`	Automatic pipeline search for regression problems.

Pipelines

Pipeline Base Classes

`PipelineBase`	Base class for all pipelines.
`ClassificationPipeline`	Pipeline subclass for all classification pipelines.
`BinaryClassificationPipeline`	Pipeline subclass for all binary classification pipelines.
`MulticlassClassificationPipeline`	Pipeline subclass for all multiclass classification pipelines.
`RegressionPipeline`	Pipeline subclass for all regression pipelines.

Classification Pipelines

`CatBoostBinaryClassificationPipeline`	CatBoost Pipeline for binary classification.
`CatBoostMulticlassClassificationPipeline`	CatBoost Pipeline for multiclass classification.
`LogisticRegressionBinaryPipeline`	Logistic Regression Pipeline for binary classification
`LogisticRegressionMulticlassPipeline`	Logistic Regression Pipeline for multiclass classification
`RFBinaryClassificationPipeline`	Random Forest Pipeline for binary classification
`RFMulticlassClassificationPipeline`	Random Forest Pipeline for multiclass classification
`XGBoostBinaryPipeline`	XGBoost Pipeline for binary classification
`XGBoostMulticlassPipeline`	XGBoost Pipeline for multiclass classification

Regression Pipelines

`RFRegressionPipeline`	Random Forest Pipeline for regression problems
`CatBoostRegressionPipeline`	CatBoost Pipeline for regression problems.
`LinearRegressionPipeline`	Linear Regression Pipeline for regression problems
`XGBoostRegressionPipeline`	XGBoost Pipeline for regression problems

Pipeline Utils

`all_pipelines`	Returns a complete list of all supported pipeline classes.
`get_pipelines`	Returns the pipelines allowed for a particular problem type.
`list_model_families`	Lists the model families available for a particular problem type.

Pipeline Plot Utils

`roc_curve`	Receiver Operating Characteristic (ROC) curve for binary classification.
`confusion_matrix`	Confusion matrix for binary and multiclass classification.
`normalize_confusion_matrix`	Normalizes a confusion matrix.
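The relationship between a confusion matrix and its normalized form can be sketched in plain Python. This is an illustration under assumptions, not evalml's code: rows index the true label and columns the predicted label, and normalization divides each row by its total so that each row shows per-class rates. Normalizing by true label is one common convention; the library function may support other normalizations.

```python
def confusion_matrix_sketch(y_true, y_pred, labels):
    """Count matrix: row i is the true label, column j the predicted label."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def normalize_rows_sketch(matrix):
    """Divide each row by its sum; an all-zero row stays all zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


matrix = confusion_matrix_sketch([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], labels=[0, 1])
normalized = normalize_rows_sketch(matrix)
print(matrix)         # [[1, 1], [1, 2]]
print(normalized[0])  # [0.5, 0.5]
```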

Components

Transformers

Encoders

Encoders convert categorical or non-numerical features into numerical features.

`OneHotEncoder`	One-hot encoder to encode non-numeric data.
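One-hot encoding replaces a categorical column with one 0/1 indicator column per category. The minimal sketch below uses a hypothetical helper and sorts categories alphabetically (an assumption made only to get a deterministic column order); it shows the transformation on a single column and does not reproduce `OneHotEncoder`'s parameters or its handling of unseen categories.

```python
def one_hot_encode_sketch(values):
    """Hypothetical helper: map each value to a 0/1 indicator vector."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


categories, encoded = one_hot_encode_sketch(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```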

Imputers

Imputers fill in missing data.

`SimpleImputer`	Imputes missing data according to a specified imputation strategy.
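An imputation strategy decides which value fills each gap. The sketch below uses a hypothetical `impute_sketch` helper on one numeric column, with missing entries given as `None` or NaN. The strategy names `mean`, `median`, and `most_frequent` are borrowed from scikit-learn's `SimpleImputer`; treating them as the options this component accepts is an assumption.

```python
import math
import statistics


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_sketch(column, strategy="mean"):
    """Hypothetical helper: fill missing entries of a numeric column."""
    observed = [v for v in column if not is_missing(v)]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "most_frequent":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return [fill if is_missing(v) else v for v in column]


print(impute_sketch([1.0, None, 3.0, float("nan")]))  # [1.0, 2.0, 3.0, 2.0]
print(impute_sketch([1.0, None, 2.0, 10.0], strategy="median"))  # [1.0, 2.0, 2.0, 10.0]
```

The median strategy is less sensitive to outliers: in the second call the mean of the observed values would be about 4.33, pulled up by the single 10.0.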

Scalers

Scalers transform and standardize the range of data.

`StandardScaler`	Standardizes features by removing the mean and scaling to unit variance.
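Standardization maps each value x to (x - mean) / std. Below is a minimal sketch on one column with a hypothetical helper name. It assumes the population standard deviation (ddof = 0, which is what scikit-learn's `StandardScaler` uses) and maps a constant column to zeros.

```python
import statistics


def standard_scale_sketch(column):
    """Hypothetical helper: center a column on 0 and scale it to unit variance."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)  # population standard deviation (ddof = 0)
    if std == 0:
        # A constant column has no spread; centering alone leaves all zeros.
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


scaled = standard_scale_sketch([1.0, 2.0, 3.0])
print(scaled)  # approximately [-1.2247, 0.0, 1.2247]
```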

Feature Selectors

Feature selectors select a subset of relevant features for the model.

`RFRegressorSelectFromModel`	Selects top features based on importance weights using a Random Forest regressor
`RFClassifierSelectFromModel`	Selects top features based on importance weights using a Random Forest classifier
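Selecting top features based on importance weights amounts to ranking features by an importance score from a fitted model and keeping the highest-ranked ones. The sketch below assumes the importances are already available (a fitted scikit-learn random forest, for example, exposes them as `feature_importances_`) and that a fixed number k of features is kept. The helper name, feature names, and importance values are hypothetical.

```python
def select_top_features_sketch(feature_names, importances, k):
    """Hypothetical helper: keep the k most important features, in original order."""
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    keep = set(ranked[:k])
    return [name for i, name in enumerate(feature_names) if i in keep]


names = ["age", "income", "zip_code", "clicks"]
importances = [0.10, 0.45, 0.05, 0.40]  # made-up values for illustration
print(select_top_features_sketch(names, importances, k=2))  # ['income', 'clicks']
```

Returning the kept features in their original column order, rather than in rank order, keeps the selected columns aligned with the input data.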

Estimators

Classifiers

Classifiers are models which can be trained to predict a class label from input data.

`CatBoostClassifier`	CatBoost Classifier, a classifier that uses gradient boosting on decision trees.
`RandomForestClassifier`	Random Forest Classifier
`LogisticRegressionClassifier`	Logistic Regression Classifier
`XGBoostClassifier`	XGBoost Classifier

Regressors

Regressors are models which can be trained to predict a target value from input data.

`CatBoostRegressor`	CatBoost Regressor, a regressor that uses gradient boosting on decision trees.
`LinearRegressor`	Linear Regressor
`RandomForestRegressor`	Random Forest Regressor
`XGBoostRegressor`	XGBoost Regressor

Objective Functions

Domain-Specific Objectives

`FraudCost`	Scores the percentage of the total transaction amount processed that is lost to fraud.
`LeadScoring`	Lead scoring

Classification Objectives

`AccuracyBinary`	Accuracy score for binary classification
`AccuracyMulticlass`	Accuracy score for multiclass classification
`AUC`	AUC score for binary classification
`AUCMacro`	AUC score for multiclass classification using macro averaging
`AUCMicro`	AUC score for multiclass classification using micro averaging
`AUCWeighted`	AUC score for multiclass classification using weighted averaging
`BalancedAccuracyBinary`	Balanced accuracy score for binary classification
`BalancedAccuracyMulticlass`	Balanced accuracy score for multiclass classification
`F1`	F1 score for binary classification
`F1Micro`	F1 score for multiclass classification using micro averaging
`F1Macro`	F1 score for multiclass classification using macro averaging
`F1Weighted`	F1 score for multiclass classification using weighted averaging
`LogLossBinary`	Log Loss for binary classification
`LogLossMulticlass`	Log Loss for multiclass classification
`MCCBinary`	Matthews correlation coefficient for binary classification
`MCCMulticlass`	Matthews correlation coefficient for multiclass classification
`Precision`	Precision score for binary classification
`PrecisionMicro`	Precision score for multiclass classification using micro averaging
`PrecisionMacro`	Precision score for multiclass classification using macro averaging
`PrecisionWeighted`	Precision score for multiclass classification using weighted averaging
`Recall`	Recall score for binary classification
`RecallMicro`	Recall score for multiclass classification using micro averaging
`RecallMacro`	Recall score for multiclass classification using macro averaging
`RecallWeighted`	Recall score for multiclass classification using weighted averaging
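The micro, macro, and weighted variants differ only in how per-class results are combined. Micro averaging pools true positives, false positives, and false negatives across all classes before computing the score; macro averaging takes the unweighted mean of per-class scores; weighted averaging weights each class's score by its support (its number of true samples). The pure-Python sketch below shows this for F1, using F1 = 2TP / (2TP + FP + FN). It follows scikit-learn's definitions of these averaging options and is an illustration, not evalml's code.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), taken as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_averages_sketch(y_true, y_pred):
    """Return micro, macro, and support-weighted F1 for single-label multiclass data."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        counts[c] = (tp, fp, fn)

    per_class = {c: f1_from_counts(*counts[c]) for c in labels}
    support = {c: sum(1 for t in y_true if t == c) for c in labels}

    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1_from_counts(
        sum(tp for tp, _, _ in counts.values()),
        sum(fp for _, fp, _ in counts.values()),
        sum(fn for _, _, fn in counts.values()),
    )
    return {"micro": micro, "macro": macro, "weighted": weighted}


y_true = ["a", "a", "a", "a", "b", "c"]
y_pred = ["a", "a", "a", "b", "b", "a"]
scores = f1_averages_sketch(y_true, y_pred)
print(scores)
```

Here the per-class F1 scores are 0.75 (a), 2/3 (b), and 0 (c). Macro F1 (about 0.472) is pulled down by the never-predicted class c, weighted F1 (about 0.611) is dominated by the frequent class a, and micro F1 (about 0.667) equals plain accuracy (4 of 6 correct), which always holds for single-label multiclass data.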

Regression Objectives

`R2`	Coefficient of determination for regression
`MAE`	Mean absolute error for regression
`MSE`	Mean squared error for regression
`MedianAE`	Median absolute error for regression
`MaxError`	Maximum residual error for regression
`ExpVariance`	Explained variance score for regression
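Each regression objective above is a simple function of the residuals y_true - y_pred. The pure-Python sketch below computes all six using the standard definitions (the same ones scikit-learn uses for `r2_score`, `explained_variance_score`, and the others); it is an illustration, not evalml's implementation. R2 is 1 - SS_res / SS_tot, and explained variance is 1 - Var(residuals) / Var(y_true), so the two differ only when the residuals have a nonzero mean.

```python
import statistics


def regression_metrics_sketch(y_true, y_pred):
    """Compute common regression scores from paired true and predicted values."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_residuals = [abs(r) for r in residuals]
    mean_true = statistics.mean(y_true)

    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "MAE": statistics.mean(abs_residuals),
        "MSE": statistics.mean([r ** 2 for r in residuals]),
        "MedianAE": statistics.median(abs_residuals),
        "MaxError": max(abs_residuals),
        "R2": 1 - ss_res / ss_tot,
        # Both variances use the same (population) normalization, so n cancels.
        "ExpVariance": 1 - statistics.pvariance(residuals) / statistics.pvariance(y_true),
    }


metrics = regression_metrics_sketch([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```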

Problem Types

`ProblemTypes`	Enum for the type of machine learning problem: BINARY, MULTICLASS, or REGRESSION.
`handle_problem_types`	Returns problem_type as a ProblemTypes member, converting it from a str if necessary.
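The pattern described for `handle_problem_types` (accept either an enum member or a string, and always return an enum member) can be sketched with Python's `enum` module. The names below carry a `Sketch` suffix because they are hypothetical; in particular, the lowercase string values are an assumption, not the library's actual values.

```python
from enum import Enum


class ProblemTypesSketch(Enum):
    """Hypothetical mirror of the three problem types listed above."""
    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def handle_problem_types_sketch(problem_type):
    """Return a ProblemTypesSketch member, converting from a string if needed."""
    if isinstance(problem_type, ProblemTypesSketch):
        return problem_type
    if isinstance(problem_type, str):
        try:
            return ProblemTypesSketch(problem_type.lower())
        except ValueError:
            raise ValueError(f"Unknown problem type: {problem_type!r}") from None
    raise TypeError("problem_type must be a ProblemTypesSketch member or a str")


print(handle_problem_types_sketch("Binary"))  # ProblemTypesSketch.BINARY
print(handle_problem_types_sketch(ProblemTypesSketch.REGRESSION))  # ProblemTypesSketch.REGRESSION
```

Normalizing at the API boundary like this lets callers pass plain strings while internal code compares against a single set of enum members.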

Model Family

`ModelFamily`	Enum for family of machine learning models.

Tuners

`Tuner`	Defines the API for tuners.
`SKOptTuner`	Bayesian Optimizer
`GridSearchTuner`	Grid Search Optimizer
`RandomSearchTuner`	Random Search Optimizer
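A tuner proposes hyperparameter values for the search to try. Grid search enumerates every combination in a discrete space, while random search samples combinations independently; a Bayesian optimizer such as `SKOptTuner` instead fits a model to past results to choose the next point and is not sketched here. The pure-Python sketch below uses a hypothetical parameter space and a toy loss function, and does not reproduce the `Tuner` API.

```python
import itertools
import random

# Hypothetical search space: each hyperparameter maps to its candidate values.
SPACE = {"n_estimators": [10, 50, 100], "max_depth": [3, 5]}


def grid_candidates(space):
    """Yield every combination of values, as grid search would."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_candidates(space, n, seed=0):
    """Yield n independently sampled combinations, as random search would."""
    rng = random.Random(seed)
    for _ in range(n):
        yield {name: rng.choice(options) for name, options in space.items()}


def toy_loss(params):
    """Stand-in for a pipeline's validation loss (lower is better)."""
    return (params["n_estimators"] - 50) ** 2 + params["max_depth"]


grid = list(grid_candidates(SPACE))
best_grid = min(grid, key=toy_loss)
best_random = min(random_candidates(SPACE, n=4), key=toy_loss)
print(len(grid), best_grid)  # 6 {'n_estimators': 50, 'max_depth': 3}
print(best_random)
```

Grid search cost grows with the product of the option counts, which is why random search (a fixed budget of n samples) is often preferred for larger spaces.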

Guardrails

`detect_highly_null`	Checks if there are any highly-null columns in a dataframe.
`detect_label_leakage`	Check if any of the features are highly correlated with the target.
`detect_outliers`	Checks if there are any outliers in a dataframe by first using an Isolation Forest to obtain the anomaly score of each row, then using the IQR of those scores to determine which rows are anomalous.
`detect_id_columns`	Check if any of the features are ID columns.
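A check like `detect_highly_null` reduces to computing each column's fraction of missing values and flagging the columns at or above a threshold. The sketch below represents a dataframe as a dict of lists with `None` marking missing values; the helper name and the 0.95 default threshold are assumptions for illustration.

```python
def detect_highly_null_sketch(columns, percent_threshold=0.95):
    """Return {column: null_fraction} for columns whose null fraction meets the threshold."""
    flagged = {}
    for name, values in columns.items():
        if not values:
            continue  # an empty column has no defined null fraction
        null_fraction = sum(v is None for v in values) / len(values)
        if null_fraction >= percent_threshold:
            flagged[name] = null_fraction
    return flagged


data = {
    "a": [1, None, 3, 4],
    "b": [None, None, None, None],
    "c": [None, None, None, 7],
}
print(detect_highly_null_sketch(data))  # {'b': 1.0}
print(detect_highly_null_sketch(data, percent_threshold=0.7))  # {'b': 1.0, 'c': 0.75}
```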

Utils

`import_or_raise`	Attempts to import the requested library by name.
`convert_to_seconds`
`get_random_state`	Generates a numpy.random.RandomState instance using seed.
`get_random_seed`	Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator.
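The two random-state helpers describe a common pattern: normalize an int seed (or an existing generator) into a `numpy.random.RandomState`, then draw an int from it to seed a different random number generator. Below is a minimal sketch of that pattern; the function names and the bounds [0, 2**31 - 1) are assumptions, not evalml's signatures.

```python
import numpy as np


def get_random_state_sketch(seed):
    """Return seed unchanged if it is already a RandomState, else build one from it."""
    if isinstance(seed, np.random.RandomState):
        return seed
    return np.random.RandomState(seed)


def get_random_seed_sketch(random_state, min_bound=0, max_bound=2**31 - 1):
    """Draw an int in [min_bound, max_bound) to seed another random number generator."""
    return int(random_state.randint(min_bound, max_bound))


state = get_random_state_sketch(42)
seed = get_random_seed_sketch(state)
# Re-creating the state from the same seed reproduces the derived seed.
print(seed == get_random_seed_sketch(get_random_state_sketch(42)))  # True
```

Passing the derived int rather than the RandomState object lets libraries that only accept integer seeds stay reproducible from a single top-level seed.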