API Reference¶
Demo Datasets¶
Load credit card fraud dataset. |
Load wine dataset. |
Load breast cancer dataset. |
Load diabetes dataset. |
Load churn dataset. |
Utilities to preprocess data before using evalml.
Load features and target from file. |
Drops rows in X and y when row in the target y has a value of NaN. |
Get the target distributions. |
Get the number of features of each specific dtype in a DataFrame. |
Splits data into train and test sets. |
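As a rough illustration of the row-dropping utility listed above, the sketch below removes every row whose target value is NaN (or None). It uses plain Python lists so that it runs without dependencies. The name `drop_nan_target_rows_sketch` is hypothetical and is not the library's function, which presumably accepts DataFrame-like inputs.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing target values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_nan_target_rows_sketch(X, y):
    """Hypothetical helper: keep only the rows whose target is present."""
    keep = [i for i, target in enumerate(y) if not _is_missing(target)]
    return [X[i] for i in keep], [y[i] for i in keep]


X = [[1, 2], [3, 4], [5, 6]]
y = [0.0, float("nan"), 1.0]
X_clean, y_clean = drop_nan_target_rows_sketch(X, y)
print(X_clean, y_clean)
```

The same index list is applied to both X and y, which keeps features and target aligned after filtering.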
Data Splitter Classes¶
Data splitter classes for imbalanced classification datasets.
Data splitter for generating training and validation split using Balanced Classification Data Sampler. |
Data splitter for generating k-fold cross-validation split using Balanced Classification Data Sampler. |
Splits the data into KFold cross validation sets and balances the training data using K-Means SMOTE. |
Splits the data into training and validation sets and balances the training data using K-Means SMOTE. |
Splits the data into KFold cross validation sets and uses SMOTE + Tomek links to balance the training data. |
Splits the data into training and validation sets and uses SMOTE + Tomek links to balance the training data. |
Splits the training data into KFold cross validation sets and uses RandomUnderSampler to balance the training data. |
Splits the data into training and validation sets and uses RandomUnderSampler to balance the training data. |
Splits the data into KFold cross validation sets and uses SMOTENC to balance the training data. |
Splits the data into training and validation sets and uses SMOTENC to balance the training data. |
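Several of the splitters above balance the training data after splitting. The sketch below shows one simple balancing strategy, random undersampling to the size of the smallest class, using only the standard library. The function name and the "shrink every class to the minority count" rule are assumptions made for illustration; the actual samplers may support other sampling ratios.

```python
import random
from collections import defaultdict


def undersample_indices_sketch(y, seed=0):
    """Hypothetical helper: return row indices with every class reduced
    to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)
    n_min = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        # Sample without replacement so no row is duplicated.
        kept.extend(rng.sample(indices, n_min))
    return sorted(kept)


y_train = [0] * 8 + [1] * 2
kept = undersample_indices_sketch(y_train)
print(kept)
```

In a splitter of this kind, only the training portion would be resampled; the validation portion keeps its original class distribution so that scores reflect the real data.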
Exception to raise when a class does not have an expected method or property. |
An exception raised when a particular pipeline is not found in automl search results |
Exception to raise when specified objective does not exist. |
Exception to raise when a class name does not comply with EvalML standards |
An exception raised when a component is not found in all_components() |
An exception to be raised when predict/predict_proba/transform is called on a component without fitting first. |
An exception to be raised when predict/predict_proba/transform is called on a pipeline without fitting first. |
Exception raised when all pipelines in an automl batch return a score of NaN for the primary objective. |
An exception raised when an ensemble is missing estimators (list) as a parameter. |
An exception raised when a pipeline errors while scoring any objective in a list of objectives. |
Exception raised when a data check can’t initialize with the parameters given. |
Warning thrown when there are null values in the column of interest |
AutoML Search Classes¶
Automated Pipeline search. |
AutoML Utils¶
Get the default primary search objective for a problem type. |
Given the training data and ML problem parameters, compute a data splitting method to use during AutoML search. |
AutoML Algorithm Classes¶
Base class for the automl algorithms which power evalml. |
An automl algorithm which first fits a base round of pipelines with default parameters, then does a round of parameter tuning on each pipeline in order of performance. |
AutoML Callbacks¶
No-op. |
Logs the exception thrown as an error. |
Raises the exception thrown by the AutoMLSearch object. |
Pipeline Base Classes¶
Base class for all pipelines. |
Pipeline subclass for all classification pipelines. |
Pipeline subclass for all binary classification pipelines. |
Pipeline subclass for all multiclass classification pipelines. |
Pipeline subclass for all regression pipelines. |
Pipeline base class for time series classification problems. |
Pipeline base class for time series regression problems. |
Classification Pipelines¶
Baseline Pipeline for binary classification. |
Baseline Pipeline for multiclass classification. |
Mode Baseline Pipeline for binary classification. |
Mode Baseline Pipeline for multiclass classification. |
Regression Pipelines¶
Baseline Pipeline for regression problems. |
Baseline Pipeline for regression problems. |
Baseline Pipeline for time series regression problems. |
Pipeline Utils¶
Given input data, target data, an estimator class and the problem type, generates a pipeline with a recommended preprocessing chain. |
Given a list of component instances and the problem type, a pipeline instance is generated with the component instances. |
Creates and returns a string that contains the Python imports and code required for running the EvalML pipeline. |
Component Base Classes¶
Components represent a step in a pipeline.
Base class for all components. |
A component that may or may not need fitting that transforms data. |
A component that fits and predicts given data. |
Component Utils¶
List the model types allowed for a particular problem type. |
Returns the estimators allowed for a particular problem type. |
Creates and returns a string that contains the Python imports and code required for running the EvalML component. |
Transformers are components that take in data as input and output transformed data.
Drops specified columns in input data. |
Selects specified columns in input data. |
One-hot encoder to encode non-numeric data. |
Target encoder to encode categorical data |
Imputes missing data according to a specified imputation strategy per column |
Imputes missing data according to a specified imputation strategy. |
Imputes missing data according to a specified imputation strategy. |
Standardize features: removes mean and scales to unit variance. |
Selects top features based on importance weights using a Random Forest regressor. |
Selects top features based on importance weights using a Random Forest classifier. |
Transformer to drop features whose percentage of NaN values exceeds a specified threshold |
Transformer that can automatically featurize DateTime columns. |
Transformer that can automatically featurize text columns. |
Transformer that delays input features and target variable for time series problems. |
Featuretools DFS component that generates features for ww.DataTables and pd.DataFrames |
Removes trends from time series by fitting a polynomial to the data. |
Random undersampler component. |
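To make the "removes mean and scales to unit variance" entry above concrete, here is a minimal sketch of standardization for a single numeric column. It assumes the population standard deviation (dividing by n), which is the convention scikit-learn's StandardScaler uses; the function name is hypothetical.

```python
import statistics


def standardize_sketch(values):
    """Hypothetical helper: subtract the mean and divide by the
    population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column has no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


scaled = standardize_sketch([1.0, 2.0, 3.0])
print(scaled)
```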
Classifiers are components that output a predicted class label.
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. |
Elastic Net Classifier. |
Extra Trees Classifier. |
Random Forest Classifier. |
LightGBM Classifier |
Logistic Regression Classifier. |
XGBoost Classifier. |
Classifier that predicts using the specified strategy. |
Stacked Ensemble Classifier. |
Decision Tree Classifier. |
K-Nearest Neighbors Classifier. |
Support Vector Machine Classifier. |
Regressors are components that output a predicted target value.
Autoregressive Integrated Moving Average Model. |
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. |
Elastic Net Regressor. |
Linear Regressor. |
Extra Trees Regressor. |
Random Forest Regressor. |
XGBoost Regressor. |
Regressor that predicts using the specified strategy. |
Time series estimator that predicts using the naive forecasting approach. |
Stacked Ensemble Regressor. |
Decision Tree Regressor. |
LightGBM Regressor |
Support Vector Machine Regressor. |
Model Understanding¶
Utility Methods¶
Confusion matrix for binary and multiclass classification. |
Normalizes a confusion matrix. |
Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve. |
Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve. |
Calculates permutation importance for features. |
Computes objective score as a function of potential binary classification decision thresholds. |
Get the data needed for the prediction_vs_actual_over_time plot. |
Calculates one or two-way partial dependence. |
Combines y_true and y_pred into a single dataframe and adds a column for outliers. |
Returns a dataframe showing the features with the greatest predictive power for a linear model. |
Get the transformed output after fitting X to the embedded space using t-SNE. |
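As an illustration of the confusion matrix normalization entry above, the sketch below normalizes each row so that it sums to 1. It assumes rows correspond to true labels and columns to predicted labels; normalizing by column or by the grand total are other common choices. The function name is hypothetical.

```python
def normalize_rows_sketch(matrix):
    """Hypothetical helper: divide every entry by its row total."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # Leave an all-zero row as zeros instead of dividing by zero.
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


confusion = [[8, 2], [1, 9]]
normalized = normalize_rows_sketch(confusion)
print(normalized)
```

Under the row convention assumed here, each diagonal entry of the result is the recall of the corresponding class.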
Graph Utility Methods¶
Generate and display a precision-recall plot. |
Generate and display a Receiver Operating Characteristic (ROC) plot for binary and multiclass classification problems. |
Generate and display a confusion matrix plot. |
Generate a bar graph of the pipeline’s permutation importance. |
Generates a plot graphing objective score vs. decision thresholds for a binary classification pipeline. |
Generate a scatter plot comparing the true and predicted values. |
Plot the target values and predictions against time on the x-axis. |
Create a one-way or two-way partial dependence plot. |
Plot high dimensional data into lower dimensional space using t-SNE. |
Prediction Explanations¶
Creates a report summarizing the top contributing features for each data point in the input features. |
Creates a report summarizing the top contributing features for the best and worst points in the dataset as measured by error to true labels. |
Objective Functions¶
Objective Base Classes¶
Base class for all objectives. |
Base class for all binary classification objectives. |
Base class for all multiclass classification objectives. |
Base class for all regression objectives. |
Domain-Specific Objectives¶
Score the percentage of money lost to fraud out of the total transaction amount processed. |
Lead scoring. |
Score using a cost-benefit matrix. |
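The cost-benefit matrix entry above can be sketched as follows: each prediction outcome (true positive, true negative, false positive, false negative) is assigned a payoff, and the score aggregates the payoffs over all predictions. Averaging per sample, the parameter names, and the function name are assumptions made for this illustration.

```python
def cost_benefit_score_sketch(y_true, y_pred, tp, tn, fp, fn):
    """Hypothetical helper: average payoff per prediction for binary
    labels encoded as 0/1."""
    total = 0.0
    for truth, prediction in zip(y_true, y_pred):
        if truth == 1 and prediction == 1:
            total += tp
        elif truth == 0 and prediction == 0:
            total += tn
        elif truth == 0 and prediction == 1:
            total += fp
        else:
            total += fn
    return total / len(y_true)


# One of each outcome: payoffs 10, -5, -2 and 0 average to 0.75.
score = cost_benefit_score_sketch(
    y_true=[1, 1, 0, 0], y_pred=[1, 0, 1, 0], tp=10, tn=0, fp=-2, fn=-5
)
print(score)
```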
Classification Objectives¶
Accuracy score for binary classification. |
Accuracy score for multiclass classification. |
AUC score for binary classification. |
AUC score for multiclass classification using macro averaging. |
AUC score for multiclass classification using micro averaging. |
AUC Score for multiclass classification using weighted averaging. |
Balanced accuracy score for binary classification. |
Balanced accuracy score for multiclass classification. |
F1 score for binary classification. |
F1 score for multiclass classification using micro averaging. |
F1 score for multiclass classification using macro averaging. |
F1 score for multiclass classification using weighted averaging. |
Log Loss for binary classification. |
Log Loss for multiclass classification. |
Matthews correlation coefficient for binary classification. |
Matthews correlation coefficient for multiclass classification. |
Precision score for binary classification. |
Precision score for multiclass classification using micro averaging. |
Precision score for multiclass classification using macro averaging. |
Precision score for multiclass classification using weighted averaging. |
Recall score for binary classification. |
Recall score for multiclass classification using micro averaging. |
Recall score for multiclass classification using macro averaging. |
Recall score for multiclass classification using weighted averaging. |
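Most of the classification objectives above are standard metrics. As one worked example, the sketch below computes the Matthews correlation coefficient for binary labels from the confusion counts, using the usual formula (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Returning 0.0 when the denominator is zero is a convention assumed here; the function name is hypothetical.

```python
import math


def matthews_corrcoef_binary_sketch(y_true, y_pred):
    """Hypothetical helper: MCC for labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


mcc = matthews_corrcoef_binary_sketch([1, 1, 0, 0], [1, 0, 0, 0])
print(mcc)
```

Unlike plain accuracy, the coefficient uses all four confusion counts, which makes it more informative on imbalanced targets.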
Regression Objectives¶
Coefficient of determination for regression. |
Mean absolute error for regression. |
Mean absolute percentage error for time series regression. |
Mean squared error for regression. |
Mean squared log error for regression. |
Median absolute error for regression. |
Maximum residual error for regression. |
Explained variance score for regression. |
Root mean squared error for regression. |
Root mean squared log error for regression. |
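The two root-mean-squared objectives at the end of the list above differ only in the scale on which errors are measured. The sketch below assumes the common log(1 + y) convention for the log variant, as used by scikit-learn's mean squared log error, and therefore requires non-negative values. Function names are hypothetical.

```python
import math


def rmse_sketch(y_true, y_pred):
    """Root mean squared error on the raw scale."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rmsle_sketch(y_true, y_pred):
    """Root mean squared error on the log(1 + y) scale."""
    n = len(y_true)
    squared = ((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared) / n)


rmse = rmse_sketch([3.0, 5.0], [1.0, 5.0])
rmsle = rmsle_sketch([3.0, 5.0], [1.0, 5.0])
print(rmse, rmsle)
```

Because it compares logarithms, the log variant penalizes relative rather than absolute differences, which can be preferable when the target spans several orders of magnitude.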
Objective Utils¶
Get a list of the names of all objectives. |
Returns all core objective instances associated with the given problem type. |
Get a list of all valid core objectives. |
Get non-core objective classes. |
Returns the Objective class corresponding to a given objective name. |
Problem Types¶
Handles problem_type by either returning the ProblemTypes or converting from a str. |
Determine the type of problem being solved based on the targets (binary vs. multiclass classification, regression) |
Enum defining the supported types of machine learning problems. |
Model Family¶
Handles model_family by either returning the ModelFamily or converting from a string |
Enum for family of machine learning models. |
Defines API for Tuners. |
Bayesian Optimizer. |
Grid Search Optimizer. |
Random Search Optimizer. |
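To contrast the grid and random search optimizers listed above, the sketch below proposes hyperparameter combinations both ways over a small discrete search space. It does not reproduce the library's tuner interface; the search space, function names, and parameter names are all hypothetical.

```python
import itertools
import random

# Hypothetical search space: each hyperparameter maps to candidate values.
space = {"max_depth": [2, 4, 6], "learning_rate": [0.1, 0.3]}


def grid_proposals_sketch(space):
    """Yield every combination exactly once, in a fixed order."""
    names = sorted(space)
    for combo in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


def random_proposals_sketch(space, n, seed=0):
    """Yield n combinations, each value drawn independently at random."""
    rng = random.Random(seed)
    for _ in range(n):
        yield {name: rng.choice(values) for name, values in space.items()}


grid = list(grid_proposals_sketch(space))
sampled = list(random_proposals_sketch(space, n=4))
print(len(grid), sampled)
```

Grid search is exhaustive and grows multiplicatively with each added hyperparameter, while random search evaluates a fixed budget of points regardless of the size of the space.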
Data Checks¶
Data Check Classes¶
Base class for all data checks. |
Checks if the target data contains missing or invalid values. |
Checks if there are any highly-null columns in the input. |
Check if any of the features are likely to be ID columns. |
Check if any of the features are highly correlated with the target by using mutual information or Pearson correlation. |
Checks if there are any outliers in input data by using IQR to determine score anomalies. |
Check if the target or any of the features have no variance. |
Checks if any target labels are imbalanced beyond a threshold. |
Check if any set of features are likely to be multicollinear. |
Checks if datetime columns contain NaN values. |
Checks if natural language columns contain NaN values. |
A collection of data checks. |
A collection of basic data checks that is used by AutoML by default. |
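As an example of how one of the checks above might work, the sketch below flags outliers with the interquartile range (IQR). It assumes the conventional fences of 1.5 × IQR below the first quartile and above the third quartile; the multiplier, the quantile method, and the function name are illustrative assumptions rather than the library's exact rule.

```python
import statistics


def iqr_outliers_sketch(values, k=1.5):
    """Hypothetical helper: return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


outliers = iqr_outliers_sketch([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(outliers)
```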
Data Check Messages¶
Base class for all DataCheckMessages. |
DataCheckMessage subclass for errors returned by data checks. |
DataCheckMessage subclass for warnings returned by data checks. |
Data Check Message Types¶
Enum for type of data check message: WARNING or ERROR. |
Data Check Message Codes¶
Enum for data check message code. |
General Utils¶
Attempts to import the requested library by name. |
Converts a string describing a length of time to its length in seconds. |
Generates a numpy.random.RandomState instance using seed. |
Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator. |
Pad the beginning num_to_pad rows with nans. |
Drop rows that have any NaNs in all dataframes or series. |
Create a Woodwork structure from the given list, pandas, or numpy input, with specified types for columns. |
Saves fig to filepath if specified, or to a default location if not. |
Checks if the given DataTable contains only numeric values |
Get importable subclasses of a base class. |
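The time-string conversion utility listed above can be pictured with the sketch below, which parses strings of the form "<number> <unit>". The accepted unit spellings and the function name are assumptions for illustration; the formats the library actually accepts are not specified here.

```python
# Hypothetical unit table: the spellings accepted here are illustrative only.
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
}


def convert_to_seconds_sketch(text):
    """Hypothetical helper: convert e.g. '2 minutes' to 120.0 seconds."""
    parts = text.strip().lower().split()
    if len(parts) != 2:
        raise ValueError(f"Expected '<number> <unit>', got {text!r}")
    amount, unit = parts
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"Unknown time unit: {unit!r}")
    return float(amount) * _UNIT_SECONDS[unit]


two_minutes = convert_to_seconds_sketch("2 minutes")
one_hour = convert_to_seconds_sketch("1 h")
print(two_minutes, one_hour)
```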