API Reference
Demo Datasets
Load credit card fraud dataset.
Load wine dataset.
Load breast cancer dataset.
Load diabetes dataset.
Load credit card fraud dataset.

Preprocessing
Utilities to preprocess data before using evalml.
Load features and target from file.
Drops rows in X and y where the corresponding value in the target y is NaN.
Get the target distributions.
Get the number of features of each specific dtype in a DataFrame.
Splits data into train and test sets.
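
A minimal pure-Python sketch of two preprocessing steps listed above: dropping rows whose target is NaN, then making a seeded random train/test split. The names `drop_nan_target_rows` and `split_train_test` and their parameters are hypothetical. This is not evalml's API, which works on pandas and Woodwork structures.

```python
import math
import random


def drop_nan_target_rows(X, y):
    """Keep only the rows of X and y whose target value is not NaN."""
    kept = [(row, target) for row, target in zip(X, y)
            if not (isinstance(target, float) and math.isnan(target))]
    return [row for row, _ in kept], [target for _, target in kept]


def split_train_test(X, y, test_size=0.2, seed=0):
    """Shuffle row indices with a fixed seed, then split them into train and test sets."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [0, float("nan"), 1, 0, 1]
X_clean, y_clean = drop_nan_target_rows(X, y)
X_train, X_test, y_train, y_test = split_train_test(X_clean, y_clean, test_size=0.25)
```

Dropping NaN targets before splitting ensures that both splits contain only labeled rows.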

Data Splitter Classes
Data splitter classes for imbalanced classification datasets.
Data splitter for generating training and validation split using Balanced Classification Data Sampler.
Data splitter for generating k-fold cross-validation split using Balanced Classification Data Sampler.
Splits the data into KFold cross validation sets and balances the training data using K-Means SMOTE.
Splits the data into training and validation sets and balances the training data using K-Means SMOTE.
Splits the data into KFold cross validation sets and uses SMOTE + Tomek links to balance the training data.
Splits the data into training and validation sets and uses SMOTE + Tomek links to balance the training data.
Splits the training data into KFold cross validation sets and uses RandomUnderSampler to balance the training data.
Splits the data into training and validation sets and uses RandomUnderSampler to balance the training data.
Splits the data into KFold cross validation sets and uses SMOTENC to balance the training data.
Splits the data into training and validation sets and uses SMOTENC to balance the training data.
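
The undersampling splitters above rebalance only the training data. Below is a minimal sketch of random undersampling. It assumes a hypothetical `sampling_ratio` parameter, defined here as the desired minority-to-majority size ratio. It illustrates the general technique and is not evalml's or imbalanced-learn's `RandomUnderSampler` implementation.

```python
import random
from collections import defaultdict


def random_undersample(y, sampling_ratio=1.0, seed=0):
    """Return sorted row indices that keep every minority-class row and randomly
    drop rows from larger classes.

    sampling_ratio (hypothetical): 1.0 fully balances the classes, while 0.5
    allows each class to be up to 2x the minority class size.
    """
    if not 0 < sampling_ratio <= 1:
        raise ValueError("sampling_ratio must be in (0, 1]")
    rows_by_class = defaultdict(list)
    for index, label in enumerate(y):
        rows_by_class[label].append(index)
    minority_size = min(len(rows) for rows in rows_by_class.values())
    max_class_size = int(minority_size / sampling_ratio)
    rng = random.Random(seed)
    kept = []
    for rows in rows_by_class.values():
        if len(rows) > max_class_size:
            rows = rng.sample(rows, max_class_size)
        kept.extend(rows)
    return sorted(kept)


# Apply only to the training split; validation data keeps its original distribution.
y_train = [0] * 8 + [1] * 2
balanced_rows = random_undersample(y_train, sampling_ratio=1.0)
relaxed_rows = random_undersample(y_train, sampling_ratio=0.5)
```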

Exceptions
Exception to raise when a class does not have an expected method or property.
An exception raised when a particular pipeline is not found in AutoML search results.
Exception to raise when a specified objective does not exist.
Exception to raise when a class name does not comply with EvalML standards.
An exception raised when a component is not found in all_components().
An exception to be raised when predict/predict_proba/transform is called on a component without fitting first.
An exception to be raised when predict/predict_proba/transform is called on a pipeline without fitting first.
Exception raised when all pipelines in an AutoML batch return a score of NaN for the primary objective.
An exception raised when an ensemble is missing estimators (list) as a parameter.
An exception raised when a pipeline errors while scoring any objective in a list of objectives.
Exception raised when a data check cannot be initialized with the given parameters.
Warning thrown when there are null values in the column of interest.

AutoML
AutoML Search Classes
Automated pipeline search.

AutoML Utils
Get the default primary search objective for a problem type.
Given the training data and ML problem parameters, compute a data splitting method to use during AutoML search.

AutoML Algorithm Classes
Base class for the AutoML algorithms which power evalml.
An AutoML algorithm that first fits a base round of pipelines with default parameters, then runs a round of parameter tuning on each pipeline in order of performance.

AutoML Callbacks
No-op.
Logs the exception thrown as an error.
Raises the exception thrown by the AutoMLSearch object.

Pipelines
Pipeline Base Classes
Base class for all pipelines.
Pipeline subclass for all classification pipelines.
Pipeline subclass for all binary classification pipelines.
Pipeline subclass for all multiclass classification pipelines.
Pipeline subclass for all regression pipelines.
Pipeline base class for time series classification problems.
Pipeline base class for time series regression problems.

Classification Pipelines
Baseline Pipeline for binary classification.
Baseline Pipeline for multiclass classification.
Mode Baseline Pipeline for binary classification.
Mode Baseline Pipeline for multiclass classification.
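
A mode baseline predicts the most frequent class seen during fitting, which gives a floor that any real model should beat. Below is a minimal sketch of that strategy as a standalone class. `ModeBaselineClassifierSketch` and `accuracy` are hypothetical names, and this is not evalml's pipeline API.

```python
from collections import Counter


class ModeBaselineClassifierSketch:
    """Hypothetical baseline that always predicts the most frequent training label."""

    def __init__(self):
        self.mode_ = None

    def fit(self, X, y):
        # most_common(1) returns [(label, count)] for the most frequent label;
        # on ties, the label encountered first wins.
        self.mode_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        if self.mode_ is None:
            raise RuntimeError("fit must be called before predict")
        return [self.mode_ for _ in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


X_train = [[0], [1], [2], [3], [4]]
y_train = ["a", "b", "b", "c", "b"]
baseline = ModeBaselineClassifierSketch().fit(X_train, y_train)
predictions = baseline.predict([[5], [6]])
baseline_accuracy = accuracy(y_train, baseline.predict(X_train))
```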

Regression Pipelines
Baseline Pipeline for regression problems.
Baseline Pipeline for regression problems.
Baseline Pipeline for time series regression problems.

Pipeline Utils
Given input data, target data, an estimator class and the problem type,
Given a list of component instances and the problem type, generates a pipeline instance containing those component instances.
Creates and returns a string that contains the Python imports and code required for running the EvalML pipeline.

Components
Component Base Classes
Components represent a step in a pipeline.
Base class for all components.
A component that transforms data and may or may not need fitting.
A component that fits and predicts given data.

Component Utils
List the model types allowed for a particular problem type.
Returns the estimators allowed for a particular problem type.
Creates and returns a string that contains the Python imports and code required for running the EvalML component.

Transformers
Transformers are components that take in data as input and output transformed data.
Drops specified columns in input data.
Selects specified columns in input data.
One-hot encoder to encode non-numeric data.
Target encoder to encode categorical data.
Imputes missing data according to a specified imputation strategy per column.
Imputes missing data according to a specified imputation strategy.
Imputes missing data according to a specified imputation strategy.
Standardizes features by removing the mean and scaling to unit variance.
Selects top features based on importance weights using a Random Forest regressor.
Selects top features based on importance weights using a Random Forest classifier.
Transformer to drop features whose percentage of NaN values exceeds a specified threshold.
Transformer that can automatically featurize DateTime columns.
Transformer that can automatically featurize text columns.
Transformer that delays input features and target variable for time series problems.
Featuretools DFS component that generates features for ww.DataTables and pd.DataFrames.
Removes trends from time series by fitting a polynomial to the data.
Random undersampler component.
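
The delayed feature transformer adds lagged copies of the features and target, so an estimator can use past values. Below is a minimal sketch on plain Python lists of row dicts. It assumes a hypothetical `max_delay` parameter, a hypothetical `<name>_delay_<lag>` column naming, and `None` padding for rows without history. EvalML's actual naming and padding may differ.

```python
def delay_features(X, y, max_delay=1):
    """Add columns lagged by 1..max_delay rows for every feature and for the target.

    X is a list of row dicts and y the matching target list. Only past target
    values are added, never the current one, to avoid target leakage.
    Rows without enough history get None as a stand-in for NaN.
    """
    delayed_rows = []
    for i, current in enumerate(X):
        row = dict(current)
        for lag in range(1, max_delay + 1):
            source = i - lag
            has_history = source >= 0
            for name in current:
                row[f"{name}_delay_{lag}"] = X[source][name] if has_history else None
            row[f"target_delay_{lag}"] = y[source] if has_history else None
        delayed_rows.append(row)
    return delayed_rows


X = [{"temp": 10}, {"temp": 12}, {"temp": 15}]
y = [100, 110, 120]
delayed = delay_features(X, y, max_delay=2)
```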

Estimators
Classifiers
Classifiers are components that output a predicted class label.
CatBoost Classifier, a classifier that uses gradient boosting on decision trees.
Elastic Net Classifier.
Extra Trees Classifier.
Random Forest Classifier.
LightGBM Classifier.
Logistic Regression Classifier.
XGBoost Classifier.
Classifier that predicts using the specified strategy.
Stacked Ensemble Classifier.
Decision Tree Classifier.
K-Nearest Neighbors Classifier.
Support Vector Machine Classifier.

Regressors
Regressors are components that output a predicted target value.
Autoregressive Integrated Moving Average Model.
CatBoost Regressor, a regressor that uses gradient boosting on decision trees.
Elastic Net Regressor.
Linear Regressor.
Extra Trees Regressor.
Random Forest Regressor.
XGBoost Regressor.
Regressor that predicts using the specified strategy.
Time series estimator that predicts using the naive forecasting approach.
Stacked Ensemble Regressor.
Decision Tree Regressor.
LightGBM Regressor.
Support Vector Machine Regressor.
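
The time series baseline estimator uses the naive forecasting approach. Below is a minimal sketch that assumes the common textbook definition: each forecast equals the most recently observed value. The function names are hypothetical, and this is not evalml's implementation.

```python
def naive_forecast(history, horizon):
    """Forecast the next `horizon` steps by repeating the last observed value."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def naive_in_sample(y):
    """One-step-ahead naive predictions: the prediction for step t is y[t - 1].

    The first step has no earlier observation, so its prediction is None.
    """
    if not y:
        return []
    return [None] + list(y[:-1])


sales = [3, 5, 8, 6]
future = naive_forecast(sales, horizon=3)
fitted = naive_in_sample(sales)
```

The naive forecast is a useful floor for time series models: a model that cannot beat "repeat the last value" adds little.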

Model Understanding
Utility Methods
Confusion matrix for binary and multiclass classification.
Normalizes a confusion matrix.
Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve.
Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.
Calculates permutation importance for features.
Computes objective score as a function of potential binary classification
Get the data needed for the prediction_vs_actual_over_time plot.
Calculates one-way or two-way partial dependence.
Combines y_true and y_pred into a single dataframe and adds a column for outliers.
Returns a dataframe showing the features with the greatest predictive power for a linear model.
Get the transformed output after fitting X to the embedded space using t-SNE.
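
The first two utilities above build and normalize a confusion matrix. Below is a minimal pure-Python sketch. It assumes that rows index true labels and columns index predicted labels. It also assumes that the `normalize_method` values `"true"`, `"pred"`, and `"all"` divide by row totals, column totals, and the grand total respectively. The function names mirror the descriptions but are hypothetical here.

```python
def confusion_matrix(y_true, y_pred, labels=None):
    """Return (labels, matrix) where matrix[i][j] counts rows whose true label is
    labels[i] and whose predicted label is labels[j]."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[position[true_label]][position[predicted_label]] += 1
    return labels, matrix


def normalize_confusion_matrix(matrix, normalize_method="true"):
    """Divide counts by row totals ("true"), column totals ("pred"), or the grand total ("all").

    Assumption: a row or column with no samples is left as 0.0 instead of raising.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    if normalize_method == "true":
        totals = [[sum(matrix[i])] * n_cols for i in range(n_rows)]
    elif normalize_method == "pred":
        col_sums = [sum(matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]
        totals = [col_sums[:] for _ in range(n_rows)]
    elif normalize_method == "all":
        grand_total = sum(sum(row) for row in matrix)
        totals = [[grand_total] * n_cols for _ in range(n_rows)]
    else:
        raise ValueError(f"unknown normalize_method: {normalize_method!r}")
    return [
        [matrix[i][j] / totals[i][j] if totals[i][j] else 0.0 for j in range(n_cols)]
        for i in range(n_rows)
    ]


labels, counts = confusion_matrix(
    ["cat", "cat", "dog", "dog", "dog"], ["cat", "dog", "dog", "dog", "cat"]
)
by_true_label = normalize_confusion_matrix(counts, "true")
```

Normalizing by true label turns each row into per-class recall and misclassification rates.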

Graph Utility Methods
Generate and display a precision-recall plot.
Generate and display a Receiver Operating Characteristic (ROC) plot for binary and multiclass classification problems.
Generate and display a confusion matrix plot.
Generate a bar graph of the pipeline’s permutation importance.
Generates a plot graphing objective score vs.
Generate a scatter plot comparing the true and predicted values.
Plot the target values and predictions against time on the x-axis.
Create a one-way or two-way partial dependence plot.
Plot high-dimensional data in a lower-dimensional space using t-SNE.

Prediction Explanations
Creates a report summarizing the top contributing features for each data point in the input features.
Creates a report summarizing the top contributing features for the best and worst points in the dataset as measured by error to true labels.

Objective Functions
Objective Base Classes
Base class for all objectives.
Base class for all binary classification objectives.
Base class for all multiclass classification objectives.
Base class for all regression objectives.

Domain-Specific Objectives
Scores the percentage of money lost to fraud out of the total transaction amount processed.
Lead scoring.
Score using a cost-benefit matrix.
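
A cost-benefit objective scores binary predictions by weighting each confusion-matrix cell with a user-supplied payoff. Below is a minimal sketch. It assumes hypothetical `true_positive`, `true_negative`, `false_positive`, and `false_negative` payoff parameters and reports the average payoff per row. EvalML's objective may name, define, or normalize the score differently.

```python
def cost_benefit_score(y_true, y_pred, true_positive, true_negative,
                       false_positive, false_negative):
    """Average payoff per row for binary labels (1 = positive, 0 = negative).

    Each payoff argument is the value gained (positive) or lost (negative)
    for one prediction that lands in that confusion-matrix cell.
    """
    payoff = {
        (1, 1): true_positive,
        (0, 0): true_negative,
        (0, 1): false_positive,   # actual negative, predicted positive
        (1, 0): false_negative,   # actual positive, predicted negative
    }
    total = sum(payoff[(t, p)] for t, p in zip(y_true, y_pred))
    return total / len(y_true)


score = cost_benefit_score(
    y_true=[1, 0, 1, 0], y_pred=[1, 1, 0, 0],
    true_positive=10, true_negative=0, false_positive=-2, false_negative=-5,
)
```

Because the payoffs are asymmetric, the best decision threshold for this objective generally differs from the one that maximizes accuracy.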

Classification Objectives
Accuracy score for binary classification.
Accuracy score for multiclass classification.
AUC score for binary classification.
AUC score for multiclass classification using macro averaging.
AUC score for multiclass classification using micro averaging.
AUC score for multiclass classification using weighted averaging.
Balanced accuracy score for binary classification.
Balanced accuracy score for multiclass classification.
F1 score for binary classification.
F1 score for multiclass classification using micro averaging.
F1 score for multiclass classification using macro averaging.
F1 score for multiclass classification using weighted averaging.
Log Loss for binary classification.
Log Loss for multiclass classification.
Matthews correlation coefficient for binary classification.
Matthews correlation coefficient for multiclass classification.
Precision score for binary classification.
Precision score for multiclass classification using micro averaging.
Precision score for multiclass classification using macro averaging.
Precision score for multiclass classification using weighted averaging.
Recall score for binary classification.
Recall score for multiclass classification using micro averaging.
Recall score for multiclass classification using macro averaging.
Recall score for multiclass classification using weighted averaging.
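
The multiclass objectives above differ mainly in how per-class scores are averaged. Below is a minimal pure-Python sketch of F1 under the standard definitions. Micro averaging pools true positives, false positives, and false negatives across classes. Macro averaging takes the unweighted mean of the per-class F1 scores. Weighted averaging weights each class's F1 by its support. The function name `f1_multiclass` is hypothetical.

```python
from collections import Counter


def f1_multiclass(y_true, y_pred, average="macro"):
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_count, fp_count, fn_count):
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0.
        denominator = 2 * tp_count + fp_count + fn_count
        return 2 * tp_count / denominator if denominator else 0.0

    if average == "micro":
        return f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    per_class = {label: f1(tp[label], fp[label], fn[label]) for label in labels}
    if average == "macro":
        return sum(per_class.values()) / len(labels)
    if average == "weighted":
        support = Counter(y_true)
        return sum(per_class[label] * support[label] for label in labels) / len(y_true)
    raise ValueError(f"unknown average: {average!r}")


y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]
micro = f1_multiclass(y_true, y_pred, "micro")
macro = f1_multiclass(y_true, y_pred, "macro")
weighted = f1_multiclass(y_true, y_pred, "weighted")
```

For single-label multiclass data, every misclassification counts once as a false positive and once as a false negative. Micro F1 therefore equals accuracy (0.75 here), while macro F1 gives the small classes equal say.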

Regression Objectives
Coefficient of determination for regression.
Mean absolute error for regression.
Mean absolute percentage error for time series regression.
Mean squared error for regression.
Mean squared log error for regression.
Median absolute error for regression.
Maximum residual error for regression.
Explained variance score for regression.
Root mean squared error for regression.
Root mean squared log error for regression.
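
Several of the regression objectives above reduce to a few lines of arithmetic over the residuals. Below is a minimal sketch that uses the standard textbook definitions. EvalML may delegate to scikit-learn and handle edge cases differently, such as a zero true value in MAPE. Returning MAPE as a percentage is an assumption, and the log-error variants are omitted.

```python
import math
import statistics


def regression_scores(y_true, y_pred):
    """Textbook regression metrics computed from residuals (y_true - y_pred)."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_residuals = [abs(r) for r in residuals]
    mse = sum(r * r for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs_residuals) / n,
        "median_ae": statistics.median(abs_residuals),
        "max_error": max(abs_residuals),
        "mse": mse,
        "rmse": math.sqrt(mse),
        # R2 is undefined for a constant target (ss_tot == 0).
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
        # MAPE as a percentage; undefined if any true value is 0.
        "mape": 100 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,
    }


scores = regression_scores([2, 4, 6, 8], [3, 4, 5, 10])
```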

Objective Utils
Get a list of the names of all objectives.
Returns all core objective instances associated with the given problem type.
Get a list of all valid core objectives.
Get non-core objective classes.
Returns the Objective class corresponding to a given objective name.

Problem Types
Handles problem_type by either returning the ProblemTypes or converting from a str.
Determines the type of problem being solved based on the targets (binary vs. multiclass classification, or regression).
Enum defining the supported types of machine learning problems.
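
The problem-type utilities above convert strings to an enum and infer the problem type from the target. Below is a minimal sketch. The enum values are illustrative assumptions, not evalml's exact logic. So is the inference rule: two unique values means binary; non-numeric targets, or at most a hypothetical `max_classes` integer-valued unique values, mean multiclass; anything else means regression.

```python
from enum import Enum


class ProblemTypesSketch(Enum):
    """Hypothetical stand-in for a problem-type enum."""
    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def handle_problem_types(problem_type):
    """Return the enum member for an enum member or a (case-insensitive) string value."""
    if isinstance(problem_type, ProblemTypesSketch):
        return problem_type
    if isinstance(problem_type, str):
        try:
            return ProblemTypesSketch(problem_type.strip().lower())
        except ValueError:
            raise ValueError(f"unknown problem type: {problem_type!r}") from None
    raise TypeError("problem_type must be a str or a ProblemTypesSketch member")


def detect_problem_type(y, max_classes=10):
    """Heuristic: 2 unique values -> binary; non-numeric, or few integer-valued
    unique values -> multiclass; anything else -> regression."""
    unique = set(y)
    if len(unique) < 2:
        raise ValueError("the target must contain at least two unique values")
    if len(unique) == 2:
        return ProblemTypesSketch.BINARY
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in unique)
    if not numeric:
        return ProblemTypesSketch.MULTICLASS
    if len(unique) <= max_classes and all(float(v).is_integer() for v in unique):
        return ProblemTypesSketch.MULTICLASS
    return ProblemTypesSketch.REGRESSION
```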

Model Family
Handles model_family by either returning the ModelFamily or converting from a string.
Enum for family of machine learning models.

Tuners
Defines API for Tuners.
Bayesian Optimizer.
Grid Search Optimizer.
Random Search Optimizer.
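
The tuners above propose hyperparameter values from a search space. Below is a minimal sketch of grid search and random search over a hypothetical dictionary search space that lists candidate values per parameter. Bayesian optimization is omitted, and none of this is evalml's `Tuner` API.

```python
import itertools
import random


def grid_search_candidates(space):
    """Yield every combination of candidate values, in a deterministic order."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_search_candidates(space, n_samples, seed=0):
    """Yield n_samples independently drawn combinations (repeats are possible)."""
    rng = random.Random(seed)
    names = sorted(space)
    for _ in range(n_samples):
        yield {name: rng.choice(space[name]) for name in names}


def best_candidate(score, candidates):
    """Return the candidate with the highest score (the first one wins on ties)."""
    return max(candidates, key=score)


space = {"max_depth": [2, 4, 6], "learning_rate": [0.1, 0.3]}


def toy_score(params):
    # Toy objective that peaks at max_depth=4, learning_rate=0.1.
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)


grid_best = best_candidate(toy_score, grid_search_candidates(space))
random_best = best_candidate(toy_score, random_search_candidates(space, n_samples=4))
```

Grid search evaluates every combination, so its cost grows multiplicatively with each parameter. Random search caps the number of evaluations at `n_samples` regardless of the size of the space.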

Data Checks
Data Check Classes
Base class for all data checks.
Checks if the target data contains missing or invalid values.
Checks if there are any highly-null columns in the input.
Checks if any of the features are likely to be ID columns.
Checks if any of the features are highly correlated with the target by using mutual information or Pearson correlation.
Checks if there are any outliers in input data by using IQR to determine score anomalies.
Checks if the target or any of the features have no variance.
Checks if any target labels are imbalanced beyond a threshold.
Checks if any set of features is likely to be multicollinear.
Checks if datetime columns contain NaN values.
Checks if natural language columns contain NaN values.
A collection of data checks.
A collection of basic data checks that is used by AutoML by default.
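
The outliers data check uses the interquartile range (IQR). Below is a minimal sketch of the conventional IQR rule, which flags values outside [Q1 - k * IQR, Q3 + k * IQR] with k = 1.5. EvalML's check computes a per-column score, and its exact thresholds may differ. Quartiles come from `statistics.quantiles` with its default (exclusive) method. The function names and the warning text are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def outliers_check(columns, k=1.5):
    """Return one warning string per numeric column that contains an IQR outlier."""
    warnings = []
    for name, values in columns.items():
        indices = iqr_outliers(values, k)
        if indices:
            warnings.append(f"Column {name!r} has {len(indices)} outlier(s) at rows {indices}")
    return warnings


data = {"a": [10, 12, 11, 13, 12, 11, 100], "b": [1, 2, 3, 4, 5, 6, 7]}
found = outliers_check(data)
```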

Data Check Messages
Base class for all DataCheckMessages.
DataCheckMessage subclass for errors returned by data checks.
DataCheckMessage subclass for warnings returned by data checks.

Data Check Message Types
Enum for type of data check message: WARNING or ERROR.

Data Check Message Codes
Enum for data check message code.

Utils
General Utils
Attempts to import the requested library by name.
Converts a string describing a length of time to its length in seconds.
Generates a numpy.random.RandomState instance using seed.
Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator.
Pads the first num_to_pad rows with NaNs.
Drop rows that have any NaNs in all dataframes or series.
Create a Woodwork structure from the given list, pandas, or numpy input, with specified types for columns.
Saves fig to filepath if specified, or to a default location if not.
Checks if the given DataTable contains only numeric values.
Get importable subclasses of a base class.
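
One general utility converts a string describing a length of time into seconds. Below is a minimal sketch. It assumes strings of the form `<number> <unit>` with second, minute, or hour units and a hypothetical set of unit aliases. The formats evalml actually accepts may differ.

```python
import re

# Hypothetical alias table mapping unit spellings to seconds per unit.
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
}


def convert_to_seconds(input_str):
    """Parse '<number> <unit>' (space optional) and return the duration in seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", input_str)
    if match is None:
        raise ValueError(f"cannot parse time string: {input_str!r}")
    amount, unit = float(match.group(1)), match.group(2).lower()
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unknown time unit: {unit!r}")
    return amount * _UNIT_SECONDS[unit]
```

A parser like this lets a time budget be written as `"1.5 minutes"` rather than a raw number of seconds.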