load_fraud
Load credit card fraud dataset.
load_wine
Load wine dataset.
load_breast_cancer
Load breast cancer dataset.
load_diabetes
Load diabetes dataset.
load_churn
Load churn dataset.
Utilities to preprocess data before using evalml.
load_data
Load features and target from file.
drop_nan_target_rows
Drops rows in X and y when row in the target y has a value of NaN.
target_distribution
Get the target distributions.
number_of_features
Get the number of features of each specific dtype in a DataFrame.
split_data
Splits data into train and test sets.
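As an illustration of what drop_nan_target_rows describes, the following is a minimal standalone sketch using plain Python lists. It is not EvalML's implementation (which operates on pandas or Woodwork structures); the function name drop_nan_target_rows_sketch and the list-based inputs are assumptions made for this example.

```python
import math


def drop_nan_target_rows_sketch(X, y):
    """Hypothetical helper: keep only rows whose target is not NaN or None."""
    kept = [
        (row, target)
        for row, target in zip(X, y)
        if target is not None
        and not (isinstance(target, float) and math.isnan(target))
    ]
    X_clean = [row for row, _ in kept]
    y_clean = [target for _, target in kept]
    return X_clean, y_clean


X = [[1, 2], [3, 4], [5, 6]]
y = [0.0, float("nan"), 1.0]

# The second row is removed from both X and y because its target is NaN.
X_clean, y_clean = drop_nan_target_rows_sketch(X, y)
print(X_clean, y_clean)
```

Filtering X and y together keeps features and targets aligned, which is the point of dropping rows from both rather than from y alone.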
MethodPropertyNotFoundError
Exception to raise when a class does not have an expected method or property.
PipelineNotFoundError
An exception raised when a particular pipeline is not found in automl search results.
ObjectiveNotFoundError
Exception to raise when specified objective does not exist.
IllFormattedClassNameError
Exception to raise when a class name does not comply with EvalML standards.
MissingComponentError
An exception raised when a component is not found in all_components().
ComponentNotYetFittedError
An exception to be raised when predict/predict_proba/transform is called on a component without fitting first.
PipelineNotYetFittedError
An exception to be raised when predict/predict_proba/transform is called on a pipeline without fitting first.
AutoMLSearchException
Exception raised when all pipelines in an automl batch return a score of NaN for the primary objective.
EnsembleMissingPipelinesError
An exception raised when an ensemble is missing estimators (list) as a parameter.
PipelineScoreError
An exception raised when a pipeline errors while scoring any objective in a list of objectives.
DataCheckInitError
Exception raised when a data check can’t initialize with the parameters given.
NullsInColumnWarning
Warning thrown when there are null values in the column of interest.
AutoMLSearch
Automated Pipeline search.
get_default_primary_search_objective
Get the default primary search objective for a problem type.
make_data_splitter
Given the training data and ML problem parameters, compute a data splitting method to use during AutoML search.
AutoMLAlgorithm
Base class for the automl algorithms which power evalml.
IterativeAlgorithm
An automl algorithm which first fits a base round of pipelines with default parameters, then does a round of parameter tuning on each pipeline in order of performance.
silent_error_callback
No-op.
log_error_callback
Logs the exception thrown as an error.
raise_error_callback
Raises the exception thrown by the AutoMLSearch object.
log_and_save_error_callback
Logs the exception thrown by the AutoMLSearch object as a warning and adds the exception to the ‘errors’ list in AutoMLSearch object results.
raise_and_save_error_callback
Raises the exception thrown by the AutoMLSearch object, logs it as an error, and adds the exception to the ‘errors’ list in AutoMLSearch object results.
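The error callbacks listed above differ only in what they do with an exception: ignore it, log it, save it, or re-raise it. The sketch below shows that pattern in isolation. It does not use EvalML; the callback signature (exc, results), the run_search loop, and the toy pipelines are all hypothetical and chosen only to make the idea concrete.

```python
import logging

logger = logging.getLogger("automl_sketch")


def silent_callback(exc, results):
    """No-op: the failure is ignored."""


def log_and_save_callback(exc, results):
    """Log the exception as a warning and record it in results['errors']."""
    logger.warning("Pipeline failed: %s", exc)
    results["errors"].append(exc)


def raise_callback(exc, results):
    """Re-raise the exception so the search stops."""
    raise exc


def run_search(pipelines, error_callback):
    """Hypothetical search loop: score each pipeline, delegate failures."""
    results = {"scores": [], "errors": []}
    for fit_and_score in pipelines:
        try:
            results["scores"].append(fit_and_score())
        except Exception as exc:
            error_callback(exc, results)
    return results


def good_pipeline():
    return 0.9


def failing_pipeline():
    raise ValueError("could not fit")


# The failure is logged and saved; the search still returns the good score.
results = run_search([good_pipeline, failing_pipeline], log_and_save_callback)
print(results["scores"], len(results["errors"]))
```

Passing the callback as a parameter lets the caller decide whether one failing pipeline should abort the whole search or only be recorded.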
PipelineBase
Base class for all pipelines.
ClassificationPipeline
Pipeline subclass for all classification pipelines.
BinaryClassificationPipeline
Pipeline subclass for all binary classification pipelines.
MulticlassClassificationPipeline
Pipeline subclass for all multiclass classification pipelines.
RegressionPipeline
Pipeline subclass for all regression pipelines.
TimeSeriesClassificationPipeline
Pipeline base class for time series classification problems.
TimeSeriesBinaryClassificationPipeline
TimeSeriesMulticlassClassificationPipeline
TimeSeriesRegressionPipeline
Pipeline base class for time series regression problems.
BaselineBinaryPipeline
Baseline Pipeline for binary classification.
BaselineMulticlassPipeline
Baseline Pipeline for multiclass classification.
ModeBaselineBinaryPipeline
Mode Baseline Pipeline for binary classification.
ModeBaselineMulticlassPipeline
Mode Baseline Pipeline for multiclass classification.
BaselineRegressionPipeline
Baseline Pipeline for regression problems.
MeanBaselineRegressionPipeline
TimeSeriesBaselineRegressionPipeline
Baseline Pipeline for time series regression problems.
make_pipeline
Given input data, target data, an estimator class and the problem type, generates a pipeline class with a preprocessing chain recommended based on the inputs.
make_pipeline_from_components
Given a list of component instances and the problem type, generates a pipeline instance with the component instances.
generate_pipeline_code
Creates and returns a string that contains the Python imports and code required for running the EvalML pipeline.
Components represent a step in a pipeline.
ComponentBase
Base class for all components.
Transformer
A component that may or may not need fitting that transforms data.
Estimator
A component that fits and predicts given data.
allowed_model_families
List the model types allowed for a particular problem type.
get_estimators
Returns the estimators allowed for a particular problem type.
generate_component_code
Creates and returns a string that contains the Python imports and code required for running the EvalML component.
Transformers are components that take in data as input and output transformed data.
DropColumns
Drops specified columns in input data.
SelectColumns
Selects specified columns in input data.
OneHotEncoder
One-hot encoder to encode non-numeric data.
TargetEncoder
Target encoder to encode categorical data.
PerColumnImputer
Imputes missing data according to a specified imputation strategy per column.
Imputer
Imputes missing data according to a specified imputation strategy.
SimpleImputer
StandardScaler
Standardize features: removes mean and scales to unit variance.
RFRegressorSelectFromModel
Selects top features based on importance weights using a Random Forest regressor.
RFClassifierSelectFromModel
Selects top features based on importance weights using a Random Forest classifier.
DropNullColumns
Transformer to drop features whose percentage of NaN values exceeds a specified threshold.
DateTimeFeaturizer
Transformer that can automatically featurize DateTime columns.
TextFeaturizer
Transformer that can automatically featurize text columns.
DelayedFeatureTransformer
Transformer that delays input features and target variable for time series problems.
DFSTransformer
Featuretools DFS component that generates features for ww.DataTables and pd.DataFrames.
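To make the StandardScaler description above concrete ("removes mean and scales to unit variance"), here is a minimal single-column sketch using only the standard library. It is not the EvalML component; the function name standardize and the choice of the population standard deviation are assumptions for illustration.

```python
import statistics


def standardize(column):
    """Hypothetical helper: subtract the mean, divide by the population std."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


scaled = standardize([2.0, 4.0, 6.0])

# After scaling, the column has mean 0 and population variance 1
# (up to floating-point rounding).
print(scaled)
```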
Classifiers are components that output a predicted class label.
CatBoostClassifier
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees.
ElasticNetClassifier
Elastic Net Classifier.
ExtraTreesClassifier
Extra Trees Classifier.
RandomForestClassifier
Random Forest Classifier.
LightGBMClassifier
LightGBM Classifier.
LogisticRegressionClassifier
Logistic Regression Classifier.
XGBoostClassifier
XGBoost Classifier.
BaselineClassifier
Classifier that predicts using the specified strategy.
StackedEnsembleClassifier
Stacked Ensemble Classifier.
DecisionTreeClassifier
Decision Tree Classifier.
KNeighborsClassifier
K-Nearest Neighbors Classifier.
SVMClassifier
Support Vector Machine Classifier.
Regressors are components that output a predicted target value.
CatBoostRegressor
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.
ElasticNetRegressor
Elastic Net Regressor.
LinearRegressor
Linear Regressor.
ExtraTreesRegressor
Extra Trees Regressor.
RandomForestRegressor
Random Forest Regressor.
XGBoostRegressor
XGBoost Regressor.
BaselineRegressor
Regressor that predicts using the specified strategy.
TimeSeriesBaselineEstimator
Time series estimator that predicts using the naive forecasting approach.
StackedEnsembleRegressor
Stacked Ensemble Regressor.
DecisionTreeRegressor
Decision Tree Regressor.
LightGBMRegressor
LightGBM Regressor.
SVMRegressor
Support Vector Machine Regressor.
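TimeSeriesBaselineEstimator above is described as using the naive forecasting approach. In its usual textbook form, a naive forecast predicts each step as the previously observed value. The sketch below shows that idea only; the function name naive_forecast and the choice to mark the first step as NaN are assumptions, and the EvalML component may handle gaps and edge cases differently.

```python
import math


def naive_forecast(series):
    """Hypothetical helper: predict each step as the previous observed value."""
    if not series:
        return []
    # The first step has no earlier observation, so it gets NaN.
    return [float("nan")] + list(series[:-1])


observed = [10.0, 12.0, 11.0, 15.0]
forecast = naive_forecast(observed)

# forecast[t] equals observed[t - 1] for every t >= 1.
print(forecast)
```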
confusion_matrix
Confusion matrix for binary and multiclass classification.
normalize_confusion_matrix
Normalizes a confusion matrix.
precision_recall_curve
Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve.
roc_curve
Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.
calculate_permutation_importance
Calculates permutation importance for features.
binary_objective_vs_threshold
Computes objective score as a function of potential binary classification decision thresholds.
get_prediction_vs_actual_over_time_data
Get the data needed for the prediction_vs_actual_over_time plot.
partial_dependence
Calculates one or two-way partial dependence.
get_prediction_vs_actual_data
Combines y_true and y_pred into a single dataframe and adds a column for outliers.
get_linear_coefficients
Returns a dataframe showing the features with the greatest predictive power for a linear model.
t_sne
Get the transformed output after fitting X to the embedded space using t-SNE.
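The confusion_matrix and normalize_confusion_matrix entries above can be illustrated with a small standalone sketch. It is not EvalML's code: the function names, the list-of-lists representation, and the decision to normalize over true labels (rows) are assumptions for this example.

```python
def confusion_matrix_sketch(y_true, y_pred, labels):
    """Hypothetical helper: rows are true labels, columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum so every non-empty row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 0]

matrix = confusion_matrix_sketch(y_true, y_pred, labels=[0, 1])
normalized = normalize_rows(matrix)
print(matrix)
print(normalized)
```

Row normalization turns raw counts into per-class rates, which makes classes with different numbers of samples comparable.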
graph_precision_recall_curve
Generate and display a precision-recall plot.
graph_roc_curve
Generate and display a Receiver Operating Characteristic (ROC) plot for binary and multiclass classification problems.
graph_confusion_matrix
Generate and display a confusion matrix plot.
graph_permutation_importance
Generate a bar graph of the pipeline’s permutation importance.
graph_binary_objective_vs_threshold
Generates a plot graphing objective score vs. decision thresholds.
graph_prediction_vs_actual
Generate a scatter plot comparing the true and predicted values.
graph_prediction_vs_actual_over_time
Plot the target values and predictions against time on the x-axis.
graph_partial_dependence
Create a one-way or two-way partial dependence plot.
graph_t_sne
Plot high dimensional data into lower dimensional space using t-SNE.
explain_prediction
Creates table summarizing the top_k positive and top_k negative contributing features to the prediction of a single datapoint.
explain_predictions
Creates a report summarizing the top contributing features for each data point in the input features.
explain_predictions_best_worst
Creates a report summarizing the top contributing features for the best and worst points in the dataset as measured by error to true labels.
ObjectiveBase
Base class for all objectives.
BinaryClassificationObjective
Base class for all binary classification objectives.
MulticlassClassificationObjective
Base class for all multiclass classification objectives.
RegressionObjective
Base class for all regression objectives.
FraudCost
Score the percentage of the total transaction amount processed that is lost to fraud.
LeadScoring
Lead scoring.
CostBenefitMatrix
Score using a cost-benefit matrix.
AccuracyBinary
Accuracy score for binary classification.
AccuracyMulticlass
Accuracy score for multiclass classification.
AUC
AUC score for binary classification.
AUCMacro
AUC score for multiclass classification using macro averaging.
AUCMicro
AUC score for multiclass classification using micro averaging.
AUCWeighted
AUC score for multiclass classification using weighted averaging.
BalancedAccuracyBinary
Balanced accuracy score for binary classification.
BalancedAccuracyMulticlass
Balanced accuracy score for multiclass classification.
F1
F1 score for binary classification.
F1Micro
F1 score for multiclass classification using micro averaging.
F1Macro
F1 score for multiclass classification using macro averaging.
F1Weighted
F1 score for multiclass classification using weighted averaging.
LogLossBinary
Log Loss for binary classification.
LogLossMulticlass
Log Loss for multiclass classification.
MCCBinary
Matthews correlation coefficient for binary classification.
MCCMulticlass
Matthews correlation coefficient for multiclass classification.
Precision
Precision score for binary classification.
PrecisionMicro
Precision score for multiclass classification using micro averaging.
PrecisionMacro
Precision score for multiclass classification using macro averaging.
PrecisionWeighted
Precision score for multiclass classification using weighted averaging.
Recall
Recall score for binary classification.
RecallMicro
Recall score for multiclass classification using micro averaging.
RecallMacro
Recall score for multiclass classification using macro averaging.
RecallWeighted
Recall score for multiclass classification using weighted averaging.
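Several multiclass objectives above come in micro, macro, and weighted variants. The sketch below shows how the three averaging schemes differ, using F1 as the example. It follows the standard definitions rather than EvalML's code; the function names and the toy labels are assumptions.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), or 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_averages(y_true, y_pred):
    """Hypothetical helper returning micro, macro, and weighted F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        counts[label] = (tp, fp, fn)

    per_class = {label: f1_from_counts(*c) for label, c in counts.items()}

    # Macro: unweighted mean of the per-class scores.
    macro = sum(per_class.values()) / len(labels)

    # Micro: pool the counts over all classes, then compute one score.
    micro = f1_from_counts(
        sum(c[0] for c in counts.values()),
        sum(c[1] for c in counts.values()),
        sum(c[2] for c in counts.values()),
    )

    # Weighted: mean of per-class scores weighted by class support (TP + FN).
    weighted = sum(
        per_class[label] * (counts[label][0] + counts[label][2])
        for label in labels
    ) / len(y_true)

    return {"micro": micro, "macro": macro, "weighted": weighted}


scores = f1_averages(
    y_true=[0, 0, 0, 0, 1, 1, 2],
    y_pred=[0, 0, 0, 1, 1, 2, 2],
)
print(scores)
```

Macro averaging treats every class equally, micro averaging treats every sample equally, and weighted averaging sits between the two by scaling each class by its support.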
R2
Coefficient of determination for regression.
MAE
Mean absolute error for regression.
MAPE
Mean absolute percentage error for time series regression.
MSE
Mean squared error for regression.
MeanSquaredLogError
Mean squared log error for regression.
MedianAE
Median absolute error for regression.
MaxError
Maximum residual error for regression.
ExpVariance
Explained variance score for regression.
RootMeanSquaredError
Root mean squared error for regression.
RootMeanSquaredLogError
Root mean squared log error for regression.
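For reference, several of the regression objectives above can be computed in a few lines from their standard definitions. This is a standalone sketch, not EvalML's objective classes; the function name regression_metrics and the sample values are assumptions.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Hypothetical helper computing common regression scores."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mse = statistics.fmean(e * e for e in errors)

    mean_true = statistics.fmean(y_true)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "MAE": statistics.fmean(abs_errors),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MedianAE": statistics.median(abs_errors),
        "MaxError": max(abs_errors),
        # R2 is undefined when the target is constant (ss_tot == 0).
        "R2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }


metrics = regression_metrics(
    y_true=[3.0, -0.5, 2.0, 7.0],
    y_pred=[2.5, 0.0, 2.0, 8.0],
)
print(metrics)
```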
get_all_objective_names
Get a list of the names of all objectives.
get_core_objectives
Returns all core objective instances associated with the given problem type.
get_core_objective_names
Get a list of all valid core objectives.
get_non_core_objectives
Get non-core objective classes.
get_objective
Returns the Objective class corresponding to a given objective name.
handle_problem_types
Handles problem_type by either returning the ProblemTypes or converting from a str.
detect_problem_type
Determine the type of problem being solved based on the targets (binary vs. multiclass classification, regression).
ProblemTypes
Enum defining the supported types of machine learning problems.
handle_model_family
Handles model_family by either returning the ModelFamily or converting from a string.
ModelFamily
Enum for family of machine learning models.
Tuner
Defines API for Tuners.
SKOptTuner
Bayesian Optimizer.
GridSearchTuner
Grid Search Optimizer.
RandomSearchTuner
Random Search Optimizer.
DataCheck
Base class for all data checks.
InvalidTargetDataCheck
Checks if the target data contains missing or invalid values.
HighlyNullDataCheck
Checks if there are any highly-null columns in the input.
IDColumnsDataCheck
Check if any of the features are likely to be ID columns.
TargetLeakageDataCheck
Check if any of the features are highly correlated with the target by using mutual information or Pearson correlation.
OutliersDataCheck
Checks if there are any outliers in input data by using IQR to determine score anomalies.
NoVarianceDataCheck
Check if the target or any of the features have no variance.
ClassImbalanceDataCheck
Checks if any target labels are imbalanced beyond a threshold.
MulticollinearityDataCheck
Check if any set of features is likely to be multicollinear.
DataChecks
A collection of data checks.
DefaultDataChecks
A collection of basic data checks that is used by AutoML by default.
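OutliersDataCheck above is described as using the IQR to flag anomalies. A common way to do this is Tukey's rule: flag values more than k times the IQR outside the first and third quartiles. The sketch below shows that rule for one numeric column. The multiplier k=1.5, the quartile method, and the function name iqr_outliers are assumptions; the thresholds used by EvalML's check are not stated in this listing.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Hypothetical helper: return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


column = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(column)

# Only the value far above the rest of the column is flagged.
print(outliers)
```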
DataCheckMessage
Base class for all DataCheckMessages.
DataCheckError
DataCheckMessage subclass for errors returned by data checks.
DataCheckWarning
DataCheckMessage subclass for warnings returned by data checks.
DataCheckMessageType
Enum for type of data check message: WARNING or ERROR.
DataCheckMessageCode
Enum for data check message code.
import_or_raise
Attempts to import the requested library by name.
convert_to_seconds
Converts a string describing a length of time to its length in seconds.
get_random_state
Generates a numpy.random.RandomState instance using seed.
get_random_seed
Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator.
pad_with_nans
Pad the beginning num_to_pad rows with nans.
drop_rows_with_nans
Drop rows that have any NaNs in all dataframes or series.
infer_feature_types
Create a Woodwork structure from the given pandas or numpy input, with specified types for columns.
save_plot
Saves fig to filepath if specified, or to a default location if not.
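Two of the utilities listed above, convert_to_seconds and pad_with_nans, are simple enough to sketch in a few lines. The versions below are standalone illustrations, not EvalML's implementations: the accepted unit strings, the "number unit" input format, and the list-based padding are all assumptions.

```python
import math

# Assumed unit spellings; the real utility may accept a different set.
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
}


def convert_to_seconds_sketch(text):
    """Hypothetical helper: parse '<number> <unit>' into seconds."""
    parts = text.strip().split()
    if len(parts) != 2:
        raise ValueError(f"Expected '<number> <unit>', got {text!r}")
    amount, unit = parts
    unit = unit.lower()
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"Unknown time unit: {unit!r}")
    return float(amount) * _UNIT_SECONDS[unit]


def pad_with_nans_sketch(values, num_to_pad):
    """Hypothetical helper: prepend num_to_pad NaN entries to a list."""
    return [float("nan")] * num_to_pad + list(values)


seconds = convert_to_seconds_sketch("10 minutes")
padded = pad_with_nans_sketch([1.0, 2.0, 3.0], num_to_pad=2)
print(seconds, padded)
```

Padding the start of a series with NaNs is useful when a time series model cannot produce predictions for the first few rows and the output must still line up with the input index.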