load_fraud
Load credit card fraud dataset.
load_wine
Load wine dataset.
load_breast_cancer
Load breast cancer dataset.
load_diabetes
Load diabetes dataset.
Utilities to preprocess data before using evalml.
drop_nan_target_rows
Drops rows in X and y when the corresponding value in the target y is NaN.
label_distribution
Get the label distributions.
load_data
Load features and labels from file.
number_of_features
Get the number of features for specific dtypes.
split_data
Splits data into train and test sets.
AutoMLSearch
Automated Pipeline search.
AutoMLAlgorithm
Base class for the automl algorithms which power evalml.
IterativeAlgorithm
An automl algorithm which first fits a base round of pipelines with default parameters, then does a round of parameter tuning on each pipeline in order of performance.
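The two-phase schedule described above can be sketched in plain Python. This is an illustrative toy, not evalml's implementation: `first_batch`, `tuning_order`, and the score dictionary are hypothetical names, the scores are invented, and higher scores are assumed to be better.

```python
def first_batch(pipeline_names):
    """Base round: one candidate per pipeline, using default parameters."""
    return [(name, "default") for name in pipeline_names]


def tuning_order(scores):
    """Order pipelines for tuning, best base-round score first.

    Assumes higher scores are better (a hypothetical convention).
    """
    return sorted(scores, key=scores.get, reverse=True)


pipelines = ["linear", "random_forest", "xgboost"]
base_round = first_batch(pipelines)

# Hypothetical base-round scores, invented purely for illustration.
scores = {"linear": 0.71, "random_forest": 0.84, "xgboost": 0.80}
order = tuning_order(scores)
print(order)
```

Each later batch would then propose tuned parameters for the next pipeline in `order`.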
PipelineBase
Base class for all pipelines.
ClassificationPipeline
Pipeline subclass for all classification pipelines.
BinaryClassificationPipeline
Pipeline subclass for all binary classification pipelines.
MulticlassClassificationPipeline
Pipeline subclass for all multiclass classification pipelines.
RegressionPipeline
Pipeline subclass for all regression pipelines.
BaselineBinaryPipeline
Baseline Pipeline for binary classification.
BaselineMulticlassPipeline
Baseline Pipeline for multiclass classification.
ModeBaselineBinaryPipeline
Mode Baseline Pipeline for binary classification.
ModeBaselineMulticlassPipeline
Mode Baseline Pipeline for multiclass classification.
BaselineRegressionPipeline
Baseline Pipeline for regression problems.
MeanBaselineRegressionPipeline
Mean Baseline Pipeline for regression problems.
precision_recall_curve
Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve.
graph_precision_recall_curve
Generate and display a precision-recall plot.
roc_curve
Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.
graph_roc_curve
Generate and display a Receiver Operating Characteristic (ROC) plot.
confusion_matrix
Confusion matrix for binary and multiclass classification.
normalize_confusion_matrix
Normalizes a confusion matrix.
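One common normalization divides each row by its sum, so that every true class sums to 1. The sketch below shows only that variant; `normalize_rows` is a hypothetical helper, and which axis evalml normalizes over by default is not stated here.

```python
def normalize_rows(matrix):
    """Divide each row by its sum so every row of true labels sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # Leave an all-zero row as zeros rather than dividing by zero.
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


# Rows are true classes, columns are predicted classes.
counts = [[8, 2], [1, 9]]
print(normalize_rows(counts))
```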
graph_confusion_matrix
Generate and display a confusion matrix plot.
calculate_permutation_importance
Calculates permutation importance for features.
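As a rough sketch of the general technique (not evalml's implementation), permutation importance shuffles one feature at a time and measures how much the score drops. The function signature, the toy model, and the data below are all hypothetical.

```python
import random


def permutation_importance(predict, score, X, y, n_repeats=5, seed=0):
    """Mean drop in score when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = score(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving the others untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            permuted = score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy model that only looks at feature 0, so shuffling feature 1 changes nothing.
predict = lambda row: int(row[0] > 0.5)
X = [[0.1, 5.0], [0.9, 3.0], [0.2, 4.0], [0.8, 1.0], [0.3, 2.0], [0.7, 6.0]]
y = [0, 1, 0, 1, 0, 1]
print(permutation_importance(predict, accuracy, X, y))
```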
graph_permutation_importance
Generate a bar graph of the pipeline’s permutation importance.
get_estimators
Returns the estimators allowed for a particular problem type.
make_pipeline
Given input data, target data, an estimator class and the problem type, generates a pipeline class with a preprocessing chain recommended based on the inputs.
Components represent a step in a pipeline.
ComponentBase
Base class for all components.
Transformer
A component that may or may not need fitting that transforms data.
Estimator
A component that fits and predicts given data.
Transformers are components that take in data as input and output transformed data.
DropColumns
Drops specified columns in input data.
SelectColumns
Selects specified columns in input data.
OneHotEncoder
One-hot encoder to encode non-numeric data.
PerColumnImputer
Imputes missing data according to a specified imputation strategy per column.
SimpleImputer
Imputes missing data according to a specified imputation strategy.
StandardScaler
Standardize features: removes mean and scales to unit variance.
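The transformation can be illustrated for a single column as follows. This is a minimal sketch of standardization in general, with `standardize` as a hypothetical helper; it assumes the population standard deviation and simply centers a constant column.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Return (x - mean) / std for each value, using the population std."""
    mean = fmean(column)
    std = pstdev(column)
    # A constant column has zero spread; only center it to avoid dividing by zero.
    if std == 0:
        return [x - mean for x in column]
    return [(x - mean) / std for x in column]


scaled = standardize([2.0, 4.0, 6.0, 8.0])
print(scaled)
```

After scaling, the column has mean 0 and unit variance, up to floating-point error.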
RFRegressorSelectFromModel
Selects top features based on importance weights using a Random Forest regressor.
RFClassifierSelectFromModel
Selects top features based on importance weights using a Random Forest classifier.
DropNullColumns
Transformer to drop features whose percentage of NaN values exceeds a specified threshold.
DateTimeFeaturization
Transformer that can automatically featurize DateTime columns.
TextFeaturizer
Transformer that can automatically featurize text columns.
Classifiers are components that output a predicted class label.
CatBoostClassifier
CatBoost Classifier, a classifier that uses gradient-boosting on decision trees.
ElasticNetClassifier
Elastic Net Classifier.
ExtraTreesClassifier
Extra Trees Classifier.
RandomForestClassifier
Random Forest Classifier.
LogisticRegressionClassifier
Logistic Regression Classifier.
XGBoostClassifier
XGBoost Classifier.
BaselineClassifier
Classifier that predicts using the specified strategy.
Regressors are components that output a predicted target value.
CatBoostRegressor
CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.
ElasticNetRegressor
Elastic Net Regressor.
LinearRegressor
Linear Regressor.
ExtraTreesRegressor
Extra Trees Regressor.
RandomForestRegressor
Random Forest Regressor.
XGBoostRegressor
XGBoost Regressor.
BaselineRegressor
Regressor that predicts using the specified strategy.
ObjectiveBase
Base class for all objectives.
BinaryClassificationObjective
Base class for all binary classification objectives.
MulticlassClassificationObjective
Base class for all multiclass classification objectives.
RegressionObjective
Base class for all regression objectives.
FraudCost
Score the percentage of money lost to fraud out of the total transaction amount processed.
LeadScoring
Lead scoring.
AccuracyBinary
Accuracy score for binary classification.
AccuracyMulticlass
Accuracy score for multiclass classification.
AUC
AUC score for binary classification.
AUCMacro
AUC score for multiclass classification using macro averaging.
AUCMicro
AUC score for multiclass classification using micro averaging.
AUCWeighted
AUC score for multiclass classification using weighted averaging.
BalancedAccuracyBinary
Balanced accuracy score for binary classification.
BalancedAccuracyMulticlass
Balanced accuracy score for multiclass classification.
F1
F1 score for binary classification.
F1Micro
F1 score for multiclass classification using micro averaging.
F1Macro
F1 score for multiclass classification using macro averaging.
F1Weighted
F1 score for multiclass classification using weighted averaging.
LogLossBinary
Log Loss for binary classification.
LogLossMulticlass
Log Loss for multiclass classification.
MCCBinary
Matthews correlation coefficient for binary classification.
MCCMulticlass
Matthews correlation coefficient for multiclass classification.
Precision
Precision score for binary classification.
PrecisionMicro
Precision score for multiclass classification using micro averaging.
PrecisionMacro
Precision score for multiclass classification using macro averaging.
PrecisionWeighted
Precision score for multiclass classification using weighted averaging.
Recall
Recall score for binary classification.
RecallMicro
Recall score for multiclass classification using micro averaging.
RecallMacro
Recall score for multiclass classification using macro averaging.
RecallWeighted
Recall score for multiclass classification using weighted averaging.
R2
Coefficient of determination for regression.
MAE
Mean absolute error for regression.
MSE
Mean squared error for regression.
MeanSquaredLogError
Mean squared log error for regression.
MedianAE
Median absolute error for regression.
MaxError
Maximum residual error for regression.
ExpVariance
Explained variance score for regression.
RootMeanSquaredError
Root mean squared error for regression.
RootMeanSquaredLogError
Root mean squared log error for regression.
ProblemTypes
Enum for type of machine learning problem: BINARY, MULTICLASS, or REGRESSION.
handle_problem_types
Handles problem_type by either returning the ProblemTypes or converting from a str.
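The "return the enum or convert from a str" pattern can be sketched as below. `ProblemKind` and `handle_problem_kind` are hypothetical stand-ins written for illustration; the exception types and messages are assumptions, not evalml's documented behavior.

```python
from enum import Enum


class ProblemKind(Enum):
    """Hypothetical stand-in for an enum such as ProblemTypes."""
    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


def handle_problem_kind(value):
    """Return the enum member unchanged, or look one up from a string."""
    if isinstance(value, ProblemKind):
        return value
    if isinstance(value, str):
        try:
            # Member names are upper case, so the lookup is case-insensitive.
            return ProblemKind[value.upper()]
        except KeyError:
            raise KeyError(f"Problem type '{value}' does not exist") from None
    raise ValueError("Expected a str or ProblemKind")


print(handle_problem_kind("binary"))
```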
ModelFamily
Enum for family of machine learning models.
handle_model_family
Handles model_family by either returning the ModelFamily or converting from a str.
list_model_families
Lists the model families available for a particular problem type.
Tuner
Defines API for Tuners.
SKOptTuner
Bayesian Optimizer.
GridSearchTuner
Grid Search Optimizer.
RandomSearchTuner
Random Search Optimizer.
DataCheck
Base class for all data checks.
InvalidTargetDataCheck
Checks if the target labels contain missing or invalid data.
HighlyNullDataCheck
Checks if there are any highly-null columns in the input.
IDColumnsDataCheck
Check if any of the features are likely to be ID columns.
LabelLeakageDataCheck
Check if any of the features are highly correlated with the target.
OutliersDataCheck
Checks if there are any outliers in input data by using an Isolation Forest to obtain the anomaly score of each index and then using IQR to determine score anomalies.
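The second step, flagging anomalous scores with the IQR rule, might look like the sketch below. It omits the Isolation Forest step and takes the anomaly scores as given; `iqr_outlier_indices`, the multiplier `k=1.5`, and the sample scores are assumptions chosen for illustration.

```python
from statistics import quantiles


def iqr_outlier_indices(scores, k=1.5):
    """Return indices whose score lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, s in enumerate(scores) if s < lower or s > upper]


# Hypothetical anomaly scores; the last one is far from the rest.
scores = [0.10, 0.12, 0.11, 0.13, 0.09, 0.12, 0.10, 0.11, -0.60]
print(iqr_outlier_indices(scores))
```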
NoVarianceDataCheck
Check if any of the features or labels have no variance.
DataChecks
A collection of data checks.
DefaultDataChecks
A collection of basic data checks that is used by AutoML by default.
DataCheckMessage
Base class for all DataCheckMessages.
DataCheckError
DataCheckMessage subclass for errors returned by data checks.
DataCheckWarning
DataCheckMessage subclass for warnings returned by data checks.
DataCheckMessageType
Enum for type of data check message: WARNING or ERROR.
import_or_raise
Attempts to import the requested library by name.
convert_to_seconds
Converts a string describing a length of time to its length in seconds.
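A conversion of this kind could be sketched as follows. `to_seconds` and the unit table are hypothetical; the set of strings that evalml's own function accepts is not listed here and may differ.

```python
# Assumed unit spellings, chosen for illustration only.
UNIT_SECONDS = {
    "s": 1, "sec": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
}


def to_seconds(text):
    """Parse strings like '10 minutes' into a number of seconds."""
    parts = text.strip().lower().split()
    if len(parts) != 2:
        raise ValueError(f"Expected '<number> <unit>', got {text!r}")
    amount, unit = parts
    if unit not in UNIT_SECONDS:
        raise ValueError(f"Unknown time unit: {unit!r}")
    return float(amount) * UNIT_SECONDS[unit]


print(to_seconds("10 minutes"))
```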
get_random_state
Generates a numpy.random.RandomState instance using seed.
get_random_seed
Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator.