API Reference¶

Demo Datasets¶

`load_fraud`	Load credit card fraud dataset.
`load_wine`	Load wine dataset.
`load_breast_cancer`	Load breast cancer dataset.
`load_diabetes`	Load diabetes dataset.
`load_churn`	Load customer churn dataset.

Preprocessing¶

Utilities to preprocess data before using evalml.

`load_data`	Load features and target from file.
`drop_nan_target_rows`	Drops rows in X and y when row in the target y has a value of NaN.
`target_distribution`	Get the target distributions.
`number_of_features`	Get the number of features of each specific dtype in a DataFrame.
`split_data`	Splits data into train and test sets.
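
As a rough illustration of what `drop_nan_target_rows` describes, the sketch below filters paired rows in plain Python. It is not EvalML's implementation: the helper name `drop_nan_target_rows_sketch` and the list-based inputs are assumptions made for this example.

```python
import math


def drop_nan_target_rows_sketch(X, y):
    """Keep only the rows whose target value is neither None nor NaN.

    Hypothetical helper for illustration; X is a list of feature rows
    and y is a list of target values of the same length.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    kept = [(row, target) for row, target in zip(X, y) if not is_missing(target)]
    X_kept = [row for row, _ in kept]
    y_kept = [target for _, target in kept]
    return X_kept, y_kept


X = [[1, 2], [3, 4], [5, 6]]
y = [0, float("nan"), 1]
X_clean, y_clean = drop_nan_target_rows_sketch(X, y)
print(X_clean, y_clean)
```

The second row is dropped from both `X` and `y` because its target is NaN, so features and target stay aligned.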

Data Splitter Classes¶

Data splitter classes for imbalanced classification datasets.

`BalancedClassificationDataTVSplit`	Data splitter for generating training and validation split using Balanced Classification Data Sampler.
`BalancedClassificationDataCVSplit`	Data splitter for generating k-fold cross-validation split using Balanced Classification Data Sampler.
`KMeansSMOTECVSplit`	Splits the data into KFold cross validation sets and balances the training data using K-Means SMOTE.
`KMeansSMOTETVSplit`	Splits the data into training and validation sets and balances the training data using K-Means SMOTE.
`SMOTETomekCVSplit`	Splits the data into KFold cross validation sets and uses SMOTE + Tomek links to balance the training data.
`SMOTETomekTVSplit`	Splits the data into training and validation sets and uses SMOTE + Tomek links to balance the training data.
`RandomUnderSamplerCVSplit`	Splits the data into KFold cross validation sets and uses RandomUnderSampler to balance the training data.
`RandomUnderSamplerTVSplit`	Splits the data into training and validation sets and uses RandomUnderSampler to balance the training data.
`SMOTENCCVSplit`	Splits the data into KFold cross validation sets and uses SMOTENC to balance the training data.
`SMOTENCTVSplit`	Splits the data into training and validation sets and uses SMOTENC to balance the training data.
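
The balancing step that the undersampling splitters describe can be sketched with the standard library alone. This is a minimal sketch, not the library's sampler: the function name `random_undersample_indices` and the choice to shrink every class to the minority-class count (a 1:1 ratio) are assumptions for illustration.

```python
import random
from collections import defaultdict


def random_undersample_indices(y, seed=0):
    """Return sorted row indices that keep each class at the minority-class count.

    Hypothetical helper: assumes a 1:1 target ratio between classes.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)

    minority_size = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        # Sample without replacement so no row is duplicated.
        kept.extend(rng.sample(indices, minority_size))
    return sorted(kept)


y_train = [0, 0, 0, 0, 0, 0, 1, 1]
kept = random_undersample_indices(y_train)
balanced = [y_train[i] for i in kept]
print(kept, balanced)
```

Only the training portion of a split would be resampled this way; the validation rows are left untouched so that scores reflect the original class distribution.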

Exceptions¶

`MethodPropertyNotFoundError`	Exception to raise when a class does not have an expected method or property.
`PipelineNotFoundError`	An exception raised when a particular pipeline is not found in automl search results.
`ObjectiveNotFoundError`	Exception to raise when specified objective does not exist.
`IllFormattedClassNameError`	Exception to raise when a class name does not comply with EvalML standards.
`MissingComponentError`	An exception raised when a component is not found in all_components().
`ComponentNotYetFittedError`	An exception to be raised when predict/predict_proba/transform is called on a component without fitting first.
`PipelineNotYetFittedError`	An exception to be raised when predict/predict_proba/transform is called on a pipeline without fitting first.
`AutoMLSearchException`	Exception raised when all pipelines in an automl batch return a score of NaN for the primary objective.
`EnsembleMissingPipelinesError`	An exception raised when an ensemble is missing estimators (list) as a parameter.
`PipelineScoreError`	An exception raised when a pipeline errors while scoring any objective in a list of objectives.
`DataCheckInitError`	Exception raised when a data check can’t initialize with the parameters given.
`NullsInColumnWarning`	Warning thrown when there are null values in the column of interest.

AutoML¶

AutoML Search Classes¶

`AutoMLSearch`	Automated pipeline search.

AutoML Utils¶

`get_default_primary_search_objective`	Get the default primary search objective for a problem type.
`make_data_splitter`	Given the training data and ML problem parameters, compute a data splitting method to use during AutoML search.

AutoML Algorithm Classes¶

`AutoMLAlgorithm`	Base class for the automl algorithms which power evalml.
`IterativeAlgorithm`	An automl algorithm which first fits a base round of pipelines with default parameters, then does a round of parameter tuning on each pipeline in order of performance.

AutoML Callbacks¶

`silent_error_callback`	No-op.
`log_error_callback`	Logs the exception thrown as an error.
`raise_error_callback`	Raises the exception thrown by the AutoMLSearch object.
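
The three callbacks differ only in how they react to an exception raised during search. The sketch below mimics that behavior with hypothetical names (`silent_callback`, `log_callback`, `raise_callback`, `run_with_callback`) and a simplified one-argument signature; the real callbacks are invoked by the AutoMLSearch object and may receive additional arguments.

```python
import logging

logger = logging.getLogger("automl_callback_sketch")


def silent_callback(exception, **kwargs):
    """No-op: swallow the error and let the search continue."""


def log_callback(exception, **kwargs):
    """Record the error in the log, then let the search continue."""
    logger.error("Pipeline evaluation failed: %s", exception)


def raise_callback(exception, **kwargs):
    """Re-raise the error so the search stops."""
    raise exception


def run_with_callback(step, error_callback):
    """Run one unit of work and route any exception to the chosen callback."""
    try:
        return step()
    except Exception as exc:
        error_callback(exc)
        return None


def failing_step():
    raise ValueError("bad pipeline")


result = run_with_callback(failing_step, silent_callback)
print(result)
```

Passing `raise_callback` instead would propagate the `ValueError` to the caller, which is useful when debugging a failing pipeline.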

Pipelines¶

Pipeline Base Classes¶

`PipelineBase`	Base class for all pipelines.
`ClassificationPipeline`	Pipeline subclass for all classification pipelines.
`BinaryClassificationPipeline`	Pipeline subclass for all binary classification pipelines.
`MulticlassClassificationPipeline`	Pipeline subclass for all multiclass classification pipelines.
`RegressionPipeline`	Pipeline subclass for all regression pipelines.
`TimeSeriesClassificationPipeline`	Pipeline base class for time series classification problems.
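`TimeSeriesBinaryClassificationPipeline`	Pipeline base class for time series binary classification problems.
`TimeSeriesMulticlassClassificationPipeline`	Pipeline base class for time series multiclass classification problems.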
`TimeSeriesRegressionPipeline`	Pipeline base class for time series regression problems.

Classification Pipelines¶

`BaselineBinaryPipeline`	Baseline Pipeline for binary classification.
`BaselineMulticlassPipeline`	Baseline Pipeline for multiclass classification.
`ModeBaselineBinaryPipeline`	Mode Baseline Pipeline for binary classification.
`ModeBaselineMulticlassPipeline`	Mode Baseline Pipeline for multiclass classification.

Regression Pipelines¶

`BaselineRegressionPipeline`	Baseline Pipeline for regression problems.
`MeanBaselineRegressionPipeline`	Mean Baseline Pipeline for regression problems.
`TimeSeriesBaselineRegressionPipeline`	Baseline Pipeline for time series regression problems.

Pipeline Utils¶

`make_pipeline`	Given input data, target data, an estimator class and the problem type, generates a pipeline with a preprocessing chain recommended based on the inputs.
`make_pipeline_from_components`	Given a list of component instances and the problem type, a pipeline instance is generated with the component instances.
`generate_pipeline_code`	Creates and returns a string that contains the Python imports and code required for running the EvalML pipeline.

Components¶

Component Base Classes¶

Components represent a step in a pipeline.

`ComponentBase`	Base class for all components.
`Transformer`	A component that may or may not need fitting that transforms data.
`Estimator`	A component that fits and predicts given data.

Component Utils¶

`allowed_model_families`	List the model types allowed for a particular problem type.
`get_estimators`	Returns the estimators allowed for a particular problem type.
`generate_component_code`	Creates and returns a string that contains the Python imports and code required for running the EvalML component.

Transformers¶

Transformers are components that take in data as input and output transformed data.

`DropColumns`	Drops specified columns in input data.
`SelectColumns`	Selects specified columns in input data.
`OneHotEncoder`	One-hot encoder to encode non-numeric data.
`TargetEncoder`	Target encoder to encode categorical data.
`PerColumnImputer`	Imputes missing data according to a specified imputation strategy per column.
`Imputer`	Imputes missing data according to a specified imputation strategy.
`SimpleImputer`	Imputes missing data according to a specified imputation strategy.
`StandardScaler`	Standardize features: removes mean and scales to unit variance.
`RFRegressorSelectFromModel`	Selects top features based on importance weights using a Random Forest regressor.
`RFClassifierSelectFromModel`	Selects top features based on importance weights using a Random Forest classifier.
`DropNullColumns`	Transformer to drop features whose percentage of NaN values exceeds a specified threshold.
`DateTimeFeaturizer`	Transformer that can automatically featurize DateTime columns.
`TextFeaturizer`	Transformer that can automatically featurize text columns.
`DelayedFeatureTransformer`	Transformer that delays input features and target variable for time series problems.
`DFSTransformer`	Featuretools DFS component that generates features for ww.DataTables and pd.DataFrames.
`PolynomialDetrender`	Removes trends from time series by fitting a polynomial to the data.
`Undersampler`	Random undersampler component.

Estimators¶

Classifiers¶

Classifiers are components that output a predicted class label.

`CatBoostClassifier`	CatBoost Classifier, a classifier that uses gradient-boosting on decision trees.
`ElasticNetClassifier`	Elastic Net Classifier.
`ExtraTreesClassifier`	Extra Trees Classifier.
`RandomForestClassifier`	Random Forest Classifier.
`LightGBMClassifier`	LightGBM Classifier.
`LogisticRegressionClassifier`	Logistic Regression Classifier.
`XGBoostClassifier`	XGBoost Classifier.
`BaselineClassifier`	Classifier that predicts using the specified strategy.
`StackedEnsembleClassifier`	Stacked Ensemble Classifier.
`DecisionTreeClassifier`	Decision Tree Classifier.
`KNeighborsClassifier`	K-Nearest Neighbors Classifier.
`SVMClassifier`	Support Vector Machine Classifier.

Regressors¶

Regressors are components that output a predicted target value.

`ARIMARegressor`	Autoregressive Integrated Moving Average Model.
`CatBoostRegressor`	CatBoost Regressor, a regressor that uses gradient-boosting on decision trees.
`ElasticNetRegressor`	Elastic Net Regressor.
`LinearRegressor`	Linear Regressor.
`ExtraTreesRegressor`	Extra Trees Regressor.
`RandomForestRegressor`	Random Forest Regressor.
`XGBoostRegressor`	XGBoost Regressor.
`BaselineRegressor`	Regressor that predicts using the specified strategy.
`TimeSeriesBaselineEstimator`	Time series estimator that predicts using the naive forecasting approach.
`StackedEnsembleRegressor`	Stacked Ensemble Regressor.
`DecisionTreeRegressor`	Decision Tree Regressor.
`LightGBMRegressor`	LightGBM Regressor.
`SVMRegressor`	Support Vector Machine Regressor.

Model Understanding¶

Utility Methods¶

`confusion_matrix`	Confusion matrix for binary and multiclass classification.
`normalize_confusion_matrix`	Normalizes a confusion matrix.
`precision_recall_curve`	Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve.
`roc_curve`	Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve.
`calculate_permutation_importance`	Calculates permutation importance for features.
`binary_objective_vs_threshold`	Computes objective score as a function of potential binary classification decision thresholds.
`get_prediction_vs_actual_over_time_data`	Get the data needed for the prediction_vs_actual_over_time plot.
`partial_dependence`	Calculates one or two-way partial dependence.
`get_prediction_vs_actual_data`	Combines y_true and y_pred into a single dataframe and adds a column for outliers.
`get_linear_coefficients`	Returns a dataframe showing the features with the greatest predictive power for a linear model.
`t_sne`	Get the transformed output after fitting X to the embedded space using t-SNE.
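
To make the relationship between `confusion_matrix` and `normalize_confusion_matrix` concrete, here is a small pure-Python sketch. The function names, the row-equals-true-label layout, and the choice to normalize each row by its total are assumptions for illustration; the library functions may support other normalization modes.

```python
def confusion_matrix_sketch(y_true, y_pred, labels):
    """Count predictions per (true label, predicted label) pair.

    Rows index the true label and columns index the predicted label.
    """
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[position[actual]][position[predicted]] += 1
    return matrix


def normalize_rows_sketch(matrix):
    """Divide every row by its sum so each row adds up to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


y_true = [0] * 10 + [1] * 10
y_pred = [0] * 8 + [1] * 2 + [0] * 1 + [1] * 9
counts = confusion_matrix_sketch(y_true, y_pred, labels=[0, 1])
rates = normalize_rows_sketch(counts)
print(counts)
print(rates)
```

With row normalization, the diagonal entries read as per-class recall, which makes classes of different sizes easier to compare.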

Graph Utility Methods¶

`graph_precision_recall_curve`	Generate and display a precision-recall plot.
`graph_roc_curve`	Generate and display a Receiver Operating Characteristic (ROC) plot for binary and multiclass classification problems.
`graph_confusion_matrix`	Generate and display a confusion matrix plot.
`graph_permutation_importance`	Generate a bar graph of the pipeline’s permutation importance.
`graph_binary_objective_vs_threshold`	Generates a plot graphing objective score vs. decision thresholds for a fitted binary classification pipeline.
`graph_prediction_vs_actual`	Generate a scatter plot comparing the true and predicted values.
`graph_prediction_vs_actual_over_time`	Plot the target values and predictions against time on the x-axis.
`graph_partial_dependence`	Create a one-way or two-way partial dependence plot.
`graph_t_sne`	Plot high dimensional data into lower dimensional space using t-SNE.

Prediction Explanations¶

`explain_predictions`	Creates a report summarizing the top contributing features for each data point in the input features.
`explain_predictions_best_worst`	Creates a report summarizing the top contributing features for the best and worst points in the dataset as measured by error to true labels.

Objective Functions¶

Objective Base Classes¶

`ObjectiveBase`	Base class for all objectives.
`BinaryClassificationObjective`	Base class for all binary classification objectives.
`MulticlassClassificationObjective`	Base class for all multiclass classification objectives.
`RegressionObjective`	Base class for all regression objectives.

Domain-Specific Objectives¶

`FraudCost`	Score the percentage of the total transaction amount processed that is lost to fraud.
`LeadScoring`	Lead scoring.
`CostBenefitMatrix`	Score using a cost-benefit matrix.

Classification Objectives¶

`AccuracyBinary`	Accuracy score for binary classification.
`AccuracyMulticlass`	Accuracy score for multiclass classification.
`AUC`	AUC score for binary classification.
`AUCMacro`	AUC score for multiclass classification using macro averaging.
`AUCMicro`	AUC score for multiclass classification using micro averaging.
`AUCWeighted`	AUC score for multiclass classification using weighted averaging.
`BalancedAccuracyBinary`	Balanced accuracy score for binary classification.
`BalancedAccuracyMulticlass`	Balanced accuracy score for multiclass classification.
`F1`	F1 score for binary classification.
`F1Micro`	F1 score for multiclass classification using micro averaging.
`F1Macro`	F1 score for multiclass classification using macro averaging.
`F1Weighted`	F1 score for multiclass classification using weighted averaging.
`LogLossBinary`	Log Loss for binary classification.
`LogLossMulticlass`	Log Loss for multiclass classification.
`MCCBinary`	Matthews correlation coefficient for binary classification.
`MCCMulticlass`	Matthews correlation coefficient for multiclass classification.
`Precision`	Precision score for binary classification.
`PrecisionMicro`	Precision score for multiclass classification using micro averaging.
`PrecisionMacro`	Precision score for multiclass classification using macro averaging.
`PrecisionWeighted`	Precision score for multiclass classification using weighted averaging.
`Recall`	Recall score for binary classification.
`RecallMicro`	Recall score for multiclass classification using micro averaging.
`RecallMacro`	Recall score for multiclass classification using macro averaging.
`RecallWeighted`	Recall score for multiclass classification using weighted averaging.
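
For readers unfamiliar with the binary objectives listed above, the following sketch computes precision, recall, and F1 from raw labels using their standard definitions. The function name `binary_scores_sketch` is hypothetical, and the code assumes the positive class is labeled `1`; it is not the library's scoring code.

```python
def binary_scores_sketch(y_true, y_pred):
    """Return (precision, recall, f1) for labels in {0, 1}, positive class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall, f1 = binary_scores_sketch(y_true, y_pred)
print(precision, recall, f1)
```

The micro, macro, and weighted variants in the table differ in how such per-class scores are averaged across classes in the multiclass case.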

Regression Objectives¶

`R2`	Coefficient of determination for regression.
`MAE`	Mean absolute error for regression.
`MAPE`	Mean absolute percentage error for time series regression.
`MSE`	Mean squared error for regression.
`MeanSquaredLogError`	Mean squared log error for regression.
`MedianAE`	Median absolute error for regression.
`MaxError`	Maximum residual error for regression.
`ExpVariance`	Explained variance score for regression.
`RootMeanSquaredError`	Root mean squared error for regression.
`RootMeanSquaredLogError`	Root mean squared log error for regression.
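
Several of the regression objectives above reduce to short formulas. The sketch below implements MAE, MSE, root mean squared error, and R2 from their textbook definitions in plain Python; the function names are chosen for this example and the code is not taken from the library.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

Note that R2 is a score where higher is better, while the error metrics are better when lower, so an objective needs to carry that direction along with its formula.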

Objective Utils¶

`get_all_objective_names`	Get a list of the names of all objectives.
`get_core_objectives`	Returns all core objective instances associated with the given problem type.
`get_core_objective_names`	Get a list of all valid core objectives.
`get_non_core_objectives`	Get non-core objective classes.
`get_objective`	Returns the Objective class corresponding to a given objective name.

Problem Types¶

`handle_problem_types`	Handles problem_type by either returning the ProblemTypes or converting from a str.
`detect_problem_type`	Determine the type of problem being solved based on the targets (binary vs. multiclass classification, regression).
`ProblemTypes`	Enum defining the supported types of machine learning problems.

Model Family¶

`handle_model_family`	Handles model_family by either returning the ModelFamily or converting from a string.
`ModelFamily`	Enum for family of machine learning models.

Tuners¶

`Tuner`	Defines API for Tuners.
`SKOptTuner`	Bayesian Optimizer.
`GridSearchTuner`	Grid Search Optimizer.
`RandomSearchTuner`	Random Search Optimizer.
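
The difference between grid search and random search is easiest to see on a toy search space. This sketch uses hypothetical helpers (`grid_points`, `random_points`) and a made-up space of two hyperparameters; it only shows how candidate parameter sets are proposed, not how the Tuner classes above are called.

```python
import itertools
import random


def grid_points(space):
    """Yield every combination of the listed values, in a fixed order."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_points(space, n, seed=0):
    """Draw n parameter sets by picking each value independently at random."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(choices) for name, choices in space.items()}
        for _ in range(n)
    ]


# Illustrative search space; the parameter names are assumptions.
space = {"n_estimators": [10, 100], "max_depth": [2, 4, 6]}

grid = list(grid_points(space))
sampled = random_points(space, n=3)
print(len(grid), grid[0])
print(sampled)
```

Grid search visits all 2 x 3 combinations exactly once, whereas random search proposes as many points as the budget allows and may repeat or skip combinations. A Bayesian optimizer additionally uses the scores of earlier proposals to choose the next one.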

Data Checks¶

Data Check Classes¶

`DataCheck`	Base class for all data checks.
`InvalidTargetDataCheck`	Checks if the target data contains missing or invalid values.
`HighlyNullDataCheck`	Checks if there are any highly-null columns in the input.
`IDColumnsDataCheck`	Check if any of the features are likely to be ID columns.
`TargetLeakageDataCheck`	Check if any of the features are highly correlated with the target by using mutual information or Pearson correlation.
`OutliersDataCheck`	Checks if there are any outliers in input data by using IQR to determine score anomalies.
`NoVarianceDataCheck`	Check if the target or any of the features have no variance.
`ClassImbalanceDataCheck`	Checks if any target labels are imbalanced beyond a threshold.
`MulticollinearityDataCheck`	Check if any set of features is likely to be multicollinear.
`DateTimeNaNDataCheck`	Checks if datetime columns contain NaN values.
`NaturalLanguageNaNDataCheck`	Checks if natural language columns contain NaN values.

`DataChecks`	A collection of data checks.
`DefaultDataChecks`	A collection of basic data checks that is used by AutoML by default.
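
A data check inspects the input and reports problems rather than modifying the data. As a minimal sketch of the idea behind a highly-null check, the function below flags columns whose fraction of missing values reaches a threshold. The name `highly_null_columns`, the dict-of-lists input, and the `threshold` parameter are assumptions for this example and do not reflect the `HighlyNullDataCheck` signature.

```python
def highly_null_columns(columns, threshold=0.95):
    """Return the names of columns whose fraction of None values is >= threshold.

    Hypothetical helper: `columns` maps a column name to a list of values.
    """
    flagged = []
    for name, values in columns.items():
        null_fraction = sum(value is None for value in values) / len(values)
        if null_fraction >= threshold:
            flagged.append(name)
    return flagged


columns = {
    "age": [31, None, 45, 52],
    "notes": [None, None, None, None],
}
flagged = highly_null_columns(columns, threshold=0.75)
print(flagged)
```

Here `age` is 25% null and passes, while `notes` is entirely null and is flagged. A real data check would wrap such findings in warning or error messages so callers can decide how to react.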

Data Check Messages¶

`DataCheckMessage`	Base class for all DataCheckMessages.
`DataCheckError`	DataCheckMessage subclass for errors returned by data checks.
`DataCheckWarning`	DataCheckMessage subclass for warnings returned by data checks.

Data Check Message Types¶

`DataCheckMessageType`	Enum for type of data check message: WARNING or ERROR.

Data Check Message Codes¶

`DataCheckMessageCode`	Enum for data check message code.

Utils¶

General Utils¶

`import_or_raise`	Attempts to import the requested library by name.
`convert_to_seconds`	Converts a string describing a length of time to its length in seconds.
`get_random_state`	Generates a numpy.random.RandomState instance using seed.
`get_random_seed`	Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator.
`pad_with_nans`	Pad the beginning num_to_pad rows with nans.
`drop_rows_with_nans`	Drop rows that have any NaNs in all dataframes or series.
`infer_feature_types`	Create a Woodwork structure from the given list, pandas, or numpy input, with specified types for columns.
`save_plot`	Saves fig to filepath if specified, or to a default location if not.
`is_all_numeric`	Checks if the given DataTable contains only numeric values.
`get_importable_subclasses`	Get importable subclasses of a base class.
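
As an illustration of the kind of parsing `convert_to_seconds` describes, the sketch below turns strings such as "10 minutes" into a number of seconds. The function name, the accepted unit spellings, and the "<number> <unit>" input format are all assumptions for this example; the library function may accept a different set of formats.

```python
# Assumed unit spellings for this sketch only.
SECONDS_PER_UNIT = {
    "second": 1, "seconds": 1,
    "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600,
}


def convert_to_seconds_sketch(text):
    """Parse '<number> <unit>' and return the duration in seconds as a float."""
    amount, unit = text.strip().split()
    unit = unit.lower()
    if unit not in SECONDS_PER_UNIT:
        raise ValueError(f"Unrecognized time unit: {unit!r}")
    return float(amount) * SECONDS_PER_UNIT[unit]


print(convert_to_seconds_sketch("10 minutes"))
print(convert_to_seconds_sketch("1.5 hours"))
```

Rejecting unknown units with an explicit error keeps a typo in a time budget from being silently treated as zero.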
