# API Reference

## Demo Datasets

| Name | Description |
| --- | --- |
| `load_breast_cancer` | Load breast cancer dataset. Binary classification problem. |
| `load_diabetes` | Load diabetes dataset. Used for regression problems. |
| `load_fraud` | Load credit card fraud dataset. |
| `load_wine` | Load wine dataset. Multiclass problem. |
| `load_churn` | Load churn dataset, which can be used for binary classification problems. |

## Preprocessing

Utilities to preprocess data before using EvalML.

| Name | Description |
| --- | --- |
| `load_data` | Load features and target from file. |
| `target_distribution` | Get the target distributions. |
| `number_of_features` | Get the number of features of each specific dtype in a DataFrame. |
| `split_data` | Split data into train and test sets. |

## Exceptions

| Name | Description |
| --- | --- |
| `MethodPropertyNotFoundError` | Exception to raise when a class does not have an expected method or property. |
| `PipelineNotFoundError` | An exception raised when a particular pipeline is not found in automl search results. |
| `ObjectiveNotFoundError` | Exception to raise when specified objective does not exist. |
| `MissingComponentError` | An exception raised when a component is not found in `all_components()`. |
| `ComponentNotYetFittedError` | An exception to be raised when predict/predict_proba/transform is called on a component without fitting first. |
| `PipelineNotYetFittedError` | An exception to be raised when predict/predict_proba/transform is called on a pipeline without fitting first. |
| `AutoMLSearchException` | Exception raised when all pipelines in an automl batch return a score of NaN for the primary objective. |
| `PipelineScoreError` | An exception raised when a pipeline errors while scoring any objective in a list of objectives. |
| `DataCheckInitError` | Exception raised when a data check can't initialize with the parameters given. |
| `NullsInColumnWarning` | Warning thrown when there are null values in the column of interest. |

## AutoML

### AutoML Search Interface

| Name | Description |
| --- | --- |
| `AutoMLSearch` | Automated Pipeline search. |

### AutoML Utils

| Name | Description |
| --- | --- |
| `search` | Given data and configuration, run an automl search. |
| `get_default_primary_search_objective` | Get the default primary search objective for a problem type. |
| `make_data_splitter` | Given the training data and ML problem parameters, compute a data splitting method to use during AutoML search. |

### AutoML Algorithm Classes

| Name | Description |
| --- | --- |
| `AutoMLAlgorithm` | Base class for the AutoML algorithms which power EvalML. |
| `IterativeAlgorithm` | An automl algorithm which first fits a base round of pipelines with default parameters, then does a round of parameter tuning on each pipeline in order of performance. |

### AutoML Callbacks

| Name | Description |
| --- | --- |
| `silent_error_callback` | No-op. |
| `log_error_callback` | Logs the exception thrown as an error. |
| `raise_error_callback` | Raises the exception thrown by the AutoMLSearch object. |

### AutoML Engines

| Name | Description |
| --- | --- |
| `SequentialEngine` | The default engine for the AutoML search. |
| `CFEngine` | The concurrent.futures (CF) engine. |
| `DaskEngine` | The dask engine. |

## Pipelines

### Pipeline Base Classes

| Name | Description |
| --- | --- |
| `PipelineBase` | Machine learning pipeline. |
| `ClassificationPipeline` | Pipeline subclass for all classification pipelines. |
| `BinaryClassificationPipeline` | Pipeline subclass for all binary classification pipelines. |
| `MulticlassClassificationPipeline` | Pipeline subclass for all multiclass classification pipelines. |
| `RegressionPipeline` | Pipeline subclass for all regression pipelines. |
| `TimeSeriesClassificationPipeline` | Pipeline base class for time series classification problems. |
| `TimeSeriesBinaryClassificationPipeline` | Pipeline base class for time series binary classification problems. |
| `TimeSeriesMulticlassClassificationPipeline` | Pipeline base class for time series multiclass classification problems. |
| `TimeSeriesRegressionPipeline` | Pipeline base class for time series regression problems. |

### Pipeline Utils

| Name | Description |
| --- | --- |
| `make_pipeline` | Given input data, target data, an estimator class and the problem type, generates a pipeline class with a preprocessing chain recommended based on the inputs. The pipeline will be a subclass of the appropriate pipeline base class for the specified problem_type. |
| `generate_pipeline_code` | Creates and returns a string that contains the Python imports and code required for running the EvalML pipeline. |
| `rows_of_interest` | Get the row indices of the data that are closest to the threshold. Works only for binary classification problems and pipelines. |
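The idea behind `rows_of_interest`, finding the rows whose predicted probability lies closest to the decision threshold, can be sketched in plain Python. This is only an illustration of the concept, not EvalML's implementation: the function name `rows_near_threshold` and the parameters `threshold` and `epsilon` are hypothetical.

```python
def rows_near_threshold(probabilities, threshold=0.5, epsilon=0.1):
    """Return row indices whose probability is within epsilon of the threshold.

    Hypothetical helper: indices are ordered from closest to farthest.
    """
    distances = [(abs(p - threshold), i) for i, p in enumerate(probabilities)]
    return [i for distance, i in sorted(distances) if distance <= epsilon]


# Positive-class probabilities for five rows (made-up values).
probabilities = [0.05, 0.48, 0.95, 0.55, 0.41]
near = rows_near_threshold(probabilities, threshold=0.5, epsilon=0.1)
print(near)
```

Rows like these are the ones whose predicted label would flip under a small change to the threshold, which is why they are often worth inspecting first.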

## Component Graphs

| Name | Description |
| --- | --- |
| `ComponentGraph` | Component graph for a pipeline as a directed acyclic graph (DAG). |

## Components

### Component Base Classes

Each component represents a step in a pipeline.

| Name | Description |
| --- | --- |
| `ComponentBase` | Base class for all components. |
| `Transformer` | A component that may or may not need fitting that transforms data. These components are used before an estimator. |
| `Estimator` | A component that fits and predicts given data. |

### Component Utils

| Name | Description |
| --- | --- |
| `allowed_model_families` | List the model types allowed for a particular problem type. |
| `get_estimators` | Returns the estimators allowed for a particular problem type. |
| `generate_component_code` | Creates and returns a string that contains the Python imports and code required for running the EvalML component. |

### Transformers

Transformers are components that take in data as input and output transformed data.

| Name | Description |
| --- | --- |
| `DropColumns` | Drops specified columns in input data. |
| `SelectColumns` | Selects specified columns in input data. |
| `SelectByType` | Selects columns by specified Woodwork logical type or semantic tag in input data. |
| `OneHotEncoder` | A transformer that encodes categorical features in a one-hot numeric array. |
| `TargetEncoder` | A transformer that encodes categorical features into target encodings. |
| `PerColumnImputer` | Imputes missing data according to a specified imputation strategy per column. |
| `Imputer` | Imputes missing data according to a specified imputation strategy. |
| `SimpleImputer` | Imputes missing data according to a specified imputation strategy. Natural language columns are ignored. |
| `TimeSeriesImputer` | Imputes missing data according to a specified timeseries-specific imputation strategy. |
| `StandardScaler` | A transformer that standardizes input features by removing the mean and scaling to unit variance. |
| `RFRegressorSelectFromModel` | Selects top features based on importance weights using a Random Forest regressor. |
| `RFClassifierSelectFromModel` | Selects top features based on importance weights using a Random Forest classifier. |
| `DropNullColumns` | Transformer to drop features whose percentage of NaN values exceeds a specified threshold. |
| `DateTimeFeaturizer` | Transformer that can automatically extract features from datetime columns. |
| `NaturalLanguageFeaturizer` | Transformer that can automatically featurize text columns using featuretools' nlp_primitives. |
| `TimeSeriesFeaturizer` | Transformer that delays input features and target variable for time series problems. |
| `TimeSeriesRegularizer` | Transformer that regularizes an inconsistently spaced datetime column. |
| `DFSTransformer` | Featuretools DFS component that generates features for the input features. |
| `PolynomialDetrender` | Removes trends from time series by fitting a polynomial to the data. |
| `Undersampler` | Initializes an undersampling transformer to downsample the majority classes in the dataset. |
| `Oversampler` | SMOTE Oversampler component. Will automatically select whether to use SMOTE, SMOTEN, or SMOTENC based on inputs to the component. |

### Estimators

#### Classifiers

Classifiers are components that output a predicted class label.

| Name | Description |
| --- | --- |
| `CatBoostClassifier` | CatBoost Classifier, a classifier that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features. |
| `ElasticNetClassifier` | Elastic Net Classifier. Uses Logistic Regression with elasticnet penalty as the base estimator. |
| `ExtraTreesClassifier` | Extra Trees Classifier. |
| `RandomForestClassifier` | Random Forest Classifier. |
| `LightGBMClassifier` | LightGBM Classifier. |
| `LogisticRegressionClassifier` | Logistic Regression Classifier. |
| `XGBoostClassifier` | XGBoost Classifier. |
| `BaselineClassifier` | Classifier that predicts using the specified strategy. |
| `StackedEnsembleClassifier` | Stacked Ensemble Classifier. |
| `DecisionTreeClassifier` | Decision Tree Classifier. |
| `KNeighborsClassifier` | K-Nearest Neighbors Classifier. |
| `SVMClassifier` | Support Vector Machine Classifier. |
| `VowpalWabbitBinaryClassifier` | Vowpal Wabbit Binary Classifier. |
| `VowpalWabbitMulticlassClassifier` | Vowpal Wabbit Multiclass Classifier. |

#### Regressors

Regressors are components that output a predicted target value.

| Name | Description |
| --- | --- |
| `ARIMARegressor` | Autoregressive Integrated Moving Average Model. The three parameters (p, d, q) are the AR order, the degree of differencing, and the MA order. More information: https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima.model.ARIMA.html |
| `CatBoostRegressor` | CatBoost Regressor, a regressor that uses gradient-boosting on decision trees. CatBoost is an open-source library and natively supports categorical features. |
| `ElasticNetRegressor` | Elastic Net Regressor. |
| `ExponentialSmoothingRegressor` | Holt-Winters Exponential Smoothing Forecaster. |
| `LinearRegressor` | Linear Regressor. |
| `ExtraTreesRegressor` | Extra Trees Regressor. |
| `RandomForestRegressor` | Random Forest Regressor. |
| `XGBoostRegressor` | XGBoost Regressor. |
| `BaselineRegressor` | Baseline regressor that uses a simple strategy to make predictions. This is useful as a simple baseline regressor to compare with other regressors. |
| `TimeSeriesBaselineEstimator` | Time series estimator that predicts using the naive forecasting approach. |
| `StackedEnsembleRegressor` | Stacked Ensemble Regressor. |
| `DecisionTreeRegressor` | Decision Tree Regressor. |
| `LightGBMRegressor` | LightGBM Regressor. |
| `SVMRegressor` | Support Vector Machine Regressor. |
| `VowpalWabbitRegressor` | Vowpal Wabbit Regressor. |

## Model Understanding

### Utility Methods

| Name | Description |
| --- | --- |
| `confusion_matrix` | Confusion matrix for binary and multiclass classification. |
| `normalize_confusion_matrix` | Normalizes a confusion matrix. |
| `precision_recall_curve` | Given labels and binary classifier predicted probabilities, compute and return the data representing a precision-recall curve. |
| `roc_curve` | Given labels and classifier predicted probabilities, compute and return the data representing a Receiver Operating Characteristic (ROC) curve. Works with binary or multiclass problems. |
| `calculate_permutation_importance` | Calculates permutation importance for features. |
| `calculate_permutation_importance_one_column` | Calculates permutation importance for one column in the original dataframe. |
| `binary_objective_vs_threshold` | Computes objective score as a function of potential binary classification decision thresholds for a fitted binary classification pipeline. |
| `get_prediction_vs_actual_over_time_data` | Get the data needed for the prediction_vs_actual_over_time plot. |
| `partial_dependence` | Calculates one or two-way partial dependence. |
| `get_prediction_vs_actual_data` | Combines y_true and y_pred into a single dataframe and adds a column for outliers. Used in `graph_prediction_vs_actual()`. |
| `get_linear_coefficients` | Returns a dataframe showing the features with the greatest predictive power for a linear model. |
| `t_sne` | Get the transformed output after fitting X to the embedded space using t-SNE. |
| `find_confusion_matrix_per_thresholds` | Gets the confusion matrix and histogram bins for each threshold as well as the best threshold per objective. Only works with Binary Classification Pipelines. |
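To make the confusion matrix entries above concrete, here is a minimal plain-Python sketch of counting a confusion matrix and normalizing it. It does not call EvalML: the helper names `count_confusion` and `normalize_by_true_label` are hypothetical, and normalizing each row by its true-label total is just one possible convention, assumed here for illustration.

```python
def count_confusion(y_true, y_pred, labels):
    """Count predictions: rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def normalize_by_true_label(matrix):
    """Divide each row by its total so every non-empty row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
counts = count_confusion(y_true, y_pred, labels=[0, 1])
rates = normalize_by_true_label(counts)
print(counts)  # [[1, 1], [1, 2]]
print(rates)
```

With row normalization, the diagonal entries read as per-class recall, which makes matrices from differently sized classes easier to compare.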

### Graph Utility Methods

| Name | Description |
| --- | --- |
| `graph_precision_recall_curve` | Generate and display a precision-recall plot. |
| `graph_roc_curve` | Generate and display a Receiver Operating Characteristic (ROC) plot for binary and multiclass classification problems. |
| `graph_confusion_matrix` | Generate and display a confusion matrix plot. |
| `graph_permutation_importance` | Generate a bar graph of the pipeline's permutation importance. |
| `graph_binary_objective_vs_threshold` | Generates a plot graphing objective score vs. decision thresholds for a fitted binary classification pipeline. |
| `graph_prediction_vs_actual` | Generate a scatter plot comparing the true and predicted values. Used for regression plotting. |
| `graph_prediction_vs_actual_over_time` | Plot the target values and predictions against time on the x-axis. |
| `graph_partial_dependence` | Create a one-way or two-way partial dependence plot. |
| `graph_t_sne` | Plot high dimensional data into lower dimensional space using t-SNE. |

### Prediction Explanations

| Name | Description |
| --- | --- |
| `explain_predictions` | Creates a report summarizing the top contributing features for each data point in the input features. |
| `explain_predictions_best_worst` | Creates a report summarizing the top contributing features for the best and worst points in the dataset as measured by error to true labels. |

## Objectives

### Objective Base Classes

| Name | Description |
| --- | --- |
| `ObjectiveBase` | Base class for all objectives. |
| `BinaryClassificationObjective` | Base class for all binary classification objectives. |
| `MulticlassClassificationObjective` | Base class for all multiclass classification objectives. |
| `RegressionObjective` | Base class for all regression objectives. |

### Domain-Specific Objectives

| Name | Description |
| --- | --- |
| `FraudCost` | Score the percentage of money lost to fraud out of the total transaction amount processed. |
| `LeadScoring` | Lead scoring. |
| `CostBenefitMatrix` | Score using a cost-benefit matrix. Scores quantify the benefits of a given value, so a greater numeric score represents a better score. Costs and scores can be negative, indicating that a value is not beneficial. For example, in the case of monetary profit, a negative cost and/or score represents loss of cash flow. |
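Scoring with a cost-benefit matrix can be illustrated with a small plain-Python sketch. It is not EvalML's `CostBenefitMatrix` implementation: the function `cost_benefit_score`, its parameter names, and the choice to average the payoff over rows are assumptions made for this example.

```python
def cost_benefit_score(y_true, y_pred, true_positive, true_negative,
                       false_positive, false_negative):
    """Average payoff per row for binary labels encoded as 0 and 1.

    Hypothetical helper: each (true, predicted) pair maps to a payoff,
    and negative payoffs represent costs.
    """
    payoff = {
        (1, 1): true_positive,
        (0, 0): true_negative,
        (0, 1): false_positive,
        (1, 0): false_negative,
    }
    total = sum(payoff[(t, p)] for t, p in zip(y_true, y_pred))
    return total / len(y_true)


y_true = [1, 0, 0, 1]
y_pred = [1, 1, 0, 0]
score = cost_benefit_score(
    y_true, y_pred,
    true_positive=10, true_negative=0, false_positive=-2, false_negative=-5,
)
print(score)  # 0.75
```

A greater score is better here: the single true positive earns 10, while the false positive and false negative together cost 7, leaving an average payoff of 0.75 per row.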

### Classification Objectives

| Name | Description |
| --- | --- |
| `AccuracyBinary` | Accuracy score for binary classification. |
| `AccuracyMulticlass` | Accuracy score for multiclass classification. |
| `AUC` | AUC score for binary classification. |
| `AUCMacro` | AUC score for multiclass classification using macro averaging. |
| `AUCMicro` | AUC score for multiclass classification using micro averaging. |
| `AUCWeighted` | AUC score for multiclass classification using weighted averaging. |
| `Gini` | Gini coefficient for binary classification. |
| `BalancedAccuracyBinary` | Balanced accuracy score for binary classification. |
| `BalancedAccuracyMulticlass` | Balanced accuracy score for multiclass classification. |
| `F1` | F1 score for binary classification. |
| `F1Micro` | F1 score for multiclass classification using micro averaging. |
| `F1Macro` | F1 score for multiclass classification using macro averaging. |
| `F1Weighted` | F1 score for multiclass classification using weighted averaging. |
| `LogLossBinary` | Log Loss for binary classification. |
| `LogLossMulticlass` | Log Loss for multiclass classification. |
| `MCCBinary` | Matthews correlation coefficient for binary classification. |
| `MCCMulticlass` | Matthews correlation coefficient for multiclass classification. |
| `Precision` | Precision score for binary classification. |
| `PrecisionMicro` | Precision score for multiclass classification using micro averaging. |
| `PrecisionMacro` | Precision score for multiclass classification using macro averaging. |
| `PrecisionWeighted` | Precision score for multiclass classification using weighted averaging. |
| `Recall` | Recall score for binary classification. |
| `RecallMicro` | Recall score for multiclass classification using micro averaging. |
| `RecallMacro` | Recall score for multiclass classification using macro averaging. |
| `RecallWeighted` | Recall score for multiclass classification using weighted averaging. |

### Regression Objectives

| Name | Description |
| --- | --- |
| `R2` | Coefficient of determination for regression. |
| `MAE` | Mean absolute error for regression. |
| `MAPE` | Mean absolute percentage error for time series regression. Scaled by 100 to return a percentage. |
| `MSE` | Mean squared error for regression. |
| `MeanSquaredLogError` | Mean squared log error for regression. |
| `MedianAE` | Median absolute error for regression. |
| `MaxError` | Maximum residual error for regression. |
| `ExpVariance` | Explained variance score for regression. |
| `RootMeanSquaredError` | Root mean squared error for regression. |
| `RootMeanSquaredLogError` | Root mean squared log error for regression. |
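Several of the regression objectives above have short textbook definitions, sketched here in plain Python for orientation. These helpers are written for this example and are not EvalML's implementations; in particular, edge-case handling (such as zero true values in MAPE) is not addressed.

```python
import math
import statistics


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, scaled by 100 (undefined if a true value is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def median_ae(y_true, y_pred):
    """Median absolute error."""
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))


def max_error(y_true, y_pred):
    """Largest absolute residual."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
print(mae(y_true, y_pred), max_error(y_true, y_pred))  # 10.0 20.0
print(mape(y_true, y_pred))
```

On this data the absolute errors are 10, 20, and 0, so MAE and the median absolute error are both 10, while squaring in MSE and RMSE gives the larger error more weight.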

### Objective Utils

| Name | Description |
| --- | --- |
| `get_all_objective_names` | Get a list of the names of all objectives. |
| `get_core_objectives` | Returns all core objective instances associated with the given problem type. |
| `get_core_objective_names` | Get a list of all valid core objectives. |
| `get_non_core_objectives` | Get non-core objective classes. |
| `get_objective` | Returns the Objective class corresponding to a given objective name. |

## Problem Types

| Name | Description |
| --- | --- |
| `handle_problem_types` | Handles problem_type by either returning the ProblemTypes or converting from a str. |
| `detect_problem_type` | Determine the type of problem being solved based on the targets (binary vs. multiclass classification, regression). Ignores missing and null data. |
| `ProblemTypes` | Enum defining the supported types of machine learning problems. |

## Model Family

| Name | Description |
| --- | --- |
| `handle_model_family` | Handles model_family by either returning the ModelFamily or converting from a string. |
| `ModelFamily` | Enum for family of machine learning models. |

## Tuners

| Name | Description |
| --- | --- |
| `Tuner` | Base Tuner class. |
| `SKOptTuner` | Bayesian Optimizer. |
| `GridSearchTuner` | Grid Search Optimizer, which generates all of the possible points to search for using a grid. |
| `RandomSearchTuner` | Random Search Optimizer. |
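The difference between grid search and random search can be sketched with the standard library alone. The functions `grid_points` and `random_points` and the dictionary-of-lists search space format are hypothetical and chosen for this example; they are not the interface of the tuner classes listed above.

```python
import random
from itertools import product


def grid_points(search_space):
    """Yield every combination of the candidate values, one dict per point."""
    names = list(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


def random_points(search_space, n_points, seed=0):
    """Draw n_points parameter dicts by sampling each value independently."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in search_space.items()}
        for _ in range(n_points)
    ]


# Hypothetical search space with two hyperparameters.
space = {"max_depth": [2, 4], "learning_rate": [0.1, 0.3, 1.0]}
grid = list(grid_points(space))
sampled = random_points(space, n_points=4)
print(len(grid))  # 6
print(grid[0])    # {'max_depth': 2, 'learning_rate': 0.1}
```

The grid grows multiplicatively with each added hyperparameter (2 × 3 = 6 points here), whereas random search evaluates a fixed budget of points regardless of how many hyperparameters there are.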

## Data Checks

### Data Check Classes

| Name | Description |
| --- | --- |
| `DataCheck` | Base class for all data checks. |
| `InvalidTargetDataCheck` | Check if the target data is considered invalid. |
| `NullDataCheck` | Check if there are any highly-null numerical, boolean, categorical, natural language, and unknown columns and rows in the input. |
| `IDColumnsDataCheck` | Check if any of the features are likely to be ID columns. |
| `TargetLeakageDataCheck` | Check if any of the features are highly correlated with the target by using mutual information or Pearson correlation. |
| `OutliersDataCheck` | Checks if there are any outliers in input data by using IQR to determine score anomalies. |
| `NoVarianceDataCheck` | Check if the target or any of the features have no variance. |
| `ClassImbalanceDataCheck` | Check if any of the target labels are imbalanced, or if the number of values for each target is below 2 times the number of CV folds. Use for classification problems. |
| `MulticollinearityDataCheck` | Check if any set of features is likely to be multicollinear. |
| `DateTimeFormatDataCheck` | Check if the datetime column has equally spaced intervals and is monotonically increasing or decreasing in order to be supported by time series estimators. |
| `TimeSeriesParametersDataCheck` | Checks whether the time series parameters are compatible with data splitting. |
| `TimeSeriesSplittingDataCheck` | Checks whether the time series target data is compatible with splitting. |
| `DataChecks` | A collection of data checks. |
| `DefaultDataChecks` | A collection of basic data checks that is used by AutoML by default. |
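One of the conditions named in the `ClassImbalanceDataCheck` description, a class having fewer than 2 times the number of CV folds in samples, can be sketched in a few lines. The helper `classes_below_cv_minimum` and its `num_cv_folds` parameter are hypothetical names for this illustration; the real data check returns structured messages rather than a plain list.

```python
from collections import Counter


def classes_below_cv_minimum(y, num_cv_folds=3):
    """Return the labels with fewer than 2 * num_cv_folds samples."""
    minimum = 2 * num_cv_folds
    counts = Counter(y)
    return sorted(label for label, n in counts.items() if n < minimum)


# Made-up target: 10 samples of "a", 5 of "b", 6 of "c".
y = ["a"] * 10 + ["b"] * 5 + ["c"] * 6
flagged = classes_below_cv_minimum(y, num_cv_folds=3)
print(flagged)  # ['b']
```

With three folds the minimum is six samples per class, so only "b" is flagged; a class this small may be missing from some cross-validation folds entirely.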

### Data Check Messages

| Name | Description |
| --- | --- |
| `DataCheckMessage` | Base class for a message returned by a DataCheck, tagged by name. |
| `DataCheckError` | DataCheckMessage subclass for errors returned by data checks. |
| `DataCheckWarning` | DataCheckMessage subclass for warnings returned by data checks. |

### Data Check Message Types

| Name | Description |
| --- | --- |
| `DataCheckMessageType` | Enum for type of data check message: WARNING or ERROR. |

### Data Check Message Codes

| Name | Description |
| --- | --- |
| `DataCheckMessageCode` | Enum for data check message code. |

## Utils

### General Utils

| Name | Description |
| --- | --- |
| `import_or_raise` | Attempts to import the requested library by name. If the import fails, raises an ImportError or warning. |
| `convert_to_seconds` | Converts a string describing a length of time to its length in seconds. |
| `get_random_state` | Generates a numpy.random.RandomState instance using seed. |
| `get_random_seed` | Given a numpy.random.RandomState object, generate an int representing a seed value for another random number generator. Or, if given an int, return that int. |
| `pad_with_nans` | Pad the beginning num_to_pad rows with nans. |
| `drop_rows_with_nans` | Drop rows that have any NaNs in all dataframes or series. |
| `infer_feature_types` | Create a Woodwork structure from the given list, pandas, or numpy input, with specified types for columns. If a column's type is not specified, it will be inferred by Woodwork. |
| `save_plot` | Saves fig to filepath if specified, or to a default location if not. |
| `is_all_numeric` | Checks if the given DataFrame contains only numeric values. |
| `get_importable_subclasses` | Get importable subclasses of a base class. Used to list all of our estimators, transformers, components and pipelines dynamically. |