FAQ

Q: What is the difference between EvalML and other AutoML libraries?

EvalML optimizes machine learning pipelines on custom, practical objectives rather than generic machine learning loss functions, so it finds the pipelines best suited to your specific needs. EvalML pipelines also accept many kinds of data, including missing values and categorical features, as long as the data are in a single table. You can build your own pipelines from existing or custom components, which gives you more control over the AutoML process. Finally, EvalML provides data checks that alert you to issues in your data that may cause problems for machine learning algorithms.

Q: How does EvalML handle missing values?

EvalML pipelines include imputation components that handle missing values, and EvalML searches over different imputation strategies to find the best possible pipeline. You can find more information in the documentation on components and in the API reference.
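
As a rough illustration of what "different types of imputation" means, the sketch below fills the missing entries in one numeric column using a chosen summary statistic. It is plain Python, not EvalML's imputer. The strategy names are borrowed from scikit-learn's SimpleImputer, and treating them as the options a search would compare is an assumption.

```python
from statistics import mean, median, mode
from typing import List, Optional

# Hypothetical strategy table; names borrowed from scikit-learn's SimpleImputer.
STRATEGIES = {"mean": mean, "median": median, "most_frequent": mode}


def impute(column: List[Optional[float]], strategy: str = "mean") -> List[float]:
    """Replace None entries with a statistic computed from the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = STRATEGIES[strategy](observed)
    return [fill if v is None else v for v in column]


# Each strategy yields a different candidate column for a pipeline search to evaluate.
column = [1.0, None, 3.0, 3.0, 10.0]
for name in STRATEGIES:
    print(name, impute(column, name))
```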

Q: How does EvalML handle categorical encoding?

EvalML provides a one-hot encoding component in its pipelines for categorical variables and plans to support other encoders in the future.
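
To show what one-hot encoding does to a categorical column, here is a minimal plain-Python sketch. It is not EvalML's encoder component. The function name and the `<column>_<category>` naming scheme are hypothetical, and real encoders also deal with issues such as categories unseen during training, which this sketch ignores.

```python
from typing import Dict, List


def one_hot_encode(name: str, values: List[str]) -> Dict[str, List[int]]:
    """Turn one categorical column into one 0/1 indicator column per category."""
    categories = sorted(set(values))  # fixed, deterministic column order
    return {
        f"{name}_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }


print(one_hot_encode("color", ["red", "blue", "red", "green"]))
```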

Q: How does EvalML handle feature selection?

EvalML currently uses scikit-learn’s SelectFromModel with a Random Forest classifier or regressor to handle feature selection, and it plans to support more feature selectors in the future. You can find more information in the API reference.
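
SelectFromModel keeps the features whose importance in the fitted model meets a threshold. When no threshold is given and the model is not L1-penalized, scikit-learn uses the mean importance. The stdlib-only sketch below mimics that rule on a dictionary of importances. The feature names and importance values are made up for illustration; in practice the importances would come from the fitted Random Forest.

```python
from statistics import mean
from typing import Dict, List, Optional


def select_features(importances: Dict[str, float], threshold: Optional[float] = None) -> List[str]:
    """Keep features whose importance is >= threshold (default: the mean importance)."""
    cutoff = mean(list(importances.values())) if threshold is None else threshold
    return [name for name, score in importances.items() if score >= cutoff]


# Hypothetical importances from a fitted Random Forest.
importances = {"age": 0.45, "income": 0.30, "zip_code": 0.15, "row_id": 0.10}
print(select_features(importances))
```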

Q: How is feature importance calculated?

Feature importance depends on the estimator used. For linear estimators (Logistic Regression and Linear Regression), the model coefficients are used; for tree-based estimators (Random Forest and XGBoost), Gini importance is used.
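
Gini importance (also called mean decrease in impurity) credits each feature with the weighted drop in Gini impurity at every split that uses it, and the per-feature totals are then normalized to sum to 1. The plain-Python sketch below computes the contribution of a single split using the weighted impurity decrease that scikit-learn's trees use. The helper names and the example split are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: chance that two labels drawn with replacement differ."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_importance(parent, left, right, n_total: int) -> float:
    """Weighted impurity decrease of one split, credited to the feature it splits on."""
    n = len(parent)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return n / n_total * (gini(parent) - weighted_children)


def normalize(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale summed per-feature decreases so they add up to 1."""
    total = sum(raw.values())
    return {feature: value / total for feature, value in raw.items()}


# A root split on a hypothetical feature that separates the two classes perfectly.
root = split_importance([0, 0, 1, 1], [0, 0], [1, 1], n_total=4)
print(root)
```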

Q: How does hyperparameter tuning work?

EvalML tunes hyperparameters for its pipelines through Bayesian optimization. In the future we plan to support more optimization techniques such as random search.
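
To make "Bayesian optimization" concrete, here is a from-scratch sketch for a single hyperparameter rescaled to [0, 1]. A Gaussian-process surrogate with a fixed RBF kernel models the score, and expected improvement picks the next value to try from a grid. This is not EvalML's tuner: the kernel length scale, jitter, grid, and budget are all assumptions chosen for illustration, and the objective is treated as a loss to minimize.

```python
import math
import random
from statistics import mean, pstdev
from typing import Callable, List, Tuple


def rbf(a: float, b: float, length: float = 0.2) -> float:
    """Squared-exponential kernel: nearby hyperparameter values get correlated scores."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def cholesky(K: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L.T == K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(K[i][i] - s, 1e-12))
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def forward(L: List[List[float]], b: List[float]) -> List[float]:
    """Solve L y = b for lower-triangular L."""
    y: List[float] = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward(L: List[List[float]], y: List[float]) -> List[float]:
    """Solve L.T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI for minimization: expected amount by which a point beats the best score so far."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(objective: Callable[[float], float], n_init: int = 3,
              n_iter: int = 15, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]  # random initial evaluations
    ys = [objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]
    for _ in range(n_iter):
        # Standardize scores so they match the GP prior (zero mean, unit variance).
        m, s = mean(ys), (pstdev(ys) or 1.0)
        zs = [(y - m) / s for y in ys]
        K = [[rbf(a, b) + (1e-6 if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
        L = cholesky(K)
        alpha = backward(L, forward(L, zs))  # alpha = K^-1 zs
        best = min(zs)

        def ei(x: float) -> float:
            k = [rbf(x, a) for a in xs]
            mu = sum(ki * ai for ki, ai in zip(k, alpha))  # posterior mean
            v = forward(L, k)
            sigma = math.sqrt(max(1.0 - sum(vi * vi for vi in v), 1e-12))
            return expected_improvement(mu, sigma, best)

        x_next = max(grid, key=ei)  # most promising value under the surrogate
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


best_x, best_loss = bayes_opt(lambda x: (x - 0.3) ** 2)
print(best_x, best_loss)
```

Fitting a surrogate and maximizing expected improvement lets each new evaluation balance exploring uncertain regions against refining near the current best, which is why Bayesian optimization typically needs fewer pipeline fits than random search.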

Q: Can I create my own objective metric?

Yes, you can! You can define a custom objective so that EvalML searches for the pipeline that performs best on the metric that matters for your use case.
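
At its core, a custom objective is a scoring function that encodes what prediction errors cost you. The plain-Python sketch below scores a hypothetical fraud-detection pipeline in dollars. The cost model (missed fraud loses the transaction amount, and each flagged transaction incurs a fixed review fee) and all numbers are assumptions for illustration. It does not use EvalML's objective classes.

```python
from typing import Sequence


def fraud_cost(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    amounts: Sequence[float],
    review_fee: float = 5.0,
) -> float:
    """Dollars lost (lower is better): missed fraud plus manual reviews of flagged rows."""
    missed = sum(a for t, p, a in zip(y_true, y_pred, amounts) if t == 1 and p == 0)
    reviews = review_fee * sum(1 for p in y_pred if p == 1)
    return missed + reviews


y_true = [1, 0, 1, 0]
amounts = [100.0, 20.0, 50.0, 10.0]
pipeline_a = [1, 1, 0, 0]  # misses the $50 fraud and flags one legitimate row
pipeline_b = [1, 0, 1, 0]  # catches both frauds
scores = {
    "A": fraud_cost(y_true, pipeline_a, amounts),
    "B": fraud_cost(y_true, pipeline_b, amounts),
}
print(min(scores, key=scores.get), scores)
```

Because lower is better for this objective, a search guided by it would prefer pipeline B.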

Q: How does EvalML avoid overfitting?

EvalML uses several safeguards against overfitting: data checks that detect problems such as label leakage, detection of unstable pipelines, hold-out datasets, and cross-validation. By default, EvalML uses Stratified K-Fold cross-validation for classification problems and K-Fold cross-validation for regression problems, but you can also supply your own cross-validation method.
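
Stratified K-Fold keeps each class's proportion roughly the same in every fold, which matters for imbalanced classification problems. The sketch below shows the idea in plain Python by dealing each class's row indices round-robin across folds. It is a simplified illustration without shuffling, not scikit-learn's or EvalML's implementation, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_kfold(y: Sequence[Hashable], k: int = 3) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs with class ratios preserved per fold."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    # Deal indices class by class; continuing the counter keeps fold sizes balanced.
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        for i in indices:
            folds[position % k].append(i)
            position += 1

    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


y = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # 2:1 class ratio
for train, test in stratified_kfold(y, k=3):
    print(test, [y[i] for i in test])
```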

Q: Can I create my own pipeline for EvalML?

Yes! EvalML lets you build custom pipelines from modular components, so you can tailor pipelines to your own needs and use them in AutoML.
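
The idea of building a pipeline from modular components can be sketched in plain Python as a chain of objects that share `fit`, `transform`, and `predict` methods. All classes below are hypothetical stand-ins written for illustration. They are not EvalML's component or pipeline classes, which have their own interfaces.

```python
from collections import Counter
from statistics import mean
from typing import Hashable, List, Optional, Sequence

Matrix = List[List[Optional[float]]]


class MeanImputer:
    """Transformer component: fills None with each column's training mean."""

    def fit(self, X: Matrix, y=None) -> "MeanImputer":
        columns = list(zip(*X))
        self.means_ = [mean([v for v in col if v is not None]) for col in columns]
        return self

    def transform(self, X: Matrix) -> Matrix:
        return [[m if v is None else v for v, m in zip(row, self.means_)] for row in X]


class MajorityClassifier:
    """Estimator component: always predicts the most common training label."""

    def fit(self, X: Matrix, y: Sequence[Hashable]) -> "MajorityClassifier":
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Matrix) -> list:
        return [self.label_ for _ in X]


class Pipeline:
    """Chains transformer components, then hands the result to a final estimator."""

    def __init__(self, transformers: list, estimator):
        self.transformers = transformers
        self.estimator = estimator

    def fit(self, X: Matrix, y: Sequence[Hashable]) -> "Pipeline":
        for step in self.transformers:
            X = step.fit(X, y).transform(X)
        self.estimator.fit(X, y)
        return self

    def predict(self, X: Matrix) -> list:
        for step in self.transformers:
            X = step.transform(X)  # reuse statistics learned during fit
        return self.estimator.predict(X)


X = [[1.0, None], [3.0, 4.0], [None, 8.0]]
y = ["a", "b", "a"]
pipeline = Pipeline([MeanImputer()], MajorityClassifier()).fit(X, y)
print(pipeline.predict([[None, None]]))
```

Swapping one component for another with the same methods changes the pipeline without touching the rest of the chain, which is what makes component-based pipelines easy to customize.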

Q: Does EvalML work with X algorithm?

EvalML is constantly adding new components, and it allows you to use your own algorithms as custom components in its pipelines.
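
A common way to plug your own algorithm into a component-based pipeline is to wrap it in an object with `fit` and `predict` methods. The `CustomEstimator` wrapper and the midpoint-threshold training function below are hypothetical and written in plain Python. EvalML defines its own component base classes, so check its documentation for the interface they require.

```python
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], int]


class CustomEstimator:
    """Hypothetical adapter: turns any `train_fn(X, y) -> predictor` into a fit/predict component."""

    def __init__(self, name: str, train_fn: Callable[[List[Row], List[int]], Predictor]):
        self.name = name
        self.train_fn = train_fn

    def fit(self, X: List[Row], y: List[int]) -> "CustomEstimator":
        self.predictor_ = self.train_fn(X, y)
        return self

    def predict(self, X: List[Row]) -> List[int]:
        return [self.predictor_(row) for row in X]


def train_midpoint_threshold(X: List[Row], y: List[int]) -> Predictor:
    """Your own algorithm: split on feature 0 halfway between the two class means."""
    mean_0 = mean([row[0] for row, label in zip(X, y) if label == 0])
    mean_1 = mean([row[0] for row, label in zip(X, y) if label == 1])
    cut = (mean_0 + mean_1) / 2
    # Predict 1 on whichever side of the cut class 1's mean lies.
    return lambda row: int((row[0] > cut) == (mean_1 > mean_0))


model = CustomEstimator("midpoint_threshold", train_midpoint_threshold)
model.fit([[1.0], [2.0], [8.0], [9.0]], [0, 0, 1, 1])
print(model.predict([[3.0], [7.0]]))
```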