Changelog

Future Releases
  • Enhancements

  • Fixes

  • Changes

  • Documentation Changes

  • Testing Changes

v0.10.0 May 29, 2020
  • Enhancements
    • Added baseline models for classification and regression, and added functionality to calculate baseline models before searching in AutoML #746

    • Port over highly-null guardrail as a data check and define DefaultDataChecks and DisableDataChecks classes #745

    • Update Tuner classes to work directly with pipeline parameters dicts instead of flat parameter lists #779

    • Add Elastic Net as a pipeline option #812

    • Added new Pipeline option ExtraTrees #790

    • Added precision-recall curve metrics and plot for binary classification problems in evalml.pipelines.graph_utils #794

    • Update the default automl algorithm to search in batches, starting with default parameters for each pipeline and iterating from there #793

    • Added AutoMLAlgorithm class and IterativeAlgorithm impl, separated from AutoSearchBase #793

  • Fixes
    • Update pipeline score to return nan score for any objective which throws an exception during scoring #787

    • Fixed bug introduced in #787 where binary classification metrics requiring predicted probabilities error in scoring #798

    • CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0 #795

  • Changes
    • Cleanup pipeline score code, and cleanup codecov #711

    • Remove pass for abstract methods for codecov #730

    • Added __str__ for AutoSearch object #675

    • Add util methods to graph ROC and confusion matrix #720

    • Refactor AutoBase to AutoSearchBase #758

    • Updated AutoBase with data_checks parameter, removed previous detect_label_leakage parameter, and added functionality to run data checks before search in AutoML #765

    • Updated our logger to use Python’s logging utils #763

    • Refactor most of AutoSearchBase._do_iteration impl into AutoSearchBase._evaluate #762

    • Port over all guardrails to use the new DataCheck API #789

    • Expanded import_or_raise to catch all exceptions #759

    • Adds RMSE, MSLE, RMSLE as standard metrics #788

    • Don’t allow Recall to be used as an objective for AutoML #784

    • Removed feature selection from pipelines #819

    • Update default estimator parameters to make automl search faster and more accurate #793

  • Documentation Changes
    • Add instructions to freeze master on release.md #726

    • Update release instructions with more details #727 #733

    • Add objective base classes to API reference #736

    • Fix components API to match other modules #747

  • Testing Changes
    • Delete codecov yml, use codecov.io’s default #732

    • Added unit tests for fraud cost, lead scoring, and standard metric objectives #741

    • Update codecov client #782

    • Updated AutoBase __str__ test to include no parameters case #783

    • Added unit tests for ExtraTrees pipeline #790

    • If codecov fails to upload, fail build #810

    • Updated Python version of dependency action #816

    • Update the dependency update bot to use a suffix when creating branches #817
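
The nan-on-failure scoring behavior from #787 can be pictured with a minimal sketch. This is not EvalML's implementation: score_objectives, accuracy and always_fails are hypothetical names, and objectives are assumed to be plain callables taking (y_true, y_pred).

```python
import math


def score_objectives(y_true, y_pred, objectives):
    """Score every objective; record nan for any objective that raises."""
    scores = {}
    for name, objective_fn in objectives.items():
        try:
            scores[name] = objective_fn(y_true, y_pred)
        except Exception:
            # A failing objective does not abort scoring of the others.
            scores[name] = float("nan")
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def always_fails(y_true, y_pred):
    """Hypothetical objective that cannot be computed."""
    raise ValueError("objective cannot be computed")


scores = score_objectives(
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    {"accuracy": accuracy, "broken": always_fails},
)
print(scores["accuracy"], math.isnan(scores["broken"]))
```

Because nan compares unequal to itself, callers should test for it with math.isnan rather than ==.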

Warning

Breaking Changes
  • The detect_label_leakage parameter for AutoML classes has been removed and replaced by a data_checks parameter #765

  • Moved ROC and confusion matrix methods from evalml.pipelines.plot_utils to evalml.pipelines.graph_utils #720

  • Tuner classes require a pipeline hyperparameter range dict as an init arg instead of a space definition #779

  • Tuner.propose and Tuner.add work directly with pipeline parameters dicts instead of flat parameter lists #779

  • PipelineBase.hyperparameters and custom_hyperparameters use pipeline parameters dict format instead of being represented as a flat list #779

  • All guardrail functions previously under evalml.guardrails.utils will be removed and replaced by data checks #789

  • Recall disallowed as an objective for AutoML #784

  • AutoSearchBase parameter tuner has been renamed to tuner_class #793

  • AutoSearchBase parameters possible_pipelines and possible_model_families have been renamed to allowed_pipelines and allowed_model_families #793
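
To illustrate the difference between a pipeline parameters dict and a flat parameter list (#779), here is a rough sketch. The component names, parameter names and both helper functions are hypothetical, chosen only for illustration; EvalML's actual Tuner internals may differ.

```python
def flatten_parameters(pipeline_parameters):
    """Turn {component: {param: value}} into [((component, param), value), ...]."""
    flat = []
    for component, params in pipeline_parameters.items():
        for name, value in params.items():
            flat.append(((component, name), value))
    return flat


def unflatten_parameters(flat):
    """Rebuild the nested dict from the flat list produced above."""
    nested = {}
    for (component, name), value in flat:
        nested.setdefault(component, {})[name] = value
    return nested


# Hypothetical proposal in the nested, per-component format.
proposed = {
    "Imputer": {"strategy": "median"},
    "Estimator": {"n_estimators": 100, "max_depth": 6},
}

flat = flatten_parameters(proposed)
print(flat)
print(unflatten_parameters(flat) == proposed)
```

The nested form keeps each value attached to the component that owns it, so callers do not need to know the position of a parameter in a list.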

v0.9.0 Apr. 27, 2020
  • Enhancements
    • Added accuracy as a standard objective #624

    • Added verbose parameter to load_fraud #560

    • Added Balanced Accuracy metric for binary, multiclass #612 #661

    • Added XGBoost regressor and XGBoost regression pipeline #666

    • Added Accuracy metric for multiclass #672

    • Added objective name in AutoBase.describe_pipeline #686

    • Added DataCheck and DataChecks, Message classes and relevant subclasses #739

  • Fixes
    • Removed direct access to cls.component_graph #595

    • Add testing files to .gitignore #625

    • Remove circular dependencies from Makefile #637

    • Add error case for normalize_confusion_matrix() #640

    • Fixed XGBoostClassifier and XGBoostRegressor bug with feature names that contain [, ], or < #659

    • Update make_pipeline_graph to not accidentally create empty file when testing if path is valid #649

    • Fix pip installation warning about docsutils version, from boto dependency #664

    • Removed zero division warning for F1/precision/recall metrics #671

    • Fixed summary for pipelines without estimators #707

  • Changes
    • Updated default objective for binary/multiclass classification to log loss #613

    • Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes #405

    • Changed the output of score to return one dictionary #429

    • Created binary and multiclass objective subclasses #504

    • Updated objectives API #445

    • Removed call to get_plot_data from AutoML #615

    • Set raise_error to default to True for AutoML classes #638

    • Remove unnecessary “u” prefixes on some unicode strings #641

    • Changed one-hot encoder to return uint8 dtypes instead of ints #653

    • Pipeline _name field changed to custom_name #650

    • Removed graphs.py and moved methods into PipelineBase #657, #665

    • Remove s3fs as a dev dependency #664

    • Changed requirements-parser to be a core dependency #673

    • Replace supported_problem_types field on pipelines with problem_type attribute on base classes #678

    • Changed AutoML to only show best results for a given pipeline template in rankings, added full_rankings property to show all #682

    • Update ModelFamily values: don’t list xgboost/catboost as classifiers now that we have regression pipelines for them #677

    • Changed AutoML’s describe_pipeline to get problem type from pipeline instead #685

    • Standardize import_or_raise error messages #683

    • Updated argument order of objectives to align with sklearn’s #698

    • Renamed pipeline.feature_importance_graph to pipeline.graph_feature_importances #700

    • Moved ROC and confusion matrix methods to evalml.pipelines.plot_utils #704

    • Renamed MultiClassificationObjective to MulticlassClassificationObjective, to align with pipeline naming scheme #715

  • Documentation Changes
    • Fixed some sphinx warnings #593

    • Fixed docstring for AutoClassificationSearch with correct command #599

    • Limit readthedocs formats to pdf, not htmlzip and epub #594 #600

    • Clean up objectives API documentation #605

    • Fixed function on Exploring search results page #604

    • Update release process doc #567

    • AutoClassificationSearch and AutoRegressionSearch show inherited methods in API reference #651

    • Fixed improperly formatted code in breaking changes for changelog #655

    • Added configuration to treat Sphinx warnings as errors #660

    • Removed separate plotting section for pipelines in API reference #657, #665

    • Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency #664

    • Categorized components in API reference and added descriptions for each category #663

    • Fixed Sphinx warnings about BalancedAccuracy objective #669

    • Updated API reference to include missing components and clean up pipeline docstrings #689

    • Reorganize API ref, and clarify pipeline sub-titles #688

    • Add and update preprocessing utils in API reference #687

    • Added inheritance diagrams to API reference #695

    • Documented which default objective AutoML optimizes for #699

    • Create separate install page #701

    • Include more utils in API ref, like import_or_raise #704

    • Add more color to pipeline documentation #705

  • Testing Changes
    • Matched install commands of check_latest_dependencies test and its GitHub action #578

    • Added GitHub app to auto assign PR author as assignee #477

    • Removed unneeded conda installation of xgboost in windows checkin tests #618

    • Update graph tests to always use tmpfile dir #649

    • Changelog checkin test workaround for release PRs: If ‘future release’ section is empty of PR refs, pass check #658

    • Add changelog checkin test exception for dep-update branch #723
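
The rankings change in #682 (best result per pipeline template, with full_rankings showing everything) can be sketched roughly as follows. This is an assumption-laden illustration, not EvalML code: best_per_pipeline is a hypothetical helper, rows are simplified to (pipeline_name, score) tuples, and higher scores are assumed to be better.

```python
def best_per_pipeline(full_rankings):
    """Keep only the best-scoring row for each pipeline name."""
    best = {}
    for name, score in full_rankings:
        if name not in best or score > best[name]:
            best[name] = score
    # Sort best-first so the result reads like a leaderboard.
    return sorted(best.items(), key=lambda row: row[1], reverse=True)


# Hypothetical search history: each pipeline was evaluated twice.
full_rankings = [
    ("Random Forest", 0.81),
    ("Logistic Regression", 0.78),
    ("Random Forest", 0.85),
    ("Logistic Regression", 0.74),
]

rankings = best_per_pipeline(full_rankings)
print(rankings)
```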

Warning

Breaking Changes

  • Pipelines no longer take an objective parameter during instantiation, and no longer have an objective attribute.

  • fit() and predict() now use an optional objective parameter, which is only used in binary classification pipelines to fit for a specific objective.

  • score() will now use a required objectives parameter that is used to determine all the objectives to score on. This differs from the previous behavior, where the pipeline’s objective was scored on regardless.

  • score() will now return one dictionary of all objective scores.

  • ROC and ConfusionMatrix plot methods via Auto(*).plot have been removed by #615 and are replaced by roc_curve and confusion_matrix in evalml.pipelines.plot_utils in #704

  • normalize_confusion_matrix has been moved to evalml.pipelines.plot_utils #704

  • Pipelines’ _name field changed to custom_name

  • Pipelines’ supported_problem_types field is removed because it is no longer necessary #678

  • Updated argument order of objectives’ objective_function to align with sklearn #698

  • pipeline.feature_importance_graph has been renamed to pipeline.graph_feature_importances in #700

  • Removed unsupported MSLE objective #704
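
The new score() contract described above (a required collection of objectives, one dictionary returned, and objective functions taking y_true before y_pred as scikit-learn metrics do) might look like this in outline. The score, precision and recall functions here are simplified stand-ins written for this sketch, not EvalML's classes.

```python
def score(y_true, y_pred, objectives):
    """Return one dictionary mapping each objective name to its score."""
    if not objectives:
        raise ValueError("at least one objective is required")
    # Objective functions are called as fn(y_true, y_pred).
    return {name: fn(y_true, y_pred) for name, fn in objectives.items()}


def precision(y_true, y_pred):
    """True positives divided by all predicted positives."""
    hits = [t for t, p in zip(y_true, y_pred) if p == 1]
    return sum(hits) / len(hits)


def recall(y_true, y_pred):
    """True positives divided by all actual positives."""
    hits = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(hits) / len(hits)


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
scores = score(y_true, y_pred, {"precision": precision, "recall": recall})
print(scores)
```

Swapping the two arguments would silently exchange precision and recall, which is one reason a consistent argument order matters.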

v0.8.0 Apr. 1, 2020
  • Enhancements
    • Add normalization option and information to confusion matrix #484

    • Add util function to drop rows with NaN values #487

    • Renamed PipelineBase.name as PipelineBase.summary and redefined PipelineBase.name as class property #491

    • Added access to parameters in Pipelines with PipelineBase.parameters (used to be return of PipelineBase.describe) #501

    • Added fill_value parameter for SimpleImputer #509

    • Added functionality to override component hyperparameters and made pipelines take hyperparameters from components #516

    • Allow numpy.random.RandomState for random_state parameters #556

  • Fixes
    • Removed unused dependency matplotlib, and move category_encoders to test reqs #572

  • Changes
    • Undo version cap in XGBoost placed in #402 and allowed all releases of XGBoost #407

    • Support pandas 1.0.0 #486

    • Made all references to the logger static #503

    • Refactored model_type parameter for components and pipelines to model_family #507

    • Refactored problem_types for pipelines and components into supported_problem_types #515

    • Moved pipelines/utils.save_pipeline and pipelines/utils.load_pipeline to PipelineBase.save and PipelineBase.load #526

    • Limit number of categories encoded by OneHotEncoder #517

  • Documentation Changes
    • Updated API reference to remove PipelinePlot and added moved PipelineBase plotting methods #483

    • Add code style and github issue guides #463 #512

    • Updated API reference to surface class variables for pipelines and components #537

    • Fixed README documentation link #535

    • Unhid PR references in changelog #656

  • Testing Changes
    • Added automated dependency check PR #482, #505

    • Updated automated dependency check comment #497

    • Have build_docs job use python executor, so that env vars are set properly #547

    • Added simple test to make sure OneHotEncoder’s top_n works with large number of categories #552

    • Run windows unit tests on PRs #557
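
As a rough picture of limiting the number of one-hot-encoded categories (#517, with the top_n name taken from #552), the sketch below keeps only the most frequent categories. Selecting by frequency is an assumption made for this example, and one_hot_top_n is a hypothetical helper, not the OneHotEncoder component itself.

```python
from collections import Counter


def one_hot_top_n(values, top_n):
    """One-hot encode only the top_n most frequent categories.

    Values outside the kept categories are encoded as all zeros.
    """
    counts = Counter(values)
    categories = [category for category, _ in counts.most_common(top_n)]
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


colors = ["red", "blue", "red", "green", "blue", "red", "teal"]
categories, rows = one_hot_top_n(colors, top_n=2)
print(categories)
print(rows)
```

Capping the category count bounds the number of output columns no matter how many distinct values a feature contains.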

Warning

Breaking Changes

  • AutoClassificationSearch and AutoRegressionSearch’s model_types parameter has been refactored into allowed_model_families

  • ModelTypes enum has been changed to ModelFamily

  • Components and Pipelines now have a model_family field instead of model_type

  • get_pipelines utility function now accepts model_families as an argument instead of model_types

  • PipelineBase.name no longer returns structure of pipeline and has been replaced by PipelineBase.summary

  • PipelineBase.problem_types and Estimator.problem_types have been renamed to supported_problem_types

  • pipelines/utils.save_pipeline and pipelines/utils.load_pipeline moved to PipelineBase.save and PipelineBase.load

v0.7.0 Mar. 9, 2020
  • Enhancements
    • Added emacs buffers to .gitignore #350

    • Add CatBoost (gradient-boosted trees) classification and regression components and pipelines #247

    • Added Tuner abstract base class #351

    • Added n_jobs as parameter for AutoClassificationSearch and AutoRegressionSearch #403

    • Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn’s #426

    • Added PipelineBase graph and feature_importance_graph methods, moved from previous location #423

    • Added support for python 3.8 #462

  • Fixes
    • Fixed ROC and confusion matrix plots not being calculated if user passed own additional_objectives #276

    • Fixed ReadtheDocs FileNotFoundError exception for fraud dataset #439

  • Changes
    • Added n_estimators as a tunable parameter for XGBoost #307

    • Remove unused parameter ObjectiveBase.fit_needs_proba #320

    • Remove extraneous parameter component_type from all components #361

    • Remove unused rankings.csv file #397

    • Downloaded demo and test datasets so unit tests can run offline #408

    • Remove _needs_fitting attribute from Components #398

    • Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all #413

    • Refactored PipelineBase to take in parameter dictionary and moved pipeline metadata to class attribute #421

    • Dropped support for Python 3.5 #438

    • Removed unused apply.py file #449

    • Clean up requirements.txt to remove unused deps #451

    • Support installation without all required dependencies #459

  • Documentation Changes
    • Update release.md with instructions to release to internal license key #354

  • Testing Changes
    • Added tests for utils (and moved current utils to gen_utils) #297

    • Moved XGBoost install into its own separate step on Windows using Conda #313

    • Rewind pandas version to before 1.0.0, to diagnose test failures for that version #325

    • Added dependency update checkin test #324

    • Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version #402

    • Update dependency check to use a whitelist #417

    • Update unit test jobs to not install dev deps #455

Warning

Breaking Changes

  • Python 3.5 will not be actively supported.

v0.6.0 Dec. 16, 2019
  • Enhancements
    • Added ability to create a plot of feature importances #133

    • Add early stopping to AutoML using patience and tolerance parameters #241

    • Added ROC and confusion matrix metrics and plot for classification problems and introduce PipelineSearchPlots class #242

    • Enhanced AutoML results with search order #260

    • Added utility function to show system and environment information #300

  • Fixes
    • Lower botocore requirement #235

    • Fixed decision_function calculation for FraudCost objective #254

    • Fixed return value of Recall metrics #264

    • Components return self on fit #289

  • Changes
    • Renamed automl classes to AutoRegressionSearch and AutoClassificationSearch #287

    • Updating demo datasets to retain column names #223

    • Moving pipeline visualization to PipelinePlots class #228

    • Standardizing inputs as pd.DataFrame / pd.Series #130

    • Enforcing that pipelines must have an estimator as last component #277

    • Added ipywidgets as a dependency in requirements.txt #278

    • Added Random and Grid Search Tuners #240

  • Documentation Changes
    • Adding class properties to API reference #244

    • Fix and filter FutureWarnings from scikit-learn #249, #257

    • Adding Linear Regression to API reference and cleaning up some Sphinx warnings #227

  • Testing Changes
    • Added support for testing on Windows with CircleCI #226

    • Added support for doctests #233
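
Early stopping with patience and tolerance (#241) can be sketched as below. This assumes higher scores are better and that tolerance is an absolute improvement threshold; should_stop is a hypothetical helper and the actual AutoML rule may be defined differently.

```python
def should_stop(scores, patience, tolerance):
    """Return True once `patience` consecutive scores fail to beat the
    best score so far by more than `tolerance`."""
    if not scores:
        return False
    best = scores[0]
    without_improvement = 0
    for current in scores[1:]:
        if current > best + tolerance:
            # A large enough gain resets the counter.
            best = current
            without_improvement = 0
        else:
            without_improvement += 1
        if without_improvement >= patience:
            return True
    return False


# Hypothetical score history: one real gain, then only marginal changes.
history = [0.70, 0.74, 0.745, 0.743, 0.746]
print(should_stop(history, patience=3, tolerance=0.01))
print(should_stop(history, patience=4, tolerance=0.01))
```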

Warning

Breaking Changes

  • The fit() method for AutoClassifier and AutoRegressor has been renamed to search().

  • AutoClassifier has been renamed to AutoClassificationSearch

  • AutoRegressor has been renamed to AutoRegressionSearch

  • AutoClassificationSearch.results and AutoRegressionSearch.results are now dictionaries with pipeline_results and search_order keys. pipeline_results holds a dictionary identical to the old .results dictionary, while search_order is a list of pipeline_id values in the order they were searched.

  • Pipelines now require an estimator as the last component in component_list. Slicing pipelines now throws a NotImplementedError to avoid returning pipelines without an estimator.
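
The new shape of .results might be pictured like this. Only the pipeline_results and search_order keys come from the note above; the per-pipeline fields (pipeline_name, score), the values and the helper function are invented for illustration.

```python
# Hypothetical contents; only the two top-level keys are documented above.
results = {
    "pipeline_results": {
        0: {"pipeline_name": "Pipeline A", "score": 0.81},
        1: {"pipeline_name": "Pipeline B", "score": 0.85},
        2: {"pipeline_name": "Pipeline C", "score": 0.78},
    },
    "search_order": [0, 1, 2],
}

# Code that used the old .results dictionary can read this entry instead.
old_style_results = results["pipeline_results"]


def scores_in_search_order(results):
    """List scores in the order the pipelines were searched."""
    return [results["pipeline_results"][pipeline_id]["score"]
            for pipeline_id in results["search_order"]]


print(scores_in_search_order(results))
```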

v0.5.2 Nov. 18, 2019
  • Enhancements
    • Adding basic pipeline structure visualization #211

  • Documentation Changes
    • Added notebooks to build process #212

v0.5.1 Nov. 15, 2019
  • Enhancements
    • Added basic outlier detection guardrail #151

    • Added basic ID column guardrail #135

    • Added support for unlimited pipelines with a max_time limit #70

    • Updated .readthedocs.yaml to successfully build #188

  • Fixes
    • Removed MSLE from default additional objectives #203

    • Fixed random_state passed in pipelines #204

    • Fixed slow down in RFRegressor #206

  • Changes
    • Pulled information for describe_pipeline from pipeline’s new describe method #190

    • Refactored pipelines #108

    • Removed guardrails from Auto(*) #202, #208

  • Documentation Changes
    • Updated documentation to show max_time enhancements #189

    • Updated release instructions for RTD #193

    • Added notebooks to build process #212

    • Added contributing instructions #213

    • Added new content #222

v0.5.0 Oct. 29, 2019
  • Enhancements
    • Added basic one hot encoding #73

    • Use enums for model_type #110

    • Support for splitting regression datasets #112

    • Auto-infer multiclass classification #99

    • Added support for other units in max_time #125

    • Detect highly null columns #121

    • Added additional regression objectives #100

    • Show an interactive iteration vs. score plot when using fit() #134

  • Fixes
    • Reordered describe_pipeline #94

    • Added type check for model_type #109

    • Fixed s units when setting string max_time #132

    • Fix objectives not appearing in API documentation #150

  • Changes
    • Reorganized tests #93

    • Moved logging to its own module #119

    • Show progress bar history #111

    • Using cloudpickle instead of pickle to allow unloading of custom objectives #113

    • Removed render.py #154

  • Documentation Changes
    • Update release instructions #140

    • Include additional_objectives parameter #124

    • Added Changelog #136

  • Testing Changes
    • Code coverage #90

    • Added CircleCI tests for other Python versions #104

    • Added doc notebooks as tests #139

    • Test metadata for CircleCI and 2 core parallelism #137

v0.4.1 Sep. 16, 2019
  • Enhancements
    • Added AutoML for classification and regression using Autobase and Skopt #7 #9

    • Implemented standard classification and regression metrics #7

    • Added logistic regression, random forest, and XGBoost pipelines #7

    • Implemented support for custom objectives #15

    • Feature importance for pipelines #18

    • Serialization for pipelines #19

    • Allow fitting on objectives for optimal threshold #27

    • Added detect label leakage #31

    • Implemented callbacks #42

    • Allow for multiclass classification #21

    • Added support for additional objectives #79

  • Fixes
    • Fixed feature selection in pipelines #13

    • Made random_seed usage consistent #45

  • Documentation Changes
    • Added docstrings #6

    • Created notebooks for docs #6

    • Initialized readthedocs EvalML #6

    • Added favicon #38

  • Testing Changes
    • Added testing for loading data #39

v0.2.0 Aug. 13, 2019
  • Enhancements
    • Created fraud detection objective #4

v0.1.0 July 31, 2019
  • First Release

  • Enhancements
    • Added lead scoring objective #1

    • Added basic classifier #1

  • Documentation Changes
    • Initialized Sphinx for docs #1