Release Notes
- Future Releases
Enhancements
Fixes
Changes
Documentation Changes
Testing Changes
- v0.21.0 Mar. 24, 2021
- Enhancements
Changed AutoMLSearch to default optimize_thresholds to True #1943
Added multiple oversampling and undersampling sampling methods as data splitters for imbalanced classification #1775
Added params to balanced classification data splitters for visibility #1966
Updated make_pipeline to not add Imputer if input data does not have numeric or categorical columns #1967
Updated ClassImbalanceDataCheck to better handle multiclass imbalances #1986
Added recommended actions for the output of data check’s validate method #1968
Added error message for partial_dependence when features are mostly the same value #1994
Updated OneHotEncoder to drop one redundant feature by default for features with two categories #1997
Added a PolynomialDetrender component #1992
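The OneHotEncoder entry above (#1997) can be illustrated with a small standalone sketch. This is not EvalML's implementation: one_hot_encode, its drop_redundant_binary flag, the col_ prefix, and the choice of which category to keep are all hypothetical, chosen only to show why the second indicator column of a two-category feature carries no extra information.

```python
def one_hot_encode(values, drop_redundant_binary=True):
    """Toy one-hot encoder for a single categorical column.

    Hypothetical helper for illustration; not EvalML's OneHotEncoder.
    """
    categories = sorted(set(values))
    if drop_redundant_binary and len(categories) == 2:
        # With two categories the second indicator is always 1 - first,
        # so one column is enough. Which one is kept here is arbitrary.
        categories = categories[:1]
    return {
        f"col_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


binary = one_hot_encode(["yes", "no", "yes"])      # one indicator column
ternary = one_hot_encode(["a", "b", "c", "a"])     # three indicator columns
```

Features with three or more categories keep all of their indicator columns in this sketch; only the two-category case is reduced.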
- Fixes
Updated binary classification pipelines to use objective decision function during scoring of custom objectives #1934
Documentation Changes
Warning
- Breaking Changes
Changed AutoMLSearch to default optimize_thresholds to True #1943
Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch. To run the data checks which were previously run by default in AutoMLSearch, please call DefaultDataChecks().validate(X_train, y_train) or take a look at our documentation for more examples. #1935
Deleted random_state argument #1985
- v0.20.0 Mar. 10, 2021
- Enhancements
Added a GitHub Action for Detecting dependency changes #1933
Create a separate CV split to train stacked ensembler on for AutoMLSearch #1814
Added a GitHub Action for Linux unit tests #1846
Added ARIMARegressor estimator #1894
Added DataCheckAction class and DataCheckActionCode enum #1896
Updated Woodwork requirement to v0.0.10 #1900
Added BalancedClassificationDataCVSplit and BalancedClassificationDataTVSplit to AutoMLSearch #1875
Update default classification data splitter to use downsampling for highly imbalanced data #1875
Updated describe_pipeline to return more information, including id of pipelines used for ensemble models #1909
Added utility method to create list of components from a list of DataCheckAction #1907
Updated validate method to include an action key in returned dictionary for all DataCheck and DataChecks #1916
Aggregating the shap values for predictions that we know the provenance of, e.g. OHE, text, and date-time. #1901
Improved error message when custom objective is passed as a string in pipeline.score #1941
Added score_pipelines and train_pipelines methods to AutoMLSearch #1913
Added support for pandas version 1.2.0 #1708
Added score_batch and train_batch abstract methods to EngineBase and implementations in SequentialEngine #1913
- Fixes
Removed CI check for check_dependencies_updated_linux #1950
Added metaclass for time series pipelines and fix binary classification pipeline predict not using objective if it is passed as a named argument #1874
Fixed stack trace in prediction explanation functions caused by mixed string/numeric pandas column names #1871
Fixed stack trace caused by passing pipelines with duplicate names to AutoMLSearch #1932
Fixed AutoMLSearch.get_pipelines returning pipelines with the same attributes #1958
- Changes
Reversed GitHub Action for Linux unit tests until a fix for report generation is found #1920
Updated add_results in AutoMLAlgorithm to take in entire pipeline results dictionary from AutoMLSearch #1891
Updated ClassImbalanceDataCheck to look for severe class imbalance scenarios #1905
Deleted the explain_prediction function #1915
Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead #1928
Removed warning in InvalidTargetDataCheck returned when numeric binary classification targets are not (0, 1) #1959
- Documentation Changes
Updated model_understanding.ipynb to demo the two-way partial dependence capability #1919
Testing Changes
Warning
- v0.19.0 Feb. 23, 2021
- Enhancements
Added a GitHub Action for Python windows unit tests #1844
Added a GitHub Action for checking updated release notes #1849
Added a GitHub Action for Python lint checks #1837
Adjusted explain_prediction, explain_predictions and explain_predictions_best_worst to handle timeseries problems. #1818
Updated InvalidTargetDataCheck to check for mismatched indices in target and features #1816
Updated Woodwork structures returned from components to support Woodwork logical type overrides set by the user #1784
Updated estimators to keep track of input feature names during fit() #1794
Updated visualize_decision_tree to include feature names in output #1813
Added is_bounded_like_percentage property for objectives. If true, the calculate_percent_difference method will return the absolute difference rather than relative difference #1809
Added full error traceback to AutoMLSearch logger file #1840
Changed TargetEncoder to preserve custom indices in the data #1836
Refactored explain_predictions and explain_predictions_best_worst to only compute features once for all rows that need to be explained #1843
Added custom random undersampler data splitter for classification #1857
Updated OutliersDataCheck implementation to calculate the probability of having no outliers #1855
Added Engines pipeline processing API #1838
- Fixes
Changed EngineBase random_state arg to random_seed and same for user guide docs #1889
- Changes
Modified calculate_percent_difference so that division by 0 is now inf rather than nan #1809
Removed text_columns parameter from LSA and TextFeaturizer components #1652
Added random_seed as an argument to our automl/pipeline/component API. Using random_state will raise a warning #1798
Added DataCheckError message in InvalidTargetDataCheck if input target is None and removed exception raised #1866
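The calculate_percent_difference entries in this release (#1809) describe three behaviors: a relative difference in the usual case, an absolute difference for objectives that are bounded like a percentage, and inf instead of nan when the baseline is 0. The sketch below is a minimal standalone illustration of those rules, not EvalML's code. The function name percent_difference, its signature, dividing by the absolute value of the baseline, and returning positive inf for every zero baseline are assumptions.

```python
import math


def percent_difference(score, baseline, bounded_like_percentage=False):
    """Toy percent-difference calculation (hypothetical helper).

    Assumes "greater is better"; sign handling for other objectives
    is left out of this sketch.
    """
    if bounded_like_percentage:
        # Scores that already live on a percentage-like scale are compared
        # by absolute difference rather than relative difference.
        return score - baseline
    if baseline == 0:
        # Division by zero is reported as inf rather than nan.
        return math.inf
    # Dividing by |baseline| keeps the sign meaningful for negative baselines.
    return 100 * (score - baseline) / abs(baseline)


relative = percent_difference(3.0, 2.0)
negative_baseline = percent_difference(-1.0, -2.0)
zero_baseline = percent_difference(1.0, 0.0)
bounded = percent_difference(0.75, 0.5, bounded_like_percentage=True)
```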
Documentation Changes
Warning
- Breaking Changes
Added a deprecation warning to explain_prediction. It will be deleted in the next release. #1860
- v0.18.2 Feb. 10, 2021
- Enhancements
Added uniqueness score data check #1785
Added “dataframe” output format for prediction explanations #1781
Updated LightGBM estimators to handle pandas.MultiIndex #1770
Sped up permutation importance for some pipelines #1762
Added sparsity data check #1797
Confirmed support for threshold tuning for binary time series classification problems #1803
Fixes
Changes
- Documentation Changes
Added section on conda to the contributing guide #1771
Updated release process to reflect freezing main before perf tests #1787
Moving some prs to the right section of the release notes #1789
Tweak README.md. #1800
Fixed back arrow on install page docs #1795
Fixed docstring for ClassImbalanceDataCheck.validate() #1817
Testing Changes
- v0.18.1 Feb. 1, 2021
- Enhancements
Added graph_t_sne as a visualization tool for high dimensional data #1731
Added the ability to see the linear coefficients of features in linear models terms #1738
Added support for scikit-learn v0.24.0 #1733
Added support for scipy v1.6.0 #1752
Added SVM Classifier and Regressor to estimators #1714 #1761
- Fixes
Testing Changes
Warning
- v0.18.0 Jan. 26, 2021
- Enhancements
Added RMSLE, MSLE, and MAPE to core objectives while checking for negative target values in invalid_targets_data_check #1574
Added validation checks for binary problems with regression-like datasets and multiclass problems without true multiclass targets in invalid_targets_data_check #1665
Added time series support for make_pipeline #1566
Added target name for output of pipeline predict method #1578
Added multiclass check to InvalidTargetDataCheck for two examples per class #1596
Added support for graphviz v0.16 #1657
Enhanced time series pipelines to accept empty features #1651
Added KNN Classifier to estimators. #1650
Added support for list inputs for objectives #1663
Added support for AutoMLSearch to handle time series classification pipelines #1666
Enhanced DelayedFeaturesTransformer to encode categorical features and targets before delaying them #1691
Added 2-way dependence plots. #1690
Added ability to directly iterate through components within Pipelines #1583
- Fixes
Fixed inconsistent attributes and added Exceptions to docs #1673
Fixed TargetLeakageDataCheck to use Woodwork mutual_information rather than using Pandas’ Pearson Correlation #1616
Fixed thresholding for pipelines in AutoMLSearch to only threshold binary classification pipelines #1622 #1626
Updated load_data to return Woodwork structures and update default parameter value for index to None #1610
Pinned scipy at < 1.6.0 while we work on adding support #1629
Fixed data check message formatting in AutoMLSearch #1633
Addressed stacked ensemble component for scikit-learn v0.24 support by setting shuffle=True for default CV #1613
Fixed bug where Imputer reset the index on X #1590
Fixed AutoMLSearch stacktrace when a custom objective was passed in as a primary objective or additional objective #1575
Fixed custom index bug for MAPE objective #1641
Fixed index bug for TextFeaturizer and LSA components #1644
Limited load_fraud dataset loaded into automl.ipynb #1646
add_to_rankings updates AutoMLSearch.best_pipeline when necessary #1647
Fixed bug where time series baseline estimators were not receiving gap and max_delay in AutoMLSearch #1645
Fixed jupyter notebooks to help the RTD buildtime #1654
Added positive_only objectives to non_core_objectives #1661
Fixed stacking argument n_jobs for IterativeAlgorithm #1706
Updated CatBoost estimators to return self in .fit() rather than the underlying model for consistency #1701
Added ability to initialize pipeline parameters in AutoMLSearch constructor #1676
- Changes
Added labeling to graph_confusion_matrix #1632
Rerunning search for AutoMLSearch results in a message thrown rather than failing the search, and removed has_searched property #1647
Changed tuner class to allow and ignore single parameter values as input #1686
Capped LightGBM version limit to remove bug in docs #1711
Removed support for np.random.RandomState in EvalML #1727
- Documentation Changes
Update Model Understanding in the user guide to include visualize_decision_tree #1678
Updated docs to include information about AutoMLSearch callback parameters and methods #1577
Updated docs to prompt users to install graphviz on Mac #1656
Added infer_feature_types to the start.ipynb guide #1700
Added multicollinearity data check to API reference and docs #1707
Testing Changes
Warning
- Breaking Changes
Removed has_searched property from AutoMLSearch #1647
Components and pipelines return Woodwork data structures instead of pandas data structures #1668
Removed support for np.random.RandomState in EvalML. Rather than passing np.random.RandomState as component and pipeline random_state values, we use int random_seed #1727
- v0.17.0 Dec. 29, 2020
- Enhancements
Added save_plot that allows for saving figures from different backends #1588
Added LightGBM Regressor to regression components #1459
Added visualize_decision_tree for tree visualization with decision_tree_data_from_estimator and decision_tree_data_from_pipeline to reformat tree structure output #1511
Added DFS Transformer component into transformer components #1454
Added MAPE to the standard metrics for time series problems and update objectives #1510
Added graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data to the model understanding module for time series problems #1483
Added a ComponentGraph class that will support future pipelines as directed acyclic graphs #1415
Updated data checks to accept Woodwork data structures #1481
Added parameter to InvalidTargetDataCheck to show only top unique values rather than all unique values #1485
Added multicollinearity data check #1515
Added baseline pipeline and components for time series regression problems #1496
Added more information to users about ensembling behavior in AutoMLSearch #1527
Add woodwork support for more utility and graph methods #1544
Changed DateTimeFeaturizer to encode features as int #1479
Return trained pipelines from AutoMLSearch.best_pipeline #1547
Added utility method so that users can set feature types without having to learn about Woodwork directly #1555
Added Linear Discriminant Analysis transformer for dimensionality reduction #1331
Added multiclass support for partial_dependence and graph_partial_dependence #1554
Added TimeSeriesBinaryClassificationPipeline and TimeSeriesMulticlassClassificationPipeline classes #1528
Added make_data_splitter method for easier automl data split customization #1568
Integrated ComponentGraph class into Pipelines for full non-linear pipeline support #1543
Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597
Update split_data helper args #1597
Add problem type utils is_regression, is_classification, is_timeseries #1597
Rename AutoMLSearch data_split arg to data_splitter #1569
- Fixes
Fix AutoML not passing CV folds to DefaultDataChecks for usage by ClassImbalanceDataCheck #1619
Fix Windows CI jobs: install numba via conda, required for shap #1490
Added custom-index support for reset-index-get_prediction_vs_actual_over_time_data #1494
Fix generate_pipeline_code to account for boolean and None differences between Python and JSON #1524 #1531
Set max value for plotly and xgboost versions while we debug CI failures with newer versions #1532
Undo version pinning for plotly #1533
Fix ReadTheDocs build by updating the version of setuptools #1561
Set random_state of data splitter in AutoMLSearch to take int to keep consistency in the resulting splits #1579
Pin sklearn version while we work on adding support #1594
Pin pandas at <1.2.0 while we work on adding support #1609
Pin graphviz at < 0.16 while we work on adding support #1609
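The generate_pipeline_code fix listed above (#1524) concerns how booleans and None are spelled in Python versus JSON. The snippet below only illustrates that general difference using the standard library. It does not reproduce EvalML's code generator, and the params dictionary is made up for the example.

```python
import ast
import json

# Hypothetical component parameters, used only for illustration.
params = {"impute": True, "fill_value": None, "top_n": 10}

as_json = json.dumps(params)   # JSON spelling: true / null
as_python = repr(params)       # Python spelling: True / None

# The Python repr can be read back as a Python literal...
round_tripped = ast.literal_eval(as_python)

# ...but the JSON text cannot, because `true` and `null` are not
# Python literals.
try:
    ast.literal_eval(as_json)
    json_is_valid_python = True
except ValueError:
    json_is_valid_python = False
```

Generated source code therefore has to be built from Python representations of parameter values, not from their JSON serialization.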
- Changes
Reverting save_graph #1550 to resolve kaleido build issues #1585
Update circleci badge to apply to main #1489
Added script to generate github markdown for releases #1487
Updated selection using pandas dtypes to selecting using Woodwork logical types #1551
Updated dependencies to fix ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' error and to address Woodwork and Featuretool dependencies #1540
Made get_prediction_vs_actual_data() a public method #1553
Updated Woodwork version requirement to v0.0.7 #1560
Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597
Rename “# Testing” in automl log output to “# Validation” #1597
- Testing Changes
Set n_jobs=1 in most unit tests to reduce memory #1505
Warning
- Breaking Changes
Updated minimal dependencies: numpy>=1.19.1, pandas>=1.1.0, scikit-learn>=0.23.1, scikit-optimize>=0.8.1
Updated AutoMLSearch.best_pipeline to return a trained pipeline. Pass in train_best_pipeline=False to AutoMLSearch in order to return an untrained pipeline.
Pipeline component instances can no longer be iterated through using Pipeline.component_graph #1543
Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597
Update split_data helper args #1597
Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597
Rename AutoMLSearch data_split arg to data_splitter #1569
- v0.16.1 Dec. 1, 2020
- v0.16.0 Nov. 24, 2020
- Enhancements
Updated pipelines and make_pipeline to accept Woodwork inputs #1393
Updated components to accept Woodwork inputs #1423
Added ability to freeze hyperparameters for AutoMLSearch #1284
Added Target Encoder into transformer components #1401
Added callback for error handling in AutoMLSearch #1403
Added the index id to the explain_predictions_best_worst output to help users identify which rows in their data are included #1365
The top_k features displayed in explain_predictions_* functions are now determined by the magnitude of shap values as opposed to the top_k largest and smallest shap values. #1374
Added a problem type for time series regression #1386
Added an is_defined_for_problem_type method to ObjectiveBase #1386
Added a random_state parameter to make_pipeline_from_components function #1411
Added DelayedFeaturesTransformer #1396
Added a TimeSeriesRegressionPipeline class #1418
Removed core-requirements.txt from the package distribution #1429
Updated data check messages to include “code” and “details” fields #1451, #1462
Added a TimeSeriesSplit data splitter for time series problems #1441
Added a problem_configuration parameter to AutoMLSearch #1457
- Fixes
Fixed IndexError raised in AutoMLSearch when ensembling = True but only one pipeline to iterate over #1397
Fixed stacked ensemble input bug and LightGBM warning and bug in AutoMLSearch #1388
Updated enum classes to show possible enum values as attributes #1391
Updated calls to Woodwork’s to_pandas() to to_series() and to_dataframe() #1428
Fixed bug in OHE where column names were not guaranteed to be unique #1349
Fixed bug with percent improvement of ExpVariance objective on data with highly skewed target #1467
Fix SimpleImputer error which occurs when all features are bool type #1215
- Changes
Changed OutliersDataCheck to return the list of columns, rather than rows, that contain outliers #1377
Simplified and cleaned output for Code Generation #1371
Updated data checks to return dictionary of warnings and errors instead of a list #1448
Updated AutoMLSearch to pass Woodwork data structures to every pipeline (instead of pandas DataFrames) #1450
Update AutoMLSearch to default to max_batches=1 instead of max_iterations=5 #1452
Updated _evaluate_pipelines to consolidate side effects #1410
- Documentation Changes
Added description of CLA to contributing guide, updated description of draft PRs #1402
Updated documentation to include all data checks, DataChecks, and usage of data checks in AutoML #1412
Updated docstrings from np.array to np.ndarray #1417
Added section on stacking ensembles in AutoMLSearch documentation #1425
- Testing Changes
Removed category_encoders from test-requirements.txt #1373
Tweak codecov.io settings again to avoid flakes #1413
Modified make lint to check notebook versions in the docs #1431
Modified make lint-fix to standardize notebook versions in the docs #1431
Use new version of pull request Github Action for dependency check (#1443)
Reduced number of workers for tests to 4 #1447
Warning
- Breaking Changes
The top_k and top_k_features parameters in explain_predictions_* functions now return k features as opposed to 2 * k features #1374
Renamed problem_type to problem_types in RegressionObjective, BinaryClassificationObjective, and MulticlassClassificationObjective #1319
Data checks now return a dictionary of warnings and errors instead of a list #1448
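The top_k change noted above (#1374) replaces "the k largest plus the k smallest SHAP values" with "the k values of largest magnitude". The two selection rules can be compared with a short standalone sketch. Both helper functions and the SHAP values are hypothetical and are not taken from EvalML.

```python
def top_and_bottom_k(shap_values, k):
    """Earlier rule as described in the notes: k largest and k smallest values."""
    ranked = sorted(shap_values.items(), key=lambda item: item[1])
    return ranked[-k:][::-1] + ranked[:k]


def top_k_by_magnitude(shap_values, k):
    """Newer rule as described in the notes: k values with the largest |value|."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Made-up SHAP values for a single prediction.
shap_values = {
    "age": 0.40,
    "income": -0.35,
    "tenure": 0.05,
    "region": -0.02,
    "visits": 0.10,
}

old_selection = top_and_bottom_k(shap_values, k=2)    # 2 * k = 4 features
new_selection = top_k_by_magnitude(shap_values, k=2)  # k = 2 features
```

In this example the earlier rule would report the weak features region and visits only because they sit at the ends of the signed ordering, while the magnitude rule keeps just the two strongest contributors.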
- v0.15.0 Oct. 29, 2020
- Enhancements
Added stacked ensemble component classes (StackedEnsembleClassifier, StackedEnsembleRegressor) #1134
Added stacked ensemble components to AutoMLSearch #1253
Added DecisionTreeClassifier and DecisionTreeRegressor to AutoML #1255
Added graph_prediction_vs_actual in model_understanding for regression problems #1252
Added parameter to OneHotEncoder to enable filtering for features to encode for #1249
Added percent-better-than-baseline for all objectives to automl.results #1244
Added HighVarianceCVDataCheck and replaced synonymous warning in AutoMLSearch #1254
Added PCA Transformer component for dimensionality reduction #1270
Added generate_pipeline_code and generate_component_code to allow for code generation given a pipeline or component instance #1306
Updated AutoMLSearch to support Woodwork data structures #1299
Added cv_folds to ClassImbalanceDataCheck and added this check to DefaultDataChecks #1333
Make max_batches argument to AutoMLSearch.search public #1320
Added text support to automl search #1062
Added _pipelines_per_batch as a private argument to AutoMLSearch #1355
- Fixes
Fixed ML performance issue with ordered datasets: always shuffle data in automl’s default CV splits #1265
Fixed broken evalml info CLI command #1293
Fixed boosting type='rf' for LightGBM Classifier, as well as num_leaves error #1302
Fixed bug in explain_predictions_best_worst where a custom index in the target variable would cause a ValueError #1318
Added stacked ensemble estimators to evalml.pipelines.__init__ file #1326
Fixed bug in OHE where calls to transform were not deterministic if top_n was less than the number of categories in a column #1324
Fixed LightGBM warning messages during AutoMLSearch #1342
Fix warnings thrown during AutoMLSearch in HighVarianceCVDataCheck #1346
Fixed bug where TrainingValidationSplit would return invalid location indices for dataframes with a custom index #1348
Fixed bug where the AutoMLSearch random_state was not being passed to the created pipelines #1321
- Changes
Allow add_to_rankings to be called before AutoMLSearch is called #1250
Removed Graphviz from test-requirements to add to requirements.txt #1327
Removed max_pipelines parameter from AutoMLSearch #1264
Include editable installs in all install make targets #1335
Made pip dependencies featuretools and nlp_primitives core dependencies #1062
Removed PartOfSpeechCount from TextFeaturizer transform primitives #1062
Added warning for partial_dependency when the feature includes null values #1352
- Documentation Changes
Fixed and updated code blocks in Release Notes #1243
Added DecisionTree estimators to API Reference #1246
Changed class inheritance display to flow vertically #1248
Updated cost-benefit tutorial to use a holdout/test set #1159
Added evalml info command to documentation #1293
Miscellaneous doc updates #1269
Removed conda pre-release testing from the release process document #1282
Updates to contributing guide #1310
Added Alteryx footer to docs with Twitter and Github link #1312
Added documentation for evalml installation for Python 3.6 #1322
Added documentation changes to make the API Docs easier to understand #1323
Fixed documentation for feature_importance #1353
Added tutorial for running AutoML with text data #1357
Added documentation for woodwork integration with automl search #1361
- Testing Changes
Added tests for jupyter_check to handle IPython #1256
Cleaned up make_pipeline tests to test for all estimators #1257
Added a test to check conda build after merge to main #1247
Removed code that was lacking codecov for __main__.py and unnecessary #1293
Codecov: round coverage up instead of down #1334
Add DockerHub credentials to CI testing environment #1356
Add DockerHub credentials to conda testing environment #1363
Warning
- Breaking Changes
Renamed LabelLeakageDataCheck to TargetLeakageDataCheck #1319
max_pipelines parameter has been removed from AutoMLSearch. Please use max_iterations instead. #1264
AutoMLSearch.search() will now log a warning if the input is not a Woodwork data structure (pandas, numpy) #1299
Make max_batches argument to AutoMLSearch.search public #1320
Removed unused argument feature_types from AutoMLSearch.search #1062
- v0.14.1 Sep. 29, 2020
- Enhancements
Updated partial dependence methods to support calculating numeric columns in a dataset with non-numeric columns #1150
Added get_feature_names on OneHotEncoder #1193
Added detect_problem_type to problem_type/utils.py to automatically detect the problem type given targets #1194
Added LightGBM to AutoMLSearch #1199
Updated scikit-learn and scikit-optimize to use latest versions - 0.23.2 and 0.8.1 respectively #1141
Added __str__ and __repr__ for pipelines and components #1218
Included internal target check for both training and validation data in AutoMLSearch #1226
Added ProblemTypes.all_problem_types helper to get list of supported problem types #1219
Added DecisionTreeClassifier and DecisionTreeRegressor classes #1223
DataChecks can now be parametrized by passing a list of DataCheck classes and a parameter dictionary #1167
Added first CV fold score as validation score in AutoMLSearch.rankings #1221
Updated flake8 configuration to enable linting on __init__.py files #1234
Refined make_pipeline_from_components implementation #1204
- Changes
Added allow_writing_files as a named argument to CatBoost estimators. #1202
Added solver and multi_class as named arguments to LogisticRegressionClassifier #1202
Replaced pipeline’s ._transform method to evaluate all the preprocessing steps of a pipeline with .compute_estimator_features #1231
Changed default large dataset train/test splitting behavior #1205
- Documentation Changes
Included description of how to access the component instances and features for pipeline user guide #1163
Updated API docs to refer to target as “target” instead of “labels” for non-classification tasks and minor docs cleanup #1160
Added Class Imbalance Data Check to api_reference.rst #1190 #1200
Added pipeline properties to API reference #1209
Clarified what the objective parameter in AutoML is used for in AutoML API reference and AutoML user guide #1222
Updated API docs to include skopt.space.Categorical option for component hyperparameter range definition #1228
Added install documentation for libomp in order to use LightGBM on Mac #1233
Improved description of max_iterations in documentation #1212
Removed unused code from sphinx conf #1235
Testing Changes
Warning
- Breaking Changes
DefaultDataChecks now accepts a problem_type parameter that must be specified #1167
Pipeline’s ._transform method to evaluate all the preprocessing steps of a pipeline has been replaced with .compute_estimator_features #1231
get_objectives has been renamed to get_core_objectives. This function will now return a list of valid objective instances #1230
- v0.13.2 Sep. 17, 2020
- Enhancements
Added output_format field to explain predictions functions #1107
Modified get_objective and get_objectives to be able to return any objective in evalml.objectives #1132
Added a return_instance boolean parameter to get_objective #1132
Added ClassImbalanceDataCheck to determine whether target imbalance falls below a given threshold #1135
Added label encoder to LightGBM for binary classification #1152
Added labels for the row index of confusion matrix #1154
Added AutoMLSearch object as another parameter in search callbacks #1156
Added the corresponding probability threshold for each point displayed in graph_roc_curve #1161
Added __eq__ for ComponentBase and PipelineBase #1178
Added support for multiclass classification for roc_curve #1164
Added categories accessor to OneHotEncoder for listing the categories associated with a feature #1182
Added utility function to create pipeline instances from a list of component instances #1176
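The ClassImbalanceDataCheck entry above (#1135) describes checking whether target imbalance falls below a given threshold. One plausible reading of that idea is sketched below in plain Python. The helper name, the default threshold of 0.1, and the returned dictionary shape are assumptions and do not describe the actual data check's interface or output.

```python
from collections import Counter


def classes_below_threshold(y, threshold=0.1):
    """Return {label: frequency} for classes rarer than `threshold`.

    Hypothetical helper for illustration; not EvalML's ClassImbalanceDataCheck.
    """
    counts = Counter(y)
    total = len(y)
    return {
        label: count / total
        for label, count in counts.items()
        if count / total < threshold
    }


# Made-up binary target with a 95 / 5 split.
y = [0] * 95 + [1] * 5
flagged = classes_below_threshold(y, threshold=0.1)
balanced = classes_below_threshold([0, 1] * 50, threshold=0.1)
```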
- Fixes
Fixed XGBoost column names for partial dependence methods #1104
Removed dead code validating column type from TextFeaturizer #1122
Fixed issue where Imputer cannot fit when there is None in a categorical or boolean column #1144
OneHotEncoder preserves the custom index in the input data #1146
Fixed representation for ModelFamily #1165
Removed duplicate nbsphinx dependency in dev-requirements.txt #1168
Users can now pass in any valid kwargs to all estimators #1157
Remove broken accessor OneHotEncoder.get_feature_names and unneeded base class #1179
Removed LightGBM Estimator from AutoML models #1186
- Documentation Changes
Fixed API docs for AutoMLSearch add_result_callback #1113
Added a step to our release process for pushing our latest version to conda-forge #1118
Added warning for missing ipywidgets dependency for using PipelineSearchPlots on Jupyterlab #1145
Updated README.md example to load demo dataset #1151
Swapped mapping of breast cancer targets in model_understanding.ipynb #1170
Warning
- Breaking Changes
get_objective will now return a class definition rather than an instance by default #1132
Deleted OPTIONS dictionary in evalml.objectives.utils.py #1132
If specifying an objective by string, the string must now match the objective’s name field, case-insensitive #1132
- Passing “Cost Benefit Matrix”, “Fraud Cost”, “Lead Scoring”, “Mean Squared Log Error”, “Recall”, “Recall Macro”, “Recall Micro”, “Recall Weighted”, or “Root Mean Squared Log Error” to AutoMLSearch will now result in a ValueError rather than an ObjectiveNotFoundError #1132
Search callbacks start_iteration_callback and add_results_callback have changed to include a copy of the AutoMLSearch object as a third parameter #1156
Deleted OneHotEncoder.get_feature_names method which had been broken for a while, in favor of pipelines’ input_feature_names #1179
Deleted empty base class CategoricalEncoder which OneHotEncoder component was inheriting from #1176
Results from roc_curve will now return as a list of dictionaries with each dictionary representing a class #1164
max_pipelines now raises a DeprecationWarning and will be removed in the next release. max_iterations should be used instead. #1169
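The get_objective changes in this release (#1132) combine three rules: lookup by the objective's name field, case-insensitive matching, and returning the class unless an instance is requested. The toy registry below sketches how such a lookup could behave. Every class and function in it is a stand-in written for this example. Only the names get_objective, return_instance, and ObjectiveNotFoundError come from the notes, and their real signatures may differ.

```python
class ObjectiveNotFoundError(Exception):
    """Stand-in exception for this sketch."""


class ToyF1:
    name = "F1"


class ToyLogLoss:
    name = "Log Loss Binary"


def get_objective(name, return_instance=False):
    """Look up a toy objective by its name field, ignoring case."""
    registry = {cls.name.lower(): cls for cls in (ToyF1, ToyLogLoss)}
    try:
        objective_class = registry[name.lower()]
    except KeyError:
        raise ObjectiveNotFoundError(f"{name} is not a known objective") from None
    # By default the class itself is returned, not an instance.
    return objective_class() if return_instance else objective_class


as_class = get_objective("log loss binary")
as_instance = get_objective("f1", return_instance=True)
```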
- v0.13.1 Aug. 25, 2020
- Enhancements
Added Cost-Benefit Matrix objective for binary classification #1038
Split fill_value into categorical_fill_value and numeric_fill_value for Imputer #1019
Added explain_predictions and explain_predictions_best_worst for explaining multiple predictions with SHAP #1016
Added new LSA component for text featurization #1022
Added guide on installing with conda #1041
Added a “cost-benefit curve” util method to graph cost-benefit matrix scores vs. binary classification thresholds #1081
Standardized error when calling transform/predict before fit for pipelines #1048
Added percent_better_than_baseline to AutoML search rankings and full rankings table #1050
Added one-way partial dependence and partial dependence plots #1079
Added “Feature Value” column to prediction explanation reports. #1064
Added max_batches parameter to AutoMLSearch #1087
- Fixes
Updated TextFeaturizer component to no longer require an internet connection to run #1022
Fixed non-deterministic element of TextFeaturizer transformations #1022
Added a StandardScaler to all ElasticNet pipelines #1065
Updated cost-benefit matrix to normalize score #1099
Fixed logic in calculate_percent_difference so that it can handle negative values #1100
- Changes
Added needs_fitting property to ComponentBase #1044
Updated references to data types to use datatype lists defined in evalml.utils.gen_utils #1039
Remove maximum version limit for SciPy dependency #1051
Moved all_components and other component importers into runtime methods #1045
Consolidated graphing utility methods under evalml.utils.graph_utils #1060
Made slight tweaks to how TextFeaturizer uses featuretools, and did some refactoring of that and of LSA #1090
Changed show_all_features parameter into importance_threshold, which allows for thresholding feature importance #1097, #1103
Warning
- v0.12.2 Aug. 6, 2020
- v0.12.0 Aug. 3, 2020
- Enhancements
Added string and categorical targets support for binary and multiclass pipelines and check for numeric targets for DetectLabelLeakage data check #932
Added clear exception for regression pipelines if target datatype is string or categorical #960
Added target column names and class labels in predict and predict_proba output for pipelines #951
Added _compute_shap_values and normalize_values to pipelines/explanations module #958
Added explain_prediction feature which explains single predictions with SHAP #974
Added Imputer to allow different imputation strategies for numerical and categorical dtypes #991
Added support for configuring logfile path using env var, and don’t create logger if there are filesystem errors #975
Updated catboost estimators’ default parameters and automl hyperparameter ranges to speed up fit time #998
- Fixes
Fixed ReadtheDocs warning failure regarding embedded gif #943
Removed incorrect parameter passed to pipeline classes in _add_baseline_pipelines #941
Added universal error for calling predict, predict_proba, transform, and feature_importances before fitting #969, #994
Made TextFeaturizer component and pip dependencies featuretools and nlp_primitives optional #976
Updated imputation strategy in automl to no longer limit impute strategy to most_frequent for all features if there are any categorical columns #991
Fixed UnboundLocalError for cv_pipeline when automl search errors #996
Fixed Imputer to reset dataframe index to preserve behavior expected from SimpleImputer #1009
- Changes
Moved get_estimators to evalml.pipelines.components.utils #934
Modified Pipelines to raise PipelineScoreError when they encounter an error during scoring #936
Moved evalml.model_families.list_model_families to evalml.pipelines.components.allowed_model_families #959
Renamed DateTimeFeaturization to DateTimeFeaturizer #977
Added check to stop search and raise an error if all pipelines in a batch return NaN scores #1015
- Documentation Changes
Updated README.md #963
Reworded message when errors are returned from data checks in search #982
Added section on understanding model predictions with explain_prediction to User Guide #981
Added a section to the user guide and api reference about how XGBoost and CatBoost are not fully supported. #992
Added custom components section in user guide #993
Updated FAQ section formatting #997
Updated release process documentation #1003
Warning
- Breaking Changes
get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
Removed the raise_errors flag in AutoML search. All errors during pipeline evaluation will be caught and logged. #936
evalml.model_families.list_model_families has been moved to evalml.pipelines.components.allowed_model_families #959
TextFeaturizer: the featuretools and nlp_primitives packages must be installed after installing evalml in order to use this component #976
Renamed DateTimeFeaturization to DateTimeFeaturizer #977
- v0.11.2 July 16, 2020
- Enhancements
Added NoVarianceDataCheck to DefaultDataChecks #893
Added text processing and featurization component TextFeaturizer #913, #924
Added additional checks to InvalidTargetDataCheck to handle invalid target data types #929
AutoMLSearch will now handle KeyboardInterrupt and prompt user for confirmation #915
- Fixes
Makes automl results a read-only property #919
- Changes
Deleted static pipelines and refactored tests involving static pipelines, removed all_pipelines() and get_pipelines() #904
Moved list_model_families to evalml.model_family.utils #903
Updated all_pipelines, all_estimators, all_components to use the same mechanism for dynamically generating their elements #898
Rename master branch to main #918
Add pypi release github action #923
Updated AutoMLSearch.search stdout output and logging and removed tqdm progress bar #921
Moved automl config checks previously in search() to init #933
- Testing Changes
Cleaned up fixture names and usages in tests #895
Warning
- Breaking Changes
list_model_families has been moved to evalml.model_family.utils (previously was under evalml.pipelines.utils) #903
get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
Static pipeline definitions have been removed, but similar pipelines can still be constructed via creating an instance of PipelineBase #904
all_pipelines() and get_pipelines() utility methods have been removed #904
- v0.11.0 June 30, 2020
- Enhancements
Added multiclass support for ROC curve graphing #832
Added preprocessing component to drop features whose percentage of NaN values exceeds a specified threshold #834
Added data check to check for problematic target labels #814
Added PerColumnImputer that allows imputation strategies per column #824
Added transformer to drop specific columns #827
Added support for categories, handle_error, and drop parameters in OneHotEncoder #830 #897
Added preprocessing component to handle DateTime columns featurization #838
Added ability to clone pipelines and components #842
Define getter method for component parameters #847
Added utility methods to calculate and graph permutation importances #860, #880
Added new utility functions necessary for generating dynamic preprocessing pipelines #852
Added kwargs to all components #863
Updated AutoSearchBase to use dynamically generated preprocessing pipelines #870
Added SelectColumns transformer #873
Added ability to evaluate additional pipelines for automl search #874
Added default_parameters class property to components and pipelines #879
Added better support for disabling data checks in automl search #892
Added ability to save and load AutoML objects to file #888
Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance #876
Saved learned binary classification thresholds in automl results cv data dict #876
- Fixes
Fixed bug where SimpleImputer cannot handle dropped columns #846
Fixed bug where PerColumnImputer cannot handle dropped columns #855
Enforce requirement that builtin components save all inputted values in their parameters dict #847
Don’t list base classes in all_components output #847
Standardize all components to output pandas data structures, and accept either pandas or numpy #853
Fixed rankings and full_rankings error when search has not been run #894
- Changes
Update all_pipelines and all_components to try initializing pipelines/components, and on failure exclude them #849
Refactor handle_components to handle_components_class, standardize to ComponentBase subclass instead of instance #850
Refactor “blacklist”/“whitelist” to “allow”/“exclude” lists #854
Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871
Renamed feature_importances and permutation_importances methods to use singular names (feature_importance and permutation_importance) #883
Updated automl default data splitter to train/validation split for large datasets #877
Added open source license, update some repo metadata #887
Removed dead code in _get_preprocessing_components #896
- Documentation Changes
Fix some typos and update the EvalML logo #872
Warning
- Breaking Changes
Pipelines’ static component_graph field must contain either ComponentBase subclasses or str, instead of ComponentBase subclass instances #850
Rename handle_component to handle_component_class. Now standardizes to ComponentBase subclasses instead of ComponentBase subclass instances #850
Renamed automl’s cv argument to data_split #877
Pipelines’ and classifiers’ feature_importances is renamed feature_importance, graph_feature_importances is renamed graph_feature_importance #883
Passing data_checks=None to automl search will not perform any data checks as opposed to default checks. #892
Pipelines to search for in AutoML are now determined automatically, rather than using the statically-defined pipeline classes. #870
Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance, instead of one which happened to be trained on the final cross-validation fold #876
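The component_graph change in #850 (classes or strings are accepted, instances are not) can be pictured with the stand-alone sketch below. Nothing here is EvalML code: the ComponentBase, Imputer and Classifier classes are local stand-ins, and the to_component_class helper and its name lookup are assumptions made for illustration only.

```python
# Local stand-in classes; these are NOT imported from evalml.
class ComponentBase:
    name = "base"


class Imputer(ComponentBase):
    name = "Imputer"


class Classifier(ComponentBase):
    name = "Classifier"


# Hypothetical name -> class lookup used to resolve string entries.
_COMPONENTS_BY_NAME = {cls.name: cls for cls in (Imputer, Classifier)}


def to_component_class(component):
    """Standardize a graph entry to a ComponentBase subclass (hypothetical helper)."""
    if isinstance(component, str):
        if component not in _COMPONENTS_BY_NAME:
            raise ValueError(f"unknown component name: {component!r}")
        return _COMPONENTS_BY_NAME[component]
    if isinstance(component, type) and issubclass(component, ComponentBase):
        return component
    # Instances (the old style) are rejected.
    raise TypeError("expected a ComponentBase subclass or str")


# A graph may mix strings and classes; both resolve to classes.
component_graph = ["Imputer", Classifier]
resolved = [to_component_class(c) for c in component_graph]
print([cls.__name__ for cls in resolved])
```

Passing an instance such as Imputer() to the helper raises TypeError, which mirrors the restriction described above.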
- v0.10.0 May 29, 2020
- Enhancements
Added baseline models for classification and regression, add functionality to calculate baseline models before searching in AutoML #746
Port over highly-null guardrail as a data check and define DefaultDataChecks and DisableDataChecks classes #745
Update Tuner classes to work directly with pipeline parameters dicts instead of flat parameter lists #779
Add Elastic Net as a pipeline option #812
Added new Pipeline option ExtraTrees #790
Added precision-recall curve metrics and plot for binary classification problems in evalml.pipeline.graph_utils #794
Update the default automl algorithm to search in batches, starting with default parameters for each pipeline and iterating from there #793
Added AutoMLAlgorithm class and IterativeAlgorithm impl, separated from AutoSearchBase #793
- Fixes
Update pipeline score to return nan score for any objective which throws an exception during scoring #787
Fixed bug introduced in #787 where binary classification metrics requiring predicted probabilities error in scoring #798
CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0 #795
- Changes
Cleanup pipeline score code, and cleanup codecov #711
Remove pass for abstract methods for codecov #730
Added __str__ for AutoSearch object #675
Add util methods to graph ROC and confusion matrix #720
Refactor AutoBase to AutoSearchBase #758
Updated AutoBase with data_checks parameter, removed previous detect_label_leakage parameter, and added functionality to run data checks before search in AutoML #765
Updated our logger to use Python’s logging utils #763
Refactor most of AutoSearchBase._do_iteration impl into AutoSearchBase._evaluate #762
Port over all guardrails to use the new DataCheck API #789
Expanded import_or_raise to catch all exceptions #759
Adds RMSE, MSLE, RMSLE as standard metrics #788
Don’t allow Recall to be used as an objective for AutoML #784
Removed feature selection from pipelines #819
Update default estimator parameters to make automl search faster and more accurate #793
- Testing Changes
Delete codecov yml, use codecov.io’s default #732
Added unit tests for fraud cost, lead scoring, and standard metric objectives #741
Update codecov client #782
Updated AutoBase __str__ test to include no parameters case #783
Added unit tests for ExtraTrees pipeline #790
If codecov fails to upload, fail build #810
Updated Python version of dependency action #816
Update the dependency update bot to use a suffix when creating branches #817
Warning
- Breaking Changes
The detect_label_leakage parameter for AutoML classes has been removed and replaced by a data_checks parameter #765
Moved ROC and confusion matrix methods from evalml.pipeline.plot_utils to evalml.pipeline.graph_utils #720
Tuner classes require a pipeline hyperparameter range dict as an init arg instead of a space definition #779
Tuner.propose and Tuner.add work directly with pipeline parameters dicts instead of flat parameter lists #779
PipelineBase.hyperparameters and custom_hyperparameters use pipeline parameters dict format instead of being represented as a flat list #779
All guardrail functions previously under evalml.guardrails.utils will be removed and replaced by data checks #789
Recall disallowed as an objective for AutoML #784
AutoSearchBase parameter tuner has been renamed to tuner_class #793
AutoSearchBase parameter possible_pipelines and possible_model_families have been renamed to allowed_pipelines and allowed_model_families #793
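To make the difference between a flat parameter list and a pipeline parameters dict (#779) concrete, the sketch below converts one into the other and back. The component and parameter names and both helper functions are hypothetical; the notes above do not spell out the dict layout, so a {component name: {parameter name: value}} nesting is assumed.

```python
def flatten(parameters):
    """Split a nested parameters dict into (component, parameter) keys and a flat value list."""
    keys = [(comp, name) for comp, params in parameters.items() for name in params]
    values = [parameters[comp][name] for comp, name in keys]
    return keys, values


def unflatten(keys, values):
    """Rebuild the nested parameters dict from keys and a flat value list."""
    nested = {}
    for (comp, name), value in zip(keys, values):
        nested.setdefault(comp, {})[name] = value
    return nested


# Assumed layout: one inner dict of parameters per component.
pipeline_parameters = {
    "Imputer": {"strategy": "mean"},
    "Classifier": {"n_estimators": 100, "max_depth": 5},
}

keys, flat = flatten(pipeline_parameters)
print(flat)                    # flat list: positions carry no component names
print(unflatten(keys, flat))   # nested dict: every value is labelled
```

The flat form only makes sense alongside a separately maintained key order, which is the bookkeeping the dict format removes.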
- v0.9.0 Apr. 27, 2020
- Enhancements
Added Accuracy as a standard objective #624
Added verbose parameter to load_fraud #560
Added Balanced Accuracy metric for binary, multiclass #612 #661
Added XGBoost regressor and XGBoost regression pipeline #666
Added Accuracy metric for multiclass #672
Added objective name in AutoBase.describe_pipeline #686
Added DataCheck and DataChecks, Message classes and relevant subclasses #739
- Fixes
Removed direct access to cls.component_graph #595
Add testing files to .gitignore #625
Remove circular dependencies from Makefile #637
Add error case for normalize_confusion_matrix() #640
Fixed XGBoostClassifier and XGBoostRegressor bug with feature names that contain [, ], or < #659
Update make_pipeline_graph to not accidentally create empty file when testing if path is valid #649
Fix pip installation warning about docutils version, from boto dependency #664
Removed zero division warning for F1/precision/recall metrics #671
Fixed summary for pipelines without estimators #707
- Changes
Updated default objective for binary/multiclass classification to log loss #613
Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes #405
Changed the output of score to return one dictionary #429
Created binary and multiclass objective subclasses #504
Updated objectives API #445
Removed call to get_plot_data from AutoML #615
Set raise_error to default to True for AutoML classes #638
Remove unnecessary “u” prefixes on some unicode strings #641
Changed one-hot encoder to return uint8 dtypes instead of ints #653
Pipeline _name field changed to custom_name #650
Removed graphs.py and moved methods into PipelineBase #657, #665
Remove s3fs as a dev dependency #664
Changed requirements-parser to be a core dependency #673
Replace supported_problem_types field on pipelines with problem_type attribute on base classes #678
Changed AutoML to only show best results for a given pipeline template in rankings, added full_rankings property to show all #682
Update ModelFamily values: don’t list xgboost/catboost as classifiers now that we have regression pipelines for them #677
Changed AutoML’s describe_pipeline to get problem type from pipeline instead #685
Standardize import_or_raise error messages #683
Updated argument order of objectives to align with sklearn’s #698
Renamed pipeline.feature_importance_graph to pipeline.graph_feature_importances #700
Moved ROC and confusion matrix methods to evalml.pipelines.plot_utils #704
Renamed MultiClassificationObjective to MulticlassClassificationObjective, to align with pipeline naming scheme #715
- Documentation Changes
Fixed some sphinx warnings #593
Fixed docstring for AutoClassificationSearch with correct command #599
Limit readthedocs formats to pdf, not htmlzip and epub #594 #600
Clean up objectives API documentation #605
Fixed function on Exploring search results page #604
Update release process doc #567
AutoClassificationSearch and AutoRegressionSearch show inherited methods in API reference #651
Fixed improperly formatted code in breaking changes for changelog #655
Added configuration to treat Sphinx warnings as errors #660
Removed separate plotting section for pipelines in API reference #657, #665
Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency #664
Categorized components in API reference and added descriptions for each category #663
Fixed Sphinx warnings about BalancedAccuracy objective #669
Updated API reference to include missing components and clean up pipeline docstrings #689
Reorganize API ref, and clarify pipeline sub-titles #688
Add and update preprocessing utils in API reference #687
Added inheritance diagrams to API reference #695
Documented which default objective AutoML optimizes for #699
Create separate install page #701
Include more utils in API ref, like import_or_raise #704
Add more color to pipeline documentation #705
- Testing Changes
Matched install commands of check_latest_dependencies test and its GitHub action #578
Added Github app to auto assign PR author as assignee #477
Removed unneeded conda installation of xgboost in windows checkin tests #618
Update graph tests to always use tmpfile dir #649
Changelog checkin test workaround for release PRs: If ‘future release’ section is empty of PR refs, pass check #658
Add changelog checkin test exception for dep-update branch #723
Warning
Breaking Changes
Pipelines will now no longer take an objective parameter during instantiation, and will no longer have an objective attribute.
fit() and predict() now use an optional objective parameter, which is only used in binary classification pipelines to fit for a specific objective.
score() will now use a required objectives parameter that is used to determine all the objectives to score on. This differs from the previous behavior, where the pipeline’s objective was scored on regardless.
score() will now return one dictionary of all objective scores.
ROC and ConfusionMatrix plot methods via Auto(*).plot have been removed by #615 and are replaced by roc_curve and confusion_matrix in evalml.pipelines.plot_utils in #704
normalize_confusion_matrix has been moved to evalml.pipelines.plot_utils #704
Pipelines _name field changed to custom_name
Pipelines supported_problem_types field is removed because it is no longer necessary #678
Updated argument order of objectives’ objective_function to align with sklearn #698
pipeline.feature_importance_graph has been renamed to pipeline.graph_feature_importances in #700
Removed unsupported MSLE objective #704
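The new score() contract can be pictured with a small stand-alone sketch. The function below is not EvalML code: the name score_all, the objective names, and the two metric functions are assumptions chosen for illustration. It shows only the shape of the change, where a required collection of objectives goes in and one dictionary keyed by objective name comes out.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of predictions that do not match the true labels."""
    return 1.0 - accuracy(y_true, y_pred)


# Hypothetical registry of objectives, keyed by name.
OBJECTIVES = {"accuracy": accuracy, "error_rate": error_rate}


def score_all(y_true, y_pred, objectives):
    """Score on every requested objective and return one dictionary of results."""
    if not objectives:
        raise ValueError("at least one objective is required")
    return {name: OBJECTIVES[name](y_true, y_pred) for name in objectives}


scores = score_all([1, 0, 1, 1], [1, 0, 0, 1], objectives=["accuracy", "error_rate"])
print(scores)
```

Because the objectives are passed at scoring time, the same fitted model can be scored on several objectives in one call without being tied to one at construction.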
- v0.8.0 Apr. 1, 2020
- Enhancements
Add normalization option and information to confusion matrix #484
Add util function to drop rows with NaN values #487
Renamed PipelineBase.name as PipelineBase.summary and redefined PipelineBase.name as class property #491
Added access to parameters in Pipelines with PipelineBase.parameters (used to be return of PipelineBase.describe) #501
Added fill_value parameter for SimpleImputer #509
Added functionality to override component hyperparameters and made pipelines take hyperparameters from components #516
Allow numpy.random.RandomState for random_state parameters #556
- Fixes
Removed unused dependency matplotlib, and move category_encoders to test reqs #572
- Changes
Undo version cap in XGBoost placed in #402 and allowed all releases of XGBoost #407
Support pandas 1.0.0 #486
Made all references to the logger static #503
Refactored model_type parameter for components and pipelines to model_family #507
Refactored problem_types for pipelines and components into supported_problem_types #515
Moved pipelines/utils.save_pipeline and pipelines/utils.load_pipeline to PipelineBase.save and PipelineBase.load #526
Limit number of categories encoded by OneHotEncoder #517
Warning
Breaking Changes
AutoClassificationSearch and AutoRegressionSearch’s model_types parameter has been refactored into allowed_model_families
ModelTypes enum has been changed to ModelFamily
Components and Pipelines now have a model_family field instead of model_type
get_pipelines utility function now accepts model_families as an argument instead of model_types
PipelineBase.name no longer returns structure of pipeline and has been replaced by PipelineBase.summary
PipelineBase.problem_types and Estimator.problem_types have been renamed to supported_problem_types
pipelines/utils.save_pipeline and pipelines/utils.load_pipeline moved to PipelineBase.save and PipelineBase.load
- v0.7.0 Mar. 9, 2020
- Enhancements
Added emacs buffers to .gitignore #350
Add CatBoost (gradient-boosted trees) classification and regression components and pipelines #247
Added Tuner abstract base class #351
Added n_jobs as parameter for AutoClassificationSearch and AutoRegressionSearch #403
Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn’s #426
Added PipelineBase.graph and .feature_importance_graph methods, moved from previous location #423
Added support for python 3.8 #462
- Changes
Added n_estimators as a tunable parameter for XGBoost #307
Remove unused parameter ObjectiveBase.fit_needs_proba #320
Remove extraneous parameter component_type from all components #361
Remove unused rankings.csv file #397
Downloaded demo and test datasets so unit tests can run offline #408
Remove _needs_fitting attribute from Components #398
Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all #413
Refactored PipelineBase to take in parameter dictionary and moved pipeline metadata to class attribute #421
Dropped support for Python 3.5 #438
Removed unused apply.py file #449
Clean up requirements.txt to remove unused deps #451
Support installation without all required dependencies #459
- Documentation Changes
Update release.md with instructions to release to internal license key #354
- Testing Changes
Added tests for utils (and moved current utils to gen_utils) #297
Moved XGBoost install into its own separate step on Windows using Conda #313
Rewind pandas version to before 1.0.0, to diagnose test failures for that version #325
Added dependency update checkin test #324
Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version #402
Update dependency check to use a whitelist #417
Update unit test jobs to not install dev deps #455
Warning
Breaking Changes
Python 3.5 will not be actively supported.
- v0.6.0 Dec. 16, 2019
- Enhancements
Added ability to create a plot of feature importances #133
Add early stopping to AutoML using patience and tolerance parameters #241
Added ROC and confusion matrix metrics and plot for classification problems and introduce PipelineSearchPlots class #242
Enhanced AutoML results with search order #260
Added utility function to show system and environment information #300
- Changes
Renamed automl classes to AutoRegressionSearch and AutoClassificationSearch #287
Updating demo datasets to retain column names #223
Moving pipeline visualization to PipelinePlot class #228
Standardizing inputs as pd.DataFrame / pd.Series #130
Enforcing that pipelines must have an estimator as last component #277
Added ipywidgets as a dependency in requirements.txt #278
Added Random and Grid Search Tuners #240
Warning
Breaking Changes
The fit() method for AutoClassifier and AutoRegressor has been renamed to search().
AutoClassifier has been renamed to AutoClassificationSearch
AutoRegressor has been renamed to AutoRegressionSearch
AutoClassificationSearch.results and AutoRegressionSearch.results are now dictionaries with pipeline_results and search_order keys. pipeline_results can be used to access a dictionary that is identical to the old .results dictionary, whereas search_order returns a list of the search order in terms of pipeline_id.
Pipelines now require an estimator as the last component in component_list. Slicing pipelines now throws a NotImplementedError to avoid returning pipelines without an estimator.
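As an illustration of the results layout described above, here is a minimal sketch using a plain Python dictionary. The pipeline ids, the per-pipeline fields ("pipeline_name", "score"), and their values are made up for the example; only the pipeline_results and search_order keys come from the notes above.

```python
# Hypothetical stand-in for the .results dictionary after a search.
# Only the two top-level keys are taken from the release notes; everything
# nested under them is invented for illustration.
results = {
    "pipeline_results": {
        0: {"pipeline_name": "pipeline A", "score": 0.81},
        1: {"pipeline_name": "pipeline B", "score": 0.77},
        2: {"pipeline_name": "pipeline C", "score": 0.85},
    },
    # Pipeline ids in the order the search evaluated them.
    "search_order": [0, 1, 2],
}


def scores_in_search_order(results):
    """Return (pipeline_id, score) pairs in the order pipelines were searched."""
    per_pipeline = results["pipeline_results"]
    return [(pid, per_pipeline[pid]["score"]) for pid in results["search_order"]]


ordered = scores_in_search_order(results)
print(ordered)
```

Code written against the old flat .results dictionary would, under this reading, switch to indexing results["pipeline_results"] instead.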
- v0.5.2 Nov. 18, 2019
- v0.5.1 Nov. 15, 2019
- v0.5.0 Oct. 29, 2019
- Enhancements
Added basic one hot encoding #73
Use enums for model_type #110
Support for splitting regression datasets #112
Auto-infer multiclass classification #99
Added support for other units in max_time #125
Detect highly null columns #121
Added additional regression objectives #100
Show an interactive iteration vs. score plot when using fit() #134
- v0.4.1 Sep. 16, 2019
- Enhancements
Added AutoML for classification and regressor using Autobase and Skopt #7 #9
Implemented standard classification and regression metrics #7
Added logistic regression, random forest, and XGBoost pipelines #7
Implemented support for custom objectives #15
Feature importance for pipelines #18
Serialization for pipelines #19
Allow fitting on objectives for optimal threshold #27
Added detect label leakage #31
Implemented callbacks #42
Allow for multiclass classification #21
Added support for additional objectives #79
- Testing Changes
Added testing for loading data #39
- v0.2.0 Aug. 13, 2019
- Enhancements
Created fraud detection objective #4
- v0.1.0 July. 31, 2019