Release Notes
- Future Releases
Enhancements
Fixes
Changes
Documentation Changes
Testing Changes
Warning
Breaking Changes
- v0.24.0 May 04, 2021
- Enhancements
Added date_index as a required parameter for TimeSeries problems #2217
Have the OneHotEncoder return the transformed columns as booleans rather than floats #2170
Added Oversampler transformer component to EvalML #2079
Added Undersampler to AutoMLSearch, as well as arguments _sampler_method and sampler_balanced_ratio #2128
Updated prediction explanations functions to allow pipelines with XGBoost estimators #2162
Added partial dependence for datetime columns #2180
Update precision-recall curve with positive label index argument, and fix for 2d predicted probabilities #2090
Add pct_null_rows to HighlyNullDataCheck #2211
Added a standalone AutoML search method for convenience, which runs data checks and then runs automl #2152
Make the first batch of AutoML have a predefined order, with linear models first and complex models last #2223
- Changes
Deleted baseline pipeline classes #2202
Reverting user specified date feature PR #2155 until pmdarima installation fix is found #2214
Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. #2091
Removed all old datasplitters from EvalML #2193
Deleted make_pipeline_from_components #2218
- Documentation Changes
- Testing Changes
Use machineFL user token for dependency update bot, and add more reviewers #2189
Warning
- Breaking Changes
All baseline pipeline classes (BaselineBinaryPipeline, BaselineMulticlassPipeline, BaselineRegressionPipeline, etc.) have been deleted #2202
Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. Pipelines can now be initialized by specifying the component graph as the first parameter, and then passing in optional arguments such as custom_name, parameters, etc. For example, BinaryClassificationPipeline(["Random Forest Classifier"], parameters={}). #2091
Removed all old datasplitters from EvalML #2193
Deleted utility method make_pipeline_from_components #2218
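The pct_null_rows addition to HighlyNullDataCheck (#2211) concerns how much of each row is missing. Below is a minimal, library-free sketch. It assumes pct_null_rows means the fraction of null values per row, and that rows at or above a threshold are flagged. The helper names and the 0.95 default are hypothetical and are not EvalML's API.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Sequence[Optional[Any]]


def pct_null_per_row(rows: Sequence[Row]) -> List[float]:
    """Return the fraction of None values in each row (hypothetical helper)."""
    return [sum(value is None for value in row) / len(row) if row else 0.0 for row in rows]


def highly_null_rows(rows: Sequence[Row], threshold: float = 0.95) -> Dict[int, float]:
    """Map row index to null fraction for every row at or above `threshold`."""
    return {
        index: pct
        for index, pct in enumerate(pct_null_per_row(rows))
        if pct >= threshold
    }


rows = [[1, None, 3], [None, None, None], [None, None, 4]]
print(highly_null_rows(rows, threshold=0.6))  # rows 1 and 2 are flagged
```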
- v0.23.0 Apr. 20, 2021
- Enhancements
Refactored EngineBase and SequentialEngine API. Added DaskEngine #1975
Added optional engine argument to AutoMLSearch #1975
Added a warning about how time series support is still in beta when a user passes in a time series problem to AutoMLSearch #2118
Added NaturalLanguageNaNDataCheck data check #2122
Added ValueError to partial_dependence to prevent users from computing partial dependence on columns with all NaNs #2120
Added standard deviation of cv scores to rankings table #2154
- Fixes
Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority #2077
Fixed bug where two-way partial dependence plots with categorical variables were not working correctly #2117
Fixed bug where hyperparameters were not displaying properly for pipelines with a list component_graph and duplicate components #2133
Fixed bug where pipeline_parameters argument in AutoMLSearch was not applied to pipelines passed in as allowed_pipelines #2133
Fixed bug where AutoMLSearch was not applying custom hyperparameters to pipelines with a list component_graph and duplicate components #2133
- Changes
Removed hyperparameter_ranges from Undersampler and renamed balanced_ratio to sampling_ratio for samplers #2113
Renamed TARGET_BINARY_NOT_TWO_EXAMPLES_PER_CLASS data check message code to TARGET_MULTICLASS_NOT_TWO_EXAMPLES_PER_CLASS #2126
Modified one-way partial dependence plots of categorical features to display data with a bar plot #2117
Renamed score column for automl.rankings to mean_cv_score #2135
Remove ‘warning’ from docs tool output #2031
Warning
- Breaking Changes
Renamed balanced_ratio to sampling_ratio for the BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, BalancedClassificationSampler, and Undersampler #2113
Deleted the “errors” key from automl results #1975
Deleted the raise_and_save_error_callback and the log_and_save_error_callback #1975
Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority #2077
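Several v0.23.0 entries move the samplers and balanced data splitters to a minority:majority ratio and rename balanced_ratio to sampling_ratio (#2077, #2113). The sketch below is not EvalML's implementation. It shows why the direction matters: under the minority:majority convention the ratio lies in (0, 1]. A sampling_ratio of 0.25 therefore reads as "at least one minority sample per four majority samples". The function names and the undersampling rule are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Iterable


def minority_majority_ratio(y: Iterable[Hashable]) -> float:
    """Size of the smallest class divided by the size of the largest class."""
    counts = Counter(y)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return min(counts.values()) / max(counts.values())


def is_balanced(y: Iterable[Hashable], sampling_ratio: float = 0.25) -> bool:
    """Treat the target as balanced when minority:majority >= sampling_ratio."""
    return minority_majority_ratio(y) >= sampling_ratio


def undersampled_size(class_count: int, minority_count: int, sampling_ratio: float) -> int:
    """Largest size a class may keep so that minority / size >= sampling_ratio."""
    return min(class_count, int(minority_count / sampling_ratio))


y = ["fraud"] * 10 + ["ok"] * 90
print(minority_majority_ratio(y))       # 10 / 90, roughly 0.111
print(is_balanced(y))                   # False
print(undersampled_size(90, 10, 0.25))  # 40
```

Under the old majority:minority convention the same data would give a ratio of 9.0, so a threshold such as 0.25 would mean something entirely different.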
- v0.22.0 Apr. 06, 2021
- Enhancements
Added a GitHub Action for linux_unit_tests #2013
Added recommended actions for InvalidTargetDataCheck, updated _make_component_list_from_actions to address new action, and added TargetImputer component #1989
Updated AutoMLSearch._check_for_high_variance to not emit RuntimeWarning #2024
Added exception when pipeline passed to explain_predictions is a Stacked Ensemble pipeline #2033
Added sensitivity at low alert rates as an objective #2001
Added Undersampler transformer component #2030
- Fixes
Updated Engine’s train_batch to apply undersampling #2038
Fixed bug where Time Series Classification pipelines were not encoding targets in predict and predict_proba #2040
Fixed data splitting errors if target is float for classification problems #2050
Pinned docutils to <0.17 to fix ReadTheDocs warning issues #2088
Testing Changes
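The sensitivity-at-low-alert-rates objective added in v0.22.0 (#2001) measures how many true positives are caught when only a small fraction of rows can be flagged. This minimal sketch assumes that the top ceil(alert_rate * n) scores are flagged as positive. EvalML's exact cutoff rule and default alert rate may differ, and sensitivity_low_alert is a hypothetical name.

```python
import math
from typing import Sequence


def sensitivity_low_alert(
    y_true: Sequence[int], y_score: Sequence[float], alert_rate: float = 0.01
) -> float:
    """Recall when only the highest-scoring `alert_rate` fraction of rows is flagged."""
    if len(y_true) != len(y_score) or not y_true:
        raise ValueError("y_true and y_score must be non-empty and the same length")
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("sensitivity is undefined without positive examples")
    n_alerts = max(1, math.ceil(alert_rate * len(y_true)))
    # Indices ordered from highest to lowest score; the first n_alerts are the alerts.
    ranked = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    flagged = ranked[:n_alerts]
    true_positives = sum(y_true[i] for i in flagged)
    return true_positives / positives


y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3, 0.05, 0.15, 0.25, 0.6]
print(sensitivity_low_alert(y_true, y_score, alert_rate=0.2))  # 1 of 3 positives caught
```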
- v0.21.0 Mar. 24, 2021
- Enhancements
Changed AutoMLSearch to default optimize_thresholds to True #1943
Added multiple oversampling and undersampling sampling methods as data splitters for imbalanced classification #1775
Added params to balanced classification data splitters for visibility #1966
Updated make_pipeline to not add Imputer if input data does not have numeric or categorical columns #1967
Updated ClassImbalanceDataCheck to better handle multiclass imbalances #1986
Added recommended actions for the output of data check’s validate method #1968
Added error message for partial_dependence when features are mostly the same value #1994
Updated OneHotEncoder to drop one redundant feature by default for features with two categories #1997
Added a PolynomialDetrender component #1992
Added DateTimeNaNDataCheck data check #2039
Documentation Changes
Warning
- Breaking Changes
Changed AutoMLSearch to default optimize_thresholds to True #1943
Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch. To run the data checks which were previously run by default in AutoMLSearch, please call DefaultDataChecks().validate(X_train, y_train) or take a look at our documentation for more examples. #1935
Deleted random_state argument #1985
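v0.21.0 changes OneHotEncoder to drop one redundant feature for features with two categories (#1997). For a two-category feature, one indicator column fully determines the other, so keeping both adds no information. The sketch below illustrates the idea in plain Python; scikit-learn offers the same option as OneHotEncoder(drop="if_binary"). The function and column-naming scheme are hypothetical, not EvalML's.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], prefix: str, drop_if_binary: bool = True) -> Dict[str, List[int]]:
    """One-hot encode `values`, keeping a single column for two-category features."""
    categories = sorted(set(values))
    if drop_if_binary and len(categories) == 2:
        # The dropped indicator always equals 1 minus the kept one, so it is redundant.
        categories = categories[1:]
    return {
        f"{prefix}_{category}": [int(value == category) for value in values]
        for category in categories
    }


print(one_hot(["yes", "no", "yes"], "subscribed"))  # {'subscribed_yes': [1, 0, 1]}
print(one_hot(["red", "green", "blue"], "color"))   # three indicator columns
```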
- v0.20.0 Mar. 10, 2021
- Enhancements
Added a GitHub Action for detecting dependency changes #1933
Create a separate CV split to train stacked ensembler on for AutoMLSearch #1814
Added a GitHub Action for Linux unit tests #1846
Added ARIMARegressor estimator #1894
Added DataCheckAction class and DataCheckActionCode enum #1896
Updated Woodwork requirement to v0.0.10 #1900
Added BalancedClassificationDataCVSplit and BalancedClassificationDataTVSplit to AutoMLSearch #1875
Update default classification data splitter to use downsampling for highly imbalanced data #1875
Updated describe_pipeline to return more information, including id of pipelines used for ensemble models #1909
Added utility method to create list of components from a list of DataCheckAction #1907
Updated validate method to include an action key in the returned dictionary for all DataCheck and DataChecks #1916
Aggregating the shap values for predictions that we know the provenance of, e.g. OHE, text, and date-time. #1901
Improved error message when custom objective is passed as a string in pipeline.score #1941
Added score_pipelines and train_pipelines methods to AutoMLSearch #1913
Added support for pandas version 1.2.0 #1708
Added score_batch and train_batch abstract methods to EngineBase and implementations in SequentialEngine #1913
Added ability to handle index columns in AutoMLSearch and DataChecks #2138
- Fixes
Removed CI check for check_dependencies_updated_linux #1950
Added metaclass for time series pipelines and fixed binary classification pipeline predict not using objective if it is passed as a named argument #1874
Fixed stack trace in prediction explanation functions caused by mixed string/numeric pandas column names #1871
Fixed stack trace caused by passing pipelines with duplicate names to AutoMLSearch #1932
Fixed AutoMLSearch.get_pipelines returning pipelines with the same attributes #1958
- Changes
Reversed GitHub Action for Linux unit tests until a fix for report generation is found #1920
Updated add_results in AutoMLAlgorithm to take in entire pipeline results dictionary from AutoMLSearch #1891
Updated ClassImbalanceDataCheck to look for severe class imbalance scenarios #1905
Deleted the explain_prediction function #1915
Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead #1928
Removed warning in InvalidTargetDataCheck returned when numeric binary classification targets are not (0, 1) #1959
- Documentation Changes
Updated model_understanding.ipynb to demo the two-way partial dependence capability #1919
Testing Changes
- v0.19.0 Feb. 23, 2021
- Enhancements
Added a GitHub Action for Python windows unit tests #1844
Added a GitHub Action for checking updated release notes #1849
Added a GitHub Action for Python lint checks #1837
Adjusted explain_prediction, explain_predictions and explain_predictions_best_worst to handle timeseries problems. #1818
Updated InvalidTargetDataCheck to check for mismatched indices in target and features #1816
Updated Woodwork structures returned from components to support Woodwork logical type overrides set by the user #1784
Updated estimators to keep track of input feature names during fit() #1794
Updated visualize_decision_tree to include feature names in output #1813
Added is_bounded_like_percentage property for objectives. If true, the calculate_percent_difference method will return the absolute difference rather than relative difference #1809
Added full error traceback to AutoMLSearch logger file #1840
Changed TargetEncoder to preserve custom indices in the data #1836
Refactored explain_predictions and explain_predictions_best_worst to only compute features once for all rows that need to be explained #1843
Added custom random undersampler data splitter for classification #1857
Updated OutliersDataCheck implementation to calculate the probability of having no outliers #1855
Added Engines pipeline processing API #1838
- Fixes
Changed EngineBase random_state arg to random_seed and same for user guide docs #1889
- Changes
Modified calculate_percent_difference so that division by 0 is now inf rather than nan #1809
Removed text_columns parameter from LSA and TextFeaturizer components #1652
Added random_seed as an argument to our automl/pipeline/component API. Using random_state will raise a warning #1798
Added DataCheckError message in InvalidTargetDataCheck if input target is None and removed exception raised #1866
Documentation Changes
Warning
- Breaking Changes
Added a deprecation warning to explain_prediction. It will be deleted in the next release. #1860
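Two v0.19.0 entries (#1809) change how percent differences against a baseline are computed. Objectives flagged with is_bounded_like_percentage use the absolute difference, and a zero baseline now yields inf instead of nan. The minimal sketch below applies those two rules, assuming results are scaled to percentages. percent_difference is a hypothetical stand-in, not EvalML's calculate_percent_difference, which may also account for whether greater scores are better.

```python
import math


def percent_difference(score: float, baseline_score: float, is_bounded_like_percentage: bool = False) -> float:
    """Percent difference between a score and its baseline (hypothetical sketch)."""
    difference = score - baseline_score
    if is_bounded_like_percentage:
        # Scores such as accuracy already live on a 0-1 scale, so report the
        # absolute change, expressed in percentage points.
        return 100 * difference
    if baseline_score == 0:
        # Relative change against a zero baseline is unbounded: return inf, not nan.
        return math.inf
    return 100 * difference / abs(baseline_score)


print(percent_difference(1.5, 1.0))                                   # 50.0
print(percent_difference(0.9, 0.8, is_bounded_like_percentage=True))  # about 10.0
print(percent_difference(2.0, 0.0))                                   # inf
```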
- v0.18.2 Feb. 10, 2021
- Enhancements
Added uniqueness score data check #1785
Added “dataframe” output format for prediction explanations #1781
Updated LightGBM estimators to handle pandas.MultiIndex #1770
Sped up permutation importance for some pipelines #1762
Added sparsity data check #1797
Confirmed support for threshold tuning for binary time series classification problems #1803
Fixes
Changes
- Documentation Changes
Added section on conda to the contributing guide #1771
Updated release process to reflect freezing main before perf tests #1787
Moved some PRs to the right section of the release notes #1789
Tweak README.md. #1800
Fixed back arrow on install page docs #1795
Fixed docstring for ClassImbalanceDataCheck.validate() #1817
Testing Changes
- v0.18.1 Feb. 1, 2021
- Enhancements
Added graph_t_sne as a visualization tool for high dimensional data #1731
Added the ability to see the linear coefficients of features in linear models terms #1738
Added support for scikit-learn v0.24.0 #1733
Added support for scipy v1.6.0 #1752
Added SVM Classifier and Regressor to estimators #1714 #1761
- Fixes
Testing Changes
- v0.18.0 Jan. 26, 2021
- Enhancements
Added RMSLE, MSLE, and MAPE to core objectives while checking for negative target values in invalid_targets_data_check #1574
Added validation checks for binary problems with regression-like datasets and multiclass problems without true multiclass targets in invalid_targets_data_check #1665
Added time series support for make_pipeline #1566
Added target name for output of pipeline predict method #1578
Added multiclass check to InvalidTargetDataCheck for two examples per class #1596
Added support for graphviz v0.16 #1657
Enhanced time series pipelines to accept empty features #1651
Added KNN Classifier to estimators. #1650
Added support for list inputs for objectives #1663
Added support for AutoMLSearch to handle time series classification pipelines #1666
Enhanced DelayedFeaturesTransformer to encode categorical features and targets before delaying them #1691
Added 2-way dependence plots. #1690
Added ability to directly iterate through components within Pipelines #1583
- Fixes
Fixed inconsistent attributes and added Exceptions to docs #1673
Fixed TargetLeakageDataCheck to use Woodwork mutual_information rather than using Pandas’ Pearson Correlation #1616
Fixed thresholding for pipelines in AutoMLSearch to only threshold binary classification pipelines #1622 #1626
Updated load_data to return Woodwork structures and update default parameter value for index to None #1610
Pinned scipy at < 1.6.0 while we work on adding support #1629
Fixed data check message formatting in AutoMLSearch #1633
Addressed stacked ensemble component for scikit-learn v0.24 support by setting shuffle=True for default CV #1613
Fixed bug where Imputer reset the index on X #1590
Fixed AutoMLSearch stacktrace when a custom objective was passed in as a primary objective or additional objective #1575
Fixed custom index bug for MAPE objective #1641
Fixed index bug for TextFeaturizer and LSA components #1644
Limited load_fraud dataset loaded into automl.ipynb #1646
add_to_rankings updates AutoMLSearch.best_pipeline when necessary #1647
Fixed bug where time series baseline estimators were not receiving gap and max_delay in AutoMLSearch #1645
Fixed jupyter notebooks to help the RTD buildtime #1654
Added positive_only objectives to non_core_objectives #1661
Fixed stacking argument n_jobs for IterativeAlgorithm #1706
Updated CatBoost estimators to return self in .fit() rather than the underlying model for consistency #1701
Added ability to initialize pipeline parameters in AutoMLSearch constructor #1676
- Changes
Added labeling to graph_confusion_matrix #1632
Rerunning search for AutoMLSearch results in a message thrown rather than failing the search, and removed has_searched property #1647
Changed tuner class to allow and ignore single parameter values as input #1686
Capped LightGBM version limit to remove bug in docs #1711
Removed support for np.random.RandomState in EvalML #1727
- Documentation Changes
Update Model Understanding in the user guide to include visualize_decision_tree #1678
Updated docs to include information about AutoMLSearch callback parameters and methods #1577
Updated docs to prompt users to install graphviz on Mac #1656
Added infer_feature_types to the start.ipynb guide #1700
Added multicollinearity data check to API reference and docs #1707
Testing Changes
Warning
- Breaking Changes
Removed has_searched property from AutoMLSearch #1647
Components and pipelines return Woodwork data structures instead of pandas data structures #1668
Removed support for np.random.RandomState in EvalML. Rather than passing np.random.RandomState as component and pipeline random_state values, we use int random_seed #1727
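v0.18.0 adds RMSLE, MSLE and MAPE as core objectives and checks for negative targets (#1574). The standard definitions are sketched below in plain Python to show why those checks are needed. The log-based errors rely on log(1 + y), which is undefined for y <= -1; this sketch rejects any negative value. MAPE divides by each target, so a zero target is rejected. The helpers are illustrative, not EvalML's objective classes, and scaling MAPE by 100 is an assumption.

```python
import math
from typing import Sequence


def _check_lengths(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")


def msle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared log error: mean of (log(1 + pred) - log(1 + true)) ** 2."""
    _check_lengths(y_true, y_pred)
    if any(value < 0 for value in (*y_true, *y_pred)):
        raise ValueError("MSLE/RMSLE require non-negative targets and predictions")
    return sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared log error."""
    return math.sqrt(msle(y_true, y_pred))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    _check_lengths(y_true, y_pred)
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a target value is 0")
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


print(mape([100, 200], [110, 180]))  # 10.0
```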
- v0.17.0 Dec. 29, 2020
- Enhancements
Added save_plot that allows for saving figures from different backends #1588
Added LightGBM Regressor to regression components #1459
Added visualize_decision_tree for tree visualization with decision_tree_data_from_estimator and decision_tree_data_from_pipeline to reformat tree structure output #1511
Added DFS Transformer component into transformer components #1454
Added MAPE to the standard metrics for time series problems and update objectives #1510
Added graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data to the model understanding module for time series problems #1483
Added a ComponentGraph class that will support future pipelines as directed acyclic graphs #1415
Updated data checks to accept Woodwork data structures #1481
Added parameter to InvalidTargetDataCheck to show only top unique values rather than all unique values #1485
Added multicollinearity data check #1515
Added baseline pipeline and components for time series regression problems #1496
Added more information to users about ensembling behavior in AutoMLSearch #1527
Add woodwork support for more utility and graph methods #1544
Changed DateTimeFeaturizer to encode features as int #1479
Return trained pipelines from AutoMLSearch.best_pipeline #1547
Added utility method so that users can set feature types without having to learn about Woodwork directly #1555
Added Linear Discriminant Analysis transformer for dimensionality reduction #1331
Added multiclass support for partial_dependence and graph_partial_dependence #1554
Added TimeSeriesBinaryClassificationPipeline and TimeSeriesMulticlassClassificationPipeline classes #1528
Added make_data_splitter method for easier automl data split customization #1568
Integrated ComponentGraph class into Pipelines for full non-linear pipeline support #1543
Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597
Update split_data helper args #1597
Add problem type utils is_regression, is_classification, is_timeseries #1597
Rename AutoMLSearch data_split arg to data_splitter #1569
- Fixes
Fix AutoML not passing CV folds to DefaultDataChecks for usage by ClassImbalanceDataCheck #1619
Fix Windows CI jobs: install numba via conda, required for shap #1490
Added custom-index support for reset-index-get_prediction_vs_actual_over_time_data #1494
Fix generate_pipeline_code to account for boolean and None differences between Python and JSON #1524 #1531
Set max value for plotly and xgboost versions while we debug CI failures with newer versions #1532
Undo version pinning for plotly #1533
Fix ReadTheDocs build by updating the version of setuptools #1561
Set random_state of data splitter in AutoMLSearch to take int to keep consistency in the resulting splits #1579
Pin sklearn version while we work on adding support #1594
Pin pandas at <1.2.0 while we work on adding support #1609
Pin graphviz at < 0.16 while we work on adding support #1609
- Changes
Reverting save_graph #1550 to resolve kaleido build issues #1585
Update circleci badge to apply to main #1489
Added script to generate github markdown for releases #1487
Updated selection using pandas dtypes to selecting using Woodwork logical types #1551
Updated dependencies to fix ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' error and to address Woodwork and Featuretools dependencies #1540
Made get_prediction_vs_actual_data() a public method #1553
Updated Woodwork version requirement to v0.0.7 #1560
Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597
Rename “# Testing” in automl log output to “# Validation” #1597
- Testing Changes
Set n_jobs=1 in most unit tests to reduce memory #1505
Warning
- Breaking Changes
Updated minimal dependencies: numpy>=1.19.1, pandas>=1.1.0, scikit-learn>=0.23.1, scikit-optimize>=0.8.1
Updated AutoMLSearch.best_pipeline to return a trained pipeline. Pass in train_best_pipeline=False to AutoMLSearch in order to return an untrained pipeline.
Pipeline component instances can no longer be iterated through using Pipeline.component_graph #1543
Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597
Update split_data helper args #1597
Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597
Rename AutoMLSearch data_split arg to data_splitter #1569
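The generate_pipeline_code fix in v0.17.0 (#1524, #1531) concerns booleans and None. JSON spells these true, false and null, while Python source needs True, False and None. The standard-library example below illustrates the mismatch. The parameter dictionary and component names are made up for the example. repr is only one way to emit Python literals and is not necessarily what generate_pipeline_code does.

```python
import ast
import json

# Hypothetical component parameters containing a None value and a boolean.
parameters = {"Example Imputer": {"numeric_fill_value": None}, "Example Estimator": {"verbose": True}}

json_text = json.dumps(parameters)  # uses null / true, which are not Python literals
python_text = repr(parameters)      # uses None / True, which Python can read back

print(json_text)
print(python_text)

# The Python rendering round-trips through a literal parser; the JSON text does not.
assert ast.literal_eval(python_text) == parameters
try:
    ast.literal_eval(json_text)
except ValueError:
    print("JSON output is not valid Python source")
```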
- v0.16.1 Dec. 1, 2020
- v0.16.0 Nov. 24, 2020
- Enhancements
Updated pipelines and make_pipeline to accept Woodwork inputs #1393
Updated components to accept Woodwork inputs #1423
Added ability to freeze hyperparameters for AutoMLSearch #1284
Added Target Encoder into transformer components #1401
Added callback for error handling in AutoMLSearch #1403
Added the index id to the explain_predictions_best_worst output to help users identify which rows in their data are included #1365
The top_k features displayed in explain_predictions_* functions are now determined by the magnitude of shap values as opposed to the top_k largest and smallest shap values. #1374
Added a problem type for time series regression #1386
Added an is_defined_for_problem_type method to ObjectiveBase #1386
Added a random_state parameter to make_pipeline_from_components function #1411
Added DelayedFeaturesTransformer #1396
Added a TimeSeriesRegressionPipeline class #1418
Removed core-requirements.txt from the package distribution #1429
Updated data check messages to include “code” and “details” fields #1451, #1462
Added a TimeSeriesSplit data splitter for time series problems #1441
Added a problem_configuration parameter to AutoMLSearch #1457
- Fixes
Fixed IndexError raised in AutoMLSearch when ensembling = True but only one pipeline to iterate over #1397
Fixed stacked ensemble input bug and LightGBM warning and bug in AutoMLSearch #1388
Updated enum classes to show possible enum values as attributes #1391
Updated calls to Woodwork’s to_pandas() to to_series() and to_dataframe() #1428
Fixed bug in OHE where column names were not guaranteed to be unique #1349
Fixed bug with percent improvement of ExpVariance objective on data with highly skewed target #1467
Fix SimpleImputer error which occurs when all features are bool type #1215
- Changes
Changed OutliersDataCheck to return the list of columns, rather than rows, that contain outliers #1377
Simplified and cleaned output for Code Generation #1371
Updated data checks to return dictionary of warnings and errors instead of a list #1448
Updated AutoMLSearch to pass Woodwork data structures to every pipeline (instead of pandas DataFrames) #1450
Update AutoMLSearch to default to max_batches=1 instead of max_iterations=5 #1452
Updated _evaluate_pipelines to consolidate side effects #1410
- Documentation Changes
Added description of CLA to contributing guide, updated description of draft PRs #1402
Updated documentation to include all data checks, DataChecks, and usage of data checks in AutoML #1412
Updated docstrings from np.array to np.ndarray #1417
Added section on stacking ensembles in AutoMLSearch documentation #1425
- Testing Changes
Removed category_encoders from test-requirements.txt #1373
Tweak codecov.io settings again to avoid flakes #1413
Modified make lint to check notebook versions in the docs #1431
Modified make lint-fix to standardize notebook versions in the docs #1431
Use new version of pull request Github Action for dependency check #1443
Reduced number of workers for tests to 4 #1447
Warning
- Breaking Changes
The top_k and top_k_features parameters in explain_predictions_* functions now return k features as opposed to 2 * k features #1374
Renamed problem_type to problem_types in RegressionObjective, BinaryClassificationObjective, and MulticlassClassificationObjective #1319
Data checks now return a dictionary of warnings and errors instead of a list #1448
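Several v0.16.0 entries introduce time series support, including DelayedFeaturesTransformer (#1396). The core idea of delayed (lagged) features is that the value observed `delay` steps earlier becomes a new column for the current row. The sketch below does this for a single series and delays 1 through max_delay. The function name and the "_delay_" column naming are hypothetical. The actual transformer works on full datasets and, per #1691, delays targets as well.

```python
from typing import Dict, List, Optional, Sequence


def delayed_features(values: Sequence[float], name: str, max_delay: int) -> Dict[str, List[Optional[float]]]:
    """Build lagged copies of `values` for delays 1..max_delay; missing history becomes None."""
    if max_delay < 1:
        raise ValueError("max_delay must be at least 1")
    return {
        f"{name}_delay_{delay}": [values[i - delay] if i >= delay else None for i in range(len(values))]
        for delay in range(1, max_delay + 1)
    }


print(delayed_features([10, 11, 12, 13], "sales", max_delay=2))
# {'sales_delay_1': [None, 10, 11, 12], 'sales_delay_2': [None, None, 10, 11]}
```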
- v0.15.0 Oct. 29, 2020
- Enhancements
Added stacked ensemble component classes (StackedEnsembleClassifier, StackedEnsembleRegressor) #1134
Added stacked ensemble components to AutoMLSearch #1253
Added DecisionTreeClassifier and DecisionTreeRegressor to AutoML #1255
Added graph_prediction_vs_actual in model_understanding for regression problems #1252
Added parameter to OneHotEncoder to enable filtering for features to encode #1249
Added percent-better-than-baseline for all objectives to automl.results #1244
Added HighVarianceCVDataCheck and replaced synonymous warning in AutoMLSearch #1254
Added PCA Transformer component for dimensionality reduction #1270
Added generate_pipeline_code and generate_component_code to allow for code generation given a pipeline or component instance #1306
Updated AutoMLSearch to support Woodwork data structures #1299
Added cv_folds to ClassImbalanceDataCheck and added this check to DefaultDataChecks #1333
Make max_batches argument to AutoMLSearch.search public #1320
Added text support to automl search #1062
Added _pipelines_per_batch as a private argument to AutoMLSearch #1355
- Fixes
Fixed ML performance issue with ordered datasets: always shuffle data in automl’s default CV splits #1265
Fixed broken evalml info CLI command #1293
Fixed boosting type='rf' for LightGBM Classifier, as well as num_leaves error #1302
Fixed bug in explain_predictions_best_worst where a custom index in the target variable would cause a ValueError #1318
Added stacked ensemble estimators to evalml.pipelines.__init__ file #1326
Fixed bug in OHE where calls to transform were not deterministic if top_n was less than the number of categories in a column #1324
Fixed LightGBM warning messages during AutoMLSearch #1342
Fix warnings thrown during AutoMLSearch in HighVarianceCVDataCheck #1346
Fixed bug where TrainingValidationSplit would return invalid location indices for dataframes with a custom index #1348
Fixed bug where the AutoMLSearch random_state was not being passed to the created pipelines #1321
- Changes
Allow add_to_rankings to be called before AutoMLSearch is called #1250
Removed Graphviz from test-requirements to add to requirements.txt #1327
Removed max_pipelines parameter from AutoMLSearch #1264
Include editable installs in all install make targets #1335
Made pip dependencies featuretools and nlp_primitives core dependencies #1062
Removed PartOfSpeechCount from TextFeaturizer transform primitives #1062
Added warning for partial_dependency when the feature includes null values #1352
- Documentation Changes
Fixed and updated code blocks in Release Notes #1243
Added DecisionTree estimators to API Reference #1246
Changed class inheritance display to flow vertically #1248
Updated cost-benefit tutorial to use a holdout/test set #1159
Added evalml info command to documentation #1293
Miscellaneous doc updates #1269
Removed conda pre-release testing from the release process document #1282
Updates to contributing guide #1310
Added Alteryx footer to docs with Twitter and Github link #1312
Added documentation for evalml installation for Python 3.6 #1322
Added documentation changes to make the API Docs easier to understand #1323
Fixed documentation for feature_importance #1353
Added tutorial for running AutoML with text data #1357
Added documentation for woodwork integration with automl search #1361
- Testing Changes
Added tests for jupyter_check to handle IPython #1256
Cleaned up make_pipeline tests to test for all estimators #1257
Added a test to check conda build after merge to main #1247
Removed unnecessary code in __main__.py that was lacking codecov coverage #1293
Codecov: round coverage up instead of down #1334
Add DockerHub credentials to CI testing environment #1356
Add DockerHub credentials to conda testing environment #1363
Warning
- Breaking Changes
Renamed LabelLeakageDataCheck to TargetLeakageDataCheck #1319
max_pipelines parameter has been removed from AutoMLSearch. Please use max_iterations instead. #1264
AutoMLSearch.search() will now log a warning if the input is not a Woodwork data structure (pandas, numpy) #1299
Make max_batches argument to AutoMLSearch.search public #1320
Removed unused argument feature_types from AutoMLSearch.search #1062
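The v0.15.0 fix for ordered datasets (#1265) always shuffles data in AutoML's default CV splits, so folds are not just contiguous blocks of a sorted dataset. Below is a dependency-free sketch of a shuffled K-fold split, seeded for reproducibility. shuffled_kfold is a hypothetical helper. In practice, scikit-learn's KFold(n_splits, shuffle=True, random_state=...) provides the same behavior.

```python
import random
from typing import List, Tuple


def shuffled_kfold(n_samples: int, n_splits: int = 3, random_seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs after a seeded shuffle."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    indices = list(range(n_samples))
    random.Random(random_seed).shuffle(indices)  # break any ordering in the data
    folds = [indices[k::n_splits] for k in range(n_splits)]  # sizes differ by at most 1
    splits = []
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


for train, test in shuffled_kfold(10, n_splits=3, random_seed=42):
    print(len(train), len(test))
```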
- v0.14.1 Sep. 29, 2020
- Enhancements
Updated partial dependence methods to support calculating numeric columns in a dataset with non-numeric columns #1150
Added get_feature_names on OneHotEncoder #1193
Added detect_problem_type to problem_type/utils.py to automatically detect the problem type given targets #1194
Added LightGBM to AutoMLSearch #1199
Updated scikit-learn and scikit-optimize to use latest versions - 0.23.2 and 0.8.1 respectively #1141
Added __str__ and __repr__ for pipelines and components #1218
Included internal target check for both training and validation data in AutoMLSearch #1226
Added ProblemTypes.all_problem_types helper to get list of supported problem types #1219
Added DecisionTreeClassifier and DecisionTreeRegressor classes #1223
DataChecks can now be parametrized by passing a list of DataCheck classes and a parameter dictionary #1167
Added first CV fold score as validation score in AutoMLSearch.rankings #1221
Updated flake8 configuration to enable linting on __init__.py files #1234
Refined make_pipeline_from_components implementation #1204
- Changes
Added allow_writing_files as a named argument to CatBoost estimators. #1202
Added solver and multi_class as named arguments to LogisticRegressionClassifier #1202
Replaced pipeline’s ._transform method to evaluate all the preprocessing steps of a pipeline with .compute_estimator_features #1231
Changed default large dataset train/test splitting behavior #1205
- Documentation Changes
Included description of how to access the component instances and features for pipeline user guide #1163
Updated API docs to refer to target as “target” instead of “labels” for non-classification tasks and minor docs cleanup #1160
Added Class Imbalance Data Check to api_reference.rst #1190 #1200
Added pipeline properties to API reference #1209
Clarified what the objective parameter in AutoML is used for in AutoML API reference and AutoML user guide #1222
Updated API docs to include skopt.space.Categorical option for component hyperparameter range definition #1228
Added install documentation for libomp in order to use LightGBM on Mac #1233
Improved description of max_iterations in documentation #1212
Removed unused code from sphinx conf #1235
Testing Changes
Warning
- Breaking Changes
DefaultDataChecks now accepts a problem_type parameter that must be specified #1167
Pipeline’s ._transform method to evaluate all the preprocessing steps of a pipeline has been replaced with .compute_estimator_features #1231
get_objectives has been renamed to get_core_objectives. This function will now return a list of valid objective instances #1230
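v0.14.1 adds detect_problem_type, which infers the problem type from the target (#1194). These notes do not describe its exact rules, so the sketch below is only one plausible heuristic, under these assumptions: two distinct values means binary; non-numeric values, or a few distinct integer values, mean multiclass; anything else is regression. guess_problem_type and its max_classes cut-off are hypothetical.

```python
from typing import Hashable, Sequence


def guess_problem_type(y: Sequence[Hashable], max_classes: int = 10) -> str:
    """Guess "binary", "multiclass" or "regression" from target values (hypothetical heuristic)."""
    unique = set(y)
    if len(unique) < 2:
        raise ValueError("the target needs at least two distinct values")
    if len(unique) == 2:
        return "binary"
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in unique)
    if not numeric:
        return "multiclass"
    if any(isinstance(v, float) and not v.is_integer() for v in unique):
        return "regression"
    return "multiclass" if len(unique) <= max_classes else "regression"


print(guess_problem_type(["spam", "ham", "spam"]))  # binary
print(guess_problem_type([0.5, 1.7, 2.2]))          # regression
```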
- v0.13.2 Sep. 17, 2020
- Enhancements
Added output_format field to explain predictions functions #1107
Modified get_objective and get_objectives to be able to return any objective in evalml.objectives #1132
Added a return_instance boolean parameter to get_objective #1132
Added ClassImbalanceDataCheck to determine whether target imbalance falls below a given threshold #1135
Added label encoder to LightGBM for binary classification #1152
Added labels for the row index of confusion matrix #1154
Added AutoMLSearch object as another parameter in search callbacks #1156
Added the corresponding probability threshold for each point displayed in graph_roc_curve #1161
Added __eq__ for ComponentBase and PipelineBase #1178
Added support for multiclass classification for roc_curve #1164
Added categories accessor to OneHotEncoder for listing the categories associated with a feature #1182
Added utility function to create pipeline instances from a list of component instances #1176
- Fixes
Fixed XGBoost column names for partial dependence methods #1104
Removed dead code validating column type from TextFeaturizer #1122
Fixed issue where Imputer cannot fit when there is None in a categorical or boolean column #1144
OneHotEncoder preserves the custom index in the input data #1146
Fixed representation for ModelFamily #1165
Removed duplicate nbsphinx dependency in dev-requirements.txt #1168
Users can now pass in any valid kwargs to all estimators #1157
Remove broken accessor OneHotEncoder.get_feature_names and unneeded base class #1179
Removed LightGBM Estimator from AutoML models #1186
- Documentation Changes
Fixed API docs for AutoMLSearch add_result_callback #1113
Added a step to our release process for pushing our latest version to conda-forge #1118
Added warning for missing ipywidgets dependency for using PipelineSearchPlots on Jupyterlab #1145
Updated README.md example to load demo dataset #1151
Swapped mapping of breast cancer targets in model_understanding.ipynb #1170
Warning
- Breaking Changes
get_objective will now return a class definition rather than an instance by default #1132
Deleted OPTIONS dictionary in evalml.objectives.utils.py #1132
If specifying an objective by string, the string must now match the objective’s name field, case-insensitive #1132
- Passing “Cost Benefit Matrix”, “Fraud Cost”, “Lead Scoring”, “Mean Squared Log Error”, “Recall”, “Recall Macro”, “Recall Micro”, “Recall Weighted”, or “Root Mean Squared Log Error” to AutoMLSearch will now result in a ValueError rather than an ObjectiveNotFoundError #1132
Search callbacks start_iteration_callback and add_results_callback have changed to include a copy of the AutoMLSearch object as a third parameter #1156
Deleted OneHotEncoder.get_feature_names method which had been broken for a while, in favor of pipelines’ input_feature_names #1179
Deleted empty base class CategoricalEncoder, which the OneHotEncoder component was inheriting from #1176
roc_curve will now return its results as a list of dictionaries, with each dictionary representing a class #1164
max_pipelines now raises a DeprecationWarning and will be removed in the next release. max_iterations should be used instead. #1169
- v0.13.1 Aug. 25, 2020
- Enhancements
Added Cost-Benefit Matrix objective for binary classification #1038
Split fill_value into categorical_fill_value and numeric_fill_value for Imputer #1019
Added explain_predictions and explain_predictions_best_worst for explaining multiple predictions with SHAP #1016
Added new LSA component for text featurization #1022
Added guide on installing with conda #1041
Added a “cost-benefit curve” util method to graph cost-benefit matrix scores vs. binary classification thresholds #1081
Standardized error when calling transform/predict before fit for pipelines #1048
Added percent_better_than_baseline to AutoML search rankings and full rankings table #1050
Added one-way partial dependence and partial dependence plots #1079
Added “Feature Value” column to prediction explanation reports. #1064
Added max_batches parameter to AutoMLSearch #1087
- Fixes
Updated TextFeaturizer component to no longer require an internet connection to run #1022
Fixed non-deterministic element of TextFeaturizer transformations #1022
Added a StandardScaler to all ElasticNet pipelines #1065
Updated cost-benefit matrix to normalize score #1099
Fixed logic in calculate_percent_difference so that it can handle negative values #1100
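The fix to calculate_percent_difference (#1100) concerns negative scores. One common way to make a relative difference behave sensibly when the baseline is negative is to divide by the baseline's absolute value, as in this Python sketch. The name percent_difference, the use of abs(baseline) as the denominator, the NaN result for a zero baseline, and the greater-is-better convention are all assumptions for illustration, not a description of EvalML's code.

```python
import math


def percent_difference(score: float, baseline: float) -> float:
    """Hypothetical sketch: percent change of `score` relative to `baseline`.

    Dividing by abs(baseline) keeps the sign of the result tied to the
    direction of the change even when the baseline is negative.
    Assumes greater is better.
    """
    if baseline == 0:
        # The relative change is undefined for a zero baseline.
        return math.nan
    return (score - baseline) / abs(baseline) * 100


if __name__ == "__main__":
    # A score of -50 against a baseline of -100 is an improvement.
    print(percent_difference(-50, -100))  # prints 50.0
```

Dividing by the signed baseline instead would report -50.0 for the same pair, which reads as a regression.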
- Changes
Added needs_fitting property to ComponentBase #1044
Updated references to data types to use datatype lists defined in evalml.utils.gen_utils #1039
Remove maximum version limit for SciPy dependency #1051
Moved all_components and other component importers into runtime methods #1045
Consolidated graphing utility methods under evalml.utils.graph_utils #1060
Made slight tweaks to how TextFeaturizer uses featuretools, and did some refactoring of that and of LSA #1090
Changed show_all_features parameter into importance_threshold, which allows for thresholding feature importance #1097, #1103
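The importance_threshold parameter (#1097, #1103) replaces an all-or-nothing flag with a numeric cutoff. A minimal Python sketch of that kind of filtering follows. It assumes the cutoff applies to the absolute value of each importance, so large negative importances are kept. The function filter_importance is hypothetical and not part of EvalML.

```python
from typing import Dict


def filter_importance(importances: Dict[str, float],
                      importance_threshold: float = 0.0) -> Dict[str, float]:
    """Hypothetical sketch: keep features whose |importance| >= threshold,
    ordered from most to least important by absolute value."""
    if importance_threshold < 0:
        raise ValueError("importance_threshold must be non-negative")
    kept = {name: value for name, value in importances.items()
            if abs(value) >= importance_threshold}
    return dict(sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True))


if __name__ == "__main__":
    scores = {"age": 0.02, "income": 0.41, "region": -0.15}
    print(filter_importance(scores, importance_threshold=0.1))
    # prints {'income': 0.41, 'region': -0.15}
```

With the default threshold of 0.0, every feature is kept, which matches the old "show all" behavior in spirit.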
Warning
- v0.12.2 Aug. 6, 2020
- v0.12.0 Aug. 3, 2020
- Enhancements
Added string and categorical targets support for binary and multiclass pipelines and check for numeric targets for DetectLabelLeakage data check #932
Added clear exception for regression pipelines if target datatype is string or categorical #960
Added target column names and class labels in predict and predict_proba output for pipelines #951
Added _compute_shap_values and normalize_values to pipelines/explanations module #958
Added explain_prediction feature which explains single predictions with SHAP #974
Added Imputer to allow different imputation strategies for numerical and categorical dtypes #991
Added support for configuring logfile path using env var, and don’t create logger if there are filesystem errors #975
Updated catboost estimators’ default parameters and automl hyperparameter ranges to speed up fit time #998
- Fixes
Fixed ReadtheDocs warning failure regarding embedded gif #943
Removed incorrect parameter passed to pipeline classes in _add_baseline_pipelines #941
Added universal error for calling predict, predict_proba, transform, and feature_importances before fitting #969, #994
Made TextFeaturizer component and pip dependencies featuretools and nlp_primitives optional #976
Updated imputation strategy in automl to no longer limit impute strategy to most_frequent for all features if there are any categorical columns #991
Fixed UnboundLocalError for cv_pipeline when automl search errors #996
Fixed Imputer to reset dataframe index to preserve behavior expected from SimpleImputer #1009
- Changes
Moved get_estimators to evalml.pipelines.components.utils #934
Modified Pipelines to raise PipelineScoreError when they encounter an error during scoring #936
Moved evalml.model_families.list_model_families to evalml.pipelines.components.allowed_model_families #959
Renamed DateTimeFeaturization to DateTimeFeaturizer #977
Added check to stop search and raise an error if all pipelines in a batch return NaN scores #1015
- Documentation Changes
Updated README.md #963
Reworded message when errors are returned from data checks in search #982
Added section on understanding model predictions with explain_prediction to User Guide #981
Added a section to the user guide and api reference about how XGBoost and CatBoost are not fully supported. #992
Added custom components section in user guide #993
Updated FAQ section formatting #997
Updated release process documentation #1003
Warning
- Breaking Changes
get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
Removed the raise_errors flag in AutoML search. All errors during pipeline evaluation will be caught and logged. #936
evalml.model_families.list_model_families has been moved to evalml.pipelines.components.allowed_model_families #959
TextFeaturizer: the featuretools and nlp_primitives packages must be installed after installing evalml in order to use this component #976
Renamed DateTimeFeaturization to DateTimeFeaturizer #977
- v0.11.2 July 16, 2020
- Enhancements
Added NoVarianceDataCheck to DefaultDataChecks #893
Added text processing and featurization component TextFeaturizer #913, #924
Added additional checks to InvalidTargetDataCheck to handle invalid target data types #929
AutoMLSearch will now handle KeyboardInterrupt and prompt user for confirmation #915
- Fixes
Makes automl results a read-only property #919
- Changes
Deleted static pipelines and refactored tests involving static pipelines, removed all_pipelines() and get_pipelines() #904
Moved list_model_families to evalml.model_family.utils #903
Updated all_pipelines, all_estimators, all_components to use the same mechanism for dynamically generating their elements #898
Rename master branch to main #918
Add pypi release github action #923
Updated AutoMLSearch.search stdout output and logging and removed tqdm progress bar #921
Moved automl config checks previously in search() to init #933
- Testing Changes
Cleaned up fixture names and usages in tests #895
Warning
- Breaking Changes
list_model_families has been moved to evalml.model_family.utils (previously was under evalml.pipelines.utils) #903
get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
Static pipeline definitions have been removed, but similar pipelines can still be constructed via creating an instance of PipelineBase #904
all_pipelines() and get_pipelines() utility methods have been removed #904
- v0.11.0 June 30, 2020
- Enhancements
Added multiclass support for ROC curve graphing #832
Added preprocessing component to drop features whose percentage of NaN values exceeds a specified threshold #834
Added data check to check for problematic target labels #814
Added PerColumnImputer that allows imputation strategies per column #824
Added transformer to drop specific columns #827
Added support for categories, handle_error, and drop parameters in OneHotEncoder #830 #897
Added preprocessing component to handle DateTime columns featurization #838
Added ability to clone pipelines and components #842
Define getter method for component parameters #847
Added utility methods to calculate and graph permutation importances #860, #880
Added new utility functions necessary for generating dynamic preprocessing pipelines #852
Added kwargs to all components #863
Updated AutoSearchBase to use dynamically generated preprocessing pipelines #870
Added SelectColumns transformer #873
Added ability to evaluate additional pipelines for automl search #874
Added default_parameters class property to components and pipelines #879
Added better support for disabling data checks in automl search #892
Added ability to save and load AutoML objects to file #888
Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance #876
Saved learned binary classification thresholds in automl results cv data dict #876
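Among the v0.11.0 enhancements is a preprocessing component that drops features whose percentage of NaN values exceeds a specified threshold (#834). The column-selection logic can be sketched in plain Python as below. Columns are modeled as a dict of lists, None and float NaN both count as missing, and the names columns_to_drop and pct_null_threshold are illustrative assumptions rather than EvalML's component API.

```python
import math
from typing import Dict, List, Optional


def columns_to_drop(data: Dict[str, List[Optional[float]]],
                    pct_null_threshold: float = 0.95) -> List[str]:
    """Hypothetical sketch: names of columns whose fraction of missing
    values (None or NaN) is strictly greater than pct_null_threshold."""
    def is_missing(value: Optional[float]) -> bool:
        return value is None or (isinstance(value, float) and math.isnan(value))

    dropped = []
    for name, values in data.items():
        if not values:
            continue  # An empty column has no defined null fraction.
        fraction_missing = sum(is_missing(v) for v in values) / len(values)
        if fraction_missing > pct_null_threshold:
            dropped.append(name)
    return dropped


if __name__ == "__main__":
    data = {"a": [1.0, None, None, None], "b": [1.0, 2.0, None, 4.0]}
    print(columns_to_drop(data, pct_null_threshold=0.5))  # prints ['a']
```

Because the comparison is strict in this sketch, a threshold of 1.0 never drops a column.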
- Fixes
Fixed bug where SimpleImputer cannot handle dropped columns #846
Fixed bug where PerColumnImputer cannot handle dropped columns #855
Enforce requirement that builtin components save all inputted values in their parameters dict #847
Don’t list base classes in all_components output #847
Standardize all components to output pandas data structures, and accept either pandas or numpy #853
Fixed rankings and full_rankings error when search has not been run #894
- Changes
Update all_pipelines and all_components to try initializing pipelines/components, and on failure exclude them #849
Refactor handle_components to handle_components_class, standardize to ComponentBase subclass instead of instance #850
Refactor “blacklist”/“whitelist” to “allow”/“exclude” lists #854
Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871
Renamed feature_importances and permutation_importances methods to use singular names (feature_importance and permutation_importance) #883
Updated automl default data splitter to train/validation split for large datasets #877
Added open source license, update some repo metadata #887
Removed dead code in _get_preprocessing_components #896
- Documentation Changes
Fix some typos and update the EvalML logo #872
Warning
- Breaking Changes
Pipelines’ static component_graph field must contain either ComponentBase subclasses or str, instead of ComponentBase subclass instances #850
Rename handle_component to handle_component_class. Now standardizes to ComponentBase subclasses instead of ComponentBase subclass instances #850
Renamed automl’s cv argument to data_split #877
Pipelines’ and classifiers’ feature_importances is renamed feature_importance, graph_feature_importances is renamed graph_feature_importance #883
Passing data_checks=None to automl search will not perform any data checks as opposed to default checks. #892
Pipelines to search for in AutoML are now determined automatically, rather than using the statically-defined pipeline classes. #870
Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance, instead of one which happened to be trained on the final cross-validation fold #876
- v0.10.0 May 29, 2020
- Enhancements
Added baseline models for classification and regression, add functionality to calculate baseline models before searching in AutoML #746
Port over highly-null guardrail as a data check and define DefaultDataChecks and DisableDataChecks classes #745
Update Tuner classes to work directly with pipeline parameters dicts instead of flat parameter lists #779
Add Elastic Net as a pipeline option #812
Added new Pipeline option ExtraTrees #790
Added precision-recall curve metrics and plot for binary classification problems in evalml.pipeline.graph_utils #794
Update the default automl algorithm to search in batches, starting with default parameters for each pipeline and iterating from there #793
Added AutoMLAlgorithm class and IterativeAlgorithm impl, separated from AutoSearchBase #793
- Fixes
Update pipeline score to return nan score for any objective which throws an exception during scoring #787
Fixed bug introduced in #787 where binary classification metrics requiring predicted probabilities error in scoring #798
CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0 #795
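Starting with #787, a pipeline's score returns NaN for any objective that throws an exception during scoring instead of failing outright. The pattern can be sketched in plain Python: each objective is scored independently, and an exception is recorded as NaN. The function score_all and the representation of objectives as plain callables are hypothetical simplifications, not EvalML's PipelineBase.score.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical objective signature: (y_true, y_pred) -> score.
Objective = Callable[[Sequence[float], Sequence[float]], float]


def score_all(y_true: Sequence[float], y_pred: Sequence[float],
              objectives: Dict[str, Objective]) -> Dict[str, float]:
    """Score each objective; an objective that raises is reported as NaN."""
    scores: Dict[str, float] = {}
    for name, objective in objectives.items():
        try:
            scores[name] = objective(y_true, y_pred)
        except Exception:
            # One failing objective should not discard the other scores.
            scores[name] = math.nan
    return scores


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    results = score_all([1.0, 2.0], [1.0, 3.0],
                        {"MAE": mean_absolute_error, "Broken": lambda t, p: 1 / 0})
    print(results)  # prints {'MAE': 0.5, 'Broken': nan}
```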
- Changes
Cleanup pipeline score code, and cleanup codecov #711
Remove pass for abstract methods for codecov #730
Added __str__ for AutoSearch object #675
Add util methods to graph ROC and confusion matrix #720
Refactor AutoBase to AutoSearchBase #758
Updated AutoBase with data_checks parameter, removed previous detect_label_leakage parameter, and added functionality to run data checks before search in AutoML #765
Updated our logger to use Python’s logging utils #763
Refactor most of AutoSearchBase._do_iteration impl into AutoSearchBase._evaluate #762
Port over all guardrails to use the new DataCheck API #789
Expanded import_or_raise to catch all exceptions #759
Added RMSE, MSLE, RMSLE as standard metrics #788
Don’t allow Recall to be used as an objective for AutoML #784
Removed feature selection from pipelines #819
Update default estimator parameters to make automl search faster and more accurate #793
- Testing Changes
Delete codecov yml, use codecov.io’s default #732
Added unit tests for fraud cost, lead scoring, and standard metric objectives #741
Update codecov client #782
Updated AutoBase __str__ test to include no parameters case #783
Added unit tests for ExtraTrees pipeline #790
If codecov fails to upload, fail build #810
Updated Python version of dependency action #816
Update the dependency update bot to use a suffix when creating branches #817
Warning
- Breaking Changes
The detect_label_leakage parameter for AutoML classes has been removed and replaced by a data_checks parameter #765
Moved ROC and confusion matrix methods from evalml.pipeline.plot_utils to evalml.pipeline.graph_utils #720
Tuner classes require a pipeline hyperparameter range dict as an init arg instead of a space definition #779
Tuner.propose and Tuner.add work directly with pipeline parameters dicts instead of flat parameter lists #779
PipelineBase.hyperparameters and custom_hyperparameters use pipeline parameters dict format instead of being represented as a flat list #779
All guardrail functions previously under evalml.guardrails.utils will be removed and replaced by data checks #789
Recall disallowed as an objective for AutoML #784
AutoSearchBase parameter tuner has been renamed to tuner_class #793
AutoSearchBase parameters possible_pipelines and possible_model_families have been renamed to allowed_pipelines and allowed_model_families #793
- v0.9.0 Apr. 27, 2020
- Enhancements
Added Accuracy as a standard objective #624
Added verbose parameter to load_fraud #560
Added Balanced Accuracy metric for binary, multiclass #612 #661
Added XGBoost regressor and XGBoost regression pipeline #666
Added Accuracy metric for multiclass #672
Added objective name in AutoBase.describe_pipeline #686
Added DataCheck and DataChecks, Message classes and relevant subclasses #739
- Fixes
Removed direct access to cls.component_graph #595
Add testing files to .gitignore #625
Remove circular dependencies from Makefile #637
Add error case for normalize_confusion_matrix() #640
Fixed XGBoostClassifier and XGBoostRegressor bug with feature names that contain [, ], or < #659
Update make_pipeline_graph to not accidentally create empty file when testing if path is valid #649
Fix pip installation warning about docutils version, from boto dependency #664
Removed zero division warning for F1/precision/recall metrics #671
Fixed summary for pipelines without estimators #707
- Changes
Updated default objective for binary/multiclass classification to log loss #613
Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes #405
Changed the output of score to return one dictionary #429
Created binary and multiclass objective subclasses #504
Updated objectives API #445
Removed call to get_plot_data from AutoML #615
Set raise_error to default to True for AutoML classes #638
Remove unnecessary “u” prefixes on some unicode strings #641
Changed one-hot encoder to return uint8 dtypes instead of ints #653
Pipeline _name field changed to custom_name #650
Removed graphs.py and moved methods into PipelineBase #657, #665
Remove s3fs as a dev dependency #664
Changed requirements-parser to be a core dependency #673
Replace supported_problem_types field on pipelines with problem_type attribute on base classes #678
Changed AutoML to only show best results for a given pipeline template in rankings, added full_rankings property to show all #682
Update ModelFamily values: don’t list xgboost/catboost as classifiers now that we have regression pipelines for them #677
Changed AutoML’s describe_pipeline to get problem type from pipeline instead #685
Standardize import_or_raise error messages #683
Updated argument order of objectives to align with sklearn’s #698
Renamed pipeline.feature_importance_graph to pipeline.graph_feature_importances #700
Moved ROC and confusion matrix methods to evalml.pipelines.plot_utils #704
Renamed MultiClassificationObjective to MulticlassClassificationObjective, to align with pipeline naming scheme #715
- Documentation Changes
Fixed some sphinx warnings #593
Fixed docstring for AutoClassificationSearch with correct command #599
Limit readthedocs formats to pdf, not htmlzip and epub #594 #600
Clean up objectives API documentation #605
Fixed function on Exploring search results page #604
Update release process doc #567
AutoClassificationSearch and AutoRegressionSearch show inherited methods in API reference #651
Fixed improperly formatted code in breaking changes for changelog #655
Added configuration to treat Sphinx warnings as errors #660
Removed separate plotting section for pipelines in API reference #657, #665
Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency #664
Categorized components in API reference and added descriptions for each category #663
Fixed Sphinx warnings about BalancedAccuracy objective #669
Updated API reference to include missing components and clean up pipeline docstrings #689
Reorganize API ref, and clarify pipeline sub-titles #688
Add and update preprocessing utils in API reference #687
Added inheritance diagrams to API reference #695
Documented which default objective AutoML optimizes for #699
Create separate install page #701
Include more utils in API ref, like import_or_raise #704
Add more color to pipeline documentation #705
- Testing Changes
Matched install commands of check_latest_dependencies test and its GitHub action #578
Added Github app to auto assign PR author as assignee #477
Removed unneeded conda installation of xgboost in windows checkin tests #618
Update graph tests to always use tmpfile dir #649
Changelog checkin test workaround for release PRs: If ‘future release’ section is empty of PR refs, pass check #658
Add changelog checkin test exception for dep-update branch #723
Warning
Breaking Changes
Pipelines will no longer take an objective parameter during instantiation, and will no longer have an objective attribute.
fit() and predict() now use an optional objective parameter, which is only used in binary classification pipelines to fit for a specific objective.
score() will now use a required objectives parameter that is used to determine all the objectives to score on. This differs from the previous behavior, where the pipeline’s objective was scored on regardless.
score() will now return one dictionary of all objective scores.
ROC and ConfusionMatrix plot methods via Auto(*).plot have been removed by #615 and are replaced by roc_curve and confusion_matrix in evalml.pipelines.plot_utils in #704
normalize_confusion_matrix has been moved to evalml.pipelines.plot_utils #704
Pipelines _name field changed to custom_name
Pipelines supported_problem_types field is removed because it is no longer necessary #678
Updated argument order of objectives’ objective_function to align with sklearn #698
pipeline.feature_importance_graph has been renamed to pipeline.graph_feature_importances in #700
Removed unsupported MSLE objective #704
- v0.8.0 Apr. 1, 2020
- Enhancements
Add normalization option and information to confusion matrix #484
Add util function to drop rows with NaN values #487
Renamed PipelineBase.name as PipelineBase.summary and redefined PipelineBase.name as class property #491
Added access to parameters in Pipelines with PipelineBase.parameters (used to be return of PipelineBase.describe) #501
Added fill_value parameter for SimpleImputer #509
Added functionality to override component hyperparameters and made pipelines take hyperparameters from components #516
Allow numpy.random.RandomState for random_state parameters #556
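The normalization option added to the confusion matrix in #484 rescales raw counts into proportions. One such option, normalizing over true-label rows, is sketched below in plain Python. The helper name normalize_rows is hypothetical, and raising ValueError on an all-zero row is a choice made for this sketch, not a statement about EvalML's behavior.

```python
from typing import List


def normalize_rows(matrix: List[List[int]]) -> List[List[float]]:
    """Hypothetical sketch: divide each confusion-matrix row by its total.

    Rows are true labels and columns are predicted labels, so each
    normalized row gives the fraction of that class assigned to each
    predicted class.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            raise ValueError("cannot normalize a row with no samples")
        normalized.append([count / total for count in row])
    return normalized


if __name__ == "__main__":
    print(normalize_rows([[3, 1], [0, 4]]))  # prints [[0.75, 0.25], [0.0, 1.0]]
```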
- Fixes
Removed unused dependency matplotlib, and move category_encoders to test reqs #572
- Changes
Undo version cap in XGBoost placed in #402 and allowed all releases of XGBoost #407
Support pandas 1.0.0 #486
Made all references to the logger static #503
Refactored model_type parameter for components and pipelines to model_family #507
Refactored problem_types for pipelines and components into supported_problem_types #515
Moved pipelines/utils.save_pipeline and pipelines/utils.load_pipeline to PipelineBase.save and PipelineBase.load #526
Limit number of categories encoded by OneHotEncoder #517
Warning
Breaking Changes
AutoClassificationSearch and AutoRegressionSearch’s model_types parameter has been refactored into allowed_model_families
ModelTypes enum has been changed to ModelFamily
Components and Pipelines now have a model_family field instead of model_type
get_pipelines utility function now accepts model_families as an argument instead of model_types
PipelineBase.name no longer returns structure of pipeline and has been replaced by PipelineBase.summary
PipelineBase.problem_types and Estimator.problem_types have been renamed to supported_problem_types
pipelines/utils.save_pipeline and pipelines/utils.load_pipeline moved to PipelineBase.save and PipelineBase.load
- v0.7.0 Mar. 9, 2020
- Enhancements
Added emacs buffers to .gitignore #350
Add CatBoost (gradient-boosted trees) classification and regression components and pipelines #247
Added Tuner abstract base class #351
Added n_jobs as parameter for AutoClassificationSearch and AutoRegressionSearch #403
Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn’s #426
Added PipelineBase .graph and .feature_importance_graph methods, moved from previous location #423
Added support for python 3.8 #462
- Changes
Added n_estimators as a tunable parameter for XGBoost #307
Remove unused parameter ObjectiveBase.fit_needs_proba #320
Remove extraneous parameter component_type from all components #361
Remove unused rankings.csv file #397
Downloaded demo and test datasets so unit tests can run offline #408
Remove _needs_fitting attribute from Components #398
Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all #413
Refactored PipelineBase to take in parameter dictionary and moved pipeline metadata to class attribute #421
Dropped support for Python 3.5 #438
Removed unused apply.py file #449
Clean up requirements.txt to remove unused deps #451
Support installation without all required dependencies #459
- Documentation Changes
Update release.md with instructions to release to internal license key #354
- Testing Changes
Added tests for utils (and moved current utils to gen_utils) #297
Moved XGBoost install into its own separate step on Windows using Conda #313
Rewind pandas version to before 1.0.0, to diagnose test failures for that version #325
Added dependency update checkin test #324
Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version #402
Update dependency check to use a whitelist #417
Update unit test jobs to not install dev deps #455
Warning
Breaking Changes
Python 3.5 will not be actively supported.
- v0.6.0 Dec. 16, 2019
- Enhancements
Added ability to create a plot of feature importances #133
Add early stopping to AutoML using patience and tolerance parameters #241
Added ROC and confusion matrix metrics and plot for classification problems and introduce PipelineSearchPlots class #242
Enhanced AutoML results with search order #260
Added utility function to show system and environment information #300
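Early stopping with patience and tolerance (#241) ends a search once scores stop improving. The Python sketch below assumes that patience counts consecutive iterations without sufficient improvement and that tolerance is a relative improvement, expressed as a fraction of the best score so far. The function should_stop and these exact semantics are illustrative assumptions, not the AutoML code shipped in v0.6.0.

```python
from typing import Optional, Sequence


def should_stop(scores: Sequence[float], patience: Optional[int],
                tolerance: float = 0.0, greater_is_better: bool = True) -> bool:
    """Hypothetical sketch: True once `patience` consecutive scores fail to
    improve on the best score by more than `tolerance * abs(best)`."""
    if patience is None or not scores:
        return False  # Early stopping disabled, or nothing scored yet.
    best = scores[0]
    without_improvement = 0
    for score in scores[1:]:
        gain = score - best if greater_is_better else best - score
        if gain > tolerance * abs(best):
            best = score
            without_improvement = 0
        else:
            without_improvement += 1
            if without_improvement >= patience:
                return True
    return False


if __name__ == "__main__":
    print(should_stop([0.50, 0.60, 0.60, 0.60], patience=2))  # prints True
```

Scaling tolerance by the best score keeps the improvement criterion comparable across objectives with different magnitudes, at the cost of making it zero whenever the best score is exactly zero.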
- Changes
Renamed automl classes to AutoRegressionSearch and AutoClassificationSearch #287
Updating demo datasets to retain column names #223
Moving pipeline visualization to PipelinePlot class #228
Standardizing inputs as pd.DataFrame / pd.Series #130
Enforcing that pipelines must have an estimator as last component #277
Added ipywidgets as a dependency in requirements.txt #278
Added Random and Grid Search Tuners #240
Warning
Breaking Changes
The fit() method for AutoClassifier and AutoRegressor has been renamed to search().
AutoClassifier has been renamed to AutoClassificationSearch
AutoRegressor has been renamed to AutoRegressionSearch
AutoClassificationSearch.results and AutoRegressionSearch.results are now dictionaries with pipeline_results and search_order keys. pipeline_results can be used to access a dictionary that is identical to the old .results dictionary, whereas search_order returns a list of the search order in terms of pipeline_id.
Pipelines now require an estimator as the last component in component_list. Slicing pipelines now throws a NotImplementedError to avoid returning pipelines without an estimator.
- v0.5.2 Nov. 18, 2019
- v0.5.1 Nov. 15, 2019
- v0.5.0 Oct. 29, 2019
- Enhancements
Added basic one hot encoding #73
Use enums for model_type #110
Support for splitting regression datasets #112
Auto-infer multiclass classification #99
Added support for other units in max_time #125
Detect highly null columns #121
Added additional regression objectives #100
Show an interactive iteration vs. score plot when using fit() #134
- v0.4.1 Sep. 16, 2019
- Enhancements
Added AutoML for classification and regression using Autobase and Skopt #7 #9
Implemented standard classification and regression metrics #7
Added logistic regression, random forest, and XGBoost pipelines #7
Implemented support for custom objectives #15
Feature importance for pipelines #18
Serialization for pipelines #19
Allow fitting on objectives for optimal threshold #27
Added detect label leakage #31
Implemented callbacks #42
Allow for multiclass classification #21
Added support for additional objectives #79
- Testing Changes
Added testing for loading data #39
- v0.2.0 Aug. 13, 2019
- Enhancements
Created fraud detection objective #4
- v0.1.0 July 31, 2019