Release Notes
- Future Releases
Enhancements
- Fixes
Switched Windows nightly tests to run serially instead of in parallel #4452
Changes
Documentation Changes
Testing Changes
Warning
Breaking Changes
- v0.84.0 Jun 6, 2024
- v0.83.0 Feb 2, 2024
Warning
Breaking Changes
- v0.82.0 Nov 3, 2023
Warning
Breaking Changes
- v0.81.1 Oct 16, 2023
Warning
Breaking Changes
- v0.81.0 Oct 5, 2023
- Enhancements
Extended STLDecomposer to support multiseries #4253
Extended TimeSeriesImputer to support multiseries #4291
Added datacheck to check for mismatched series length in multiseries #4296
Added STLDecomposer to multiseries pipelines #4299
Extended DateTimeFormatCheck data check to support multiseries #4300
Extended TimeSeriesRegularizer to support multiseries #4303
- Documentation Changes
Removed LightGBM’s excessive warnings #4308
- Testing Changes
Removed old performance testing workflow #4318
Warning
Breaking Changes
- v0.80.0 Aug. 30, 2023
Warning
Breaking Changes
- v0.79.0 Aug. 11, 2023
- Enhancements
Updated regression metrics to handle multioutput dataframes as well as single output series #4233
Added baseline regressor for multiseries time series problems #4246
Added stacking and unstacking utility functions to work with multiseries data #4250
Added multiseries regression pipeline class #4256
Added multiseries VARMAX regressor #4238
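The stacking and unstacking utilities from #4250 convert multiseries data between a wide layout (one column per series) and a long layout (one row per series per time step). The plain-Python sketch below illustrates that reshaping idea only. stack_series, unstack_series and the "series_id"/"value" keys are hypothetical names, not EvalML's API, which works on pandas objects.

```python
from collections import defaultdict


def stack_series(wide_rows, time_col, series_cols):
    """Hypothetical helper: turn wide rows (one column per series) into long rows.

    Output rows look like {time_col: t, "series_id": name, "value": v} and follow
    the input time order, with series in the order given by series_cols.
    """
    long_rows = []
    for row in wide_rows:
        for series_id in series_cols:
            long_rows.append(
                {time_col: row[time_col], "series_id": series_id, "value": row[series_id]}
            )
    return long_rows


def unstack_series(long_rows, time_col):
    """Hypothetical inverse of stack_series: one wide row per time stamp."""
    by_time = defaultdict(dict)
    for row in long_rows:
        by_time[row[time_col]][row["series_id"]] = row["value"]
    # dicts preserve insertion order, so time stamps come back in first-seen order
    return [{time_col: t, **values} for t, values in by_time.items()]


wide = [
    {"date": "2023-01-01", "store_a": 3, "store_b": 5},
    {"date": "2023-01-02", "store_a": 4, "store_b": 6},
]
long = stack_series(wide, "date", ["store_a", "store_b"])
print(len(long))  # 2 dates x 2 series = 4 rows
```

Round-tripping through both helpers returns the original wide rows, which is the property any stack/unstack pair should preserve.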
Documentation Changes
Testing Changes
Warning
Breaking Changes
- v0.78.0 Jul. 10, 2023
Warning
- v0.77.0 Jun. 07, 2023
- Enhancements
Added check_distribution function for determining if the predicted distribution matches the true one #4184
Added get_recommendation_score_breakdown function for insight into the recommendation score #4188
Added excluded_model_families parameter to AutoMLSearch() #4196
Added option to exclude time index in IDColumnsDataCheck #4194
Changes
Documentation Changes
- Testing Changes
Run looking glass performance tests on merge via Airflow #4198
- v0.76.0 May. 09, 2023
- v0.75.0 May. 01, 2023
- Fixes
Fixed bug where resetting the holdout data indices would cause time series predict_in_sample to be wrong #4161
- v0.74.0 Apr. 18, 2023
- Changes
Capped size of seasonal period used for determining whether to include STLDecomposer in pipelines #4147
- v0.73.0 Apr. 10, 2023
- Changes
Removed unnecessary logic from imputer components prior to nullable type handling #4038, #4043
Added calls to _handle_nullable_types in component fit, transform, and predict methods when needed #4046, #4043
Removed existing nullable type handling across AutoMLSearch to just use new handling #4085, #4043
Handled nullable type incompatibility in Decomposer #4105, #4043
Removed nullable type incompatibility handling for ARIMA and ExponentialSmoothingRegressor #4129
Changed the default value for null_strategy in InvalidTargetDataCheck to drop #4131
Pinned sktime version to 0.17.0 for nullable types support #4137
- Testing Changes
Fixed installation of prophet for linux nightly tests #4114
- v0.72.0 Mar. 27, 2023
- v0.71.0 Mar. 17, 2023
- Fixes
Fixed error in PipelineBase._supports_fast_permutation_importance with stacked ensemble pipelines #4083
- v0.70.0 Mar. 16, 2023
- v0.69.0 Mar. 15, 2023
- Enhancements
Move black to regular dependency and use it for generate_pipeline_code #4005
Implement generate_pipeline_example #4023
Add new downcast utils for component-specific nullable type handling and begin implementation on objective and component base classes #4024
Add nullable type incompatibility properties to the components that need them #4031
Add get_evalml_requirements_file #4034
Pipelines with DFS Transformers will run fast permutation importance if DFS features pre-exist #4037
Add get_prediction_intervals() at the pipeline level #4052
- Changes
Uncapped pmdarima and updated minimum version #4027
Increase min catboost to 1.1.1 and xgboost to 1.7.0 to add nullable type support for those estimators #3996
Unpinned networkx and updated minimum version #4035
Increased scikit-learn version to 1.2.2 #4064
Capped max holidays version to 0.21 #4064
Stop allowing knn as a boolean impute strategy #4058
Capped nbsphinx at < 0.9.0 #4071
- v0.68.0 Feb. 15, 2023
- v0.67.0 Jan. 31, 2023
- v0.66.1 Jan. 26, 2023
- v0.66.0 Jan. 24, 2023
- Enhancements
Improved decomposer determine_periodicity functionality for better period guesses #3912
Added dates_needed_for_prediction for time series pipelines #3906
Added RFClassifierRFESelector and RFRegressorRFESelector components for feature selection using recursive feature elimination #3934
Added dates_needed_for_prediction_range for time series pipelines #3941
- Fixes
- v0.65.0 Jan. 3, 2023
- Changes
Added a threshold to DateTimeFormatDataCheck to account for too many duplicate or NaN values #3883
Changed treatment of Boolean columns for SimpleImputer and ClassImbalanceDataCheck to be compatible with new Woodwork inference #3892
Split decomposer seasonal_period parameter into seasonal_smoother and period parameters #3896
Excluded catboost from the broken link checking workflow due to 403 errors #3899
Pinned scikit-learn version below 1.2.0 #3901
Cast newly created one hot encoded columns as bool dtype #3913
- Documentation Changes
Hid non-essential warning messages in time series docs #3890
Testing Changes
- v0.64.0 Dec. 8, 2022
Enhancements
- Changes
Update leaderboard names to show ranking_score instead of validation_score #3878
Remove Int64Index after Pandas 1.5 Upgrade #3825
Reduced the threshold for setting use_covariates to False for ARIMA models in AutoMLSearch #3868
Pinned woodwork version at <=0.19.0 #3871
Updated minimum Pandas version to 1.5.0 #3808
Remove dsherry from automated dependency update reviews and added tamargrey #3870
Documentation Changes
Testing Changes
- v0.63.0 Nov. 23, 2022
- Fixes
Fixed TimeSeriesFeaturizer potentially selecting lags outside of feature engineering window #3773
Fixed bug where TimeSeriesFeaturizer could not encode Ordinal columns with non-numeric categories #3812
Updated demo dataset links to point to new endpoint #3826
Updated STLDecomposer to infer the time index frequency if it’s not present #3829
Updated _drop_time_index to move the time index from X to both X.index and y.index #3829
Fixed bug where engineered features lost their origin attribute in partial dependence, causing it to fail #3830
Fixed bug where partial dependence’s fast mode handling for the DFS Transformer wouldn’t work with multi output features #3830
Allowed target to be present and ignored in partial dependence’s DFS Transformer fast mode handling #3830
- Changes
Consolidated decomposition frequency validation logic to Decomposer class #3811
Removed Featuretools version upper bound and prevented Woodwork 0.20.0 from being installed #3813
Updated min Featuretools version to 0.16.0, min nlp-primitives version to 2.9.0 and min Dask version to 2022.2.0 #3823
Rename issue templates config.yaml to config.yml #3844
Reverted change adding a should_skip_featurization flag to time series pipelines #3862
- v0.62.0 Nov. 01, 2022
- v0.61.1 Oct. 27, 2022
- v0.61.0 Oct. 25, 2022
- v0.60.0 Oct. 19, 2022
Warning
- Breaking Changes
TargetLeakageDataCheck now uses argument mutual_info rather than mutual #3728
- v0.59.0 Sept. 27, 2022
- v0.58.0 Sept. 20, 2022
- Enhancements
Defined get_trend_df() for PolynomialDecomposer to allow decomposition of target data into trend, seasonality and residual. #3720
Updated to run with Woodwork >= 0.18.0 #3700
Pass time index column to time series native estimators but drop otherwise #3691
Added errors attribute to AutoMLSearch for useful debugging #3702
- Changes
Bumped up minimum version of sktime to 0.12.0. #3720
Added abstract Decomposer class as a parent to PolynomialDecomposer to support additional decomposers. #3720
Pinned pmdarima < 2.0.0 #3679
Added support for using downcast_nullable_types with Series as well as DataFrames #3697
Added distinction between ranking and optimization objectives #3721
Documentation Changes
- Testing Changes
- v0.57.0 Sept. 6, 2022
- Enhancements
Added KNNImputer class and created new knn parameter for Imputer #3662
- Fixes
IDColumnsDataCheck now only returns an action code to set the first column as the primary key if it contains unique values #3639
IDColumnsDataCheck now can handle primary key columns containing “integer” values that are of the double type #3683
Added support for BooleanNullable columns in EvalML pipelines and imputer #3678
Updated StandardScaler to only apply to numeric columns #3686
- v0.56.1 Aug. 19, 2022
- v0.56.0 Aug. 15, 2022
- Enhancements
Add CI testing environment in Mac for install workflow #3646
Updated make_pipeline to only include the Imputer in pipelines if NaNs exist in the data #3657
Updated to run with Woodwork >= 0.17.2 #3626
Add exclude_featurizers parameter to AutoMLSearch to specify featurizers that should be excluded from all pipelines #3631
Add fit_transform method to pipelines and component graphs #3640
Changed default value of data splitting for time series problem holdout set evaluation #3650
- Fixes
Reverted the Woodwork 0.17.x compatibility work due to performance regression #3664
- v0.55.0 July. 24, 2022
- Enhancements
Increased the amount of logical type information passed to Woodwork when calling ww.init() in transformers #3604
Added ability to log how long each batch and pipeline take in automl.search() #3577
Added the option to set the sp parameter for ARIMA models #3597
Updated the CV split size of time series problems to match forecast horizon for improved performance #3616
Added holdout set evaluation as part of AutoML search and pipeline ranking #3499
Added Dockerfile.arm and .dockerignore for python version and M1 testing #3609
Added test_gen_utils::in_container_arm64() fixture #3609
- Fixes
Fixed iterative graphs not appearing in documentation #3592
Updated the load_diabetes() method to account for scikit-learn 1.1.1 changes to the dataset #3591
Capped woodwork version at < 0.17.0 #3612
Bump minimum scikit-optimize version to 0.9.0 #3614
Invalid target data checks involving regression and unsupported data types now produce a different DataCheckMessageCode #3630
Updated test_data_checks.py::test_data_checks_raises_value_errors_on_init - more lenient text check #3609
Documentation Changes
- Testing Changes
Warning
- Breaking Changes
Refactored test cases that iterate over all components to use pytest.mark.parametrize and changed the corresponding if...continue blocks to pytest.mark.xfail #3622
- v0.54.0 June. 23, 2022
- Fixes
Updated the Imputer and SimpleImputer to work with scikit-learn 1.1.1. #3525
Bumped the minimum versions of scikit-learn to 1.1.1 and imbalanced-learn to 0.9.1. #3525
Added a clearer error message when describe is called on an un-instantiated ComponentGraph #3569
Added a clearer error message when time series’ predict is called with its X_train or y_train parameter set as None #3579
- v0.53.1 June. 9, 2022
- Changes
Set the development status to 4 - Beta in setup.cfg #3550
- v0.53.0 June. 9, 2022
- Enhancements
Pass n_jobs to default algorithm #3548
- v0.52.0 May. 12, 2022
- Changes
Added github workflows for featuretools and woodwork to test their main branch against evalml. #3504
Added pmdarima to conda recipe. #3505
Added a threshold for NullDataCheck before a warning is issued for null values #3507
Changed NoVarianceDataCheck to only output warnings #3506
Reverted XGBoost Classifier/Regressor patch for all boolean columns needing to be converted to int. #3503
Updated roc_curve() and conf_matrix() to work with IntegerNullable and BooleanNullable types. #3465
Changed ComponentGraph._transform_features to raise a PipelineError instead of a ValueError. This is not a breaking change because PipelineError is a subclass of ValueError. #3497
Capped sklearn at version 1.1.0 #3518
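The note about ComponentGraph._transform_features (#3497) relies on a general Python rule: an except clause matches the named exception class and all of its subclasses. The sketch below shows why existing "except ValueError" handlers keep working. PipelineError here is a local stand-in class rather than an import from EvalML, and transform_features is a hypothetical function.

```python
class PipelineError(ValueError):
    """Local stand-in for a pipeline-specific error that subclasses ValueError."""


def transform_features(values):
    # Hypothetical function that now raises the more specific PipelineError
    if not values:
        raise PipelineError("no features to transform")
    return [v * 2 for v in values]


def caller_written_against_old_behavior(values):
    # Code written when a plain ValueError was raised still catches the subclass
    try:
        return transform_features(values)
    except ValueError as err:
        return f"caught {type(err).__name__}: {err}"


print(caller_written_against_old_behavior([]))
```

Raising a subclass of the previously raised exception is a common way to add detail without breaking callers.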
- Documentation Changes
Updated to install prophet extras in Read the Docs. #3509
- Testing Changes
Moved vowpal wabbit in test recipe to evalml package from evalml-core #3502
- v0.51.0 Apr. 28, 2022
- Enhancements
Updated make_pipeline_from_data_check_output to work with time series problems. #3454
- Fixes
Changed PipelineBase.graph_json() to return a python dictionary and renamed as graph_dict() #3463
- Changes
Added vowpalwabbit to local recipe and remove is_using_conda pytest skip markers from relevant tests #3481
- Documentation Changes
Warning
- v0.50.0 Apr. 12, 2022
- Enhancements
Added TimeSeriesImputer component #3374
Replaced pipeline_parameters and custom_hyperparameters with search_parameters in AutoMLSearch #3373, #3427
Added TimeSeriesRegularizer to smooth uninferrable date ranges for time series problems #3376
Enabled ensembling as a parameter for DefaultAlgorithm #3435, #3444
Warning
- v0.49.0 Mar. 31, 2022
- Enhancements
Added use_covariates parameter to ARIMARegressor #3407
AutoMLSearch will set use_covariates to False for ARIMA when dataset is large #3407
Add ability to retrieve the logical types passed to a component in the graph via get_component_input_logical_types #3428
Add ability to get logical types passed to the last component via last_component_input_logical_types #3428
- Fixes
Fix conda build after PR 3407 #3429
Warning
- Breaking Changes
Moved model understanding metrics from graph.py to metrics.py #3417
- v0.48.0 Mar. 25, 2022
- Enhancements
Add support for oversampling in time series classification problems #3387
Warning
- Breaking Changes
Moved partial dependence functions from graph.py to partial_dependence.py #3404
- v0.47.0 Mar. 16, 2022
- v0.46.0 Mar. 03, 2022
- Documentation Changes
Added in-line tabs and copy-paste functionality to documentation, overhauled Install page #3353
- v0.45.0 Feb. 17, 2022
- Testing Changes
Add auto approve dependency workflow schedule for every 30 mins #3312
- v0.44.0 Feb. 04, 2022
- Enhancements
Updated DefaultAlgorithm to also limit estimator usage for long-running multiclass problems #3099
Added make_pipeline_from_data_check_output() utility method #3277
Updated AutoMLSearch to use DefaultAlgorithm as the default automl algorithm #3261, #3304
Added more specific data check errors to DatetimeFormatDataCheck #3288
Added features as a parameter for AutoMLSearch and add DFSTransformer to pipelines when features are present #3309
Warning
- v0.43.0 Jan. 25, 2022
- Enhancements
Updated new NullDataCheck to return a warning and suggest an action to impute columns with null values #3197
Updated make_pipeline_from_actions to handle null column imputation #3237
Updated data check actions API to return options instead of actions and add functionality to suggest and take action on columns with null values #3182
- Changes
Updated DataCheck validate() output to return a dictionary instead of list for actions #3142
Updated DataCheck validate() API to use the new DataCheckActionOption class instead of DataCheckAction #3152
Uncapped numba version and removed it from requirements #3263
Renamed HighlyNullDataCheck to NullDataCheck #3197
Updated data check validate() output to return a list of warnings and errors instead of a dictionary #3244
Capped pandas at < 1.4.0 #3274
- Testing Changes
Bumped minimum IPython version to 7.16.3 in test-requirements.txt based on dependabot feedback #3269
Warning
- Breaking Changes
Renamed HighlyNullDataCheck to NullDataCheck #3197
Updated data check validate() output to return a list of warnings and errors instead of a dictionary. See the Data Check or Data Check Actions pages (under User Guide) for examples. #3244
Removed impute_all and default_impute_strategy parameters from the PerColumnImputer #3267
Updated PerColumnImputer such that columns not specified in the impute_strategies dict will not be imputed anymore #3267
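Since #3267, a column that does not appear in the impute_strategies dictionary is left as-is instead of falling back to a default strategy. A minimal sketch of that selection logic follows, in plain Python with None standing in for missing values. impute_per_column and its strategy format ("mean" or a ("constant", value) tuple) are hypothetical and do not match the real PerColumnImputer API.

```python
from statistics import mean


def impute_per_column(columns, impute_strategies):
    """Hypothetical per-column imputer.

    columns: dict mapping column name -> list of values (None marks missing)
    impute_strategies: dict mapping column name -> "mean" or ("constant", fill_value)
    Columns absent from impute_strategies are returned untouched, missing values included.
    """
    result = {}
    for name, values in columns.items():
        strategy = impute_strategies.get(name)
        if strategy is None:
            result[name] = list(values)  # not listed, so not imputed
            continue
        present = [v for v in values if v is not None]
        if strategy == "mean":
            fill = mean(present)
        elif isinstance(strategy, tuple) and strategy[0] == "constant":
            fill = strategy[1]
        else:
            raise ValueError(f"unsupported strategy for {name!r}: {strategy!r}")
        result[name] = [fill if v is None else v for v in values]
    return result


data = {"a": [1.0, None, 3.0], "b": [None, "x", "x"], "c": [None, 2, None]}
out = impute_per_column(data, {"a": "mean", "b": ("constant", "missing")})
print(out["c"])  # column "c" has no strategy, so its missing values remain
```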
- v0.42.0 Jan. 18, 2022
- Enhancements
Required the separation of training and test data by gap + 1 units to be verified by time_index for time series problems #3208
Added support for boolean features for ARIMARegressor #3187
Updated dependency bot workflow to remove outdated description and add new configuration to delete branches automatically #3212
Added n_obs and n_splits to TimeSeriesParametersDataCheck error details #3246
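Assuming gap counts the time steps skipped between training and prediction (so gap = 0 means the test set starts on the very next step), the rule above says the first test time_index value should sit exactly gap + 1 steps after the last training value. The sketch below expresses that rule with the standard library for a known, regular frequency. separated_by_gap is a hypothetical helper and does not reproduce the signature or implementation of EvalML's are_datasets_separated_by_gap_time_index.

```python
from datetime import date, timedelta


def separated_by_gap(train_times, test_times, gap, freq):
    """Hypothetical check: does the test set start exactly gap + 1 steps after training ends?

    train_times, test_times: sorted lists of date/datetime values
    gap: number of skipped time steps between training and prediction
    freq: timedelta for one step of the series' (assumed regular) frequency
    """
    if not train_times or not test_times:
        return False
    expected_start = train_times[-1] + (gap + 1) * freq
    return test_times[0] == expected_start


train = [date(2022, 1, d) for d in range(1, 11)]  # Jan 1 to Jan 10
print(separated_by_gap(train, [date(2022, 1, 13)], gap=2, freq=timedelta(days=1)))
```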
- Fixes
Fixed classification pipelines to only accept target data with the appropriate number of classes #3185
Added support for time series in DefaultAlgorithm #3177
Standardized names of featurization components #3192
Removed empty cell in text_input.ipynb #3234
Removed potential prediction explanations failure when pipelines predicted a class with probability 1 #3221
Dropped NaNs before partial dependence grid generation #3235
Allowed prediction explanations to be json-serializable #3262
Fixed bug where InvalidTargetDataCheck would not check time series regression targets #3251
Fixed bug in are_datasets_separated_by_gap_time_index #3256
- Changes
Raised lowest compatible numpy version to 1.21.0 to address security concerns #3207
Changed the default objective to MedianAE from R2 for time series regression #3205
Removed all-NaN Unknown to Double logical conversion in infer_feature_types #3196
Checking the validity of holdout data for time series problems can be performed by calling pipelines.utils.validate_holdout_datasets prior to calling predict #3208
- Testing Changes
Update auto approve workflow trigger and delete branch after merge #3265
Warning
- Breaking Changes
Renamed DateTime Featurizer Component to DateTime Featurizer and Natural Language Featurization Component to Natural Language Featurizer #3192
- v0.41.0 Jan. 06, 2022
- v0.40.0 Dec. 22, 2021
- Enhancements
Added TimeSeriesSplittingDataCheck to DefaultDataChecks to verify adequate class representation in time series classification problems #3141
Added the ability to accept serialized features and skip computation in DFSTransformer #3106
Added support for known-in-advance features #3149
Added Holt-Winters ExponentialSmoothingRegressor for time series regression problems #3157
Required the separation of training and test data by gap + 1 units to be verified by time_index for time series problems #3160
- Fixes
Fixed error caused when tuning threshold for time series binary classification #3140
- Changes
TimeSeriesParametersDataCheck was added to DefaultDataChecks for time series problems #3139
Renamed date_index to time_index in problem_configuration for time series problems #3137
Updated nlp-primitives minimum version to 2.1.0 #3166
Updated minimum version of woodwork to v0.11.0 #3171
Reverted #3160 until uninferrable frequency can be addressed earlier in the process #3198
- Documentation Changes
Added comments to provide clarity on doctests #3155
- Testing Changes
Parameterized tests in test_datasets.py #3145
Warning
- Breaking Changes
Renamed date_index to time_index in problem_configuration for time series problems #3137
- v0.39.0 Dec. 9, 2021
- Enhancements
Renamed DelayedFeatureTransformer to TimeSeriesFeaturizer and enhanced it to compute rolling features #3028
Added ability to impute only specific columns in PerColumnImputer #3123
Added TimeSeriesParametersDataCheck to verify the time series parameters are valid given the number of splits in cross validation #3111
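Lag features and rolling features share one principle: a feature at time t may use only values observed before t, otherwise the target leaks into its own predictors. The plain-Python sketch below computes both under that principle. lag_and_rolling_features, its arguments, and the lag_k / rolling_mean_w column names are hypothetical and do not reflect how TimeSeriesFeaturizer names, selects, or windows its features.

```python
from statistics import mean


def lag_and_rolling_features(target, lags, window):
    """Hypothetical featurizer: lagged target values plus a rolling mean of past values.

    target: list of numbers ordered by time
    lags: positive integers; lag k at row t is target[t - k]
    window: rolling mean over target[t - window : t] (strictly past values)
    Rows without enough history get None.
    """
    rows = []
    for t in range(len(target)):
        row = {}
        for k in lags:
            row[f"lag_{k}"] = target[t - k] if t - k >= 0 else None
        past = target[max(0, t - window):t]
        row[f"rolling_mean_{window}"] = mean(past) if len(past) == window else None
        rows.append(row)
    return rows


feats = lag_and_rolling_features([10, 20, 30, 40], lags=[1, 2], window=2)
print(feats[2])  # features for t = 2 use only target[0] and target[1]
```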
- Fixes
Default parameters for RFRegressorSelectFromModel and RFClassifierSelectFromModel have been fixed to avoid selecting all features #3110
- Changes
Removed reliance on a datetime index for ARIMARegressor and ProphetRegressor #3104
Included target leakage check when fitting ARIMARegressor to account for the lack of TimeSeriesFeaturizer in ARIMARegressor based pipelines #3104
Cleaned up and refactored InvalidTargetDataCheck implementation and docstring #3122
Removed indices information from the output of HighlyNullDataCheck’s validate() method #3092
Added ReplaceNullableTypes component to prepare for handling pandas nullable types. #3090
Updated make_pipeline for handling pandas nullable types in preprocessing pipeline. #3129
Removed unused EnsembleMissingPipelinesError exception definition #3131
Warning
- Breaking Changes
Renamed DelayedFeatureTransformer to TimeSeriesFeaturizer #3028
ProphetRegressor now requires a datetime column in X represented by the date_index parameter #3104
Renamed module evalml.data_checks.invalid_target_data_check to evalml.data_checks.invalid_targets_data_check #3122
Removed unused EnsembleMissingPipelinesError exception definition #3131
- v0.38.0 Nov. 27, 2021
- Enhancements
Added data_check_name attribute to the data check action class #3034
Added NumWords and NumCharacters primitives to TextFeaturizer and renamed TextFeaturizer to NaturalLanguageFeaturizer #3030
Added support for scikit-learn > 1.0.0 #3051
Required the date_index parameter to be specified for time series problems in AutoMLSearch #3041
Allowed time series pipelines to predict on test datasets whose length is less than or equal to the forecast_horizon. Also allowed the test set index to start at 0. #3071
Enabled time series pipeline to predict on data with features that are not known-in-advance #3094
- Fixes
Added in error message when fit and predict/predict_proba data types are different #3036
Fixed bug where ensembling components could not get converted to JSON format #3049
Fixed bug where components with tuned integer hyperparameters could not get converted to JSON format #3049
Fixed bug where force plots were not displaying correct feature values #3044
Included confusion matrix at the pipeline threshold for find_confusion_matrix_per_threshold #3080
Fixed bug where One Hot Encoder would error out if a non-categorical feature had a missing value #3083
Fixed bug where features created from categorical columns by Delayed Feature Transformer would be inferred as categorical #3083
- Documentation Changes
Updated docs to use data check action methods rather than manually cleaning data #3050
- Testing Changes
Updated integration tests to use make_pipeline_from_actions instead of private method #3047
Warning
- Breaking Changes
Added data_check_name attribute to the data check action class #3034
Renamed TextFeaturizer to NaturalLanguageFeaturizer #3030
Updated the Pipeline.graph_json function to return a dictionary of “from” and “to” edges instead of tuples #3049
Delete predict_uses_y estimator attribute #3069
Changed time series problems in AutoMLSearch to need a not-None date_index #3041
Changed the DelayedFeatureTransformer to throw a ValueError during fit if the date_index is None #3041
Passing X=None to DelayedFeatureTransformer is deprecated #3041
- v0.37.0 Nov. 9, 2021
- Enhancements
Added find_confusion_matrix_per_threshold to Model Understanding #2972
Limit computationally-intensive models during AutoMLSearch for certain multiclass problems, allow for opt-in with parameter allow_long_running_models #2982
Added support for stacked ensemble pipelines to prediction explanations module #2971
Added integration tests for data checks and data checks actions workflow #2883
Added a change in pipeline structure to handle categorical columns separately for pipelines in DefaultAlgorithm #2986
Added an algorithm to DelayedFeatureTransformer to select better lags #3005
Added test to ensure pickling pipelines preserves thresholds #3027
Added AutoML function to access ensemble pipeline’s input pipelines IDs #3011
Added ability to define which class is “positive” for label encoder in binary classification case #3033
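For a binary target, choosing the “positive” class (#3033) decides which label is encoded as 1. The sketch below shows the resulting mapping in plain Python. fit_binary_label_encoder and its positive_label argument are hypothetical, and the fallback (treat the last label in sorted order as positive) is an assumption of this sketch, not documented EvalML behavior.

```python
def fit_binary_label_encoder(labels, positive_label=None):
    """Hypothetical binary label encoder with a user-chosen positive class.

    Returns a dict mapping each original label to 0 or 1. If positive_label
    is None, this sketch treats the last of the two sorted labels as positive.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError(f"expected exactly 2 classes, got {len(classes)}")
    if positive_label is None:
        positive_label = classes[1]
    if positive_label not in classes:
        raise ValueError(f"positive_label {positive_label!r} not found in labels")
    negative_label = classes[0] if classes[1] == positive_label else classes[1]
    return {negative_label: 0, positive_label: 1}


y = ["yes", "no", "no", "yes"]
mapping = fit_binary_label_encoder(y, positive_label="no")
print([mapping[v] for v in y])  # "no" is now encoded as 1
```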
- Fixes
Fixed bug where Oversampler didn’t consider boolean columns to be categorical #2980
Fixed permutation importance failing when target is categorical #3017
Updated estimator and pipelines’ predict, predict_proba, transform, inverse_transform methods to preserve input indices #2979
Updated demo dataset link for daily min temperatures #3023
- Changes
Updated OutliersDataCheck and UniquenessDataCheck and allow for the suspension of the Nullable types error #3018
- v0.36.0 Oct. 27, 2021
- Enhancements
Added LIME as an algorithm option for explain_predictions and explain_predictions_best_worst #2905
Standardized data check messages and added default “rows” and “columns” to data check message details dictionary #2869
Added rows_of_interest to pipeline utils #2908
Added support for woodwork version 0.8.2 #2909
Enhanced the DateTimeFeaturizer to handle NaNs in date features #2909
Added support for woodwork logical types PostalCode, SubRegionCode, and CountryCode in model understanding tools #2946
Added Vowpal Wabbit regressor and classifiers #2846
Added NoSplit data splitter for future unsupervised learning searches #2958
Added method to convert actions into a preprocessing pipeline #2968
- Fixes
Fixed bug where partial dependence was not respecting the ww schema #2929
Fixed calculate_permutation_importance for datetimes on StandardScaler #2938
Fixed SelectColumns to only select available features for feature selection in DefaultAlgorithm #2944
Fixed DropColumns component not receiving parameters in DefaultAlgorithm #2945
Fixed bug where trained binary thresholds were not being returned by get_pipeline or clone #2948
Fixed bug where Oversampler selected ww logical categorical instead of ww semantic category #2946
Warning
- Breaking Changes
Standardized data check messages and added default “rows” and “columns” to data check message details dictionary. This may change the number of messages returned from a data check. #2869
- v0.35.0 Oct. 14, 2021
- Changes
Updated pipelines to use a label encoder component instead of doing encoding on the pipeline level #2821
Deleted scikit-learn ensembler #2819
Refactored pipeline building logic out of AutoMLSearch and into IterativeAlgorithm #2854
Refactored names for methods in ComponentGraph and PipelineBase #2902
Warning
- Breaking Changes
Updated pipelines to use a label encoder component instead of doing encoding on the pipeline level. This means that pipelines will no longer automatically encode non-numerical targets. Please use a label encoder if working with classification problems and non-numeric targets. #2821
Deleted scikit-learn ensembler #2819
IterativeAlgorithm now takes X, y, and problem_type as required arguments, and sampler_name, allowed_model_families, allowed_component_graphs, max_batches, and verbose as optional arguments #2854
Changed method names of fit_features and compute_final_component_features to fit_and_transform_all_but_final and transform_all_but_final in ComponentGraph, and compute_estimator_features to transform_all_but_final in pipeline classes #2902
- v0.34.0 Sep. 30, 2021
- Enhancements
Updated to work with Woodwork 0.8.1 #2783
Added validation that training_data and training_target are not None in prediction explanations #2787
Added support for training-only components in pipelines and component graphs #2776
Added default argument for the parameters value for ComponentGraph.instantiate #2796
Added TIME_SERIES_REGRESSION to LightGBMRegressor's supported problem types #2793
Provided a JSON representation of a pipeline’s DAG structure #2812
Added validation to holdout data passed to predict and predict_proba for time series #2804
Added information about which row indices are outliers in OutliersDataCheck #2818
Added verbose flag to top level search() method #2813
Added support for linting jupyter notebooks and clearing the executed cells and empty cells #2829 #2837
Added “DROP_ROWS” action to output of OutliersDataCheck.validate() #2820
Added the ability of AutoMLSearch to accept a SequentialEngine instance as engine input #2838
Added new label encoder component to EvalML #2853
Added our own partial dependence implementation #2834
- Fixes
Fixed bug where calculate_permutation_importance was not calculating the right value for pipelines with target transformers #2782
Fixed bug where transformed target values were not used in fit for time series pipelines #2780
Fixed bug where score_pipelines method of AutoMLSearch would not work for time series problems #2786
Removed TargetTransformer class #2833
Added tests to verify ComponentGraph support by pipelines #2830
Fixed incorrect parameter for baseline regression pipeline in AutoMLSearch #2847
Fixed bug where the desired estimator family order was not respected in IterativeAlgorithm #2850
- Changes
Changed woodwork initialization to use partial schemas #2774
Made Transformer.transform() an abstract method #2744
Deleted EmptyDataChecks class #2794
Removed data check for checking log distributions in make_pipeline #2806
Changed the minimum woodwork version to 0.8.0 #2783
Pinned woodwork version to 0.8.0 #2832
Removed model_family attribute from ComponentBase and transformers #2828
Limited scikit-learn until new features and errors can be addressed #2842
Show DeprecationWarning when Sklearn Ensemblers are called #2859
Warning
- v0.33.0 Sep. 15, 2021
- v0.32.1 Sep. 10, 2021
- Enhancements
Added verbose flag to AutoMLSearch to run search in silent mode by default #2645
Added label encoder to XGBoostClassifier to remove the warning #2701
Set eval_metric to logloss for XGBoostClassifier #2741
Added support for woodwork versions 0.7.0 and 0.7.1 #2743
Changed explain_predictions functions to display original feature values #2759
Added X_train and y_train to graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data #2762
Added forecast_horizon as a required parameter to time series pipelines and AutoMLSearch #2697
Added predict_in_sample and predict_proba_in_sample methods to time series pipelines to predict on data where the target is known, e.g. cross-validation #2697
- Changes
Deleted drop_nan_target_rows utility method #2737
Removed default logging setup and debugging log file #2645
Changed the default n_jobs value for XGBoostClassifier and XGBoostRegressor to 12 #2757
Changed TimeSeriesBaselineEstimator to only work on a time series pipeline with a DelayedFeaturesTransformer #2697
Added X_train and y_train as optional parameters to pipeline predict and predict_proba. Only used for time series pipelines #2697
Added training_data and training_target as optional parameters to explain_predictions and explain_predictions_best_worst to support time series pipelines #2697
Changed time series pipeline predictions to no longer output series/dataframes padded with NaNs. A prediction will be returned for every row in the X input #2697
- Testing Changes
Fixed flaky TargetDistributionDataCheck test for very_lognormal distribution #2748
Warning
- Breaking Changes
Removed default logging setup and debugging log file #2645
Added X_train and y_train to graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data #2762
Added forecast_horizon as a required parameter to time series pipelines and AutoMLSearch #2697
Changed TimeSeriesBaselineEstimator to only work on a time series pipeline with a DelayedFeaturesTransformer #2697
Added X_train and y_train as required parameters for predict and predict_proba in time series pipelines #2697
Added training_data and training_target as required parameters to explain_predictions and explain_predictions_best_worst for time series pipelines #2697
- v0.32.0 Aug. 31, 2021
- Enhancements
Allow string for engine parameter for AutoMLSearch #2667
Add ProphetRegressor to AutoML #2619
Integrated DefaultAlgorithm into AutoMLSearch #2634
Removed SVM “linear” and “precomputed” kernel hyperparameter options, and improved default parameters #2651
Updated ComponentGraph initialization to raise ValueError when user attempts to use .y for a component that does not produce a tuple output #2662
Updated to support Woodwork 0.6.0 #2690
Updated pipeline graph() to distinguish X and y edges #2654
Added DropRowsTransformer component #2692
Added DROP_ROWS to _make_component_list_from_actions and clean up metadata #2694
Add new ensembler component #2653
- Fixes
Updated Oversampler logic to select best SMOTE based on component input instead of pipeline input #2695
Added ability to explicitly close DaskEngine resources to improve runtime and reduce Dask warnings #2667
Fixed partial dependence bug for ensemble pipelines #2714
Updated TargetLeakageDataCheck to maintain user-selected logical types #2711
Warning
- Breaking Changes
Renamed the current top level search method to search_iterative and defined a new search method for the DefaultAlgorithm #2634
Replaced SMOTEOversampler, SMOTENOversampler and SMOTENCOversampler with consolidated Oversampler component #2695
Removed LinearRegressor from the list of default AutoMLSearch estimators due to poor performance #2660
- v0.31.0 Aug. 19, 2021
- Enhancements
Updated the high variance check in AutoMLSearch to be robust to a variety of objectives and cv scores #2622
Use Woodwork’s outlier detection for the OutliersDataCheck #2637
Added ability to utilize instantiated components when creating a pipeline #2643
Sped up the all-NaN and unknown check in infer_feature_types #2661
Fixes
- Testing Changes
Speed up CI by splitting Prophet tests into a separate workflow in GitHub #2644
Warning
- Breaking Changes
TimeSeriesRegressionPipeline no longer inherits from TimeSeriesRegressionPipeline #2649
- v0.30.2 Aug. 16, 2021
- Fixes
Updated changelog and version numbers to match the release. Release 0.30.1 was released erroneously without a change to the version numbers. 0.30.2 replaces it.
- v0.30.1 Aug. 12, 2021
- Enhancements
Added DatetimeFormatDataCheck for time series problems #2603
Added ProphetRegressor to estimators #2242
Updated ComponentGraph to handle not calling samplers’ transform during predict, and updated samplers’ transform methods so that fit_transform is equivalent to fit(X, y).transform(X, y) #2583
Updated ComponentGraph’s _validate_component_dict logic to be stricter about input values #2599
Patched bug in xgboost estimators where predicting on a feature matrix of only booleans would throw an exception #2602
Updated ARIMARegressor to use relative forecasting to predict values #2613
Added support for creating pipelines without an estimator as the final component and added transform(X, y) method to pipelines and component graphs #2625
Updated to support Woodwork 0.5.1 #2610
- Fixes
Updated AutoMLSearch to drop ARIMARegressor from allowed_estimators if an incompatible frequency is detected #2632
Updated get_best_sampler_for_data to consider all non-numeric datatypes as categorical for SMOTE #2590
Fixed inconsistent test results from TargetDistributionDataCheck #2608
Adopted vectorized pd.NA checking for Woodwork 0.5.1 support #2626
Pinned upper version of astroid to 2.6.6 to keep ReadTheDocs working. #2638
- Changes
Renamed SMOTE samplers to SMOTE oversampler #2595
Changed partial_dependence and graph_partial_dependence to raise a PartialDependenceError instead of ValueError. This is not a breaking change because PartialDependenceError is a subclass of ValueError #2604
Cleaned up code duplication in ComponentGraph #2612
Stored predict_proba results in .x for intermediate estimators in ComponentGraph #2629
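The #2604 entry relies on ordinary Python exception inheritance: an existing `except ValueError` handler still catches the new, more specific error. A minimal sketch of that compatibility; PartialDependenceError is a local stand-in defined here rather than imported from evalml, and compute_partial_dependence is a hypothetical helper that always fails.

```python
class PartialDependenceError(ValueError):
    """Local stand-in for the exception described in #2604 (not imported from evalml)."""


def compute_partial_dependence(feature_name):
    """Hypothetical helper that always fails, to exercise the exception path."""
    raise PartialDependenceError(f"cannot compute partial dependence for {feature_name!r}")


def call_with_legacy_handler(feature_name):
    """Caller written before the change, which only knows about ValueError."""
    try:
        compute_partial_dependence(feature_name)
    except ValueError as err:  # still matches: PartialDependenceError is a ValueError
        return type(err).__name__
    return None


print(call_with_legacy_handler("all_nan_column"))  # PartialDependenceError
```

New code can catch PartialDependenceError directly to distinguish partial dependence failures from other ValueErrors.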
- Documentation Changes
To avoid local docs build error, only add warning disable and download headers on ReadTheDocs builds, not locally #2617
- Testing Changes
Updated partial_dependence tests to change the element-wise comparison per the Plotly 5.2.1 upgrade #2638
Changed the lint CI job to only check against python 3.9 via the -t flag #2586
Installed Prophet in linux nightlies test and fixed test_all_components #2598
Refactored and fixed all make_pipeline tests to assert correct order and address new Woodwork Unknown type inference #2572
Removed component_graphs as a global variable in test_component_graphs.py #2609
Warning
- Breaking Changes
Renamed SMOTE samplers to SMOTE oversampler. Please use SMOTEOversampler, SMOTENCOversampler, SMOTENOversampler instead of SMOTESampler, SMOTENCSampler, and SMOTENSampler #2595
- v0.30.0 Aug. 3, 2021
- Enhancements
Added LogTransformer and TargetDistributionDataCheck #2487
Issue a warning to users when a pipeline parameter passed in isn’t used in the pipeline #2564
Added Gini coefficient as an objective #2544
Added repr to ComponentGraph #2565
Added components to extract features from URL and EmailAddress Logical Types #2550
Added support for NaN values in TextFeaturizer #2532
Added SelectByType transformer #2531
Added separate thresholds for percent null rows and columns in HighlyNullDataCheck #2562
Added support for NaN natural language values #2577
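For context on the Gini objective (#2544): one common definition of the Gini coefficient for a binary classifier is 2 * AUC - 1. The sketch below assumes that definition; it is not evalml's implementation, and roc_auc and gini are hypothetical helpers. AUC is computed by comparing every (positive, negative) pair, which is quadratic and only meant for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def gini(y_true, scores):
    """Normalized Gini coefficient, assuming the definition 2 * AUC - 1."""
    return 2 * roc_auc(y_true, scores) - 1


print(gini([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.5
```

Under this definition a random ranking scores about 0 and a perfect ranking scores 1.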
- Fixes
Raised error message for types URL, NaturalLanguage, and EmailAddress in partial_dependence #2573
- Changes
Updated PipelineBase implementation for creating pipelines from a list of components #2549
Moved get_hyperparameter_ranges to PipelineBase class from automl/utils module #2546
Renamed ComponentGraph’s get_parents to get_inputs #2540
Removed ComponentGraph.linearized_component_graph and ComponentGraph.from_list #2556
Updated ComponentGraph to enforce requiring .x and .y inputs for each component in the graph #2563
Renamed existing ensembler implementation from StackedEnsemblers to SklearnStackedEnsemblers #2578
- Testing Changes
Added test that makes sure split_data does not shuffle for time series problems #2552
Warning
- Breaking Changes
Moved get_hyperparameter_ranges to PipelineBase class from automl/utils module #2546
Renamed ComponentGraph’s get_parents to get_inputs #2540
Removed ComponentGraph.linearized_component_graph and ComponentGraph.from_list #2556
Updated ComponentGraph to enforce requiring .x and .y inputs for each component in the graph #2563
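The #2552 test above checks that split_data does not shuffle time series data. A minimal sketch of why order matters: a chronological split keeps every held-out row later than every training row. time_ordered_split is a hypothetical helper, not evalml's split_data, and it assumes the input is already sorted by time.

```python
def time_ordered_split(X, y, test_size=0.2):
    """Split rows chronologically: earliest rows train, latest rows are held out.

    Assumes X and y are already sorted by time. No shuffling is done, so no
    future observation leaks into the training portion.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    n_test = max(1, round(len(X) * test_size))
    cut = len(X) - n_test
    return X[:cut], X[cut:], y[:cut], y[cut:]


dates = [f"2021-08-0{day}" for day in range(1, 10)] + ["2021-08-10"]
sales = list(range(10))
X_train, X_test, y_train, y_test = time_ordered_split(dates, sales)
print(X_test)  # ['2021-08-09', '2021-08-10']
```

A shuffled split would instead mix later dates into training, letting the model see the future it is asked to predict.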
- v0.29.0 Jul. 21, 2021
- Enhancements
Updated 1-way partial dependence support for datetime features #2454
Added details on how to fix error caused by broken ww schema #2466
Added ability to use built-in pickle for saving AutoMLSearch #2463
Updated our components and component graphs to use latest features of ww 0.4.1, e.g. concat_columns and drop in-place #2465
Added new, concurrent.futures based engine for parallel AutoML #2506
Added support for new Woodwork Unknown type in AutoMLSearch #2477
Updated our components with an attribute that describes if they modify features or targets and can be used in list API for pipeline initialization #2504
Updated ComponentGraph to accept X and y as inputs #2507
Removed unused TARGET_BINARY_INVALID_VALUES from DataCheckMessageCode enum and fixed formatting of objective documentation #2520
Added EvalMLAlgorithm #2525
Added support for NaN values in TextFeaturizer #2532
- Fixes
Fixed FraudCost objective and reverted threshold optimization method for binary classification to Golden #2450
Added custom exception message for partial dependence on features with scales that are too small #2455
Ensured the typing for Ordinal and Datetime ltypes is passed through _retain_custom_types_and_initalize_woodwork #2461
Updated to work with Pandas 1.3.0 #2442
Updated to work with sktime 0.7.0 #2499
- Testing Changes
Warning
- Breaking Changes
NaN values in the Natural Language type are no longer supported by the Imputer with the pandas upgrade. #2477
- v0.28.0 Jul. 2, 2021
- Fixes
Deleted unreachable line from IterativeAlgorithm #2464
- v0.27.0 Jun. 22, 2021
- Enhancements
Adds force plots for prediction explanations #2157
Removed self-reference from AutoMLSearch #2304
Added support for nonlinear pipelines for generate_pipeline_code #2332
Added inverse_transform method to pipelines #2256
Add optional automatic update checker #2350
Added search_order to AutoMLSearch’s rankings and full_rankings tables #2345
Updated threshold optimization method for binary classification #2315
Updated demos to pull data from S3 instead of including demo data in package #2387
Upgrade woodwork version to v0.4.1 #2379
- Fixes
Preserve user-specified woodwork types throughout pipeline fit/predict #2297
Fixed ComponentGraph appending target to final_component_features if there is a component that returns both X and y #2358
Fixed partial dependence graph method failing on multiclass problems when the class labels are numeric #2372
Added thresholding_objective argument to AutoMLSearch for binary classification problems #2320
Added change for k_neighbors parameter in SMOTE Oversamplers to automatically handle small samples #2375
Changed naming for Logistic Regression Classifier file #2399
Pinned pytest-timeout to fix minimum dependence checker #2425
Replaced Elastic Net Classifier base class with Logistic Regression to avoid NaN outputs #2420
- Changes
Cleaned up PipelineBase’s component_graph and _component_graph attributes. Updated PipelineBase __repr__ and added __eq__ for ComponentGraph #2332
Added and applied black linting package to the EvalML repo in place of autopep8 #2306
Separated custom_hyperparameters from pipelines and added them as an argument to AutoMLSearch #2317
Replaced allowed_pipelines with allowed_component_graphs #2364
Removed private method _compute_features_during_fit from PipelineBase #2359
Updated compute_order in ComponentGraph to be a read-only property #2408
Unpinned PyZMQ version in requirements.txt #2389
Uncapping LightGBM version in requirements.txt #2405
Updated minimum version of plotly #2415
Removed SensitivityLowAlert objective from core objectives #2418
- Testing Changes
Update minimum unit tests to run on all pull requests #2314
Pass token to authorize uploading of codecov reports #2344
Add pytest-timeout. All tests that run longer than 6 minutes will fail. #2374
Separated the dask tests out into separate github action jobs to isolate dask failures. #2376
Refactored dask tests #2377
Added the combined dask/non-dask unit tests back and renamed the dask only unit tests. #2382
Sped up unit tests and split into separate jobs #2365
Change CI job names, run lint for python 3.9, run nightlies on python 3.8 at 3am EST #2395 #2398
Set fail-fast to false for CI jobs that run for PRs #2402
Warning
- Breaking Changes
AutoMLSearch will accept allowed_component_graphs instead of allowed_pipelines #2364
Removed PipelineBase’s _component_graph attribute. Updated PipelineBase __repr__ and added __eq__ for ComponentGraph #2332
pipeline_parameters will no longer accept skopt.space variables since hyperparameter ranges will now be specified through custom_hyperparameters #2317
- v0.25.0 Jun. 01, 2021
Warning
- v0.24.2 May. 24, 2021
- Fixes
Set default n_jobs to 1 for StackedEnsembleClassifier and StackedEnsembleRegressor until fix for text-based parallelism in sklearn stacking can be found #2295
- Changes
Updated start_iteration_callback to accept a pipeline instance instead of a pipeline class and no longer accept pipeline parameters as a parameter #2290
Refactored calculate_permutation_importance method and add per-column permutation importance method #2302
Updated logging information in AutoMLSearch.__init__ to clarify pipeline generation #2263
- Documentation Changes
Minor changes to the release procedure #2230
Warning
- Breaking Changes
Updated start_iteration_callback to accept a pipeline instance instead of a pipeline class and no longer accept pipeline parameters as a parameter #2290
Moved default_parameters to ComponentGraph from PipelineBase. A pipeline’s default_parameters is now accessible via pipeline.component_graph.default_parameters #2307
- v0.24.1 May. 16, 2021
- Documentation Changes
Capped Sphinx version under 4.0.0 #2244
- v0.24.0 May. 04, 2021
- Enhancements
Added date_index as a required parameter for TimeSeries problems #2217
Have the OneHotEncoder return the transformed columns as booleans rather than floats #2170
Added Oversampler transformer component to EvalML #2079
Added Undersampler to AutoMLSearch, as well as arguments _sampler_method and sampler_balanced_ratio #2128
Updated prediction explanations functions to allow pipelines with XGBoost estimators #2162
Added partial dependence for datetime columns #2180
Update precision-recall curve with positive label index argument, and fix for 2d predicted probabilities #2090
Add pct_null_rows to HighlyNullDataCheck #2211
Added a standalone AutoML search method for convenience, which runs data checks and then runs automl #2152
Make the first batch of AutoML have a predefined order, with linear models first and complex models last #2223 #2225
Added sampling dictionary support to BalancedClassificationSampler #2235
- Changes
Deleted baseline pipeline classes #2202
Reverting user specified date feature PR #2155 until pmdarima installation fix is found #2214
Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. #2091
Removed all old datasplitters from EvalML #2193
Deleted make_pipeline_from_components #2218
- Documentation Changes
- Testing Changes
Use machineFL user token for dependency update bot, and add more reviewers #2189
Warning
- Breaking Changes
All baseline pipeline classes (BaselineBinaryPipeline, BaselineMulticlassPipeline, BaselineRegressionPipeline, etc.) have been deleted #2202
Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. Pipelines can now be initialized by specifying the component graph as the first parameter, and then passing in optional arguments such as custom_name, parameters, etc. For example, BinaryClassificationPipeline(["Random Forest Classifier"], parameters={}). #2091
Removed all old datasplitters from EvalML #2193
Deleted utility method make_pipeline_from_components #2218
- v0.23.0 Apr. 20, 2021
- Enhancements
Refactored EngineBase and SequentialEngine api. Added DaskEngine #1975
Added optional engine argument to AutoMLSearch #1975
Added a warning about how time series support is still in beta when a user passes in a time series problem to AutoMLSearch #2118
Added NaturalLanguageNaNDataCheck data check #2122
Added ValueError to partial_dependence to prevent users from computing partial dependence on columns with all NaNs #2120
Added standard deviation of cv scores to rankings table #2154
- Fixes
Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority #2077
Fixed bug where two-way partial dependence plots with categorical variables were not working correctly #2117
Fixed bug where hyperparameters were not displaying properly for pipelines with a list component_graph and duplicate components #2133
Fixed bug where pipeline_parameters argument in AutoMLSearch was not applied to pipelines passed in as allowed_pipelines #2133
Fixed bug where AutoMLSearch was not applying custom hyperparameters to pipelines with a list component_graph and duplicate components #2133
- Changes
Removed hyperparameter_ranges from Undersampler and renamed balanced_ratio to sampling_ratio for samplers #2113
Renamed TARGET_BINARY_NOT_TWO_EXAMPLES_PER_CLASS data check message code to TARGET_MULTICLASS_NOT_TWO_EXAMPLES_PER_CLASS #2126
Modified one-way partial dependence plots of categorical features to display data with a bar plot #2117
Renamed score column for automl.rankings to mean_cv_score #2135
Remove ‘warning’ from docs tool output #2031
Warning
- Breaking Changes
Renamed balanced_ratio to sampling_ratio for the BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, BalancedClassificationSampler, and Undersampler #2113
Deleted the “errors” key from automl results #1975
Deleted the raise_and_save_error_callback and the log_and_save_error_callback #1975
Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority #2077
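To make the minority:majority convention (#2077, #2113) concrete: the ratio divides the smallest class count by the largest, so it falls between 0 and 1, and smaller values mean more severe imbalance. A minimal sketch; minority_to_majority_ratio and needs_resampling are hypothetical helpers, and the 0.25 threshold is an illustrative assumption rather than a documented default.

```python
from collections import Counter


def minority_to_majority_ratio(y):
    """Return count(minority class) / count(majority class), a value in (0, 1]."""
    counts = Counter(y)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return min(counts.values()) / max(counts.values())


def needs_resampling(y, sampling_ratio=0.25):
    """Hypothetical check: resample when minority:majority falls below sampling_ratio."""
    return minority_to_majority_ratio(y) < sampling_ratio


y = ["fraud"] * 10 + ["ok"] * 80
print(minority_to_majority_ratio(y))  # 0.125
print(needs_resampling(y))  # True
```

Under the older majority:minority convention the same data would give 8.0, so a threshold value meant the opposite thing, which is why the fix is listed as breaking.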
- v0.22.0 Apr. 06, 2021
- Enhancements
Added a GitHub Action for linux_unit_tests #2013
Added recommended actions for InvalidTargetDataCheck, updated _make_component_list_from_actions to address new action, and added TargetImputer component #1989
Updated AutoMLSearch._check_for_high_variance to not emit RuntimeWarning #2024
Added exception when pipeline passed to explain_predictions is a Stacked Ensemble pipeline #2033
Added sensitivity at low alert rates as an objective #2001
Added Undersampler transformer component #2030
- Fixes
Updated Engine’s train_batch to apply undersampling #2038
Fixed bug where Time Series Classification pipelines were not encoding targets in predict and predict_proba #2040
Fixed data splitting errors if target is float for classification problems #2050
Pinned docutils to <0.17 to fix ReadtheDocs warning issues #2088
Testing Changes
- v0.21.0 Mar. 24, 2021
- Enhancements
Changed AutoMLSearch to default optimize_thresholds to True #1943
Added multiple oversampling and undersampling sampling methods as data splitters for imbalanced classification #1775
Added params to balanced classification data splitters for visibility #1966
Updated make_pipeline to not add Imputer if input data does not have numeric or categorical columns #1967
Updated ClassImbalanceDataCheck to better handle multiclass imbalances #1986
Added recommended actions for the output of data check’s validate method #1968
Added error message for partial_dependence when features are mostly the same value #1994
Updated OneHotEncoder to drop one redundant feature by default for features with two categories #1997
Added a PolynomialDecomposer component #1992
Added DateTimeNaNDataCheck data check #2039
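The #1997 change rests on a simple redundancy: for a feature with exactly two categories, one indicator column fully determines the other (the second is always 1 minus the first). A toy sketch of that idea; one_hot_encode and its drop_if_binary flag are hypothetical and do not reproduce evalml's OneHotEncoder parameters.

```python
def one_hot_encode(values, drop_if_binary=True):
    """Hypothetical one-hot encoder for a single categorical column.

    With drop_if_binary=True, a column with exactly two categories becomes a
    single indicator, because the other indicator is always 1 minus this one.
    """
    categories = sorted(set(values))
    if drop_if_binary and len(categories) == 2:
        categories = categories[1:]  # drop the first category's redundant column
    return {f"is_{cat}": [int(v == cat) for v in values] for cat in categories}


print(one_hot_encode(["no", "yes", "yes"]))  # {'is_yes': [0, 1, 1]}
print(one_hot_encode(["red", "green", "blue"]))  # three indicator columns, none dropped
```

Dropping the redundant column avoids perfectly collinear features, which matters most for linear models.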
Documentation Changes
Warning
- Breaking Changes
Changed AutoMLSearch to default optimize_thresholds to True #1943
Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch. To run the data checks which were previously run by default in AutoMLSearch, please call DefaultDataChecks().validate(X_train, y_train) or take a look at our documentation for more examples. #1935
Deleted random_state argument #1985
- v0.20.0 Mar. 10, 2021
- Enhancements
Added a GitHub Action for Detecting dependency changes #1933
Create a separate CV split to train stacked ensembler on for AutoMLSearch #1814
Added a GitHub Action for Linux unit tests #1846
Added ARIMARegressor estimator #1894
Added DataCheckAction class and DataCheckActionCode enum #1896
Updated Woodwork requirement to v0.0.10 #1900
Added BalancedClassificationDataCVSplit and BalancedClassificationDataTVSplit to AutoMLSearch #1875
Update default classification data splitter to use downsampling for highly imbalanced data #1875
Updated describe_pipeline to return more information, including id of pipelines used for ensemble models #1909
Added utility method to create list of components from a list of DataCheckAction #1907
Updated validate method to include an action key in returned dictionary for all DataCheck and DataChecks #1916
Aggregating the shap values for predictions that we know the provenance of, e.g. OHE, text, and date-time. #1901
Improved error message when custom objective is passed as a string in pipeline.score #1941
Added score_pipelines and train_pipelines methods to AutoMLSearch #1913
Added support for pandas version 1.2.0 #1708
Added score_batch and train_batch abstract methods to EngineBase and implementations in SequentialEngine #1913
Added ability to handle index columns in AutoMLSearch and DataChecks #2138
- Fixes
Removed CI check for check_dependencies_updated_linux #1950
Added metaclass for time series pipelines and fix binary classification pipeline predict not using objective if it is passed as a named argument #1874
Fixed stack trace in prediction explanation functions caused by mixed string/numeric pandas column names #1871
Fixed stack trace caused by passing pipelines with duplicate names to AutoMLSearch #1932
Fixed AutoMLSearch.get_pipelines returning pipelines with the same attributes #1958
- Changes
Reversed GitHub Action for Linux unit tests until a fix for report generation is found #1920
Updated add_results in AutoMLAlgorithm to take in entire pipeline results dictionary from AutoMLSearch #1891
Updated ClassImbalanceDataCheck to look for severe class imbalance scenarios #1905
Deleted the explain_prediction function #1915
Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead #1928
Removed warning in InvalidTargetDataCheck returned when numeric binary classification targets are not (0, 1) #1959
- Documentation Changes
Updated model_understanding.ipynb to demo the two-way partial dependence capability #1919
Testing Changes
Warning
- v0.19.0 Feb. 23, 2021
- Enhancements
Added a GitHub Action for Python windows unit tests #1844
Added a GitHub Action for checking updated release notes #1849
Added a GitHub Action for Python lint checks #1837
Adjusted explain_prediction, explain_predictions and explain_predictions_best_worst to handle timeseries problems. #1818
Updated InvalidTargetDataCheck to check for mismatched indices in target and features #1816
Updated Woodwork structures returned from components to support Woodwork logical type overrides set by the user #1784
Updated estimators to keep track of input feature names during fit() #1794
Updated visualize_decision_tree to include feature names in output #1813
Added is_bounded_like_percentage property for objectives. If true, the calculate_percent_difference method will return the absolute difference rather than relative difference #1809
Added full error traceback to AutoMLSearch logger file #1840
Changed TargetEncoder to preserve custom indices in the data #1836
Refactored explain_predictions and explain_predictions_best_worst to only compute features once for all rows that need to be explained #1843
Added custom random undersampler data splitter for classification #1857
Updated OutliersDataCheck implementation to calculate the probability of having no outliers #1855
Added Engines pipeline processing API #1838
- Fixes
Changed EngineBase random_state arg to random_seed and same for user guide docs #1889
- Changes
Modified calculate_percent_difference so that division by 0 is now inf rather than nan #1809
Removed text_columns parameter from LSA and TextFeaturizer components #1652
Added random_seed as an argument to our automl/pipeline/component API. Using random_state will raise a warning #1798
Added DataCheckError message in InvalidTargetDataCheck if input target is None and removed exception raised #1866
Documentation Changes
Warning
- Breaking Changes
Added a deprecation warning to explain_prediction. It will be deleted in the next release. #1860
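The two #1809 entries in this release describe how a percent difference between a score and a baseline is reported: absolute for objectives bounded like a percentage, relative otherwise, with a zero baseline producing infinity instead of NaN. A simplified sketch under stated assumptions (greater scores are better; results are scaled by 100). percent_difference is a hypothetical function, not evalml's calculate_percent_difference.

```python
import math


def percent_difference(score, baseline, bounded_like_percentage=False):
    """Hypothetical sketch of a percent-improvement calculation.

    Assumes greater scores are better. For objectives already bounded like a
    percentage (0 to 1), the absolute gap is reported; otherwise the gap is
    relative to the baseline, and a zero baseline yields +/-inf instead of NaN.
    """
    difference = score - baseline
    if difference == 0:
        return 0.0
    if bounded_like_percentage:
        return 100 * difference
    if baseline == 0:
        # Relative change is undefined here; signal it with a signed infinity.
        return math.copysign(math.inf, difference)
    return 100 * difference / abs(baseline)


print(percent_difference(0.5, 0.0))  # inf
```

Returning infinity keeps the result orderable, whereas NaN compares false against everything and can silently drop out of rankings.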
- v0.18.2 Feb. 10, 2021
- Enhancements
Added uniqueness score data check #1785
Added “dataframe” output format for prediction explanations #1781
Updated LightGBM estimators to handle pandas.MultiIndex #1770
Sped up permutation importance for some pipelines #1762
Added sparsity data check #1797
Confirmed support for threshold tuning for binary time series classification problems #1803
Fixes
Changes
- Documentation Changes
Added section on conda to the contributing guide #1771
Updated release process to reflect freezing main before perf tests #1787
Moved some PRs to the right section of the release notes #1789
Tweak README.md. #1800
Fixed back arrow on install page docs #1795
Fixed docstring for ClassImbalanceDataCheck.validate() #1817
Testing Changes
- v0.18.1 Feb. 1, 2021
- Enhancements
Added graph_t_sne as a visualization tool for high dimensional data #1731
Added the ability to see the linear coefficients of features in linear models terms #1738
Added support for scikit-learn v0.24.0 #1733
Added support for scipy v1.6.0 #1752
Added SVM Classifier and Regressor to estimators #1714 #1761
Testing Changes
Warning
- v0.18.0 Jan. 26, 2021
- Enhancements
Added RMSLE, MSLE, and MAPE to core objectives while checking for negative target values in invalid_targets_data_check #1574
Added validation checks for binary problems with regression-like datasets and multiclass problems without true multiclass targets in invalid_targets_data_check #1665
Added time series support for make_pipeline #1566
Added target name for output of pipeline predict method #1578
Added multiclass check to InvalidTargetDataCheck for two examples per class #1596
Added support for graphviz v0.16 #1657
Enhanced time series pipelines to accept empty features #1651
Added KNN Classifier to estimators. #1650
Added support for list inputs for objectives #1663
Added support for AutoMLSearch to handle time series classification pipelines #1666
Enhanced DelayedFeaturesTransformer to encode categorical features and targets before delaying them #1691
Added 2-way dependence plots. #1690
Added ability to directly iterate through components within Pipelines #1583
- Fixes
Fixed inconsistent attributes and added Exceptions to docs #1673
Fixed TargetLeakageDataCheck to use Woodwork mutual_information rather than using Pandas’ Pearson Correlation #1616
Fixed thresholding for pipelines in AutoMLSearch to only threshold binary classification pipelines #1622 #1626
Updated load_data to return Woodwork structures and update default parameter value for index to None #1610
Pinned scipy at < 1.6.0 while we work on adding support #1629
Fixed data check message formatting in AutoMLSearch #1633
Addressed stacked ensemble component for scikit-learn v0.24 support by setting shuffle=True for default CV #1613
Fixed bug where Imputer reset the index on X #1590
Fixed AutoMLSearch stacktrace when a custom objective was passed in as a primary objective or additional objective #1575
Fixed custom index bug for MAPE objective #1641
Fixed index bug for TextFeaturizer and LSA components #1644
Limited load_fraud dataset loaded into automl.ipynb #1646
add_to_rankings updates AutoMLSearch.best_pipeline when necessary #1647
Fixed bug where time series baseline estimators were not receiving gap and max_delay in AutoMLSearch #1645
Fixed jupyter notebooks to help the RTD buildtime #1654
Added positive_only objectives to non_core_objectives #1661
Fixed stacking argument n_jobs for IterativeAlgorithm #1706
Updated CatBoost estimators to return self in .fit() rather than the underlying model for consistency #1701
Added ability to initialize pipeline parameters in AutoMLSearch constructor #1676
- Changes
Added labeling to graph_confusion_matrix #1632
Rerunning search for AutoMLSearch results in a message thrown rather than failing the search, and removed has_searched property #1647
Changed tuner class to allow and ignore single parameter values as input #1686
Capped LightGBM version limit to remove bug in docs #1711
Removed support for np.random.RandomState in EvalML #1727
- Documentation Changes
Update Model Understanding in the user guide to include visualize_decision_tree #1678
Updated docs to include information about AutoMLSearch callback parameters and methods #1577
Updated docs to prompt users to install graphviz on Mac #1656
Added infer_feature_types to the start.ipynb guide #1700
Added multicollinearity data check to API reference and docs #1707
Testing Changes
Warning
- Breaking Changes
Removed has_searched property from AutoMLSearch #1647
Components and pipelines return Woodwork data structures instead of pandas data structures #1668
Removed support for np.random.RandomState in EvalML. Rather than passing np.random.RandomState as component and pipeline random_state values, we use int random_seed #1727
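The #1691 entry in this release says DelayedFeaturesTransformer encodes categorical features and targets before delaying them. The toy sketch below shows the general idea of lag ("delayed") features built from an integer-encoded categorical column. delayed_features and encode_then_delay are hypothetical helpers; the delay_N column names and the alphabetical category codes are assumptions, not evalml's behavior.

```python
def delayed_features(values, max_delay):
    """Build lag columns: row t gets values[t - d] for d = 1..max_delay, or None if t - d < 0."""
    return [
        {f"delay_{d}": values[t - d] if t - d >= 0 else None for d in range(1, max_delay + 1)}
        for t in range(len(values))
    ]


def encode_then_delay(values, max_delay):
    """Map categories to integer codes first, then build lags from the codes."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return delayed_features([codes[v] for v in values], max_delay)


rows = encode_then_delay(["low", "high", "low", "mid"], max_delay=2)
print(rows[3])  # {'delay_1': 1, 'delay_2': 0}
```

Encoding first means every lag column is numeric, so downstream estimators never receive raw category strings.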
- v0.17.0 Dec. 29, 2020
- Enhancements
Added save_plot that allows for saving figures from different backends #1588
Added LightGBM Regressor to regression components #1459
Added visualize_decision_tree for tree visualization with decision_tree_data_from_estimator and decision_tree_data_from_pipeline to reformat tree structure output #1511
Added DFS Transformer component into transformer components #1454
Added MAPE to the standard metrics for time series problems and update objectives #1510
Added graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data to the model understanding module for time series problems #1483
Added a ComponentGraph class that will support future pipelines as directed acyclic graphs #1415
Updated data checks to accept Woodwork data structures #1481
Added parameter to InvalidTargetDataCheck to show only top unique values rather than all unique values #1485
Added multicollinearity data check #1515
Added baseline pipeline and components for time series regression problems #1496
Added more information to users about ensembling behavior in AutoMLSearch #1527
Add woodwork support for more utility and graph methods #1544
Changed DateTimeFeaturizer to encode features as int #1479
Return trained pipelines from AutoMLSearch.best_pipeline #1547
Added utility method so that users can set feature types without having to learn about Woodwork directly #1555
Added Linear Discriminant Analysis transformer for dimensionality reduction #1331
Added multiclass support for partial_dependence and graph_partial_dependence #1554
Added TimeSeriesBinaryClassificationPipeline and TimeSeriesMulticlassClassificationPipeline classes #1528
Added make_data_splitter method for easier automl data split customization #1568
Integrated ComponentGraph class into Pipelines for full non-linear pipeline support #1543
Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597
Update split_data helper args #1597
Add problem type utils is_regression, is_classification, is_timeseries #1597
Rename AutoMLSearch data_split arg to data_splitter #1569
- Fixes
Fix AutoML not passing CV folds to DefaultDataChecks for usage by ClassImbalanceDataCheck #1619
Fix Windows CI jobs: install numba via conda, required for shap #1490
Added custom-index support for reset-index-get_prediction_vs_actual_over_time_data #1494
Fix generate_pipeline_code to account for boolean and None differences between Python and JSON #1524 #1531
Set max value for plotly and xgboost versions while we debug CI failures with newer versions #1532
Undo version pinning for plotly #1533
Fix ReadTheDocs build by updating the version of setuptools #1561
Set random_state of data splitter in AutoMLSearch to take int to keep consistency in the resulting splits #1579
Pin sklearn version while we work on adding support #1594
Pin pandas at <1.2.0 while we work on adding support #1609
Pin graphviz at < 0.16 while we work on adding support #1609
- Changes
Reverting save_graph #1550 to resolve kaleido build issues #1585
Update circleci badge to apply to main #1489
Added script to generate github markdown for releases #1487
Updated selection using pandas dtypes to selecting using Woodwork logical types #1551
Updated dependencies to fix ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' error and to address Woodwork and Featuretools dependencies #1540
Made get_prediction_vs_actual_data() a public method #1553
Updated Woodwork version requirement to v0.0.7 #1560
Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597
Rename “# Testing” in automl log output to “# Validation” #1597
- Testing Changes
Set n_jobs=1 in most unit tests to reduce memory #1505
Warning
- Breaking Changes
Updated minimal dependencies: numpy>=1.19.1, pandas>=1.1.0, scikit-learn>=0.23.1, scikit-optimize>=0.8.1
Updated AutoMLSearch.best_pipeline to return a trained pipeline. Pass in train_best_pipeline=False to AutoMLSearch in order to return an untrained pipeline.
Pipeline component instances can no longer be iterated through using Pipeline.component_graph #1543
Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597
Update split_data helper args #1597
Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597
Rename AutoMLSearch data_split arg to data_splitter #1569
- v0.16.1 Dec. 1, 2020
- v0.16.0 Nov. 24, 2020
- Enhancements
Updated pipelines and make_pipeline to accept Woodwork inputs #1393
Updated components to accept Woodwork inputs #1423
Added ability to freeze hyperparameters for AutoMLSearch #1284
Added Target Encoder into transformer components #1401
Added callback for error handling in AutoMLSearch #1403
Added the index id to the explain_predictions_best_worst output to help users identify which rows in their data are included #1365
The top_k features displayed in explain_predictions_* functions are now determined by the magnitude of shap values as opposed to the top_k largest and smallest shap values. #1374
Added a problem type for time series regression #1386
Added an is_defined_for_problem_type method to ObjectiveBase #1386
Added a random_state parameter to make_pipeline_from_components function #1411
Added DelayedFeaturesTransformer #1396
Added a TimeSeriesRegressionPipeline class #1418
Removed core-requirements.txt from the package distribution #1429
Updated data check messages to include “code” and “details” fields #1451, #1462
Added a TimeSeriesSplit data splitter for time series problems #1441
Added a problem_configuration parameter to AutoMLSearch #1457
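The TimeSeriesSplit splitter above keeps validation data strictly after training data. As a rough illustration only (this is not evalml's implementation, and expanding_window_splits and its parameters are hypothetical), an expanding-window splitter over time-ordered rows can be sketched in plain Python: each fold trains on every row before a cutoff and validates on the next block.

```python
from typing import List, Tuple


def expanding_window_splits(n_rows: int, n_splits: int) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical expanding-window splitter for rows already sorted by time.

    The rows are cut into n_splits + 1 roughly equal blocks; fold k trains on
    blocks 0..k-1 and validates on block k, so validation rows always come later.
    """
    if n_splits < 1 or n_rows < n_splits + 1:
        raise ValueError("need n_splits >= 1 and at least n_splits + 1 rows")
    block = n_rows // (n_splits + 1)
    splits = []
    for k in range(1, n_splits + 1):
        train_end = block * k
        # The last fold absorbs any remainder rows so that every row is used.
        test_end = n_rows if k == n_splits else train_end + block
        splits.append((list(range(train_end)), list(range(train_end, test_end))))
    return splits
```

Because the training window only grows, later folds see more history, which mirrors how a forecasting model is retrained over time.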
- Fixes
Fixed IndexError raised in AutoMLSearch when ensembling = True but only one pipeline to iterate over #1397
Fixed stacked ensemble input bug and LightGBM warning and bug in AutoMLSearch #1388
Updated enum classes to show possible enum values as attributes #1391
Updated calls to Woodwork’s to_pandas() to to_series() and to_dataframe() #1428
Fixed bug in OHE where column names were not guaranteed to be unique #1349
Fixed bug with percent improvement of ExpVariance objective on data with highly skewed target #1467
Fix SimpleImputer error which occurs when all features are bool type #1215
- Changes
Changed OutliersDataCheck to return the list of columns, rather than rows, that contain outliers #1377
Simplified and cleaned output for Code Generation #1371
Updated data checks to return a dictionary of warnings and errors instead of a list #1448
Updated AutoMLSearch to pass Woodwork data structures to every pipeline (instead of pandas DataFrames) #1450
Update AutoMLSearch to default to max_batches=1 instead of max_iterations=5 #1452
Updated _evaluate_pipelines to consolidate side effects #1410
- Documentation Changes
Added description of CLA to contributing guide, updated description of draft PRs #1402
Updated documentation to include all data checks, DataChecks, and usage of data checks in AutoML #1412
Updated docstrings from np.array to np.ndarray #1417
Added section on stacking ensembles in AutoMLSearch documentation #1425
- Testing Changes
Removed category_encoders from test-requirements.txt #1373
Tweak codecov.io settings again to avoid flakes #1413
Modified make lint to check notebook versions in the docs #1431
Modified make lint-fix to standardize notebook versions in the docs #1431
Use new version of pull request Github Action for dependency check (#1443)
Reduced number of workers for tests to 4 #1447
Warning
- Breaking Changes
The top_k and top_k_features parameters in explain_predictions_* functions now return k features as opposed to 2 * k features #1374
Renamed problem_type to problem_types in RegressionObjective, BinaryClassificationObjective, and MulticlassClassificationObjective #1319
Data checks now return a dictionary of warnings and errors instead of a list #1448
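To illustrate the top_k change (#1374), here is a sketch over a hypothetical mapping of feature names to SHAP values; the helper names and the values are made up and are not evalml's API. The old selection kept the k largest and the k smallest values, so up to 2 * k features, while the new selection keeps the k features with the largest absolute SHAP value.

```python
from typing import Dict, List


def old_top_k(shap_values: Dict[str, float], k: int) -> List[str]:
    """Hypothetical pre-#1374 selection: k largest plus k smallest SHAP values."""
    ordered = sorted(shap_values, key=shap_values.get)  # ascending by SHAP value
    largest = ordered[-k:][::-1]
    smallest = ordered[:k]
    # dict.fromkeys drops repeats (when fewer than 2 * k features exist) and keeps order.
    return list(dict.fromkeys(largest + smallest))


def new_top_k(shap_values: Dict[str, float], k: int) -> List[str]:
    """Hypothetical post-#1374 selection: k features with the largest |SHAP value|."""
    return sorted(shap_values, key=lambda name: abs(shap_values[name]), reverse=True)[:k]


values = {"age": 0.40, "income": -0.90, "tenure": 0.05, "region": -0.10, "plan": 0.25}
old_features = old_top_k(values, 2)  # four features: age, plan, income, region
new_features = new_top_k(values, 2)  # two features: income, age
```

Ranking by magnitude means a strongly negative contribution such as income is treated the same as an equally strong positive one.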
- v0.15.0 Oct. 29, 2020
- Enhancements
Added stacked ensemble component classes (StackedEnsembleClassifier, StackedEnsembleRegressor) #1134
Added stacked ensemble components to AutoMLSearch #1253
Added DecisionTreeClassifier and DecisionTreeRegressor to AutoML #1255
Added graph_prediction_vs_actual in model_understanding for regression problems #1252
Added parameter to OneHotEncoder to enable filtering which features to encode #1249
Added percent-better-than-baseline for all objectives to automl.results #1244
Added HighVarianceCVDataCheck and replaced synonymous warning in AutoMLSearch #1254
Added PCA Transformer component for dimensionality reduction #1270
Added generate_pipeline_code and generate_component_code to allow for code generation given a pipeline or component instance #1306
Updated AutoMLSearch to support Woodwork data structures #1299
Added cv_folds to ClassImbalanceDataCheck and added this check to DefaultDataChecks #1333
Make max_batches argument to AutoMLSearch.search public #1320
Added text support to automl search #1062
Added _pipelines_per_batch as a private argument to AutoMLSearch #1355
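HighVarianceCVDataCheck flags search results whose cross-validation scores vary too much from fold to fold. A minimal sketch, assuming the check compares the coefficient of variation |std / mean| of the fold scores against a threshold; the function name and the default threshold of 0.2 are assumptions for illustration, not confirmed evalml values.

```python
import statistics
from typing import Sequence


def has_high_variance_cv(fold_scores: Sequence[float], threshold: float = 0.2) -> bool:
    """Hypothetical check: True when |population stdev / mean| of fold scores exceeds threshold."""
    mean = statistics.mean(fold_scores)
    spread = statistics.pstdev(fold_scores)
    if mean == 0:
        # The coefficient of variation is undefined at a zero mean; treat any spread as high.
        return spread > 0
    return abs(spread / mean) > threshold
```

Scaling by the mean makes the check independent of the objective's units, at the cost of being unstable when scores hover around zero.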
- Fixes
Fixed ML performance issue with ordered datasets: always shuffle data in automl’s default CV splits #1265
Fixed broken evalml info CLI command #1293
Fixed boosting type='rf' for LightGBM Classifier, as well as num_leaves error #1302
Fixed bug in explain_predictions_best_worst where a custom index in the target variable would cause a ValueError #1318
Added stacked ensemble estimators to evalml.pipelines.__init__ file #1326
Fixed bug in OHE where calls to transform were not deterministic if top_n was less than the number of categories in a column #1324
Fixed LightGBM warning messages during AutoMLSearch #1342
Fix warnings thrown during AutoMLSearch in HighVarianceCVDataCheck #1346
Fixed bug where TrainingValidationSplit would return invalid location indices for dataframes with a custom index #1348
Fixed bug where the AutoMLSearch random_state was not being passed to the created pipelines #1321
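The OneHotEncoder fix (#1324) concerns which categories are kept when top_n is smaller than the number of categories. One way to make that choice deterministic, sketched here with a hypothetical helper rather than evalml's actual code, is to rank by descending frequency and break ties by the category value, so the result never depends on row order.

```python
from collections import Counter
from typing import Iterable, List


def top_n_categories(values: Iterable[str], top_n: int) -> List[str]:
    """Hypothetical helper: the top_n most frequent categories, ties broken alphabetically."""
    counts = Counter(values)
    # Sort key: higher count first, then category name, so tied categories are
    # ordered the same way no matter which one appeared first in the data.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:top_n]]
```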
- Changes
Allow add_to_rankings to be called before AutoMLSearch is called #1250
Removed Graphviz from test-requirements to add to requirements.txt #1327
Removed max_pipelines parameter from AutoMLSearch #1264
Include editable installs in all install make targets #1335
Made pip dependencies featuretools and nlp_primitives core dependencies #1062
Removed PartOfSpeechCount from TextFeaturizer transform primitives #1062
Added warning for partial_dependency when the feature includes null values #1352
- Documentation Changes
Fixed and updated code blocks in Release Notes #1243
Added DecisionTree estimators to API Reference #1246
Changed class inheritance display to flow vertically #1248
Updated cost-benefit tutorial to use a holdout/test set #1159
Added evalml info command to documentation #1293
Miscellaneous doc updates #1269
Removed conda pre-release testing from the release process document #1282
Updates to contributing guide #1310
Added Alteryx footer to docs with Twitter and Github link #1312
Added documentation for evalml installation for Python 3.6 #1322
Added documentation changes to make the API Docs easier to understand #1323
Fixed documentation for feature_importance #1353
Added tutorial for running AutoML with text data #1357
Added documentation for woodwork integration with automl search #1361
- Testing Changes
Added tests for jupyter_check to handle IPython #1256
Cleaned up make_pipeline tests to test for all estimators #1257
Added a test to check conda build after merge to main #1247
Removed unnecessary code in __main__.py that was lacking codecov coverage #1293
Codecov: round coverage up instead of down #1334
Add DockerHub credentials to CI testing environment #1356
Add DockerHub credentials to conda testing environment #1363
Warning
- Breaking Changes
Renamed LabelLeakageDataCheck to TargetLeakageDataCheck #1319
max_pipelines parameter has been removed from AutoMLSearch. Please use max_iterations instead. #1264
AutoMLSearch.search() will now log a warning if the input is not a Woodwork data structure (pandas, numpy) #1299
Make max_batches argument to AutoMLSearch.search public #1320
Removed unused argument feature_types from AutoMLSearch.search #1062
- v0.14.1 Sep. 29, 2020
- Enhancements
Updated partial dependence methods to support calculating numeric columns in a dataset with non-numeric columns #1150
Added get_feature_names on OneHotEncoder #1193
Added detect_problem_type to problem_type/utils.py to automatically detect the problem type given targets #1194
Added LightGBM to AutoMLSearch #1199
Updated scikit-learn and scikit-optimize to use the latest versions, 0.23.2 and 0.8.1 respectively #1141
Added __str__ and __repr__ for pipelines and components #1218
Included internal target check for both training and validation data in AutoMLSearch #1226
Added ProblemTypes.all_problem_types helper to get list of supported problem types #1219
Added DecisionTreeClassifier and DecisionTreeRegressor classes #1223
DataChecks can now be parametrized by passing a list of DataCheck classes and a parameter dictionary #1167
Added first CV fold score as validation score in AutoMLSearch.rankings #1221
Updated flake8 configuration to enable linting on __init__.py files #1234
Refined make_pipeline_from_components implementation #1204
- Changes
Added allow_writing_files as a named argument to CatBoost estimators. #1202
Added solver and multi_class as named arguments to LogisticRegressionClassifier #1202
Replaced the pipeline’s ._transform method, which evaluates all the preprocessing steps of a pipeline, with .compute_estimator_features #1231
Changed default large dataset train/test splitting behavior #1205
- Documentation Changes
Included description of how to access the component instances and features in the pipeline user guide #1163
Updated API docs to refer to target as “target” instead of “labels” for non-classification tasks and minor docs cleanup #1160
Added Class Imbalance Data Check to api_reference.rst #1190 #1200
Added pipeline properties to API reference #1209
Clarified what the objective parameter in AutoML is used for in AutoML API reference and AutoML user guide #1222
Updated API docs to include skopt.space.Categorical option for component hyperparameter range definition #1228
Added install documentation for libomp in order to use LightGBM on Mac #1233
Improved description of max_iterations in documentation #1212
Removed unused code from sphinx conf #1235
Testing Changes
Warning
- Breaking Changes
DefaultDataChecks now accepts a problem_type parameter that must be specified #1167
The pipeline’s ._transform method, which evaluates all the preprocessing steps of a pipeline, has been replaced with .compute_estimator_features #1231
get_objectives has been renamed to get_core_objectives. This function will now return a list of valid objective instances #1230
- v0.13.2 Sep. 17, 2020
- Enhancements
Added output_format field to explain predictions functions #1107
Modified get_objective and get_objectives to be able to return any objective in evalml.objectives #1132
Added a return_instance boolean parameter to get_objective #1132
Added ClassImbalanceDataCheck to determine whether target imbalance falls below a given threshold #1135
Added label encoder to LightGBM for binary classification #1152
Added labels for the row index of confusion matrix #1154
Added AutoMLSearch object as another parameter in search callbacks #1156
Added the corresponding probability threshold for each point displayed in graph_roc_curve #1161
Added __eq__ for ComponentBase and PipelineBase #1178
Added support for multiclass classification for roc_curve #1164
Added categories accessor to OneHotEncoder for listing the categories associated with a feature #1182
Added utility function to create pipeline instances from a list of component instances #1176
- Fixes
Fixed XGBoost column names for partial dependence methods #1104
Removed dead code validating column type from TextFeaturizer #1122
Fixed issue where Imputer cannot fit when there is None in a categorical or boolean column #1144
OneHotEncoder preserves the custom index in the input data #1146
Fixed representation for ModelFamily #1165
Removed duplicate nbsphinx dependency in dev-requirements.txt #1168
Users can now pass in any valid kwargs to all estimators #1157
Remove broken accessor OneHotEncoder.get_feature_names and unneeded base class #1179
Removed LightGBM Estimator from AutoML models #1186
- Documentation Changes
Fixed API docs for AutoMLSearch add_result_callback #1113
Added a step to our release process for pushing our latest version to conda-forge #1118
Added warning for missing ipywidgets dependency for using PipelineSearchPlots on Jupyterlab #1145
Updated README.md example to load demo dataset #1151
Swapped mapping of breast cancer targets in model_understanding.ipynb #1170
Warning
- Breaking Changes
get_objective will now return a class definition rather than an instance by default #1132
Deleted OPTIONS dictionary in evalml.objectives.utils.py #1132
If specifying an objective by string, the string must now match the objective’s name field, case-insensitive #1132
- Passing “Cost Benefit Matrix”, “Fraud Cost”, “Lead Scoring”, “Mean Squared Log Error”,
“Recall”, “Recall Macro”, “Recall Micro”, “Recall Weighted”, or “Root Mean Squared Log Error” to AutoMLSearch will now result in a ValueError rather than an ObjectiveNotFoundError #1132
Search callbacks start_iteration_callback and add_results_callback have changed to include a copy of the AutoMLSearch object as a third parameter #1156
Deleted OneHotEncoder.get_feature_names method which had been broken for a while, in favor of pipelines’ input_feature_names #1179
Deleted empty base class CategoricalEncoder which the OneHotEncoder component was inheriting from #1176
roc_curve will now return its results as a list of dictionaries, with each dictionary representing a class #1164
max_pipelines now raises a DeprecationWarning and will be removed in the next release. max_iterations should be used instead. #1169
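The #1132 changes describe a name-based objective lookup: strings must match an objective's name field case-insensitively, and a class is returned unless return_instance is set. The registry below is a self-contained, hypothetical sketch of those rules; ObjectiveSketch, its subclasses, and get_objective_sketch are stand-ins rather than evalml code, and the local ObjectiveNotFoundError only borrows the error name used in the notes.

```python
from typing import Dict, Type, Union


class ObjectiveNotFoundError(Exception):
    """Local stand-in for the error named in the release notes."""


class ObjectiveSketch:
    name = "base"


class LogLossBinary(ObjectiveSketch):
    name = "Log Loss Binary"


class F1(ObjectiveSketch):
    name = "F1"


# Keyed by the lower-cased name field, which makes string lookups case-insensitive.
_REGISTRY: Dict[str, Type[ObjectiveSketch]] = {cls.name.lower(): cls for cls in (LogLossBinary, F1)}


def get_objective_sketch(
    objective: Union[str, ObjectiveSketch], return_instance: bool = False
) -> Union[Type[ObjectiveSketch], ObjectiveSketch]:
    """Hypothetical lookup: returns the class by default, an instance if return_instance=True."""
    if isinstance(objective, ObjectiveSketch):
        return objective  # already an instance; pass it through unchanged
    key = objective.lower()
    if key not in _REGISTRY:
        raise ObjectiveNotFoundError(f"{objective} is not a valid objective")
    cls = _REGISTRY[key]
    return cls() if return_instance else cls
```

Returning the class by default leaves it to the caller to decide how, and whether, to instantiate the objective.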
- v0.13.1 Aug. 25, 2020
- Enhancements
Added Cost-Benefit Matrix objective for binary classification #1038
Split fill_value into categorical_fill_value and numeric_fill_value for Imputer #1019
Added explain_predictions and explain_predictions_best_worst for explaining multiple predictions with SHAP #1016
Added new LSA component for text featurization #1022
Added guide on installing with conda #1041
Added a “cost-benefit curve” util method to graph cost-benefit matrix scores vs. binary classification thresholds #1081
Standardized error when calling transform/predict before fit for pipelines #1048
Added percent_better_than_baseline to AutoML search rankings and full rankings table #1050
Added one-way partial dependence and partial dependence plots #1079
Added “Feature Value” column to prediction explanation reports. #1064
Added max_batches parameter to AutoMLSearch #1087
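The Cost-Benefit Matrix objective (#1038), later normalized per #1099, assigns a value to each cell of the binary confusion matrix. A minimal sketch, assuming the score is the total value of all predictions divided by the number of samples; the function and parameter names are illustrative, not evalml's signature.

```python
from typing import Sequence


def cost_benefit_score(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    true_positive: float,
    true_negative: float,
    false_positive: float,
    false_negative: float,
) -> float:
    """Hypothetical normalized cost-benefit score for binary labels (1 = positive class)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            total += true_positive
        elif actual == 0 and predicted == 0:
            total += true_negative
        elif actual == 0 and predicted == 1:
            total += false_positive
        else:
            total += false_negative
    # Dividing by the sample count keeps scores comparable across dataset sizes.
    return total / len(y_true)
```

For example, with one prediction in each cell and values 10, 0, -2 and -5, the score is (10 + 0 - 2 - 5) / 4 = 0.75.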
- Fixes
Updated TextFeaturizer component to no longer require an internet connection to run #1022
Fixed non-deterministic element of TextFeaturizer transformations #1022
Added a StandardScaler to all ElasticNet pipelines #1065
Updated cost-benefit matrix to normalize score #1099
Fixed logic in calculate_percent_difference so that it can handle negative values #1100
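The calculate_percent_difference fix (#1100) concerns negative scores: dividing by a negative baseline flips the sign of the change. The usual remedy, dividing by the baseline's absolute value, is sketched below as an illustration of the idea rather than evalml's exact implementation; in particular, the zero-baseline and NaN handling are assumptions.

```python
import math


def percent_difference(score: float, baseline_score: float) -> float:
    """Hypothetical sketch: relative change from baseline_score to score, in percent."""
    if math.isnan(score) or math.isnan(baseline_score):
        return math.nan
    if score == baseline_score:
        return 0.0
    if baseline_score == 0:
        # Any change from an exactly-zero baseline is unbounded in relative terms.
        return math.inf if score > baseline_score else -math.inf
    # abs() keeps a positive result meaning "score is higher than baseline",
    # even when the baseline itself is negative.
    return (score - baseline_score) / abs(baseline_score) * 100
```

With a baseline of -10 and a score of -5, dividing by the raw baseline would report -50%, while the absolute-value version correctly reports +50%.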
- Changes
Added needs_fitting property to ComponentBase #1044
Updated references to data types to use datatype lists defined in evalml.utils.gen_utils #1039
Remove maximum version limit for SciPy dependency #1051
Moved all_components and other component importers into runtime methods #1045
Consolidated graphing utility methods under evalml.utils.graph_utils #1060
Made slight tweaks to how TextFeaturizer uses featuretools, and did some refactoring of that and of LSA #1090
Changed show_all_features parameter into importance_threshold, which allows for thresholding feature importance #1097, #1103
Warning
- v0.12.2 Aug. 6, 2020
- v0.12.0 Aug. 3, 2020
- Enhancements
Added string and categorical targets support for binary and multiclass pipelines and check for numeric targets for DetectLabelLeakage data check #932
Added clear exception for regression pipelines if target datatype is string or categorical #960
Added target column names and class labels in predict and predict_proba output for pipelines #951
Added _compute_shap_values and normalize_values to pipelines/explanations module #958
Added explain_prediction feature which explains single predictions with SHAP #974
Added Imputer to allow different imputation strategies for numerical and categorical dtypes #991
Added support for configuring logfile path using env var, and don’t create logger if there are filesystem errors #975
Updated catboost estimators’ default parameters and automl hyperparameter ranges to speed up fit time #998
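The Imputer added in #991 applies different strategies to numeric and categorical columns. A stdlib-only sketch of that idea, using a dict of column lists in place of a DataFrame; the function name and the choice of mean for numeric columns and most frequent value for the rest are illustrative assumptions, not evalml's defaults.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Union

Value = Union[float, str, None]


def impute_columns(columns: Dict[str, List[Value]]) -> Dict[str, List[Value]]:
    """Hypothetical sketch: fill None with the mean (numeric) or most frequent value (other)."""
    imputed: Dict[str, List[Value]] = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if not present:
            imputed[name] = list(values)  # nothing to learn a fill value from
            continue
        if all(isinstance(v, (int, float)) for v in present):
            fill: Value = mean(present)  # numeric strategy
        else:
            fill = Counter(present).most_common(1)[0][0]  # categorical strategy
        imputed[name] = [fill if v is None else v for v in values]
    return imputed
```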
- Fixes
Fixed ReadtheDocs warning failure regarding embedded gif #943
Removed incorrect parameter passed to pipeline classes in _add_baseline_pipelines #941
Added universal error for calling predict, predict_proba, transform, and feature_importances before fitting #969, #994
Made TextFeaturizer component and pip dependencies featuretools and nlp_primitives optional #976
Updated imputation strategy in automl to no longer limit impute strategy to most_frequent for all features if there are any categorical columns #991
Fixed UnboundLocalError for cv_pipeline when automl search errors #996
Fixed Imputer to reset dataframe index to preserve behavior expected from SimpleImputer #1009
- Changes
Moved get_estimators to evalml.pipelines.components.utils #934
Modified Pipelines to raise PipelineScoreError when they encounter an error during scoring #936
Moved evalml.model_families.list_model_families to evalml.pipelines.components.allowed_model_families #959
Renamed DateTimeFeaturization to DateTimeFeaturizer #977
Added check to stop search and raise an error if all pipelines in a batch return NaN scores #1015
- Documentation Changes
Updated README.md #963
Reworded message when errors are returned from data checks in search #982
Added section on understanding model predictions with explain_prediction to User Guide #981
Added a section to the user guide and api reference about how XGBoost and CatBoost are not fully supported. #992
Added custom components section in user guide #993
Updated FAQ section formatting #997
Updated release process documentation #1003
Warning
- Breaking Changes
get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
Removed the raise_errors flag in AutoML search. All errors during pipeline evaluation will be caught and logged. #936
evalml.model_families.list_model_families has been moved to evalml.pipelines.components.allowed_model_families #959
TextFeaturizer: the featuretools and nlp_primitives packages must be installed after installing evalml in order to use this component #976
Renamed DateTimeFeaturization to DateTimeFeaturizer #977
- v0.11.2 July 16, 2020
- Enhancements
Added NoVarianceDataCheck to DefaultDataChecks #893
Added text processing and featurization component TextFeaturizer #913, #924
Added additional checks to InvalidTargetDataCheck to handle invalid target data types #929
AutoMLSearch will now handle KeyboardInterrupt and prompt user for confirmation #915
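NoVarianceDataCheck, added to DefaultDataChecks in #893, targets columns that carry no information for a model. A minimal sketch, assuming a column is flagged when it has at most one distinct non-null value; the function name and that exact rule are assumptions, and the real check may treat nulls or the target column differently.

```python
from typing import Dict, Hashable, List, Sequence


def no_variance_columns(columns: Dict[str, Sequence[Hashable]]) -> List[str]:
    """Hypothetical sketch: names of columns with at most one distinct non-null value."""
    flagged = []
    for name, values in columns.items():
        distinct = {v for v in values if v is not None}
        if len(distinct) <= 1:
            flagged.append(name)
    return flagged
```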
- Fixes
Makes automl results a read-only property #919
- Changes
Deleted static pipelines and refactored tests involving static pipelines, removed all_pipelines() and get_pipelines() #904
Moved list_model_families to evalml.model_family.utils #903
Updated all_pipelines, all_estimators, all_components to use the same mechanism for dynamically generating their elements #898
Rename master branch to main #918
Add pypi release github action #923
Updated AutoMLSearch.search stdout output and logging and removed tqdm progress bar #921
Moved automl config checks previously in search() to init #933
- Testing Changes
Cleaned up fixture names and usages in tests #895
Warning
- Breaking Changes
list_model_families has been moved to evalml.model_family.utils (previously was under evalml.pipelines.utils) #903
get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
Static pipeline definitions have been removed, but similar pipelines can still be constructed via creating an instance of PipelineBase #904
all_pipelines() and get_pipelines() utility methods have been removed #904
- v0.11.0 June 30, 2020
- Enhancements
Added multiclass support for ROC curve graphing #832
Added preprocessing component to drop features whose percentage of NaN values exceeds a specified threshold #834
Added data check to check for problematic target labels #814
Added PerColumnImputer that allows imputation strategies per column #824
Added transformer to drop specific columns #827
Added support for categories, handle_error, and drop parameters in OneHotEncoder #830 #897
Added preprocessing component to handle DateTime columns featurization #838
Added ability to clone pipelines and components #842
Define getter method for component parameters #847
Added utility methods to calculate and graph permutation importances #860, #880
Added new utility functions necessary for generating dynamic preprocessing pipelines #852
Added kwargs to all components #863
Updated AutoSearchBase to use dynamically generated preprocessing pipelines #870
Added SelectColumns transformer #873
Added ability to evaluate additional pipelines for automl search #874
Added default_parameters class property to components and pipelines #879
Added better support for disabling data checks in automl search #892
Added ability to save and load AutoML objects to file #888
Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance #876
Saved learned binary classification thresholds in automl results cv data dict #876
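Permutation importance (#860) measures how much a model's score drops when one feature's values are shuffled, which breaks that feature's relationship with the target. The sketch below is model-agnostic and uses only the standard library; predict, the scoring function, and permutation_importance itself are hypothetical placeholders, not evalml's API, and the score is assumed to be higher-is-better.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = List[float]


def permutation_importance(
    predict: Callable[[Sequence[Row]], List[float]],
    score: Callable[[Sequence[float], Sequence[float]], float],
    X: Sequence[Row],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Hypothetical sketch: mean drop in a higher-is-better score after shuffling each column."""
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Copy each row with the shuffled value swapped in; other columns are unchanged.
            X_perm = [row[:col] + [value] + row[col + 1:] for row, value in zip(X, shuffled)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(mean(drops))
    return importances
```

A feature the model ignores gets an importance of exactly zero, because shuffling it leaves every prediction unchanged.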
- Fixes
Fixed bug where SimpleImputer cannot handle dropped columns #846
Fixed bug where PerColumnImputer cannot handle dropped columns #855
Enforce requirement that builtin components save all inputted values in their parameters dict #847
Don’t list base classes in all_components output #847
Standardize all components to output pandas data structures, and accept either pandas or numpy #853
Fixed rankings and full_rankings error when search has not been run #894
- Changes
Update all_pipelines and all_components to try initializing pipelines/components, and on failure exclude them #849
Refactor handle_components to handle_components_class, standardize to ComponentBase subclass instead of instance #850
Refactor “blacklist”/“whitelist” to “allow”/“exclude” lists #854
Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871
Renamed feature_importances and permutation_importances methods to use singular names (feature_importance and permutation_importance) #883
Updated automl default data splitter to train/validation split for large datasets #877
Added open source license, update some repo metadata #887
Removed dead code in _get_preprocessing_components #896
- Documentation Changes
Fix some typos and update the EvalML logo #872
Warning
- Breaking Changes
Pipelines’ static component_graph field must contain either ComponentBase subclasses or str, instead of ComponentBase subclass instances #850
Rename handle_component to handle_component_class. Now standardizes to ComponentBase subclasses instead of ComponentBase subclass instances #850
Renamed automl’s cv argument to data_split #877
Pipelines’ and classifiers’ feature_importances is renamed feature_importance, graph_feature_importances is renamed graph_feature_importance #883
Passing data_checks=None to automl search will not perform any data checks as opposed to default checks. #892
Pipelines to search for in AutoML are now determined automatically, rather than using the statically-defined pipeline classes. #870
Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance, instead of one which happened to be trained on the final cross-validation fold #876
- v0.10.0 May 29, 2020
- Enhancements
Added baseline models for classification and regression, and added functionality to calculate baseline models before searching in AutoML #746
Port over highly-null guardrail as a data check and define DefaultDataChecks and DisableDataChecks classes #745
Update Tuner classes to work directly with pipeline parameters dicts instead of flat parameter lists #779
Add Elastic Net as a pipeline option #812
Added new Pipeline option ExtraTrees #790
Added precision-recall curve metrics and plot for binary classification problems in evalml.pipeline.graph_utils #794
Update the default automl algorithm to search in batches, starting with default parameters for each pipeline and iterating from there #793
Added AutoMLAlgorithm class and IterativeAlgorithm impl, separated from AutoSearchBase #793
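The baseline models from #746 give AutoML a reference score that any useful pipeline should beat. A minimal sketch of the two most common choices, predicting the most frequent class for classification and the mean target for regression; the class names are hypothetical, evalml's baselines may offer other strategies, and predict takes a row count because these baselines ignore the features.

```python
from collections import Counter
from statistics import mean
from typing import Hashable, List, Sequence


class ModeBaselineClassifier:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, y: Sequence[Hashable]) -> "ModeBaselineClassifier":
        self.mode_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, n_rows: int) -> List[Hashable]:
        return [self.mode_] * n_rows


class MeanBaselineRegressor:
    """Hypothetical baseline: always predicts the mean of the training target."""

    def fit(self, y: Sequence[float]) -> "MeanBaselineRegressor":
        self.mean_ = mean(y)
        return self

    def predict(self, n_rows: int) -> List[float]:
        return [self.mean_] * n_rows
```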
- Fixes
Update pipeline score to return nan score for any objective which throws an exception during scoring #787
Fixed bug introduced in #787 where binary classification metrics requiring predicted probabilities raised an error during scoring #798
CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0 #795
- Changes
Cleanup pipeline score code, and cleanup codecov #711
Remove pass for abstract methods for codecov #730
Added __str__ for AutoSearch object #675
Add util methods to graph ROC and confusion matrix #720
Refactor AutoBase to AutoSearchBase #758
Updated AutoBase with data_checks parameter, removed previous detect_label_leakage parameter, and added functionality to run data checks before search in AutoML #765
Updated our logger to use Python’s logging utils #763
Refactor most of AutoSearchBase._do_iteration impl into AutoSearchBase._evaluate #762
Port over all guardrails to use the new DataCheck API #789
Expanded import_or_raise to catch all exceptions #759
Adds RMSE, MSLE, RMSLE as standard metrics #788
Don’t allow Recall to be used as an objective for AutoML #784
Removed feature selection from pipelines #819
Update default estimator parameters to make automl search faster and more accurate #793
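#788 adds RMSE, MSLE and RMSLE as standard metrics. For reference, their textbook definitions fit in a few lines of Python; the function names are illustrative, and MSLE and RMSLE assume values greater than -1 (in practice non-negative) because of the log1p terms.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def msle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared logarithmic error, using log1p(x) = log(1 + x)."""
    return sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared logarithmic error."""
    return math.sqrt(msle(y_true, y_pred))
```

The logarithmic variants penalize relative rather than absolute errors, which suits targets spanning several orders of magnitude.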
- Testing Changes
Delete codecov yml, use codecov.io’s default #732
Added unit tests for fraud cost, lead scoring, and standard metric objectives #741
Update codecov client #782
Updated AutoBase __str__ test to include no parameters case #783
Added unit tests for ExtraTrees pipeline #790
If codecov fails to upload, fail build #810
Updated Python version of dependency action #816
Update the dependency update bot to use a suffix when creating branches #817
Warning
- Breaking Changes
The detect_label_leakage parameter for AutoML classes has been removed and replaced by a data_checks parameter #765
Moved ROC and confusion matrix methods from evalml.pipeline.plot_utils to evalml.pipeline.graph_utils #720
Tuner classes require a pipeline hyperparameter range dict as an init arg instead of a space definition #779
Tuner.propose and Tuner.add work directly with pipeline parameters dicts instead of flat parameter lists #779
PipelineBase.hyperparameters and custom_hyperparameters use pipeline parameters dict format instead of being represented as a flat list #779
All guardrail functions previously under evalml.guardrails.utils will be removed and replaced by data checks #789
Recall disallowed as an objective for AutoML #784
AutoSearchBase parameter tuner has been renamed to tuner_class #793
AutoSearchBase parameters possible_pipelines and possible_model_families have been renamed to allowed_pipelines and allowed_model_families #793
- v0.9.0 Apr. 27, 2020
- Enhancements
Added Accuracy as a standard objective #624
Added verbose parameter to load_fraud #560
Added Balanced Accuracy metric for binary, multiclass #612 #661
Added XGBoost regressor and XGBoost regression pipeline #666
Added Accuracy metric for multiclass #672
Added objective name in AutoBase.describe_pipeline #686
Added DataCheck and DataChecks, Message classes and relevant subclasses #739
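Balanced Accuracy (#612, #661) averages recall over classes, so a model cannot score well just by predicting the majority class. A sketch of the standard definition, which covers both binary and multiclass labels; the function name is illustrative.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    totals = defaultdict(int)  # samples per true class
    hits = defaultdict(int)    # correctly predicted samples per true class
    for actual, predicted in zip(y_true, y_pred):
        totals[actual] += 1
        if actual == predicted:
            hits[actual] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

On labels [0, 0, 0, 1] with every prediction equal to 0, plain accuracy is 0.75 but balanced accuracy is 0.5, exposing the ignored minority class.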
- Fixes
Removed direct access to cls.component_graph #595
Add testing files to .gitignore #625
Remove circular dependencies from Makefile #637
Add error case for normalize_confusion_matrix() #640
Fixed XGBoostClassifier and XGBoostRegressor bug with feature names that contain [, ], or < #659
Update make_pipeline_graph to not accidentally create empty file when testing if path is valid #649
Fix pip installation warning about docutils version, from boto dependency #664
Removed zero division warning for F1/precision/recall metrics #671
Fixed summary for pipelines without estimators #707
- Changes
Updated default objective for binary/multiclass classification to log loss #613
Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes #405
Changed the output of score to return one dictionary #429
Created binary and multiclass objective subclasses #504
Updated objectives API #445
Removed call to get_plot_data from AutoML #615
Set raise_error to default to True for AutoML classes #638
Remove unnecessary “u” prefixes on some unicode strings #641
Changed one-hot encoder to return uint8 dtypes instead of ints #653
Pipeline _name field changed to custom_name #650
Removed graphs.py and moved methods into PipelineBase #657, #665
Remove s3fs as a dev dependency #664
Changed requirements-parser to be a core dependency #673
Replace supported_problem_types field on pipelines with problem_type attribute on base classes #678
Changed AutoML to only show best results for a given pipeline template in rankings, added full_rankings property to show all #682
Update ModelFamily values: don’t list xgboost/catboost as classifiers now that we have regression pipelines for them #677
Changed AutoML’s describe_pipeline to get problem type from pipeline instead #685
Standardize import_or_raise error messages #683
Updated argument order of objectives to align with sklearn’s #698
Renamed pipeline.feature_importance_graph to pipeline.graph_feature_importances #700
Moved ROC and confusion matrix methods to evalml.pipelines.plot_utils #704
Renamed MultiClassificationObjective to MulticlassClassificationObjective, to align with pipeline naming scheme #715
- Documentation Changes
Fixed some sphinx warnings #593
Fixed docstring for AutoClassificationSearch with correct command #599
Limit readthedocs formats to pdf, not htmlzip and epub #594 #600
Clean up objectives API documentation #605
Fixed function on Exploring search results page #604
Update release process doc #567
AutoClassificationSearch and AutoRegressionSearch show inherited methods in API reference #651
Fixed improperly formatted code in breaking changes for changelog #655
Added configuration to treat Sphinx warnings as errors #660
Removed separate plotting section for pipelines in API reference #657, #665
Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency #664
Categorized components in API reference and added descriptions for each category #663
Fixed Sphinx warnings about BalancedAccuracy objective #669
Updated API reference to include missing components and clean up pipeline docstrings #689
Reorganize API ref, and clarify pipeline sub-titles #688
Add and update preprocessing utils in API reference #687
Added inheritance diagrams to API reference #695
Documented which default objective AutoML optimizes for #699
Create separate install page #701
Include more utils in API ref, like import_or_raise #704
Add more color to pipeline documentation #705
- Testing Changes
Matched install commands of check_latest_dependencies test and its GitHub action #578
Added Github app to auto assign PR author as assignee #477
Removed unneeded conda installation of xgboost in windows checkin tests #618
Update graph tests to always use tmpfile dir #649
Changelog checkin test workaround for release PRs: If ‘future release’ section is empty of PR refs, pass check #658
Add changelog checkin test exception for dep-update branch #723
Warning
Breaking Changes
Pipelines will now no longer take an objective parameter during instantiation, and will no longer have an objective attribute.
fit() and predict() now use an optional objective parameter, which is only used in binary classification pipelines to fit for a specific objective.
score() will now use a required objectives parameter that is used to determine all the objectives to score on. This differs from the previous behavior, where the pipeline’s objective was scored on regardless.
score() will now return one dictionary of all objective scores.
ROC and ConfusionMatrix plot methods via Auto(*).plot have been removed by #615 and are replaced by roc_curve and confusion_matrix in evalml.pipelines.plot_utils in #704
normalize_confusion_matrix has been moved to evalml.pipelines.plot_utils #704
Pipelines _name field changed to custom_name
Pipelines supported_problem_types field is removed because it is no longer necessary #678
Updated argument order of objectives’ objective_function to align with sklearn #698
pipeline.feature_importance_graph has been renamed to pipeline.graph_feature_importances in #700
Removed unsupported MSLE objective #704
- v0.8.0 Apr. 1, 2020
- Enhancements
Add normalization option and information to confusion matrix #484
Add util function to drop rows with NaN values #487
Renamed PipelineBase.name as PipelineBase.summary and redefined PipelineBase.name as class property #491
Added access to parameters in Pipelines with PipelineBase.parameters (used to be return of PipelineBase.describe) #501
Added fill_value parameter for SimpleImputer #509
Added functionality to override component hyperparameters and made pipelines take hyperparameters from components #516
Allow numpy.random.RandomState for random_state parameters #556
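The normalization option added to the confusion matrix in #484 rescales raw counts into proportions. A stdlib sketch, assuming rows are true labels and columns are predictions; the method names "true", "pred" and "all" are modeled on common conventions and are an assumption, not a confirmed evalml signature.

```python
from typing import List

Matrix = List[List[float]]


def normalize_matrix(counts: Matrix, method: str = "true") -> Matrix:
    """Hypothetical sketch: normalize a confusion matrix by row, by column, or by the grand total.

    A row or column that sums to zero raises ZeroDivisionError here.
    """
    if method == "true":
        # Each row (true class) sums to 1, so entries become per-class recall rates.
        return [[c / sum(row) for c in row] for row in counts]
    if method == "pred":
        # Each column (predicted class) sums to 1, so the diagonal holds per-class precision.
        col_sums = [sum(col) for col in zip(*counts)]
        return [[c / s for c, s in zip(row, col_sums)] for row in counts]
    if method == "all":
        total = sum(sum(row) for row in counts)
        return [[c / total for c in row] for row in counts]
    raise ValueError(f"unknown normalization method: {method}")
```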
- Fixes
Removed unused dependency matplotlib, and move category_encoders to test reqs #572
- Changes
Undo version cap in XGBoost placed in #402 and allowed all released of XGBoost #407
Support pandas 1.0.0 #486
Made all references to the logger static #503
Refactored model_type parameter for components and pipelines to model_family #507
Refactored problem_types for pipelines and components into supported_problem_types #515
Moved pipelines/utils.save_pipeline and pipelines/utils.load_pipeline to PipelineBase.save and PipelineBase.load #526
Limit number of categories encoded by OneHotEncoder #517
Warning
Breaking Changes
AutoClassificationSearch and AutoRegressionSearch’s model_types parameter has been refactored into allowed_model_families
ModelTypes enum has been changed to ModelFamily
Components and Pipelines now have a model_family field instead of model_type
get_pipelines utility function now accepts model_families as an argument instead of model_types
PipelineBase.name no longer returns structure of pipeline and has been replaced by PipelineBase.summary
PipelineBase.problem_types and Estimator.problem_types have been renamed to supported_problem_types
pipelines/utils.save_pipeline and pipelines/utils.load_pipeline moved to PipelineBase.save and PipelineBase.load
- v0.7.0 Mar. 9, 2020
- Enhancements
Added emacs buffers to .gitignore #350
Add CatBoost (gradient-boosted trees) classification and regression components and pipelines #247
Added Tuner abstract base class #351
Added n_jobs as a parameter for AutoClassificationSearch and AutoRegressionSearch #403
Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn’s #426
Added PipelineBase.graph and .feature_importance_graph methods, moved from previous location #423
Added support for Python 3.8 #462
- Changes
Added n_estimators as a tunable parameter for XGBoost #307
Remove unused parameter ObjectiveBase.fit_needs_proba #320
Remove extraneous parameter component_type from all components #361
Remove unused rankings.csv file #397
Downloaded demo and test datasets so unit tests can run offline #408
Remove _needs_fitting attribute from Components #398
Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all #413
Refactored PipelineBase to take in parameter dictionary and moved pipeline metadata to class attribute #421
Dropped support for Python 3.5 #438
Removed unused apply.py file #449
Clean up requirements.txt to remove unused deps #451
Support installation without all required dependencies #459
- Documentation Changes
Update release.md with instructions to release to internal license key #354
- Testing Changes
Added tests for utils (and moved current utils to gen_utils) #297
Moved XGBoost install into its own separate step on Windows using Conda #313
Rewind pandas version to before 1.0.0, to diagnose test failures for that version #325
Added dependency update checkin test #324
Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version #402
Update dependency check to use a whitelist #417
Update unit test jobs to not install dev deps #455
Warning
Breaking Changes
Python 3.5 will not be actively supported.
- v0.6.0 Dec. 16, 2019
- Enhancements
Added ability to create a plot of feature importances #133
Add early stopping to AutoML using patience and tolerance parameters #241
Added ROC and confusion matrix metrics and plot for classification problems and introduce PipelineSearchPlots class #242
Enhanced AutoML results with search order #260
Added utility function to show system and environment information #300
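The patience and tolerance parameters added for early stopping in #241 can be sketched as a small check over the scores seen so far. This sketch assumes patience counts iterations without a sufficient improvement and tolerance is the minimum relative improvement that counts; the should_stop helper and its greater_is_better flag are hypothetical, not evalml's code.

```python
from typing import List, Optional


def should_stop(scores: List[float], patience: Optional[int],
                tolerance: float = 0.0, greater_is_better: bool = True) -> bool:
    """Hypothetical early-stopping check over the scores seen so far."""
    if patience is None:
        return False  # early stopping disabled
    if patience < 1:
        raise ValueError("patience must be a positive integer")
    best = None
    without_improvement = 0
    for s in scores:
        if best is None:
            best = s  # first score is the initial best
            continue
        # Change versus the best score so far, signed so that a positive
        # value always means "better".
        change = (s - best) if greater_is_better else (best - s)
        relative = change / abs(best) if best != 0 else change
        if relative > tolerance:
            best = s
            without_improvement = 0
        else:
            without_improvement += 1
    return without_improvement >= patience


print(should_stop([0.80, 0.81, 0.81, 0.81], patience=2))
```

With patience=2, the scores 0.80, 0.81, 0.81, 0.81 trigger a stop after the two non-improving iterations. Keeping best unchanged when a gain falls below tolerance is one design choice; it lets several small gains add up to a qualifying improvement.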
- Changes
Renamed automl classes to AutoRegressionSearch and AutoClassificationSearch #287
Updating demo datasets to retain column names #223
Moving pipeline visualization to PipelinePlot class #228
Standardizing inputs as pd.DataFrame / pd.Series #130
Enforcing that pipelines must have an estimator as last component #277
Added ipywidgets as a dependency in requirements.txt #278
Added Random and Grid Search Tuners #240
Warning
Breaking Changes
The fit() method for AutoClassifier and AutoRegressor has been renamed to search().
AutoClassifier has been renamed to AutoClassificationSearch
AutoRegressor has been renamed to AutoRegressionSearch
AutoClassificationSearch.results and AutoRegressionSearch.results are now dictionaries with pipeline_results and search_order keys. pipeline_results can be used to access a dictionary that is identical to the old .results dictionary, whereas search_order returns a list of the search order in terms of pipeline_id.
Pipelines now require an estimator as the last component in component_list. Slicing pipelines now throws a NotImplementedError to avoid returning pipelines without an estimator.
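The new shape of results can be illustrated with a hand-built dictionary. Only the two top-level keys, pipeline_results and search_order, come from the notes above; the assumption that pipeline_results is keyed by pipeline_id, and the placeholder strings standing in for each pipeline's result, are illustrative and not real evalml output.

```python
# Illustrative stand-in for AutoClassificationSearch.results after this release.
# Only the two top-level keys come from the release notes; the pipeline IDs and
# the placeholder strings are hypothetical.
results = {
    "pipeline_results": {
        0: "<result for pipeline 0>",
        1: "<result for pipeline 1>",
        2: "<result for pipeline 2>",
    },
    "search_order": [0, 1, 2],
}

# Code that previously read .results directly now goes one level deeper.
old_style_results = results["pipeline_results"]

# Collect the per-pipeline results in the order the search evaluated them.
ordered = [old_style_results[pipeline_id] for pipeline_id in results["search_order"]]
print(ordered)
```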
- v0.5.2 Nov. 18, 2019
- v0.5.1 Nov. 15, 2019
- v0.5.0 Oct. 29, 2019
- Enhancements
Added basic one hot encoding #73
Use enums for model_type #110
Support for splitting regression datasets #112
Auto-infer multiclass classification #99
Added support for other units in max_time #125
Detect highly null columns #121
Added additional regression objectives #100
Show an interactive iteration vs. score plot when using fit() #134
- v0.4.1 Sep. 16, 2019
- Enhancements
Added AutoML for classification and regression using Autobase and Skopt #7 #9
Implemented standard classification and regression metrics #7
Added logistic regression, random forest, and XGBoost pipelines #7
Implemented support for custom objectives #15
Feature importance for pipelines #18
Serialization for pipelines #19
Allow fitting on objectives for optimal threshold #27
Added detect label leakage #31
Implemented callbacks #42
Allow for multiclass classification #21
Added support for additional objectives #79
- Testing Changes
Added testing for loading data #39
- v0.2.0 Aug. 13, 2019
- Enhancements
Created fraud detection objective #4
- v0.1.0 Jul. 31, 2019