dimensionality_reduction

Transformers that reduce the dimensionality of the input data.

Submodules

Package Contents

Classes Summary

LinearDiscriminantAnalysis

Reduces the number of features by using Linear Discriminant Analysis.

PCA

Reduces the number of features by using Principal Component Analysis (PCA).

Contents

class evalml.pipelines.components.transformers.dimensionality_reduction.LinearDiscriminantAnalysis(n_components=None, random_seed=0, **kwargs)

Reduces the number of features by using Linear Discriminant Analysis.

Parameters
  • n_components (int) – The number of features to keep after computing LDA. LDA can produce at most min(n_classes - 1, n_features) components. Defaults to None.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
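As a rough illustration of what n_components controls, the sketch below implements Fisher's linear discriminant projection directly in NumPy. It is not evalml's implementation: the helpers fit_lda and transform_lda are hypothetical, and the input is assumed to be an all-numeric array with class labels. It also shows why LDA offers at most min(n_classes - 1, n_features) components.

```python
import numpy as np


def fit_lda(X, y, n_components=None):
    """Hypothetical helper (not part of evalml): learn Fisher LDA projection axes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_features = X.shape[1]
    # The between-class scatter matrix has rank <= n_classes - 1,
    # so at most min(n_classes - 1, n_features) useful axes exist.
    max_components = min(len(classes) - 1, n_features)
    if n_components is None:
        n_components = max_components
    if not 1 <= n_components <= max_components:
        raise ValueError(f"n_components must be between 1 and {max_components}")

    overall_mean = X.mean(axis=0)
    s_within = np.zeros((n_features, n_features))
    s_between = np.zeros((n_features, n_features))
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        s_within += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        s_between += len(X_c) * (diff @ diff.T)

    # Solve S_W^{-1} S_B w = lambda w and keep the axes with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_within) @ s_between)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real


def transform_lda(X, components):
    """Hypothetical helper: project data onto the learned discriminant axes."""
    return np.asarray(X, dtype=float) @ components


rng = np.random.default_rng(0)
# Three well-separated classes in four dimensions.
X = np.vstack([rng.normal(loc, 1.0, size=(20, 4)) for loc in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 20)
W = fit_lda(X, y)  # n_components defaults to min(3 - 1, 4) = 2
print(transform_lda(X, W).shape)  # (60, 2)
```

With three classes, the default keeps two discriminant axes, and asking for three raises a ValueError.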

Attributes

hyperparameter_ranges

{}

modifies_features

True

modifies_target

False

name

Linear Discriminant Analysis Transformer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the LDA component.

fit_transform

Fit and transform data using the LDA component.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transform data using the fitted LDA component.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.
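A minimal sketch of what cloning means, using a hypothetical ToyComponent class rather than evalml's base class: the new instance is rebuilt from the stored parameters and random seed, so in this sketch it carries no fitted state.

```python
import copy


class ToyComponent:
    """Hypothetical stand-in for a component; not evalml's ComponentBase."""

    def __init__(self, n_components=None, random_seed=0):
        self.parameters = {"n_components": n_components}
        self.random_seed = random_seed
        self._is_fitted = False

    def fit(self, X):
        self.learned_ = sum(X) / len(X)  # stand-in for learned state
        self._is_fitted = True
        return self

    def clone(self):
        # Rebuild from constructor arguments only, so no learned state is copied.
        return self.__class__(**copy.deepcopy(self.parameters), random_seed=self.random_seed)


original = ToyComponent(n_components=2, random_seed=42).fit([1.0, 2.0, 3.0])
twin = original.clone()
print(twin.parameters, twin.random_seed, twin._is_fitted)  # {'n_components': 2} 42 False
```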

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict
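The convention above can be made concrete with a hypothetical component whose defaults are read from its constructor signature via inspect.signature. This is a sketch under that assumption; default_parameters is written as a classmethod call for simplicity, and the parameter names are borrowed from PCA purely for illustration.

```python
import inspect


class ToyComponent:
    """Hypothetical component used only to illustrate the convention."""

    def __init__(self, variance=0.95, n_components=None, random_seed=0):
        # random_seed is tracked separately from the tunable parameters.
        self.parameters = {"variance": variance, "n_components": n_components}
        self.random_seed = random_seed

    @classmethod
    def default_parameters(cls):
        # Read defaults straight from __init__, skipping self and random_seed.
        sig = inspect.signature(cls.__init__)
        return {
            name: param.default
            for name, param in sig.parameters.items()
            if name not in ("self", "random_seed")
        }


print(ToyComponent.default_parameters() == ToyComponent().parameters)  # True
```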

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.

Returns

A dictionary if return_dict is True, else None.

Return type

None or dict
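A hypothetical component showing the dictionary shape that describe returns when return_dict is True. The class and its name are invented for illustration; evalml's own describe may also log the parameters.

```python
class ToyComponent:
    """Hypothetical component; attribute names mirror the documentation."""

    name = "Toy Transformer"

    def __init__(self, variance=0.95):
        self.parameters = {"variance": variance}

    def describe(self, print_name=False, return_dict=False):
        if print_name:
            print(self.name)
        if return_dict:
            # Documented shape: {"name": name, "parameters": parameters}
            return {"name": self.name, "parameters": self.parameters}
        return None


toy = ToyComponent(variance=0.9)
print(toy.describe(return_dict=True))  # {'name': 'Toy Transformer', 'parameters': {'variance': 0.9}}
```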

fit(self, X, y)

Fits the LDA component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples]. Required, because LDA is a supervised method.

Returns

self

Raises

ValueError – If input data is not all numeric.

fit_transform(self, X, y=None)

Fit and transform data using the LDA component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series) – The target training data of length [n_samples]. Although the signature defaults to None, LDA needs a target to fit.

Returns

Transformed data.

Return type

pd.DataFrame

Raises

ValueError – If input data is not all numeric.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location of the file to load.

Returns

ComponentBase object

needs_fitting(self)

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save the file.

  • pickle_protocol (int) – The pickle protocol version used to serialize the component. Defaults to cloudpickle.DEFAULT_PROTOCOL.
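The save/load round trip can be sketched with a hypothetical component. The signature above indicates that evalml serializes with cloudpickle; this sketch substitutes the standard-library pickle module, which has the same dump/load interface, so it runs without extra dependencies.

```python
import os
import pickle
import tempfile


class ToyComponent:
    """Hypothetical component; uses stdlib pickle as a stand-in for cloudpickle."""

    def __init__(self, n_components=None):
        self.parameters = {"n_components": n_components}

    def save(self, file_path, pickle_protocol=pickle.DEFAULT_PROTOCOL):
        with open(file_path, "wb") as f:
            pickle.dump(self, f, protocol=pickle_protocol)

    @staticmethod
    def load(file_path):
        with open(file_path, "rb") as f:
            return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "component.pkl")
    ToyComponent(n_components=3).save(path)
    restored = ToyComponent.load(path)

print(restored.parameters)  # {'n_components': 3}
```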

transform(self, X, y=None)

Transform data using the fitted LDA component.

Parameters
  • X (pd.DataFrame) – The input data of shape [n_samples, n_features].

  • y (pd.Series, optional) – Ignored; the projection learned during fit depends only on X.

Returns

Transformed data.

Return type

pd.DataFrame

Raises

ValueError – If input data is not all numeric.

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, marks the component as unfitted by setting _is_fitted to False. Defaults to True.
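A sketch of how reset_fit interacts with fitted state, using a hypothetical ToyComponent rather than evalml's code: updating parameters normally invalidates the fit, while reset_fit=False keeps the component marked as fitted.

```python
class ToyComponent:
    """Hypothetical component illustrating update_parameters semantics."""

    def __init__(self, n_components=None):
        self.parameters = {"n_components": n_components}
        self._is_fitted = False

    def fit(self, X):
        self._is_fitted = True
        return self

    def update_parameters(self, update_dict, reset_fit=True):
        # Merge the new values into the existing parameter dict.
        self.parameters.update(update_dict)
        if reset_fit:
            # Parameters changed, so previously learned state is treated as stale.
            self._is_fitted = False


toy = ToyComponent(n_components=2).fit([[1.0]])
toy.update_parameters({"n_components": 1}, reset_fit=False)
print(toy.parameters, toy._is_fitted)  # {'n_components': 1} True
toy.update_parameters({"n_components": 3})
print(toy.parameters, toy._is_fitted)  # {'n_components': 3} False
```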

class evalml.pipelines.components.transformers.dimensionality_reduction.PCA(variance=0.95, n_components=None, random_seed=0, **kwargs)

Reduces the number of features by using Principal Component Analysis (PCA).

Parameters
  • variance (float) – The fraction of the original data variance to preserve when reducing the number of features. Defaults to 0.95.

  • n_components (int) – The number of features to keep after computing SVD. Defaults to None; if set, it overrides variance.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
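To show how variance and n_components interact, here is a NumPy sketch that counts how many principal components are needed to reach a variance target. The helper choose_n_components is hypothetical, and the exact threshold rule evalml applies (reaching versus strictly exceeding the target) is an assumption here.

```python
import numpy as np


def choose_n_components(X, variance=0.95, n_components=None):
    """Hypothetical helper: how many principal components a variance target keeps."""
    if n_components is not None:
        # An explicit n_components overrides the variance target.
        return n_components
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance captured by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained_ratio = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained_ratio)
    # Smallest k whose cumulative explained variance reaches the target,
    # clipped so rounding error near 1.0 cannot exceed the available components.
    k = int(np.searchsorted(cumulative, variance)) + 1
    return min(k, len(cumulative))


a = np.array([1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0])
X = np.column_stack([10 * a, b])  # first axis carries 400 / 404, about 99% of the variance
print(choose_n_components(X))  # 1
```

Raising the target to 0.999 requires both components, and passing n_components skips the variance calculation entirely.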

Attributes

hyperparameter_ranges

{"variance": Real(0.25, 1)}

modifies_features

True

modifies_target

False

name

PCA Transformer

training_only

False

Methods

clone

Constructs a new component with the same parameters and random state.

default_parameters

Returns the default parameters for this component.

describe

Describe a component and its parameters.

fit

Fits the PCA component.

fit_transform

Fit and transform data using the PCA component.

load

Loads component at file path.

needs_fitting

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importances.

parameters

Returns the parameters which were used to initialize the component.

save

Saves component at file path.

transform

Transform data using the fitted PCA component.

update_parameters

Updates the parameter dictionary of the component.

clone(self)

Constructs a new component with the same parameters and random state.

Returns

A new instance of this component with identical parameters and random state.

default_parameters(cls)

Returns the default parameters for this component.

Our convention is that Component.default_parameters == Component().parameters.

Returns

Default parameters for this component.

Return type

dict

describe(self, print_name=False, return_dict=False)

Describe a component and its parameters.

Parameters
  • print_name (bool, optional) – Whether to print the name of the component.

  • return_dict (bool, optional) – Whether to return the description as a dictionary in the format {"name": name, "parameters": parameters}.

Returns

A dictionary if return_dict is True, else None.

Return type

None or dict

fit(self, X, y=None)

Fits the PCA component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – Ignored; PCA is unsupervised.

Returns

self

Raises

ValueError – If input data is not all numeric.

fit_transform(self, X, y=None)

Fit and transform data using the PCA component.

Parameters
  • X (pd.DataFrame) – The input training data of shape [n_samples, n_features].

  • y (pd.Series, optional) – Ignored; PCA is unsupervised.

Returns

Transformed data.

Return type

pd.DataFrame

Raises

ValueError – If input data is not all numeric.

static load(file_path)

Loads component at file path.

Parameters

file_path (str) – Location of the file to load.

Returns

ComponentBase object

needs_fitting(self)

Returns a boolean indicating whether the component needs fitting before calling predict, predict_proba, transform, or feature_importances.

This can be overridden to False for components that do not need to be fit or whose fit methods do nothing.

Returns

True.

property parameters(self)

Returns the parameters which were used to initialize the component.

save(self, file_path, pickle_protocol=cloudpickle.DEFAULT_PROTOCOL)

Saves component at file path.

Parameters
  • file_path (str) – Location to save the file.

  • pickle_protocol (int) – The pickle protocol version used to serialize the component. Defaults to cloudpickle.DEFAULT_PROTOCOL.

transform(self, X, y=None)

Transform data using the fitted PCA component.

Parameters
  • X (pd.DataFrame) – The input data of shape [n_samples, n_features].

  • y (pd.Series, optional) – Ignored; PCA is unsupervised.

Returns

Transformed data.

Return type

pd.DataFrame

Raises

ValueError – If input data is not all numeric.
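Transforming with the fitted component means new data is projected using the mean and principal axes learned during fit, not recomputed from the new data. A minimal NumPy sketch of that contract follows; TinyPCA is a hypothetical class, not evalml's PCA, and it takes n_components directly instead of a variance target.

```python
import numpy as np


class TinyPCA:
    """Hypothetical minimal PCA illustrating fit/transform; not evalml's code."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # Rows of vt are the principal axes, ordered by decreasing variance.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X):
        # New data is centered with the training mean, then projected.
        return (np.asarray(X, dtype=float) - self.mean_) @ self.components_.T

    def fit_transform(self, X):
        return self.fit(X).transform(X)


rng = np.random.default_rng(1)
X_train = rng.normal(size=(30, 5))
X_new = rng.normal(size=(4, 5))
pca = TinyPCA(n_components=2)
Z_train = pca.fit_transform(X_train)
Z_new = pca.transform(X_new)  # reuses mean_ and components_ from X_train
print(Z_train.shape, Z_new.shape)  # (30, 2) (4, 2)
```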

update_parameters(self, update_dict, reset_fit=True)

Updates the parameter dictionary of the component.

Parameters
  • update_dict (dict) – A dict of parameters to update.

  • reset_fit (bool, optional) – If True, marks the component as unfitted by setting _is_fitted to False. Defaults to True.