utils
=====

.. py:module:: evalml.pipelines.components.utils

.. autoapi-nested-parse::

   Utility methods for EvalML components.


Module Contents
---------------

Classes Summary
~~~~~~~~~~~~~~~

.. autoapisummary::

   evalml.pipelines.components.utils.WrappedSKClassifier
   evalml.pipelines.components.utils.WrappedSKRegressor


Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   evalml.pipelines.components.utils.all_components
   evalml.pipelines.components.utils.allowed_model_families
   evalml.pipelines.components.utils.drop_natural_language_columns
   evalml.pipelines.components.utils.estimator_unable_to_handle_nans
   evalml.pipelines.components.utils.generate_component_code
   evalml.pipelines.components.utils.get_estimators
   evalml.pipelines.components.utils.handle_component_class
   evalml.pipelines.components.utils.make_balancing_dictionary
   evalml.pipelines.components.utils.scikit_learn_wrapped_estimator
   evalml.pipelines.components.utils.set_boolean_columns_to_categorical


Contents
~~~~~~~~

.. py:function:: all_components()

   Get all available components.


.. py:function:: allowed_model_families(problem_type)

   List the model families allowed for a particular problem type.

   :param problem_type: ProblemTypes enum or string.
   :type problem_type: ProblemTypes or str

   :returns: A list of model families.
   :rtype: list[ModelFamily]


.. py:function:: drop_natural_language_columns(X)

   Drops natural language columns from dataframes for the imputers.

   :param X: The dataframe that we want to impute on.
   :type X: pd.DataFrame

   :returns: The dataframe with any natural language columns dropped.
             list: List of all the columns that are considered natural language.
   :rtype: pd.DataFrame


.. py:function:: estimator_unable_to_handle_nans(estimator_class)

   Returns True if the provided estimator class is unable to handle NaN values as an input.

   :param estimator_class: Estimator class.
   :type estimator_class: Estimator

   :raises ValueError: If estimator is not a valid estimator class.
   :returns: True if estimator class is unable to process NaN values, False otherwise.
   :rtype: bool


.. py:function:: generate_component_code(element)

   Creates and returns a string that contains the Python imports and code required for running the EvalML component.

   :param element: The instance of the component to generate string Python code for.
   :type element: component instance

   :returns: String representation of Python code that can be run separately in order to recreate the component instance. Does not include code for custom component implementation.

   :raises ValueError: If the input element is not a component instance.

   .. rubric:: Examples

   >>> from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor
   >>> assert generate_component_code(DecisionTreeRegressor()) == "from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor\n\ndecisionTreeRegressor = DecisionTreeRegressor(**{'criterion': 'mse', 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0})"
   ...
   >>> from evalml.pipelines.components.transformers.imputers.simple_imputer import SimpleImputer
   >>> assert generate_component_code(SimpleImputer()) == "from evalml.pipelines.components.transformers.imputers.simple_imputer import SimpleImputer\n\nsimpleImputer = SimpleImputer(**{'impute_strategy': 'most_frequent', 'fill_value': None})"


.. py:function:: get_estimators(problem_type, model_families=None)

   Returns the estimators allowed for a particular problem type.

   Can also optionally filter by a list of model families.

   :param problem_type: Problem type to filter for.
   :type problem_type: ProblemTypes or str
   :param model_families: Model families to filter for.
   :type model_families: list[ModelFamily] or list[str]

   :returns: A list of estimator subclasses.
   :rtype: list[class]

   :raises TypeError: If the model_families parameter is not a list.
   :raises RuntimeError: If a model family is not valid for the problem type.


.. py:function:: handle_component_class(component_class)

   Standardizes input from a string name to a ComponentBase subclass if necessary.

   If a str is provided, will attempt to look up a ComponentBase class by that name and return a new instance. Otherwise, if a ComponentBase subclass or Component instance is provided, will return that without modification.

   :param component_class: Input to be standardized.
   :type component_class: str, ComponentBase

   :returns: ComponentBase

   :raises ValueError: If input is not a valid component class.
   :raises MissingComponentError: If the component cannot be found.

   .. rubric:: Examples

   >>> from evalml.pipelines.components.estimators.regressors.decision_tree_regressor import DecisionTreeRegressor
   >>> handle_component_class(DecisionTreeRegressor)
   >>> handle_component_class("Random Forest Regressor")


.. py:function:: make_balancing_dictionary(y, sampling_ratio)

   Makes dictionary for oversampler components.

   Finds the ratio of each class to the majority. If the ratio is smaller than the sampling_ratio, we want to oversample; otherwise, we don't want to sample at all, and we leave the data as is.

   :param y: Target data.
   :type y: pd.Series
   :param sampling_ratio: The balanced ratio we want the samples to meet.
   :type sampling_ratio: float

   :returns: Dictionary where keys are the classes, and the corresponding values are the counts of samples for each class that will satisfy sampling_ratio.
   :rtype: dict

   :raises ValueError: If sampling ratio is not in the range (0, 1] or the target is empty.

   .. rubric:: Examples

   >>> import pandas as pd
   >>> y = pd.Series([1] * 4 + [2] * 8 + [3])
   >>> assert make_balancing_dictionary(y, 0.5) == {2: 8, 1: 4, 3: 4}
   >>> assert make_balancing_dictionary(y, 0.9) == {2: 8, 1: 7, 3: 7}
   >>> assert make_balancing_dictionary(y, 0.1) == {2: 8, 1: 4, 3: 1}
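The doctest values above follow a simple rule: each class is topped up to ``int(majority_count * sampling_ratio)`` samples, and any class already at or above that target keeps its original count. Below is a minimal pure-Python sketch of that rule; ``balancing_sketch`` is a hypothetical helper, not the library implementation, and the ``int()`` truncation is an assumption inferred from the doctests rather than from the source.

```python
from collections import Counter


def balancing_sketch(y, sampling_ratio):
    """Hypothetical re-statement of the make_balancing_dictionary rule."""
    if not 0 < sampling_ratio <= 1 or len(y) == 0:
        raise ValueError("sampling_ratio must be in (0, 1] and y must be non-empty")
    counts = Counter(y)
    # Target count for every minority class, relative to the majority class.
    # int() truncation here is inferred from the documented examples.
    target = int(max(counts.values()) * sampling_ratio)
    # Classes already at or above the target are left as-is.
    return {cls: max(cnt, target) for cls, cnt in counts.items()}


y = [1] * 4 + [2] * 8 + [3]
assert balancing_sketch(y, 0.9) == {2: 8, 1: 7, 3: 7}  # int(8 * 0.9) == 7
```

Note that the real function takes a ``pd.Series``; a plain list is used here only to keep the sketch dependency-free.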
.. py:function:: scikit_learn_wrapped_estimator(evalml_obj)

   Wraps an EvalML object as a scikit-learn estimator.


.. py:function:: set_boolean_columns_to_categorical(X)

   Sets boolean columns to categorical for the imputer.

   :param X: The dataframe that we want to impute on.
   :type X: pd.DataFrame

   :returns: The dataframe with any of its ww columns that are boolean set to categorical.
   :rtype: pd.DataFrame


.. py:class:: WrappedSKClassifier(pipeline)

   Scikit-learn classifier wrapper class.

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.components.utils.WrappedSKClassifier.fit
      evalml.pipelines.components.utils.WrappedSKClassifier.get_params
      evalml.pipelines.components.utils.WrappedSKClassifier.predict
      evalml.pipelines.components.utils.WrappedSKClassifier.predict_proba
      evalml.pipelines.components.utils.WrappedSKClassifier.score
      evalml.pipelines.components.utils.WrappedSKClassifier.set_params

   .. py:method:: fit(self, X, y)

      Fits component to data.

      :param X: The input training data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: The target training data of length [n_samples].
      :type y: pd.Series, optional

      :returns: self

   .. py:method:: get_params(self, deep=True)

      Get parameters for this estimator.

      :param deep: If True, will return the parameters for this estimator and contained subobjects that are estimators.
      :type deep: bool, default=True

      :returns: **params** -- Parameter names mapped to their values.
      :rtype: dict

   .. py:method:: predict(self, X)

      Make predictions using selected features.

      :param X: Features.
      :type X: pd.DataFrame

      :returns: Predicted values.
      :rtype: np.ndarray

   .. py:method:: predict_proba(self, X)

      Make probability estimates for labels.

      :param X: Features.
      :type X: pd.DataFrame

      :returns: Probability estimates.
      :rtype: np.ndarray

   .. py:method:: score(self, X, y, sample_weight=None)

      Return the mean accuracy on the given test data and labels.
      In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for each sample.

      :param X: Test samples.
      :type X: array-like of shape (n_samples, n_features)
      :param y: True labels for `X`.
      :type y: array-like of shape (n_samples,) or (n_samples, n_outputs)
      :param sample_weight: Sample weights.
      :type sample_weight: array-like of shape (n_samples,), default=None

      :returns: **score** -- Mean accuracy of ``self.predict(X)`` w.r.t. `y`.
      :rtype: float

   .. py:method:: set_params(self, **params)

      Set the parameters of this estimator.

      The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

      :param \*\*params: Estimator parameters.
      :type \*\*params: dict

      :returns: **self** -- Estimator instance.
      :rtype: estimator instance


.. py:class:: WrappedSKRegressor(pipeline)

   Scikit-learn regressor wrapper class.

   **Methods**

   .. autoapisummary::
      :nosignatures:

      evalml.pipelines.components.utils.WrappedSKRegressor.fit
      evalml.pipelines.components.utils.WrappedSKRegressor.get_params
      evalml.pipelines.components.utils.WrappedSKRegressor.predict
      evalml.pipelines.components.utils.WrappedSKRegressor.score
      evalml.pipelines.components.utils.WrappedSKRegressor.set_params

   .. py:method:: fit(self, X, y)

      Fits component to data.

      :param X: The input training data of shape [n_samples, n_features].
      :type X: pd.DataFrame or np.ndarray
      :param y: The target training data of length [n_samples].
      :type y: pd.Series, optional

      :returns: self

   .. py:method:: get_params(self, deep=True)

      Get parameters for this estimator.

      :param deep: If True, will return the parameters for this estimator and contained subobjects that are estimators.
      :type deep: bool, default=True

      :returns: **params** -- Parameter names mapped to their values.
      :rtype: dict
   .. py:method:: predict(self, X)

      Make predictions using selected features.

      :param X: Features.
      :type X: pd.DataFrame

      :returns: Predicted values.
      :rtype: np.ndarray

   .. py:method:: score(self, X, y, sample_weight=None)

      Return the coefficient of determination of the prediction.

      The coefficient of determination :math:`R^2` is defined as :math:`(1 - \frac{u}{v})`, where :math:`u` is the residual sum of squares ``((y_true - y_pred) ** 2).sum()`` and :math:`v` is the total sum of squares ``((y_true - y_true.mean()) ** 2).sum()``. The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of `y`, disregarding the input features, would get a :math:`R^2` score of 0.0.

      :param X: Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape ``(n_samples, n_samples_fitted)``, where ``n_samples_fitted`` is the number of samples used in the fitting for the estimator.
      :type X: array-like of shape (n_samples, n_features)
      :param y: True values for `X`.
      :type y: array-like of shape (n_samples,) or (n_samples, n_outputs)
      :param sample_weight: Sample weights.
      :type sample_weight: array-like of shape (n_samples,), default=None

      :returns: **score** -- :math:`R^2` of ``self.predict(X)`` w.r.t. `y`.
      :rtype: float

      .. rubric:: Notes

      The :math:`R^2` score used when calling ``score`` on a regressor uses ``multioutput='uniform_average'`` from version 0.23 to keep consistent with the default value of :func:`~sklearn.metrics.r2_score`. This influences the ``score`` method of all the multioutput regressors (except for :class:`~sklearn.multioutput.MultiOutputRegressor`).

   .. py:method:: set_params(self, **params)

      Set the parameters of this estimator.

      The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`).
      The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

      :param \*\*params: Estimator parameters.
      :type \*\*params: dict

      :returns: **self** -- Estimator instance.
      :rtype: estimator instance