coco_pipe.decoding.utils
========================

.. py:module:: coco_pipe.decoding.utils

.. autoapi-nested-parse::

   Decoding Utilities
   ==================

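   Helper functions and classes for the decoding module, primarily focused on
   cross-validation (CV) strategy management.

   This module provides:

   - ``get_cv_splitter``: a factory function that instantiates a Scikit-Learn
     cross-validator from a Pydantic ``CVConfig``.
   - ``SimpleSplit``: a custom cross-validator for a single train/test split.
   - ``_CVWithGroups``: a wrapper that binds group labels to a splitter, so group
     constraints are respected even when the caller (for example, code running
     inside ``cross_val_score``) does not forward ``groups`` to ``split()``.
   - ``get_scorer``: a lookup that returns a scoring function by metric name.
   - ``cross_validate_score``: a helper that computes a mean cross-validated score.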


Classes
-------

.. autoapisummary::

   coco_pipe.decoding.utils._CVWithGroups
   coco_pipe.decoding.utils.SimpleSplit


Functions
---------

.. autoapisummary::

   coco_pipe.decoding.utils.get_cv_splitter
   coco_pipe.decoding.utils.get_scorer
   coco_pipe.decoding.utils.cross_validate_score


Module Contents
---------------

.. py:class:: _CVWithGroups(cv, groups)

   Bases: :py:obj:`sklearn.model_selection.BaseCrossValidator`


   Internal wrapper that binds a fixed set of group labels to a CV splitter.

   ``.split(X, y)`` always uses the ``groups`` supplied at initialization and
   ignores any groups passed at call time. This prevents data leakage when the
   grouping logic is defined upstream and a downstream caller would otherwise
   omit or replace the groups.

   :param cv: The underlying Scikit-Learn cross-validator (e.g., ``GroupKFold``).
   :type cv: BaseCrossValidator
   :param groups: The group labels to enforce for all splits.
   :type groups: array-like
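
   A minimal sketch of the group-binding idea, using a hypothetical
   ``BoundGroupsCV`` class rather than the module's private ``_CVWithGroups``
   (whose source is not shown here). It assumes the wrapper simply forwards the
   stored groups to the wrapped splitter's ``split()`` and ``get_n_splits()``.

   .. code-block:: python

      import numpy as np
      from sklearn.model_selection import BaseCrossValidator, GroupKFold


      class BoundGroupsCV(BaseCrossValidator):
          """Hypothetical stand-in for ``_CVWithGroups``: groups are fixed at init."""

          def __init__(self, cv, groups):
              self.cv = cv
              self.groups = np.asarray(groups)

          def split(self, X, y=None, groups=None):
              # Any runtime ``groups`` argument is ignored on purpose.
              yield from self.cv.split(X, y, groups=self.groups)

          def get_n_splits(self, X=None, y=None, groups=None):
              return self.cv.get_n_splits(X, y, groups=self.groups)


      X = np.arange(12, dtype=float).reshape(6, 2)
      subjects = np.array([0, 0, 1, 1, 2, 2])
      cv = BoundGroupsCV(GroupKFold(n_splits=3), subjects)

      # The bogus runtime groups are ignored; no subject spans train and test.
      for train_idx, test_idx in cv.split(X, groups=np.arange(6)):
          assert set(subjects[train_idx]).isdisjoint(subjects[test_idx])

   Because the groups travel with the splitter, an object like this can be passed
   as ``cv=`` to ``cross_val_score`` without a ``groups=`` argument.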


   .. py:attribute:: cv


   .. py:attribute:: groups


   .. py:method:: split(X, y=None, groups=None)

      Generate indices to split data into training and test sets.

      :param X: Training data, where ``n_samples`` is the number of samples
                and ``n_features`` is the number of features.
      :type X: array-like of shape (n_samples, n_features)
      :param y: The target variable for supervised learning problems.
      :type y: array-like of shape (n_samples,), default=None
      :param groups: Ignored. The group labels bound at initialization are
                     always used instead.
      :type groups: array-like of shape (n_samples,), default=None

      :Yields: * **train** (*ndarray*) -- The training set indices for that split.
               * **test** (*ndarray*) -- The testing set indices for that split.


   .. py:method:: get_n_splits(X=None, y=None, groups=None)

      Returns the number of splitting iterations in the cross-validator.


.. py:class:: SimpleSplit(test_size: float = 0.2, shuffle: bool = True, random_state: Optional[int] = None, stratify: Optional[Union[pandas.Series, numpy.ndarray]] = None)

   Bases: :py:obj:`sklearn.model_selection.BaseCrossValidator`


   A single-split (hold-out) cross-validator wrapping ``train_test_split``.

   This allows hold-out validation to be treated as a cross-validation
   strategy with ``n_splits=1``, so it integrates with loops that expect a
   generator of ``(train_index, test_index)`` pairs.

   :param test_size: Proportion of the dataset to include in the test split.
   :type test_size: float, default=0.2
   :param shuffle: Whether to shuffle the data before splitting.
   :type shuffle: bool, default=True
   :param random_state: Controls the shuffling applied to the data before applying the split.
   :type random_state: int, optional
   :param stratify: If not None, data is split in a stratified fashion, using this array
                    as the class labels.
   :type stratify: array-like, optional
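
   A minimal sketch of a single-split cross-validator built on
   ``sklearn.model_selection.train_test_split``. The class name ``HoldOutSplit``
   is hypothetical and this is not the module's actual ``SimpleSplit`` source;
   it assumes the split is taken over positional indices so the output matches
   any other CV splitter.

   .. code-block:: python

      import numpy as np
      from sklearn.model_selection import BaseCrossValidator, train_test_split


      class HoldOutSplit(BaseCrossValidator):
          """Hypothetical single-split cross-validator built on train_test_split."""

          def __init__(self, test_size=0.2, shuffle=True, random_state=None, stratify=None):
              self.test_size = test_size
              self.shuffle = shuffle
              self.random_state = random_state
              self.stratify = stratify

          def split(self, X, y=None, groups=None):
              # Split positional indices rather than the data itself, so the
              # caller receives (train_index, test_index) like any CV splitter.
              indices = np.arange(len(X))
              train_idx, test_idx = train_test_split(
                  indices,
                  test_size=self.test_size,
                  shuffle=self.shuffle,
                  random_state=self.random_state,
                  stratify=self.stratify,  # must be None when shuffle=False
              )
              yield train_idx, test_idx

          def get_n_splits(self, X=None, y=None, groups=None):
              return 1


      X = np.zeros((10, 3))
      train_idx, test_idx = next(HoldOutSplit(test_size=0.2, shuffle=False).split(X))

   Passed as ``cv=`` to ``cross_val_score``, such a splitter yields a one-element
   score array, which is how hold-out validation fits the same loop as k-fold CV.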


   .. py:attribute:: test_size
      :value: 0.2


   .. py:attribute:: shuffle
      :value: True


   .. py:attribute:: random_state
      :value: None


   .. py:attribute:: stratify
      :value: None


   .. py:method:: split(X: Union[pandas.DataFrame, numpy.ndarray], y: Optional[Union[pandas.Series, numpy.ndarray]] = None, groups: Optional[Sequence] = None)

      Yield a single (train_index, test_index) tuple.


   .. py:method:: get_n_splits(X: Any = None, y: Any = None, groups: Any = None) -> int

      Return the number of splits, which is always 1.


.. py:function:: get_cv_splitter(config: coco_pipe.decoding.configs.CVConfig, groups: Optional[Sequence] = None) -> sklearn.model_selection.BaseCrossValidator

   Factory function that creates a Scikit-Learn-compatible cross-validator.

   Constructs the splitter that matches the strategy named in ``config``.
   If ``groups`` are provided, they are bound to the splitter with ``_CVWithGroups``
   to guarantee consistent grouping across pipeline steps.

   :param config: Validated configuration object specifying:

                  - ``strategy``: ``'stratified'``, ``'kfold'``, ``'group_kfold'``,
                    ``'leave_p_out'``, etc.
                  - ``n_splits``: number of folds (where applicable).
                  - ``shuffle``: whether to shuffle the data (where applicable).
                  - ``random_state``: seed for reproducibility.
   :type config: CVConfig
   :param groups: Group labels for the samples. Required for ``'group_kfold'``,
                  ``'leave_p_out'``, and ``'stratified_group_kfold'``.
                  If provided, the returned validator ignores any groups passed to its
                  ``.split()`` method and uses these instead.
   :type groups: sequence, optional

   :returns: An initialized cross-validator instance.
   :rtype: BaseCrossValidator

   :raises ValueError: If an unknown CV strategy is specified or if required parameters
       (such as ``n_groups`` for ``'leave_p_out'``) are missing from the configuration.
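
   A minimal sketch of such a factory, under stated assumptions: the hypothetical
   dataclass ``CVConfigSketch`` stands in for the Pydantic ``CVConfig``, the
   function name ``make_cv_splitter`` is illustrative, and ``'leave_p_out'`` is
   assumed to map to ``LeavePGroupsOut`` because it reads ``n_groups``. The real
   strategy table may differ.

   .. code-block:: python

      from dataclasses import dataclass
      from typing import Optional, Sequence

      import numpy as np
      from sklearn.model_selection import (
          BaseCrossValidator, GroupKFold, KFold, LeavePGroupsOut,
          StratifiedGroupKFold, StratifiedKFold,
      )


      @dataclass
      class CVConfigSketch:
          """Hypothetical stand-in for the Pydantic ``CVConfig``."""
          strategy: str = "stratified"
          n_splits: int = 5
          shuffle: bool = True
          random_state: Optional[int] = None
          n_groups: Optional[int] = None


      class _BoundGroups(BaseCrossValidator):
          """Forward fixed groups to the wrapped splitter (cf. ``_CVWithGroups``)."""

          def __init__(self, cv, groups):
              self.cv, self.groups = cv, np.asarray(groups)

          def split(self, X, y=None, groups=None):
              yield from self.cv.split(X, y, groups=self.groups)

          def get_n_splits(self, X=None, y=None, groups=None):
              return self.cv.get_n_splits(X, y, groups=self.groups)


      def make_cv_splitter(config: CVConfigSketch, groups: Optional[Sequence] = None):
          # scikit-learn rejects a random_state when shuffle=False, so drop it then.
          seed = config.random_state if config.shuffle else None
          s = config.strategy
          if s == "stratified":
              cv = StratifiedKFold(config.n_splits, shuffle=config.shuffle, random_state=seed)
          elif s == "kfold":
              cv = KFold(config.n_splits, shuffle=config.shuffle, random_state=seed)
          elif s == "group_kfold":
              cv = GroupKFold(n_splits=config.n_splits)
          elif s == "stratified_group_kfold":
              cv = StratifiedGroupKFold(config.n_splits, shuffle=config.shuffle, random_state=seed)
          elif s == "leave_p_out":
              if config.n_groups is None:
                  raise ValueError("'leave_p_out' requires n_groups")
              cv = LeavePGroupsOut(n_groups=config.n_groups)
          else:
              raise ValueError(f"Unknown CV strategy: {s!r}")
          # Bind groups so later callers cannot drop or replace them.
          return _BoundGroups(cv, groups) if groups is not None else cv

   Returning the bare splitter when no groups are given keeps the common
   ungrouped case identical to plain scikit-learn.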


.. py:function:: get_scorer(name: str) -> Callable

   Retrieve or construct a Scikit-Learn compliant scorer by name.

   :param name: The name of the metric (e.g., 'accuracy', 'f1_macro', 'neg_mean_squared_error').
   :type name: str

   :returns: A scoring function with signature ``(y_true, y_pred) -> float``.
   :rtype: Callable

   :raises ValueError: If the metric name is unknown.
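
   A minimal sketch of a name-to-metric lookup that follows the
   ``(y_true, y_pred) -> float`` signature documented above. The registry
   ``_METRICS`` and the function name ``lookup_metric`` are hypothetical, and the
   real set of supported names may differ. Note that scikit-learn's own
   ``sklearn.metrics.get_scorer`` returns a scorer with the different
   ``(estimator, X, y)`` signature.

   .. code-block:: python

      from functools import partial
      from typing import Callable

      from sklearn.metrics import (
          accuracy_score, balanced_accuracy_score, f1_score, mean_squared_error,
      )

      # Hypothetical name -> metric registry.
      _METRICS = {
          "accuracy": accuracy_score,
          "balanced_accuracy": balanced_accuracy_score,
          "f1_macro": partial(f1_score, average="macro"),
          # "neg_" metrics are negated so that greater is always better.
          "neg_mean_squared_error": lambda y_true, y_pred: -mean_squared_error(y_true, y_pred),
      }


      def lookup_metric(name: str) -> Callable:
          """Return a ``(y_true, y_pred) -> float`` function for ``name``."""
          try:
              return _METRICS[name]
          except KeyError:
              raise ValueError(f"Unknown metric: {name!r}") from None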


.. py:function:: cross_validate_score(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: Sequence, *, groups: Optional[Sequence] = None, cv_config: Optional[coco_pipe.decoding.configs.CVConfig] = None, metric: str = 'balanced_accuracy', use_scaler: bool = False) -> float

   Compute the mean cross-validated score of an estimator.

   :param estimator: Estimator to fit inside each fold.
   :type estimator: BaseEstimator
   :param X: Input features with shape ``(n_samples, n_features)``.
   :type X: np.ndarray
   :param y: Target labels aligned with ``X``.
   :type y: sequence
   :param groups: Group labels aligned with ``X``.
   :type groups: sequence, optional
   :param cv_config: Cross-validation configuration. Defaults to 5-fold stratified
                     CV, or to 5-fold stratified group CV when ``groups`` is
                     provided.
   :type cv_config: CVConfig, optional
   :param metric: Metric name resolved through :func:`get_scorer`.
   :type metric: str, default="balanced_accuracy"
   :param use_scaler: When ``True``, prepends a ``StandardScaler`` to the estimator in a
                      pipeline, so scaling is fit on each training fold only.
   :type use_scaler: bool, default=False

   :returns: Mean cross-validated score.
   :rtype: float
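
   A minimal sketch of the same computation with plain scikit-learn, assuming the
   documented defaults (5 folds, no shuffling assumed) and passing ``metric``
   straight to scikit-learn's built-in scorer strings in place of the module's
   ``get_scorer``. The function name ``mean_cv_score`` is hypothetical and the
   ``cv_config`` parameter is omitted.

   .. code-block:: python

      from typing import Optional, Sequence

      import numpy as np
      from sklearn.base import BaseEstimator, clone
      from sklearn.model_selection import (
          StratifiedGroupKFold, StratifiedKFold, cross_val_score,
      )
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler


      def mean_cv_score(
          estimator: BaseEstimator,
          X: np.ndarray,
          y: Sequence,
          *,
          groups: Optional[Sequence] = None,
          metric: str = "balanced_accuracy",
          use_scaler: bool = False,
      ) -> float:
          # Default splitter: 5-fold stratified, or stratified-group when groups exist.
          if groups is not None:
              cv = StratifiedGroupKFold(n_splits=5)
          else:
              cv = StratifiedKFold(n_splits=5)
          # A scaler inside the pipeline is re-fit on each training fold only.
          model = clone(estimator)
          if use_scaler:
              model = make_pipeline(StandardScaler(), model)
          scores = cross_val_score(model, X, y, groups=groups, cv=cv, scoring=metric)
          return float(np.mean(scores))

   For example, with ``X, y`` from ``make_classification`` and one subject label per
   sample, ``mean_cv_score(LogisticRegression(), X, y, groups=subjects, use_scaler=True)``
   returns a single float averaged over the five group-aware folds.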