coco_pipe.decoding.utils
========================

.. py:module:: coco_pipe.decoding.utils

.. autoapi-nested-parse::

   Decoding Utilities
   ==================

   Helper functions and classes for the decoding module, primarily focused on
   Cross-Validation (CV) strategy management.

   This module provides:

   - `get_cv_splitter`: A factory function to instantiate Scikit-Learn cross-validators
     from a Pydantic `CVConfig`.
   - `SimpleSplit`: A custom validator for a single train/test split.
   - `_CVWithGroups`: A wrapper to ensure group constraints are respected even when
     Scikit-Learn's `cross_val_score` internals might obscure them.


Classes
-------

.. autoapisummary::

   coco_pipe.decoding.utils._CVWithGroups
   coco_pipe.decoding.utils.SimpleSplit


Functions
---------

.. autoapisummary::

   coco_pipe.decoding.utils.get_cv_splitter
   coco_pipe.decoding.utils.get_scorer
   coco_pipe.decoding.utils.cross_validate_score


Module Contents
---------------

.. py:class:: _CVWithGroups(cv, groups)

   Bases: :py:obj:`sklearn.model_selection.BaseCrossValidator`

   Internal wrapper to bind specific groups to a CV splitter.

   This ensures that `.split(X, y)` always uses the strict `groups` provided
   at initialization, ignoring any groups passed at runtime. This is critical
   for preventing data leakage when complex grouping logic is defined upstream.

   :param cv: The underlying Scikit-Learn cross-validator (e.g., GroupKFold).
   :type cv: BaseCrossValidator
   :param groups: The group labels to enforce for all splits.
   :type groups: array-like

   .. py:attribute:: cv

   .. py:attribute:: groups

   .. py:method:: split(X, y=None, groups=None)

      Generate indices to split data into training and test set.

      :param X: Training data, where `n_samples` is the number of samples
                and `n_features` is the number of features.
      :type X: array-like of shape (n_samples, n_features)
      :param y: The target variable for supervised learning problems.
      :type y: array-like of shape (n_samples,)
      :param groups: Group labels for the samples used while splitting the dataset into
                     train/test set.
      :type groups: array-like of shape (n_samples,), default=None

      :Yields: * **train** (*ndarray*) -- The training set indices for that split.
               * **test** (*ndarray*) -- The testing set indices for that split.

   .. py:method:: get_n_splits(X=None, y=None, groups=None)

      Returns the number of splitting iterations in the cross-validator.


.. py:class:: SimpleSplit(test_size: float = 0.2, shuffle: bool = True, random_state: Optional[int] = None, stratify: Optional[Union[pandas.Series, numpy.ndarray]] = None)

   Bases: :py:obj:`sklearn.model_selection.BaseCrossValidator`

   A unified 1-fold CV strategy wrapping `train_test_split`.

   This allows "hold-out" validation to be treated as a Cross-Validation strategy
   with `n_splits=1`, integrating seamlessly into loops that expect a generator
   of indices.

   :param test_size: Proportion of the dataset to include in the test split.
   :type test_size: float, default=0.2
   :param shuffle: Whether to shuffle the data before splitting.
   :type shuffle: bool, default=True
   :param random_state: Controls the shuffling applied to the data before applying the split.
   :type random_state: int, optional
   :param stratify: If not None, data is split in a stratified fashion, using this
                    array as the class labels.
   :type stratify: array-like, optional

   .. py:attribute:: test_size
      :value: 0.2

   .. py:attribute:: shuffle
      :value: True

   .. py:attribute:: random_state
      :value: None

   .. py:attribute:: stratify
      :value: None

   .. py:method:: split(X: Union[pandas.DataFrame, numpy.ndarray], y: Optional[Union[pandas.Series, numpy.ndarray]] = None, groups: Optional[Sequence] = None)

      Yield a single (train_index, test_index) tuple.

   .. py:method:: get_n_splits(X: Any = None, y: Any = None, groups: Any = None) -> int

      Always returns 1 split.


.. py:function:: get_cv_splitter(config: coco_pipe.decoding.configs.CVConfig, groups: Optional[Sequence] = None) -> sklearn.model_selection.BaseCrossValidator

   Factory function to create a Scikit-Learn compliant cross-validator.

   Constructs the appropriate splitter based on the provided `CVConfig` strategy.
   If `groups` are provided, they are bound to the splitter using `_CVWithGroups`
   to guarantee consistent grouping across pipeline steps.

   :param config: Validated configuration object specifying:

                  - strategy: 'stratified', 'kfold', 'group_kfold', 'leave_p_out', etc.
                  - n_splits: Number of folds (where applicable).
                  - shuffle: Whether to shuffle data (where applicable).
                  - random_state: Seed for reproducibility.
   :type config: CVConfig
   :param groups: Group labels for the samples. Required for 'group_kfold',
                  'leave_p_out', and 'stratified_group_kfold'.
                  If provided, the returned validator will ignore any groups passed
                  to its `.split()` method and use these instead.
   :type groups: sequence, optional

   :returns: An initialized cross-validator instance.
   :rtype: BaseCrossValidator

   :raises ValueError: If an unknown CV strategy is specified or if required parameters
       (like n_groups for leave_p_out) are missing from the configuration.


.. py:function:: get_scorer(name: str) -> Callable

   Retrieve or construct a Scikit-Learn compliant scorer by name.

   :param name: The name of the metric (e.g., 'accuracy', 'f1_macro', 'neg_mean_squared_error').
   :type name: str

   :returns: A scoring function with signature `(y_true, y_pred) -> float`.
   :rtype: Callable

   :raises ValueError: If the metric name is unknown.


.. py:function:: cross_validate_score(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: Sequence, *, groups: Optional[Sequence] = None, cv_config: Optional[coco_pipe.decoding.configs.CVConfig] = None, metric: str = 'balanced_accuracy', use_scaler: bool = False) -> float

   Compute one mean cross-validated score for an estimator.

   :param estimator: Estimator to fit inside each fold.
   :type estimator: BaseEstimator
   :param X: Input features with shape ``(n_samples, n_features)``.
   :type X: np.ndarray
   :param y: Target labels aligned with ``X``.
   :type y: sequence
   :param groups: Group labels aligned with ``X``.
   :type groups: sequence, optional
   :param cv_config: Cross-validation configuration. Defaults to a 5-fold stratified
                     strategy, or 5-fold stratified-group strategy when groups are provided.
   :type cv_config: CVConfig, optional
   :param metric: Metric name resolved through :func:`get_scorer`.
   :type metric: str, default="balanced_accuracy"
   :param use_scaler: When ``True``, wraps the estimator in a ``StandardScaler`` pipeline.
   :type use_scaler: bool, default=False

   :returns: Mean cross-validated score.
   :rtype: float
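
The group-binding behaviour described for ``_CVWithGroups`` may be easier to follow with a small standalone sketch. The code below is not the ``coco_pipe`` implementation: ``TinyGroupKFold`` and ``BoundGroups`` are hypothetical names, written in plain Python with no Scikit-Learn dependency, and they only illustrate the idea of fixing the groups at construction time and ignoring whatever is passed to ``split`` later.

```python
class TinyGroupKFold:
    """Hypothetical stand-in for a group-aware splitter (not scikit-learn's)."""

    def __init__(self, n_splits=2):
        self.n_splits = n_splits

    def split(self, X, y=None, groups=None):
        if groups is None:
            raise ValueError("groups are required")
        # Assign each distinct group to a fold, round-robin in first-seen order.
        fold_of = {}
        for g in groups:
            if g not in fold_of:
                fold_of[g] = len(fold_of) % self.n_splits
        for fold in range(self.n_splits):
            test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
            train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
            yield train, test

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits


class BoundGroups:
    """Illustrative wrapper: always split with the groups given at construction."""

    def __init__(self, cv, groups):
        self.cv = cv
        self.groups = list(groups)

    def split(self, X, y=None, groups=None):
        # The runtime `groups` argument is deliberately ignored.
        return self.cv.split(X, y, groups=self.groups)

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.cv.get_n_splits(X, y, self.groups)


X = list(range(6))
subjects = ["a", "a", "b", "b", "c", "c"]
bound = BoundGroups(TinyGroupKFold(n_splits=3), subjects)

# Passing misleading groups at call time has no effect on the folds.
folds = list(bound.split(X, groups=[0, 1, 2, 3, 4, 5]))
for train, test in folds:
    # No subject appears on both sides of a split.
    assert {subjects[i] for i in train}.isdisjoint(subjects[i] for i in test)
```

Because the wrapper owns the group labels, any downstream helper that calls ``split(X, y)`` without forwarding ``groups`` still gets subject-level folds, which is the leakage scenario the class description mentions.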