coco_pipe.decoding.core
=======================

.. py:module:: coco_pipe.decoding.core

.. autoapi-nested-parse::

   Decoding Core
   =============
   This module is responsible for:

   1. Orchestrating the Cross-Validation loop.
   2. Managing Estimator lifecycles (instantiation, fitting, prediction).
   3. Computing metrics dynamically based on task type.
   4. Aggregating results for downstream analysis.


Attributes
----------

.. autoapisummary::

   coco_pipe.decoding.core.logger


Classes
-------

.. autoapisummary::

   coco_pipe.decoding.core.Experiment
   coco_pipe.decoding.core.ExperimentResult


Module Contents
---------------

.. py:data:: logger

.. py:class:: Experiment(config: coco_pipe.decoding.configs.ExperimentConfig)

   Main executor for decoding experiments.

   :param config: The complete configuration for the experiment.
   :type config: ExperimentConfig


   .. py:attribute:: config


   .. py:attribute:: results
      :type:  Dict[str, Any]


   .. py:method:: _validate_config()

      Perform comprehensive runtime validation of the configuration.

      Logic
      -----
      1. **Tuning Consistency**: Warns if ``tuning.enabled`` is set but no ``grids``
         are provided.
      2. **Task vs Metrics**: Checks if metrics match the task (e.g. no 'accuracy'
         for regression). Raises ValueError if incompatible.
      3. **Task vs CV**: Checks if CV strategy matches task (e.g. no 'stratified'
         for regression). Raises ValueError if incompatible.
      4. **Task vs Model**: Heuristic check for model type (e.g. no Regressor for
         Classification). Raises ValueError if incompatible.

      :raises ValueError: If configuration contains incompatible settings.
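
      A minimal sketch of the task-versus-metric check in step 2. The lookup
      table ``ALLOWED_METRICS`` and the helper ``check_metrics`` are hypothetical
      names used only for illustration; the actual rules in ``coco_pipe`` may
      differ.

      .. code-block:: python

         # Hypothetical mapping; the real module may use a different rule set.
         ALLOWED_METRICS = {
             "classification": {"accuracy", "roc_auc", "f1"},
             "regression": {"r2", "neg_mean_squared_error"},
         }


         def check_metrics(task, metrics):
             """Raise ValueError if any metric is incompatible with the task."""
             allowed = ALLOWED_METRICS[task]
             bad = sorted(m for m in metrics if m not in allowed)
             if bad:
                 raise ValueError(f"Metrics {bad} are not valid for task '{task}'.")


         check_metrics("classification", ["accuracy"])  # passes silently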


   .. py:method:: _prepare_estimator(model_name: str, model_config: Any) -> sklearn.base.BaseEstimator

      Orchestrate the creation of the full Estimator Pipeline.

      Steps
      -----
      1. **Instantiation**: Calls `_instantiate_model` to get the base estimator
         (handling recursion).
      2. **Scaling**: If `use_scaler=True`, prepends a StandardScaler.
      3. **Feature Selection**: If enabled, prepends the FS step (Filter or Wrapper).
      4. **Pipeline Construction**: Wraps the steps in ``sklearn.pipeline.Pipeline``
         and enables caching if FS and Tuning are both active.
      5. **Tuning Wrapper**: If tuning is enabled for this model, wraps the Pipeline
         in GridSearchCV/RandomizedSearchCV via `_wrap_with_tuning`.

      :param model_name: Friendly name from config (used for grid lookup).
      :type model_name: str
      :param model_config: Pydantic configuration object for the model.
      :type model_config: Any

      :returns: Final ready-to-run estimator (Pipeline or SearchCV).
      :rtype: BaseEstimator
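
      The steps above can be sketched with plain scikit-learn objects. The helper
      ``build_estimator`` and its arguments are assumptions made for this
      example, not the module's actual signature. The step order (scaler, then
      feature selection, then classifier) follows the
      "Scaler + FS + Classifier" description in ``_wrap_with_tuning``.

      .. code-block:: python

         from sklearn.feature_selection import SelectKBest, f_classif
         from sklearn.linear_model import LogisticRegression
         from sklearn.model_selection import GridSearchCV
         from sklearn.pipeline import Pipeline
         from sklearn.preprocessing import StandardScaler


         def build_estimator(use_scaler=True, k_best=None, grid=None):
             """Hypothetical helper: scaler -> feature selection -> classifier."""
             steps = []
             if use_scaler:
                 steps.append(("scaler", StandardScaler()))
             if k_best is not None:
                 steps.append(("fs", SelectKBest(f_classif, k=k_best)))
             steps.append(("clf", LogisticRegression(max_iter=1000)))
             pipe = Pipeline(steps)
             # Wrap in a search object only when a parameter grid is supplied.
             return GridSearchCV(pipe, grid, cv=3) if grid else pipe


         est = build_estimator(k_best=5, grid={"clf__C": [0.1, 1.0]})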


   .. py:method:: _instantiate_model(name: str, config: Any) -> sklearn.base.BaseEstimator

      Instantiate a raw estimator from its configuration object.

      Logic
      -----
      1. **Registry Lookup**: Resolves class from `config.method`.
      2. **Recursion**: If config implies a meta-estimator (has `base_estimator`),
         recursively calls `_prepare_estimator` for the child.
      3. **Parameter Injection**: Passes config fields as kwargs to ``__init__``.
         If a ``TypeError`` occurs, invalid parameters are filtered out
         automatically (robustness against mismatched config/class versions).

      :returns: The instantiated model (e.g., LogisticRegression instance) without pipeline
                wrappers.
      :rtype: BaseEstimator
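
      One way the parameter-filtering fallback in step 3 could look. This is a
      sketch under assumptions: ``ToyModel`` and ``instantiate`` are hypothetical
      names, and filtering via ``inspect.signature`` is one possible approach,
      not necessarily the one used here.

      .. code-block:: python

         import inspect


         class ToyModel:
             """Stand-in for an estimator class resolved from the registry."""

             def __init__(self, alpha=1.0):
                 self.alpha = alpha


         def instantiate(cls, params):
             """Try all params first; on TypeError keep only accepted ones."""
             try:
                 return cls(**params)
             except TypeError:
                 accepted = inspect.signature(cls.__init__).parameters
                 filtered = {k: v for k, v in params.items() if k in accepted}
                 return cls(**filtered)


         # "method" is not an __init__ argument, so it is dropped on retry.
         model = instantiate(ToyModel, {"alpha": 0.5, "method": "toy"})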


   .. py:method:: _create_fs_step(estimator: sklearn.base.BaseEstimator) -> Optional[tuple]

      Create a Feature Selection step for the pipeline.

      Logic
      -----
      - **Filter (k_best)**: Fast. Features are selected before training the
        classifier, based on a statistical test. No inner CV loop is required.
      - **Wrapper (sfs)**: Slow but accurate. Wraps the estimator in a
        SequentialFeatureSelector. This runs an **Inner CV Loop**
        (size = config.feature_selection.cv) to validate feature subsets.

      If used inside Hyperparameter Tuning, this step is part of the Pipeline,
      ensuring features are re-selected for every fold and every parameter
      combination (Nested Simplification).

      :returns: ("fs", Transformer) step for sklearn Pipeline.
      :rtype: tuple or None
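
      The two strategies map naturally onto scikit-learn's ``SelectKBest`` and
      ``SequentialFeatureSelector``. The sketch below assumes a hypothetical
      helper ``make_fs_step`` with illustrative argument names; the real method
      reads these settings from ``config.feature_selection``.

      .. code-block:: python

         from sklearn.feature_selection import (
             SelectKBest,
             SequentialFeatureSelector,
             f_classif,
         )
         from sklearn.linear_model import LogisticRegression


         def make_fs_step(method, estimator, n_features=5, cv=3):
             """Hypothetical helper returning ("fs", transformer) or None."""
             if method == "k_best":
                 # Filter: univariate test, no inner CV loop.
                 return ("fs", SelectKBest(f_classif, k=n_features))
             if method == "sfs":
                 # Wrapper: refits the estimator on candidate subsets (inner CV).
                 selector = SequentialFeatureSelector(
                     estimator, n_features_to_select=n_features, cv=cv
                 )
                 return ("fs", selector)
             return None


         step = make_fs_step("sfs", LogisticRegression(max_iter=1000))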


   .. py:method:: _wrap_with_tuning(estimator: sklearn.base.BaseEstimator, name: str) -> sklearn.base.BaseEstimator

      Wrap the estimator (or pipeline) in a Hyperparameter Search object.

      This implements **Nested Cross-Validation** (Middle Loop):

      1. **Input**: A Pipeline (Scaler + FS + Classifier).
      2. **Search**: Creates a GridSearchCV / RandomizedSearchCV.
      3. **Process**:

         - For each fold of the *tuning* CV (defined by config.cv):

           - Train the Pipeline (including FS!) on the tuning train set.
           - Evaluate on the tuning validation set.

         - Finds the best (Hyperparameters + Features) combination.
         - Refits on the entire training set provided by the Outer Loop.

      This ensures simultaneous optimization of Preprocessing (FS) and Modeling
      parameters.
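
      A minimal sketch of tuning feature selection and model parameters in one
      search, using scikit-learn's ``step__param`` naming. The data, grid values,
      and ``cv=3`` are illustrative assumptions, not defaults of this module.

      .. code-block:: python

         from sklearn.datasets import make_classification
         from sklearn.feature_selection import SelectKBest, f_classif
         from sklearn.linear_model import LogisticRegression
         from sklearn.model_selection import GridSearchCV
         from sklearn.pipeline import Pipeline
         from sklearn.preprocessing import StandardScaler

         X, y = make_classification(n_samples=120, n_features=10, random_state=0)

         pipe = Pipeline(
             [
                 ("scaler", StandardScaler()),
                 ("fs", SelectKBest(f_classif)),
                 ("clf", LogisticRegression(max_iter=1000)),
             ]
         )
         # "step__param" keys tune FS and classifier settings in a single search.
         grid = {"fs__k": [3, 5], "clf__C": [0.1, 1.0]}
         search = GridSearchCV(pipe, grid, cv=3, refit=True)
         search.fit(X, y)  # refits the best pipeline on all data passed to fit()

         best = search.best_params_

      Because the selector is a pipeline step, it is refit inside every tuning
      fold, so the validation split never influences which features are kept.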


   .. py:method:: run(X: Union[pandas.DataFrame, numpy.ndarray], y: Union[pandas.Series, numpy.ndarray], groups: Optional[Union[pandas.Series, numpy.ndarray]] = None) -> ExperimentResult

      Execute the full experiment pipeline.

      This is the main entry point. It orchestrates:

      1. **Data Validation**: Checks input shapes and types.
      2. **Model Loop**: Iterates through all configured models.
      3. **Preparation**: Instantiates models -> Builds Pipelines (Scaler/FS) ->
         Wraps in Tuning.
      4. **Validation**: Runs the Outer Cross-Validation loop (optionally
         parallelized).
      5. **Aggregation**: Collects scores, predictions, and importances.

      :param X: Training data (2D) or Time-Series data (3D).
      :type X: array-like of shape (n_samples, n_features) or (n_samples, n_features, n_times)
      :param y: Target labels or values.
      :type y: array-like of shape (n_samples,) or (n_samples, n_targets)
      :param groups: Group labels for splitting (e.g., subject-specific splits).
      :type groups: array-like of shape (n_samples,), optional

      :returns: Object containing results with methods to export to Tidy DataFrames.
      :rtype: ExperimentResult


   .. py:method:: save_results(path: Optional[Union[str, pathlib.Path]] = None)

      Serialize results, configuration, and metadata to disk.

      :param path: Path to save the results. If None, uses config.output_dir.
                   If both are None, raises ValueError.
      :type path: str or Path, optional


   .. py:method:: load_results(path: Union[str, pathlib.Path]) -> ExperimentResult
      :staticmethod:


      Load a saved experiment payload and wrap it in ExperimentResult.

      :returns: The loaded results wrapper.
      :rtype: ExperimentResult


   .. py:method:: _cross_validate(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: numpy.ndarray, groups: Optional[numpy.ndarray]) -> Dict[str, Any]

      Execute the Outer Cross-Validation Loop (Evaluation).

      This is the **Level 1 (Top Level)** split:

      - Splits the entire dataset into K folds (defined by config.cv).
      - For each fold:

        1. **Training Data**: 80% (if 5-fold). Passed to ``estimator.fit()``.
           If ``estimator`` is a GridSearch (Tuning Enabled), it internally splits
           this 80% again for validation (Level 2 Split).
        2. **Test Data**: 20%. Used strictly for final ``estimator.predict()``
           evaluation.

      Parallelization
      ---------------
      If `config.n_jobs > 1`, these folds run in parallel processes to speed up
      execution.
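
      A sketch of an outer loop dispatched through ``joblib``. The names
      ``fit_fold`` and ``N_JOBS`` are hypothetical, and the splitter and
      estimator are placeholders for whatever the configuration selects.

      .. code-block:: python

         from joblib import Parallel, delayed
         from sklearn.base import clone
         from sklearn.datasets import make_classification
         from sklearn.linear_model import LogisticRegression
         from sklearn.model_selection import StratifiedKFold

         N_JOBS = 1  # values > 1 run the folds in separate worker processes


         def fit_fold(estimator, X, y, train_idx, test_idx):
             """Fit a fresh clone on the training split, score the test split."""
             model = clone(estimator).fit(X[train_idx], y[train_idx])
             return model.score(X[test_idx], y[test_idx])


         X, y = make_classification(n_samples=100, random_state=0)
         cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
         scores = Parallel(n_jobs=N_JOBS)(
             delayed(fit_fold)(LogisticRegression(max_iter=1000), X, y, tr, te)
             for tr, te in cv.split(X, y)
         )

      Cloning inside ``fit_fold`` keeps folds independent: no fitted state leaks
      from one fold to the next, whether they run serially or in parallel.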


   .. py:method:: _fit_and_score_fold(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: numpy.ndarray, train_idx: numpy.ndarray, test_idx: numpy.ndarray) -> Dict[str, Any]

      Execute a single Cross-Validation fold: Fit, Predict, and Score.

      Optimized for:

      - **Standard Estimators**: (N, F) input -> (N,) output.
      - **Sliding Estimators**: (N, F, T) input -> (N, T) output (Diagonal Decoding).

      :returns: Contains 'test_idx', 'preds' (y_pred, y_true, y_proba),
                'scores' (dict of metric values), and 'importance'.
      :rtype: dict


   .. py:method:: _extract_metadata(estimator: sklearn.base.BaseEstimator) -> Dict[str, Any]
      :staticmethod:


      Extract training metadata like best Hyperparameters and Selected Features.


   .. py:method:: _compute_metric_safe(scorer, y_true, y_est, is_multiclass, is_proba=False)
      :staticmethod:


      Compute metric handling standard and temporal (diagonal) shapes.

      Shapes Handled
      --------------
      - **Standard**: y_est is (N,) or (N, C)
      - **Generalizing (Matrix)**:

        - y_pred: (N, T_train, T_test) -> Score each (T_train, T_test) pair.
        - y_proba: (N, C, T_train, T_test) -> Score each (T_train, T_test) pair.
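
      For the generalizing case with label predictions, scoring each time pair
      can be sketched as a double loop. ``score_matrix`` is a hypothetical
      helper, and ``accuracy_score`` stands in for whichever scorer is
      configured.

      .. code-block:: python

         import numpy as np
         from sklearn.metrics import accuracy_score


         def score_matrix(y_true, y_pred, scorer=accuracy_score):
             """Score each (T_train, T_test) pair of an (N, T_train, T_test) array."""
             _, n_train, n_test = y_pred.shape
             out = np.empty((n_train, n_test))
             for i in range(n_train):
                 for j in range(n_test):
                     out[i, j] = scorer(y_true, y_pred[:, i, j])
             return out


         y_true = np.array([0, 1, 0, 1])
         # Broadcast the labels so every time pair predicts perfectly.
         y_pred = np.broadcast_to(y_true[:, None, None], (4, 2, 3)).copy()
         y_pred[:, 0, 0] = 1 - y_true  # make one pair entirely wrong
         scores = score_matrix(y_true, y_pred)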


   .. py:method:: _force_serial_execution(estimator: sklearn.base.BaseEstimator) -> sklearn.base.BaseEstimator

      Recursively set n_jobs=1 for the estimator and its sub-components.
      Used when the outer loop is already parallelized to avoid oversubscription.
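
      A sketch of the idea using scikit-learn's parameter API. Note that this
      version relies on ``get_params(deep=True)`` to reach nested components
      rather than on explicit recursion, and ``force_serial`` is a hypothetical
      name.

      .. code-block:: python

         from sklearn.ensemble import RandomForestClassifier
         from sklearn.model_selection import GridSearchCV
         from sklearn.pipeline import Pipeline


         def force_serial(estimator):
             """Set every n_jobs parameter, including nested ones, to 1."""
             params = estimator.get_params(deep=True)
             serial = {
                 name: 1
                 for name in params
                 if name == "n_jobs" or name.endswith("__n_jobs")
             }
             return estimator.set_params(**serial)


         pipe = Pipeline([("clf", RandomForestClassifier(n_jobs=-1))])
         search = GridSearchCV(pipe, {"clf__n_estimators": [10, 20]}, n_jobs=-1)
         search = force_serial(search)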


   .. py:method:: _extract_feature_importances(estimator: sklearn.base.BaseEstimator) -> Optional[numpy.ndarray]
      :staticmethod:


      Extract feature importances or coefficients from a fitted estimator.
      Handles Pipelines and Feature Selection.
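
      A sketch of reading coefficients through a Pipeline that contains a
      feature-selection step. Mapping the weights back to the original feature
      space and filling dropped features with NaN is an assumption of this
      example, not confirmed behavior of the method.

      .. code-block:: python

         import numpy as np
         from sklearn.datasets import make_classification
         from sklearn.feature_selection import SelectKBest, f_classif
         from sklearn.linear_model import LogisticRegression
         from sklearn.pipeline import Pipeline

         X, y = make_classification(n_samples=100, n_features=8, random_state=0)
         pipe = Pipeline(
             [
                 ("fs", SelectKBest(f_classif, k=3)),
                 ("clf", LogisticRegression(max_iter=1000)),
             ]
         ).fit(X, y)

         mask = pipe.named_steps["fs"].get_support()  # boolean mask over 8 inputs
         coef = pipe.named_steps["clf"].coef_.ravel()  # one weight per kept feature

         # Map weights back to the original feature space; dropped ones get NaN.
         importance = np.full(X.shape[1], np.nan)
         importance[mask] = coef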


.. py:class:: ExperimentResult(raw_results: Dict[str, Any])

   Unified Container for Experiment Results.
   Provides Tidy Data views for easier analysis.


   .. py:attribute:: raw


   .. py:method:: summary() -> pandas.DataFrame

      Get a high-level summary of performance (Mean/Std across folds).

      :returns: Index: Model Name
                Columns: Metric Mean/Std
      :rtype: pd.DataFrame
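
      The aggregation can be pictured as a group-by over fold-level scores. The
      sketch below uses made-up numbers and the long-format column names
      documented for ``get_detailed_scores``; the exact column layout of the
      real summary may differ.

      .. code-block:: python

         import pandas as pd

         # Hypothetical fold-level scores in long format.
         scores = pd.DataFrame(
             {
                 "Model": ["lr", "lr", "svm", "svm"],
                 "Fold": [0, 1, 0, 1],
                 "Metric": ["accuracy"] * 4,
                 "Value": [0.8, 0.9, 0.7, 0.9],
             }
         )

         summary = scores.groupby(["Model", "Metric"])["Value"].agg(["mean", "std"])
         # One row per model; columns become (metric, statistic) pairs.
         summary = summary.unstack("Metric").swaplevel(axis=1).sort_index(axis=1)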


   .. py:method:: get_detailed_scores() -> pandas.DataFrame

      Get fold-level scores for all models in long format.

      :returns: Columns: Model, Fold, Metric, Value
      :rtype: pd.DataFrame


   .. py:method:: get_predictions() -> pandas.DataFrame

      Get concatenated predictions for all models.

      :returns: Columns: Model, Fold, y_true, y_pred, (y_proba if available)
      :rtype: pd.DataFrame


   .. py:method:: get_best_params() -> pandas.DataFrame

      Get the best hyperparameters selected per fold (if Tuning was enabled).

      :returns: Columns: Model, Fold, Param, Value
      :rtype: pd.DataFrame


   .. py:method:: get_feature_stability() -> pandas.DataFrame

      Analyze feature selection stability across folds.

      :returns: Index: Feature Index/Name
                Columns: Selection Frequency (0.0 - 1.0)
      :rtype: pd.DataFrame
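
      Selection frequency is the fraction of folds in which a feature was kept.
      A minimal sketch with made-up boolean masks (one row per fold); the
      variable names are illustrative only.

      .. code-block:: python

         import numpy as np
         import pandas as pd

         # Hypothetical selection masks: 3 folds x 4 features.
         masks = np.array(
             [
                 [True, True, False, False],
                 [True, False, False, True],
                 [True, True, False, False],
             ]
         )
         # Averaging booleans over folds gives a frequency in [0.0, 1.0].
         stability = pd.DataFrame(
             {"Selection Frequency": masks.mean(axis=0)},
             index=pd.Index(range(masks.shape[1]), name="Feature"),
         )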


   .. py:method:: get_generalization_matrix(metric: str = None) -> pandas.DataFrame

      Get Generalization Matrix (Train Time x Test Time) averaged across folds.

      :param metric: The metric to retrieve (e.g., 'accuracy', 'roc_auc').
                     Defaults to the first metric found in results.
      :type metric: str, optional

      :returns: Index: Train Time
                Columns: Test Time
                Values: Average Score
      :rtype: pd.DataFrame
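
      Averaging across folds can be sketched as a mean over the first axis of a
      ``(n_folds, T_train, T_test)`` array. The scores and time labels below are
      invented for illustration.

      .. code-block:: python

         import numpy as np
         import pandas as pd

         # Hypothetical per-fold score matrices: 2 folds, 2 train x 2 test times.
         fold_scores = np.array(
             [
                 [[0.6, 0.5], [0.5, 0.8]],
                 [[0.8, 0.5], [0.7, 0.6]],
             ]
         )
         times = [0.0, 0.1]  # assumed time labels in seconds
         matrix = pd.DataFrame(
             fold_scores.mean(axis=0),
             index=pd.Index(times, name="Train Time"),
             columns=pd.Index(times, name="Test Time"),
         )

      The diagonal of such a matrix corresponds to training and testing at the
      same time point.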