coco_pipe.decoding.core
=======================

.. py:module:: coco_pipe.decoding.core

.. autoapi-nested-parse::

   Decoding Core
   =============

   This module is responsible for:

   1. Orchestrating the cross-validation loop.
   2. Managing estimator lifecycles (instantiation, fitting, prediction).
   3. Computing metrics dynamically based on the task type.
   4. Aggregating results for downstream analysis.



Attributes
----------

.. autoapisummary::

   coco_pipe.decoding.core.logger


Classes
-------

.. autoapisummary::

   coco_pipe.decoding.core.Experiment
   coco_pipe.decoding.core.ExperimentResult


Module Contents
---------------

.. py:data:: logger

.. py:class:: Experiment(config: coco_pipe.decoding.configs.ExperimentConfig)

   Main executor for decoding experiments.

   :param config: The complete configuration for the experiment.
   :type config: ExperimentConfig


   .. py:attribute:: config


   .. py:attribute:: results
      :type: Dict[str, Any]


   .. py:method:: _validate_config()

      Perform comprehensive runtime validation of the configuration.

      Logic
      -----

      1. **Tuning Consistency**: Warns if `tuning.enabled` is set but no `grids` are provided.
      2. **Task vs. Metrics**: Checks that the metrics match the task (e.g. no 'accuracy' for regression). Raises ValueError if incompatible.
      3. **Task vs. CV**: Checks that the CV strategy matches the task (e.g. no 'stratified' for regression). Raises ValueError if incompatible.
      4. **Task vs. Model**: Heuristically checks the model type (e.g. no regressor for classification). Raises ValueError if incompatible.

      :raises ValueError: If the configuration contains incompatible settings.


   .. py:method:: _prepare_estimator(model_name: str, model_config: Any) -> sklearn.base.BaseEstimator

      Orchestrate the creation of the full estimator pipeline.

      Steps
      -----

      1. **Instantiation**: Calls `_instantiate_model` to get the base estimator (handling recursion).
      2. **Scaling**: If `use_scaler=True`, prepends a StandardScaler.
      3. **Feature Selection**: If enabled, inserts the FS step (filter or wrapper) before the estimator.
      4. **Pipeline Construction**: Wraps the steps in `sklearn.pipeline.Pipeline`.

         - Enables caching if FS and tuning are both active.

      5. **Tuning Wrapper**: If tuning is enabled for this model, wraps the Pipeline in GridSearchCV/RandomizedSearchCV via `_wrap_with_tuning`.

      :param model_name: Friendly name from the config (used for grid lookup).
      :type model_name: str
      :param model_config: Pydantic configuration object for the model.
      :type model_config: Any
      :returns: Final ready-to-run estimator (Pipeline or SearchCV).
      :rtype: BaseEstimator


   .. py:method:: _instantiate_model(name: str, config: Any) -> sklearn.base.BaseEstimator

      Instantiate a raw estimator from its configuration object.

      Logic
      -----

      1. **Registry Lookup**: Resolves the class from `config.method`.
      2. **Recursion**: If the config describes a meta-estimator (it has a `base_estimator`), recursively calls `_prepare_estimator` for the child.
      3. **Parameter Injection**: Passes the config fields as keyword arguments to `__init__`.

         - If a `TypeError` occurs, automatically filters out invalid parameters (for robustness against mismatched config/class versions).

      :returns: The instantiated model (e.g., a LogisticRegression instance) without pipeline wrappers.
      :rtype: BaseEstimator


   .. py:method:: _create_fs_step(estimator: sklearn.base.BaseEstimator) -> Optional[tuple]

      Create a feature selection step for the pipeline.

      Logic
      -----

      - **Filter (k_best)**: Fast. Selects features with a statistical test before the classifier is trained. No inner CV loop is required.
      - **Wrapper (sfs)**: Slow but accurate. Wraps the estimator in a SequentialFeatureSelector, which runs an **inner CV loop** (size = `config.feature_selection.cv`) to validate feature subsets.

      If used inside hyperparameter tuning, this step is part of the Pipeline, ensuring that features are re-selected for every fold and every parameter combination (Nested Simplification).

      :returns: ("fs", Transformer) step for the sklearn Pipeline.
      :rtype: tuple or None


   .. py:method:: _wrap_with_tuning(estimator: sklearn.base.BaseEstimator, name: str) -> sklearn.base.BaseEstimator

      Wrap the estimator (or pipeline) in a hyperparameter search object.

      This implements **nested cross-validation** (middle loop):

      1. **Input**: A Pipeline (Scaler + FS + Classifier).
      2. **Search**: Creates a GridSearchCV / RandomizedSearchCV.
      3. **Process**:

         - For each fold of the *tuning* CV (defined by `config.cv`):

           - Train the Pipeline (including FS!) on the tuning training set.
           - Evaluate on the tuning validation set.

         - Find the best (hyperparameters + features) combination.
         - Refit on the entire training set provided by the outer loop.

      This ensures simultaneous optimization of the preprocessing (FS) and modeling parameters.


   .. py:method:: run(X: Union[pandas.DataFrame, numpy.ndarray], y: Union[pandas.Series, numpy.ndarray], groups: Optional[Union[pandas.Series, numpy.ndarray]] = None) -> ExperimentResult

      Execute the full experiment pipeline.

      This is the main entry point. It orchestrates:

      1. **Data Validation**: Checks input shapes and types.
      2. **Model Loop**: Iterates through all configured models.
      3. **Preparation**: Instantiates models -> builds pipelines (Scaler/FS) -> wraps them in tuning.
      4. **Validation**: Runs the outer cross-validation loop (optionally parallelized).
      5. **Aggregation**: Collects scores, predictions, and importances.

      :param X: Training data (2D) or time-series data (3D).
      :type X: array-like of shape (n_samples, n_features) or (n_samples, n_features, n_times)
      :param y: Target labels or values.
      :type y: array-like of shape (n_samples,) or (n_samples, n_targets)
      :param groups: Group labels for splitting (e.g., subject-specific splits).
      :type groups: array-like of shape (n_samples,), optional
      :returns: Object containing the results, with methods to export them to tidy DataFrames.
      :rtype: ExperimentResult


   .. py:method:: save_results(path: Optional[Union[str, pathlib.Path]] = None)

      Serialize results, configuration, and metadata to disk.

      :param path: Path to save the results.
         If None, `config.output_dir` is used; if that is also None, a ValueError is raised.
      :type path: str or Path, optional
      :raises ValueError: If both `path` and `config.output_dir` are None.


   .. py:method:: load_results(path: Union[str, pathlib.Path]) -> ExperimentResult
      :staticmethod:

      Load a saved experiment payload and wrap it in an ExperimentResult.

      :param path: Path to a previously saved experiment payload.
      :type path: str or Path
      :returns: The loaded results wrapper.
      :rtype: ExperimentResult


   .. py:method:: _cross_validate(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: numpy.ndarray, groups: Optional[numpy.ndarray]) -> Dict[str, Any]

      Execute the outer cross-validation loop (evaluation).

      This is the **Level 1 (top-level)** split:

      - Splits the entire dataset into K folds (defined by `config.cv`).
      - For each fold:

        1. **Training Data**: 80% of the samples (with 5 folds). Passed to `estimator.fit()`.

           - If `estimator` is a GridSearch (tuning enabled), it internally splits this 80% again for validation (Level 2 split).

        2. **Test Data**: The remaining 20%. Used strictly for the final `estimator.predict()` evaluation.

      Parallelization
      ---------------

      If `config.n_jobs > 1`, the folds run in parallel processes to speed up execution.


   .. py:method:: _fit_and_score_fold(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: numpy.ndarray, train_idx: numpy.ndarray, test_idx: numpy.ndarray) -> Dict[str, Any]

      Execute a single cross-validation fold: fit, predict, and score.

      Handles:

      - **Standard estimators**: (N, F) input -> (N,) output.
      - **Sliding estimators**: (N, F, T) input -> (N, T) output (diagonal decoding).

      :returns: Contains 'test_idx', 'preds' (y_pred, y_true, y_proba), 'scores' (dict of metric values), and 'importance'.
      :rtype: dict


   .. py:method:: _extract_metadata(estimator: sklearn.base.BaseEstimator) -> Dict[str, Any]
      :staticmethod:

      Extract training metadata such as the best hyperparameters and the selected features.


   .. py:method:: _compute_metric_safe(scorer, y_true, y_est, is_multiclass, is_proba=False)
      :staticmethod:

      Compute a metric, handling both standard and temporal (diagonal) output shapes.
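
      A minimal sketch of the diagonal case, for illustration only. It assumes that the scorer behaves like a plain metric function ``metric(y_true, y_pred)``, such as ``sklearn.metrics.accuracy_score``. It also assumes that ``y_true`` has shape (N,) and that a sliding estimator returns ``y_pred`` with shape (N, T). The helper ``score_per_timepoint`` is hypothetical and is not part of ``coco_pipe``.

      .. code-block:: python

         import numpy as np
         from sklearn.metrics import accuracy_score


         def score_per_timepoint(metric, y_true, y_pred):
             """Hypothetical helper (not coco_pipe API): score predictions per time point.

             Assumes y_true has shape (N,) and y_pred has shape (N,) or (N, T).
             """
             y_pred = np.asarray(y_pred)
             if y_pred.ndim == 1:
                 # Standard case: one score for the whole fold.
                 return metric(y_true, y_pred)
             # Temporal (diagonal) case: score each time point independently -> shape (T,).
             return np.array([metric(y_true, y_pred[:, t]) for t in range(y_pred.shape[1])])


         y_true = np.array([0, 1, 1, 0])
         # Made-up predictions for 4 samples at 2 time points.
         y_pred = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])
         scores = score_per_timepoint(accuracy_score, y_true, y_pred)
         print(scores)

      Each column of ``y_pred`` is scored on its own, so the result holds one value per time point. The 1D branch reproduces the standard single-score case.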

      Shapes Handled
      --------------

      - **Standard**: `y_est` is (N,) or (N, C).
      - **Generalizing (Matrix)**:

        - y_pred: (N, T_train, T_test) -> score each (T_train, T_test) pair.
        - y_proba: (N, C, T_train, T_test) -> score each (T_train, T_test) pair.


   .. py:method:: _force_serial_execution(estimator: sklearn.base.BaseEstimator) -> sklearn.base.BaseEstimator

      Recursively set n_jobs=1 for the estimator and its sub-components.

      Used when the outer loop is already parallelized, to avoid oversubscription.


   .. py:method:: _extract_feature_importances(estimator: sklearn.base.BaseEstimator) -> Optional[numpy.ndarray]
      :staticmethod:

      Extract feature importances or coefficients from a fitted estimator.

      Handles Pipelines and feature selection.


.. py:class:: ExperimentResult(raw_results: Dict[str, Any])

   Unified container for experiment results.

   Provides tidy-data views for easier analysis.


   .. py:attribute:: raw


   .. py:method:: summary() -> pandas.DataFrame

      Get a high-level summary of performance (mean/std across folds).

      :returns: DataFrame indexed by model name, with the mean and standard deviation of each metric as columns.
      :rtype: pd.DataFrame


   .. py:method:: get_detailed_scores() -> pandas.DataFrame

      Get fold-level scores for all models in long format.

      :returns: Columns: Model, Fold, Metric, Value.
      :rtype: pd.DataFrame


   .. py:method:: get_predictions() -> pandas.DataFrame

      Get concatenated predictions for all models.

      :returns: Columns: Model, Fold, y_true, y_pred (and y_proba, if available).
      :rtype: pd.DataFrame


   .. py:method:: get_best_params() -> pandas.DataFrame

      Get the best hyperparameters selected in each fold (if tuning was enabled).

      :returns: Columns: Model, Fold, Param, Value.
      :rtype: pd.DataFrame


   .. py:method:: get_feature_stability() -> pandas.DataFrame

      Analyze the stability of feature selection across folds.

      :returns: DataFrame indexed by feature index/name, with the selection frequency (0.0 to 1.0) as its column.
      :rtype: pd.DataFrame


   .. py:method:: get_generalization_matrix(metric: str = None) -> pandas.DataFrame

      Get the generalization matrix (train time x test time), averaged across folds.

      :param metric: The metric to retrieve (e.g., 'accuracy', 'roc_auc'). Defaults to the first metric found in the results.
      :type metric: str, optional
      :returns: DataFrame indexed by train time, with test time as columns and the fold-averaged score as values.
      :rtype: pd.DataFrame
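
      As a rough illustration of the fold averaging, the helper below is hypothetical and not part of ``coco_pipe``. It assumes that each fold yields a 2D score matrix of shape (T_train, T_test). It labels the axes with integer positions rather than real time values.

      .. code-block:: python

         import numpy as np
         import pandas as pd


         def average_generalization_matrix(fold_matrices):
             """Hypothetical helper (not coco_pipe API): average per-fold score matrices."""
             # Stack into shape (n_folds, T_train, T_test); all folds must share one shape.
             stacked = np.stack([np.asarray(m, dtype=float) for m in fold_matrices])
             mean = stacked.mean(axis=0)  # average over the fold axis
             n_train, n_test = mean.shape
             return pd.DataFrame(
                 mean,
                 index=pd.RangeIndex(n_train, name="Train Time"),
                 columns=pd.RangeIndex(n_test, name="Test Time"),
             )


         # Made-up scores for two folds with two time points each.
         fold_1 = [[0.9, 0.6], [0.5, 0.8]]
         fold_2 = [[0.7, 0.4], [0.5, 0.6]]
         gen = average_generalization_matrix([fold_1, fold_2])
         print(gen)

      The diagonal of the averaged matrix, ``np.diag(gen.to_numpy())``, holds the scores where training and testing times coincide. This corresponds to diagonal decoding.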