coco_pipe.decoding.core¶
This module is responsible for:
1. Orchestrating the Cross-Validation loop.
2. Managing Estimator lifecycles (instantiation, fitting, prediction).
3. Computing metrics dynamically based on task type.
4. Aggregating results for downstream analysis.
Attributes¶
logger
Classes¶
Experiment | Main executor for decoding experiments.
ExperimentResult | Unified Container for Experiment Results.
Module Contents¶
- coco_pipe.decoding.core.logger¶
- class coco_pipe.decoding.core.Experiment(config: coco_pipe.decoding.configs.ExperimentConfig)[source]¶
Main executor for decoding experiments.
- Parameters:
config (ExperimentConfig) – The complete configuration for the experiment.
- config¶
- results: Dict[str, Any]¶
- _validate_config()[source]¶
Perform comprehensive runtime validation of the configuration.
Logic¶
Tuning Consistency: Warns if tuning.enabled is set but no grids are provided.
Task vs Metrics: Checks if metrics match the task (e.g. no ‘accuracy’ for regression). Raises ValueError if incompatible.
Task vs CV: Checks if CV strategy matches task (e.g. no ‘stratified’ for regression). Raises ValueError if incompatible.
Task vs Model: Heuristic check for model type (e.g. no Regressor for Classification). Raises ValueError if incompatible.
- raises ValueError:
If configuration contains incompatible settings.
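The task-compatibility checks described above could look roughly like the following. This is a minimal sketch, not the library's implementation: the function name, the plain-argument interface, and the contents of the two lookup sets are assumptions made for illustration (only 'accuracy' and 'stratified' are named in the documentation).

```python
# Hypothetical lookup sets; the real lists in coco_pipe may differ.
CLASSIFICATION_ONLY_METRICS = {"accuracy"}
CLASSIFICATION_ONLY_CV = {"stratified"}


def validate_task_settings(task, metrics, cv_strategy):
    """Raise ValueError when metrics or CV strategy do not fit the task."""
    if task != "regression":
        return  # This sketch only covers the regression-side checks.
    bad_metrics = sorted(set(metrics) & CLASSIFICATION_ONLY_METRICS)
    if bad_metrics:
        raise ValueError(f"Metrics {bad_metrics} are not valid for regression.")
    if cv_strategy in CLASSIFICATION_ONLY_CV:
        raise ValueError(f"CV strategy '{cv_strategy}' is not valid for regression.")


# A compatible configuration passes silently.
validate_task_settings("regression", ["r2"], "kfold")
```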
- _prepare_estimator(model_name: str, model_config: Any) sklearn.base.BaseEstimator[source]¶
Orchestrate the creation of the full Estimator Pipeline.
Steps¶
Instantiation: Calls _instantiate_model to get the base estimator (handling recursion).
Scaling: If use_scaler=True, prepends a StandardScaler.
Feature Selection: If enabled, prepends the FS step (Filter or Wrapper).
Pipeline Construction: Wraps the steps in sklearn.pipeline.Pipeline. Enables caching if FS and Tuning are both active.
Tuning Wrapper: If tuning is enabled for this model, wraps the Pipeline in GridSearchCV/RandomizedSearchCV via _wrap_with_tuning.
- param model_name:
Friendly name from config (used for grid lookup).
- type model_name:
str
- param model_config:
Pydantic configuration object for the model.
- type model_config:
Any
- returns:
Final ready-to-run estimator (Pipeline or SearchCV).
- rtype:
BaseEstimator
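The assembly order described above (scaler, feature selection, model, optional tuning wrapper) can be sketched with plain scikit-learn objects. The function name, its arguments, and the step names "scaler" and "clf" are assumptions for illustration; only the "fs" step name appears in this documentation.

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_estimator(use_scaler=True, k_best=None, param_grid=None):
    """Assemble scaler -> feature selection -> classifier, optionally tuned."""
    steps = []
    if use_scaler:
        steps.append(("scaler", StandardScaler()))
    if k_best is not None:
        steps.append(("fs", SelectKBest(f_classif, k=k_best)))
    steps.append(("clf", LogisticRegression(max_iter=1000)))
    pipeline = Pipeline(steps)
    if param_grid:
        # The whole pipeline is tuned, so grid keys use "<step>__<param>".
        return GridSearchCV(pipeline, param_grid, cv=3)
    return pipeline


estimator = build_estimator(k_best=5, param_grid={"clf__C": [0.1, 1.0, 10.0]})
```

Tuning the pipeline rather than the bare model means scaling and feature selection are refit inside every tuning fold.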
- _instantiate_model(name: str, config: Any) sklearn.base.BaseEstimator[source]¶
Instantiate a raw estimator from its configuration object.
Logic¶
Registry Lookup: Resolves class from config.method.
Recursion: If config implies a meta-estimator (has base_estimator), recursively calls _prepare_estimator for the child.
Parameter Injection: Passes config fields as kwargs to __init__. If a TypeError occurs, invalid parameters are filtered out automatically (robustness for mismatched config/class versions).
- returns:
The instantiated model (e.g., LogisticRegression instance) without pipeline wrappers.
- rtype:
BaseEstimator
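One way to implement the registry lookup and the TypeError fallback is to retry with only the keyword arguments the constructor accepts. The sketch below assumes a hypothetical REGISTRY dict and a toy model class; it does not reproduce the library's registry or its recursion into base_estimator.

```python
import inspect


class ToyModel:
    """Stand-in for an estimator class; not part of coco_pipe."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


# Hypothetical registry mapping config.method values to classes.
REGISTRY = {"toy": ToyModel}


def instantiate(method, params):
    """Instantiate the registered class, dropping unknown kwargs on TypeError."""
    cls = REGISTRY[method]
    try:
        return cls(**params)
    except TypeError:
        accepted = inspect.signature(cls.__init__).parameters
        filtered = {k: v for k, v in params.items() if k in accepted}
        return cls(**filtered)


model = instantiate("toy", {"alpha": 0.5, "obsolete_option": True})
```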
- _create_fs_step(estimator: sklearn.base.BaseEstimator) tuple | None[source]¶
Create a Feature Selection step for the pipeline.
Logic¶
Filter (k_best): Fast. Features are selected before the classifier is trained, based on a statistical test. No inner CV loop is required.
Wrapper (sfs): Slower, but selects features using the estimator itself. Wraps the estimator in a SequentialFeatureSelector, which runs an Inner CV Loop (size = config.feature_selection.cv) to validate feature subsets.
If used inside Hyperparameter Tuning, this step is part of the Pipeline, ensuring features are re-selected for every fold and every parameter combination (Nested Simplification).
- returns:
(“fs”, Transformer) step for sklearn Pipeline.
- rtype:
tuple or None
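The two selection modes map onto two scikit-learn transformers. The following is a minimal sketch under assumed argument names (method, n_features, cv); the choice of f_classif as the statistical test is also an assumption, since the documentation does not name the test.

```python
from sklearn.feature_selection import (
    SelectKBest,
    SequentialFeatureSelector,
    f_classif,
)
from sklearn.linear_model import LogisticRegression


def make_fs_step(method, estimator, n_features, cv=3):
    """Return an ("fs", transformer) pipeline step, or None if disabled."""
    if method is None:
        return None
    if method == "k_best":
        # Filter: univariate test, no inner CV loop.
        return ("fs", SelectKBest(score_func=f_classif, k=n_features))
    if method == "sfs":
        # Wrapper: scores candidate subsets with an inner CV loop.
        selector = SequentialFeatureSelector(
            estimator, n_features_to_select=n_features, cv=cv
        )
        return ("fs", selector)
    raise ValueError(f"Unknown feature selection method: {method}")


filter_step = make_fs_step("k_best", LogisticRegression(), n_features=3)
wrapper_step = make_fs_step("sfs", LogisticRegression(), n_features=3)
```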
- _wrap_with_tuning(estimator: sklearn.base.BaseEstimator, name: str) sklearn.base.BaseEstimator[source]¶
Wrap the estimator (or pipeline) in a Hyperparameter Search object.
This implements Nested Cross-Validation (Middle Loop):
1. Input: A Pipeline (Scaler + FS + Classifier).
2. Search: Creates a GridSearchCV / RandomizedSearchCV.
3. Process:
   - For each fold of the tuning CV (defined by config.cv):
     - Train the Pipeline (including FS!) on the tuning train set.
     - Evaluate on the tuning validation set.
   - Finds the best (Hyperparameters + Features) combination.
   - Refits on the entire training set provided by the Outer Loop.
This ensures simultaneous optimization of Preprocessing (FS) and Modeling parameters.
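The nesting can be illustrated with scikit-learn alone: a GridSearchCV over a full pipeline is itself an estimator, so an outer cross-validation can evaluate it. The dataset, grid values, and fold counts below are arbitrary choices for the sketch, not defaults of this module.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=120, n_features=12, n_informative=4, random_state=0
)

pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("fs", SelectKBest(f_classif)),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Feature count and regularization strength are searched together.
param_grid = {"fs__k": [3, 6], "clf__C": [0.1, 1.0]}
search = GridSearchCV(pipeline, param_grid, cv=StratifiedKFold(3), refit=True)

# Outer loop: each fold refits the whole search on its own training split.
outer_scores = cross_val_score(search, X, y, cv=StratifiedKFold(5))
```

Because feature selection sits inside the pipeline, held-out samples never influence which features are kept.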
- run(X: pandas.DataFrame | numpy.ndarray, y: pandas.Series | numpy.ndarray, groups: pandas.Series | numpy.ndarray | None = None) ExperimentResult[source]¶
Execute the full experiment pipeline.
This is the main entry point. It orchestrates:
1. Data Validation: Checks input shapes and types.
2. Model Loop: Iterates through all configured models.
3. Preparation: Instantiates models -> Builds Pipelines (Scaler/FS) -> Wraps in Tuning.
4. Validation: Runs the Outer Cross-Validation loop (optionally parallelized).
5. Aggregation: Collects scores, predictions, and importances.
- Parameters:
X (array-like of shape (n_samples, n_features) or (n_samples, n_features, n_times)) – Training data (2D) or Time-Series data (3D).
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target labels or values.
groups (array-like of shape (n_samples,), optional) – Group labels for splitting (e.g., subject-specific splits).
- Returns:
Object containing results with methods to export to Tidy DataFrames.
- Return type:
ExperimentResult
- save_results(path: str | pathlib.Path | None = None)[source]¶
Serialize results, configuration, and metadata to disk.
- Parameters:
path (str or Path, optional) – Path to save the results. If None, uses config.output_dir. If both are None, raises ValueError.
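The fallback rule for the destination can be sketched as below. The helper name and its arguments are hypothetical, and the sketch covers only how the path is resolved, not the file name or serialization format the library uses.

```python
from pathlib import Path


def resolve_output_path(path=None, output_dir=None):
    """Prefer an explicit path, fall back to output_dir, else fail."""
    if path is not None:
        return Path(path)
    if output_dir is not None:
        return Path(output_dir)
    raise ValueError("No path given and config.output_dir is not set.")


explicit = resolve_output_path("runs/exp1", output_dir="default_out")
fallback = resolve_output_path(None, output_dir="default_out")
```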
- static load_results(path: str | pathlib.Path) ExperimentResult[source]¶
Load a saved experiment payload and wrap it in ExperimentResult.
- Returns:
The loaded results wrapper.
- Return type:
ExperimentResult
- _cross_validate(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: numpy.ndarray, groups: numpy.ndarray | None) Dict[str, Any][source]¶
Execute the Outer Cross-Validation Loop (Evaluation).
This is the Level 1 (top-level) split:
- Splits the entire dataset into K folds (defined by config.cv).
- For each fold:
  - Training Data: 80% (if 5-fold). Passed to estimator.fit(). If the estimator is a GridSearch (tuning enabled), it internally splits this 80% again for validation (Level 2 split).
  - Test Data: 20%. Used strictly for the final estimator.predict() evaluation.
Parallelization¶
If config.n_jobs > 1, these folds run in parallel processes to speed up execution.
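A parallel outer loop of this kind is commonly written with joblib. The sketch below is an assumption about the general shape, not the module's code: the function names, the use of KFold, and the returned dict keys are chosen for illustration.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def fit_and_score(estimator, X, y, train_idx, test_idx):
    """Fit a fresh clone on the training split and score the test split."""
    model = clone(estimator).fit(X[train_idx], y[train_idx])
    return {"test_idx": test_idx, "score": model.score(X[test_idx], y[test_idx])}


def cross_validate_outer(estimator, X, y, n_splits=5, n_jobs=1):
    """Run the outer folds, dispatching them through joblib."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    return Parallel(n_jobs=n_jobs)(
        delayed(fit_and_score)(estimator, X, y, train_idx, test_idx)
        for train_idx, test_idx in cv.split(X, y)
    )


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)
folds = cross_validate_outer(LogisticRegression(), X, y, n_splits=5, n_jobs=1)
```

Cloning the estimator per fold keeps folds independent, which is what makes running them in separate processes safe.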
- _fit_and_score_fold(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: numpy.ndarray, train_idx: numpy.ndarray, test_idx: numpy.ndarray) Dict[str, Any][source]¶
Execute a single Cross-Validation fold: Fit, Predict, and Score.
Optimized for:
- Standard Estimators: (N, F) input -> (N,) output.
- Sliding Estimators: (N, F, T) input -> (N, T) output (Diagonal Decoding).
- Returns:
Contains ‘test_idx’, ‘preds’ (y_pred, y_true, y_proba), ‘scores’ (dict of metric values), and ‘importance’.
- Return type:
dict
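The sliding (diagonal) case can be pictured as one independent model per time point. The following sketch shows that shape contract with a hand-written loop; it is not the module's sliding estimator, and the function name is hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


def fit_predict_sliding(estimator, X_train, y_train, X_test):
    """Fit one clone per time point: (N, F, T) input -> (N, T) predictions."""
    n_times = X_train.shape[2]
    preds = np.empty((X_test.shape[0], n_times), dtype=y_train.dtype)
    for t in range(n_times):
        model = clone(estimator).fit(X_train[:, :, t], y_train)
        preds[:, t] = model.predict(X_test[:, :, t])
    return preds


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5, 8))  # 60 samples, 5 features, 8 time points
y = np.tile([0, 1], 30)          # balanced toy labels
preds = fit_predict_sliding(LogisticRegression(), X[:40], y[:40], X[40:])
```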
- static _extract_metadata(estimator: sklearn.base.BaseEstimator) Dict[str, Any][source]¶
Extract training metadata like best Hyperparameters and Selected Features.
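With scikit-learn objects, this kind of metadata is usually available through best_params_ on a fitted search and get_support on a fitted selector. The sketch below assumes the selector lives in a pipeline step named "fs"; the helper name and the returned keys are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


def extract_metadata(estimator):
    """Collect best hyperparameters and selected feature indices, if any."""
    meta = {}
    if hasattr(estimator, "best_params_"):
        meta["best_params"] = estimator.best_params_
        estimator = estimator.best_estimator_  # unwrap the fitted search
    fs = getattr(estimator, "named_steps", {}).get("fs")
    if fs is not None and hasattr(fs, "get_support"):
        meta["selected_features"] = fs.get_support(indices=True)
    return meta


X, y = make_classification(
    n_samples=90, n_features=8, n_informative=3, random_state=0
)
pipeline = Pipeline(
    [("fs", SelectKBest(f_classif)), ("clf", LogisticRegression(max_iter=1000))]
)
search = GridSearchCV(pipeline, {"fs__k": [2, 4]}, cv=3).fit(X, y)
meta = extract_metadata(search)
```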
- static _compute_metric_safe(scorer, y_true, y_est, is_multiclass, is_proba=False)[source]¶
Compute metric handling standard and temporal (diagonal) shapes.
Shapes Handled¶
Standard: y_est is (N,) or (N, C)
Generalizing (Matrix):
- y_pred: (N, T_train, T_test) -> Score each (T_train, T_test) pair.
- y_proba: (N, C, T_train, T_test) -> Score each (T_train, T_test) pair.
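Scoring by shape can be sketched with NumPy as follows. This simplified version handles label predictions only (no probabilities), uses a hand-written accuracy function in place of a scikit-learn scorer, and its function names are hypothetical.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of matching labels."""
    return float(np.mean(y_true == y_pred))


def score_predictions(metric, y_true, y_pred):
    """Score (N,), (N, T), or (N, T_train, T_test) label predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_pred.ndim == 1:
        return metric(y_true, y_pred)
    if y_pred.ndim == 2:
        # One score per time point.
        return np.array(
            [metric(y_true, y_pred[:, t]) for t in range(y_pred.shape[1])]
        )
    if y_pred.ndim == 3:
        # One score per (train time, test time) pair.
        n_train, n_test = y_pred.shape[1:]
        scores = np.empty((n_train, n_test))
        for i in range(n_train):
            for j in range(n_test):
                scores[i, j] = metric(y_true, y_pred[:, i, j])
        return scores
    raise ValueError(f"Unsupported prediction shape: {y_pred.shape}")


y_true = np.array([0, 1, 0, 1])
y_pred = np.zeros((4, 2, 3), dtype=int)
y_pred[:, 0, 0] = y_true  # perfect predictions for one (train, test) pair
scores = score_predictions(accuracy, y_true, y_pred)
```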
- class coco_pipe.decoding.core.ExperimentResult(raw_results: Dict[str, Any])[source]¶
Unified Container for Experiment Results. Provides Tidy Data views for easier analysis.
- raw¶
- summary() pandas.DataFrame[source]¶
Get a high-level summary of performance (Mean/Std across folds).
- Returns:
Index: Model Name
Columns: Metric Mean/Std
- Return type:
pd.DataFrame
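A mean/std summary of this kind can be derived from fold-level scores in long format with a pandas groupby. The sketch uses the column names documented for get_detailed_scores (Model, Fold, Metric, Value); the rows are invented toy values, and the exact column layout of the real summary() output may differ.

```python
import pandas as pd

# Toy fold-level scores in long format (values are made up).
detailed = pd.DataFrame(
    {
        "Model": ["lr", "lr", "svm", "svm"],
        "Fold": [0, 1, 0, 1],
        "Metric": ["accuracy"] * 4,
        "Value": [0.8, 0.9, 0.7, 0.75],
    }
)

# One row per model; columns are (statistic, metric) pairs.
summary = (
    detailed.groupby(["Model", "Metric"])["Value"]
    .agg(["mean", "std"])
    .unstack("Metric")
)
```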
- get_detailed_scores() pandas.DataFrame[source]¶
Get fold-level scores for all models in long format.
- Returns:
Columns: Model, Fold, Metric, Value
- Return type:
pd.DataFrame
- get_predictions() pandas.DataFrame[source]¶
Get concatenated predictions for all models.
- Returns:
Columns: Model, Fold, y_true, y_pred, (y_proba if available)
- Return type:
pd.DataFrame
- get_best_params() pandas.DataFrame[source]¶
Get the best hyperparameters selected per fold (if Tuning was enabled).
- Returns:
Columns: Model, Fold, Param, Value
- Return type:
pd.DataFrame
- get_feature_stability() pandas.DataFrame[source]¶
Analyze feature selection stability across folds.
- Returns:
Index: Feature Index/Name
Columns: Selection Frequency (0.0 - 1.0)
- Return type:
pd.DataFrame
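Selection frequency is the fraction of folds in which a feature was kept. A minimal sketch, assuming each fold yields a boolean mask over features (the masks and feature names below are toy data, and the column label is taken from the description above):

```python
import numpy as np
import pandas as pd

# One boolean mask per fold: True where the feature was selected.
fold_masks = np.array(
    [
        [True, False, True, False],
        [True, True, False, False],
        [True, False, True, False],
    ]
)

stability = pd.DataFrame(
    {"Selection Frequency": fold_masks.mean(axis=0)},
    index=pd.Index(["f0", "f1", "f2", "f3"], name="Feature"),
)
```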
- get_generalization_matrix(metric: str = None) pandas.DataFrame[source]¶
Get Generalization Matrix (Train Time x Test Time) averaged across folds.
- Parameters:
metric (str, optional) – The metric to retrieve (e.g., ‘accuracy’, ‘roc_auc’). Defaults to the first metric found in results.
- Returns:
Index: Train Time
Columns: Test Time
Values: Average Score
- Return type:
pd.DataFrame
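Averaging across folds amounts to a mean over the fold axis of a (n_folds, T_train, T_test) score array. The sketch below uses constant toy scores and assumes integer time indices; how the library labels the time axes is not stated here.

```python
import numpy as np
import pandas as pd

# Toy per-fold score matrices, shape (n_folds, T_train, T_test).
fold_scores = np.stack([np.full((3, 3), 0.6), np.full((3, 3), 0.8)])

matrix = pd.DataFrame(
    fold_scores.mean(axis=0),
    index=pd.Index(range(3), name="Train Time"),
    columns=pd.Index(range(3), name="Test Time"),
)

# The diagonal (train time == test time) gives the usual time-resolved curve.
diagonal = np.diag(matrix.to_numpy())
```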