coco_pipe.decoding¶
Submodules¶
Classes¶
ExperimentConfig | Master configuration for a Decoding Experiment. |
Experiment | Main executor for decoding experiments. |
Functions¶
get_estimator_cls | Retrieve an estimator class by name. |
register_estimator | Decorator to register an estimator class under a specific name. |
cross_validate_score | Compute one mean cross-validated score for an estimator. |
Package Contents¶
- class coco_pipe.decoding.ExperimentConfig(/, **data: Any)[source]¶
Bases: pydantic.BaseModel
Master configuration for a Decoding Experiment.
- task: Literal['classification', 'regression'] = 'classification'¶
- output_dir: pathlib.Path | None = None¶
- tag: str = 'experiment'¶
- models: Dict[str, EstimatorConfigType]¶
- grids: Dict[str, Dict[str, List[Any]]] | None = None¶
- tuning: TuningConfig = None¶
- feature_selection: FeatureSelectionConfig = None¶
- metrics: List[str] = None¶
- temporal: TemporalConfig = None¶
- use_scaler: bool = None¶
- n_jobs: int = -1¶
- verbose: bool = True¶
- class coco_pipe.decoding.Experiment(config: coco_pipe.decoding.configs.ExperimentConfig)[source]¶
Main executor for decoding experiments.
- Parameters:
config (ExperimentConfig) – The complete configuration for the experiment.
- config¶
- results: Dict[str, Any]¶
- _validate_config()[source]¶
Perform comprehensive runtime validation of the configuration.
Logic¶
Tuning Consistency: Warns if tuning.enabled but no grids are provided.
Task vs Metrics: Checks if metrics match the task (e.g. no ‘accuracy’ for regression). Raises ValueError if incompatible.
Task vs CV: Checks if CV strategy matches task (e.g. no ‘stratified’ for regression). Raises ValueError if incompatible.
Task vs Model: Heuristic check for model type (e.g. no Regressor for Classification). Raises ValueError if incompatible.
- Raises:
ValueError – If configuration contains incompatible settings.
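The task-versus-metric check described above can be illustrated with a minimal sketch. The metric groups and the helper name `check_metrics` are hypothetical and are not taken from coco_pipe; the actual lists and logic may differ.

```python
# Hypothetical metric groups; the real lists inside coco_pipe may differ.
CLASSIFICATION_ONLY = {"accuracy", "balanced_accuracy", "f1", "roc_auc"}
REGRESSION_ONLY = {"r2", "neg_mean_squared_error", "neg_mean_absolute_error"}


def check_metrics(task: str, metrics: list[str]) -> None:
    """Raise ValueError when a requested metric does not fit the task."""
    forbidden = REGRESSION_ONLY if task == "classification" else CLASSIFICATION_ONLY
    bad = sorted(set(metrics) & forbidden)
    if bad:
        raise ValueError(f"Metrics {bad} are incompatible with task '{task}'.")


check_metrics("classification", ["accuracy", "f1"])  # compatible: no exception

try:
    check_metrics("regression", ["accuracy"])  # incompatible: raises
except ValueError as err:
    message = str(err)
```

Failing early in this way surfaces a misconfigured experiment before any model is fitted.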
- _prepare_estimator(model_name: str, model_config: Any) sklearn.base.BaseEstimator[source]¶
Orchestrate the creation of the full Estimator Pipeline.
Steps¶
Instantiation: Calls _instantiate_model to get the base estimator (handling recursion).
Scaling: If use_scaler=True, prepends a StandardScaler.
Feature Selection: If enabled, prepends the FS step (Filter or Wrapper).
Pipeline Construction: Wraps the steps in sklearn.pipeline.Pipeline.
- Enables caching if FS + Tuning are both active.
Tuning Wrapper: If tuning is enabled for this model, wraps the Pipeline in GridSearchCV/RandomizedSearchCV via _wrap_with_tuning.
- Parameters:
model_name (str) – Friendly name from config (used for grid lookup).
model_config (Any) – Pydantic configuration object for the model.
- Returns:
Final ready-to-run estimator (Pipeline or SearchCV).
- Return type:
BaseEstimator
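The step assembly described above can be sketched with plain scikit-learn. The helper `build_pipeline` and the step names "scaler" and "clf" are assumptions made for illustration; only the "fs" step name appears in this reference.

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(model, use_scaler=True, fs_step=None):
    """Assemble scaler -> feature selection -> model, skipping disabled steps."""
    steps = []
    if use_scaler:
        steps.append(("scaler", StandardScaler()))  # assumed step name
    if fs_step is not None:
        steps.append(fs_step)  # expected to be a ("fs", transformer) tuple
    steps.append(("clf", model))  # assumed step name
    return Pipeline(steps)


pipeline = build_pipeline(
    LogisticRegression(max_iter=1000),
    use_scaler=True,
    fs_step=("fs", SelectKBest(f_classif, k=5)),
)
step_names = [name for name, _ in pipeline.steps]
```

Keeping scaling and feature selection inside the Pipeline means both are refitted on each training split, which avoids leaking test data into preprocessing.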
- _instantiate_model(name: str, config: Any) sklearn.base.BaseEstimator[source]¶
Instantiate a raw estimator from its configuration object.
Logic¶
Registry Lookup: Resolves class from config.method.
Recursion: If config implies a meta-estimator (has base_estimator), recursively calls _prepare_estimator for the child.
Parameter Injection: Passes the config fields as kwargs to __init__.
- Automatically filters out invalid parameters if a TypeError occurs (robustness for mismatched config/class versions).
- Returns:
The instantiated model (e.g., LogisticRegression instance) without pipeline wrappers.
- Return type:
BaseEstimator
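One way to implement the "filter on TypeError" fallback is to retry with only the keywords the constructor accepts. This is a sketch under that assumption; `instantiate_with_fallback` and `ToyEstimator` are hypothetical names, and coco_pipe may filter parameters differently.

```python
import inspect
from typing import Any


def instantiate_with_fallback(cls: type, params: dict[str, Any]) -> Any:
    """Try cls(**params); on TypeError, retry with only the accepted keywords."""
    try:
        return cls(**params)
    except TypeError:
        accepted = inspect.signature(cls.__init__).parameters
        filtered = {k: v for k, v in params.items() if k in accepted}
        return cls(**filtered)


class ToyEstimator:
    """Stand-in estimator that only understands `alpha`."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha


# "method" is a config-only field that the class does not accept.
model = instantiate_with_fallback(ToyEstimator, {"alpha": 0.5, "method": "toy"})
```

The first attempt is kept so that a fully valid config never pays for signature inspection.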
- _create_fs_step(estimator: sklearn.base.BaseEstimator) tuple | None[source]¶
Create a Feature Selection step for the pipeline.
Logic¶
Filter (k_best): Fast. Features are selected by a statistical test before the classifier is trained. No inner CV loop is required.
Wrapper (sfs): Slow but accurate. Wraps the estimator in a SequentialFeatureSelector. This runs an Inner CV Loop (size = config.feature_selection.cv) to validate feature subsets.
If used inside Hyperparameter Tuning, this step is part of the Pipeline, ensuring features are re-selected for every fold and every parameter combination (Nested Simplification).
- Returns:
(“fs”, Transformer) step for sklearn Pipeline.
- Return type:
tuple or None
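The two selection modes map onto scikit-learn's SelectKBest and SequentialFeatureSelector. The sketch below assumes that mapping; the helper `make_fs_step`, its arguments, and the choice of `f_classif` as the statistical test are illustrative, not the library's actual signature.

```python
from sklearn.feature_selection import (
    SelectKBest,
    SequentialFeatureSelector,
    f_classif,
)
from sklearn.linear_model import LogisticRegression


def make_fs_step(method, estimator, n_features=5, cv=3):
    """Return an ("fs", transformer) tuple, or None when selection is off."""
    if method == "k_best":
        # Filter: univariate test, no inner cross-validation.
        return ("fs", SelectKBest(score_func=f_classif, k=n_features))
    if method == "sfs":
        # Wrapper: refits `estimator` on candidate subsets using `cv` folds.
        selector = SequentialFeatureSelector(
            estimator, n_features_to_select=n_features, cv=cv
        )
        return ("fs", selector)
    return None


filter_step = make_fs_step("k_best", LogisticRegression())
wrapper_step = make_fs_step("sfs", LogisticRegression())
```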
- _wrap_with_tuning(estimator: sklearn.base.BaseEstimator, name: str) sklearn.base.BaseEstimator[source]¶
Wrap the estimator (or pipeline) in a Hyperparameter Search object.
This implements Nested Cross-Validation (Middle Loop):
1. Input: A Pipeline (Scaler + FS + Classifier).
2. Search: Creates a GridSearchCV / RandomizedSearchCV.
3. Process:
- For each fold of the tuning CV (defined by config.cv):
  - Train the Pipeline (including FS!) on the tuning train set.
  - Evaluate on the tuning validation set.
- Finds the best (Hyperparameters + Features) combination.
- Refits on the entire training set provided by the Outer Loop.
This ensures simultaneous optimization of Preprocessing (FS) and Modeling parameters.
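The nested scheme can be reproduced with scikit-learn alone: a GridSearchCV over the whole Pipeline acts as the tuning loop, and an outer cross-validation evaluates it. The data, grid values, fold counts, and step names below are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=120, n_features=20, n_informative=5, random_state=0
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("fs", SelectKBest(f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Feature-selection and model parameters are searched together.
param_grid = {"fs__k": [5, 10], "clf__C": [0.1, 1.0]}
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=inner_cv, refit=True)

# Outer loop: each fold fits the search on its own training split only.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer_scores = cross_val_score(search, X, y, cv=outer_cv)
```

Because `refit=True`, each outer fold ends with the best pipeline refitted on that fold's full training split before it is scored on the held-out data.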
- run(X: pandas.DataFrame | numpy.ndarray, y: pandas.Series | numpy.ndarray, groups: pandas.Series | numpy.ndarray | None = None) ExperimentResult[source]¶
Execute the full experiment pipeline.
This is the main entry point. It orchestrates:
1. Data Validation: Checks input shapes and types.
2. Model Loop: Iterates through all configured models.
3. Preparation: Instantiates models -> Builds Pipelines (Scaler/FS) -> Wraps in Tuning.
4. Validation: Runs the Outer Cross-Validation loop (optionally parallelized).
5. Aggregation: Collects scores, predictions, and importances.
- Parameters:
X (array-like of shape (n_samples, n_features) or (n_samples, n_features, n_times)) – Training data (2D) or Time-Series data (3D).
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target labels or values.
groups (array-like of shape (n_samples,), optional) – Group labels for splitting (e.g., subject-specific splits).
- Returns:
Object containing results with methods to export to Tidy DataFrames.
- Return type:
ExperimentResult
- save_results(path: str | pathlib.Path | None = None)[source]¶
Serialize results, configuration, and metadata to disk.
- Parameters:
path (str or Path, optional) – Path to save the results. If None, uses config.output_dir. If both are None, raises ValueError.
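The fallback order for the save location can be sketched as follows. `resolve_save_path` is a hypothetical helper written for illustration; it mirrors the documented behaviour but is not part of coco_pipe.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_save_path(path: Optional[PathLike], output_dir: Optional[PathLike]) -> Path:
    """Prefer the explicit path, then the configured output_dir, else fail."""
    target = path if path is not None else output_dir
    if target is None:
        raise ValueError("No path given and config.output_dir is not set.")
    return Path(target)


explicit = resolve_save_path("runs/exp1", "default_out")  # explicit path wins
fallback = resolve_save_path(None, "default_out")  # falls back to output_dir
```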
- static load_results(path: str | pathlib.Path) ExperimentResult[source]¶
Load a saved experiment payload and wrap it in ExperimentResult.
- Returns:
The loaded results wrapper.
- Return type:
ExperimentResult
- _cross_validate(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: numpy.ndarray, groups: numpy.ndarray | None) Dict[str, Any][source]¶
Execute the Outer Cross-Validation Loop (Evaluation).
This is the Level 1 (Top Level) split:
- Splits the entire dataset into K folds (defined by config.cv).
- For each fold:
  - Training Data: 80% (if 5-fold). Passed to estimator.fit(). If estimator is a GridSearch (Tuning Enabled), it will internally split this 80% again for validation (Level 2 Split).
  - Test Data: 20%. Used strictly for final estimator.predict() evaluation.
Parallelization¶
If config.n_jobs > 1, these folds run in parallel processes to speed up execution.
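The 80%/20% figures follow directly from a 5-fold split. The sketch below uses a hypothetical, unshuffled `kfold_indices` helper to make the arithmetic concrete; the actual splitter is whatever config.cv selects.

```python
def kfold_indices(n_samples: int, n_splits: int):
    """Yield (train_idx, test_idx) lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


splits = list(kfold_indices(n_samples=100, n_splits=5))
fold_sizes = [(len(train), len(test)) for train, test in splits]
# With 100 samples and 5 folds, every fold trains on 80 and tests on 20.
```

Each sample lands in exactly one test fold, so the per-fold test predictions together cover the whole dataset once.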
- _fit_and_score_fold(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: numpy.ndarray, train_idx: numpy.ndarray, test_idx: numpy.ndarray) Dict[str, Any][source]¶
Execute a single Cross-Validation fold: Fit, Predict, and Score.
Optimized for:
- Standard Estimators: (N, F) input -> (N,) output.
- Sliding Estimators: (N, F, T) input -> (N, T) output (Diagonal Decoding).
- Returns:
Contains ‘test_idx’, ‘preds’ (y_pred, y_true, y_proba), ‘scores’ (dict of metric values), and ‘importance’.
- Return type:
dict
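The sliding (diagonal) case can be pictured as one independent model per time point. The loop below is a sketch of that idea on random data; it is an assumption about what "Sliding Estimators" means here, not coco_pipe's implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6, 4))  # (N, F, T): samples, features, time points
y = np.tile([0, 1], 20)  # alternating binary labels
train_idx, test_idx = np.arange(30), np.arange(30, 40)

base = LogisticRegression()
n_times = X.shape[2]
preds = np.empty((len(test_idx), n_times), dtype=int)

for t in range(n_times):
    # Fit a fresh copy on time slice t, then predict the same slice.
    model = clone(base).fit(X[train_idx, :, t], y[train_idx])
    preds[:, t] = model.predict(X[test_idx, :, t])
```

The resulting `preds` has shape (N_test, T), matching the (N, T) output described above.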
- static _extract_metadata(estimator: sklearn.base.BaseEstimator) Dict[str, Any][source]¶
Extract training metadata like best Hyperparameters and Selected Features.
- static _compute_metric_safe(scorer, y_true, y_est, is_multiclass, is_proba=False)[source]¶
Compute metric handling standard and temporal (diagonal) shapes.
Shapes Handled¶
Standard: y_est is (N,) or (N, C)
Generalizing (Matrix):
- y_pred: (N, T_train, T_test) -> Score each (T_train, T_test) pair.
- y_proba: (N, C, T_train, T_test) -> Score each (T_train, T_test) pair.
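Scoring each (T_train, T_test) pair amounts to applying the metric once per matrix cell. The sketch below does this for hard predictions with `accuracy_score` on random data; the array names and sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_samples, t_train, t_test = 50, 3, 4
y_true = rng.integers(0, 2, size=n_samples)
y_pred = rng.integers(0, 2, size=(n_samples, t_train, t_test))

scores = np.empty((t_train, t_test))
for i in range(t_train):
    for j in range(t_test):
        # One score per (train time, test time) cell.
        scores[i, j] = accuracy_score(y_true, y_pred[:, i, j])
```

The result is a (T_train, T_test) score matrix whose diagonal corresponds to training and testing at the same time point.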
- coco_pipe.decoding.get_estimator_cls(name: str) Type[source]¶
Retrieve an estimator class by name.
- Parameters:
name (str) – Name of the estimator.
- Returns:
The class object.
- Return type:
Type
- Raises:
ValueError – If name is not found.
- coco_pipe.decoding.register_estimator(name: str) Callable[[Type], Type][source]¶
Decorator to register an estimator class under a specific name.
- Parameters:
name (str) – The unique alias for the estimator (e.g., “RandomForestClassifier”).
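The registry pattern behind these two functions can be sketched in a few lines. The names `register`, `lookup`, and `_REGISTRY` are stand-ins chosen for illustration; this is not the coco_pipe source.

```python
from typing import Callable, Dict, Type

_REGISTRY: Dict[str, Type] = {}  # hypothetical module-level store


def register(name: str) -> Callable[[Type], Type]:
    """Return a decorator that stores the class under `name`."""

    def decorator(cls: Type) -> Type:
        _REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged

    return decorator


def lookup(name: str) -> Type:
    """Return the registered class, or raise ValueError if unknown."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown estimator '{name}'.") from None


@register("ToyClassifier")
class ToyClassifier:
    pass


found = lookup("ToyClassifier")
```

Registering by string alias is what lets a configuration refer to an estimator by name without importing its class.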
- coco_pipe.decoding.cross_validate_score(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: Sequence, *, groups: Sequence | None = None, cv_config: coco_pipe.decoding.configs.CVConfig | None = None, metric: str = 'balanced_accuracy', use_scaler: bool = False) float[source]¶
Compute one mean cross-validated score for an estimator.
- Parameters:
estimator (BaseEstimator) – Estimator to fit inside each fold.
X (np.ndarray) – Input features with shape (n_samples, n_features).
y (sequence) – Target labels aligned with X.
groups (sequence, optional) – Group labels aligned with X.
cv_config (CVConfig, optional) – Cross-validation configuration. Defaults to a 5-fold stratified strategy, or a 5-fold stratified-group strategy when groups are provided.
metric (str, default="balanced_accuracy") – Metric name resolved through get_scorer().
use_scaler (bool, default=False) – When True, wraps the estimator in a StandardScaler pipeline.
- Returns:
Mean cross-validated score.
- Return type:
float
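For orientation, the documented defaults correspond roughly to the following plain scikit-learn computation. This is an approximation under assumptions (synthetic data, unshuffled StratifiedKFold, scaler enabled); it is not a substitute for calling cross_validate_score itself, whose internals may differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, n_features=8, random_state=0)

# Equivalent of use_scaler=True: the scaler is refitted inside each fold.
estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv = StratifiedKFold(n_splits=5)  # 5-fold stratified, no groups
fold_scores = cross_val_score(estimator, X, y, cv=cv, scoring="balanced_accuracy")
mean_score = float(fold_scores.mean())
```

When group labels are available, `StratifiedGroupKFold` with a `groups` argument would be the matching scikit-learn splitter.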