coco_pipe.decoding
==================

.. py:module:: coco_pipe.decoding


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/coco_pipe/decoding/configs/index
   /autoapi/coco_pipe/decoding/core/index
   /autoapi/coco_pipe/decoding/registry/index
   /autoapi/coco_pipe/decoding/utils/index


Classes
-------

.. autoapisummary::

   coco_pipe.decoding.ExperimentConfig
   coco_pipe.decoding.Experiment


Functions
---------

.. autoapisummary::

   coco_pipe.decoding.get_estimator_cls
   coco_pipe.decoding.register_estimator
   coco_pipe.decoding.cross_validate_score


Package Contents
----------------

.. py:class:: ExperimentConfig(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Master configuration for a decoding experiment.


   .. py:attribute:: task
      :type: Literal['classification', 'regression']
      :value: 'classification'


   .. py:attribute:: output_dir
      :type: Optional[pathlib.Path]
      :value: None


   .. py:attribute:: tag
      :type: str
      :value: 'experiment'


   .. py:attribute:: models
      :type: Dict[str, EstimatorConfigType]


   .. py:attribute:: grids
      :type: Optional[Dict[str, Dict[str, List[Any]]]]
      :value: None


   .. py:attribute:: cv
      :type: CVConfig
      :value: None


   .. py:attribute:: tuning
      :type: TuningConfig
      :value: None


   .. py:attribute:: feature_selection
      :type: FeatureSelectionConfig
      :value: None


   .. py:attribute:: metrics
      :type: List[str]
      :value: None


   .. py:attribute:: temporal
      :type: TemporalConfig
      :value: None


   .. py:attribute:: use_scaler
      :type: bool
      :value: None


   .. py:attribute:: n_jobs
      :type: int
      :value: -1


   .. py:attribute:: verbose
      :type: bool
      :value: True


.. py:class:: Experiment(config: coco_pipe.decoding.configs.ExperimentConfig)

   Main executor for decoding experiments.

   :param config: The complete configuration for the experiment.
   :type config: ExperimentConfig


   .. py:attribute:: config


   .. py:attribute:: results
      :type: Dict[str, Any]


   .. py:method:: _validate_config()

      Perform comprehensive runtime validation of the configuration.

      Logic
      -----

      1. **Tuning Consistency**: Warns if ``tuning.enabled`` is set but no ``grids`` are provided.
      2. **Task vs Metrics**: Checks that the metrics match the task (e.g., no ``'accuracy'`` for regression). Raises ``ValueError`` if they are incompatible.
      3. **Task vs CV**: Checks that the CV strategy matches the task (e.g., no ``'stratified'`` for regression). Raises ``ValueError`` if they are incompatible.
      4. **Task vs Model**: Applies a heuristic check on the model type (e.g., no regressor for a classification task). Raises ``ValueError`` if they are incompatible.

      :raises ValueError: If the configuration contains incompatible settings.


   .. py:method:: _prepare_estimator(model_name: str, model_config: Any) -> sklearn.base.BaseEstimator

      Orchestrate the creation of the full estimator pipeline.

      Steps
      -----

      1. **Instantiation**: Calls ``_instantiate_model`` to get the base estimator (handling recursion for meta-estimators).
      2. **Scaling**: If ``use_scaler=True``, adds a ``StandardScaler`` as the first step.
      3. **Feature Selection**: If enabled, inserts the FS step (filter or wrapper) before the estimator.
      4. **Pipeline Construction**: Wraps the steps in ``sklearn.pipeline.Pipeline``.

         - Enables caching if FS and tuning are both active.

      5. **Tuning Wrapper**: If tuning is enabled for this model, wraps the pipeline in ``GridSearchCV`` or ``RandomizedSearchCV`` via ``_wrap_with_tuning``.

      :param model_name: Friendly name from the config (used for grid lookup).
      :type model_name: str
      :param model_config: Pydantic configuration object for the model.
      :type model_config: Any
      :returns: Final ready-to-run estimator (``Pipeline`` or a search object).
      :rtype: BaseEstimator


   .. py:method:: _instantiate_model(name: str, config: Any) -> sklearn.base.BaseEstimator

      Instantiate a raw estimator from its configuration object.

      Logic
      -----

      1. **Registry Lookup**: Resolves the class from ``config.method``.
      2. **Recursion**: If the config describes a meta-estimator (it has a ``base_estimator``), recursively calls ``_prepare_estimator`` for the child.
      3. **Parameter Injection**: Passes the config fields as keyword arguments to ``__init__``.

         - If a ``TypeError`` occurs, invalid parameters are filtered out automatically (robustness against mismatched config/class versions).

      :returns: The instantiated model (e.g., a ``LogisticRegression`` instance) without pipeline wrappers.
      :rtype: BaseEstimator


   .. py:method:: _create_fs_step(estimator: sklearn.base.BaseEstimator) -> Optional[tuple]

      Create a feature selection step for the pipeline.

      Logic
      -----

      - **Filter (k_best)**: Fast. Features are selected with a statistical test before the classifier is trained, so no inner CV loop is required.
      - **Wrapper (sfs)**: Slower, because it wraps the estimator in a ``SequentialFeatureSelector`` and scores candidate subsets with the estimator itself. This runs an **inner CV loop** (size = ``config.feature_selection.cv``) to validate each feature subset.

      If used inside hyperparameter tuning, this step is part of the pipeline, so features are re-selected for every fold and every parameter combination (Nested Simplification).

      :returns: An ``("fs", transformer)`` step for the sklearn ``Pipeline``, or ``None``.
      :rtype: tuple or None


   .. py:method:: _wrap_with_tuning(estimator: sklearn.base.BaseEstimator, name: str) -> sklearn.base.BaseEstimator

      Wrap the estimator (or pipeline) in a hyperparameter search object.

      This implements the middle loop of **nested cross-validation**:

      1. **Input**: A pipeline (Scaler + FS + Classifier).
      2. **Search**: Creates a ``GridSearchCV`` or ``RandomizedSearchCV``.
      3. **Process**:

         - For each fold of the *tuning* CV (defined by ``config.cv``):

           - Train the pipeline (including FS) on the tuning training split.
           - Evaluate it on the tuning validation split.

         - Find the best (hyperparameters + features) combination.
         - Refit on the entire training set provided by the outer loop.

      This ensures simultaneous optimization of the preprocessing (FS) and modeling parameters.


   .. py:method:: run(X: Union[pandas.DataFrame, numpy.ndarray], y: Union[pandas.Series, numpy.ndarray], groups: Optional[Union[pandas.Series, numpy.ndarray]] = None) -> ExperimentResult

      Execute the full experiment pipeline.

      This is the main entry point. It orchestrates:

      1. **Data Validation**: Checks input shapes and types.
      2. **Model Loop**: Iterates through all configured models.
      3. **Preparation**: Instantiates models -> builds pipelines (Scaler/FS) -> wraps them for tuning.
      4. **Validation**: Runs the outer cross-validation loop (optionally parallelized).
      5. **Aggregation**: Collects scores, predictions, and importances.

      :param X: Training data (2D) or time-series data (3D).
      :type X: array-like of shape (n_samples, n_features) or (n_samples, n_features, n_times)
      :param y: Target labels or values.
      :type y: array-like of shape (n_samples,) or (n_samples, n_targets)
      :param groups: Group labels for splitting (e.g., subject-specific splits).
      :type groups: array-like of shape (n_samples,), optional
      :returns: Object containing the results, with methods to export them to tidy DataFrames.
      :rtype: ExperimentResult


   .. py:method:: save_results(path: Optional[Union[str, pathlib.Path]] = None)

      Serialize results, configuration, and metadata to disk.

      :param path: Path to save the results. If ``None``, uses ``config.output_dir``. If both are ``None``, raises ``ValueError``.
      :type path: str or Path, optional


   .. py:method:: load_results(path: Union[str, pathlib.Path]) -> ExperimentResult
      :staticmethod:

      Load a saved experiment payload and wrap it in ``ExperimentResult``.

      :returns: The loaded results wrapper.
      :rtype: ExperimentResult


   .. py:method:: _cross_validate(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: numpy.ndarray, groups: Optional[numpy.ndarray]) -> Dict[str, Any]

      Execute the outer cross-validation loop (evaluation).

      This is the **Level 1 (top-level)** split:

      - Splits the entire dataset into K folds (defined by ``config.cv``).
      - For each fold:

        1. **Training data**: About 80% of the samples (for 5-fold CV), passed to ``estimator.fit()``.

           - If ``estimator`` is a search object (tuning enabled), it internally splits this 80% again for validation (Level 2 split).

        2. **Test data**: The remaining 20%, used only for the final ``estimator.predict()`` evaluation.
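
      The two-level split described above can be sketched with plain scikit-learn. This is a minimal illustration under stated assumptions, not the code inside ``Experiment``: the synthetic data, the ``LogisticRegression`` model, the ``C`` grid, and the use of ``StratifiedKFold`` for both loops are all choices made only for this example.

      .. code-block:: python

         import numpy as np
         from sklearn.datasets import make_classification
         from sklearn.linear_model import LogisticRegression
         from sklearn.model_selection import GridSearchCV, StratifiedKFold
         from sklearn.pipeline import Pipeline
         from sklearn.preprocessing import StandardScaler

         # Synthetic data standing in for real decoding features (assumption).
         X, y = make_classification(n_samples=200, n_features=10, random_state=0)

         # Level 2 (inner loop): the search object tunes C on each outer training split.
         search = GridSearchCV(
             Pipeline([("scaler", StandardScaler()),
                       ("clf", LogisticRegression(max_iter=1000))]),
             param_grid={"clf__C": [0.1, 1.0, 10.0]},  # hypothetical grid
             cv=StratifiedKFold(n_splits=3),
             scoring="balanced_accuracy",
         )

         # Level 1 (outer loop): 5 folds, so each training split holds ~80% of the samples.
         outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
         fold_scores = []
         for train_idx, test_idx in outer_cv.split(X, y):
             # fit() runs the inner split, then refits the best pipeline on the whole training split.
             search.fit(X[train_idx], y[train_idx])
             # The test split is touched only here, for the final evaluation.
             fold_scores.append(search.score(X[test_idx], y[test_idx]))

         print(f"mean balanced accuracy: {np.mean(fold_scores):.3f}")

      Passing ``search`` to ``sklearn.model_selection.cross_validate`` with ``cv=outer_cv`` yields the same structure in one call; the explicit loop only makes the fit/predict separation visible.
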

      Parallelization
      ---------------

      If ``config.n_jobs > 1``, these folds run in parallel processes to speed up execution.


   .. py:method:: _fit_and_score_fold(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: numpy.ndarray, train_idx: numpy.ndarray, test_idx: numpy.ndarray) -> Dict[str, Any]

      Execute a single cross-validation fold: fit, predict, and score.

      Supports:

      - **Standard estimators**: (N, F) input -> (N,) output.
      - **Sliding estimators**: (N, F, T) input -> (N, T) output (diagonal decoding).

      :returns: Contains ``'test_idx'``, ``'preds'`` (``y_pred``, ``y_true``, ``y_proba``), ``'scores'`` (dict of metric values), and ``'importance'``.
      :rtype: dict


   .. py:method:: _extract_metadata(estimator: sklearn.base.BaseEstimator) -> Dict[str, Any]
      :staticmethod:

      Extract training metadata such as the best hyperparameters and the selected features.


   .. py:method:: _compute_metric_safe(scorer, y_true, y_est, is_multiclass, is_proba=False)
      :staticmethod:

      Compute a metric, handling both standard and temporal (diagonal) shapes.

      Shapes Handled
      --------------

      - **Standard**: ``y_est`` is (N,) or (N, C).
      - **Generalizing (matrix)**:

        - ``y_pred``: (N, T_train, T_test). Each (T_train, T_test) pair is scored.
        - ``y_proba``: (N, C, T_train, T_test). Each (T_train, T_test) pair is scored.


   .. py:method:: _force_serial_execution(estimator: sklearn.base.BaseEstimator) -> sklearn.base.BaseEstimator

      Recursively set ``n_jobs=1`` for the estimator and its sub-components.

      Used when the outer loop is already parallelized, to avoid oversubscription.


   .. py:method:: _extract_feature_importances(estimator: sklearn.base.BaseEstimator) -> Optional[numpy.ndarray]
      :staticmethod:

      Extract feature importances or coefficients from a fitted estimator.

      Handles pipelines and feature selection.


.. py:function:: get_estimator_cls(name: str) -> Type

   Retrieve an estimator class by name.

   :param name: Name of the estimator.
   :type name: str
   :returns: The class object.
   :rtype: Type
   :raises ValueError: If ``name`` is not found.


.. py:function:: register_estimator(name: str) -> Callable[[Type], Type]

   Decorator to register an estimator class under a specific name.

   :param name: The unique alias for the estimator (e.g., ``"RandomForestClassifier"``).
   :type name: str


.. py:function:: cross_validate_score(estimator: sklearn.base.BaseEstimator, X: numpy.ndarray, y: Sequence, *, groups: Optional[Sequence] = None, cv_config: Optional[coco_pipe.decoding.configs.CVConfig] = None, metric: str = 'balanced_accuracy', use_scaler: bool = False) -> float

   Compute one mean cross-validated score for an estimator.

   :param estimator: Estimator to fit inside each fold.
   :type estimator: BaseEstimator
   :param X: Input features with shape ``(n_samples, n_features)``.
   :type X: np.ndarray
   :param y: Target labels aligned with ``X``.
   :type y: sequence
   :param groups: Group labels aligned with ``X``.
   :type groups: sequence, optional
   :param cv_config: Cross-validation configuration. Defaults to a 5-fold stratified strategy, or a 5-fold stratified-group strategy when groups are provided.
   :type cv_config: CVConfig, optional
   :param metric: Metric name resolved through :func:`get_scorer`.
   :type metric: str, default="balanced_accuracy"
   :param use_scaler: When ``True``, wraps the estimator in a ``StandardScaler`` pipeline.
   :type use_scaler: bool, default=False
   :returns: Mean cross-validated score.
   :rtype: float
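
   As a rough mental model of the documented defaults, the sketch below approximates ``cross_validate_score`` with plain scikit-learn. It is not the function's implementation: ``approx_cross_validate_score`` is a hypothetical name, it takes ``groups`` directly instead of a ``CVConfig``, and it assumes ``metric`` is a valid scikit-learn scoring string.

   .. code-block:: python

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import StratifiedGroupKFold, StratifiedKFold, cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler


      def approx_cross_validate_score(estimator, X, y, *, groups=None,
                                      metric="balanced_accuracy", use_scaler=False):
          """Hypothetical stand-in mirroring the documented defaults (not coco_pipe code)."""
          # Default CV: 5-fold stratified, or 5-fold stratified-group when groups are given.
          if groups is not None:
              cv = StratifiedGroupKFold(n_splits=5)
          else:
              cv = StratifiedKFold(n_splits=5)
          if use_scaler:
              # The scaler lives inside the pipeline, so it is fit on each training fold only.
              estimator = make_pipeline(StandardScaler(), estimator)
          scores = cross_val_score(estimator, X, y, groups=groups, cv=cv, scoring=metric)
          return float(np.mean(scores))


      X, y = make_classification(n_samples=120, n_features=8, random_state=0)
      subjects = np.repeat(np.arange(12), 10)  # 12 hypothetical subjects, 10 samples each

      score = approx_cross_validate_score(
          LogisticRegression(max_iter=1000), X, y, groups=subjects, use_scaler=True
      )
      print(f"mean balanced accuracy: {score:.3f}")

   Keeping the scaler inside the pipeline, rather than scaling ``X`` once up front, prevents statistics from the test folds from leaking into training.
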