coco_pipe.io
============

.. py:module:: coco_pipe.io


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/coco_pipe/io/config/index
   /autoapi/coco_pipe/io/dataset/index
   /autoapi/coco_pipe/io/load/index
   /autoapi/coco_pipe/io/structures/index
   /autoapi/coco_pipe/io/transform/index
   /autoapi/coco_pipe/io/utils/index


Classes
-------

.. autoapisummary::

   coco_pipe.io.BaseDatasetConfig
   coco_pipe.io.BIDSConfig
   coco_pipe.io.DatasetConfig
   coco_pipe.io.EmbeddingConfig
   coco_pipe.io.TabularConfig
   coco_pipe.io.DataContainer
   coco_pipe.io.SklearnWrapper
   coco_pipe.io.SpatialWhitener


Functions
---------

.. autoapisummary::

   coco_pipe.io.load_data


Package Contents
----------------

.. py:class:: BaseDatasetConfig(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`



   A base class for creating Pydantic models.

   .. attribute:: __class_vars__

      The names of the class variables defined on the model.

   .. attribute:: __private_attributes__

      Metadata about the private attributes of the model.

   .. attribute:: __signature__

      The synthesized `__init__` :py:class:`inspect.Signature` of the model.

   .. attribute:: __pydantic_complete__

      Whether model building is completed, or if there are still undefined fields.

   .. attribute:: __pydantic_core_schema__

      The core schema of the model.

   .. attribute:: __pydantic_custom_init__

      Whether the model has a custom `__init__` function.

   .. attribute:: __pydantic_decorators__

      Metadata containing the decorators defined on the model.
      This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1.

   .. attribute:: __pydantic_generic_metadata__

      A dictionary containing metadata about generic Pydantic models.
      The `origin` and `args` items map to the `__origin__` and `__args__`
      attributes of generic aliases (:py:class:`types.GenericAlias`),
      and the `parameters` item maps to the `__parameters__` attribute of generic classes.

   .. attribute:: __pydantic_parent_namespace__

      Parent namespace of the model, used for automatic rebuilding of models.

   .. attribute:: __pydantic_post_init__

      The name of the post-init method for the model, if defined.

   .. attribute:: __pydantic_root_model__

      Whether the model is a :py:class:`pydantic.root_model.RootModel`.

   .. attribute:: __pydantic_serializer__

      The `pydantic-core` `SchemaSerializer` used to dump instances of the model.

   .. attribute:: __pydantic_validator__

      The `pydantic-core` `SchemaValidator` used to validate instances of the model.

   .. attribute:: __pydantic_fields__

      A dictionary of field names and their corresponding :py:class:`pydantic.fields.FieldInfo` objects.

   .. attribute:: __pydantic_computed_fields__

      A dictionary of computed field names and their corresponding :py:class:`pydantic.fields.ComputedFieldInfo` objects.

   .. attribute:: __pydantic_extra__

      A dictionary containing extra values, if :py:attr:`pydantic.config.ConfigDict.extra`
      is set to `'allow'`.

   .. attribute:: __pydantic_fields_set__

      The names of fields explicitly set during instantiation.

   .. attribute:: __pydantic_private__

      Values of private attributes set on the model instance.


   .. py:attribute:: path
      :type:  pathlib.Path
      :value: None


   .. py:attribute:: subjects
      :type:  Optional[Union[int, List[Union[str, int]]]]
      :value: None


.. py:class:: BIDSConfig(/, **data: Any)

   Bases: :py:obj:`BaseDatasetConfig`


   Configuration for BIDSDataset.


   .. py:attribute:: mode
      :type:  Literal['bids']
      :value: 'bids'


   .. py:attribute:: task
      :type:  Optional[str]
      :value: None


   .. py:attribute:: session
      :type:  Optional[Union[str, List[str]]]
      :value: None


   .. py:attribute:: datatype
      :type:  str
      :value: 'eeg'


   .. py:attribute:: suffix
      :type:  Optional[str]
      :value: None


   .. py:attribute:: loading_mode
      :type:  str
      :value: None


   .. py:attribute:: window_length
      :type:  Optional[float]
      :value: None


   .. py:attribute:: stride
      :type:  Optional[float]
      :value: None


.. py:class:: DatasetConfig(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`


   Master configuration container for IO.


   .. py:attribute:: dataset
      :type:  Union[TabularConfig, BIDSConfig, EmbeddingConfig]
      :value: None


.. py:class:: EmbeddingConfig(/, **data: Any)

   Bases: :py:obj:`BaseDatasetConfig`


   Configuration for EmbeddingDataset.


   .. py:attribute:: mode
      :type:  Literal['embedding']
      :value: 'embedding'


   .. py:attribute:: pattern
      :type:  str
      :value: '*.pkl'


   .. py:attribute:: dims
      :type:  Tuple[str, Ellipsis]
      :value: ('obs', 'feature')


   .. py:attribute:: coords
      :type:  Optional[Dict[str, Union[List, Any]]]
      :value: None


   .. py:attribute:: task
      :type:  Optional[str]
      :value: None


   .. py:attribute:: run
      :type:  Optional[str]
      :value: None


   .. py:attribute:: processing
      :type:  Optional[str]
      :value: None


.. py:class:: TabularConfig(/, **data: Any)

   Bases: :py:obj:`BaseDatasetConfig`


   Configuration for TabularDataset.


   .. py:attribute:: mode
      :type:  Literal['tabular']
      :value: 'tabular'


   .. py:attribute:: target_col
      :type:  Optional[str]
      :value: None


   .. py:attribute:: index_col
      :type:  Optional[Union[str, int]]
      :value: None


   .. py:attribute:: sep
      :type:  str
      :value: None


   .. py:attribute:: header
      :type:  Optional[Union[int, List[int]]]
      :value: 0


   .. py:attribute:: sheet_name
      :type:  Union[str, int]
      :value: 0


   .. py:attribute:: columns_to_dims
      :type:  Optional[List[str]]
      :value: None


   .. py:attribute:: col_sep
      :type:  str
      :value: '_'


   .. py:attribute:: meta_columns
      :type:  Optional[List[str]]
      :value: None


   .. py:attribute:: clean
      :type:  bool
      :value: False


   .. py:attribute:: clean_kwargs
      :type:  Dict[str, Any]
      :value: None


   .. py:attribute:: select_kwargs
      :type:  Dict[str, Any]
      :value: None


.. py:function:: load_data(path: Union[str, pathlib.Path], mode: str = 'auto', target_col: Optional[str] = None, index_col: Optional[Union[str, int]] = None, sep: str = '\t', header: Optional[Union[int, List[int]]] = 0, sheet_name: Optional[Union[str, int]] = 0, columns_to_dims: Optional[List[str]] = None, col_sep: str = '_', meta_columns: Optional[List[str]] = None, clean: bool = False, clean_kwargs: Optional[Dict[str, Any]] = None, task: Optional[str] = None, session: Optional[Union[str, List[str]]] = None, datatype: str = 'eeg', suffix: Optional[str] = None, loading_mode: str = 'epochs', window_length: Optional[float] = None, stride: Optional[float] = None, subject_metadata_df: Optional[Any] = None, subject_key: Optional[str] = None, pattern: str = '*.pkl', dims: Tuple[str, Ellipsis] = ('obs', 'feature'), coords: Optional[Dict[str, Union[List, numpy.ndarray]]] = None, reader: Optional[Any] = None, id_fn: Optional[Any] = None, subjects: Optional[Union[str, List[str], int, List[int]]] = None, **kwargs) -> coco_pipe.io.structures.DataContainer

   Universal data loader factory.
   Dispatches to `BIDSDataset`, `TabularDataset`, or `EmbeddingDataset` based on
   `mode`.

   :param path: Path to data source (file or directory).
   :type path: str or Path
   :param mode: Type of data to load.
                - "auto": Infers type from file extension or directory structure.
                - "tabular": uses `TabularDataset` (CSV, TSV, Excel, TXT).
                - "bids": uses `BIDSDataset` (BIDS-compliant directories).
                - "embedding": uses `EmbeddingDataset` (NPY, PKL, H5, JSON).
   :type mode: {"auto", "tabular", "bids", "embedding"}, default="auto"

   **Tabular arguments** (``mode="tabular"``)

   :param target_col: Name of the column to extract as target `y`. Removed from features `X`.
   :type target_col: str, optional
   :param index_col: Column to use as index (observation IDs).
   :type index_col: str or int, optional
   :param sep: Separator for text files (e.g. ',' for CSV).
   :type sep: str, default='\t'
   :param header: Row number(s) to use as column names.
   :type header: int or list of int, default=0
   :param sheet_name: Sheet name or index for Excel files.
   :type sheet_name: str or int, default=0
   :param columns_to_dims: If provided, attempts to reshape 2D feature columns into N-D dimensions.
                           Columns must follow: `dim1_dim2_..._feature`.
   :type columns_to_dims: list of str, optional
   :param col_sep: Separator used in column names for reshaping.
   :type col_sep: str, default='_'
   :param meta_columns: Columns to extract as metadata coordinates instead of features.
   :type meta_columns: list of str, optional
   :param clean: Whether to perform automated cleaning (drop NaNs/Infs).
   :type clean: bool, default=False
   :param clean_kwargs: Arguments passed to `TabularDataset.clean`.
   :type clean_kwargs: dict, optional

   **BIDS arguments** (``mode="bids"``)

   :param task: BIDS task name (e.g., 'rest', 'audiovisual').
   :type task: str, optional
   :param session: Session ID(s) to load. Defaults to all available.
   :type session: str or List[str], optional
   :param datatype: Data type folder (e.g., 'eeg', 'meg', 'ieeg').
   :type datatype: str, default='eeg'
   :param suffix: File suffix to load (e.g., 'eeg', 'epo', 'ave').
   :type suffix: str, optional
   :param loading_mode: How to process the data. Passed as `mode` to `BIDSDataset`.
                        - 'epochs': Splices continuous data into fixed-length windows.
                        - 'continuous': Loads as single continuous segments.
                        - 'load_existing': Loads pre-computed epochs.
   :type loading_mode: str, default='epochs'
   :param window_length: Window length in seconds (for 'epochs' mode).
   :type window_length: float, optional
   :param stride: Stride in seconds (for 'epochs' mode).
   :type stride: float, optional
   :param subject_metadata_df: External subject-level metadata to merge by subject during BIDS loading.
   :type subject_metadata_df: DataFrame, optional
   :param subject_key: Column in `subject_metadata_df` containing the BIDS subject identifier.
   :type subject_key: str, optional
   :param subjects: Specific subject IDs to load (without 'sub-').
   :type subjects: str or List[str], optional

   **Embedding arguments** (``mode="embedding"``)

   :param pattern: Glob pattern to match files.
   :type pattern: str, default='*.pkl'
   :param dims: Dimension labels for the data arrays.
   :type dims: tuple of str, default=('obs', 'feature')
   :param coords: Dictionary of coordinates for dimensions.
   :type coords: dict, optional
   :param reader: Custom file reader function.
   :type reader: callable, optional
   :param id_fn: Custom subject ID extraction function.
   :type id_fn: callable, optional
   :param subjects: If int, loads first N subjects. If list, filters by ID.
   :type subjects: int or list, optional

   :returns: Standardized data container with attributes:
             - X: (N_obs, ...) data array
             - y: Targets (if available)
             - ids: Observation identifiers
             - coords: Coordinate metadata
   :rtype: DataContainer
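
   The ``mode="auto"`` inference described above can be pictured with a small
   stand-alone sketch. The helper ``infer_mode`` and its rules are hypothetical
   and are not the library's implementation: the extension sets follow the
   formats listed for each mode, and treating a directory that contains
   ``dataset_description.json`` as BIDS (and any other directory as an
   embedding folder) is an assumption.

   .. code-block:: python

      from pathlib import Path

      # Assumed extension sets, based on the formats named in the ``mode`` description.
      TABULAR_EXTS = {".csv", ".tsv", ".txt", ".xls", ".xlsx"}
      EMBEDDING_EXTS = {".npy", ".pkl", ".h5", ".json"}

      def infer_mode(path):
          """Hypothetical stand-in for the ``mode="auto"`` inference."""
          p = Path(path)
          if p.is_dir():
              # Assumption: a BIDS root is recognised by its dataset_description.json.
              if (p / "dataset_description.json").exists():
                  return "bids"
              return "embedding"
          ext = p.suffix.lower()
          if ext in TABULAR_EXTS:
              return "tabular"
          if ext in EMBEDDING_EXTS:
              return "embedding"
          raise ValueError(f"Cannot infer mode from {p.name!r}")

   Passing ``mode`` explicitly avoids relying on any such inference when a path
   is ambiguous.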


.. py:class:: DataContainer

   Generic container for N-dimensional neurophysiological data.

   Acts as a lightweight labelled array (like xarray but simpler), managing
   dimensions, coordinates, and associated target labels (y) and IDs.

   .. attribute:: X

      The primary data tensor. Shape must match `dims`.

      :type: np.ndarray

   .. attribute:: dims

      Labels for each dimension of X.
      Examples: ('obs', 'feature'), ('obs', 'channel', 'time').
      Note: The 'obs' dimension is special and typically represents independent
      samples.

      :type: Tuple[str, ...]

   .. attribute:: coords

      Coordinates/Labels for dimensions. Keys must be in `dims`.
      Values must match the length of the corresponding dimension in X.

      :type: Dict[str, Union[List, np.ndarray]]

   .. attribute:: y

      Target labels corresponding to the 'obs' dimension.
      Used for supervised learning or coloring plots.

      :type: Optional[np.ndarray], optional

   .. attribute:: ids

      Identifiers for observations (e.g., subject IDs, trial names).
      Should correspond to 'obs' dim in coords if provided.
      Kept separate from coords for convenient tracking.

      :type: Optional[np.ndarray], optional

   .. attribute:: meta

      Arbitrary metadata (sfreq, units, source path, etc).

      :type: Dict[str, Any]

   .. rubric:: Examples

   Accessing data:

   >>> container.X.shape
   (10, 64, 500)

   Accessing coordinates:

   >>> container.coords['channel'][:3]
   ['Fz', 'Cz', 'Pz']


   .. py:attribute:: X
      :type:  numpy.ndarray


   .. py:attribute:: dims
      :type:  Tuple[str, Ellipsis]


   .. py:attribute:: coords
      :type:  Dict[str, Union[List, numpy.ndarray, Sequence]]


   .. py:attribute:: y
      :type:  Optional[numpy.ndarray]
      :value: None


   .. py:attribute:: ids
      :type:  Optional[numpy.ndarray]
      :value: None


   .. py:attribute:: meta
      :type:  Dict[str, Any]


   .. py:method:: __post_init__()


   .. py:property:: shape
      :type: Tuple[int, Ellipsis]


   .. py:method:: save(path: Union[str, Any]) -> None

      Save the DataContainer to disk using joblib.

      :param path: Destination file path.
      :type path: str or Path


   .. py:method:: load(path: Union[str, Any]) -> DataContainer
      :classmethod:


      Load a DataContainer from disk.

      :param path: Source file path.
      :type path: str or Path

      :rtype: DataContainer


   .. py:method:: __repr__() -> str


   .. py:method:: obs_table(include_ids: bool = False, id_col: str = 'obs_id', include_y: bool = False, y_col: str = 'y', include_obs_coord: bool = False) -> pandas.DataFrame

      Return one-dimensional coordinates aligned to the observation axis.

      This helper is useful when exporting a row-wise table from a container.
      It only materializes metadata that can map cleanly to one row per
      observation, skipping coordinates that belong to other axes such as
      ``channel``, ``time``, ``feature``, or ``stat``.

      :param include_ids: If True, include ``self.ids`` as the first column.
      :type include_ids: bool, default=False
      :param id_col: Column name used when exporting ``self.ids``.
      :type id_col: str, default="obs_id"
      :param include_y: If True, include ``self.y`` as a column when present.
      :type include_y: bool, default=False
      :param y_col: Column name used when exporting ``self.y``.
      :type y_col: str, default="y"
      :param include_obs_coord: If True, include ``coords["obs"]`` when present.
      :type include_obs_coord: bool, default=False

      :returns: DataFrame containing only one-dimensional observation-aligned
                metadata columns.
      :rtype: pandas.DataFrame

      :raises ValueError: If the container has no ``obs`` dimension, or if ``include_ids`` is
          requested when ``self.ids`` is missing.


   .. py:method:: isel(**indexers) -> DataContainer

      Select data by integer indices on specified dimensions.

      This method is the integer-index equivalent of `select`. It operates
      directly on the dimensions of the data tensor `X`. It is robust and
      handles metadata splitting/alignment automatically.

      :param \*\*indexers: Key: Dimension name (e.g., 'obs', 'channel', 'time').
                           Value: Integer indices to select. Can be:
                               - List or numpy array of integers: [0, 1, 5]
                               - Slice object: slice(0, 10)
                               - Single integer: 0

                           Note: If you provide a list of indices with repeats (e.g., [0, 0, 1]),
                           the output will be oversampled accordingly.
      :type \*\*indexers: dict

      :returns: A new DataContainer instance with the sliced data and coordinates.
      :rtype: DataContainer

      .. rubric:: Examples

      >>> # Select first 10 observations
      >>> subset = container.isel(obs=slice(0, 10))

      >>> # Select specific channels by index
      >>> subset = container.isel(channel=[0, 5, 12])

      >>> # Select time range by index
      >>> subset = container.isel(time=slice(100, 200))

      >>> # Bootstrap/Resample (Select index 0 five times)
      >>> bootstrap = container.isel(obs=[0, 0, 0, 0, 0])
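
      The effect of ``isel`` on the data tensor can be reproduced with plain
      NumPy. This is a sketch rather than the library's code: ``isel_array`` is
      a hypothetical helper that handles only ``X`` (no coordinates, ``y`` or
      ``ids``) and only list and slice indexers.

      .. code-block:: python

         import numpy as np

         def isel_array(X, dims, **indexers):
             """Index X by dimension name (hypothetical helper; lists and slices only)."""
             for name, idx in indexers.items():
                 axis = dims.index(name)
                 # Turn slices into explicit positions so np.take can handle them
                 positions = np.arange(X.shape[axis])[idx]
                 X = np.take(X, positions, axis=axis)
             return X

         X = np.zeros((10, 64, 500))
         dims = ("obs", "channel", "time")
         sub = isel_array(X, dims, channel=[0, 5, 12], time=slice(100, 200))
         boot = isel_array(X, dims, obs=[0, 0, 0, 0, 0])   # repeated index, as in the bootstrap example

      Here ``sub`` keeps all 10 observations with 3 channels and 100 time
      points, and ``boot`` contains observation 0 five times.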


   .. py:method:: balance(target: str = 'y', strategy: str = 'undersample', covariates: Optional[List[str]] = None, random_state: int = 42, **kwargs) -> DataContainer

      Balance the dataset classes using undersampling or oversampling.

      This method adjusts the number of observations (rows) in the container
      so that class counts in `target` are equalized. It supports simple
      random sampling and stratified sampling based on covariates.

      :param target: Name of the target variable.
                     - 'y': Uses `self.y`.
                     - Any other string: Looks for the variable in `self.coords`.
      :type target: str, default='y'
      :param strategy:
                       - 'undersample': Downsample majority classes to match the minority
                         class count.
                       - 'oversample': Upsample minority classes (with replacement) to match
                         the majority class.
                       - 'auto': Heuristic choice. Uses undersampling if total size remains >
                         50% of original, else oversampling.
      :type strategy: {'undersample', 'oversample', 'auto'}, default='undersample'
      :param covariates: List of covariate names in `self.coords` to preserve distribution of.
                         If provided, the balancing is performed *within* strata defined by these
                         covariates.
      :type covariates: list of str, optional
      :param random_state: Seed for the random number generator.
                           Change this value to produce different random subsets (e.g., for bagging).
      :type random_state: int, default=42
      :param \*\*kwargs: Additional arguments passed to internal logic:
                         - n_bins (int): Number of bins for continuous covariates (default 5).
                         - binning (str): 'quantile' (default) or 'uniform' binning.
                         - prefer_clean_rows (bool): If True, weighs sampling to prefer rows
                           with fewer NaNs/artifacts.
      :type \*\*kwargs: dict

      :returns: A new DataContainer instance with balanced classes.
      :rtype: DataContainer

      .. rubric:: Examples

      >>> # 1. Simple Undersampling of 'y'
      >>> balanced = container.balance(strategy='undersample')

      >>> # 2. Balance based on a metadata column 'condition'
      >>> balanced = container.balance(target='condition')

      >>> # 3. Stratified Balancing (Balance 'y' while preserving 'sex' and 'age'
      >>> #    ratios)
      >>> balanced = container.balance(target='y', covariates=['sex', 'age'])

      >>> # 4. Iterative Bootstrapping (Different seeds)
      >>> for seed in [1, 2, 3]:
      ...     subset = container.balance(strategy='undersample', random_state=seed)
      ...     # process subset...
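
      The ``'undersample'`` strategy without covariates amounts to drawing the
      minority-class count from every class. The following NumPy sketch shows
      that idea only; ``undersample_indices`` is a hypothetical helper, and the
      library's own sampling order and random generator may differ.

      .. code-block:: python

         import numpy as np

         def undersample_indices(y, random_state=42):
             """Return sorted row indices with every class cut to the minority count."""
             rng = np.random.default_rng(random_state)
             classes, counts = np.unique(y, return_counts=True)
             n_min = counts.min()
             keep = [
                 # Sample without replacement inside each class
                 rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
                 for c in classes
             ]
             return np.sort(np.concatenate(keep))

         y = np.array(["ctrl"] * 8 + ["patient"] * 3)
         idx = undersample_indices(y)
         balanced_y = y[idx]

      Indices of this kind are what an integer selection along ``obs`` would
      consume; a different ``random_state`` yields a different subset of the
      majority class.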


   .. py:method:: select(ignore_case: bool = False, fuzzy: bool = False, **selections) -> DataContainer

      Select data subsets based on coordinates, ids, or y.

      This method supports exact matching, wildcard matching, operator-based
      filtering, and custom callable filters.

      :param ignore_case: If True, string matching is case-insensitive (e.g., 'fz' matches 'Fz').
      :type ignore_case: bool, default=False
      :param fuzzy: If True, uses `difflib` to find closest matches for string queries
                    (e.g., 'Alpha' matches 'alpha'). Useful for handling typos.
      :type fuzzy: bool, default=False
      :param \*\*selections: Key is the dimension name (or special keys 'y', 'ids').
                             Value is the query. Supported query types:

                             1. **List/Array (Exact or Wildcard)**:
                                Matches values present in the list. Strings can use shell-style
                                wildcards ('*', '?').

                             2. **Dictionary (Operator Queries)**:
                                Filters numerical or string values using operators.
                                Keys: '>', '<', '>=', '<=', '==', '!=', 'in'.

                             3. **Callable**:
                                A function taking the coordinate array and returning a boolean mask.
      :type \*\*selections: dict

      :returns: A new DataContainer instance containing the selected subset.
      :rtype: DataContainer

      .. rubric:: Examples

      >>> # 1. Exact Selection (Sensors)
      >>> sub = container.select(channel=['Fz', 'Cz'])

      >>> # 2. Wildcard Selection (All Alpha features)
      >>> sub = container.select(feature='*alpha*')

      >>> # 3. Range Selection (Time)
      >>> sub = container.select(time={'>=': 0.1, '<': 0.5})

      >>> # 4. Case-Insensitive Matching
      >>> sub = container.select(channel=['fz'], ignore_case=True)

      >>> # 5. Filter by Target (y)
      >>> sub = container.select(y=['Patient'])

      >>> # 6. Complex Logic (Subjects 1-5 via Operator)
      >>> sub = container.select(subject_id={'>=': 1, '<=': 5})

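      >>> # 7. Stratified Selection (First 2 epochs per subject via Callable)
      >>> import pandas as pd
      >>> def first_n(ids, n=2):
      ...     # Number each row within its ID group (0, 1, 2, ...) and keep the first n
      ...     s = pd.Series(ids)
      ...     return (s.groupby(s).cumcount() < n).to_numpy()
      >>> sub = container.select(ids=first_n)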
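
      The wildcard and operator queries can be read as boolean masks over a
      coordinate array. The sketch below builds such masks with the standard
      library and NumPy. ``wildcard_mask`` and ``operator_mask`` are
      hypothetical helpers, not part of ``coco_pipe``; combining several
      operators with a logical AND is an assumption suggested by the time-range
      example, and the ``'in'`` operator is left out.

      .. code-block:: python

         import fnmatch
         import operator
         import numpy as np

         OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
                "<=": operator.le, "==": operator.eq, "!=": operator.ne}

         def wildcard_mask(values, patterns):
             """True where a value matches any shell-style pattern (case-sensitive)."""
             return np.array([any(fnmatch.fnmatchcase(str(v), p) for p in patterns)
                              for v in values])

         def operator_mask(values, query):
             """AND together every comparison in an operator dictionary."""
             values = np.asarray(values)
             mask = np.ones(values.shape, dtype=bool)
             for op, ref in query.items():
                 mask &= OPS[op](values, ref)
             return mask

         features = ["Fz_alpha", "Fz_beta", "Cz_alpha"]
         times = np.array([0.0, 0.1, 0.3, 0.5])
         alpha = wildcard_mask(features, ["*alpha*"])
         window = operator_mask(times, {">=": 0.1, "<": 0.5})

      ``alpha`` marks the two alpha features and ``window`` marks the samples
      at 0.1 and 0.3, mirroring examples 2 and 3.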


   .. py:method:: flatten(preserve: Union[str, List[str]] = 'obs') -> DataContainer

      Flatten dimensions NOT in `preserve` into a single 'feature' dimension.

      This is useful for preparing N-dimensional data for standard 2D machine
      learning algorithms (scikit-learn). It automatically generates composite
      feature names (e.g., 'Fz_0.1s') for tracking.

      :param preserve: Dimensions to keep. All other dimensions will be collapsed into a
                       single 'feature' dimension.
                       - 'obs': Result shape (N_obs, N_features). Standard specification.
                       - ['obs', 'time']: Result shape (N_obs, N_time, N_features).
                         Useful for time-resolved decoding distributions.
      :type preserve: str or List[str], default='obs'

      :returns: A new DataContainer with reshaped X and generated 'feature' coordinates.
      :rtype: DataContainer

      .. rubric:: Examples

      >>> # Flatten (10, 64, 500) -> (10, 32000)
      >>> flat = container.flatten(preserve='obs')
      >>> flat.shape
      (10, 32000)
      >>> flat.coords['feature'][0]
      'Fz_0.0'

      >>> # Flatten spatial only, keep time (10, 64, 500) -> (10, 500, 64)
      >>> time_resolved = container.flatten(preserve=['obs', 'time'])
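
      On the array level, flattening is a reshape plus a matching list of
      composite names. The NumPy sketch below illustrates this for a small
      ``(obs, channel, time)`` tensor. The ``channel_time`` naming follows the
      ``'Fz_0.0'`` example; the exact name formatting and axis order used by the
      library are assumptions.

      .. code-block:: python

         import numpy as np

         X = np.arange(2 * 3 * 4).reshape(2, 3, 4)      # (obs, channel, time)
         channels = ["Fz", "Cz", "Pz"]
         times = [0.0, 0.1, 0.2, 0.3]

         # preserve='obs': collapse channel and time (C order, time varies fastest)
         flat = X.reshape(X.shape[0], -1)               # (obs, channel*time)
         names = [f"{ch}_{t}" for ch in channels for t in times]

         # preserve=['obs', 'time']: move the remaining axis to the end
         time_resolved = np.moveaxis(X, 1, -1)          # (obs, time, channel)

      Because the names are generated in the same order as the reshape,
      ``names[k]`` always labels column ``k`` of ``flat``.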


   .. py:method:: stack(dims: Sequence[str], new_dim: str = 'obs') -> DataContainer

      Stack multiple dimensions into a single new dimension.

      This reshapes N-dimensional data into (N-K) dimensions by combining
      specified dimensions. It is useful for transforming spatiotemporal data
      (Trials, Channels, Time) -> (Trials*Time, Channels) for trajectory analysis.

      :param dims: Dimensions to stack. The order determines the nesting (slowest to fastest).
                   e.g., ('obs', 'time') means 'obs' changes slowly, 'time' cycles fast.
      :type dims: sequence of str
      :param new_dim: Name of the resulting stacked dimension.
      :type new_dim: str, default='obs'

      :returns: New container with stacked dimension. Metadata (coords/ids) are
                expanded/tiled to match the new shape.
      :rtype: DataContainer

      .. rubric:: Examples

      >>> # Stack time into observations:
      >>> # (10 obs, 64 ch, 500 time) -> (5000 obs, 64 ch)
      >>> stacked = container.stack(dims=('obs', 'time'), new_dim='obs')
      >>> stacked.shape
      (5000, 64)
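
      The slowest-to-fastest nesting can be checked with a NumPy sketch of the
      same reshape. It is illustrative only: the placement of the new axis
      follows the ``(5000, 64)`` example above, and the label expansion via
      ``np.repeat`` and ``np.tile`` is an assumption about how coordinates are
      tiled.

      .. code-block:: python

         import numpy as np

         X = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # (obs, channel, time)

         # Bring the stacked axes together in nesting order, then merge them
         moved = np.moveaxis(X, 2, 1)                # (obs, time, channel)
         stacked = moved.reshape(2 * 4, 3)           # (obs*time, channel)

         # Per-row labels: the slow axis repeats, the fast axis tiles
         obs_labels = np.repeat(np.arange(2), 4)     # 0 0 0 0 1 1 1 1
         time_labels = np.tile(np.arange(4), 2)      # 0 1 2 3 0 1 2 3

      Row ``r`` of ``stacked`` therefore holds observation ``r // 4`` at time
      index ``r % 4``.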


   .. py:method:: unstack(dim: str) -> DataContainer

      Unstack a dimension into multiple dimensions.

      Inverse operation of `stack`. Reshapes the data tensor by splitting one
      dimension into multiple using metadata stored during the `stack` operation.

      :param dim: Dimension to unstack (e.g. 'obs').
      :type dim: str

      :returns: New container with unstacked dimensions.
      :rtype: DataContainer

      :raises ValueError: If the container was not previously stacked (missing metadata).

      .. rubric:: Examples

      >>> # Stack 'trials' and 'time' -> 'obs'
      >>> stacked = container.stack(('trials', 'time'), new_dim='obs')
      >>> # Unstack 'obs' -> ('trials', 'time') (automatically inferred)
      >>> unstacked = stacked.unstack('obs')


   .. py:method:: center(dim: str = 'time', inplace: bool = False) -> DataContainer

      Remove mean along a specified dimension (Centering/Baseline Correction).

      This operation computes the mean along `dim` (ignoring NaNs) and subtracts it.
      Commonly used in EEG for baseline correction (subtracting mean of
      pre-stimulus interval) or centering features before covariance calculation.

      :param dim: Dimension name to center over (e.g., 'time', 'channel', 'obs').
      :type dim: str, default='time'
      :param inplace: If True, modifies X in-place to save memory.
                      Returns self.
      :type inplace: bool, default=False

      :returns: Container with centered data.
      :rtype: DataContainer

      .. rubric:: Examples

      >>> # Baseline correction over time
      >>> container.center(dim='time')


   .. py:method:: zscore(dim: str = 'time', eps: float = 1e-08, inplace: bool = False) -> DataContainer

      Standardize (Z-score) along a specified dimension.

      Computes `(X - mean) / std` along the given dimension. Robust to NaNs.
      Useful for normalizing features or standardizing temporal dynamics.

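      :param dim: Dimension to standardize.
      :type dim: str, default='time'
      :param eps: Stability epsilon to avoid division by zero.
      :type eps: float, default=1e-08
      :param inplace: If True, modifies X in-place to save memory and returns self.
      :type inplace: bool, default=False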

      :rtype: DataContainer

      .. rubric:: Examples

      >>> # Standardize each channel's timecourse
      >>> container.zscore(dim='time')
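
      A NaN-aware version of ``(X - mean) / std`` can be sketched with NumPy as
      follows. ``zscore_array`` is a hypothetical helper, and adding ``eps`` to
      the standard deviation is an assumption about where the stability term is
      applied.

      .. code-block:: python

         import numpy as np

         def zscore_array(X, axis, eps=1e-8):
             """NaN-aware z-score along one axis (sketch)."""
             mean = np.nanmean(X, axis=axis, keepdims=True)
             std = np.nanstd(X, axis=axis, keepdims=True)
             return (X - mean) / (std + eps)

         rng = np.random.default_rng(0)
         X = rng.normal(loc=5.0, scale=2.0, size=(4, 3, 200))   # (obs, channel, time)
         X[0, 0, 10] = np.nan                                   # a missing sample
         Z = zscore_array(X, axis=2)

      Each channel's time course in ``Z`` has approximately zero mean and unit
      standard deviation, and the missing sample stays NaN without affecting
      its neighbours.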


   .. py:method:: rms_scale(dim: str = 'time', eps: float = 1e-08, inplace: bool = False) -> DataContainer

      Scale by Root Mean Square (RMS) amplitude along a dimension.

      Divides data by `sqrt(mean(X**2))` along the dimension.
      Preserves relative shape but normalizes energy.

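      :param dim: Dimension to scale.
      :type dim: str, default='time'
      :param eps: Stability epsilon to avoid division by zero.
      :type eps: float, default=1e-08
      :param inplace: If True, modifies X in-place to save memory and returns self.
      :type inplace: bool, default=False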

      :rtype: DataContainer
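
      The statement that RMS scaling keeps the shape but normalizes the energy
      can be verified with a short NumPy sketch. ``rms_scale_array`` is a
      hypothetical helper; adding ``eps`` to the denominator is an assumption.

      .. code-block:: python

         import numpy as np

         def rms_scale_array(X, axis, eps=1e-8):
             """Divide by sqrt(mean(X**2)) along one axis (sketch)."""
             rms = np.sqrt(np.mean(X ** 2, axis=axis, keepdims=True))
             return X / (rms + eps)

         # Two rows with the same waveform but a 10x difference in amplitude
         X = np.array([[3.0, -4.0, 3.0, -4.0],
                       [0.3, -0.4, 0.3, -0.4]])
         S = rms_scale_array(X, axis=1)

      After scaling, both rows of ``S`` are numerically the same waveform with
      an RMS of about 1.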


   .. py:method:: baseline_correction(dim: str = 'time', inplace: bool = False) -> DataContainer

      Alias for center(). Common in EEG.


   .. py:method:: aggregate(by: Union[str, numpy.ndarray, List[Any]], stats: Union[str, Sequence[str]] = 'mean', min_count: int = 1, on_insufficient: str = 'raise') -> DataContainer

      Aggregate observations into grouped summaries along the ``obs`` axis.

      :param by: Group definition for the observation axis.
                 - If str: resolve the key from ``self.coords`` or from ``self.y``
                   when ``by == "y"``.
                 - If array-like: explicit group labels aligned with ``obs``.
      :type by: str or array-like
      :param stats: Aggregation statistic or ordered list of statistics. Supported
                    tokens are ``"mean"``, ``"median"``, ``"std"``, ``"var"``,
                    ``"sem"``, ``"mad"``, ``"iqr"``, ``"min"``, ``"max"``,
                    ``"count"``, and ``"first"``. Legacy ``"obs-*"`` aliases are
                    accepted and normalized.
      :type stats: str or sequence of str, default="mean"
      :param min_count: Minimum number of valid observations required per group. A valid
                        observation is one with at least one finite value across the
                        non-observation axes.
      :type min_count: int, default=1
      :param on_insufficient: Policy applied when a group has fewer than ``min_count`` valid
                              observations.
      :type on_insufficient: {"raise", "warn", "collect"}, default="raise"

      :returns: Aggregated container with grouped observations on the ``obs`` axis.
                When multiple stats are requested, a ``stat`` dimension is inserted
                immediately after ``obs``.
      :rtype: DataContainer

      :raises ValueError: If the container has no ``obs`` dimension, grouping is invalid,
          requested stats are unsupported, or ``min_count`` /
          ``on_insufficient`` are invalid.
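
      The shape rule for multiple statistics (a ``stat`` axis right after
      ``obs``) can be illustrated with NumPy. This is a sketch of the grouping
      idea, not the library's implementation; sorting groups via ``np.unique``
      and using NaN-aware reducers are assumptions.

      .. code-block:: python

         import numpy as np

         X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]])   # (obs, feature)
         by = np.array(["s1", "s1", "s2"])                      # group label per observation

         groups, inverse = np.unique(by, return_inverse=True)
         stats = {"mean": np.nanmean, "max": np.nanmax}

         # One (group, feature) block per statistic
         per_stat = [
             np.array([fn(X[inverse == g], axis=0) for g in range(len(groups))])
             for fn in stats.values()
         ]
         out = np.stack(per_stat, axis=1)   # (group, stat, feature)

      ``out[0, 0]`` is the mean of the two ``s1`` rows and ``out[0, 1]`` is
      their maximum; with a single statistic the ``stat`` axis would be omitted.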


   .. py:method:: aggregate_groups(by: Union[str, numpy.ndarray, List[Any]], groups: Sequence[Dict[str, Any]], min_count: int = 1, on_insufficient: str = 'raise', skip_empty: bool = True) -> DataContainer

      Aggregate selected feature groups with different statistics.

      This is a thin wrapper around :meth:`aggregate` for tabular feature
      containers. Each group spec selects a subset of feature columns and
      applies one or more stats to that subset. The outputs are concatenated
      along the ``feature`` dimension, and each resulting feature name is
      prefixed with its stat (for example ``"mean_band_log_abs_alpha"``).

      :param by: Group definition for the observation axis. Passed through to
                 :meth:`aggregate`.
      :type by: str or array-like
      :param groups: Ordered group specifications. Each group must provide ``"stats"``
                     and may optionally provide include/exclude selectors:

                     - ``names`` / ``exclude_names``
                     - ``prefixes`` / ``exclude_prefixes``
                     - ``suffixes`` / ``exclude_suffixes``
                     - ``contains`` / ``exclude_contains``
                     - ``regex`` / ``exclude_regex``

                     If a group provides no include selectors, it starts from all
                     features and then applies exclusions.
      :type groups: sequence of dict
      :param min_count: Minimum number of valid observations required per group. Passed
                        through to :meth:`aggregate`.
      :type min_count: int, default=1
      :param on_insufficient: Policy applied when a group has fewer than ``min_count`` valid
                              observations. Passed through to :meth:`aggregate`.
      :type on_insufficient: {"raise", "warn", "collect"}, default="raise"
      :param skip_empty: If True, silently skip group specs that match no features. If
                         False, raise a ``ValueError`` when a group matches nothing.
      :type skip_empty: bool, default=True

      :returns: Aggregated container with dims ``("obs", "feature")`` and
                stat-prefixed feature names.
      :rtype: DataContainer

      :raises ValueError: If the container lacks a ``feature`` dimension or coord, no groups
          are provided, a group spec is invalid, multiple groups would emit
          the same output feature name, or no non-empty grouped outputs are
          produced.
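
      The selector and naming rules described above can be sketched in plain
      Python. This is only an illustration under stated assumptions, not the
      library's implementation: ``select_features`` is a hypothetical helper,
      only the ``prefixes`` and ``exclude_prefixes`` selectors are modelled,
      and both the ``"<stat>_<feature>"`` naming and the output ordering are
      inferred from the ``"mean_band_log_abs_alpha"`` example.

      .. code-block:: python

         def select_features(names, spec):
             """Hypothetical helper: resolve one group spec to feature names."""
             prefixes = spec.get("prefixes")
             # With no include selector, start from all features.
             selected = [
                 n for n in names
                 if prefixes is None or n.startswith(tuple(prefixes))
             ]
             # Exclusions are applied after inclusion.
             excluded = tuple(spec.get("exclude_prefixes", ()))
             return [n for n in selected if not n.startswith(excluded)]


         features = ["band_log_abs_alpha", "band_log_abs_beta", "ratio_theta_beta"]
         groups = [
             {"stats": ["mean", "std"], "prefixes": ["band_"]},
             {"stats": ["median"], "exclude_prefixes": ["band_"]},
         ]

         # Each output feature name is the source name prefixed with its stat.
         output_names = [
             f"{stat}_{name}"
             for spec in groups
             for stat in spec["stats"]
             for name in select_features(features, spec)
         ]

      Here the first group yields ``mean_`` and ``std_`` versions of the two
      ``band_`` features, and the second group yields ``median_ratio_theta_beta``.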


.. py:class:: SklearnWrapper(transformer: sklearn.base.BaseEstimator)

   Bases: :py:obj:`sklearn.base.BaseEstimator`, :py:obj:`sklearn.base.TransformerMixin`


   Generic wrapper for ANY scikit-learn transformer (Scaler, PCA, etc.).

   This wrapper applies a scikit-learn transformer to the `.X` data matrix of
   a `DataContainer` and returns a container holding the transformed data,
   checking dimension compatibility along the way.

   :param transformer: An instantiated scikit-learn transformer (e.g., `StandardScaler()`,
                       `PCA(n_components=10)`).
   :type transformer: BaseEstimator

   .. attribute:: estimator_

      The fitted scikit-learn estimator.

      :type: BaseEstimator

   .. rubric:: Examples

   >>> from sklearn.preprocessing import RobustScaler
   >>> from coco_pipe.io import DataContainer, SklearnWrapper
   >>> import numpy as np

   >>> # Create formatted data (100 obs, 10 features)
   >>> X = np.random.randn(100, 10)
   >>> container = DataContainer(X, dims=('obs', 'feature'))

   >>> # Wrap a Scaler
   >>> scaler = SklearnWrapper(RobustScaler())
   >>> scaled_container = scaler.fit_transform(container)

   >>> # Metadata is preserved
   >>> scaled_container.dims == container.dims
   True


   .. py:attribute:: transformer


   .. py:attribute:: estimator_
      :value: None


   .. py:method:: fit(container: coco_pipe.io.structures.DataContainer, y=None)


   .. py:method:: transform(container: coco_pipe.io.structures.DataContainer) -> coco_pipe.io.structures.DataContainer


   .. py:method:: fit_transform(container: coco_pipe.io.structures.DataContainer, y=None)

      Fit to the data, then transform it.

      Fits the wrapped transformer to `container` (and `y`, if given) and
      returns the transformed data.

      :param container: Input container whose `.X` data matrix is fitted and transformed.
      :type container: DataContainer
      :param y: Target values (None for unsupervised transformations).
      :type y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

      :returns: Transformed container.
      :rtype: DataContainer


   .. py:method:: inverse_transform(container: coco_pipe.io.structures.DataContainer) -> coco_pipe.io.structures.DataContainer


.. py:class:: SpatialWhitener(method: str = 'pca', n_components: Optional[Union[int, float]] = None)

   Bases: :py:obj:`sklearn.base.BaseEstimator`, :py:obj:`sklearn.base.TransformerMixin`


   M/EEG Spatial Whitening using Covariance Decorrelation.

   This transformer removes spatial correlations between channels, effectively
   transforming the noise covariance matrix towards the identity matrix. It supports
   standard PCA, ZCA (Zero-phase Component Analysis, which preserves topography),
   and robust shrinkage covariance estimation (OAS).

   It requires a dimension named 'channel' in the input `DataContainer`.
   The operation is performed spatially: :math:`X_{white} = X \cdot W^T`
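
   The whitening operation can be illustrated with a small NumPy sketch. It is
   not the class's implementation: the flattening to ``(samples, channels)`` is
   an assumption suggested by the ``X_flat`` argument of the private fit
   methods, and the variable names ``W_pca`` and ``W_zca`` are illustrative.
   The sketch builds both matrices from an eigendecomposition of the channel
   covariance and applies :math:`X_{white} = X \cdot W^T`.

   .. code-block:: python

      import numpy as np

      rng = np.random.default_rng(0)

      # Correlated toy data: 20 epochs, 4 channels, 50 time points.
      mixing = rng.normal(size=(4, 4))
      X = np.einsum("ij,ojt->oit", mixing, rng.normal(size=(20, 4, 50)))

      # Flatten so that every time point of every epoch is one sample.
      X_flat = np.moveaxis(X, 1, -1).reshape(-1, X.shape[1])
      mean = X_flat.mean(axis=0)
      cov = np.cov(X_flat - mean, rowvar=False)

      # Eigendecomposition of the symmetric channel covariance.
      eigvals, eigvecs = np.linalg.eigh(cov)
      W_pca = (eigvecs / np.sqrt(eigvals)).T  # rotate, then scale to unit variance
      W_zca = eigvecs @ W_pca                 # additionally rotate back to sensor space

      # Apply the whitening matrix along the channel axis.
      X_white = (X_flat - mean) @ W_zca.T
      cov_white = np.cov(X_white, rowvar=False)

   After whitening, ``cov_white`` is the identity matrix up to numerical
   precision. Replacing ``W_zca`` with ``W_pca`` gives the same covariance but
   expresses the data in principal-component rather than sensor coordinates.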

   :param method: Whitening method:

                  - 'pca': Principal Component Analysis. Rotates data to principal axes and
                    scales to unit variance.
                  - 'zca': Zero-phase Component Analysis. Rotates, scales, and rotates back.
                    Preserves spatial topography (sensors stay in place).
                  - 'shrinkage': Uses Oracle Approximating Shrinkage (OAS) for robust
                    covariance estimation in high dimensions.
   :type method: {'pca', 'zca', 'shrinkage'}, default='pca'
   :param n_components: Number of components to keep (only for 'pca'/'zca' methods). If None, all
                        components are kept.
   :type n_components: int or float, optional

   .. attribute:: whitener_

      The estimated whitening matrix (W). Shape (n_components, n_channels).

      :type: np.ndarray

   .. attribute:: mean_

      Per-channel mean vector.

      :type: np.ndarray

   .. attribute:: inverse_whitener_

      The inverse matrix used to project back to sensor space.

      :type: np.ndarray

   .. rubric:: Examples

   >>> import numpy as np
   >>> from coco_pipe.io import DataContainer, SpatialWhitener

   >>> # Whitening EEG epochs (100 epochs, 64 channels, 500 times)
   >>> container = DataContainer(
   ...     np.random.randn(100, 64, 500), dims=('obs', 'channel', 'time')
   ... )

   >>> # Use shrinkage for robust covariance estimation
   >>> whitener = SpatialWhitener(method='shrinkage')
   >>> white_data = whitener.fit_transform(container)

   >>> # Project back to sensor space for plotting
   >>> sensor_data = whitener.inverse_transform(white_data)


   .. py:attribute:: method
      :value: 'pca'


   .. py:attribute:: n_components
      :value: None


   .. py:attribute:: whitener_
      :value: None


   .. py:attribute:: mean_
      :value: None


   .. py:attribute:: inverse_whitener_
      :value: None


   .. py:method:: fit(container: coco_pipe.io.structures.DataContainer, y=None)


   .. py:method:: _fit_pca(X_flat: numpy.ndarray)


   .. py:method:: _fit_shrinkage(X_flat: numpy.ndarray)


   .. py:method:: transform(container: coco_pipe.io.structures.DataContainer) -> coco_pipe.io.structures.DataContainer


   .. py:method:: fit_transform(container: coco_pipe.io.structures.DataContainer, y=None)

      Fit to the data, then transform it.

      Fits the whitener to `container` and returns the whitened data.

      :param container: Input container. It must have a dimension named 'channel'.
      :type container: DataContainer
      :param y: Ignored by unsupervised transformations; present for scikit-learn API compatibility.
      :type y: array-like, default=None

      :returns: Whitened container.
      :rtype: DataContainer


   .. py:method:: inverse_transform(container: coco_pipe.io.structures.DataContainer) -> coco_pipe.io.structures.DataContainer


   .. py:method:: _apply_linear_op(container: coco_pipe.io.structures.DataContainer, W: numpy.ndarray, mean: Optional[numpy.ndarray]) -> numpy.ndarray