coco_pipe.io
============

.. py:module:: coco_pipe.io


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/coco_pipe/io/config/index
   /autoapi/coco_pipe/io/dataset/index
   /autoapi/coco_pipe/io/load/index
   /autoapi/coco_pipe/io/structures/index
   /autoapi/coco_pipe/io/transform/index
   /autoapi/coco_pipe/io/utils/index


Classes
-------

.. autoapisummary::

   coco_pipe.io.BaseDatasetConfig
   coco_pipe.io.BIDSConfig
   coco_pipe.io.DatasetConfig
   coco_pipe.io.EmbeddingConfig
   coco_pipe.io.TabularConfig
   coco_pipe.io.DataContainer
   coco_pipe.io.SklearnWrapper
   coco_pipe.io.SpatialWhitener


Functions
---------

.. autoapisummary::

   coco_pipe.io.load_data


Package Contents
----------------

.. py:class:: BaseDatasetConfig(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`

   A base class for creating Pydantic models.

   .. attribute:: __class_vars__

      The names of the class variables defined on the model.

   .. attribute:: __private_attributes__

      Metadata about the private attributes of the model.

   .. attribute:: __signature__

      The synthesized `__init__` signature (:py:obj:`inspect.Signature`) of the model.

   .. attribute:: __pydantic_complete__

      Whether model building is completed, or if there are still undefined fields.

   .. attribute:: __pydantic_core_schema__

      The core schema of the model.

   .. attribute:: __pydantic_custom_init__

      Whether the model has a custom `__init__` function.

   .. attribute:: __pydantic_decorators__

      Metadata containing the decorators defined on the model.
      This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1.

   .. attribute:: __pydantic_generic_metadata__

      A dictionary containing metadata about generic Pydantic models.
      The `origin` and `args` items map to the `__origin__` and `__args__` attributes of
      generic aliases, and the `parameter` item maps to the `__parameter__` attribute of
      generic classes.

   .. attribute:: __pydantic_parent_namespace__

      Parent namespace of the model, used for automatic rebuilding of models.

   .. attribute:: __pydantic_post_init__

      The name of the post-init method for the model, if defined.

   .. attribute:: __pydantic_root_model__

      Whether the model is a :py:obj:`pydantic.root_model.RootModel`.

   .. attribute:: __pydantic_serializer__

      The `pydantic-core` `SchemaSerializer` used to dump instances of the model.

   .. attribute:: __pydantic_validator__

      The `pydantic-core` `SchemaValidator` used to validate instances of the model.

   .. attribute:: __pydantic_fields__

      A dictionary of field names and their corresponding
      :py:obj:`pydantic.fields.FieldInfo` objects.

   .. attribute:: __pydantic_computed_fields__

      A dictionary of computed field names and their corresponding
      :py:obj:`pydantic.fields.ComputedFieldInfo` objects.

   .. attribute:: __pydantic_extra__

      A dictionary containing extra values, if
      :py:obj:`pydantic.config.ConfigDict.extra` is set to `'allow'`.

   .. attribute:: __pydantic_fields_set__

      The names of fields explicitly set during instantiation.

   .. attribute:: __pydantic_private__

      Values of private attributes set on the model instance.


   .. py:attribute:: path
      :type: pathlib.Path
      :value: None


   .. py:attribute:: subjects
      :type: Optional[Union[int, List[Union[str, int]]]]
      :value: None


.. py:class:: BIDSConfig(/, **data: Any)

   Bases: :py:obj:`BaseDatasetConfig`

   Configuration for BIDSDataset.


   .. py:attribute:: mode
      :type: Literal['bids']
      :value: 'bids'


   .. py:attribute:: task
      :type: Optional[str]
      :value: None


   .. py:attribute:: session
      :type: Optional[Union[str, List[str]]]
      :value: None


   .. py:attribute:: datatype
      :type: str
      :value: 'eeg'


   .. py:attribute:: suffix
      :type: Optional[str]
      :value: None


   .. py:attribute:: loading_mode
      :type: str
      :value: None


   .. py:attribute:: window_length
      :type: Optional[float]
      :value: None


   .. py:attribute:: stride
      :type: Optional[float]
      :value: None


.. py:class:: DatasetConfig(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`

   Master configuration container for IO.


   .. py:attribute:: dataset
      :type: Union[TabularConfig, BIDSConfig, EmbeddingConfig]
      :value: None


.. py:class:: EmbeddingConfig(/, **data: Any)

   Bases: :py:obj:`BaseDatasetConfig`

   Configuration for EmbeddingDataset.


   .. py:attribute:: mode
      :type: Literal['embedding']
      :value: 'embedding'


   .. py:attribute:: pattern
      :type: str
      :value: '*.pkl'


   .. py:attribute:: dims
      :type: Tuple[str, Ellipsis]
      :value: ('obs', 'feature')


   .. py:attribute:: coords
      :type: Optional[Dict[str, Union[List, Any]]]
      :value: None


   .. py:attribute:: task
      :type: Optional[str]
      :value: None


   .. py:attribute:: run
      :type: Optional[str]
      :value: None


   .. py:attribute:: processing
      :type: Optional[str]
      :value: None


.. py:class:: TabularConfig(/, **data: Any)

   Bases: :py:obj:`BaseDatasetConfig`

   Configuration for TabularDataset.


   .. py:attribute:: mode
      :type: Literal['tabular']
      :value: 'tabular'


   .. py:attribute:: target_col
      :type: Optional[str]
      :value: None


   .. py:attribute:: index_col
      :type: Optional[Union[str, int]]
      :value: None


   .. py:attribute:: sep
      :type: str
      :value: None


   .. py:attribute:: header
      :type: Optional[Union[int, List[int]]]
      :value: 0


   .. py:attribute:: sheet_name
      :type: Union[str, int]
      :value: 0


   .. py:attribute:: columns_to_dims
      :type: Optional[List[str]]
      :value: None


   .. py:attribute:: col_sep
      :type: str
      :value: '_'


   .. py:attribute:: meta_columns
      :type: Optional[List[str]]
      :value: None


   .. py:attribute:: clean
      :type: bool
      :value: False


   .. py:attribute:: clean_kwargs
      :type: Dict[str, Any]
      :value: None


   .. py:attribute:: select_kwargs
      :type: Dict[str, Any]
      :value: None


.. py:function:: load_data(path: Union[str, pathlib.Path], mode: str = 'auto', target_col: Optional[str] = None, index_col: Optional[Union[str, int]] = None, sep: str = '\t', header: Optional[Union[int, List[int]]] = 0, sheet_name: Optional[Union[str, int]] = 0, columns_to_dims: Optional[List[str]] = None, col_sep: str = '_', meta_columns: Optional[List[str]] = None, clean: bool = False, clean_kwargs: Optional[Dict[str, Any]] = None, task: Optional[str] = None, session: Optional[Union[str, List[str]]] = None, datatype: str = 'eeg', suffix: Optional[str] = None, loading_mode: str = 'epochs', window_length: Optional[float] = None, stride: Optional[float] = None, subject_metadata_df: Optional[Any] = None, subject_key: Optional[str] = None, pattern: str = '*.pkl', dims: Tuple[str, Ellipsis] = ('obs', 'feature'), coords: Optional[Dict[str, Union[List, numpy.ndarray]]] = None, reader: Optional[Any] = None, id_fn: Optional[Any] = None, subjects: Optional[Union[str, List[str], int, List[int]]] = None, **kwargs) -> coco_pipe.io.structures.DataContainer

   Universal data loader factory.

   Dispatches to `BIDSDataset`, `TabularDataset`, or `EmbeddingDataset` based on `mode`.

   :param path: Path to data source (file or directory).
   :type path: str or Path
   :param mode: Type of data to load.

      - "auto": Infers type from file extension or directory structure.
      - "tabular": uses `TabularDataset` (CSV, TSV, Excel, TXT).
      - "bids": uses `BIDSDataset` (BIDS-compliant directories).
      - "embedding": uses `EmbeddingDataset` (NPY, PKL, H5, JSON).
   :type mode: {"auto", "tabular", "bids", "embedding"}, default="auto"

   .. rubric:: Tabular Arguments (mode="tabular")

   :param target_col: Name of the column to extract as target `y`. Removed from features `X`.
   :type target_col: str, optional
   :param index_col: Column to use as index (observation IDs).
   :type index_col: str or int, optional
   :param sep: Separator for text files (e.g. ',' for CSV).
   :type sep: str, default='\t'
   :param header: Row number(s) to use as column names.
   :type header: int or list of int, default=0
   :param sheet_name: Sheet name or index for Excel files.
   :type sheet_name: str or int, default=0
   :param columns_to_dims: If provided, attempts to reshape 2D feature columns into N-D
      dimensions. Columns must follow: `dim1_dim2_..._feature`.
   :type columns_to_dims: list of str, optional
   :param col_sep: Separator used in column names for reshaping.
   :type col_sep: str, default='_'
   :param meta_columns: Columns to extract as metadata coordinates instead of features.
   :type meta_columns: list of str, optional
   :param clean: Whether to perform automated cleaning (drop NaNs/Infs).
   :type clean: bool, default=False
   :param clean_kwargs: Arguments passed to `TabularDataset.clean`.
   :type clean_kwargs: dict, optional

   .. rubric:: BIDS Arguments (mode="bids")

   :param task: BIDS task name (e.g., 'rest', 'audiovisual').
   :type task: str, optional
   :param session: Session ID(s) to load. Defaults to all available.
   :type session: str or List[str], optional
   :param datatype: Data type folder (e.g., 'eeg', 'meg', 'ieeg').
   :type datatype: str, default='eeg'
   :param suffix: File suffix to load (e.g., 'eeg', 'epo', 'ave').
   :type suffix: str, optional
   :param loading_mode: How to process the data. Passed as `mode` to `BIDSDataset`.

      - 'epochs': Splices continuous data into fixed-length windows.
      - 'continuous': Loads as single continuous segments.
      - 'load_existing': Loads pre-computed epochs.
   :type loading_mode: str, default='epochs'
   :param window_length: Window length in seconds (for 'epochs' mode).
   :type window_length: float, optional
   :param stride: Stride in seconds (for 'epochs' mode).
   :type stride: float, optional
   :param subject_metadata_df: External subject-level metadata to merge by subject during BIDS loading.
   :type subject_metadata_df: DataFrame, optional
   :param subject_key: Column in `subject_metadata_df` containing the BIDS subject identifier.
   :type subject_key: str, optional
   :param subjects: Specific subject IDs to load (without 'sub-').
   :type subjects: str or List[str], optional

   .. rubric:: Embedding Arguments (mode="embedding")

   :param pattern: Glob pattern to match files.
   :type pattern: str, default='*.pkl'
   :param dims: Dimension labels for the data arrays.
   :type dims: tuple of str, default=('obs', 'feature')
   :param coords: Dictionary of coordinates for dimensions.
   :type coords: dict, optional
   :param reader: Custom file reader function.
   :type reader: callable, optional
   :param id_fn: Custom subject ID extraction function.
   :type id_fn: callable, optional
   :param subjects: If int, loads first N subjects. If list, filters by ID.
   :type subjects: int or list, optional

   :returns: Standardized data container with attributes:

      - X: (N_obs, ...) data array
      - y: Targets (if available)
      - ids: Observation identifiers
      - coords: Coordinate metadata
   :rtype: DataContainer


.. py:class:: DataContainer

   Generic container for N-dimensional neurophysiological data.

   Acts as a lightweight labelled array (like xarray but simpler), managing
   dimensions, coordinates, and associated target labels (y) and IDs.

   .. attribute:: X

      The primary data tensor. Shape must match `dims`.

      :type: np.ndarray

   .. attribute:: dims

      Labels for each dimension of X.
      Examples: ('obs', 'feature'), ('obs', 'channel', 'time').
      Note: The 'obs' dimension is special and typically represents independent samples.

      :type: Tuple[str, ...]

   .. attribute:: coords

      Coordinates/Labels for dimensions.
      Keys must be in `dims`. Values must match the length of the corresponding dimension in X.

      :type: Dict[str, Union[List, np.ndarray]]

   .. attribute:: y

      Target labels corresponding to the 'obs' dimension.
      Used for supervised learning or coloring plots.

      :type: Optional[np.ndarray], optional

   .. attribute:: ids

      Identifiers for observations (e.g., subject IDs, trial names).
      Should correspond to 'obs' dim in coords if provided.
      Kept separate from coords for convenient tracking.

      :type: Optional[np.ndarray], optional

   .. attribute:: meta

      Arbitrary metadata (sfreq, units, source path, etc).

      :type: Dict[str, Any]

   .. rubric:: Examples

   Accessing data:

   >>> container.X.shape
   (10, 64, 500)

   Accessing coordinates:

   >>> container.coords['channel'][:3]
   ['Fz', 'Cz', 'Pz']


   .. py:attribute:: X
      :type: numpy.ndarray


   .. py:attribute:: dims
      :type: Tuple[str, Ellipsis]


   .. py:attribute:: coords
      :type: Dict[str, Union[List, numpy.ndarray, Sequence]]


   .. py:attribute:: y
      :type: Optional[numpy.ndarray]
      :value: None


   .. py:attribute:: ids
      :type: Optional[numpy.ndarray]
      :value: None


   .. py:attribute:: meta
      :type: Dict[str, Any]


   .. py:method:: __post_init__()


   .. py:property:: shape
      :type: Tuple[int, Ellipsis]


   .. py:method:: save(path: Union[str, Any]) -> None

      Save the DataContainer to disk using joblib.

      :param path: Destination file path.
      :type path: str or Path


   .. py:method:: load(path: Union[str, Any]) -> DataContainer
      :classmethod:

      Load a DataContainer from disk.

      :param path: Source file path.
      :type path: str or Path

      :rtype: DataContainer


   .. py:method:: __repr__() -> str


   .. py:method:: obs_table(include_ids: bool = False, id_col: str = 'obs_id', include_y: bool = False, y_col: str = 'y', include_obs_coord: bool = False) -> pandas.DataFrame

      Return one-dimensional coordinates aligned to the observation axis.

      This helper is useful when exporting a row-wise table from a container.
      It only materializes metadata that can map cleanly to one row per
      observation, skipping coordinates that belong to other axes such as
      ``channel``, ``time``, ``feature``, or ``stat``.

      :param include_ids: If True, include ``self.ids`` as the first column.
      :type include_ids: bool, default=False
      :param id_col: Column name used when exporting ``self.ids``.
      :type id_col: str, default="obs_id"
      :param include_y: If True, include ``self.y`` as a column when present.
      :type include_y: bool, default=False
      :param y_col: Column name used when exporting ``self.y``.
      :type y_col: str, default="y"
      :param include_obs_coord: If True, include ``coords["obs"]`` when present.
      :type include_obs_coord: bool, default=False

      :returns: DataFrame containing only one-dimensional observation-aligned
                metadata columns.
      :rtype: pandas.DataFrame

      :raises ValueError: If the container has no ``obs`` dimension, or if ``include_ids``
          is requested when ``self.ids`` is missing.


   .. py:method:: isel(**indexers) -> DataContainer

      Select data by integer indices on specified dimensions.

      This method is the integer-index equivalent of `select`. It operates directly
      on the dimensions of the data tensor `X`. It is robust and handles metadata
      splitting/alignment automatically.

      :param \*\*indexers: Key: Dimension name (e.g., 'obs', 'channel', 'time').
         Value: Integer indices to select. Can be:

         - List or numpy array of integers: [0, 1, 5]
         - Slice object: slice(0, 10)
         - Single integer: 0

         Note: If you provide a list of indices with repeats (e.g., [0, 0, 1]),
         the output will be oversampled accordingly.
      :type \*\*indexers: dict

      :returns: A new DataContainer instance with the sliced data and coordinates.
      :rtype: DataContainer

      .. rubric:: Examples

      >>> # Select first 10 observations
      >>> subset = container.isel(obs=slice(0, 10))

      >>> # Select specific channels by index
      >>> subset = container.isel(channel=[0, 5, 12])

      >>> # Select time range by index
      >>> subset = container.isel(time=slice(100, 200))

      >>> # Bootstrap/Resample (Select index 0 five times)
      >>> bootstrap = container.isel(obs=[0, 0, 0, 0, 0])


   .. py:method:: balance(target: str = 'y', strategy: str = 'undersample', covariates: Optional[List[str]] = None, random_state: int = 42, **kwargs) -> DataContainer

      Balance the dataset classes using undersampling or oversampling.

      This method adjusts the number of observations (rows) in the container so that
      class counts in `target` are equalized. It supports simple random sampling
      and stratified sampling based on covariates.

      :param target: Name of the target variable.

         - 'y': Uses `self.y`.
         - Any other string: Looks for the variable in `self.coords`.
      :type target: str, default='y'
      :param strategy:
         - 'undersample': Downsample majority classes to match the minority class count.
         - 'oversample': Upsample minority classes (with replacement) to match the majority class.
         - 'auto': Heuristic choice. Uses undersampling if total size remains > 50% of
           original, else oversampling.
      :type strategy: {'undersample', 'oversample', 'auto'}, default='undersample'
      :param covariates: List of covariate names in `self.coords` to preserve distribution of.
         If provided, the balancing is performed *within* strata defined by these covariates.
      :type covariates: list of str, optional
      :param random_state: Seed for the random number generator. Change this value to produce
         different random subsets (e.g., for bagging).
      :type random_state: int, default=42
      :param \*\*kwargs: Additional arguments passed to internal logic:

         - n_bins (int): Number of bins for continuous covariates (default 5).
         - binning (str): 'quantile' (default) or 'uniform' binning.
         - prefer_clean_rows (bool): If True, weighs sampling to prefer rows with
           fewer NaNs/artifacts.
      :type \*\*kwargs: dict

      :returns: A new DataContainer instance with balanced classes.
      :rtype: DataContainer

      .. rubric:: Examples

      >>> # 1. Simple Undersampling of 'y'
      >>> balanced = container.balance(strategy='undersample')

      >>> # 2. Balance based on a metadata column 'condition'
      >>> balanced = container.balance(target='condition')

      >>> # 3. Stratified Balancing (Balance 'y' while preserving 'sex' and 'age'
      >>> #    ratios)
      >>> balanced = container.balance(target='y', covariates=['sex', 'age'])

      >>> # 4. Iterative Bootstrapping (Different seeds)
      >>> for seed in [1, 2, 3]:
      ...     subset = container.balance(strategy='undersample', random_state=seed)
      ...     # process subset...


   .. py:method:: select(ignore_case: bool = False, fuzzy: bool = False, **selections) -> DataContainer

      Select data subsets based on coordinates, ids, or y.

      This method supports exact matching, wildcard matching, operator-based filtering,
      and custom callable filters.

      :param ignore_case: If True, string matching is case-insensitive (e.g., 'fz' matches 'Fz').
      :type ignore_case: bool, default=False
      :param fuzzy: If True, uses `difflib` to find closest matches for string queries
         (e.g., 'Alpha' matches 'alpha'). Useful for handling typos.
      :type fuzzy: bool, default=False
      :param \*\*selections: Key is the dimension name (or special keys 'y', 'ids').
         Value is the query. Supported query types:

         1. **List/Array (Exact or Wildcard)**: Matches values present in the list.
            Strings can use shell-style wildcards ('*', '?').
         2. **Dictionary (Operator Queries)**: Filters numerical or string values
            using operators. Keys: '>', '<', '>=', '<=', '==', '!=', 'in'.
         3. **Callable**: A function taking the coordinate array and returning a
            boolean mask.
      :type \*\*selections: dict

      :returns: A new DataContainer instance containing the selected subset.
      :rtype: DataContainer

      .. rubric:: Examples

      >>> # 1. Exact Selection (Sensors)
      >>> sub = container.select(channel=['Fz', 'Cz'])

      >>> # 2. Wildcard Selection (All Alpha features)
      >>> sub = container.select(feature='*alpha*')

      >>> # 3. Range Selection (Time)
      >>> sub = container.select(time={'>=': 0.1, '<': 0.5})

      >>> # 4. Case-Insensitive Matching
      >>> sub = container.select(channel=['fz'], ignore_case=True)

      >>> # 5. Filter by Target (y)
      >>> sub = container.select(y=['Patient'])

      >>> # 6. Complex Logic (Subjects 1-5 via Operator)
      >>> sub = container.select(subject_id={'>=': 1, '<=': 5})

      >>> # 7. Stratified Selection (First 2 epochs per subject via Callable)
      >>> def first_n(ids, n=2):
      ...     # ... logic ...
      ...     return mask
      >>> sub = container.select(ids=first_n)


   .. py:method:: flatten(preserve: Union[str, List[str]] = 'obs') -> DataContainer

      Flatten dimensions NOT in `preserve` into a single 'feature' dimension.

      This is useful for preparing N-dimensional data for standard 2D machine learning
      algorithms (scikit-learn). It automatically generates composite feature names
      (e.g., 'Fz_0.1s') for tracking.

      :param preserve: Dimensions to keep. All other dimensions will be collapsed into a
         single 'feature' dimension.

         - 'obs': Result shape (N_obs, N_features). Standard specification.
         - ['obs', 'time']: Result shape (N_obs, N_time, N_features). Useful for
           time-resolved decoding distributions.
      :type preserve: str or List[str], default='obs'

      :returns: A new DataContainer with reshaped X and generated 'feature' coordinates.
      :rtype: DataContainer

      .. rubric:: Examples

      >>> # Flatten (10, 64, 500) -> (10, 32000)
      >>> flat = container.flatten(preserve='obs')
      >>> flat.shape
      (10, 32000)
      >>> flat.coords['feature'][0]
      'Fz_0.0'

      >>> # Flatten spatial only, keep time (10, 64, 500) -> (10, 500, 64)
      >>> time_resolved = container.flatten(preserve=['obs', 'time'])


   .. py:method:: stack(dims: Sequence[str], new_dim: str = 'obs') -> DataContainer

      Stack multiple dimensions into a single new dimension.

      This reshapes N-dimensional data into (N-K) dimensions by combining specified
      dimensions. It is useful for transforming spatiotemporal data
      (Trials, Channels, Time) -> (Trials*Time, Channels) for trajectory analysis.

      :param dims: Dimensions to stack. The order determines the nesting (slowest to fastest).
         e.g., ('obs', 'time') means 'obs' changes slowly, 'time' cycles fast.
      :type dims: sequence of str
      :param new_dim: Name of the resulting stacked dimension.
      :type new_dim: str, default='obs'

      :returns: New container with stacked dimension. Metadata (coords/ids) are
                expanded/tiled to match the new shape.
      :rtype: DataContainer

      .. rubric:: Examples

      >>> # Stack time into observations:
      >>> # (10 obs, 64 ch, 500 time) -> (5000 obs, 64 ch)
      >>> stacked = container.stack(dims=('obs', 'time'), new_dim='obs')
      >>> stacked.shape
      (5000, 64)


   .. py:method:: unstack(dim: str) -> DataContainer

      Unstack a dimension into multiple dimensions.

      Inverse operation of `stack`. Reshapes the data tensor by splitting
      one dimension into multiple using metadata stored during the `stack` operation.

      :param dim: Dimension to unstack (e.g. 'obs').
      :type dim: str

      :returns: New container with unstacked dimensions.
      :rtype: DataContainer

      :raises ValueError: If the container was not previously stacked (missing metadata).

      .. rubric:: Examples

      >>> # Stack 'trials' and 'time' -> 'obs'
      >>> stacked = container.stack(('trials', 'time'), new_dim='obs')
      >>> # Unstack 'obs' -> ('trials', 'time') (automatically inferred)
      >>> unstacked = stacked.unstack('obs')


   .. py:method:: center(dim: str = 'time', inplace: bool = False) -> DataContainer

      Remove mean along a specified dimension (Centering/Baseline Correction).

      This operation computes the mean along `dim` (ignoring NaNs) and subtracts it.
      Commonly used in EEG for baseline correction (subtracting mean of pre-stimulus
      interval) or centering features before covariance calculation.

      :param dim: Dimension name to center over (e.g., 'time', 'channel', 'obs').
      :type dim: str, default='time'
      :param inplace: If True, modifies X in-place to save memory. Returns self.
      :type inplace: bool, default=False

      :returns: Container with centered data.
      :rtype: DataContainer

      .. rubric:: Examples

      >>> # Baseline correction over time
      >>> container.center(dim='time')


   .. py:method:: zscore(dim: str = 'time', eps: float = 1e-08, inplace: bool = False) -> DataContainer

      Standardize (Z-score) along a specified dimension.

      Computes `(X - mean) / std` along the given dimension. Robust to NaNs.
      Useful for normalizing features or standardizing temporal dynamics.

      :param dim: Dimension to standardize.
      :type dim: str, default='time'
      :param eps: Stability epsilon to avoid division by zero.
      :type eps: float, default=1e-08
      :param inplace: If True, modifies X in-place (same behavior as in `center`).
      :type inplace: bool, default=False

      :rtype: DataContainer

      .. rubric:: Examples

      >>> # Standardize each channel's timecourse
      >>> container.zscore(dim='time')


   .. py:method:: rms_scale(dim: str = 'time', eps: float = 1e-08, inplace: bool = False) -> DataContainer

      Scale by Root Mean Square (RMS) amplitude along a dimension.

      Divides data by `sqrt(mean(X**2))` along the dimension.
      Preserves relative shape but normalizes energy.

      :param dim: Dimension to scale.
      :type dim: str, default='time'
      :param eps: Stability epsilon.
      :type eps: float, default=1e-08
      :param inplace: If True, modifies X in-place (same behavior as in `center`).
      :type inplace: bool, default=False

      :rtype: DataContainer


   .. py:method:: baseline_correction(dim: str = 'time', inplace: bool = False) -> DataContainer

      Alias for center(). Common in EEG.


   .. py:method:: aggregate(by: Union[str, numpy.ndarray, List[Any]], stats: Union[str, Sequence[str]] = 'mean', min_count: int = 1, on_insufficient: str = 'raise') -> DataContainer

      Aggregate observations into grouped summaries along the ``obs`` axis.

      :param by: Group definition for the observation axis.

         - If str: resolve the key from ``self.coords`` or from ``self.y`` when
           ``by == "y"``.
         - If array-like: explicit group labels aligned with ``obs``.
      :type by: str or array-like
      :param stats: Aggregation statistic or ordered list of statistics. Supported tokens are
         ``"mean"``, ``"median"``, ``"std"``, ``"var"``, ``"sem"``, ``"mad"``,
         ``"iqr"``, ``"min"``, ``"max"``, ``"count"``, and ``"first"``.
         Legacy ``"obs-*"`` aliases are accepted and normalized.
      :type stats: str or sequence of str, default="mean"
      :param min_count: Minimum number of valid observations required per group. A valid
         observation is one with at least one finite value across the non-observation axes.
      :type min_count: int, default=1
      :param on_insufficient: Policy applied when a group has fewer than ``min_count`` valid observations.
      :type on_insufficient: {"raise", "warn", "collect"}, default="raise"

      :returns: Aggregated container with grouped observations on the ``obs`` axis.
                When multiple stats are requested, a ``stat`` dimension is inserted
                immediately after ``obs``.
      :rtype: DataContainer

      :raises ValueError: If the container has no ``obs`` dimension, grouping is invalid,
          requested stats are unsupported, or ``min_count`` / ``on_insufficient`` are invalid.


   .. py:method:: aggregate_groups(by: Union[str, numpy.ndarray, List[Any]], groups: Sequence[Dict[str, Any]], min_count: int = 1, on_insufficient: str = 'raise', skip_empty: bool = True) -> DataContainer

      Aggregate selected feature groups with different statistics.

      This is a thin wrapper around :meth:`aggregate` for tabular feature
      containers. Each group spec selects a subset of feature columns and
      applies one or more stats to that subset. The outputs are concatenated
      along the ``feature`` dimension, and each resulting feature name is
      prefixed with its stat (for example ``"mean_band_log_abs_alpha"``).

      :param by: Group definition for the observation axis. Passed through to
         :meth:`aggregate`.
      :type by: str or array-like
      :param groups: Ordered group specifications. Each group must provide ``"stats"`` and
         may optionally provide include/exclude selectors:

         - ``names`` / ``exclude_names``
         - ``prefixes`` / ``exclude_prefixes``
         - ``suffixes`` / ``exclude_suffixes``
         - ``contains`` / ``exclude_contains``
         - ``regex`` / ``exclude_regex``

         If a group provides no include selectors, it starts from all
         features and then applies exclusions.
      :type groups: sequence of dict
      :param min_count: Minimum number of valid observations required per group. Passed
         through to :meth:`aggregate`.
      :type min_count: int, default=1
      :param on_insufficient: Policy applied when a group has fewer than ``min_count`` valid observations. Passed through to :meth:`aggregate`.
      :type on_insufficient: {"raise", "warn", "collect"}, default="raise"
      :param skip_empty: If True, silently skip group specs that match no features. If
         False, raise a ``ValueError`` when a group matches nothing.
      :type skip_empty: bool, default=True

      :returns: Aggregated container with dims ``("obs", "feature")`` and
                stat-prefixed feature names.
      :rtype: DataContainer

      :raises ValueError: If the container lacks a ``feature`` dimension or coord, no groups
          are provided, a group spec is invalid, multiple groups would emit the same
          output feature name, or no non-empty grouped outputs are produced.


.. py:class:: SklearnWrapper(transformer: sklearn.base.BaseEstimator)

   Bases: :py:obj:`sklearn.base.BaseEstimator`, :py:obj:`sklearn.base.TransformerMixin`

   Generic wrapper for ANY scikit-learn transformer (Scaler, PCA, etc.).

   This wrapper applies a standard scikit-learn transformer to the `.X` data matrix
   of a `DataContainer`, ensuring that the resulting container has correctly
   updated data while checking for dimension compatibility.

   :param transformer: An instantiated scikit-learn transformer (e.g., `StandardScaler()`,
      `PCA(n_components=10)`).
   :type transformer: BaseEstimator

   .. attribute:: estimator_

      The fitted scikit-learn estimator.

      :type: BaseEstimator

   .. rubric:: Examples

   >>> from sklearn.preprocessing import RobustScaler
   >>> from coco_pipe.io import DataContainer, SklearnWrapper
   >>> import numpy as np

   >>> # Create formatted data (100 obs, 10 features)
   >>> X = np.random.randn(100, 10)
   >>> container = DataContainer(X, dims=('obs', 'feature'))

   >>> # Wrap a Scaler
   >>> scaler = SklearnWrapper(RobustScaler())
   >>> scaled_container = scaler.fit_transform(container)

   >>> # Metadata is preserved
   >>> scaled_container.dims == container.dims
   True


   .. py:attribute:: transformer


   .. py:attribute:: estimator_
      :value: None


   .. py:method:: fit(container: coco_pipe.io.structures.DataContainer, y=None)


   .. py:method:: transform(container: coco_pipe.io.structures.DataContainer) -> coco_pipe.io.structures.DataContainer


   .. py:method:: fit_transform(container: coco_pipe.io.structures.DataContainer, y=None)

      Fit to data, then transform it.

      Fits transformer to `X` and `y` with optional parameters `fit_params`
      and returns a transformed version of `X`.

      :param X: Input samples.
      :type X: array-like of shape (n_samples, n_features)
      :param y: Target values (None for unsupervised transformations).
      :type y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
      :param \*\*fit_params: Additional fit parameters.
      :type \*\*fit_params: dict

      :returns: **X_new** -- Transformed array.
      :rtype: ndarray array of shape (n_samples, n_features_new)


   .. py:method:: inverse_transform(container: coco_pipe.io.structures.DataContainer) -> coco_pipe.io.structures.DataContainer


.. py:class:: SpatialWhitener(method: str = 'pca', n_components: Optional[Union[int, float]] = None)

   Bases: :py:obj:`sklearn.base.BaseEstimator`, :py:obj:`sklearn.base.TransformerMixin`

   M/EEG Spatial Whitening using Covariance Decorrelation.

   This transformer removes spatial correlations between channels, effectively
   transforming the noise covariance matrix towards the identity matrix.
   It supports standard PCA, ZCA (Zero-phase Component Analysis which preserves
   topography), and robust shrinkage covariance estimation (OAS).

   It requires a dimension named 'channel' in the input `DataContainer`.
   The operation is performed spatially: :math:`X_{white} = X \cdot W^T`

   :param method: Type of transformation:

      - 'pca': Principal Component Analysis. Rotates data to principal axes
        and scales to unit variance.
      - 'zca': Zero-phase Component Analysis. Rotates, scales, and rotates back.
        Preserves spatial topography (sensors stay in place).
      - 'shrinkage': Uses Oracle Approximating Shrinkage (OAS) for robust
        covariance estimation in high dimensions.
   :type method: {'pca', 'zca', 'shrinkage'}, default='pca'
   :param n_components: Number of components to keep (only for 'pca'/'zca' methods).
      If None, all components are kept.
   :type n_components: int or float, optional

   .. attribute:: whitener_

      The estimated whitening matrix (W). Shape (n_components, n_channels).

      :type: np.ndarray

   .. attribute:: mean_

      Per-channel mean vector.

      :type: np.ndarray

   .. attribute:: inverse_whitener_

      The inverse matrix used to project back to sensor space.

      :type: np.ndarray

   .. rubric:: Examples

   >>> # Whitening EEG epochs (100 epochs, 64 channels, 500 times)
   >>> container = DataContainer(
   ...     np.random.randn(100, 64, 500), dims=('obs', 'channel', 'time')
   ... )

   >>> # Use Shrinkage for robust covariance
   >>> whitener = SpatialWhitener(method='shrinkage')
   >>> white_data = whitener.fit_transform(container)

   >>> # Project back to sensor space for plotting
   >>> sensor_data = whitener.inverse_transform(white_data)


   .. py:attribute:: method
      :value: 'pca'


   .. py:attribute:: n_components
      :value: None


   .. py:attribute:: whitener_
      :value: None


   .. py:attribute:: mean_
      :value: None


   .. py:attribute:: inverse_whitener_
      :value: None


   .. py:method:: fit(container: coco_pipe.io.structures.DataContainer, y=None)


   .. py:method:: _fit_pca(X_flat: numpy.ndarray)


   .. py:method:: _fit_shrinkage(X_flat: numpy.ndarray)


   .. py:method:: transform(container: coco_pipe.io.structures.DataContainer) -> coco_pipe.io.structures.DataContainer


   .. py:method:: fit_transform(container: coco_pipe.io.structures.DataContainer, y=None)

      Fit to data, then transform it.

      Fits transformer to `X` and `y` with optional parameters `fit_params`
      and returns a transformed version of `X`.

      :param X: Input samples.
      :type X: array-like of shape (n_samples, n_features)
      :param y: Target values (None for unsupervised transformations).
      :type y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
      :param \*\*fit_params: Additional fit parameters.
      :type \*\*fit_params: dict

      :returns: **X_new** -- Transformed array.
      :rtype: ndarray array of shape (n_samples, n_features_new)


   .. py:method:: inverse_transform(container: coco_pipe.io.structures.DataContainer) -> coco_pipe.io.structures.DataContainer


   .. py:method:: _apply_linear_op(container: coco_pipe.io.structures.DataContainer, W: numpy.ndarray, mean: Optional[numpy.ndarray]) -> numpy.ndarray
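
The spatial operation described for ``SpatialWhitener`` (:math:`X_{white} = X \cdot W^T`) can be illustrated with plain NumPy. The sketch below is not the library's implementation: it assumes a ZCA whitening matrix built from an eigendecomposition of the sample channel covariance, and the helper name ``zca_whiten`` and the toy data are hypothetical.

```python
import numpy as np


def zca_whiten(X):
    """Hypothetical helper: ZCA-whiten an (obs, channel, time) array.

    Returns the whitened array, the whitening matrix W, and the channel mean.
    """
    n_channels = X.shape[1]
    # Move 'channel' last and pool obs and time: (obs * time, channel)
    X_flat = np.moveaxis(X, 1, -1).reshape(-1, n_channels)
    mean = X_flat.mean(axis=0)
    Xc = X_flat - mean

    # Sample covariance between channels
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)

    # ZCA: rotate to principal axes, scale to unit variance, rotate back
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T

    # X_white = X . W^T, then restore the (obs, channel, time) layout
    X_white_flat = Xc @ W.T
    X_white = np.moveaxis(
        X_white_flat.reshape(X.shape[0], X.shape[2], n_channels), -1, 1
    )
    return X_white, W, mean


rng = np.random.default_rng(0)
# Toy data: 20 epochs, 4 correlated channels, 50 time points
sources = rng.standard_normal((20, 4, 50))
mixing = rng.standard_normal((4, 4))
X = np.einsum("ij,ojt->oit", mixing, sources)

X_white, W, mean = zca_whiten(X)

# Channel covariance of the whitened data should be close to identity
flat = np.moveaxis(X_white, 1, -1).reshape(-1, 4)
cov_white = flat.T @ flat / (flat.shape[0] - 1)
print(np.round(cov_white, 3))
```

Because the ZCA matrix in this sketch is symmetric, each whitened channel stays associated with its original sensor, which is the property the 'zca' option is described as preserving.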