coco_pipe.io.load

High-level data loading factory.

Author: Hamza Abdelhedi <hamza.abdelhedi@umontreal.ca>

Functions

load_data(path[, mode, target_col, index_col, ...])

Universal data loader factory.

Module Contents

coco_pipe.io.load.load_data(path: str | pathlib.Path, mode: str = 'auto', target_col: str | None = None, index_col: str | int | None = None, sep: str = '\t', header: int | List[int] | None = 0, sheet_name: str | int | None = 0, columns_to_dims: List[str] | None = None, col_sep: str = '_', meta_columns: List[str] | None = None, clean: bool = False, clean_kwargs: Dict[str, Any] | None = None, task: str | None = None, session: str | List[str] | None = None, datatype: str = 'eeg', suffix: str | None = None, loading_mode: str = 'epochs', window_length: float | None = None, stride: float | None = None, subject_metadata_df: Any | None = None, subject_key: str | None = None, pattern: str = '*.pkl', dims: Tuple[str, ...] = ('obs', 'feature'), coords: Dict[str, List | numpy.ndarray] | None = None, reader: Any | None = None, id_fn: Any | None = None, subjects: str | List[str] | int | List[int] | None = None, **kwargs) → coco_pipe.io.structures.DataContainer

Universal data loader factory. Dispatches to BIDSDataset, TabularDataset, or EmbeddingDataset based on mode.
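As a rough illustration of how mode="auto" could map a path to one of the three loaders, the sketch below groups file extensions the way the mode parameter describes them. It is not the library's implementation: the helper name infer_mode, the exact extension sets, and the rule for recognising a BIDS directory (a dataset_description.json file or sub-* folders) are all assumptions made for this example.

```python
from pathlib import Path

# Extension groups follow the formats listed for each mode in this page.
# The concrete extension strings are assumptions for illustration.
TABULAR_EXTS = {".csv", ".tsv", ".xlsx", ".xls", ".txt"}
EMBEDDING_EXTS = {".npy", ".pkl", ".h5", ".json"}


def infer_mode(path):
    """Hypothetical helper: guess a loader mode from a path."""
    p = Path(path)
    if p.is_dir():
        # Assumption: a BIDS root contains dataset_description.json
        # or at least one sub-* entry.
        if (p / "dataset_description.json").exists() or any(p.glob("sub-*")):
            return "bids"
        # Assumption: any other directory holds embedding files.
        return "embedding"
    ext = p.suffix.lower()
    if ext in TABULAR_EXTS:
        return "tabular"
    if ext in EMBEDDING_EXTS:
        return "embedding"
    raise ValueError(f"Cannot infer mode for {p}; pass mode explicitly.")
```

When the inferred type could be ambiguous (for example a JSON file that is really tabular), passing mode explicitly avoids relying on any such heuristic.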

Parameters:
  • path (str or Path) – Path to data source (file or directory).

  • mode ({"auto", "tabular", "bids", "embedding"}, default="auto") – Type of data to load:
      - "auto": infers the type from the file extension or directory structure.
      - "tabular": uses TabularDataset (CSV, TSV, Excel, TXT).
      - "bids": uses BIDSDataset (BIDS-compliant directories).
      - "embedding": uses EmbeddingDataset (NPY, PKL, H5, JSON).

  Tabular arguments (mode="tabular"):

  • target_col (str, optional) – Name of the column to extract as target y. Removed from features X.

  • index_col (str or int, optional) – Column to use as index (observation IDs).

  • sep (str, default='\t') – Separator for text files (e.g. ',' for CSV).

  • header (int or list of int, default=0) – Row number(s) to use as column names.

  • sheet_name (str or int, default=0) – Sheet name or index for Excel files.

  • columns_to_dims (list of str, optional) – If provided, attempts to reshape 2D feature columns into N-D dimensions. Columns must follow: dim1_dim2_…_feature.

  • col_sep (str, default='_') – Separator used in column names for reshaping.

  • meta_columns (list of str, optional) – Columns to extract as metadata coordinates instead of features.

  • clean (bool, default=False) – Whether to perform automated cleaning (drop NaNs/Infs).

  • clean_kwargs (dict, optional) – Arguments passed to TabularDataset.clean.

  BIDS arguments (mode="bids"):

  • task (str, optional) – BIDS task name (e.g., ‘rest’, ‘audiovisual’).

  • session (str or List[str], optional) – Session ID(s) to load. Defaults to all available.

  • datatype (str, default='eeg') – Data type folder (e.g., ‘eeg’, ‘meg’, ‘ieeg’).

  • suffix (str, optional) – File suffix to load (e.g., ‘eeg’, ‘epo’, ‘ave’).

  • loading_mode (str, default='epochs') – How to process the data; passed as mode to BIDSDataset:
      - 'epochs': slices continuous data into fixed-length windows.
      - 'continuous': loads the data as single continuous segments.
      - 'load_existing': loads pre-computed epochs.

  • window_length (float, optional) – Window length in seconds (for ‘epochs’ mode).

  • stride (float, optional) – Stride in seconds (for ‘epochs’ mode).

  • subject_metadata_df (DataFrame, optional) – External subject-level metadata to merge by subject during BIDS loading.

  • subject_key (str, optional) – Column in subject_metadata_df containing the BIDS subject identifier.

  • subjects (int or list, optional) – Specific subject IDs to load (without the 'sub-' prefix).

  Embedding arguments (mode="embedding"):

  • pattern (str, default='*.pkl') – Glob pattern to match files.

  • dims (tuple of str, default=('obs', 'feature')) – Dimension labels for the data arrays.

  • coords (dict, optional) – Dictionary of coordinates for dimensions.

  • reader (callable, optional) – Custom file reader function.

  • id_fn (callable, optional) – Custom subject ID extraction function.

  • subjects (int or list, optional) – If int, loads the first N subjects. If list, filters by ID.
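The dim1_dim2_…_feature naming convention used by columns_to_dims and col_sep can be illustrated with a small standard-library sketch. It only parses column names into per-dimension labels and N-D positions; the helper name split_columns and the choice to treat everything after the last dimension label as the feature name are assumptions for this example, not the behaviour of TabularDataset.

```python
def split_columns(columns, columns_to_dims, col_sep="_"):
    """Hypothetical helper: parse 'dim1_dim2_..._feature' column names.

    Returns (coords, index) where coords maps each dimension name to its
    ordered labels and index maps each column to its N-D position.
    """
    n_dims = len(columns_to_dims)
    names = [*columns_to_dims, "feature"]
    coords = {name: [] for name in names}
    index = {}
    for col in columns:
        # Split at most n_dims times so the remainder (which may itself
        # contain col_sep) is kept whole as the feature name. Assumption.
        parts = col.split(col_sep, n_dims)
        if len(parts) != n_dims + 1:
            raise ValueError(f"Column {col!r} does not match the pattern")
        position = []
        for name, label in zip(names, parts):
            if label not in coords[name]:
                coords[name].append(label)
            position.append(coords[name].index(label))
        index[col] = tuple(position)
    return coords, index


columns = ["Fz_alpha_power", "Fz_beta_power", "Cz_alpha_power", "Cz_beta_power"]
coords, index = split_columns(columns, ["channel", "band"])
# coords["channel"] is ["Fz", "Cz"]; index["Cz_beta_power"] is (1, 1, 0)
```

With these positions, a flat row of four values could be placed into a 2 x 2 x 1 array whose axes are labelled channel, band, and feature.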

Returns:

Standardized data container with attributes:

  • X: (N_obs, …) data array

  • y: Targets (if available)

  • ids: Observation identifiers

  • coords: Coordinate metadata

Return type:

DataContainer
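A usage sketch for the three modes follows. The parameter names and the DataContainer attributes come from the signature and return description on this page; every path, column name, and value is a placeholder chosen for illustration, and the wrapper load_example is hypothetical.

```python
# Keyword arguments per mode. Parameter names are taken from the
# load_data signature; all paths and values are placeholders.
TABULAR_KWARGS = {
    "path": "data/features.tsv",
    "mode": "tabular",
    "target_col": "label",
    "index_col": "subject_id",
    "sep": "\t",
    "clean": True,
}

BIDS_KWARGS = {
    "path": "data/bids_root",
    "mode": "bids",
    "task": "rest",
    "datatype": "eeg",
    "loading_mode": "epochs",
    "window_length": 2.0,  # seconds
    "stride": 1.0,         # seconds
    "subjects": ["01", "02"],  # IDs without the 'sub-' prefix
}

EMBEDDING_KWARGS = {
    "path": "data/embeddings",
    "mode": "embedding",
    "pattern": "*.pkl",
    "dims": ("obs", "feature"),
    "subjects": 5,  # first five subjects
}


def load_example(kwargs):
    """Hypothetical wrapper: call load_data and unpack the container."""
    # Imported lazily so the dictionaries above can be inspected
    # without coco_pipe installed.
    from coco_pipe.io.load import load_data

    container = load_data(**kwargs)
    return container.X, container.y, container.ids, container.coords
```

Calling load_example(TABULAR_KWARGS) would require coco_pipe to be installed and the placeholder file to exist.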