coco_pipe.io.load
High-level data loading factory.
Author: Hamza Abdelhedi <hamza.abdelhedi@umontreal.ca>
Functions
load_data | Universal data loader factory.
Module Contents
- coco_pipe.io.load.load_data(path: str | pathlib.Path, mode: str = 'auto', target_col: str | None = None, index_col: str | int | None = None, sep: str = '\t', header: int | List[int] | None = 0, sheet_name: str | int | None = 0, columns_to_dims: List[str] | None = None, col_sep: str = '_', meta_columns: List[str] | None = None, clean: bool = False, clean_kwargs: Dict[str, Any] | None = None, task: str | None = None, session: str | List[str] | None = None, datatype: str = 'eeg', suffix: str | None = None, loading_mode: str = 'epochs', window_length: float | None = None, stride: float | None = None, subject_metadata_df: Any | None = None, subject_key: str | None = None, pattern: str = '*.pkl', dims: Tuple[str, Ellipsis] = ('obs', 'feature'), coords: Dict[str, List | numpy.ndarray] | None = None, reader: Any | None = None, id_fn: Any | None = None, subjects: str | List[str] | int | List[int] | None = None, **kwargs) -> coco_pipe.io.structures.DataContainer
Universal data loader factory. Dispatches to BIDSDataset, TabularDataset, or EmbeddingDataset based on mode.
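The inference performed by mode="auto" is not spelled out on this page. As a rough illustration only, the sketch below maps a path to a mode using the file types listed for each mode in the parameter description. The helper infer_mode, the exact extension sets, and the fallback of treating a non-BIDS directory as an embedding folder are all assumptions, not the library's implementation. The check for dataset_description.json relies on the BIDS convention that this file sits at the dataset root.

```python
from pathlib import Path

# Hypothetical extension sets, based on the formats listed for each mode.
TABULAR_EXTS = {".csv", ".tsv", ".txt", ".xls", ".xlsx"}
EMBEDDING_EXTS = {".npy", ".pkl", ".h5", ".json"}


def infer_mode(path):
    """Guess a loading mode from a path (illustrative, not coco_pipe's code)."""
    path = Path(path)
    if path.is_dir():
        # A BIDS dataset root contains dataset_description.json.
        if (path / "dataset_description.json").exists():
            return "bids"
        # Assumption: any other directory holds embedding files.
        return "embedding"
    ext = path.suffix.lower()
    if ext in TABULAR_EXTS:
        return "tabular"
    if ext in EMBEDDING_EXTS:
        return "embedding"
    raise ValueError(f"Cannot infer mode from {path!s}; pass mode explicitly.")
```

When the inferred mode could be ambiguous (for example a .json file), passing mode explicitly avoids relying on any such heuristic.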
- Parameters:
path (str or Path) – Path to data source (file or directory).
mode ({"auto", "tabular", "bids", "embedding"}, default="auto") – Type of data to load.
- “auto”: infers the type from the file extension or directory structure.
- “tabular”: uses TabularDataset (CSV, TSV, Excel, TXT).
- “bids”: uses BIDSDataset (BIDS-compliant directories).
- “embedding”: uses EmbeddingDataset (NPY, PKL, H5, JSON).
Tabular Arguments (mode="tabular")
----------------------------------
target_col (str, optional) – Name of the column to extract as target y. Removed from features X.
index_col (str or int, optional) – Column to use as index (observation IDs).
sep (str, default='\t') – Separator for text files (e.g. ‘,’ for CSV).
header (int or list of int, default=0) – Row number(s) to use as column names.
sheet_name (str or int, default=0) – Sheet name or index for Excel files.
columns_to_dims (list of str, optional) – If provided, attempts to reshape the 2D feature columns into an N-D array with these dimensions. Column names must follow the pattern dim1_dim2_…_feature.
col_sep (str, default='_') – Separator used in column names for reshaping.
meta_columns (list of str, optional) – Columns to extract as metadata coordinates instead of features.
clean (bool, default=False) – Whether to perform automated cleaning (drop NaNs/Infs).
clean_kwargs (dict, optional) – Arguments passed to TabularDataset.clean.
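To make the column naming convention used by columns_to_dims and col_sep concrete, here is a minimal sketch of how such names could be split into one label per dimension plus a feature name. The helper parse_columns and the example column names are hypothetical, and the choice to let the trailing feature name contain the separator is an assumption. How TabularDataset actually parses and reshapes columns is not shown on this page.

```python
def parse_columns(columns, columns_to_dims, col_sep="_"):
    """Split names like 'Fz_alpha_power' into dimension labels and a feature.

    Illustrative only: returns the ordered coordinate labels per dimension
    and a mapping from each column name to its (labels..., feature) tuple.
    """
    n_dims = len(columns_to_dims)
    coords = {dim: [] for dim in columns_to_dims}
    coords["feature"] = []
    parsed = {}
    for name in columns:
        # Split at most n_dims times so the feature part keeps any extra separators.
        parts = name.split(col_sep, n_dims)
        if len(parts) != n_dims + 1:
            raise ValueError(f"Column {name!r} does not match the expected pattern")
        *labels, feature = parts
        for dim, label in zip(columns_to_dims, labels):
            if label not in coords[dim]:
                coords[dim].append(label)
        if feature not in coords["feature"]:
            coords["feature"].append(feature)
        parsed[name] = (*labels, feature)
    return coords, parsed


# Hypothetical column names: channel_band_feature
example_columns = ["Fz_alpha_power", "Fz_beta_power", "Cz_alpha_power", "Cz_beta_power"]
example_coords, example_parsed = parse_columns(example_columns, ["channel", "band"])
# Target shape per observation: one axis per dimension, then the feature axis.
example_shape = tuple(len(example_coords[d]) for d in ["channel", "band", "feature"])
```

In this example the four flat columns would map onto a (2, 2, 1) block per observation, with channel, band, and feature as the axes.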
BIDS Arguments (mode="bids")
----------------------------
task (str, optional) – BIDS task name (e.g., ‘rest’, ‘audiovisual’).
session (str or list of str, optional) – Session ID(s) to load. Defaults to all available.
datatype (str, default='eeg') – Data type folder (e.g., ‘eeg’, ‘meg’, ‘ieeg’).
suffix (str, optional) – File suffix to load (e.g., ‘eeg’, ‘epo’, ‘ave’).
loading_mode (str, default='epochs') – How to process the data. Passed as mode to BIDSDataset.
- ‘epochs’: slices continuous data into fixed-length windows.
- ‘continuous’: loads each recording as a single continuous segment.
- ‘load_existing’: loads pre-computed epochs.
window_length (float, optional) – Window length in seconds (for ‘epochs’ mode).
stride (float, optional) – Stride in seconds (for ‘epochs’ mode).
subject_metadata_df (DataFrame, optional) – External subject-level metadata to merge by subject during BIDS loading.
subject_key (str, optional) – Column in subject_metadata_df containing the BIDS subject identifier.
subjects (str, int, or list of str or int, optional) – Specific subject IDs to load (without the ‘sub-’ prefix).
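As an illustration of how window_length and stride interact in ‘epochs’ mode, the sketch below lists the start times of fixed-length windows over a recording of known duration. The helper window_starts is hypothetical. Defaulting stride to window_length (non-overlapping windows) and dropping a trailing partial window are assumptions, not documented behavior of BIDSDataset.

```python
def window_starts(duration, window_length, stride=None):
    """Return start times (seconds) of full windows that fit in the recording."""
    if stride is None:
        # Assumption: no stride means back-to-back, non-overlapping windows.
        stride = window_length
    if window_length <= 0 or stride <= 0:
        raise ValueError("window_length and stride must be positive")
    starts = []
    k = 0
    # Multiply rather than accumulate to avoid floating-point drift.
    while k * stride + window_length <= duration + 1e-9:
        starts.append(k * stride)
        k += 1
    return starts


# A 10 s recording with 4 s windows every 2 s gives 50% overlap.
overlapping = window_starts(10.0, 4.0, 2.0)
non_overlapping = window_starts(10.0, 4.0)
```

A stride smaller than window_length produces overlapping windows, so the number of epochs grows as the stride shrinks.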
Embedding Arguments (mode="embedding")
--------------------------------------
pattern (str, default='*.pkl') – Glob pattern to match files.
dims (tuple of str, default=('obs', 'feature')) – Dimension labels for the data arrays.
coords (dict, optional) – Dictionary of coordinates for dimensions.
reader (callable, optional) – Custom file reader function.
id_fn (callable, optional) – Custom subject ID extraction function.
subjects (int or list, optional) – If an int, loads the first N subjects. If a list, keeps only the subjects whose IDs are listed.
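The two behaviors described for subjects (an int selects the first N, a list filters by ID) can be sketched as follows. The helper select_subjects is hypothetical, and the handling of a single string and of an optional ‘sub-’ prefix are assumptions made for the example.

```python
def _strip_prefix(subject_id):
    """Drop a leading 'sub-' so '01' and 'sub-01' compare equal."""
    text = str(subject_id)
    return text[4:] if text.startswith("sub-") else text


def select_subjects(available, subjects=None):
    """Filter a list of subject IDs (illustrative, not coco_pipe's code)."""
    available = list(available)
    if subjects is None:
        return available
    if isinstance(subjects, int):
        # An int is treated as a count: keep the first N subjects.
        return available[:subjects]
    if isinstance(subjects, str):
        subjects = [subjects]
    wanted = {_strip_prefix(s) for s in subjects}
    # Keep the order of the available subjects, not the order of the request.
    return [s for s in available if _strip_prefix(s) in wanted]
```

Note that in this sketch list entries are compared as strings, so an integer 1 would not match the ID ‘01’.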
- Returns:
Standardized data container with attributes:
- X: (N_obs, …) data array
- y: Targets (if available)
- ids: Observation identifiers
- coords: Coordinate metadata
- Return type:
coco_pipe.io.structures.DataContainer