coco_pipe.io.load
=================

.. py:module:: coco_pipe.io.load

.. autoapi-nested-parse::

   coco_pipe/io/load.py
   --------------------
   High-level data loading factory.

   Author: Hamza Abdelhedi <hamza.abdelhedi@umontreal.ca>


Functions
---------

.. autoapisummary::

   coco_pipe.io.load.load_data


Module Contents
---------------

.. py:function:: load_data(path: Union[str, pathlib.Path], mode: str = 'auto', target_col: Optional[str] = None, index_col: Optional[Union[str, int]] = None, sep: str = '\t', header: Optional[Union[int, List[int]]] = 0, sheet_name: Optional[Union[str, int]] = 0, columns_to_dims: Optional[List[str]] = None, col_sep: str = '_', meta_columns: Optional[List[str]] = None, clean: bool = False, clean_kwargs: Optional[Dict[str, Any]] = None, task: Optional[str] = None, session: Optional[Union[str, List[str]]] = None, datatype: str = 'eeg', suffix: Optional[str] = None, loading_mode: str = 'epochs', window_length: Optional[float] = None, stride: Optional[float] = None, subject_metadata_df: Optional[Any] = None, subject_key: Optional[str] = None, pattern: str = '*.pkl', dims: Tuple[str, Ellipsis] = ('obs', 'feature'), coords: Optional[Dict[str, Union[List, numpy.ndarray]]] = None, reader: Optional[Any] = None, id_fn: Optional[Any] = None, subjects: Optional[Union[str, List[str], int, List[int]]] = None, **kwargs) -> coco_pipe.io.structures.DataContainer

   Universal data loader factory.

   Dispatches to `BIDSDataset`, `TabularDataset`, or `EmbeddingDataset` based on
   `mode`.

   :param path: Path to data source (file or directory).
   :type path: str or Path
   :param mode: Type of data to load.

                - "auto": Infers type from file extension or directory structure.
                - "tabular": uses `TabularDataset` (CSV, TSV, Excel, TXT).
                - "bids": uses `BIDSDataset` (BIDS-compliant directories).
                - "embedding": uses `EmbeddingDataset` (NPY, PKL, H5, JSON).
   :type mode: {"auto", "tabular", "bids", "embedding"}, default="auto"

   .. rubric:: Tabular Arguments (mode="tabular")

   :param target_col: Name of the column to extract as the target `y`. It is removed from the features `X`.
   :type target_col: str, optional
   :param index_col: Column to use as index (observation IDs).
   :type index_col: str or int, optional
   :param sep: Separator for text files (e.g. ',' for CSV).
   :type sep: str, default='\t'
   :param header: Row number(s) to use as column names.
   :type header: int or list of int, default=0
   :param sheet_name: Sheet name or index for Excel files.
   :type sheet_name: str or int, default=0
   :param columns_to_dims: If provided, attempts to reshape the 2D feature columns into an N-D array.
                           Column names must follow the pattern `dim1_dim2_..._feature`, with parts joined by `col_sep`.
   :type columns_to_dims: list of str, optional
   :param col_sep: Separator used in column names for reshaping.
   :type col_sep: str, default='_'
   :param meta_columns: Columns to extract as metadata coordinates instead of features.
   :type meta_columns: list of str, optional
   :param clean: Whether to perform automated cleaning (drop NaNs/Infs).
   :type clean: bool, default=False
   :param clean_kwargs: Arguments passed to `TabularDataset.clean`.
   :type clean_kwargs: dict, optional
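
   As an illustration of the naming convention behind `columns_to_dims` and
   `col_sep`, the sketch below splits column names into per-dimension labels and
   nests one row of values accordingly. It is not the `TabularDataset`
   implementation: the helpers `infer_coords` and `reshape_row` are hypothetical,
   and the sketch assumes that every part of a column name labels exactly one
   dimension.

   .. code-block:: python

      def infer_coords(columns, dim_names, col_sep="_"):
          """Collect the ordered unique labels found at each name position."""
          coords = {name: [] for name in dim_names}
          for col in columns:
              parts = col.split(col_sep)
              if len(parts) != len(dim_names):
                  raise ValueError(f"{col!r} does not have {len(dim_names)} parts")
              for name, label in zip(dim_names, parts):
                  if label not in coords[name]:
                      coords[name].append(label)
          return coords


      def reshape_row(row, dim_names, col_sep="_"):
          """Turn one flat {column: value} row into nested lists, one level per dim."""
          coords = infer_coords(list(row), dim_names, col_sep)

          def build(prefix, remaining):
              if not remaining:
                  # All dimensions fixed: look up the matching flat column.
                  return row[col_sep.join(prefix)]
              return [build(prefix + [label], remaining[1:])
                      for label in coords[remaining[0]]]

          return coords, build([], list(dim_names))


      # Hypothetical row with channel x band features.
      row = {"Fz_alpha": 1.0, "Fz_beta": 2.0, "Cz_alpha": 3.0, "Cz_beta": 4.0}
      coords, nested = reshape_row(row, ["channel", "band"])
      print(coords)  # {'channel': ['Fz', 'Cz'], 'band': ['alpha', 'beta']}
      print(nested)  # [[1.0, 2.0], [3.0, 4.0]]

   A missing combination (for example, no `Cz_beta` column) raises `KeyError` in
   this sketch. How the library handles incomplete grids is not covered here.
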

   .. rubric:: BIDS Arguments (mode="bids")

   :param task: BIDS task name (e.g., 'rest', 'audiovisual').
   :type task: str, optional
   :param session: Session ID(s) to load. Defaults to all available.
   :type session: str or List[str], optional
   :param datatype: Data type folder (e.g., 'eeg', 'meg', 'ieeg').
   :type datatype: str, default='eeg'
   :param suffix: File suffix to load (e.g., 'eeg', 'epo', 'ave').
   :type suffix: str, optional
   :param loading_mode: How to process the data. Passed as `mode` to `BIDSDataset`.

                        - 'epochs': Splits continuous data into fixed-length windows.
                        - 'continuous': Loads the data as single continuous segments.
                        - 'load_existing': Loads pre-computed epochs.
   :type loading_mode: str, default='epochs'
   :param window_length: Window length in seconds (for 'epochs' mode).
   :type window_length: float, optional
   :param stride: Stride in seconds (for 'epochs' mode).
   :type stride: float, optional
   :param subject_metadata_df: External subject-level metadata to merge by subject during BIDS loading.
   :type subject_metadata_df: DataFrame, optional
   :param subject_key: Column in `subject_metadata_df` containing the BIDS subject identifier.
   :type subject_key: str, optional
   :param subjects: Subject IDs to load, given without the 'sub-' prefix.
   :type subjects: str or List[str], optional
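
   To make the interplay of `window_length` and `stride` in 'epochs' mode
   concrete, the sketch below computes the sample ranges of fixed-length windows
   over a continuous recording. It is only an illustration under stated
   assumptions: the helper `window_bounds` and the `sfreq` argument are
   hypothetical, the stride is assumed to default to the window length
   (non-overlapping windows), and a trailing partial window is assumed to be
   dropped. `BIDSDataset` may behave differently.

   .. code-block:: python

      def window_bounds(n_samples, sfreq, window_length, stride=None):
          """Return (start, stop) sample indices of fixed-length windows."""
          win = int(round(window_length * sfreq))
          # Assumption: without an explicit stride, windows do not overlap.
          step = win if stride is None else int(round(stride * sfreq))
          if win <= 0 or step <= 0:
              raise ValueError("window_length and stride must be positive")
          # Windows that would run past the end of the recording are dropped.
          return [(start, start + win)
                  for start in range(0, n_samples - win + 1, step)]


      # 10 s of data at 100 Hz, 4 s windows, 2 s stride (50 % overlap).
      bounds = window_bounds(n_samples=1000, sfreq=100.0,
                             window_length=4.0, stride=2.0)
      print(bounds)  # [(0, 400), (200, 600), (400, 800), (600, 1000)]
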

   .. rubric:: Embedding Arguments (mode="embedding")

   :param pattern: Glob pattern to match files.
   :type pattern: str, default='*.pkl'
   :param dims: Dimension labels for the data arrays.
   :type dims: tuple of str, default=('obs', 'feature')
   :param coords: Dictionary of coordinates for dimensions.
   :type coords: dict, optional
   :param reader: Custom file reader function.
   :type reader: callable, optional
   :param id_fn: Custom subject ID extraction function.
   :type id_fn: callable, optional
   :param subjects: If an int, loads the first N subjects. If a list, loads only the subjects with matching IDs.
   :type subjects: int or list, optional
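
   The way `pattern`, `id_fn`, and `subjects` could combine can be sketched with
   the standard library alone. This is not the `EmbeddingDataset` code: the
   helpers `default_id_fn` and `select_files`, the `sub-XX_emb.pkl` file naming,
   and the one-file-per-subject layout are all assumptions made for the example.

   .. code-block:: python

      import tempfile
      from pathlib import Path


      def default_id_fn(path):
          """Hypothetical ID extractor: 'sub-01_emb.pkl' -> '01'."""
          return path.stem.split("_")[0].replace("sub-", "")


      def select_files(root, pattern="*.pkl", id_fn=default_id_fn, subjects=None):
          """Glob `root` with `pattern`, then filter by subject count or IDs."""
          files = sorted(Path(root).glob(pattern))
          if subjects is None:
              return files
          if isinstance(subjects, int):
              # Assumes one file per subject, so the first N files are N subjects.
              return files[:subjects]
          if isinstance(subjects, str):
              subjects = [subjects]
          wanted = {str(s) for s in subjects}
          return [f for f in files if id_fn(f) in wanted]


      with tempfile.TemporaryDirectory() as tmp:
          # Empty placeholder files are enough to show the selection logic.
          for name in ("sub-01_emb.pkl", "sub-02_emb.pkl",
                       "sub-03_emb.pkl", "notes.txt"):
              (Path(tmp) / name).touch()
          first_two = [f.name for f in select_files(tmp, subjects=2)]
          by_id = [f.name for f in select_files(tmp, subjects=["03"])]

      print(first_two)  # ['sub-01_emb.pkl', 'sub-02_emb.pkl']
      print(by_id)      # ['sub-03_emb.pkl']
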

   :returns: Standardized data container with attributes:

             - X: (N_obs, ...) data array
             - y: Targets (if available)
             - ids: Observation identifiers
             - coords: Coordinate metadata
   :rtype: DataContainer
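
   A minimal usage sketch for tabular mode, based only on the signature
   documented above. The file name, column names, and values are made up for the
   example, and the call is skipped when `coco_pipe` is not installed. The
   printed attributes are those listed under the return value; their exact types
   and shapes are not asserted here.

   .. code-block:: python

      import csv
      import tempfile
      from pathlib import Path

      # Hypothetical feature table: one row per subject, one label column.
      rows = [
          {"subject": "s1", "feat_a": "0.1", "feat_b": "1.2", "label": "0"},
          {"subject": "s2", "feat_a": "0.4", "feat_b": "0.9", "label": "1"},
          {"subject": "s3", "feat_a": "0.7", "feat_b": "1.1", "label": "0"},
      ]

      with tempfile.TemporaryDirectory() as tmp:
          tsv_path = Path(tmp) / "features.tsv"
          with tsv_path.open("w", newline="") as fh:
              writer = csv.DictWriter(fh, fieldnames=list(rows[0]), delimiter="\t")
              writer.writeheader()
              writer.writerows(rows)
          written = tsv_path.read_text()

          try:
              from coco_pipe.io.load import load_data
          except ImportError:
              load_data = None  # coco_pipe is not installed; skip the call

          if load_data is not None:
              container = load_data(
                  tsv_path,
                  mode="tabular",      # explicit, instead of relying on "auto"
                  target_col="label",  # extracted as y, removed from X
                  index_col="subject", # used as observation IDs
                  sep="\t",
              )
              print(container.X)
              print(container.y)
              print(container.ids)
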