CoCo Pipe¶

CoCo Pipe is a Python framework for processing and analyzing M/EEG biosignal data. It combines traditional machine learning, deep learning, and signal processing techniques in a single pipeline architecture. Key features include:

- Flexible Data Processing: Support for various data formats (tabular, M/EEG, embeddings) with automated preprocessing and feature extraction
- Advanced ML Capabilities: Integrated classification and regression pipelines with automated feature selection and hyperparameter optimization
- Modular Design: Easy-to-extend architecture for adding custom processing steps, models, and analysis methods
- Experiment Management: Built-in tools for experiment configuration, reproducibility, and results tracking
- Visualization & Reporting: Visualization tools and automated report generation for both signal processing and ML results
- Scientific Workflow: End-to-end support for neuroimaging research, from raw data processing to publication-ready results

Whether you’re conducting clinical research, developing ML models for brain-computer interfaces, or exploring neural signal patterns, CoCo Pipe provides the tools and flexibility to streamline your workflow.

Installation¶

Clone the Repository:

git clone https://github.com/BabaSanfour/coco-pipe.git
cd coco-pipe

(Optional) Create and Activate a Virtual Environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the Package:
```
pip install -e .
```
Note: This installs all runtime dependencies. For development dependencies, use pip install -e ".[dev]" (the quotes keep shells such as zsh from interpreting the square brackets).

For detailed development instructions, please see CONTRIBUTING.md.
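
After installing, a quick import check can confirm that the package landed in the environment you expect. The snippet below is a minimal sketch: it assumes the import name is coco_pipe (the name used in the import examples in this README), and is_importable is a hypothetical helper, not part of the package.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found in the active environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "coco_pipe" is the import name used in this README's examples.
    if is_importable("coco_pipe"):
        print("coco_pipe is available in this environment")
    else:
        print("coco_pipe not found; is the virtual environment active?")
```

If the check fails right after a successful pip install, the most likely cause is that pip and python point at different environments.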

Using the ML Module¶

CoCo Pipe provides two main ways to use the ML module:

1. Direct Python API Usage¶

You can use the ML module directly in your Python scripts by importing from coco_pipe.io for data loading and coco_pipe.ml for machine learning pipelines:

from coco_pipe.io import load_data
from coco_pipe.ml import MLPipeline

# Load your data into the canonical package container
container = load_data(
    "data/your_dataset.csv",
    mode="tabular",
    target_col="target_class",
    sep=",",
)

# Select a subset explicitly from the container when needed
container = container.select(feature=["feat1", "feat2"], y=["case", "control"])
X = container.X
y = container.y

# Configure and run ML pipeline
config = {
    "task": "classification",  # or 'regression'
    "analysis_type": "baseline",  # Options: 'baseline', 'feature_selection', 'hp_search', 'hp_search_fs'
    "models": "all",  # or list of specific models
    "metrics": ["accuracy", "f1-score"],
    "cv_strategy": "stratified",
    "n_splits": 5,
    "n_features": 10,  # For feature selection
    "direction": "forward",  # For feature selection
    "search_type": "grid",  # For hyperparameter search
    "n_iter": 100,  # For random search
    "scoring": "accuracy",
    "n_jobs": -1
}

pipeline = MLPipeline(X=X, y=y, config=config)
results = pipeline.run()
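
Because the configuration is a plain dictionary, a typo in an option name only surfaces when the pipeline runs. One way to catch it earlier is a small pre-flight check. The sketch below is hypothetical and not part of coco_pipe: the allowed values are copied from the comments in the example above, and the n_splits rule reflects only the general fact that cross-validation needs at least two folds.

```python
# Allowed values taken from the comments in the README's config example;
# the helper itself is hypothetical and not part of coco_pipe.
ALLOWED = {
    "task": {"classification", "regression"},
    "analysis_type": {"baseline", "feature_selection", "hp_search", "hp_search_fs"},
}


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key, allowed in ALLOWED.items():
        value = config.get(key)
        if value not in allowed:
            problems.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    # Only check n_splits when it is given; whether it is required is unknown here.
    if "n_splits" in config:
        n_splits = config["n_splits"]
        if not isinstance(n_splits, int) or n_splits < 2:
            problems.append(f"n_splits={n_splits!r} should be an integer >= 2")
    return problems


good = {"task": "classification", "analysis_type": "baseline", "n_splits": 5}
bad = {"task": "clustering", "analysis_type": "baseline", "n_splits": 1}

print(check_config(good))  # []
for problem in check_config(bad):
    print("config problem:", problem)
```

The real pipeline may perform its own validation; this only illustrates the idea of failing fast before any model is trained.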

2. Using the CLI Tool¶

For batch processing or experiment management, use the CLI tool with a YAML configuration file:

# -----------------------------------------------------------------------------
# Toy config for MLPipeline
# -----------------------------------------------------------------------------

# Global parameters shared across analyses
global_experiment_id: "toy_ml_config"
data_path: "../datasets/toy_dataset.csv"
results_dir: "../results"
results_file: "toy_ml_config"

# Default analysis parameters (can be overridden per analysis)
defaults:
  random_state: 42
  n_jobs: -1
  cv_kwargs:
    strategy: "stratified"
    n_splits: 5
    shuffle: true
    random_state: 42
  covariates: ["age"]
  spatial_units: ["regionX", "regionY"]
  feature_names: ["feat1", "feat2", "feat3"]

# List of analyses to run
analyses:
  - id: "classification_baseline"
    task: "classification"
    analysis_type: "baseline"
    target_columns: ["target_class"]
    row_filter:
      - column: "age"
        values: 13
        operator: ">"
      - column: "sex"
        values: ["male"]
    models:
      - "Logistic Regression"
      - "Random Forest"
    metrics:
      - "accuracy"
      - "roc_auc"

  - id: "regression_hp_search"
    task: "regression"
    analysis_type: "hp_search"
    target_columns: ["target_reg"]
    feature_names: ["feat1"]
    spatial_units: ["regionX"]
    models: "all"
    metrics:
      - "r2"
      - "neg_mse"
    cv_kwargs:
      strategy: "kfold"
      n_splits: 3
    search_type: "grid"
    n_iter: 20
    scoring: "r2"
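
The defaults block is described as overridable per analysis. The sketch below shows one plausible way such an overlay could be resolved, using plain dictionaries in place of the parsed YAML. It assumes nested mappings such as cv_kwargs are merged key by key, so the second analysis would still inherit shuffle and random_state; the actual behavior of run_ml.py may differ (for example, it might replace cv_kwargs wholesale). merge_defaults is a hypothetical helper.

```python
import copy


def merge_defaults(defaults: dict, analysis: dict) -> dict:
    """Return a new dict: `defaults` overlaid with `analysis`, merging nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in analysis.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key (an assumption).
            merged[key] = merge_defaults(merged[key], value)
        else:
            # Scalars and lists from the analysis replace the default outright.
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {
    "n_jobs": -1,
    "cv_kwargs": {"strategy": "stratified", "n_splits": 5,
                  "shuffle": True, "random_state": 42},
    "feature_names": ["feat1", "feat2", "feat3"],
}
analysis = {
    "id": "regression_hp_search",
    "feature_names": ["feat1"],
    "cv_kwargs": {"strategy": "kfold", "n_splits": 3},
}

resolved = merge_defaults(defaults, analysis)
print(resolved["feature_names"])  # ['feat1']
print(resolved["cv_kwargs"])
```

Deep copies keep the shared defaults untouched, so resolving one analysis cannot leak settings into the next.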

Run the analysis using:

python scripts/run_ml.py --config configs/your_config.yml

The pipeline will:

1. Load and preprocess your data
2. Run all specified analyses
3. Save results for each model/analysis
4. Generate a combined results file
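
The steps listed above can be sketched as a simple loop. This is an illustration under stated assumptions, not the code in scripts/run_ml.py: the model fitting is replaced by a stub that only counts samples, row_filter entries are assumed to be AND-ed, an entry without an operator is treated as a membership test, and JSON output files named after the analysis id and results_file are invented for the example.

```python
import json
import operator
import tempfile
from pathlib import Path

# Comparison operators a row_filter entry might name; this set is an assumption.
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
             "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def keep_row(row: dict, row_filter: list) -> bool:
    """True if `row` passes every filter entry (entries are AND-ed: an assumption)."""
    for rule in row_filter:
        value = row[rule["column"]]
        if "operator" in rule:
            if not OPERATORS[rule["operator"]](value, rule["values"]):
                return False
        elif value not in rule["values"]:  # no operator: membership test
            return False
    return True


def run_all(rows: list, analyses: list, results_dir: Path, results_file: str) -> dict:
    """Filter rows, run each analysis (stubbed), save per-analysis and combined JSON."""
    results_dir.mkdir(parents=True, exist_ok=True)
    combined = {}
    for analysis in analyses:
        subset = [r for r in rows if keep_row(r, analysis.get("row_filter", []))]
        # Placeholder for the real model fitting: only record the sample count.
        result = {"n_samples": len(subset)}
        (results_dir / f"{analysis['id']}.json").write_text(json.dumps(result))
        combined[analysis["id"]] = result
    (results_dir / f"{results_file}.json").write_text(json.dumps(combined))
    return combined


rows = [
    {"age": 12, "sex": "male"},
    {"age": 20, "sex": "male"},
    {"age": 30, "sex": "female"},
]
analyses = [{
    "id": "classification_baseline",
    "row_filter": [
        {"column": "age", "values": 13, "operator": ">"},
        {"column": "sex", "values": ["male"]},
    ],
}]

with tempfile.TemporaryDirectory() as tmp:
    combined = run_all(rows, analyses, Path(tmp), "toy_ml_config")
    saved = sorted(p.name for p in Path(tmp).iterdir())

print(combined)  # {'classification_baseline': {'n_samples': 1}}
print(saved)     # ['classification_baseline.json', 'toy_ml_config.json']
```

Only the 20-year-old male row passes both filter entries, which is why the stub reports a single sample.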

Documentation¶

Full documentation for CoCo Pipe is available at: https://cocopipe.readthedocs.io/en/latest/index.html

Contributing¶

Contributions are welcome! If you have suggestions or find a bug, please open an issue or submit a pull request.

TODO¶

IO Module¶

Implement CSV loading and M/EEG data loading functionalities.
Develop comprehensive unit tests.

ML Module¶

Restructure to mirror the design of the dim_reduction module.
Consolidate scripts within the main pipeline.
Add regression support and enhance cross-validation methods.
Update and expand unit tests.

DL Module¶

Define and implement deep learning functionalities.
Create corresponding unit tests.

Visualization Module¶

Plan and implement enhancements for visualization features.
Integrate new visual components and testing.

Descriptors Module¶

Add a connectivity descriptor family built on mne-connectivity.
Start that family with phase-based measures such as PLV (phase-locking value), leaving room for later extensions such as ciPLV, PLI, and wPLI.
Add a wavelet-based descriptor batch built on PyWavelets.
Start that wavelet batch with sure_entropy.
Keep log_energy_entropy on the roadmap, but finalize its scientific definition before implementation.
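
For orientation, the PLV mentioned above is commonly defined as the magnitude of the mean unit phasor of the phase difference between two signals. The sketch below is a plain-Python illustration of that textbook definition, not the planned mne-connectivity-based implementation. It assumes the instantaneous phases have already been extracted (for example with a Hilbert transform), and phase_locking_value is a hypothetical name.

```python
import cmath
import math


def phase_locking_value(phases_x, phases_y):
    """PLV = |mean(exp(i * (phi_x - phi_y)))| over paired phase samples (radians)."""
    if len(phases_x) != len(phases_y) or len(phases_x) == 0:
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (px - py)) for px, py in zip(phases_x, phases_y))
    return abs(total / len(phases_x))


# Two 10 Hz phase series with a constant lag are perfectly locked.
t = [k / 100 for k in range(200)]
phi_x = [2 * math.pi * 10 * tk for tk in t]
phi_y = [p - math.pi / 4 for p in phi_x]
print(round(phase_locking_value(phi_x, phi_y), 6))  # 1.0
```

A PLV near 1 indicates a stable phase relationship, while phase differences spread evenly around the circle give a value near 0.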

Dim Reduction Module¶

Add parallelism.

License¶

TODO