CoCo Pipe¶
CoCo Pipe is a comprehensive Python framework designed for advanced processing and analysis of bio M/EEG data. It seamlessly integrates traditional machine learning, deep learning, and signal processing techniques into a unified pipeline architecture. Key features include:
Flexible Data Processing: Support for various data formats (tabular, M/EEG, embeddings) with automated preprocessing and feature extraction
Advanced ML Capabilities: Integrated classification and regression pipelines with automated feature selection and hyperparameter optimization
Modular Design: Easy-to-extend architecture for adding custom processing steps, models, and analysis methods
Experiment Management: Built-in tools for experiment configuration, reproducibility, and results tracking
Visualization & Reporting: Comprehensive visualization tools and automated report generation for both signal processing and ML results
Scientific Workflow: End-to-end support for neuroimaging research, from raw data processing to publication-ready results
Whether you’re conducting clinical research, developing ML models for brain-computer interfaces, or exploring neural signal patterns, CoCo Pipe provides the tools and flexibility to streamline your workflow.
Installation¶
Clone the Repository:
git clone https://github.com/BabaSanfour/coco-pipe.git
cd coco-pipe
(Optional) Create and Activate a Virtual Environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
Install the Package:
pip install -e .
Note: This installs all runtime dependencies. For development dependencies, use
pip install -e ".[dev]"
For detailed development instructions, please see CONTRIBUTING.md.
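After installing, it can be useful to confirm that the package is importable from the active environment. The snippet below is a minimal sketch, assuming the import name is coco_pipe (the name used in the usage examples in this README); is_installed is a hypothetical helper written for illustration and is not part of the package.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located in this environment."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    name = "coco_pipe"  # assumed import name, taken from the usage examples
    status = "found" if is_installed(name) else "NOT found"
    print(f"{name}: {status}")
```

If the package is reported as not found, check that the virtual environment is activated before running pip install.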
Using the ML Module¶
CoCo Pipe provides two main ways to use the ML module:
1. Direct Python API Usage¶
You can use the ML module directly in your Python scripts by importing from coco_pipe.io for data loading and coco_pipe.ml for machine learning pipelines:
from coco_pipe.io import load_data
from coco_pipe.ml import MLPipeline
# Load your data into the canonical package container
container = load_data(
    "data/your_dataset.csv",
    mode="tabular",
    target_col="target_class",
    sep=",",
)
# Select a subset explicitly from the container when needed
container = container.select(feature=["feat1", "feat2"], y=["case", "control"])
X = container.X
y = container.y
# Configure and run ML pipeline
config = {
    "task": "classification",     # or 'regression'
    "analysis_type": "baseline",  # Options: 'baseline', 'feature_selection', 'hp_search', 'hp_search_fs'
    "models": "all",              # or list of specific models
    "metrics": ["accuracy", "f1-score"],
    "cv_strategy": "stratified",
    "n_splits": 5,
    "n_features": 10,             # For feature selection
    "direction": "forward",       # For feature selection
    "search_type": "grid",        # For hyperparameter search
    "n_iter": 100,                # For random search
    "scoring": "accuracy",
    "n_jobs": -1,
}
pipeline = MLPipeline(X=X, y=y, config=config)
results = pipeline.run()
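Because the config is a plain dictionary, a typo in a key such as analysis_type may only surface once the pipeline is running. The following is a minimal sketch of a pre-flight check, using only the option values listed in the comments of the example above. check_config is a hypothetical helper, not part of coco_pipe, and the package may well perform its own validation.

```python
# Allowed values copied from the comments in the example config.
ALLOWED_TASKS = {"classification", "regression"}
ALLOWED_ANALYSIS_TYPES = {"baseline", "feature_selection", "hp_search", "hp_search_fs"}


def check_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []

    task = config.get("task")
    if task not in ALLOWED_TASKS:
        problems.append(f"task={task!r} is not one of {sorted(ALLOWED_TASKS)}")

    analysis_type = config.get("analysis_type")
    if analysis_type not in ALLOWED_ANALYSIS_TYPES:
        problems.append(
            f"analysis_type={analysis_type!r} is not one of {sorted(ALLOWED_ANALYSIS_TYPES)}"
        )

    # Cross-validation needs at least two folds; the default of 5 is an
    # assumption that mirrors the example config.
    n_splits = config.get("n_splits", 5)
    if not isinstance(n_splits, int) or n_splits < 2:
        problems.append("n_splits must be an integer >= 2")

    return problems


if __name__ == "__main__":
    for problem in check_config({"task": "clasification", "analysis_type": "baseline"}):
        print("config problem:", problem)
```

Calling check_config(config) before constructing MLPipeline gives a readable error list instead of a failure partway through a long run.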
2. Using the CLI Tool¶
For batch processing or experiment management, use the CLI tool with a YAML configuration file:
# -----------------------------------------------------------------------------
# Toy config for MLPipeline
# -----------------------------------------------------------------------------

# Global parameters shared across analyses
global_experiment_id: "toy_ml_config"
data_path: "../datasets/toy_dataset.csv"
results_dir: "../results"
results_file: "toy_ml_config"

# Default analysis parameters (can be overridden per analysis)
defaults:
  random_state: 42
  n_jobs: -1
  cv_kwargs:
    strategy: "stratified"
    n_splits: 5
    shuffle: true
    random_state: 42
  covariates: ["age"]
  spatial_units: ["regionX", "regionY"]
  feature_names: ["feat1", "feat2", "feat3"]

# List of analyses to run
analyses:
  - id: "classification_baseline"
    task: "classification"
    analysis_type: "baseline"
    target_columns: ["target_class"]
    row_filter:
      - column: "age"
        values: 13
        operator: ">"
      - column: "sex"
        values: ["male"]
    models:
      - "Logistic Regression"
      - "Random Forest"
    metrics:
      - "accuracy"
      - "roc_auc"

  - id: "regression_hp_search"
    task: "regression"
    analysis_type: "hp_search"
    target_columns: ["target_reg"]
    feature_names: ["feat1"]
    spatial_units: ["regionX"]
    models: "all"
    metrics:
      - "r2"
      - "neg_mse"
    cv_kwargs:
      strategy: "kfold"
      n_splits: 3
    search_type: "grid"
    n_iter: 20
    scoring: "r2"
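The config states that values under defaults can be overridden per analysis. One way to picture that resolution is sketched below, with the config represented as plain Python dictionaries. resolve_analysis is a hypothetical helper, and the key-by-key merge of nested blocks such as cv_kwargs is an assumption; the actual pipeline may replace nested blocks wholesale instead.

```python
import copy


def resolve_analysis(defaults: dict, analysis: dict) -> dict:
    """Combine the defaults with one analysis entry; analysis values win.

    ASSUMPTION: nested dictionaries (e.g. cv_kwargs) are merged key by key.
    """
    merged = copy.deepcopy(defaults)
    for key, value in analysis.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = resolve_analysis(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Values taken from the toy config, trimmed for brevity.
defaults = {
    "random_state": 42,
    "cv_kwargs": {"strategy": "stratified", "n_splits": 5, "shuffle": True},
    "feature_names": ["feat1", "feat2", "feat3"],
}
analysis = {
    "id": "regression_hp_search",
    "feature_names": ["feat1"],
    "cv_kwargs": {"strategy": "kfold", "n_splits": 3},
}

resolved = resolve_analysis(defaults, analysis)
print(resolved["cv_kwargs"])
print(resolved["feature_names"])
```

Under this merge rule, the regression analysis keeps shuffle from the defaults while taking its own strategy and n_splits. If the pipeline replaces nested blocks instead, shuffle would need to be repeated in the per-analysis cv_kwargs.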
Run the analysis using:
python scripts/run_ml.py --config configs/your_config.yml
The pipeline will:
Load and preprocess your data
Run all specified analyses
Save results for each model/analysis
Generate a combined results file
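As part of loading and preprocessing, the row_filter entries in the config restrict which rows enter an analysis. The sketch below illustrates one plausible reading of those entries on a list of dictionaries. It makes several assumptions that are not confirmed by this README: rules are combined with AND, a rule without an operator means "value is one of values", and operator symbols other than ">" are supported. apply_row_filter and row_matches are hypothetical names.

```python
import operator

# Comparison functions keyed by the symbol used in the `operator` field.
# Only ">" appears in the example config; the rest are assumed.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def row_matches(row: dict, rule: dict) -> bool:
    """Check one row (column name -> value) against one row_filter rule."""
    value = row[rule["column"]]
    op = rule.get("operator")
    if op is None:
        # ASSUMPTION: without an operator, `values` lists the accepted values.
        accepted = rule["values"]
        if not isinstance(accepted, (list, tuple, set)):
            accepted = [accepted]
        return value in accepted
    return OPERATORS[op](value, rule["values"])


def apply_row_filter(rows: list[dict], rules: list[dict]) -> list[dict]:
    """Keep the rows that satisfy every rule (AND-combination is assumed)."""
    return [row for row in rows if all(row_matches(row, rule) for rule in rules)]


# The two rules from the classification_baseline analysis.
rules = [
    {"column": "age", "values": 13, "operator": ">"},
    {"column": "sex", "values": ["male"]},
]
rows = [
    {"age": 12, "sex": "male"},
    {"age": 20, "sex": "male"},
    {"age": 30, "sex": "female"},
]

kept = apply_row_filter(rows, rules)
print(kept)
```

With these rules, only rows with age above 13 and sex equal to "male" remain in the analysis.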
Documentation¶
Full documentation for CoCo Pipe is available at: https://cocopipe.readthedocs.io/en/latest/index.html
Contributing¶
Contributions are welcome! If you have suggestions or find any bugs, please open issues or submit pull requests.
TODO¶
IO Module¶
Implement CSV loading and M/EEG data loading functionalities.
Develop comprehensive unit tests.
ML Module¶
Restructure to mirror the design of the dim_reduction module.
Consolidate scripts within the main pipeline.
Add regression support and enhance cross-validation methods.
Update and expand unit tests.
DL Module¶
Define and implement deep learning functionalities.
Create corresponding unit tests.
Visualization Module¶
Plan and implement enhancements for visualization features.
Integrate new visual components and add tests for them.
Descriptors Module¶
Add a future connectivity descriptor family built on mne-connectivity.
Start that connectivity family with phase-based measures such as PLV, with room for later extensions like ciPLV, PLI, and wPLI.
Add a future wavelet-based descriptor batch built on PyWavelets.
Start that wavelet batch with sure_entropy.
Keep log_energy_entropy on the roadmap, but finalize its scientific definition before implementation.
Dim Reduction Module¶
Add parallelism.
License¶
TODO