# CoCo Pipe

![Codecov](https://img.shields.io/codecov/c/github/BabaSanfour/coco-pipe)
[![Test Status](https://img.shields.io/github/actions/workflow/status/BabaSanfour/coco-pipe/python-tests.yml?branch=main&label=tests)](https://github.com/BabaSanfour/coco-pipe/actions?query=workflow%3Apython-tests)
[![Documentation Status](https://readthedocs.org/projects/cocopipe/badge/?version=latest)](https://cocopipe.readthedocs.io/en/latest/?badge=latest)
[![GitHub Repository](https://img.shields.io/badge/Source%20Code-BabaSanfour%2Fcocopipe-blue)](https://github.com/BabaSanfour/coco-pipe)

CoCo Pipe is a comprehensive Python framework designed for advanced processing and analysis of bio M/EEG data. It seamlessly integrates traditional machine learning, deep learning, and signal processing techniques into a unified pipeline architecture. Key features include:

- **Flexible Data Processing**: Support for various data formats (tabular, M/EEG, embeddings) with automated preprocessing and feature extraction
- **Advanced ML Capabilities**: Integrated classification and regression pipelines with automated feature selection and hyperparameter optimization
- **Modular Design**: Easy-to-extend architecture for adding custom processing steps, models, and analysis methods
- **Experiment Management**: Built-in tools for experiment configuration, reproducibility, and results tracking
- **Visualization & Reporting**: Comprehensive visualization tools and automated report generation for both signal processing and ML results
- **Scientific Workflow**: End-to-end support for neuroimaging research, from raw data processing to publication-ready results

Whether you're conducting clinical research, developing ML models for brain-computer interfaces, or exploring neural signal patterns, CoCo Pipe provides the tools and flexibility to streamline your workflow.

## Installation

1. **Clone the Repository:**

   ```bash
   git clone https://github.com/BabaSanfour/coco-pipe.git
   cd coco-pipe
   ```

2. **(Optional) Create and Activate a Virtual Environment:**

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install the Package:**

   ```bash
   pip install -e .
   ```

   *Note: This will install all runtime dependencies. for development dependencies, use `pip install -e .[dev]`.*

For detailed development instructions, please see [CONTRIBUTING.md](CONTRIBUTING.md).

## Using the ML Module

CoCo Pipe provides two main ways to use the ML module:

### 1. Direct Python API Usage

You can use the ML module directly in your Python scripts by importing from `coco_pipe.io` for data loading and `coco_pipe.ml` for machine learning pipelines:

```python
from coco_pipe.io import load_data
from coco_pipe.ml import MLPipeline

# Load your data into the canonical package container
container = load_data(
    "data/your_dataset.csv",
    mode="tabular",
    target_col="target_class",
    sep=",",
)

# Select a subset explicitly from the container when needed
container = container.select(feature=["feat1", "feat2"], y=["case", "control"])
X = container.X
y = container.y

# Configure and run ML pipeline
config = {
    "task": "classification",  # or 'regression'
    "analysis_type": "baseline",  # Options: 'baseline', 'feature_selection', 'hp_search', 'hp_search_fs'
    "models": "all",  # or list of specific models
    "metrics": ["accuracy", "f1-score"],
    "cv_strategy": "stratified",
    "n_splits": 5,
    "n_features": 10,  # For feature selection
    "direction": "forward",  # For feature selection
    "search_type": "grid",  # For hyperparameter search
    "n_iter": 100,  # For random search
    "scoring": "accuracy",
    "n_jobs": -1
}

pipeline = MLPipeline(X=X, y=y, config=config)
results = pipeline.run()
```

### 2. Using the CLI Tool

For batch processing or experiment management, use the CLI tool with a YAML configuration file:

```yaml
# -----------------------------------------------------------------------------
# Toy config for MLPipeline
# -----------------------------------------------------------------------------

# Global parameters shared across analyses
global_experiment_id: "toy_ml_config"
data_path: "../datasets/toy_dataset.csv"
results_dir: "../results"
results_file: "toy_ml_config"

# Default analysis parameters (can be overridden per analysis)
defaults:
  random_state: 42
  n_jobs: -1
  cv_kwargs:
    strategy: "stratified"
    n_splits: 5
    shuffle: true
    random_state: 42
  covariates: ["age"]
  spatial_units: ["regionX", "regionY"]
  feature_names: ["feat1", "feat2", "feat3"]

# List of analyses to run
analyses:
  - id: "classification_baseline"
    task: "classification"
    analysis_type: "baseline"
    target_columns: ["target_class"]
    row_filter:
      - column: "age"
        values: 13
        operator: ">"
      - column: "sex"
        values: ["male"]
    models:
      - "Logistic Regression"
      - "Random Forest"
    metrics:
      - "accuracy"
      - "roc_auc"

  - id: "regression_hp_search"
    task: "regression"
    analysis_type: "hp_search"
    target_columns: ["target_reg"]
    feature_names: ["feat1"]
    spatial_units: ["regionX"]
    models: "all"
    metrics:
      - "r2"
      - "neg_mse"
    cv_kwargs:
      strategy: "kfold"
      n_splits: 3
    search_type: "grid"
    n_iter: 20
    scoring: "r2"
```

Run the analysis using:

```bash
python scripts/run_ml.py --config configs/your_config.yml
```

The pipeline will:
- Load and preprocess your data
- Run all specified analyses
- Save results for each model/analysis
- Generate a combined results file

## Documentation

Full documentation for CoCo Pipe is available at:
https://cocopipe.readthedocs.io/en/latest/index.html

## Contributing

Contributions are welcome! If you have suggestions or find any bugs, please open issues or submit pull requests.

### TODO

#### IO Module
- Implement CSV loading and M/EEG data loading functionalities.
- Develop comprehensive unit tests.

#### ML Module
- Restructure to mirror the design of the dim_reduction module.
- Consolidate scripts within the main pipeline.
- Add regression support and enhance cross-validation methods.
- Update and expand unit tests.

#### DL Module
- Define and implement deep learning functionalities.
- Create corresponding unit tests.

#### Visualization Module
- Plan and implement enhancements for visualization features.
- Integrate new visual components and testing.

#### Descriptors Module
- Add a future connectivity descriptor family built on `mne-connectivity`.
- Start that connectivity family with phase-based measures such as `PLV`, with room for later extensions like `ciPLV`, `PLI`, and `wPLI`.
- Add a future wavelet-based descriptor batch built on `PyWavelets`.
- Start that wavelet batch with `sure_entropy`.
- Keep `log_energy_entropy` on the roadmap, but finalize its scientific definition before implementation.

#### Dim reduction:
- Add parallelism

## License

*TODO*