Dim Reduction Workflows

Current State

The dim-reduction stack is centered on the existing DimReduction manager and reducer contracts.

  • Optional dependencies are now imported lazily, on first use, rather than at package import time.

  • DimReduction caches normalized state on the following attributes:

    • metrics_

    • metric_records_

    • quality_metadata_

    • diagnostics_

    • interpretation_

    • interpretation_records_

  • Plotting is done through coco_pipe.viz.dim_reduction, not through manager methods on DimReduction.

  • DimReduction does not cache embeddings. Embeddings are returned explicitly from transform() and fit_transform() and must be passed explicitly to score(), plotting, or report-building paths that need them.

  • coco_pipe.dim_reduction.evaluation.core is the evaluation authority used by DimReduction.score().

  • MethodSelector is now a post-hoc comparison layer over already-scored DimReduction objects and exposes tidy metric observations via to_frame().

Core Interfaces

Use DimReduction directly for most workflows:

from coco_pipe.dim_reduction import DimReduction

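reducer = DimReduction("PCA", n_components=2, random_state=42)
# The embedding is returned, not cached, so keep a reference to it.
embedding = reducer.fit_transform(X, y=labels)
# Pass the embedding back explicitly to every step that needs it.
scores = reducer.score(embedding, X=X, labels=labels, times=timepoints)
interpretation = reducer.interpret(
    X,
    X_emb=embedding,
    analyses=["correlation"],
    feature_names=feature_names,
)
# The summary carries cached metrics and metadata, but no embedding.
summary = reducer.get_summary()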

DimReduction.get_summary() returns cached scalar metrics, reducer metadata, diagnostics, tidy metric records, cached interpretation payloads, cached interpretation records, and capability flags. It does not carry an embedding payload.

Evaluation can be narrowed to specific metric families:

from coco_pipe.dim_reduction.config import EvaluationConfig

config = EvaluationConfig(
    metrics=["trustworthiness", "continuity"],
    selection_metric="trustworthiness",
    selection_k=10,
    tie_breakers=["continuity"],
    separation_method="centroid",
)
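
For orientation on what trustworthiness and selection_k measure, the sketch below implements the textbook trustworthiness score in plain Python. It is an illustration only: the function names are hypothetical, and coco_pipe.dim_reduction.evaluation.core may compute the metric differently (for example with vectorized neighbor searches).

```python
import math


def _ranked_neighbors(points, i):
    """Indices of all other points, nearest first (ties broken by index)."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: (math.dist(points[i], points[j]), j))


def trustworthiness(X, embedding, k):
    """Textbook trustworthiness in [0, 1]; requires 0 < k < n / 2."""
    n = len(X)
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    penalty = 0
    for i in range(n):
        # Rank of every other point in the original space (1 = nearest).
        rank_in_X = {j: r for r, j in enumerate(_ranked_neighbors(X, i), start=1)}
        # Embedded neighbors that were not among the k nearest originally
        # are penalized by how far beyond rank k they sat.
        for j in _ranked_neighbors(embedding, i)[:k]:
            penalty += max(0, rank_in_X[j] - k)
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


# Toy data: six points lying almost on a line.
X = [(0.0, 0.0), (1.0, 0.1), (2.1, 0.0), (3.3, 0.2), (4.6, 0.0), (6.0, 0.1)]
faithful = [(x,) for x, _ in X]  # 1D embedding that keeps the order along x
scrambled = [(v,) for v in (0.0, 5.0, 1.0, 4.0, 2.0, 3.0)]  # order destroyed

print(trustworthiness(X, faithful, k=2))   # 1.0
print(trustworthiness(X, scrambled, k=2))  # below 1.0
```

Under this definition continuity is the mirror image: swap the roles of X and embedding in the call.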

Score each reducer directly with the configured metrics. Here reducers is a collection of DimReduction objects:

for reducer in reducers:
    embedding = reducer.fit_transform(X, y=labels)
    reducer.score(
        embedding,
        X=X,
        metrics=config.metrics,
        k_values=config.k_range,
        separation_method=config.separation_method,
    )
    reducer.interpret(
        X,
        X_emb=embedding,
        analyses=["correlation"],
        feature_names=feature_names,
    )

Then compare the scored reducers:

from coco_pipe.dim_reduction.evaluation import MethodSelector

selector = MethodSelector(reducers).collect()
ranked = selector.rank_methods(
    selection_metric=config.selection_metric,
    selection_k=config.selection_k,
    tie_breakers=config.tie_breakers,
)
best_name = ranked.iloc[0]["method"]
best = selector.reducers[best_name]
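
To make the ranking step concrete, here is a minimal sketch of ordering methods by a selection metric at a fixed k, with tie-breakers applied in order. It is not MethodSelector's implementation: rank_tidy_records and the sample values are hypothetical, only the column names come from this page, and higher values are assumed to be better.

```python
# Hypothetical tidy metric records; the values are made up for illustration.
records = [
    {"method": "PCA", "metric": "trustworthiness", "value": 0.91, "scope": "k", "scope_value": 10},
    {"method": "PCA", "metric": "continuity", "value": 0.88, "scope": "k", "scope_value": 10},
    {"method": "UMAP", "metric": "trustworthiness", "value": 0.91, "scope": "k", "scope_value": 10},
    {"method": "UMAP", "metric": "continuity", "value": 0.93, "scope": "k", "scope_value": 10},
    {"method": "Isomap", "metric": "trustworthiness", "value": 0.85, "scope": "k", "scope_value": 10},
    {"method": "Isomap", "metric": "continuity", "value": 0.95, "scope": "k", "scope_value": 10},
    # Observations at other k values are ignored when ranking at k=10.
    {"method": "Isomap", "metric": "trustworthiness", "value": 0.99, "scope": "k", "scope_value": 5},
]


def rank_tidy_records(records, selection_metric, selection_k, tie_breakers=()):
    """Return method names best-first, assuming higher metric values are better."""
    metrics = [selection_metric, *tie_breakers]
    table = {}
    for row in records:
        if row["metric"] in metrics and row["scope_value"] == selection_k:
            table.setdefault(row["method"], {})[row["metric"]] = row["value"]

    def sort_key(method):
        # Compare on the selection metric first, then each tie-breaker in turn.
        # A missing metric sorts last by counting as -inf.
        return tuple(table[method].get(m, float("-inf")) for m in metrics)

    return sorted(table, key=sort_key, reverse=True)


ranking = rank_tidy_records(
    records,
    selection_metric="trustworthiness",
    selection_k=10,
    tie_breakers=["continuity"],
)
print(ranking)  # ['UMAP', 'PCA', 'Isomap']
```

PCA and UMAP tie on trustworthiness at k=10, so the continuity tie-breaker decides their order.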

When trajectory labels are available, score() forwards separation_method to trajectory_separation(..., method=...) to produce evaluator-level separation summaries.

Feature interpretation is separate from preservation scoring:

  • score() evaluates whether the embedding preserves structure

  • interpret() evaluates which input features appear to drive the embedding

interpret() delegates to the pure backend coco_pipe.dim_reduction.analysis.interpret_features(...) and currently supports:

  • correlation

  • perturbation

  • gradient

Custom Reducers

BaseReducer is a supported extension point and is re-exported from coco_pipe.dim_reduction.

from sklearn.decomposition import PCA

from coco_pipe.dim_reduction import BaseReducer


class CustomPCAReducer(BaseReducer):
    @property
    def capabilities(self):
        # Copy before updating so the base class's mapping is never mutated.
        caps = dict(super().capabilities)
        caps.update({"is_linear": True, "has_components": True})
        return caps

    def fit(self, X, y=None):
        # y is unused here: PCA is unsupervised.
        self.model = PCA(n_components=self.n_components, **self.params)
        self.model.fit(X)
        return self

    def transform(self, X):
        return self.model.transform(X)

For reducers with nonstandard inputs, override capabilities to declare the expected input_ndim and input_layout.

If a reducer depends on heavy optional libraries, keep those imports inside fit() / transform() paths. The helper coco_pipe.utils.import_optional_dependency(...) exists for built-in reducers and custom advanced integrations, but it is not the main public entry point.
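
The deferred-import pattern can be sketched with the standard library alone. Everything named here is hypothetical: require_optional is a stand-in written for this example and does not reproduce the signature of coco_pipe.utils.import_optional_dependency, and SketchReducer deliberately does not subclass BaseReducer so the snippet runs without coco_pipe installed.

```python
import importlib


def require_optional(module_name, extra=None):
    """Import module_name on demand and fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        hint = f" Install it with: pip install coco-pipe[{extra}]" if extra else ""
        raise ImportError(
            f"Optional dependency '{module_name}' is required for this reducer.{hint}"
        ) from exc


class SketchReducer:
    """Stand-in class; a real custom reducer would subclass BaseReducer."""

    def fit(self, X, y=None):
        # Deferred import: runs on the first fit(), not when this module loads.
        stats = require_optional("statistics")  # stdlib stand-in for a heavy library
        self.means_ = [stats.fmean(column) for column in zip(*X)]
        return self

    def transform(self, X):
        # Center each feature using the means learned in fit().
        return [[value - mean for value, mean in zip(row, self.means_)] for row in X]


centered = SketchReducer().fit([[1.0, 2.0], [3.0, 4.0]]).transform([[1.0, 2.0]])
print(centered)  # [[-1.0, -1.0]]
```

Importing the module that defines SketchReducer stays cheap; a missing library only surfaces when fit() is actually called.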

Supported Metric Shapes

Metric plotting and reporting now work from tidy observations with these columns:

  • method

  • metric

  • value

  • scope

  • scope_value

Optional columns such as group, condition, pair, subject, session, seed, and fold are preserved when present.

Metric Plot Types

Use plot_metrics(..., plot_type=...) or the report comparison helpers. Embedding visualizations are also external to DimReduction; pass the explicit embedding array to the plotting function you need.

  • grouped_bar: one scalar per method/metric

  • box / boxen: repeated observations

  • violin: dense repeated observations

  • raincloud: violin + box + points

  • strip / swarm: small repeated samples

  • heatmap: method x metric or method x scope

  • line: metric sweeps over k, time, or windows

  • dumbbell: direct two-method deltas

Default behavior:

  • global scalars -> grouped bars

  • repeated observations -> raincloud

  • varying scope_value -> line

  • explicit matrix summaries -> heatmap
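
The first three defaults can be read as a small decision rule over tidy records. The sketch below is one possible reading, not plot_metrics' actual dispatch: default_plot_type is a hypothetical helper, the precedence of line over raincloud is an assumption, and the heatmap case is left out because it depends on the caller supplying a matrix summary.

```python
from collections import defaultdict


def default_plot_type(records):
    """Pick a plot type for tidy metric records using the defaults listed above."""
    scope_values = defaultdict(set)  # (method, metric) -> distinct scope values
    counts = defaultdict(int)        # (method, metric, scope_value) -> observations
    for row in records:
        key = (row["method"], row["metric"])
        scope_values[key].add(row["scope_value"])
        counts[(*key, row["scope_value"])] += 1

    if any(len(values) > 1 for values in scope_values.values()):
        return "line"         # metric swept over k, time, or windows
    if any(n > 1 for n in counts.values()):
        return "raincloud"    # several observations of the same quantity
    return "grouped_bar"      # one scalar per method/metric


# Hypothetical records covering the three inferred cases.
global_scalars = [
    {"method": "PCA", "metric": "trustworthiness", "value": 0.91, "scope": "global", "scope_value": None},
    {"method": "UMAP", "metric": "trustworthiness", "value": 0.94, "scope": "global", "scope_value": None},
]
repeated = [
    {"method": "PCA", "metric": "trustworthiness", "value": v, "scope": "global", "scope_value": None, "seed": s}
    for s, v in enumerate([0.90, 0.91, 0.92])
]
sweep = [
    {"method": "PCA", "metric": "trustworthiness", "value": v, "scope": "k", "scope_value": k}
    for k, v in [(5, 0.93), (10, 0.91), (20, 0.88)]
]

print(default_plot_type(global_scalars))  # grouped_bar
print(default_plot_type(repeated))        # raincloud
print(default_plot_type(sweep))           # line
```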

Visualization Entry Points

The dim-reduction viz surface is data-first and explicit. Plotting helpers do not read manager-owned embedding or context state.

  • plot_embedding(embedding, labels=..., metadata=...)

  • plot_metrics(metric_records, metric=..., scope=..., method=...)

  • plot_shepard_diagram(X, embedding, distances=...)

  • plot_trajectory(trajectories, times=..., labels=..., values=...)

  • plot_trajectory_metric_series(series, times=..., labels=...)

  • plot_feature_importance(scores_or_records, analysis=..., method=..., dimension=...)

  • plot_feature_correlation_heatmap(correlation_payload, method=...)

  • plot_interpretation(interpretation_payload, analysis=..., method=..., dimension=...)

plot_trajectory(...) and plot_trajectory_metric_series(...) require native trajectory tensors or explicit time-series arrays. They do not reshape flat 2D embeddings or infer grouping metadata.

Generic Trajectories

Trajectory scoring is not EEG-specific. Any grouped or ordered embedding can use trajectory-native metrics, provided the embedding is already a 3D tensor of shape (trajectory, time, dim).

Trajectory reshaping or unstacking must happen upstream. The evaluation module does not reconstruct 3D trajectories from flat 2D embeddings.

Trajectory outputs include:

  • trajectory_speed_mean

  • trajectory_speed_peak

  • trajectory_acceleration_mean

  • trajectory_acceleration_peak

  • trajectory_curvature_mean

  • trajectory_curvature_peak

  • trajectory_turning_angle_mean

  • trajectory_turning_angle_peak

  • trajectory_dispersion_mean

  • trajectory_dispersion_peak

  • trajectory_path_length_final

  • trajectory_displacement_final

  • trajectory_tortuosity_final

  • pairwise separation AUC / peak summaries when labels exist per trajectory

Detailed timecourses are cached under diagnostics_.

trajectory_dispersion in the evaluation pipeline is currently the global, unlabeled dispersion over all trajectories. This is narrower than the lower-level geometry.py primitive, which can also compute label-conditioned dispersion. Trajectory labels are only used automatically for trajectory_separation.

Trajectory metrics are descriptive outputs for plotting and reporting. They are not used as automatic method-selection metrics by default.
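
As a rough illustration of the path-based outputs listed above, the sketch below computes speed, path length, displacement, and tortuosity for trajectories stored as nested (trajectory, time, dim) sequences. The definitions are assumptions made for this example (unit time steps, tortuosity as path length divided by net displacement); the evaluation module's exact formulas may differ, and path_summaries is a hypothetical name.

```python
import math


def path_summaries(trajectory):
    """Summarize one trajectory given as a (time, dim) sequence of points."""
    steps = [math.dist(a, b) for a, b in zip(trajectory, trajectory[1:])]
    path_length = sum(steps)
    displacement = math.dist(trajectory[0], trajectory[-1])
    return {
        # With unit time steps, per-step distance doubles as speed.
        "speed_mean": path_length / len(steps),
        "speed_peak": max(steps),
        "path_length": path_length,
        "displacement": displacement,
        # Assumed convention: path length divided by net displacement.
        "tortuosity": path_length / displacement if displacement > 0 else math.inf,
    }


# Two trajectories, three time points, two dimensions: (trajectory, time, dim).
embedding = [
    [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)],  # takes a detour
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],  # straight line
]
summaries = [path_summaries(trajectory) for trajectory in embedding]

print(summaries[0]["path_length"], summaries[0]["tortuosity"])  # 9.0 3.0
print(summaries[1]["path_length"], summaries[1]["tortuosity"])  # 2.0 1.0
```

A straight path has tortuosity 1.0 under this convention; detours push it higher.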

Reports

Report.add_reduction() consumes get_summary() when available and accepts an explicit embedding payload when the section should render an embedding or trajectory plot.

It can render:

  • interactive embeddings when X_emb is provided explicitly

  • trajectory plots for 3D embeddings

  • scalar metric tables and charts

  • loss and scree diagnostics

  • co-ranking heatmaps

  • trajectory metric timecourses

  • interpretation plots from interpretation / interpretation_records

Report.add_comparison() accepts tidy metric frames or MethodSelector instances directly.

from_reductions(...) follows the same rule: pass embeddings=[...] explicitly when the report should include embedding or trajectory plots.

Shepard plots and comparison/report views reuse cached diagnostics such as shepard_distances_ and coranking_matrix_ when those artifacts already exist.

End-to-End Execution

Batch execution should use coco_pipe.io.load_data plus DimReduction directly. The old DimReductionPipeline compatibility wrapper has been removed.

IO Workflow

The recommended IO-to-dim-reduction flow is explicit:

from coco_pipe.dim_reduction import DimReduction
from coco_pipe.io import load_data

container = load_data("data.csv", mode="tabular", target_col="label", sep=",")

X = container.X
labels = container.y

reducer = DimReduction("UMAP", n_components=2, random_state=42)
embedding = reducer.fit_transform(X, y=labels)
scores = reducer.score(embedding, X=X, labels=labels)

Use DataContainer to inspect and reshape data upstream:

  • container.dims

  • container.coords

  • container.flatten(...)

  • container.stack(...)

  • container.unstack(...)

For example:

container = load_data("embeddings/", mode="embedding")
container_2d = container.flatten(preserve="obs")

X = container_2d.X
embedding = reducer.fit_transform(X)

Trajectory scoring requires embeddings that are already shaped as (n_trajectories, n_times, n_dims). The evaluation module does not reconstruct 3D trajectories from flat 2D arrays. Any reshaping or unstacking must happen upstream through IO or explicit user code before calling score().
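
When the reshaping is done in user code rather than through DataContainer, it amounts to grouping flat rows by trajectory and ordering them by time. The sketch below shows that step with plain Python lists. unstack_trajectories is a hypothetical helper, not part of coco_pipe, and it assumes every trajectory has the same number of time points.

```python
def unstack_trajectories(flat_embedding, trajectory_ids, times):
    """Group flat (sample, dim) rows into a nested (trajectory, time, dim) list."""
    grouped = {}
    for row, trajectory_id, t in zip(flat_embedding, trajectory_ids, times):
        grouped.setdefault(trajectory_id, []).append((t, list(row)))

    lengths = {len(rows) for rows in grouped.values()}
    if len(lengths) != 1:
        raise ValueError("all trajectories must have the same number of time points")

    order = sorted(grouped)  # fixed trajectory order, returned for bookkeeping
    tensor = [
        [row for _, row in sorted(grouped[trajectory_id], key=lambda item: item[0])]
        for trajectory_id in order
    ]
    return tensor, order


# Four flat rows belonging to two trajectories, deliberately out of order.
flat = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
ids = ["b", "a", "b", "a"]
times = [1, 0, 0, 1]

tensor, order = unstack_trajectories(flat, ids, times)
print(order)   # ['a', 'b']
print(tensor)  # [[[1.0, 1.1], [3.0, 3.1]], [[2.0, 2.1], [0.0, 0.1]]]
```

The nested result has shape (2, 2, 2) here. Converting it to an array type before calling score() is left to the caller.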

coco_pipe.io.utils remains available for lower-level and advanced workflows, but it is not the recommended starting point for dim reduction. Prefer load_data(...), DataContainer, and explicit reshaping operations first.

Dependency Notes

Heavy optional libraries such as torch, umap, meegkit, and pydmd are loaded inside reducer methods rather than at package import time.

For a complete dim-reduction install, use the umbrella extra:

pip install coco-pipe[dim-red]

Selective extras remain available when you only need part of the reducer stack:

pip install coco-pipe[dask]
pip install coco-pipe[neighbor]
pip install coco-pipe[parametric-umap]
pip install coco-pipe[ivis]
pip install coco-pipe[topology]
pip install coco-pipe[spatiotemporal]
pip install coco-pipe[eeg]

The neighbor and dim-red extras include faiss-cpu, so Pacmap can use nn_backend="faiss" by default on supported platforms.

Base imports that should remain lightweight:

  • import coco_pipe.io

  • import coco_pipe.report

  • import coco_pipe.dim_reduction

Migration Notes

Legacy local note files under coco_pipe/dim_reduction/ were consolidated into this page and can be removed once no longer referenced.