Dim Reduction Workflows¶

Current State¶

The dim-reduction stack is centered on the existing DimReduction manager and reducer contracts.

Optional dependencies are now imported lazily, inside the code paths that need them, rather than at package import time.
DimReduction caches normalized state on:
- metrics_
- metric_records_
- quality_metadata_
- diagnostics_
- interpretation_
- interpretation_records_
Plotting is done through coco_pipe.viz.dim_reduction, not through manager methods on DimReduction.
DimReduction does not cache embeddings. Embeddings are returned explicitly from transform() and fit_transform() and must be passed explicitly to score(), plotting, or report-building paths that need them.
coco_pipe.dim_reduction.evaluation.core is the evaluation authority used by DimReduction.score().
MethodSelector is now a post-hoc comparison layer over already-scored DimReduction objects and exposes tidy metric observations via to_frame().

Core Interfaces¶

Use DimReduction directly for most workflows:

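from coco_pipe.dim_reduction import DimReduction

# X, labels, timepoints, and feature_names are assumed to be defined upstream.
reducer = DimReduction("PCA", n_components=2, random_state=42)

# The embedding is returned, not cached, so keep a reference to it.
embedding = reducer.fit_transform(X, y=labels)

# Preservation scoring: pass the embedding explicitly.
scores = reducer.score(embedding, X=X, labels=labels, times=timepoints)

# Feature interpretation: which input features appear to drive the embedding.
interpretation = reducer.interpret(
    X,
    X_emb=embedding,
    analyses=["correlation"],
    feature_names=feature_names,
)

# Cached metrics, metadata, diagnostics, and interpretation (no embedding).
summary = reducer.get_summary()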

DimReduction.get_summary() returns cached scalar metrics, reducer metadata, diagnostics, tidy metric records, cached interpretation payloads, cached interpretation records, and capability flags. It does not carry an embedding payload.

Evaluation can be narrowed to specific metric families:

from coco_pipe.dim_reduction.config import EvaluationConfig

config = EvaluationConfig(
    metrics=["trustworthiness", "continuity"],
    selection_metric="trustworthiness",
    selection_k=10,
    tie_breakers=["continuity"],
    separation_method="centroid",
)

Each reducer is scored directly:

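# reducers is assumed to be a collection of DimReduction instances built beforehand.
for reducer in reducers:
    embedding = reducer.fit_transform(X, y=labels)
    reducer.score(
        embedding,
        X=X,
        metrics=config.metrics,
        # k_range is not set in the config above; its default value is used here.
        k_values=config.k_range,
        separation_method=config.separation_method,
    )
    reducer.interpret(
        X,
        X_emb=embedding,
        analyses=["correlation"],
        feature_names=feature_names,
    )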

Then compare the scored reducers:

from coco_pipe.dim_reduction.evaluation import MethodSelector

selector = MethodSelector(reducers).collect()
ranked = selector.rank_methods(
    selection_metric=config.selection_metric,
    selection_k=config.selection_k,
    tie_breakers=config.tie_breakers,
)
best_name = ranked.iloc[0]["method"]
best = selector.reducers[best_name]
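
To make the ranking idea concrete, here is a minimal pandas sketch of ordering methods by a selection metric at a fixed k, with tie-breakers. It is not the MethodSelector implementation: the function rank_by_metric, the scope label "k", and all metric values are hypothetical, and it assumes higher values are better for every metric involved.

```python
import pandas as pd

# Hypothetical tidy metric records; the values are made up for illustration.
records = pd.DataFrame(
    [
        {"method": "PCA", "metric": "trustworthiness", "value": 0.91, "scope": "k", "scope_value": 10},
        {"method": "PCA", "metric": "continuity", "value": 0.88, "scope": "k", "scope_value": 10},
        {"method": "UMAP", "metric": "trustworthiness", "value": 0.91, "scope": "k", "scope_value": 10},
        {"method": "UMAP", "metric": "continuity", "value": 0.93, "scope": "k", "scope_value": 10},
        {"method": "Isomap", "metric": "trustworthiness", "value": 0.85, "scope": "k", "scope_value": 10},
        {"method": "Isomap", "metric": "continuity", "value": 0.90, "scope": "k", "scope_value": 10},
    ]
)


def rank_by_metric(records, selection_metric, selection_k, tie_breakers=()):
    """Rank methods by selection_metric at selection_k, then by tie_breakers.

    Assumes higher is better for every metric used in the ordering.
    """
    at_k = records[records["scope_value"] == selection_k]
    # One row per method, one column per metric.
    wide = at_k.pivot_table(index="method", columns="metric", values="value")
    order = [selection_metric, *tie_breakers]
    return wide.sort_values(order, ascending=False).reset_index()


ranked = rank_by_metric(records, "trustworthiness", 10, ["continuity"])
best_name = ranked.iloc[0]["method"]
```

In this toy data PCA and UMAP tie on trustworthiness, so the continuity tie-breaker decides the order.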

When trajectory labels are available, separation_method is passed through during score() to trajectory_separation(..., method=...) for evaluator-level separation summaries.

Feature interpretation is separate from preservation scoring:

score() evaluates whether the embedding preserves structure
interpret() evaluates which input features appear to drive the embedding
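
The distinction can be illustrated outside coco_pipe with scikit-learn and NumPy. The sketch below uses sklearn.manifold.trustworthiness as a stand-in for a preservation metric and a plain Pearson correlation between input features and embedding dimensions as a stand-in for a correlation analysis. The data is random and the variable names are illustrative; this is not how score() or interpret() are implemented.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # synthetic data, for illustration only
embedding = PCA(n_components=2, random_state=0).fit_transform(X)

# Preservation: are each point's embedding neighbors also neighbors in X?
trust = trustworthiness(X, embedding, n_neighbors=10)

# Interpretation: Pearson correlation of every input feature with every
# embedding dimension. np.corrcoef stacks the columns of both arrays, so the
# block of interest is the top-right (n_features x n_components) corner.
n_features = X.shape[1]
full_corr = np.corrcoef(X, embedding, rowvar=False)
feature_corr = full_corr[:n_features, n_features:]
```

The first result is a single number describing the embedding as a whole; the second is a feature-by-dimension matrix describing what the embedding axes relate to.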

interpret() delegates to the pure backend coco_pipe.dim_reduction.analysis.interpret_features(...) and currently supports:

correlation
perturbation
gradient

Custom Reducers¶

BaseReducer is a supported extension point and is re-exported from coco_pipe.dim_reduction.

from sklearn.decomposition import PCA

from coco_pipe.dim_reduction import BaseReducer


class CustomPCAReducer(BaseReducer):
    @property
    def capabilities(self):
        caps = super().capabilities
        caps.update({"is_linear": True, "has_components": True})
        return caps

    def fit(self, X, y=None):
        self.model = PCA(n_components=self.n_components, **self.params)
        self.model.fit(X)
        return self

    def transform(self, X):
        return self.model.transform(X)

For reducers with nonstandard inputs, override capabilities to declare the expected input_ndim and input_layout.

If a reducer depends on heavy optional libraries, keep those imports inside fit() / transform() paths. The helper coco_pipe.utils.import_optional_dependency(...) exists for built-in reducers and custom advanced integrations, but it is not the main public entry point.
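
A minimal sketch of the lazy-import pattern, using only the standard library. The helper lazy_import and the class MeanCenterSketch are hypothetical and are not coco_pipe's import_optional_dependency; the statistics module stands in for a heavy optional library.

```python
import importlib


def lazy_import(module_name, extra=None):
    """Import module_name on demand and raise a helpful error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        hint = f" Install it with: pip install coco-pipe[{extra}]" if extra else ""
        raise ImportError(
            f"Optional dependency '{module_name}' is required.{hint}"
        ) from exc


class MeanCenterSketch:
    """Toy transformer whose dependency is imported only when fit() runs."""

    def fit(self, rows):
        # Stand-in for a heavy optional library; nothing is imported until here.
        stats = lazy_import("statistics")
        self.means_ = [stats.fmean(col) for col in zip(*rows)]
        return self

    def transform(self, rows):
        return [[v - m for v, m in zip(row, self.means_)] for row in rows]


model = MeanCenterSketch().fit([[1.0, 2.0], [3.0, 4.0]])
centered = model.transform([[1.0, 2.0]])
```

Importing the module that defines such a class costs nothing extra; the dependency is only resolved, and only fails, when fit() is actually called.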

Supported Metric Shapes¶

Metric plotting and reporting now work from tidy observations with these columns:

method
metric
value
scope
scope_value

Optional columns such as group, condition, pair, subject, session, seed, and fold are preserved when present.
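
As an illustration of this shape, the following pandas sketch builds a small tidy frame with the five required columns plus an optional seed column, checks that the required columns are present, and averages over seeds. The scope label "k" and all values are made up for the example.

```python
import pandas as pd

REQUIRED_COLUMNS = ["method", "metric", "value", "scope", "scope_value"]

# Hypothetical observations: one metric, two k values, two seeds each.
records = pd.DataFrame(
    [
        {"method": "PCA", "metric": "trustworthiness", "value": 0.90, "scope": "k", "scope_value": 5, "seed": 0},
        {"method": "PCA", "metric": "trustworthiness", "value": 0.92, "scope": "k", "scope_value": 5, "seed": 1},
        {"method": "PCA", "metric": "trustworthiness", "value": 0.88, "scope": "k", "scope_value": 10, "seed": 0},
        {"method": "PCA", "metric": "trustworthiness", "value": 0.90, "scope": "k", "scope_value": 10, "seed": 1},
    ]
)

# Validate the tidy contract before plotting or reporting.
missing = [col for col in REQUIRED_COLUMNS if col not in records.columns]

# Collapse the optional seed column into a mean per method, metric, and k.
summary = records.groupby(
    ["method", "metric", "scope_value"], as_index=False
)["value"].mean()
```

Because each row is a single observation, repeated runs are just additional rows, and aggregation stays an explicit step.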

Metric Plot Types¶

Use plot_metrics(..., plot_type=...) or the report comparison helpers. Embedding visualizations are also external to DimReduction; pass the explicit embedding array to the plotting function you need.

grouped_bar: one scalar per method/metric
box / boxen: repeated observations
violin: dense repeated observations
raincloud: violin + box + points
strip / swarm: small repeated samples
heatmap: method x metric or method x scope
line: metric sweeps over k, time, or windows
dumbbell: direct two-method deltas

Default behavior:

global scalars -> grouped bars
repeated observations -> raincloud
varying scope_value -> line
explicit matrix summaries -> heatmap
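
The defaults above can be read as a small decision rule. The sketch below is one hypothetical way to express it over a list of record dictionaries; guess_plot_type is not a coco_pipe function, and the order of the checks is an assumption. The heatmap case is left out because it depends on the caller explicitly passing a matrix summary.

```python
from collections import Counter


def guess_plot_type(records):
    """Pick a default plot type from tidy metric records (illustrative only)."""
    # Varying scope_value (for example a sweep over k) suggests a line plot.
    scope_values = {r.get("scope_value") for r in records}
    if len(scope_values) > 1:
        return "line"
    # Several observations per (method, metric) suggest a distribution plot.
    counts = Counter((r["method"], r["metric"]) for r in records)
    if any(n > 1 for n in counts.values()):
        return "raincloud"
    # Otherwise there is one scalar per method and metric.
    return "grouped_bar"


scalars = [
    {"method": "PCA", "metric": "trustworthiness", "value": 0.90, "scope": "global", "scope_value": None},
    {"method": "UMAP", "metric": "trustworthiness", "value": 0.93, "scope": "global", "scope_value": None},
]
repeated = scalars + [
    {"method": "PCA", "metric": "trustworthiness", "value": 0.91, "scope": "global", "scope_value": None},
]
sweep = [
    {"method": "PCA", "metric": "trustworthiness", "value": 0.90, "scope": "k", "scope_value": 5},
    {"method": "PCA", "metric": "trustworthiness", "value": 0.88, "scope": "k", "scope_value": 10},
]
```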

Visualization Entry Points¶

The dim-reduction viz surface is data-first and explicit. Plotting helpers do not read manager-owned embedding or context state.

plot_embedding(embedding, labels=..., metadata=...)
plot_metrics(metric_records, metric=..., scope=..., method=...)
plot_shepard_diagram(X, embedding, distances=...)
plot_trajectory(trajectories, times=..., labels=..., values=...)
plot_trajectory_metric_series(series, times=..., labels=...)
plot_feature_importance(scores_or_records, analysis=..., method=..., dimension=...)
plot_feature_correlation_heatmap(correlation_payload, method=...)
plot_interpretation(interpretation_payload, analysis=..., method=..., dimension=...)

plot_trajectory(...) and plot_trajectory_metric_series(...) require native trajectory tensors or explicit time-series arrays. They do not reshape flat 2D embeddings or infer grouping metadata.

Generic Trajectories¶

Trajectory scoring is not EEG-specific. Any grouped or ordered embedding can use trajectory-native metrics, provided the embedding is already a 3D tensor of shape (trajectory, time, dim).

Trajectory reshaping or unstacking must happen upstream. The evaluation module does not reconstruct 3D trajectories from flat 2D embeddings.

Trajectory outputs include:

trajectory_speed_mean
trajectory_speed_peak
trajectory_acceleration_mean
trajectory_acceleration_peak
trajectory_curvature_mean
trajectory_curvature_peak
trajectory_turning_angle_mean
trajectory_turning_angle_peak
trajectory_dispersion_mean
trajectory_dispersion_peak
trajectory_path_length_final
trajectory_displacement_final
trajectory_tortuosity_final
pairwise separation AUC / peak summaries when labels exist per trajectory

Detailed timecourses are cached under diagnostics_.

trajectory_dispersion in the evaluation pipeline is currently the global, unlabeled dispersion over all trajectories. This is narrower than the lower-level geometry.py primitive, which can also compute label-conditioned dispersion. Trajectory labels are only used automatically for trajectory_separation.

Trajectory metrics are descriptive outputs for plotting and reporting. They are not used as automatic method-selection metrics by default.
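
For orientation, the NumPy sketch below computes a few of these quantities from a 3D tensor using common textbook definitions: speed as step length per unit time, path length as the sum of step lengths, displacement as the start-to-end distance, and tortuosity as path length divided by displacement. These definitions, the function name, and the output keys are assumptions for illustration and may differ from what the evaluation module computes.

```python
import numpy as np


def trajectory_summaries(emb, dt=1.0):
    """Per-trajectory summaries for emb of shape (n_trajectories, n_times, n_dims)."""
    emb = np.asarray(emb, dtype=float)
    if emb.ndim != 3:
        raise ValueError("expected shape (n_trajectories, n_times, n_dims)")

    steps = np.diff(emb, axis=1)               # (n_traj, n_times - 1, n_dims)
    step_len = np.linalg.norm(steps, axis=-1)  # (n_traj, n_times - 1)
    speed = step_len / dt

    path_length = step_len.sum(axis=1)
    displacement = np.linalg.norm(emb[:, -1] - emb[:, 0], axis=-1)
    # Tortuosity is undefined when a trajectory ends where it started.
    with np.errstate(divide="ignore", invalid="ignore"):
        tortuosity = np.where(displacement > 0, path_length / displacement, np.nan)

    return {
        "speed_mean": speed.mean(axis=1),
        "speed_peak": speed.max(axis=1),
        "path_length": path_length,
        "displacement": displacement,
        "tortuosity": tortuosity,
    }


# Two toy trajectories: a straight line and an L-shaped path.
toy = np.array(
    [
        [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
        [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]],
    ]
)
out = trajectory_summaries(toy)
```

The straight line has tortuosity 1, while the L-shaped path travels the same distance for a shorter net displacement and so scores higher.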

Reports¶

Report.add_reduction() consumes get_summary() when available and accepts an explicit embedding payload when the section should render an embedding or trajectory plot.

It can render:

interactive embeddings when X_emb is provided explicitly
trajectory plots for 3D embeddings
scalar metric tables and charts
loss and scree diagnostics
co-ranking heatmaps
trajectory metric timecourses
interpretation plots from interpretation / interpretation_records

Report.add_comparison() accepts tidy metric frames or MethodSelector instances directly.

from_reductions(...) follows the same rule: pass embeddings=[...] explicitly when the report should include embedding or trajectory plots.

Shepard plots and comparison/report views reuse cached diagnostics such as shepard_distances_ and coranking_matrix_ when those artifacts already exist.
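
As background on what a Shepard diagram compares, the NumPy sketch below computes pairwise distances in the input space and in the embedding and correlates them. It only illustrates the concept: the helper name is hypothetical, the data is random, and the actual contents of shepard_distances_ are not specified here.

```python
import numpy as np


def pairwise_distances_condensed(A):
    """Euclidean distances for all unordered pairs of rows in A."""
    A = np.asarray(A, dtype=float)
    diff = A[:, None, :] - A[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(A), k=1)  # each pair once, no diagonal
    return dist[upper]


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))  # synthetic data
embedding = X[:, :2]          # stand-in embedding: keep the first two columns

d_high = pairwise_distances_condensed(X)
d_low = pairwise_distances_condensed(embedding)

# A Shepard diagram scatters d_low against d_high; their correlation is a
# common one-number summary of how well distances are preserved.
shepard_corr = np.corrcoef(d_high, d_low)[0, 1]
```

Computing these distances scales quadratically with the number of samples, which is why reusing a cached copy is worthwhile.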

End-to-End Execution¶

Batch execution should use coco_pipe.io.load_data plus DimReduction directly. The old DimReductionPipeline compatibility wrapper has been removed.

IO Workflow¶

The recommended IO-to-dim-reduction flow is explicit:

from coco_pipe.dim_reduction import DimReduction
from coco_pipe.io import load_data

container = load_data("data.csv", mode="tabular", target_col="label", sep=",")

X = container.X
labels = container.y

reducer = DimReduction("UMAP", n_components=2, random_state=42)
embedding = reducer.fit_transform(X, y=labels)
scores = reducer.score(embedding, X=X, labels=labels)

Use DataContainer to inspect and reshape data upstream:

container.dims
container.coords
container.flatten(...)
container.stack(...)
container.unstack(...)

For example:

container = load_data("embeddings/", mode="embedding")
container_2d = container.flatten(preserve="obs")

X = container_2d.X
embedding = reducer.fit_transform(X)

Trajectory scoring requires embeddings that are already shaped as (n_trajectories, n_times, n_dims). The evaluation module does not reconstruct 3D trajectories from flat 2D arrays. Any reshaping or unstacking must happen upstream through IO or explicit user code before calling score().
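
A minimal sketch of that upstream reshaping with NumPy. It assumes the flat rows are ordered trajectory by trajectory, with every trajectory having the same number of time points; both assumptions, and the variable names, are illustrative.

```python
import numpy as np

n_trajectories, n_times, n_dims = 4, 50, 2

# Flat 2D embedding with n_trajectories * n_times rows, as a reducer would
# return it for stacked observations (synthetic values here).
flat = np.arange(n_trajectories * n_times * n_dims, dtype=float).reshape(-1, n_dims)

if flat.shape[0] != n_trajectories * n_times:
    raise ValueError("row count does not match n_trajectories * n_times")

# Rows 0..49 become trajectory 0, rows 50..99 become trajectory 1, and so on.
trajectories = flat.reshape(n_trajectories, n_times, n_dims)
```

If the rows are interleaved or trajectories have unequal lengths, a plain reshape silently produces wrong groupings, so sort or pad the data explicitly before this step.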

coco_pipe.io.utils remains available for lower-level and advanced workflows, but it is not the recommended starting point for dim reduction. Prefer load_data(...), DataContainer, and explicit reshaping operations first.

Dependency Notes¶

Heavy optional libraries such as torch, umap, meegkit, and pydmd are loaded inside reducer methods rather than at package import time.

For a complete dim-reduction install, use the umbrella extra:

pip install coco-pipe[dim-red]

Selective extras remain available when you only need part of the reducer stack:

pip install coco-pipe[dask]
pip install coco-pipe[neighbor]
pip install coco-pipe[parametric-umap]
pip install coco-pipe[ivis]
pip install coco-pipe[topology]
pip install coco-pipe[spatiotemporal]
pip install coco-pipe[eeg]

The neighbor and dim-red extras include faiss-cpu, so Pacmap can use nn_backend="faiss" by default on supported platforms.

Base imports that should remain lightweight:

import coco_pipe.io
import coco_pipe.report
import coco_pipe.dim_reduction

Migration Notes¶

Legacy local note files under coco_pipe/dim_reduction/ were consolidated into this page and can be removed once no longer referenced.