Dim Reduction Workflows
Current State
The dim-reduction stack is centered on the existing DimReduction manager and
reducer contracts.
Optional dependencies are now imported lazily instead of at package import time.
DimReduction caches normalized state in the following attributes:
metrics_
metric_records_
quality_metadata_
diagnostics_
interpretation_
interpretation_records_
Plotting is done through coco_pipe.viz.dim_reduction, not through manager methods on DimReduction.
DimReduction does not cache embeddings. Embeddings are returned explicitly from transform() and fit_transform() and must be passed explicitly to score(), plotting, or report-building paths that need them.
coco_pipe.dim_reduction.evaluation.core is the canonical evaluation backend behind DimReduction.score().
MethodSelector is now a post-hoc comparison layer over already-scored DimReduction objects and exposes tidy metric observations via to_frame().
Core Interfaces
Use DimReduction directly for most workflows:
from coco_pipe.dim_reduction import DimReduction
# X, labels, timepoints, and feature_names are assumed to be defined upstream;
# X is a (n_samples, n_features) array.
reducer = DimReduction("PCA", n_components=2, random_state=42)
embedding = reducer.fit_transform(X, y=labels)
scores = reducer.score(embedding, X=X, labels=labels, times=timepoints)
interpretation = reducer.interpret(
    X,
    X_emb=embedding,
    analyses=["correlation"],
    feature_names=feature_names,
)
summary = reducer.get_summary()
DimReduction.get_summary() returns cached scalar metrics, reducer metadata,
diagnostics, tidy metric records, cached interpretation payloads, cached
interpretation records, and capability flags. It does not carry an embedding
payload.
Evaluation can be narrowed to specific metric families:
from coco_pipe.dim_reduction.config import EvaluationConfig
config = EvaluationConfig(
metrics=["trustworthiness", "continuity"],
selection_metric="trustworthiness",
selection_k=10,
tie_breakers=["continuity"],
separation_method="centroid",
)
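For intuition about the two metrics configured above, here is a minimal NumPy sketch of the standard rank-based trustworthiness and continuity definitions at a single neighborhood size k. It is not coco_pipe's evaluator. The function names and the brute-force distance computation are assumptions made for illustration, and the real implementation may normalize or break ties differently.

```python
import numpy as np


def _pairwise(Z):
    """Dense Euclidean distance matrix for the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def _rank_matrix(D):
    """ranks[i, j] = neighbor rank of j around i (1 = nearest)."""
    n = D.shape[0]
    D = np.array(D, dtype=float)
    np.fill_diagonal(D, np.inf)  # self sorts last, so it gets rank n
    order = np.argsort(D, axis=1, kind="stable")
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    return ranks  # self (rank n) is never inside a k-neighborhood for k < n


def _penalty(ranks_ref, ranks_other, k):
    """Sum of (rank_ref - k) over points in other's k-NN but outside ref's k-NN."""
    intruders = (ranks_other <= k) & (ranks_ref > k)
    return (ranks_ref[intruders] - k).sum()


def trustworthiness(X, X_emb, k):
    """Penalizes embedding neighbors that were not neighbors in the input space."""
    n = X.shape[0]
    r_high, r_low = _rank_matrix(_pairwise(X)), _rank_matrix(_pairwise(X_emb))
    norm = 2.0 / (n * k * (2 * n - 3 * k - 1))
    return 1.0 - norm * _penalty(r_high, r_low, k)


def continuity(X, X_emb, k):
    """Penalizes input-space neighbors that were lost in the embedding."""
    n = X.shape[0]
    r_high, r_low = _rank_matrix(_pairwise(X)), _rank_matrix(_pairwise(X_emb))
    norm = 2.0 / (n * k * (2 * n - 3 * k - 1))
    return 1.0 - norm * _penalty(r_low, r_high, k)
```

Under these assumptions, trustworthiness(X, embedding, k=10) corresponds to the selection_k=10 setting above. If scikit-learn is installed, sklearn.manifold.trustworthiness(X, X_emb, n_neighbors=k) computes the same trustworthiness quantity and should agree up to tie handling.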
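Each reducer is then fitted, scored, and interpreted directly:
from coco_pipe.dim_reduction import DimReduction
# Example candidate list; any already-constructed DimReduction objects work.
reducers = [
    DimReduction("PCA", n_components=2, random_state=42),
    DimReduction("UMAP", n_components=2, random_state=42),
]
for reducer in reducers:
    embedding = reducer.fit_transform(X, y=labels)
    # Score with the narrowed metric set and neighborhood sizes from the config.
    reducer.score(
        embedding,
        X=X,
        metrics=config.metrics,
        k_values=config.k_range,
        separation_method=config.separation_method,
    )
    # Interpretation results are cached on the reducer alongside its scores.
    reducer.interpret(
        X,
        X_emb=embedding,
        analyses=["correlation"],
        feature_names=feature_names,
    )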
Then compare the scored reducers:
from coco_pipe.dim_reduction.evaluation import MethodSelector
selector = MethodSelector(reducers).collect()
ranked = selector.rank_methods(
selection_metric=config.selection_metric,
selection_k=config.selection_k,
tie_breakers=config.tie_breakers,
)
best_name = ranked.iloc[0]["method"]
best = selector.reducers[best_name]
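The ranking step can be pictured as a lexicographic sort. The sketch below is an assumption about what rank_methods does, not its implementation. It treats every metric as higher-is-better, which holds for trustworthiness and continuity, and it expects values already taken at selection_k. The rank_methods_sketch name, the nested-dict input, and the numbers are hypothetical placeholders.

```python
def rank_methods_sketch(scores, selection_metric, tie_breakers=()):
    """Order method names by selection_metric, then by each tie-breaker.

    scores maps method name -> {metric name: value at selection_k}.
    All metrics are assumed to be higher-is-better; missing values sort last.
    """
    keys = [selection_metric, *tie_breakers]

    def sort_key(name):
        row = scores[name]
        # Negate so larger values come first; a missing metric sorts after all values.
        return tuple(-row[m] if m in row else float("inf") for m in keys)

    return sorted(scores, key=sort_key)


# Made-up placeholder values, only to show how a tie is resolved.
scores = {
    "PCA": {"trustworthiness": 0.91, "continuity": 0.95},
    "UMAP": {"trustworthiness": 0.91, "continuity": 0.97},
    "TSNE": {"trustworthiness": 0.88, "continuity": 0.99},
}
ranking = rank_methods_sketch(scores, "trustworthiness", ["continuity"])
# ranking == ["UMAP", "PCA", "TSNE"]
```

PCA and UMAP tie on the primary metric, so continuity decides between them. TSNE's higher continuity never comes into play because its primary score is lower.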
When trajectory labels are available, score() forwards separation_method to
trajectory_separation(..., method=...), which produces the evaluator-level
separation summaries.
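To make the centroid option more concrete, here is one plausible reading of centroid-based trajectory separation: at each time point, take the Euclidean distance between the mean trajectories of two label groups. This is an assumption for illustration, not the definition used by trajectory_separation. The centroid_separation name is hypothetical, and the AUC and peak values it returns only mirror the kind of summaries the evaluator reports.

```python
import numpy as np


def centroid_separation(trajectories, labels, label_a, label_b, times=None):
    """Distance between two label groups' centroids at every time point.

    trajectories: array of shape (n_trajectories, n_times, n_dims)
    labels: one label per trajectory, shape (n_trajectories,)
    Returns (timecourse, auc, peak).
    """
    trajectories = np.asarray(trajectories, dtype=float)
    labels = np.asarray(labels)
    centroid_a = trajectories[labels == label_a].mean(axis=0)  # (n_times, n_dims)
    centroid_b = trajectories[labels == label_b].mean(axis=0)
    timecourse = np.linalg.norm(centroid_a - centroid_b, axis=-1)  # (n_times,)
    if times is None:
        times = np.arange(timecourse.shape[0], dtype=float)
    times = np.asarray(times, dtype=float)
    # Trapezoidal area under the separation curve.
    auc = float(np.sum((timecourse[1:] + timecourse[:-1]) * np.diff(times)) / 2.0)
    return timecourse, auc, float(timecourse.max())
```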
Feature interpretation is separate from preservation scoring:
score() evaluates whether the embedding preserves structure.
interpret() evaluates which input features appear to drive the embedding.
interpret() delegates to the pure backend function
coco_pipe.dim_reduction.analysis.interpret_features(...) and currently
supports these analyses:
correlation
perturbation
gradient
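The correlation analysis can be understood as a Pearson correlation between each input feature and each embedding dimension. The sketch below shows that idea with NumPy only. It is not interpret_features itself; the function name and the output layout (features as rows, embedding dimensions as columns) are assumptions.

```python
import numpy as np


def feature_embedding_correlation(X, X_emb):
    """Pearson r between every input feature and every embedding dimension.

    X: (n_samples, n_features); X_emb: (n_samples, n_components).
    Returns an array of shape (n_features, n_components).
    Constant columns yield NaN, since their correlation is undefined.
    """
    X = np.asarray(X, dtype=float)
    X_emb = np.asarray(X_emb, dtype=float)
    Xc = X - X.mean(axis=0)        # center each feature
    Ec = X_emb - X_emb.mean(axis=0)  # center each embedding dimension
    cov = Xc.T @ Ec  # (n_features, n_components), unnormalized covariance
    norms = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Ec, axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return cov / norms
```

Large absolute values flag features that move with a given embedding axis. Correlation only captures linear, marginal association, however.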
Custom Reducers
BaseReducer is a supported extension point and is re-exported from
coco_pipe.dim_reduction.
from sklearn.decomposition import PCA
from coco_pipe.dim_reduction import BaseReducer
class CustomPCAReducer(BaseReducer):
    @property
    def capabilities(self):
        # Copy the base capabilities so the defaults are not mutated in place.
        caps = dict(super().capabilities)
        caps.update({"is_linear": True, "has_components": True})
        return caps
    def fit(self, X, y=None):
        # n_components and params are provided by BaseReducer's constructor.
        self.model = PCA(n_components=self.n_components, **self.params)
        self.model.fit(X)
        return self
    def transform(self, X):
        return self.model.transform(X)
For reducers with nonstandard inputs, override capabilities to declare the
expected input_ndim and input_layout.
If a reducer depends on heavy optional libraries, keep those imports inside
fit() / transform() paths. The helper
coco_pipe.utils.import_optional_dependency(...) exists for built-in reducers
and custom advanced integrations, but it is not the main public entry point.
Supported Metric Shapes
Metric plotting and reporting now work from tidy observations with these columns:
method
metric
value
scope
scope_value
Optional columns such as group, condition, pair, subject, session,
seed, and fold are preserved when present.
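To illustrate that shape, the sketch below validates a list of row dicts against the required columns before they are handed to plotting or reporting. The helper name, the "global" and "k" scope labels, and the sample rows are hypothetical placeholders. Only the column names come from this page.

```python
REQUIRED_COLUMNS = ("method", "metric", "value", "scope", "scope_value")
OPTIONAL_COLUMNS = ("group", "condition", "pair", "subject", "session", "seed", "fold")


def check_tidy_records(records):
    """Raise ValueError if a row lacks a required column; return unrecognized columns."""
    unknown = set()
    for i, row in enumerate(records):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row {i} is missing required columns: {missing}")
        unknown.update(set(row) - set(REQUIRED_COLUMNS) - set(OPTIONAL_COLUMNS))
    return sorted(unknown)


# Placeholder rows: one global scalar and one observation from a k sweep.
records = [
    {"method": "PCA", "metric": "trustworthiness", "value": 0.9,
     "scope": "global", "scope_value": None},
    {"method": "PCA", "metric": "trustworthiness", "value": 0.8,
     "scope": "k", "scope_value": 10, "seed": 0},
]
```

With pandas, pd.DataFrame(records) turns these rows into a frame with one column per key.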
Metric Plot Types
Use plot_metrics(..., plot_type=...) or the report comparison helpers.
Embedding visualizations are also external to DimReduction; pass the explicit
embedding array to the plotting function you need.
grouped_bar: one scalar per method/metric
box/boxen: repeated observations
violin: dense repeated observations
raincloud: violin + box + points
strip/swarm: small repeated samples
heatmap: method x metric or method x scope
line: metric sweeps over k, time, or windows
dumbbell: direct two-method deltas
Default behavior:
global scalars -> grouped bars
repeated observations -> raincloud
varying scope_value -> line
explicit matrix summaries -> heatmap
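These defaults amount to a small decision rule over the tidy records. The function below is a hypothetical sketch of that rule, not the code behind plot_metrics. In particular, the order in which the rules are checked and the explicit matrix flag are assumptions.

```python
from collections import Counter


def default_plot_type(records, matrix=False):
    """Pick a plot type from tidy rows (dicts with method, metric, scope_value)."""
    if matrix:
        return "heatmap"  # explicit matrix summaries
    scope_values = {row.get("scope_value") for row in records}
    if len(scope_values) > 1:
        return "line"  # metric sweeps over k, time, or windows
    counts = Counter((row["method"], row["metric"]) for row in records)
    if counts and max(counts.values()) > 1:
        return "raincloud"  # repeated observations per method/metric
    return "grouped_bar"  # one scalar per method/metric
```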
Visualization Entry Points
The dim-reduction viz surface is data-first and explicit. Plotting helpers do not read manager-owned embedding or context state.
plot_embedding(embedding, labels=..., metadata=...)
plot_metrics(metric_records, metric=..., scope=..., method=...)
plot_shepard_diagram(X, embedding, distances=...)
plot_trajectory(trajectories, times=..., labels=..., values=...)
plot_trajectory_metric_series(series, times=..., labels=...)
plot_feature_importance(scores_or_records, analysis=..., method=..., dimension=...)
plot_feature_correlation_heatmap(correlation_payload, method=...)
plot_interpretation(interpretation_payload, analysis=..., method=..., dimension=...)
plot_trajectory(...) and plot_trajectory_metric_series(...) require native
trajectory tensors or explicit time-series arrays. They do not reshape flat 2D
embeddings or infer grouping metadata.
Generic Trajectories
Trajectory scoring is not EEG-specific. Any grouped or ordered embedding can use trajectory-native metrics, provided the embedding is already a 3D tensor of shape (trajectory, time, dim).
Trajectory reshaping or unstacking must happen upstream. The evaluation module does not reconstruct 3D trajectories from flat 2D embeddings.
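When a flat embedding was produced by stacking trajectories row by row, the upstream reshape is a single NumPy call. This sketch assumes that rows are ordered trajectory-major (all time points of trajectory 0, then all of trajectory 1, and so on) and that all trajectories have the same length. If your rows are ordered differently, reorder them before reshaping. The to_trajectories name is hypothetical.

```python
import numpy as np


def to_trajectories(flat_embedding, n_trajectories, n_times):
    """Reshape (n_trajectories * n_times, n_dims) -> (n_trajectories, n_times, n_dims).

    Assumes trajectory-major row order and equal-length trajectories.
    """
    flat_embedding = np.asarray(flat_embedding)
    n_rows, n_dims = flat_embedding.shape
    if n_rows != n_trajectories * n_times:
        raise ValueError(f"expected {n_trajectories * n_times} rows, got {n_rows}")
    return flat_embedding.reshape(n_trajectories, n_times, n_dims)
```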
Trajectory outputs include:
trajectory_speed_mean
trajectory_speed_peak
trajectory_acceleration_mean
trajectory_acceleration_peak
trajectory_curvature_mean
trajectory_curvature_peak
trajectory_turning_angle_mean
trajectory_turning_angle_peak
trajectory_dispersion_mean
trajectory_dispersion_peak
trajectory_path_length_final
trajectory_displacement_final
trajectory_tortuosity_final
pairwise separation AUC / peak summaries when labels exist per trajectory
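For intuition about a few of these outputs, the sketch below uses finite differences to compute per-trajectory speed, path length, displacement, and tortuosity from a (n_trajectories, n_times, n_dims) tensor. These are common textbook definitions assumed for illustration. The evaluator may differ in details such as time spacing, aggregation across trajectories, or how the _mean and _peak summaries are taken.

```python
import numpy as np


def trajectory_kinematics(traj, dt=1.0):
    """Per-trajectory summaries for an array of shape (n_traj, n_times, n_dims)."""
    traj = np.asarray(traj, dtype=float)
    steps = np.diff(traj, axis=1)                 # (n_traj, n_times - 1, n_dims)
    step_len = np.linalg.norm(steps, axis=-1)     # distance covered per step
    speed = step_len / dt
    path_length = step_len.sum(axis=1)            # total distance travelled
    displacement = np.linalg.norm(traj[:, -1] - traj[:, 0], axis=-1)  # start to end
    with np.errstate(invalid="ignore", divide="ignore"):
        # 1.0 for a straight path; inf/NaN when start and end coincide.
        tortuosity = path_length / displacement
    return {
        "speed_mean": speed.mean(axis=1),
        "speed_peak": speed.max(axis=1),
        "path_length_final": path_length,
        "displacement_final": displacement,
        "tortuosity_final": tortuosity,
    }
```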
Detailed timecourses are cached under diagnostics_.
trajectory_dispersion in the evaluation pipeline is currently the global,
unlabeled dispersion over all trajectories. This is narrower than the lower-level
geometry.py primitive, which can also compute label-conditioned dispersion.
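The difference between the two dispersion variants can be sketched by defining dispersion at each time point as the mean Euclidean distance of trajectories from their centroid. That definition and the function names are assumptions made for illustration; they are not the geometry.py primitive.

```python
import numpy as np


def global_dispersion(traj):
    """Mean distance to the all-trajectory centroid, per time point -> (n_times,)."""
    traj = np.asarray(traj, dtype=float)
    centroid = traj.mean(axis=0, keepdims=True)  # (1, n_times, n_dims)
    return np.linalg.norm(traj - centroid, axis=-1).mean(axis=0)


def label_conditioned_dispersion(traj, labels):
    """The same quantity computed within each label group -> {label: (n_times,)}."""
    traj = np.asarray(traj, dtype=float)
    labels = np.asarray(labels)
    return {lab: global_dispersion(traj[labels == lab]) for lab in np.unique(labels)}
```

Pooling labels mixes between-group distance into the spread. With two well-separated groups, for example, the global value exceeds either within-group value.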
Trajectory labels are only used automatically for trajectory_separation.
Trajectory metrics are descriptive outputs for plotting and reporting. They are not used as automatic method-selection metrics by default.
Reports
Report.add_reduction() consumes get_summary() when available and accepts an
explicit embedding payload when the section should render an embedding or
trajectory plot.
It can render:
interactive embeddings when X_emb is provided explicitly
trajectory plots for 3D embeddings
scalar metric tables and charts
loss and scree diagnostics
co-ranking heatmaps
trajectory metric timecourses
interpretation plots from interpretation / interpretation_records
Report.add_comparison() accepts tidy metric frames or MethodSelector
instances directly.
from_reductions(...) follows the same rule: pass embeddings=[...] explicitly
when the report should include embedding or trajectory plots.
Shepard plots and comparison/report views reuse cached diagnostics such as
shepard_distances_ and coranking_matrix_ when those artifacts already exist.
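For reference, a co-ranking matrix can be built from two square distance matrices. For every ordered pair of points, it records the neighbor rank in the input space against the neighbor rank in the embedding. This NumPy sketch follows the standard construction. It is not the code that fills coranking_matrix_, and its orientation (rows indexed by input-space rank) is an assumption.

```python
import numpy as np


def _neighbor_ranks(D):
    """ranks[i, j] = rank of j among i's neighbors (1 = nearest); self gets 0."""
    n = D.shape[0]
    D = np.array(D, dtype=float)
    np.fill_diagonal(D, -np.inf)  # self sorts first ...
    order = np.argsort(D, axis=1, kind="stable")
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(n)  # ... and gets rank 0
    return ranks


def coranking_matrix(D_high, D_low):
    """Q[k-1, l-1] = number of (i, j) pairs with input rank k and embedding rank l."""
    r_high, r_low = _neighbor_ranks(D_high), _neighbor_ranks(D_low)
    n = r_high.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # drop the i == j pairs
    Q = np.zeros((n - 1, n - 1), dtype=int)
    np.add.at(Q, (r_high[off_diag] - 1, r_low[off_diag] - 1), 1)
    return Q
```

In this orientation, diagonal entries count pairs whose neighbor rank is preserved exactly. Entries below the diagonal are intrusions (closer in the embedding than in the input), and entries above it are extrusions.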
End-to-End Execution
Batch execution should use coco_pipe.io.load_data plus DimReduction
directly. The old DimReductionPipeline compatibility wrapper has been
removed.
IO Workflow
The recommended IO-to-dim-reduction flow is explicit:
from coco_pipe.dim_reduction import DimReduction
from coco_pipe.io import load_data
container = load_data("data.csv", mode="tabular", target_col="label", sep=",")
X = container.X
labels = container.y
reducer = DimReduction("UMAP", n_components=2, random_state=42)
embedding = reducer.fit_transform(X, y=labels)
scores = reducer.score(embedding, X=X, labels=labels)
Use DataContainer to inspect and reshape data upstream:
container.dims
container.coords
container.flatten(...)
container.stack(...)
container.unstack(...)
For example:
container = load_data("embeddings/", mode="embedding")
container_2d = container.flatten(preserve="obs")
X = container_2d.X
embedding = reducer.fit_transform(X)
Trajectory scoring requires embeddings that are already shaped as
(n_trajectories, n_times, n_dims). The evaluation module does not reconstruct
3D trajectories from flat 2D arrays. Any reshaping or unstacking must happen
upstream through IO or explicit user code before calling score().
coco_pipe.io.utils remains available for lower-level and advanced workflows,
but it is not the recommended starting point for dim reduction. Prefer
load_data(...), DataContainer, and explicit reshaping operations first.
Dependency Notes
Heavy optional libraries such as torch, umap, meegkit, and pydmd are
loaded inside reducer methods rather than at package import time.
For a complete dim-reduction install, use the umbrella extra:
pip install coco-pipe[dim-red]
Selective extras remain available when you only need part of the reducer stack:
pip install coco-pipe[dask]
pip install coco-pipe[neighbor]
pip install coco-pipe[parametric-umap]
pip install coco-pipe[ivis]
pip install coco-pipe[topology]
pip install coco-pipe[spatiotemporal]
pip install coco-pipe[eeg]
The neighbor and dim-red extras include faiss-cpu, so Pacmap can use
nn_backend="faiss" by default on supported platforms.
Base imports that should remain lightweight:
import coco_pipe.io
import coco_pipe.report
import coco_pipe.dim_reduction
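One way to check this guarantee is to import a package in a fresh interpreter and then inspect sys.modules. The helper below is a hypothetical sketch built on the standard library (subprocess, sys, json). It is not part of coco_pipe or its test suite.

```python
import json
import subprocess
import sys


def heavy_modules_loaded(package, heavy=("torch", "umap", "meegkit", "pydmd")):
    """Import package in a clean interpreter; return which heavy modules got loaded."""
    script = (
        "import importlib, json, sys\n"
        f"importlib.import_module({package!r})\n"
        f"print(json.dumps([m for m in {list(heavy)!r} if m in sys.modules]))\n"
    )
    # check=True raises CalledProcessError if the import itself fails.
    result = subprocess.run(
        [sys.executable, "-c", script], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

For example, if the lazy-import design holds, heavy_modules_loaded("coco_pipe.dim_reduction") should return an empty list.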
Migration Notes
Legacy local note files under coco_pipe/dim_reduction/ were consolidated into
this page and can be removed once no longer referenced.