coco_pipe.dim_reduction.reducers.neighbor

Neighbor-embedding and graph-based reducers.

This module provides wrappers for neighborhood-preserving and graph-based nonlinear dimensionality reduction methods, including t-SNE, UMAP, PaCMAP, TriMap, PHATE, and Parametric UMAP.

Classes

TSNEReducer

t-Distributed Stochastic Neighbor Embedding wrapper.

UMAPReducer

Uniform Manifold Approximation and Projection wrapper.

PacmapReducer

Pairwise Controlled Manifold Approximation wrapper.

TrimapReducer

Triplet-based manifold embedding wrapper.

PHATEReducer

Diffusion-based PHATE embedding wrapper.

ParametricUMAPReducer

Neural-network-backed Parametric UMAP wrapper.

Authors

Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)

Sina Esmaeili (sina.esmaeili@umontreal.ca)

Module Contents

class coco_pipe.dim_reduction.reducers.neighbor.TSNEReducer(n_components: int = 2, **kwargs)

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

t-SNE reducer.

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a neighborhood-preserving method designed primarily for visualization. It optimizes a low-dimensional embedding by matching pairwise similarities between the original space and the embedding.
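The matching of pairwise similarities can be illustrated with the standard t-SNE objective: a Kullback-Leibler divergence between high-dimensional similarities P and Student-t similarities Q computed from the embedding. The sketch below is a toy illustration, not the scikit-learn implementation. The matrix P is written by hand (the perplexity-based construction is omitted), and the helper names are made up for this example.

```python
import math


def student_t_similarities(Y):
    """Normalized Student-t (one degree of freedom) similarities q_ij for an embedding Y."""
    n = len(Y)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] = 1.0 / (1.0 + math.dist(Y[i], Y[j]) ** 2)
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]


def kl_divergence(P, Q):
    """KL(P || Q), summed over all pairs with p_ij > 0."""
    return sum(
        p * math.log(p / q)
        for row_p, row_q in zip(P, Q)
        for p, q in zip(row_p, row_q)
        if p > 0.0
    )


# Toy P: points 0-1 and 2-3 form similar pairs; all entries sum to 1.
hi, lo = 0.2, 0.025
P = [
    [0.0, hi, lo, lo],
    [hi, 0.0, lo, lo],
    [lo, lo, 0.0, hi],
    [lo, lo, hi, 0.0],
]
good = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]  # similar pairs stay close
bad = [(0.0, 0.0), (10.0, 0.0), (0.0, 1.0), (10.0, 1.0)]   # similar pairs pulled apart

kl_good = kl_divergence(P, student_t_similarities(good))
kl_bad = kl_divergence(P, student_t_similarities(bad))
print(kl_good < kl_bad)
```

A lower value means the embedding reproduces the neighborhood structure in P more faithfully. scikit-learn exposes the final value of this objective as the kl_divergence_ attribute of a fitted TSNE estimator.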

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • **kwargs (dict) – Additional keyword arguments forwarded to sklearn.manifold.TSNE after signature filtering. Common options include perplexity, learning_rate, max_iter, init, and random_state.
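"Signature filtering" is not spelled out on this page. One plausible reading is that keyword arguments the backend constructor does not accept are dropped before the call. The sketch below shows that idea with a hypothetical filter_kwargs helper and a stand-in Backend class; neither is part of coco_pipe, and the real filtering logic may differ.

```python
import inspect


def filter_kwargs(callable_obj, kwargs):
    """Keep only the keyword arguments that callable_obj accepts (hypothetical helper)."""
    params = inspect.signature(callable_obj).parameters
    # If the callable itself takes **kwargs, everything can be forwarded.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


class Backend:
    """Stand-in for a third-party estimator such as sklearn.manifold.TSNE."""

    def __init__(self, n_components=2, perplexity=30.0, random_state=None):
        self.n_components = n_components
        self.perplexity = perplexity
        self.random_state = random_state


requested = {"perplexity": 20, "random_state": 42, "nn_backend": "faiss"}
accepted = filter_kwargs(Backend, requested)  # "nn_backend" is dropped
model = Backend(n_components=2, **accepted)
print(sorted(accepted))
```

Under this reading, a misspelled option would be silently ignored rather than raising a TypeError, so it is worth checking the backend's parameter names when an option appears to have no effect.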

embedding_

Learned training-set embedding after fit or fit_transform.

Type:

np.ndarray or None

model

Fitted t-SNE estimator after fit or fit_transform.

Type:

sklearn.manifold.TSNE or None

Notes

transform is not supported because scikit-learn t-SNE does not provide an out-of-sample projection API.

See also

UMAPReducer

Nonlinear graph-based embedding with transform support.

PacmapReducer

Nonlinear embedding balancing local and global structure.

TrimapReducer

Nonlinear triplet-based embedding preserving global layout.

PHATEReducer

Diffusion-based embedding for continuous trajectories.

PCAReducer

Linear baseline for global variance preservation.

IsomapReducer

Nonlinear geodesic-distance manifold embedding.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import TSNEReducer
>>> X = np.random.rand(100, 10)
>>> reducer = TSNEReducer(n_components=2, perplexity=20, random_state=42)
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(100, 2)
>>> reducer.get_quality_metadata()["kl_divergence_"] >= 0
True
>>> _ = reducer.fit(X)
>>> reducer.embedding_.shape
(100, 2)
property capabilities: dict

Return capability metadata for t-SNE.

Returns:

Capability mapping describing t-SNE as a nonlinear stochastic reducer without out-of-sample transform support.

Return type:

dict
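The key names of the capability mapping are not documented on this page. Assuming a boolean entry that records transform support (called has_transform here purely for illustration), a caller could select reducers that can project new samples. The values below follow the descriptions given for each class in this module.

```python
# Hypothetical capability mappings: the key name "has_transform" is an
# assumption, while the True/False values mirror this page's descriptions.
CAPABILITIES = {
    "TSNEReducer": {"has_transform": False},
    "UMAPReducer": {"has_transform": True},
    "PacmapReducer": {"has_transform": False},
    "TrimapReducer": {"has_transform": False},
    "PHATEReducer": {"has_transform": True},
    "ParametricUMAPReducer": {"has_transform": True},
}


def reducers_with(capabilities, **required):
    """Names of reducers whose capability mapping matches every required entry."""
    return [
        name
        for name, caps in capabilities.items()
        if all(caps.get(key) == value for key, value in required.items())
    ]


projectable = reducers_with(CAPABILITIES, has_transform=True)
print(projectable)
```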

embedding_ = None
fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> TSNEReducer

Fit t-SNE on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

TSNEReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import TSNEReducer
>>> X = np.random.rand(30, 6)
>>> reducer = TSNEReducer(n_components=2, perplexity=5, max_iter=250)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
abstract transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

Raise because t-SNE does not support out-of-sample transformation.

Parameters:

X (ArrayLike) – Ignored input included for API compatibility.

Raises:

NotImplementedError – Always raised because t-SNE does not support transforming new data.
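Because transform always raises, code that must embed new samples has to refit. A minimal sketch of that fallback, using a stand-in class rather than the real TSNEReducer so that it runs without any backend installed; the embed_all helper is hypothetical:

```python
class FitOnlyReducer:
    """Stand-in for a reducer without out-of-sample support (not coco_pipe code)."""

    def fit_transform(self, X):
        # Placeholder embedding: one 2-D coordinate per input row.
        self.embedding_ = [[float(i), 0.0] for i, _ in enumerate(X)]
        return self.embedding_

    def transform(self, X):
        raise NotImplementedError("out-of-sample transform is not supported")


def embed_all(reducer, X_train, X_new):
    """Embed X_new, refitting on the combined data when transform is unavailable."""
    try:
        reducer.fit_transform(X_train)
        return reducer.transform(X_new)
    except NotImplementedError:
        combined = list(X_train) + list(X_new)
        # Keep only the rows that correspond to the new samples.
        return reducer.fit_transform(combined)[len(X_train):]


X_train = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
X_new = [[0.2, 0.2], [0.8, 0.8]]
new_coords = embed_all(FitOnlyReducer(), X_train, X_new)
print(len(new_coords))
```

Refitting produces a fresh embedding for the training points as well, so coordinates from an earlier fit are not comparable with the new ones.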

fit_transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> numpy.ndarray

Fit t-SNE and return the embedding coordinates.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Embedded coordinates produced by t-SNE.

Return type:

np.ndarray of shape (n_samples, n_components)

class coco_pipe.dim_reduction.reducers.neighbor.UMAPReducer(n_components: int = 2, **kwargs)

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

UMAP reducer.

Uniform Manifold Approximation and Projection (UMAP) constructs a weighted nearest-neighbor graph in the high-dimensional space and optimizes a low-dimensional embedding whose graph structure matches it as closely as possible. Unlike t-SNE, UMAP supports out-of-sample transformation.

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • **kwargs (dict) – Additional keyword arguments forwarded to umap.UMAP after signature filtering. Common options include n_neighbors, min_dist, metric, and random_state.

model

Fitted UMAP estimator after fit.

Type:

umap.UMAP or None

See also

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

PacmapReducer

Nonlinear embedding balancing local and global structure.

TrimapReducer

Nonlinear triplet-based embedding preserving global layout.

PHATEReducer

Diffusion-based embedding for continuous trajectories.

IsomapReducer

Nonlinear geodesic-distance manifold embedding.

PCAReducer

Linear baseline for global variance preservation.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import UMAPReducer
>>> X = np.random.rand(100, 10)
>>> reducer = UMAPReducer(n_components=2, n_neighbors=10, random_state=42)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:10]).shape
(10, 2)
>>> reducer.get_diagnostics()["graph_"] is not None
True
>>> reducer.fit_transform(X).shape
(100, 2)
property capabilities: dict

Return capability metadata for UMAP.

Returns:

Capability mapping describing UMAP as a nonlinear stochastic reducer with transform support and a native plotting path.

Return type:

dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> UMAPReducer

Fit UMAP on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Optional target values used for supervised dimension reduction.

Returns:

Fitted reducer instance.

Return type:

UMAPReducer

Raises:
  • ImportError – If umap-learn is not installed.

  • RuntimeError – If umap-learn is installed but fails during initialization.
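The ImportError case suggests that the backend is imported lazily, at fit time. A minimal sketch of that pattern with a hypothetical require helper; the actual error messages and import logic in coco_pipe are not shown on this page:

```python
import importlib


def require(module_name, install_hint):
    """Import an optional backend or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this reducer. "
            f"Install it with: {install_hint}"
        ) from exc


# A module that is always available, used here so the sketch runs anywhere.
json_module = require("json", install_hint="(standard library)")

# For UMAP the call would look like this (left as a comment because the
# package may not be installed):
# umap = require("umap", install_hint="pip install umap-learn")
print(json_module.__name__)
```

Deferring the import to fit keeps the module importable when only some of the optional backends are installed.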

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

Project data using the fitted UMAP model.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to project.

Returns:

Low-dimensional embedding coordinates.

Return type:

np.ndarray of shape (n_samples, n_components)

Raises:

RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.reducers.neighbor.PacmapReducer(n_components: int = 2, n_neighbors: int = 10, MN_ratio: float = 0.5, FP_ratio: float = 2.0, nn_backend: str = 'faiss', init: str = 'pca', **kwargs)

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

PaCMAP reducer.

Pairwise Controlled Manifold Approximation (PaCMAP) preserves local and global structure by balancing near, mid-near, and far pairs during the optimization.

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • n_neighbors (int, default=10) – Number of neighbors used to form local pairs.

  • MN_ratio (float, default=0.5) – Ratio of the number of mid-near pairs to the number of neighbors.

  • FP_ratio (float, default=2.0) – Ratio of the number of far pairs to the number of neighbors.

  • nn_backend ({"faiss", "annoy", "voyager"}, default="faiss") – Nearest-neighbor backend used by recent PaCMAP versions. Older PaCMAP releases that do not expose this argument will ignore it through signature filtering.

  • init (str, default="pca") – Initialization strategy passed to fit_transform.

  • **kwargs (dict) – Additional keyword arguments forwarded to pacmap.PaCMAP after signature filtering.
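To make the two ratios concrete: in PaCMAP both are expressed relative to n_neighbors, so the defaults imply 5 mid-near and 20 far pairs per point. The arithmetic below is a sketch with an illustrative helper name, and the truncation to an integer is an assumption about how fractional products are handled.

```python
def pair_counts(n_samples, n_neighbors=10, MN_ratio=0.5, FP_ratio=2.0):
    """Per-point and total pair counts implied by the ratios (illustrative helper)."""
    n_mid_near = int(n_neighbors * MN_ratio)  # assumption: truncate to an integer
    n_far = int(n_neighbors * FP_ratio)
    per_point = {"near": n_neighbors, "mid_near": n_mid_near, "far": n_far}
    totals = {kind: count * n_samples for kind, count in per_point.items()}
    return per_point, totals


per_point, totals = pair_counts(n_samples=100)
print(per_point)  # {'near': 10, 'mid_near': 5, 'far': 20}
print(totals)     # {'near': 1000, 'mid_near': 500, 'far': 2000}
```

Raising FP_ratio adds more repulsive far pairs, while raising MN_ratio adds more mid-near pairs, which are the ones intended to capture global structure.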

embedding_

Learned training-set embedding after fit or fit_transform.

Type:

np.ndarray or None

model

Fitted PaCMAP estimator after fit or fit_transform.

Type:

pacmap.PaCMAP or None

Notes

transform is not supported because PaCMAP does not provide an efficient out-of-sample projection API.

See also

UMAPReducer

Nonlinear graph-based embedding with transform support.

TrimapReducer

Nonlinear triplet-based embedding preserving global layout.

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

PHATEReducer

Diffusion-based embedding for continuous trajectories.

PCAReducer

Linear baseline for global variance preservation.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import PacmapReducer
>>> X = np.random.rand(100, 10)
>>> reducer = PacmapReducer(
...     n_components=2,
...     n_neighbors=10,
...     nn_backend="faiss",
...     init="random",
... )
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(100, 2)
>>> reducer.embedding_.shape
(100, 2)
property capabilities: dict

Return capability metadata for PaCMAP.

Returns:

Capability mapping describing PaCMAP as a nonlinear stochastic reducer without out-of-sample transform support.

Return type:

dict

n_neighbors = 10
MN_ratio = 0.5
FP_ratio = 2.0
nn_backend = 'faiss'
init = 'pca'
embedding_ = None
fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> PacmapReducer

Fit PaCMAP on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

PacmapReducer

Raises:
  • ImportError – If pacmap is not installed.

  • RuntimeError – If pacmap is installed but fails during initialization.

abstract transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

Raise because PaCMAP does not support out-of-sample transformation.

Parameters:

X (ArrayLike) – Ignored input included for API compatibility.

Raises:

NotImplementedError – Always raised because PaCMAP does not support transforming new data without refitting.

fit_transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> numpy.ndarray

Fit PaCMAP and return the embedding coordinates.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Embedded coordinates produced by PaCMAP.

Return type:

np.ndarray of shape (n_samples, n_components)

class coco_pipe.dim_reduction.reducers.neighbor.TrimapReducer(n_components: int = 2, n_inliers: int = 10, n_outliers: int = 5, n_random: int = 5, **kwargs)

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

TriMap reducer.

TriMap uses triplet constraints to preserve relative similarities while emphasizing global layout preservation.

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • n_inliers (int, default=10) – Number of nearest neighbors used as inliers when forming nearest-neighbor triplets.

  • n_outliers (int, default=5) – Number of outliers sampled when forming nearest-neighbor triplets.

  • n_random (int, default=5) – Number of random triplets per sample.

  • **kwargs (dict) – Additional keyword arguments forwarded to trimap.TRIMAP after signature filtering.
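The three triplet parameters can be read as a recipe for building constraints (i, j, k) meaning "j should be closer to i than k is". The sketch below is a simplified, pure-Python illustration of that recipe and not the TriMap implementation: the sampling is uniform, the triplet weights are omitted, and it assumes each inlier is paired with n_outliers sampled non-neighbors, giving n_inliers * n_outliers + n_random triplets per point.

```python
import math
import random


def sample_triplets(X, n_inliers=3, n_outliers=2, n_random=1, seed=0):
    """Return (i, j, k) triplets where point j is at least as close to i as point k."""
    rng = random.Random(seed)
    n = len(X)
    triplets = []
    for i in range(n):
        # All other points, ordered by distance to point i.
        others = sorted(
            (m for m in range(n) if m != i),
            key=lambda m: math.dist(X[i], X[m]),
        )
        inliers, rest = others[:n_inliers], others[n_inliers:]
        # Nearest-neighbor triplets: each inlier paired with sampled non-neighbors.
        for j in inliers:
            for k in rng.sample(rest, n_outliers):
                triplets.append((i, j, k))
        # Random triplets: two random points, ordered by distance to i.
        for _ in range(n_random):
            a, b = rng.sample(others, 2)
            j, k = sorted((a, b), key=lambda m: math.dist(X[i], X[m]))
            triplets.append((i, j, k))
    return triplets


point_rng = random.Random(1)
X = [(point_rng.random(), point_rng.random()) for _ in range(10)]
triplets = sample_triplets(X)
print(len(triplets))  # 10 points * (3 * 2 + 1) triplets each = 70
```

The random triplets are what give the method information about far-apart points, which is why they matter for the global layout.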

embedding_

Learned training-set embedding after fit or fit_transform.

Type:

np.ndarray or None

model

Fitted TriMap estimator after fit or fit_transform.

Type:

trimap.TRIMAP or None

Notes

transform is not supported because TriMap does not provide an out-of-sample projection API.

See also

UMAPReducer

Nonlinear graph-based embedding with transform support.

PacmapReducer

Nonlinear embedding balancing local and global structure.

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

PHATEReducer

Diffusion-based embedding for continuous trajectories.

IsomapReducer

Nonlinear geodesic-distance manifold embedding.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import TrimapReducer
>>> X = np.random.rand(100, 10)
>>> reducer = TrimapReducer(n_components=2)
>>> reducer.fit_transform(X).shape
(100, 2)
property capabilities: dict

Return capability metadata for TriMap.

Returns:

Capability mapping describing TriMap as a nonlinear stochastic reducer without out-of-sample transform support.

Return type:

dict

n_inliers = 10
n_outliers = 5
n_random = 5
embedding_ = None
fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> TrimapReducer

Fit TriMap on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

TrimapReducer

Raises:
  • ImportError – If trimap is not installed.

  • RuntimeError – If trimap is installed but fails during initialization.

abstract transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

Raise because TriMap does not support out-of-sample transformation.

Parameters:

X (ArrayLike) – Ignored input included for API compatibility.

Raises:

NotImplementedError – Always raised because TriMap does not support transforming new data without refitting.

fit_transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> numpy.ndarray

Fit TriMap and return the embedding coordinates.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Embedded coordinates produced by TriMap.

Return type:

np.ndarray of shape (n_samples, n_components)

class coco_pipe.dim_reduction.reducers.neighbor.PHATEReducer(n_components: int = 2, knn: int = 5, decay: int = 40, t: Any = 'auto', **kwargs)

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

PHATE reducer.

Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE) is designed for data with continuous progression structure and uses diffusion-based distances to construct the embedding.

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • knn (int, default=5) – Number of nearest neighbors used in the kernel graph.

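  • decay (int, default=40) – Decay rate of the kernel tails (the exponent of PHATE's alpha-decay kernel).

  • t (int or str, default="auto") – Diffusion time, i.e. the power to which the diffusion operator is raised. With "auto", PHATE selects t from the knee point of the von Neumann entropy of the diffusion operator.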

  • **kwargs (dict) – Additional keyword arguments forwarded to phate.PHATE after signature filtering.
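The roles of knn, decay, and t can be illustrated with a stripped-down diffusion process. This is a sketch under simplifying assumptions, not the PHATE implementation: the kernel is not symmetrized, t is fixed rather than chosen automatically, and the potential-distance and MDS steps are left out. The helper names are made up for this example.

```python
import math


def diffusion_operator(X, knn=2, decay=40):
    """Row-stochastic matrix built from an alpha-decay kernel (simplified)."""
    dist = [[math.dist(a, b) for b in X] for a in X]
    kernel = []
    for row in dist:
        # Bandwidth: distance to the knn-th nearest neighbor (index 0 is the point itself).
        eps = sorted(row)[knn]
        kernel.append([math.exp(-((d / eps) ** decay)) for d in row])
    # Normalize each row so that it sums to 1 (random-walk transition probabilities).
    return [[k / sum(row) for k in row] for row in kernel]


def matmul(A, B):
    """Plain-Python matrix product for small square matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def diffuse(P, t):
    """Raise the diffusion operator to the power t (t random-walk steps)."""
    result = P
    for _ in range(t - 1):
        result = matmul(result, P)
    return result


X = [(float(i),) for i in range(5)]  # five points on a line
P = diffusion_operator(X, knn=2, decay=40)
P_t = diffuse(P, t=4)
print(P[0][4], P_t[0][4] > 0)
```

With a large decay the kernel falls off sharply beyond the knn-th neighbor, so a single step cannot connect the two ends of the line, while several steps can. Larger t therefore emphasizes longer-range structure.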

model

Fitted PHATE estimator after fit.

Type:

phate.PHATE or None

See also

UMAPReducer

Nonlinear graph-based embedding with transform support.

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

PacmapReducer

Nonlinear embedding balancing local and global structure.

TrimapReducer

Nonlinear triplet-based embedding preserving global layout.

ParametricUMAPReducer

Neural-network-backed UMAP approximation.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import PHATEReducer
>>> X = np.random.rand(100, 10)
>>> reducer = PHATEReducer(n_components=2, knn=5)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:10]).shape
(10, 2)
>>> reducer.get_diagnostics()["diff_potential"] is not None
True
property capabilities: dict

Return capability metadata for PHATE.

Returns:

Capability mapping describing PHATE as a nonlinear reducer with transform support and a native plotting path.

Return type:

dict

knn = 5
decay = 40
t = 'auto'
fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> PHATEReducer

Fit PHATE on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

PHATEReducer

Raises:
  • ImportError – If phate is not installed.

  • RuntimeError – If phate is installed but fails during initialization.

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

Project data using the fitted PHATE model.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to project.

Returns:

Low-dimensional embedding coordinates.

Return type:

np.ndarray of shape (n_samples, n_components)

Raises:

RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.reducers.neighbor.ParametricUMAPReducer(n_components: int = 2, n_neighbors: int = 15, min_dist: float = 0.1, metric: str = 'euclidean', n_epochs: int | None = None, batch_size: int = 1000, verbose: bool = False, **kwargs)

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Parametric UMAP reducer.

Parametric UMAP learns a neural network that approximates the UMAP embedding, enabling reusable out-of-sample projection through the trained network.

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • n_neighbors (int, default=15) – Size of the local neighborhood.

  • min_dist (float, default=0.1) – Effective minimum distance between embedded points.

  • metric (str, default="euclidean") – Metric used for distance computation.

  • n_epochs (int, optional) – Number of training epochs.

  • batch_size (int, default=1000) – Batch size used during training.

  • verbose (bool, default=False) – Whether to print backend training progress.

  • **kwargs (dict) – Additional keyword arguments forwarded to umap.parametric_umap.ParametricUMAP after signature filtering.

model

Fitted Parametric UMAP estimator after fit.

Type:

umap.parametric_umap.ParametricUMAP or None

See also

UMAPReducer

Non-parametric UMAP with graph-based transform support.

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

PHATEReducer

Diffusion-based embedding for continuous trajectories.

IVISReducer

Neural metric-learning-based embedding.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import ParametricUMAPReducer
>>> X = np.random.rand(50, 10).astype(np.float32)
>>> reducer = ParametricUMAPReducer(n_components=2, n_epochs=5, verbose=False)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:10]).shape
(10, 2)
property capabilities: dict

Return capability metadata for Parametric UMAP.

Returns:

Capability mapping describing Parametric UMAP as a nonlinear stochastic reducer with transform support.

Return type:

dict

n_neighbors = 15
min_dist = 0.1
metric = 'euclidean'
n_epochs = None
batch_size = 1000
verbose = False
fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> ParametricUMAPReducer

Fit Parametric UMAP on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Optional target values used for supervised dimension reduction.

Returns:

Fitted reducer instance.

Return type:

ParametricUMAPReducer

Raises:
  • ImportError – If umap-learn is not installed.

  • RuntimeError – If umap-learn is installed but fails during initialization.

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

Project data using the fitted Parametric UMAP model.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to project.

Returns:

Low-dimensional embedding coordinates.

Return type:

np.ndarray of shape (n_samples, n_components)

Raises:

RuntimeError – If the reducer has not been fitted.

property loss_history_: list

Training loss history for the parametric model.

Returns:

Recorded loss values across training epochs.

Return type:

list

Raises:

RuntimeError – If the reducer has not been fitted.
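One use of the loss history is to judge whether n_epochs was large enough. The sketch below assumes the property returns one float per epoch, as described above. The has_plateaued helper and the loss values are illustrative only and do not come from a real training run.

```python
def has_plateaued(losses, window=3, rel_tol=0.01):
    """True when the loss improved by less than rel_tol over the last `window` epochs."""
    if len(losses) <= window:
        return False
    previous, latest = losses[-window - 1], losses[-1]
    return (previous - latest) <= rel_tol * abs(previous)


# Illustrative values standing in for reducer.loss_history_.
loss_history = [0.90, 0.55, 0.41, 0.402, 0.401, 0.4005, 0.4002]

print(has_plateaued(loss_history[:4]))  # still improving quickly
print(has_plateaued(loss_history))      # improvement has levelled off
```

If the check is still False at the end of training, increasing n_epochs is a reasonable next step.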

save(filepath: str) -> None

Serialize the fitted reducer with joblib.

Parameters:

filepath (str) – Output path for the serialized reducer.

Raises:

RuntimeError – If the reducer has not been fitted.
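This page documents only the save side. Assuming the file is an ordinary joblib dump, it could be read back with joblib.load. The sketch below shows the round trip on a plain dictionary that stands in for a fitted reducer; the pickle fallback exists only so the example runs where joblib is not installed, and whether a trained Parametric UMAP network survives this kind of serialization is not confirmed here.

```python
import os
import pickle
import tempfile

try:
    import joblib  # third-party; used when available
except ImportError:
    joblib = None


def save_object(obj, filepath):
    """Write obj to filepath with joblib, or with pickle as a stand-in."""
    if joblib is not None:
        joblib.dump(obj, filepath)
    else:
        with open(filepath, "wb") as fh:
            pickle.dump(obj, fh)


def load_object(filepath):
    """Read back an object written by save_object."""
    if joblib is not None:
        return joblib.load(filepath)
    with open(filepath, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a fitted reducer's state (not a real coco_pipe object).
state = {"n_components": 2, "embedding": [[0.0, 1.0], [1.0, 0.0]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "reducer.joblib")
    save_object(state, path)
    restored = load_object(path)

print(restored == state)
```

As with any pickle-based format, such files should only be loaded from trusted sources.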