coco_pipe.dim_reduction.reducers.neighbor
=========================================

.. py:module:: coco_pipe.dim_reduction.reducers.neighbor

.. autoapi-nested-parse::

   Neighbor-embedding and graph-based reducers.

   This module provides wrappers for neighborhood-preserving and graph-based
   nonlinear dimensionality reduction methods, including t-SNE, UMAP, PaCMAP,
   TriMap, PHATE, and Parametric UMAP.

   Classes
   -------
   TSNEReducer
       t-Distributed Stochastic Neighbor Embedding wrapper.
   UMAPReducer
       Uniform Manifold Approximation and Projection wrapper.
   PacmapReducer
       Pairwise Controlled Manifold Approximation wrapper.
   TrimapReducer
       Triplet-based manifold embedding wrapper.
   PHATEReducer
       Diffusion-based PHATE embedding wrapper.
   ParametricUMAPReducer
       Neural-network-backed Parametric UMAP wrapper.

   .. rubric:: References

   .. [1] van der Maaten, L., and Hinton, G. (2008). "Visualizing data using
          t-SNE". Journal of Machine Learning Research, 9, 2579-2605.
   .. [2] McInnes, L., Healy, J., and Melville, J. (2018). "UMAP: Uniform
          Manifold Approximation and Projection for Dimension Reduction". arXiv.
   .. [3] Wang, Y., et al. (2021). "Understanding How Dimension Reduction Tools
          Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMap, and
          PaCMAP for Data Visualization". Journal of Machine Learning Research,
          22(201), 1-73.
   .. [4] Amid, E., and Warmuth, M. K. (2019). "TriMap: Large-scale
          Dimensionality Reduction Using Triplets". arXiv.
   .. [5] Moon, K. R., et al. (2019). "Visualizing structure and transitions in
          high-dimensional biological data". Nature Biotechnology, 37, 1482-1492.

   Authors: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca),
   Sina Esmaeili (sina.esmaeili@umontreal.ca)


Classes
-------

.. autoapisummary::

   coco_pipe.dim_reduction.reducers.neighbor.TSNEReducer
   coco_pipe.dim_reduction.reducers.neighbor.UMAPReducer
   coco_pipe.dim_reduction.reducers.neighbor.PacmapReducer
   coco_pipe.dim_reduction.reducers.neighbor.TrimapReducer
   coco_pipe.dim_reduction.reducers.neighbor.PHATEReducer
   coco_pipe.dim_reduction.reducers.neighbor.ParametricUMAPReducer


Module Contents
---------------

.. py:class:: TSNEReducer(n_components: int = 2, **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`

   t-SNE reducer.

   t-Distributed Stochastic Neighbor Embedding (t-SNE) is a
   neighborhood-preserving method designed primarily for visualization. It
   optimizes a low-dimensional embedding by matching pairwise similarities
   between the original space and the embedding.

   :param n_components: Number of embedding dimensions.
   :type n_components: int, default=2
   :param \*\*kwargs: Additional keyword arguments forwarded to
                      `sklearn.manifold.TSNE` after signature filtering. Common
                      options include `perplexity`, `learning_rate`, `max_iter`,
                      `init`, and `random_state`.
   :type \*\*kwargs: dict

   .. attribute:: embedding_

      Learned training-set embedding after `fit` or `fit_transform`.

      :type: np.ndarray or None

   .. attribute:: model

      Fitted t-SNE estimator after `fit` or `fit_transform`.

      :type: sklearn.manifold.TSNE or None

   .. rubric:: Notes

   `transform` is not supported because scikit-learn t-SNE does not provide an
   out-of-sample projection API.

   .. seealso::

      :obj:`UMAPReducer`
          Nonlinear graph-based embedding with transform support.
      :obj:`PacmapReducer`
          Nonlinear embedding balancing local and global structure.
      :obj:`TrimapReducer`
          Nonlinear triplet-based embedding preserving global layout.
      :obj:`PHATEReducer`
          Diffusion-based embedding for continuous trajectories.
      :obj:`PCAReducer`
          Linear baseline for global variance preservation.
      :obj:`IsomapReducer`
          Nonlinear geodesic-distance manifold embedding.

   .. rubric:: Examples

   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import TSNEReducer
   >>> X = np.random.rand(100, 10)
   >>> reducer = TSNEReducer(n_components=2, perplexity=20, random_state=42)
   >>> embedding = reducer.fit_transform(X)
   >>> embedding.shape
   (100, 2)
   >>> reducer.get_quality_metadata()["kl_divergence_"] >= 0
   True
   >>> _ = reducer.fit(X)
   >>> reducer.embedding_.shape
   (100, 2)

   .. py:property:: capabilities
      :type: dict

      Return capability metadata for t-SNE.

      :returns: Capability mapping describing t-SNE as a nonlinear stochastic
                reducer without out-of-sample transform support.
      :rtype: dict

   .. py:attribute:: embedding_
      :value: None

   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> TSNEReducer

      Fit t-SNE on the input data.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: TSNEReducer

      .. rubric:: Examples

      >>> import numpy as np
      >>> from coco_pipe.dim_reduction import TSNEReducer
      >>> X = np.random.rand(30, 6)
      >>> reducer = TSNEReducer(n_components=2, perplexity=5, max_iter=250)
      >>> _ = reducer.fit(X)
      >>> reducer.model is not None
      True

   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray
      :abstractmethod:

      Raise because t-SNE does not support out-of-sample transformation.

      :param X: Ignored input included for API compatibility.
      :type X: ArrayLike

      :raises NotImplementedError: Always raised because t-SNE does not support
                                   transforming new data.

   .. py:method:: fit_transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> numpy.ndarray

      Fit t-SNE and return the embedding coordinates.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Embedded coordinates produced by t-SNE.
      :rtype: np.ndarray of shape (n_samples, n_components)

.. py:class:: UMAPReducer(n_components: int = 2, **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`

   UMAP reducer.

   Uniform Manifold Approximation and Projection (UMAP) constructs a graph in
   the high-dimensional space and optimizes a low-dimensional representation
   of that graph.
   Unlike t-SNE, UMAP supports out-of-sample transformation.

   :param n_components: Number of embedding dimensions.
   :type n_components: int, default=2
   :param \*\*kwargs: Additional keyword arguments forwarded to `umap.UMAP`
                      after signature filtering. Common options include
                      `n_neighbors`, `min_dist`, `metric`, and `random_state`.
   :type \*\*kwargs: dict

   .. attribute:: model

      Fitted UMAP estimator after `fit`.

      :type: umap.UMAP or None

   .. seealso::

      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving visualization method.
      :obj:`PacmapReducer`
          Nonlinear embedding balancing local and global structure.
      :obj:`TrimapReducer`
          Nonlinear triplet-based embedding preserving global layout.
      :obj:`PHATEReducer`
          Diffusion-based embedding for continuous trajectories.
      :obj:`IsomapReducer`
          Nonlinear geodesic-distance manifold embedding.
      :obj:`PCAReducer`
          Linear baseline for global variance preservation.

   .. rubric:: Examples

   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import UMAPReducer
   >>> X = np.random.rand(100, 10)
   >>> reducer = UMAPReducer(n_components=2, n_neighbors=10, random_state=42)
   >>> _ = reducer.fit(X)
   >>> reducer.transform(X[:10]).shape
   (10, 2)
   >>> reducer.get_diagnostics()["graph_"] is not None
   True
   >>> reducer.fit_transform(X).shape
   (100, 2)

   .. py:property:: capabilities
      :type: dict

      Return capability metadata for UMAP.

      :returns: Capability mapping describing UMAP as a nonlinear stochastic
                reducer with transform support and a native plotting path.
      :rtype: dict

   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> UMAPReducer

      Fit UMAP on the input data.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Optional supervision supported by UMAP.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: UMAPReducer

      :raises ImportError: If `umap-learn` is not installed.
      :raises RuntimeError: If `umap-learn` is installed but fails during
                            initialization.

   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

      Project data using the fitted UMAP model.

      :param X: Data to project.
      :type X: ArrayLike of shape (n_samples, n_features)

      :returns: Low-dimensional embedding coordinates.
      :rtype: np.ndarray of shape (n_samples, n_components)

      :raises RuntimeError: If the reducer has not been fitted.

.. py:class:: PacmapReducer(n_components: int = 2, n_neighbors: int = 10, MN_ratio: float = 0.5, FP_ratio: float = 2.0, nn_backend: str = 'faiss', init: str = 'pca', **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`

   PaCMAP reducer.

   Pairwise Controlled Manifold Approximation (PaCMAP) preserves local and
   global structure by balancing near, mid-near, and far pairs during the
   optimization.

   :param n_components: Number of embedding dimensions.
   :type n_components: int, default=2
   :param n_neighbors: Number of neighbors used to form local pairs.
   :type n_neighbors: int, default=10
   :param MN_ratio: Ratio of the number of mid-near pairs to the number of
                    neighbors.
   :type MN_ratio: float, default=0.5
   :param FP_ratio: Ratio of the number of far pairs to the number of
                    neighbors.
   :type FP_ratio: float, default=2.0
   :param nn_backend: Nearest-neighbor backend used by recent PaCMAP versions.
                      Older PaCMAP releases that do not expose this argument
                      will ignore it through signature filtering.
   :type nn_backend: {"faiss", "annoy", "voyager"}, default="faiss"
   :param init: Initialization strategy passed to `fit_transform`.
   :type init: str, default="pca"
   :param \*\*kwargs: Additional keyword arguments forwarded to `pacmap.PaCMAP`
                      after signature filtering.
   :type \*\*kwargs: dict

   .. attribute:: embedding_

      Learned training-set embedding after `fit` or `fit_transform`.

      :type: np.ndarray or None

   .. attribute:: model

      Fitted PaCMAP estimator after `fit` or `fit_transform`.

      :type: pacmap.PaCMAP or None

   .. rubric:: Notes

   `transform` is not supported because PaCMAP does not provide an efficient
   out-of-sample projection API.

   .. seealso::

      :obj:`UMAPReducer`
          Nonlinear graph-based embedding with transform support.
      :obj:`TrimapReducer`
          Nonlinear triplet-based embedding preserving global layout.
      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving visualization method.
      :obj:`PHATEReducer`
          Diffusion-based embedding for continuous trajectories.
      :obj:`PCAReducer`
          Linear baseline for global variance preservation.

   .. rubric:: Examples

   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import PacmapReducer
   >>> X = np.random.rand(100, 10)
   >>> reducer = PacmapReducer(
   ...     n_components=2,
   ...     n_neighbors=10,
   ...     nn_backend="faiss",
   ...     init="random",
   ... )
   >>> embedding = reducer.fit_transform(X)
   >>> embedding.shape
   (100, 2)
   >>> reducer.embedding_.shape
   (100, 2)

   .. py:property:: capabilities
      :type: dict

      Return capability metadata for PaCMAP.

      :returns: Capability mapping describing PaCMAP as a nonlinear stochastic
                reducer without out-of-sample transform support.
      :rtype: dict

   .. py:attribute:: n_neighbors
      :value: 10

   .. py:attribute:: MN_ratio
      :value: 0.5

   .. py:attribute:: FP_ratio
      :value: 2.0

   .. py:attribute:: nn_backend
      :value: 'faiss'

   .. py:attribute:: init
      :value: 'pca'

   .. py:attribute:: embedding_
      :value: None

   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> PacmapReducer

      Fit PaCMAP on the input data.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: PacmapReducer

      :raises ImportError: If `pacmap` is not installed.
      :raises RuntimeError: If `pacmap` is installed but fails during
                            initialization.

   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray
      :abstractmethod:

      Raise because PaCMAP does not support out-of-sample transformation.

      :param X: Ignored input included for API compatibility.
      :type X: ArrayLike

      :raises NotImplementedError: Always raised because PaCMAP does not
                                   support transforming new data without
                                   refitting.

   .. py:method:: fit_transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> numpy.ndarray

      Fit PaCMAP and return the embedding coordinates.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Embedded coordinates produced by PaCMAP.
      :rtype: np.ndarray of shape (n_samples, n_components)

.. py:class:: TrimapReducer(n_components: int = 2, n_inliers: int = 10, n_outliers: int = 5, n_random: int = 5, **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`

   TriMap reducer.

   TriMap uses triplet constraints to preserve relative similarities while
   emphasizing global layout preservation.

   :param n_components: Number of embedding dimensions.
   :type n_components: int, default=2
   :param n_inliers: Number of nearest neighbors used as inliers when forming
                     nearest-neighbor triplets.
   :type n_inliers: int, default=10
   :param n_outliers: Number of outliers used when forming nearest-neighbor
                      triplets.
   :type n_outliers: int, default=5
   :param n_random: Number of random triplets per sample.
   :type n_random: int, default=5
   :param \*\*kwargs: Additional keyword arguments forwarded to `trimap.TRIMAP`
                      after signature filtering.
   :type \*\*kwargs: dict

   .. attribute:: embedding_

      Learned training-set embedding after `fit` or `fit_transform`.

      :type: np.ndarray or None

   .. attribute:: model

      Fitted TriMap estimator after `fit` or `fit_transform`.

      :type: trimap.TRIMAP or None

   .. rubric:: Notes

   `transform` is not supported because TriMap does not provide an
   out-of-sample projection API.

   .. seealso::

      :obj:`UMAPReducer`
          Nonlinear graph-based embedding with transform support.
      :obj:`PacmapReducer`
          Nonlinear embedding balancing local and global structure.
      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving visualization method.
      :obj:`PHATEReducer`
          Diffusion-based embedding for continuous trajectories.
      :obj:`IsomapReducer`
          Nonlinear geodesic-distance manifold embedding.

   .. rubric:: Examples

   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import TrimapReducer
   >>> X = np.random.rand(100, 10)
   >>> reducer = TrimapReducer(n_components=2)
   >>> reducer.fit_transform(X).shape
   (100, 2)

   .. py:property:: capabilities
      :type: dict

      Return capability metadata for TriMap.

      :returns: Capability mapping describing TriMap as a nonlinear stochastic
                reducer without out-of-sample transform support.
      :rtype: dict

   .. py:attribute:: n_inliers
      :value: 10

   .. py:attribute:: n_outliers
      :value: 5

   .. py:attribute:: n_random
      :value: 5

   .. py:attribute:: embedding_
      :value: None

   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> TrimapReducer

      Fit TriMap on the input data.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: TrimapReducer

      :raises ImportError: If `trimap` is not installed.
      :raises RuntimeError: If `trimap` is installed but fails during
                            initialization.

   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray
      :abstractmethod:

      Raise because TriMap does not support out-of-sample transformation.

      :param X: Ignored input included for API compatibility.
      :type X: ArrayLike

      :raises NotImplementedError: Always raised because TriMap does not
                                   support transforming new data without
                                   refitting.

   .. py:method:: fit_transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> numpy.ndarray

      Fit TriMap and return the embedding coordinates.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Embedded coordinates produced by TriMap.
      :rtype: np.ndarray of shape (n_samples, n_components)

.. py:class:: PHATEReducer(n_components: int = 2, knn: int = 5, decay: int = 40, t: Any = 'auto', **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`

   PHATE reducer.

   Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE)
   is designed for data with continuous progression structure and uses
   diffusion-based distances to construct the embedding.

   :param n_components: Number of embedding dimensions.
   :type n_components: int, default=2
   :param knn: Number of nearest neighbors used in the kernel graph.
   :type knn: int, default=5
   :param decay: Decay rate for the kernel.
   :type decay: int, default=40
   :param t: Diffusion time.
   :type t: int or str, default="auto"
   :param \*\*kwargs: Additional keyword arguments forwarded to `phate.PHATE`
                      after signature filtering.
   :type \*\*kwargs: dict

   .. attribute:: model

      Fitted PHATE estimator after `fit`.

      :type: phate.PHATE or None

   .. seealso::

      :obj:`UMAPReducer`
          Nonlinear graph-based embedding with transform support.
      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving visualization method.
      :obj:`PacmapReducer`
          Nonlinear embedding balancing local and global structure.
      :obj:`TrimapReducer`
          Nonlinear triplet-based embedding preserving global layout.
      :obj:`ParametricUMAPReducer`
          Neural-network-backed UMAP approximation.

   .. rubric:: Examples

   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import PHATEReducer
   >>> X = np.random.rand(100, 10)
   >>> reducer = PHATEReducer(n_components=2, knn=5)
   >>> _ = reducer.fit(X)
   >>> reducer.transform(X[:10]).shape
   (10, 2)
   >>> reducer.get_diagnostics()["diff_potential"] is not None
   True

   .. py:property:: capabilities
      :type: dict

      Return capability metadata for PHATE.

      :returns: Capability mapping describing PHATE as a nonlinear reducer with
                transform support and a native plotting path.
      :rtype: dict

   .. py:attribute:: knn
      :value: 5

   .. py:attribute:: decay
      :value: 40

   .. py:attribute:: t
      :value: 'auto'

   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> PHATEReducer

      Fit PHATE on the input data.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: PHATEReducer

      :raises ImportError: If `phate` is not installed.
      :raises RuntimeError: If `phate` is installed but fails during
                            initialization.

   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

      Project data using the fitted PHATE model.

      :param X: Data to project.
      :type X: ArrayLike of shape (n_samples, n_features)

      :returns: Low-dimensional embedding coordinates.
      :rtype: np.ndarray of shape (n_samples, n_components)

      :raises RuntimeError: If the reducer has not been fitted.

.. py:class:: ParametricUMAPReducer(n_components: int = 2, n_neighbors: int = 15, min_dist: float = 0.1, metric: str = 'euclidean', n_epochs: Optional[int] = None, batch_size: int = 1000, verbose: bool = False, **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`

   Parametric UMAP reducer.

   Parametric UMAP learns a neural network that approximates the UMAP
   embedding, enabling reusable out-of-sample projection through the trained
   network.

   :param n_components: Number of embedding dimensions.
   :type n_components: int, default=2
   :param n_neighbors: Size of the local neighborhood.
   :type n_neighbors: int, default=15
   :param min_dist: Effective minimum distance between embedded points.
   :type min_dist: float, default=0.1
   :param metric: Metric used for distance computation.
   :type metric: str, default="euclidean"
   :param n_epochs: Number of training epochs.
   :type n_epochs: int, optional
   :param batch_size: Batch size used during training.
   :type batch_size: int, default=1000
   :param verbose: Whether to print backend training progress.
   :type verbose: bool, default=False
   :param \*\*kwargs: Additional keyword arguments forwarded to
                      `umap.parametric_umap.ParametricUMAP` after signature
                      filtering.
   :type \*\*kwargs: dict

   .. attribute:: model

      Fitted Parametric UMAP estimator after `fit`.

      :type: umap.parametric_umap.ParametricUMAP or None

   .. seealso::

      :obj:`UMAPReducer`
          Non-parametric UMAP with graph-based transform support.
      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving visualization method.
      :obj:`PHATEReducer`
          Diffusion-based embedding for continuous trajectories.
      :obj:`IVISReducer`
          Neural metric-learning-based embedding.

   .. rubric:: Examples

   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import ParametricUMAPReducer
   >>> X = np.random.rand(50, 10).astype(np.float32)
   >>> reducer = ParametricUMAPReducer(n_components=2, n_epochs=5, verbose=False)
   >>> _ = reducer.fit(X)
   >>> reducer.transform(X[:10]).shape
   (10, 2)

   .. py:property:: capabilities
      :type: dict

      Return capability metadata for Parametric UMAP.

      :returns: Capability mapping describing Parametric UMAP as a nonlinear
                stochastic reducer with transform support.
      :rtype: dict

   .. py:attribute:: n_neighbors
      :value: 15

   .. py:attribute:: min_dist
      :value: 0.1

   .. py:attribute:: metric
      :value: 'euclidean'

   .. py:attribute:: n_epochs
      :value: None

   .. py:attribute:: batch_size
      :value: 1000

   .. py:attribute:: verbose
      :value: False

   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> ParametricUMAPReducer

      Fit Parametric UMAP on the input data.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Optional supervision supported by Parametric UMAP.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: ParametricUMAPReducer

      :raises ImportError: If `umap-learn` is not installed.
      :raises RuntimeError: If `umap-learn` is installed but fails during
                            initialization.

   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

      Project data using the fitted Parametric UMAP model.

      :param X: Data to project.
      :type X: ArrayLike of shape (n_samples, n_features)

      :returns: Low-dimensional embedding coordinates.
      :rtype: np.ndarray of shape (n_samples, n_components)

      :raises RuntimeError: If the reducer has not been fitted.

   .. py:property:: loss_history_
      :type: list

      Training loss history for the parametric model.

      :returns: Recorded loss values across training epochs.
      :rtype: list

      :raises RuntimeError: If the reducer has not been fitted.

   .. py:method:: save(filepath: str) -> None

      Serialize the fitted reducer with joblib.

      :param filepath: Output path for the serialized reducer.
      :type filepath: str

      :raises RuntimeError: If the reducer has not been fitted.
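
Every reducer on this page forwards ``**kwargs`` to its backend "after signature filtering", but the page does not show how that filtering is done. The following is a minimal sketch of one common approach, assuming the filter inspects the backend constructor with ``inspect.signature`` and drops unknown names. ``filter_kwargs`` and ``FakeBackend`` are hypothetical names used only for illustration; they are not part of ``coco_pipe``.

```python
import inspect


def filter_kwargs(target, kwargs):
    """Return only the keyword arguments that `target` can accept.

    Hypothetical helper: the real coco_pipe filtering may differ.
    """
    params = inspect.signature(target).parameters
    # A target that declares **kwargs itself accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    # Otherwise keep only the names that appear in the signature.
    return {name: value for name, value in kwargs.items() if name in params}


class FakeBackend:
    """Hypothetical stand-in for an older backend without `nn_backend`."""

    def __init__(self, n_components=2, n_neighbors=10):
        self.n_components = n_components
        self.n_neighbors = n_neighbors


requested = {"n_neighbors": 15, "nn_backend": "faiss"}
accepted = filter_kwargs(FakeBackend, requested)
backend = FakeBackend(n_components=2, **accepted)
print(accepted)
```

Under this assumption, an option such as ``nn_backend`` is dropped silently when the installed backend does not expose it, which matches the behavior described for older PaCMAP releases. The trade-off is that a misspelled option name is dropped just as silently.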