coco_pipe.dim_reduction.evaluation.metrics
Rank-based dimensionality reduction quality metrics.
This module provides co-ranking-matrix metrics for comparing high-dimensional data with a low-dimensional embedding. The implementations are reducer-agnostic and operate directly on NumPy arrays.
Functions
- compute_coranking_matrix
Compute the co-ranking matrix between the original and embedded spaces.
- trustworthiness
Measure how well embedded neighbors are also close in the original space (penalizes intrusions).
- continuity
Measure how well original neighbors remain neighbors after embedding (penalizes extrusions).
- lcmc
Compute the local continuity meta-criterion.
- compute_mrre
Compute mean relative rank errors for intrusions and extrusions.
- shepard_diagram_data
Sample pairwise distances for Shepard-diagram visualization.
Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)
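Functions
| compute_coranking_matrix(X, X_emb) | Compute the co-ranking matrix between two sample spaces. |
| trustworthiness(Q, k) | Compute trustworthiness from a co-ranking matrix. |
| continuity(Q, k) | Compute continuity from a co-ranking matrix. |
| lcmc(Q, k) | Compute the local continuity meta-criterion (LCMC). |
| compute_mrre(Q, k) | Compute mean relative rank errors (MRRE). |
| shepard_diagram_data(X, X_embedded, sample_size=1000, random_state=None) | Compute sampled pairwise distances for a Shepard diagram. |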
Module Contents
- coco_pipe.dim_reduction.evaluation.metrics.compute_coranking_matrix(X: numpy.ndarray, X_emb: numpy.ndarray) → numpy.ndarray
Compute the co-ranking matrix between two sample spaces.
The co-ranking matrix Q counts how often each point pair appears with high-dimensional rank k and low-dimensional rank l. Self-neighbors are excluded from the rank construction.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) – Original high-dimensional data.
X_emb (np.ndarray of shape (n_samples, n_components)) – Low-dimensional embedding of the same samples.
- Returns:
Integer co-ranking matrix where Q[k, l] corresponds to ranks k + 1 and l + 1 in the original and embedded spaces.
- Return type:
np.ndarray of shape (n_samples - 1, n_samples - 1)
- Raises:
ValueError – If the inputs are not two-dimensional, do not share the same sample count, or contain fewer than two samples.
See also
trustworthiness: Compute intrusion-based neighborhood preservation.
continuity: Compute extrusion-based neighborhood preservation.
lcmc: Compute the local continuity meta-criterion.
Examples
>>> import numpy as np
>>> X = np.array([[0.0], [1.0], [2.0]])
>>> X_emb = np.array([[0.0], [2.0], [4.0]])
>>> Q = compute_coranking_matrix(X, X_emb)
>>> Q.shape
(2, 2)
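The rank construction described above can be sketched with NumPy alone. This is an illustrative reimplementation, not the library's code: `rank_matrix` and `coranking_sketch` are hypothetical names, and the use of Euclidean distance with ties broken by sample index is an assumption.

```python
import numpy as np

def rank_matrix(points):
    """Return R where R[i, j] is the neighbor rank of j as seen from i (self = 0)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Force each point to be its own rank-0 neighbor, even with duplicate samples.
    np.fill_diagonal(dist, -1.0)
    order = np.argsort(dist, axis=1, kind="stable")
    n = points.shape[0]
    ranks = np.empty_like(order)
    rows = np.arange(n)[:, None]
    # Invert the sort permutation: the point at sorted position p gets rank p.
    ranks[rows, order] = np.arange(n)[None, :]
    return ranks

def coranking_sketch(X, X_emb):
    """Count pairs by (original rank, embedded rank), excluding self-neighbors."""
    n = X.shape[0]
    r_high = rank_matrix(X)
    r_low = rank_matrix(X_emb)
    off_diag = ~np.eye(n, dtype=bool)
    Q = np.zeros((n - 1, n - 1), dtype=int)
    # Ranks run 1..n-1 off the diagonal, so shift to 0-based indices.
    np.add.at(Q, (r_high[off_diag] - 1, r_low[off_diag] - 1), 1)
    return Q

X = np.array([[0.0], [1.0], [2.0]])
X_emb = np.array([[0.0], [2.0], [4.0]])
Q = coranking_sketch(X, X_emb)
print(Q.shape)  # (2, 2)
```

Because the embedding here is a uniform rescaling, every pair keeps its rank and all counts fall on the diagonal of Q.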
- coco_pipe.dim_reduction.evaluation.metrics.trustworthiness(Q: numpy.ndarray, k: int) → float
Compute trustworthiness from a co-ranking matrix.
Trustworthiness penalizes intrusions, i.e. points that appear among the k nearest neighbors in the embedding but were farther away in the original space.
- Parameters:
Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.
k (int) – Neighborhood size. The normalization used by trustworthiness requires 2 * n_samples - 3 * k - 1 > 0.
- Returns:
Trustworthiness score in [0, 1]. Higher is better.
- Return type:
float
- Raises:
ValueError – If Q is invalid or if k falls outside the valid domain.
See also
continuity: Complementary extrusion-based metric.
compute_coranking_matrix: Construct the required co-ranking matrix.
Examples
>>> import numpy as np
>>> Q = np.diag([1, 1, 1, 1])
>>> trustworthiness(Q, k=1)
1.0
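For intuition, the intrusion penalty can be read straight off the lower-left block of Q. The sketch below assumes the commonly used rank-weighted definition with normalizer n * k * (2n - 3k - 1), which is consistent with the constraint on k stated above. `trustworthiness_sketch` is a hypothetical name, and the library's own implementation may differ in details.

```python
import numpy as np

def trustworthiness_sketch(Q, k):
    """Assumed definition: 1 - 2/G * sum over intrusions of (original rank - k)."""
    n = Q.shape[0] + 1                      # Q has shape (n - 1, n - 1)
    norm = n * k * (2 * n - 3 * k - 1)
    if k < 1 or norm <= 0:
        raise ValueError("k is outside the valid domain for this normalization")
    # Intrusions: original rank > k (rows k onward), embedded rank <= k (first k columns).
    orig_ranks = np.arange(k + 1, n)        # 1-based original ranks k+1 .. n-1
    penalty = ((orig_ranks - k)[:, None] * Q[k:, :k]).sum()
    return float(1.0 - 2.0 * penalty / norm)

# Perfect rank preservation: all mass on the diagonal, no intrusions.
print(trustworthiness_sketch(np.diag([1, 1, 1, 1]), k=1))  # 1.0

# Worst case for n = 4, k = 1: every embedded nearest neighbor had original rank 3.
Q_bad = np.zeros((3, 3), dtype=int)
Q_bad[2, 0] = 4
print(trustworthiness_sketch(Q_bad, k=1))  # 0.0
```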
- coco_pipe.dim_reduction.evaluation.metrics.continuity(Q: numpy.ndarray, k: int) → float
Compute continuity from a co-ranking matrix.
Continuity penalizes extrusions, i.e. points that are among the k nearest neighbors in the original space but are pushed farther away in the embedding.
- Parameters:
Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.
k (int) – Neighborhood size. The normalization used by continuity requires 2 * n_samples - 3 * k - 1 > 0.
- Returns:
Continuity score in [0, 1]. Higher is better.
- Return type:
float
- Raises:
ValueError – If Q is invalid or if k falls outside the valid domain.
See also
trustworthiness: Complementary intrusion-based metric.
compute_coranking_matrix: Construct the required co-ranking matrix.
Examples
>>> import numpy as np
>>> Q = np.diag([1, 1, 1, 1])
>>> continuity(Q, k=1)
1.0
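The extrusion penalty lives in the upper-right block of Q: pairs with original rank at most k and embedded rank above k. The following sketch assumes the commonly used rank-weighted definition and the same normalizer n * k * (2n - 3k - 1) implied by the constraint on k. `continuity_sketch` is a hypothetical name, not the library's function.

```python
import numpy as np

def continuity_sketch(Q, k):
    """Assumed definition: 1 - 2/G * sum over extrusions of (embedded rank - k)."""
    n = Q.shape[0] + 1                      # Q has shape (n - 1, n - 1)
    norm = n * k * (2 * n - 3 * k - 1)
    if k < 1 or norm <= 0:
        raise ValueError("k is outside the valid domain for this normalization")
    # Extrusions: original rank <= k (first k rows), embedded rank > k (columns k onward).
    emb_ranks = np.arange(k + 1, n)         # 1-based embedded ranks k+1 .. n-1
    penalty = (Q[:k, k:] * (emb_ranks - k)[None, :]).sum()
    return float(1.0 - 2.0 * penalty / norm)

# No extrusions when all counts sit on the diagonal.
print(continuity_sketch(np.diag([1, 1, 1, 1]), k=1))  # 1.0

# Worst case for n = 4, k = 1: every original nearest neighbor lands at embedded rank 3.
Q_bad = np.zeros((3, 3), dtype=int)
Q_bad[0, 2] = 4
print(continuity_sketch(Q_bad, k=1))  # 0.0
```

Under these assumed definitions, the extrusion penalty on Q equals the intrusion penalty on the transpose of Q, which is why the two scores are described as complementary.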
- coco_pipe.dim_reduction.evaluation.metrics.lcmc(Q: numpy.ndarray, k: int) → float
Compute the local continuity meta-criterion (LCMC).
- Parameters:
Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.
k (int) – Neighborhood size.
- Returns:
LCMC score. Higher is better.
- Return type:
float
- Raises:
ValueError – If Q is invalid or if k falls outside the valid domain.
See also
trustworthiness: Neighbor-preservation metric.
continuity: Neighbor-consistency metric.
Examples
>>> import numpy as np
>>> Q = np.diag([1, 1, 1, 1])
>>> isinstance(lcmc(Q, k=1), float)
True
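Since the entry above does not spell out the formula, here is a sketch under the usual definition: the fraction of k-neighborhood overlap, taken from the upper-left k-by-k block of Q, minus the overlap k / (n - 1) expected from a random embedding. `lcmc_sketch` is a hypothetical name, and the exact normalization used by the library is an assumption.

```python
import numpy as np

def lcmc_sketch(Q, k):
    """Assumed definition: Q_NX(k) - k / (n - 1)."""
    n = Q.shape[0] + 1                      # Q has shape (n - 1, n - 1)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n_samples - 1")
    # Pairs that are k-neighbors in both spaces, averaged over n points and k slots.
    q_nx = Q[:k, :k].sum() / (k * n)
    return float(q_nx - k / (n - 1))

# Three samples with perfectly preserved ranks: each diagonal count equals n = 3.
Q = np.diag([3, 3])
print(lcmc_sketch(Q, k=1))  # 0.5
```

With this definition the score is not confined to [0, 1]: at k = n - 1 the overlap is trivially complete and the baseline correction brings the value to 0.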
- coco_pipe.dim_reduction.evaluation.metrics.compute_mrre(Q: numpy.ndarray, k: int) → Tuple[float, float]
Compute mean relative rank errors (MRRE).
Both intrusion and extrusion MRRE are returned. These are error metrics, so lower values are better and 0 indicates perfect rank preservation.
- Parameters:
Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.
k (int) – Neighborhood size.
- Returns:
(mrre_intrusion, mrre_extrusion).
- Return type:
tuple[float, float]
- Raises:
ValueError – If Q is invalid or if k falls outside the valid domain.
See also
trustworthiness: Intrusion-sensitive preservation score.
continuity: Extrusion-sensitive preservation score.
Examples
>>> import numpy as np
>>> Q = np.diag([1, 1, 1, 1])
>>> compute_mrre(Q, k=1)
(0.0, 0.0)
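A rough sketch of how the two errors could be computed from Q follows. Several points are assumptions rather than facts about the library: the intrusion error is taken over the embedded k-neighborhood (first k columns) and divided by the embedded rank, the extrusion error over the original k-neighborhood (first k rows) and divided by the original rank, and the normalizer is chosen so that a fully reversed ranking scores 1. Published variants of the normalizer differ slightly. `mrre_sketch` is a hypothetical name.

```python
import numpy as np

def mrre_sketch(Q, k):
    """Return (intrusion, extrusion) mean relative rank errors under assumed definitions."""
    n = Q.shape[0] + 1                       # Q has shape (n - 1, n - 1)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n_samples - 1")
    ranks = np.arange(1, n)                  # 1-based ranks 1 .. n-1
    gap = np.abs(ranks[:, None] - ranks[None, :])   # |original rank - embedded rank|
    head = ranks[:k]
    # Assumed normalizer: total relative error of a fully reversed ranking.
    norm = n * np.sum(np.abs(n - 2 * head) / head)
    if norm == 0:
        raise ValueError("normalizer is zero for this combination of n and k")
    intrusion = np.sum(Q[:, :k] * gap[:, :k] / head[None, :]) / norm
    extrusion = np.sum(Q[:k, :] * gap[:k, :] / head[:, None]) / norm
    return float(intrusion), float(extrusion)

# Perfect preservation gives zero error in both directions.
print(mrre_sketch(np.diag([1, 1, 1, 1]), k=1))  # (0.0, 0.0)

# Fully reversed ranks for n = 4: rank 1 <-> 3, rank 2 stays.
Q_rev = np.zeros((3, 3), dtype=int)
Q_rev[0, 2] = Q_rev[1, 1] = Q_rev[2, 0] = 4
print(mrre_sketch(Q_rev, k=1))  # (1.0, 1.0)
```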
- coco_pipe.dim_reduction.evaluation.metrics.shepard_diagram_data(X: numpy.ndarray, X_embedded: numpy.ndarray, sample_size: int = 1000, random_state: int | None = None) → Tuple[numpy.ndarray, numpy.ndarray]
Compute sampled pairwise distances for a Shepard diagram.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) – Original high-dimensional data.
X_embedded (np.ndarray of shape (n_samples, n_components)) – Low-dimensional embedding of the same samples.
sample_size (int, default=1000) – Number of samples to keep before computing pairwise distances. If sample_size is at least n_samples, all samples are used.
random_state (int, optional) – Random seed used when subsampling.
- Returns:
Pairwise distances in the original and embedded spaces.
- Return type:
tuple[np.ndarray, np.ndarray]
- Raises:
ValueError – If the inputs are invalid or if sample_size <= 1.
See also
compute_coranking_matrix: Rank-based global quality summary.
Examples
>>> import numpy as np
>>> X = np.random.RandomState(0).rand(10, 3)
>>> X_emb = X[:, :2]
>>> d_orig, d_emb = shepard_diagram_data(X, X_emb, sample_size=5, random_state=0)
>>> len(d_orig) == len(d_emb)
True
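The subsample-then-measure procedure described in the parameters can be sketched as below. This is not the library's implementation: `shepard_sketch` and its `seed` argument are hypothetical, and the choice of Euclidean distance and of returning one value per unordered pair are assumptions.

```python
import numpy as np

def shepard_sketch(X, X_emb, sample_size=1000, seed=None):
    """Return matched pairwise distances (original, embedded) on a shared subsample."""
    if sample_size <= 1:
        raise ValueError("sample_size must be greater than 1")
    n = X.shape[0]
    if sample_size < n:
        # Draw the same rows from both arrays so distances stay paired.
        rng = np.random.default_rng(seed)
        idx = rng.choice(n, size=sample_size, replace=False)
        X, X_emb = X[idx], X_emb[idx]
    # Index each unordered pair (i < j) exactly once.
    i, j = np.triu_indices(X.shape[0], k=1)

    def pair_dist(A):
        return np.sqrt(((A[i] - A[j]) ** 2).sum(axis=1))

    return pair_dist(X), pair_dist(X_emb)

X = np.random.RandomState(0).rand(10, 3)
X_emb = X[:, :2]                      # toy "embedding": drop the last coordinate
d_orig, d_emb = shepard_sketch(X, X_emb, sample_size=5, seed=0)
print(len(d_orig), len(d_emb))  # 10 10
```

Five retained samples give 5 * 4 / 2 = 10 pairs. Plotting `d_orig` against `d_emb` as a scatter yields the Shepard diagram; points near the diagonal indicate well-preserved distances.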