coco_pipe.dim_reduction.evaluation.metrics

Rank-based dimensionality reduction quality metrics.

This module provides co-ranking-matrix metrics for comparing high-dimensional data with a low-dimensional embedding. The implementations are reducer-agnostic and operate directly on NumPy arrays.

Functions

compute_coranking_matrix

Compute the co-ranking matrix between the original and embedded spaces.

trustworthiness

Measure how well neighbors in the embedding were also neighbors in the original space (penalizes intrusions).

continuity

Measure how well neighbors in the original space remain neighbors in the embedding (penalizes extrusions).

lcmc

Compute the local continuity meta-criterion.

compute_mrre

Compute mean relative rank errors for intrusions and extrusions.

shepard_diagram_data

Sample pairwise distances for Shepard-diagram visualization.

Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)

Functions

compute_coranking_matrix(→ numpy.ndarray)

Compute the co-ranking matrix between two sample spaces.

trustworthiness(→ float)

Compute trustworthiness from a co-ranking matrix.

continuity(→ float)

Compute continuity from a co-ranking matrix.

lcmc(→ float)

Compute the local continuity meta-criterion (LCMC).

compute_mrre(→ Tuple[float, float])

Compute mean relative rank errors (MRRE).

shepard_diagram_data(→ Tuple[numpy.ndarray, numpy.ndarray])

Compute sampled pairwise distances for a Shepard diagram.

Module Contents

coco_pipe.dim_reduction.evaluation.metrics.compute_coranking_matrix(X: numpy.ndarray, X_emb: numpy.ndarray) → numpy.ndarray

Compute the co-ranking matrix between two sample spaces.

The co-ranking matrix Q counts how often each point pair appears with high-dimensional rank k and low-dimensional rank l. Self-neighbors are excluded from the rank construction.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) – Original high-dimensional data.

  • X_emb (np.ndarray of shape (n_samples, n_components)) – Low-dimensional embedding of the same samples.

Returns:

Integer co-ranking matrix where Q[k, l] corresponds to ranks k + 1 and l + 1 in the original and embedded spaces.

Return type:

np.ndarray of shape (n_samples - 1, n_samples - 1)

Raises:

ValueError – If the inputs are not two-dimensional, do not share the same sample count, or contain fewer than two samples.

See also

trustworthiness

Compute intrusion-based neighborhood preservation.

continuity

Compute extrusion-based neighborhood preservation.

lcmc

Compute the local continuity meta-criterion.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction.evaluation.metrics import compute_coranking_matrix
>>> X = np.array([[0.0], [1.0], [2.0]])
>>> X_emb = np.array([[0.0], [2.0], [4.0]])
>>> Q = compute_coranking_matrix(X, X_emb)
>>> Q.shape
(2, 2)
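
The rank construction described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the module's implementation: the helper name coranking_sketch, the Euclidean metric, and the stable tie-breaking rule are all choices made here for the example.

```python
import numpy as np


def coranking_sketch(X, X_emb):
    """Illustrative co-ranking matrix (hypothetical helper, not the library code)."""
    X = np.asarray(X, dtype=float)
    X_emb = np.asarray(X_emb, dtype=float)
    if X.ndim != 2 or X_emb.ndim != 2 or X.shape[0] != X_emb.shape[0]:
        raise ValueError("inputs must be 2-D with the same number of samples")
    n = X.shape[0]
    if n < 2:
        raise ValueError("at least two samples are required")

    def neighbor_ranks(A):
        # Squared Euclidean distances between all pairs of rows (assumed metric).
        diff = A[:, None, :] - A[None, :, :]
        D = np.einsum("ijk,ijk->ij", diff, diff)
        # Force every point to be its own rank-0 neighbor, even with duplicates.
        np.fill_diagonal(D, -1.0)
        order = np.argsort(D, axis=1, kind="stable")
        ranks = np.empty_like(order)
        # Invert the permutation: ranks[i, order[i, j]] = j.
        ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
        return ranks

    r_high = neighbor_ranks(X)
    r_low = neighbor_ranks(X_emb)
    off_diag = ~np.eye(n, dtype=bool)  # drop the self-neighbor pairs
    Q = np.zeros((n - 1, n - 1), dtype=int)
    # Q[k, l] counts pairs with original rank k + 1 and embedded rank l + 1.
    np.add.at(Q, (r_high[off_diag] - 1, r_low[off_diag] - 1), 1)
    return Q


X_demo = np.array([[0.0], [1.0], [2.0]])
Q_demo = coranking_sketch(X_demo, 2.0 * X_demo)
print(Q_demo)
```

On these three collinear points both spaces induce the same neighbor ordering, so all n_samples * (n_samples - 1) = 6 ordered pairs fall on the diagonal of Q.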

coco_pipe.dim_reduction.evaluation.metrics.trustworthiness(Q: numpy.ndarray, k: int) → float

Compute trustworthiness from a co-ranking matrix.

Trustworthiness penalizes intrusions, i.e. points that appear among the k nearest neighbors in the embedding but were farther away in the original space.

Parameters:
  • Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.

  • k (int) – Neighborhood size. The normalization used by trustworthiness requires 2 * n_samples - 3 * k - 1 > 0.

Returns:

Trustworthiness score in [0, 1]. Higher is better.

Return type:

float

Raises:

ValueError – If Q is invalid or if k falls outside the valid domain.

See also

continuity

Complementary extrusion-based metric.

compute_coranking_matrix

Construct the required co-ranking matrix.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction.evaluation.metrics import trustworthiness
>>> Q = np.diag([1, 1, 1, 1])
>>> trustworthiness(Q, k=1)
1.0
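
To make the intrusion penalty concrete, here is a minimal sketch of how trustworthiness can be read off a co-ranking matrix. It assumes the commonly used normalization 2 / (n * k * (2n - 3k - 1)), which is consistent with the constraint documented for k but is not confirmed to be what this module computes. The name trustworthiness_sketch is hypothetical.

```python
import numpy as np


def trustworthiness_sketch(Q, k):
    """Illustrative trustworthiness from a co-ranking matrix (assumed formula)."""
    Q = np.asarray(Q)
    n = Q.shape[0] + 1  # Q has shape (n_samples - 1, n_samples - 1)
    norm = n * k * (2 * n - 3 * k - 1)
    if k < 1 or norm <= 0:
        raise ValueError("k is outside the valid domain")
    # Rows index the original rank (1-based after the shift below).
    orig_rank = np.arange(1, n)[:, None]
    # Intrusions: embedded rank <= k (first k columns) with original rank > k.
    # Each is weighted by how far outside the original neighborhood it was.
    penalty = np.clip(orig_rank - k, 0, None) * Q[:, :k]
    return 1.0 - 2.0 * penalty.sum() / norm


# Perfect rank preservation for 5 samples: every pair sits on the diagonal.
Q_perfect = 5 * np.eye(4, dtype=int)
# Fully reversed neighbor ordering: every pair sits on the anti-diagonal.
Q_reversed = 5 * np.fliplr(np.eye(4, dtype=int))

print(trustworthiness_sketch(Q_perfect, k=1))   # 1.0
print(trustworthiness_sketch(Q_reversed, k=1))  # 0.0
```

Under this normalization the fully reversed ordering is the worst case for k = 1, which is why the score reaches exactly 0.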

coco_pipe.dim_reduction.evaluation.metrics.continuity(Q: numpy.ndarray, k: int) → float

Compute continuity from a co-ranking matrix.

Continuity penalizes extrusions, i.e. points that are among the k nearest neighbors in the original space but are pushed farther away in the embedding.

Parameters:
  • Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.

  • k (int) – Neighborhood size. The normalization used by continuity requires 2 * n_samples - 3 * k - 1 > 0.

Returns:

Continuity score in [0, 1]. Higher is better.

Return type:

float

Raises:

ValueError – If Q is invalid or if k falls outside the valid domain.

See also

trustworthiness

Complementary intrusion-based metric.

compute_coranking_matrix

Construct the required co-ranking matrix.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction.evaluation.metrics import continuity
>>> Q = np.diag([1, 1, 1, 1])
>>> continuity(Q, k=1)
1.0
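
The extrusion penalty can be sketched the same way, reading the first k rows of Q instead of the first k columns. As before, the normalization 2 / (n * k * (2n - 3k - 1)) is an assumption inferred from the documented constraint on k, and continuity_sketch is a hypothetical name rather than the module's function.

```python
import numpy as np


def continuity_sketch(Q, k):
    """Illustrative continuity from a co-ranking matrix (assumed formula)."""
    Q = np.asarray(Q)
    n = Q.shape[0] + 1  # Q has shape (n_samples - 1, n_samples - 1)
    norm = n * k * (2 * n - 3 * k - 1)
    if k < 1 or norm <= 0:
        raise ValueError("k is outside the valid domain")
    # Columns index the embedded rank (1-based after the shift below).
    emb_rank = np.arange(1, n)[None, :]
    # Extrusions: original rank <= k (first k rows) with embedded rank > k.
    # Each is weighted by how far it was pushed out of the embedded neighborhood.
    penalty = np.clip(emb_rank - k, 0, None) * Q[:k, :]
    return 1.0 - 2.0 * penalty.sum() / norm


# Four samples whose neighbor ordering is fully reversed by the embedding.
Q_flip = 4 * np.fliplr(np.eye(3, dtype=int))
print(continuity_sketch(Q_flip, k=1))  # 0.0
```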

coco_pipe.dim_reduction.evaluation.metrics.lcmc(Q: numpy.ndarray, k: int) → float

Compute the local continuity meta-criterion (LCMC).

Parameters:
  • Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.

  • k (int) – Neighborhood size.

Returns:

LCMC score. Higher is better.

Return type:

float

Raises:

ValueError – If Q is invalid or if k falls outside the valid domain.

See also

trustworthiness

Intrusion-based neighborhood preservation score.

continuity

Extrusion-based neighborhood preservation score.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction.evaluation.metrics import lcmc
>>> Q = np.diag([1, 1, 1, 1])
>>> isinstance(lcmc(Q, k=1), float)
True
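
Since no formula is given here, the following sketch shows one common definition of LCMC: the fraction of k-neighborhoods shared by both spaces, minus the overlap k / (n - 1) expected from a random embedding. Whether this module uses exactly this normalization is an assumption, and lcmc_sketch is a hypothetical name.

```python
import numpy as np


def lcmc_sketch(Q, k):
    """Illustrative LCMC from a co-ranking matrix (assumed definition)."""
    Q = np.asarray(Q)
    n = Q.shape[0] + 1  # Q has shape (n_samples - 1, n_samples - 1)
    if not 1 <= k <= n - 1:
        raise ValueError("k is outside the valid domain")
    # Pairs that are within rank k in both spaces live in the upper-left block.
    shared = Q[:k, :k].sum() / (k * n)
    # Subtract the overlap expected by chance.
    return float(shared - k / (n - 1))


Q_perfect = 5 * np.eye(4, dtype=int)  # 5 samples, ranks fully preserved
print(lcmc_sketch(Q_perfect, k=1))  # 0.75
print(lcmc_sketch(Q_perfect, k=4))  # 0.0
```

With this definition the score for a perfect embedding is 1 - k / (n - 1), so it falls to 0 once the neighborhood covers every other sample.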

coco_pipe.dim_reduction.evaluation.metrics.compute_mrre(Q: numpy.ndarray, k: int) → Tuple[float, float]

Compute mean relative rank errors (MRRE).

Both intrusion and extrusion MRRE are returned. These are error metrics, so lower values are better and 0 indicates perfect rank preservation.

Parameters:
  • Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.

  • k (int) – Neighborhood size.

Returns:

(mrre_intrusion, mrre_extrusion).

Return type:

tuple[float, float]

Raises:

ValueError – If Q is invalid or if k falls outside the valid domain.

See also

trustworthiness

Intrusion-sensitive preservation score.

continuity

Extrusion-sensitive preservation score.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction.evaluation.metrics import compute_mrre
>>> Q = np.diag([1, 1, 1, 1])
>>> compute_mrre(Q, k=1)
(0.0, 0.0)
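
As a rough illustration of what a relative rank error measures, the sketch below follows one common co-ranking formulation: rank differences |k - l| are divided by the rank in the space whose neighborhood is being inspected, then normalized by the worst case. The normalization, the order of the returned pair, and the name mrre_sketch are all assumptions, so the values may not match compute_mrre exactly.

```python
import numpy as np


def mrre_sketch(Q, k):
    """Illustrative mean relative rank errors (assumed formulation)."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0] + 1  # Q has shape (n_samples - 1, n_samples - 1)
    if not 1 <= k <= n - 1:
        raise ValueError("k is outside the valid domain")
    ranks = np.arange(1, n)
    orig = ranks[:, None]  # original rank of each row
    emb = ranks[None, :]   # embedded rank of each column
    gap = np.abs(orig - emb)
    # Worst-case total error, reached when the neighbor ordering is reversed.
    norm = n * np.sum(np.abs(n - 2 * ranks[:k]) / ranks[:k])
    if norm == 0:
        raise ValueError("normalization is zero for this (n_samples, k)")
    # Errors seen from the embedded k-neighborhoods (first k columns).
    err_embedded = np.sum((gap / emb * Q)[:, :k]) / norm
    # Errors seen from the original k-neighborhoods (first k rows).
    err_original = np.sum((gap / orig * Q)[:k, :]) / norm
    return float(err_embedded), float(err_original)


print(mrre_sketch(5 * np.eye(4), k=1))             # (0.0, 0.0)
print(mrre_sketch(5 * np.fliplr(np.eye(4)), k=1))  # (1.0, 1.0)
```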

coco_pipe.dim_reduction.evaluation.metrics.shepard_diagram_data(X: numpy.ndarray, X_embedded: numpy.ndarray, sample_size: int = 1000, random_state: int | None = None) → Tuple[numpy.ndarray, numpy.ndarray]

Compute sampled pairwise distances for a Shepard diagram.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) – Original high-dimensional data.

  • X_embedded (np.ndarray of shape (n_samples, n_components)) – Low-dimensional embedding of the same samples.

  • sample_size (int, default=1000) – Number of samples to keep before computing pairwise distances. If sample_size is at least n_samples, all samples are used.

  • random_state (int, optional) – Random seed used when subsampling.

Returns:

Pairwise distances between the retained samples in the original and embedded spaces, as two arrays of equal length.

Return type:

tuple[np.ndarray, np.ndarray]

Raises:

ValueError – If the inputs are invalid or if sample_size <= 1.

See also

compute_coranking_matrix

Rank-based global quality summary.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction.evaluation.metrics import shepard_diagram_data
>>> X = np.random.RandomState(0).rand(10, 3)
>>> X_emb = X[:, :2]
>>> d_orig, d_emb = shepard_diagram_data(X, X_emb, sample_size=5, random_state=0)
>>> len(d_orig) == len(d_emb)
True
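
The subsample-then-measure procedure described above can be sketched as follows. The Euclidean metric, the flat one-entry-per-pair output layout, and the helper name shepard_sketch are assumptions made for illustration. The module's actual sampling scheme and return layout may differ.

```python
import numpy as np


def shepard_sketch(X, X_emb, sample_size=1000, seed=None):
    """Illustrative Shepard-diagram distances (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    X_emb = np.asarray(X_emb, dtype=float)
    if X.ndim != 2 or X_emb.ndim != 2 or X.shape[0] != X_emb.shape[0]:
        raise ValueError("inputs must be 2-D with the same number of samples")
    if sample_size <= 1:
        raise ValueError("sample_size must be greater than 1")
    n = X.shape[0]
    if sample_size < n:
        # Keep the same random subset of rows in both spaces.
        rng = np.random.default_rng(seed)
        keep = rng.choice(n, size=sample_size, replace=False)
        X, X_emb = X[keep], X_emb[keep]
    # Index every unordered pair (i < j) once.
    i, j = np.triu_indices(X.shape[0], k=1)

    def pair_distances(A):
        diff = A[i] - A[j]
        return np.sqrt(np.einsum("ij,ij->i", diff, diff))

    return pair_distances(X), pair_distances(X_emb)


X_demo = np.random.RandomState(0).rand(10, 3)
d_orig, d_emb = shepard_sketch(X_demo, X_demo[:, :2], sample_size=5, seed=0)
print(len(d_orig), len(d_emb))  # 10 10
```

Five retained samples give 5 * 4 / 2 = 10 pairs. Plotting d_orig against d_emb as a scatter then yields the Shepard diagram, with points near the diagonal indicating well-preserved distances.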