coco_pipe.dim_reduction.evaluation.metrics
==========================================

.. py:module:: coco_pipe.dim_reduction.evaluation.metrics

.. autoapi-nested-parse::

   Rank-based dimensionality reduction quality metrics.

   This module provides metrics based on the co-ranking matrix [1]_ for
   comparing high-dimensional data with a low-dimensional embedding. The
   implementations are reducer-agnostic and operate directly on NumPy arrays.

   Functions
   ---------
   compute_coranking_matrix
       Compute the co-ranking matrix between the original and embedded spaces.
   trustworthiness
       Measure how well embedded neighbors are also neighbors in the original
       space.
   continuity
       Measure how well original neighbors remain neighbors after embedding.
   lcmc
       Compute the local continuity meta-criterion.
   compute_mrre
       Compute mean relative rank errors for intrusions and extrusions.
   shepard_diagram_data
       Sample pairwise distances for Shepard-diagram visualization.

   .. rubric:: References

   .. [1] Lee, J. A., & Verleysen, M. (2009). Quality assessment of
      dimensionality reduction: Rank-based criteria. Neurocomputing.

   Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)


Functions
---------

.. autoapisummary::

   coco_pipe.dim_reduction.evaluation.metrics.compute_coranking_matrix
   coco_pipe.dim_reduction.evaluation.metrics.trustworthiness
   coco_pipe.dim_reduction.evaluation.metrics.continuity
   coco_pipe.dim_reduction.evaluation.metrics.lcmc
   coco_pipe.dim_reduction.evaluation.metrics.compute_mrre
   coco_pipe.dim_reduction.evaluation.metrics.shepard_diagram_data


Module Contents
---------------

.. py:function:: compute_coranking_matrix(X: numpy.ndarray, X_emb: numpy.ndarray) -> numpy.ndarray

   Compute the co-ranking matrix between two sample spaces.

   For every pair of ranks ``k`` and ``l``, the co-ranking matrix ``Q`` counts
   the ordered point pairs ``(i, j)`` for which ``j`` is the ``k``-th nearest
   neighbor of ``i`` in the original space and the ``l``-th nearest neighbor of
   ``i`` in the embedding. Self-neighbors are excluded from the rank
   construction.

   :param X: Original high-dimensional data.
   :type X: np.ndarray of shape (n_samples, n_features)
   :param X_emb: Low-dimensional embedding of the same samples.
   :type X_emb: np.ndarray of shape (n_samples, n_components)
   :returns: Integer co-ranking matrix. ``Q[k, l]`` counts the point pairs with
       rank ``k + 1`` in the original space and rank ``l + 1`` in the embedded
       space.
   :rtype: np.ndarray of shape (n_samples - 1, n_samples - 1)
   :raises ValueError: If the inputs are not two-dimensional, do not share the
       same sample count, or contain fewer than two samples.

   .. seealso::

      :obj:`trustworthiness`
          Compute intrusion-based neighborhood preservation.
      :obj:`continuity`
          Compute extrusion-based neighborhood preservation.
      :obj:`lcmc`
          Compute the local continuity meta-criterion.

   .. rubric:: Examples

   >>> import numpy as np
   >>> X = np.array([[0.0], [1.0], [2.0]])
   >>> X_emb = np.array([[0.0], [2.0], [4.0]])
   >>> Q = compute_coranking_matrix(X, X_emb)
   >>> Q.shape
   (2, 2)


.. py:function:: trustworthiness(Q: numpy.ndarray, k: int) -> float

   Compute trustworthiness from a co-ranking matrix.

   Trustworthiness penalizes intrusions, i.e., points that appear among the
   ``k`` nearest neighbors in the embedding but were farther away in the
   original space.

   :param Q: Co-ranking matrix.
   :type Q: np.ndarray of shape (n_samples - 1, n_samples - 1)
   :param k: Neighborhood size. The normalization used by trustworthiness
       requires ``2 * n_samples - 3 * k - 1 > 0``.
   :type k: int
   :returns: Trustworthiness score in ``[0, 1]``. Higher is better.
   :rtype: float
   :raises ValueError: If ``Q`` is invalid or if ``k`` falls outside the valid
       domain.

   .. seealso::

      :obj:`continuity`
          Complementary extrusion-based metric.
      :obj:`compute_coranking_matrix`
          Construct the required co-ranking matrix.

   .. rubric:: Examples

   >>> import numpy as np
   >>> Q = np.diag([1, 1, 1, 1])
   >>> trustworthiness(Q, k=1)
   1.0


.. py:function:: continuity(Q: numpy.ndarray, k: int) -> float

   Compute continuity from a co-ranking matrix.

   Continuity penalizes extrusions, i.e.,
   points that are among the ``k`` nearest neighbors in the original space but
   are pushed farther away in the embedding.

   :param Q: Co-ranking matrix.
   :type Q: np.ndarray of shape (n_samples - 1, n_samples - 1)
   :param k: Neighborhood size. The normalization used by continuity requires
       ``2 * n_samples - 3 * k - 1 > 0``.
   :type k: int
   :returns: Continuity score in ``[0, 1]``. Higher is better.
   :rtype: float
   :raises ValueError: If ``Q`` is invalid or if ``k`` falls outside the valid
       domain.

   .. seealso::

      :obj:`trustworthiness`
          Complementary intrusion-based metric.
      :obj:`compute_coranking_matrix`
          Construct the required co-ranking matrix.

   .. rubric:: Examples

   >>> import numpy as np
   >>> Q = np.diag([1, 1, 1, 1])
   >>> continuity(Q, k=1)
   1.0


.. py:function:: lcmc(Q: numpy.ndarray, k: int) -> float

   Compute the local continuity meta-criterion (LCMC).

   LCMC measures the average overlap between the ``k``-nearest-neighbor sets in
   the original and embedded spaces, corrected for the overlap that a random
   embedding would produce.

   :param Q: Co-ranking matrix.
   :type Q: np.ndarray of shape (n_samples - 1, n_samples - 1)
   :param k: Neighborhood size.
   :type k: int
   :returns: LCMC score. Higher is better.
   :rtype: float
   :raises ValueError: If ``Q`` is invalid or if ``k`` falls outside the valid
       domain.

   .. seealso::

      :obj:`trustworthiness`
          Intrusion-sensitive neighborhood preservation score.
      :obj:`continuity`
          Extrusion-sensitive neighborhood preservation score.

   .. rubric:: Examples

   >>> import numpy as np
   >>> Q = np.diag([1, 1, 1, 1])
   >>> isinstance(lcmc(Q, k=1), float)
   True


.. py:function:: compute_mrre(Q: numpy.ndarray, k: int) -> Tuple[float, float]

   Compute mean relative rank errors (MRRE).

   Both the intrusion and the extrusion MRRE are returned. These are error
   metrics, so lower values are better and ``0`` indicates perfect rank
   preservation.

   :param Q: Co-ranking matrix.
   :type Q: np.ndarray of shape (n_samples - 1, n_samples - 1)
   :param k: Neighborhood size.
   :type k: int
   :returns: ``(mrre_intrusion, mrre_extrusion)``.
   :rtype: tuple[float, float]
   :raises ValueError: If ``Q`` is invalid or if ``k`` falls outside the valid
       domain.

   .. seealso::

      :obj:`trustworthiness`
          Intrusion-sensitive preservation score.
      :obj:`continuity`
          Extrusion-sensitive preservation score.

   .. rubric:: Examples

   >>> import numpy as np
   >>> Q = np.diag([1, 1, 1, 1])
   >>> compute_mrre(Q, k=1)
   (0.0, 0.0)


.. py:function:: shepard_diagram_data(X: numpy.ndarray, X_embedded: numpy.ndarray, sample_size: int = 1000, random_state: Optional[int] = None) -> Tuple[numpy.ndarray, numpy.ndarray]

   Compute sampled pairwise distances for a Shepard diagram.

   A Shepard diagram is a scatter plot of distances in the embedding against
   the corresponding distances in the original space.

   :param X: Original high-dimensional data.
   :type X: np.ndarray of shape (n_samples, n_features)
   :param X_embedded: Low-dimensional embedding of the same samples.
   :type X_embedded: np.ndarray of shape (n_samples, n_components)
   :param sample_size: Number of samples to keep before computing pairwise
       distances. If ``sample_size`` is at least ``n_samples``, all samples are
       used.
   :type sample_size: int, default=1000
   :param random_state: Random seed used when subsampling.
   :type random_state: int, optional
   :returns: Pairwise distances between the sampled points in the original and
       embedded spaces.
   :rtype: tuple[np.ndarray, np.ndarray]
   :raises ValueError: If the inputs are invalid or if ``sample_size <= 1``.

   .. seealso::

      :obj:`compute_coranking_matrix`
          Rank-based alternative for assessing neighborhood preservation.

   .. rubric:: Examples

   >>> import numpy as np
   >>> X = np.random.RandomState(0).rand(10, 3)
   >>> X_emb = X[:, :2]
   >>> d_orig, d_emb = shepard_diagram_data(X, X_emb, sample_size=5, random_state=0)
   >>> len(d_orig) == len(d_emb)
   True
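
All of the rank-based scores above are computed from a co-ranking matrix. The
following NumPy-only code is a minimal sketch of the co-ranking formulation of
trustworthiness and continuity from Lee and Verleysen (2009), assuming
Euclidean distances and breaking distance ties by sample index; it is not the
implementation used by this module. The helpers ``pairwise_euclidean``,
``rank_matrix``, ``coranking_sketch``, ``trustworthiness_sketch`` and
``continuity_sketch`` are hypothetical names introduced only for this
illustration.

.. code-block:: python

   import numpy as np

   # Illustrative sketch only: these helpers are hypothetical and are not part
   # of coco_pipe. Distances are Euclidean; ties are broken by sample index.


   def pairwise_euclidean(X):
       # Dense (n_samples, n_samples) Euclidean distance matrix.
       diff = X[:, None, :] - X[None, :, :]
       return np.sqrt((diff ** 2).sum(axis=-1))


   def rank_matrix(D):
       # R[i, j] is the 1-based rank of j among the neighbors of i. The
       # diagonal is set to +inf so that each point ranks itself last
       # (rank n_samples), excluding self-neighbors from ranks 1..n_samples-1.
       n = D.shape[0]
       D = D.copy()
       np.fill_diagonal(D, np.inf)
       order = np.argsort(D, axis=1, kind="stable")
       R = np.empty_like(order)
       R[np.arange(n)[:, None], order] = np.arange(1, n + 1)
       return R


   def coranking_sketch(X, X_emb):
       # Q[k - 1, l - 1] counts ordered pairs (i, j), i != j, such that j has
       # rank k around i in the original space and rank l in the embedding.
       n = X.shape[0]
       R_high = rank_matrix(pairwise_euclidean(X))
       R_low = rank_matrix(pairwise_euclidean(X_emb))
       off_diag = ~np.eye(n, dtype=bool)
       Q = np.zeros((n - 1, n - 1), dtype=np.int64)
       np.add.at(Q, (R_high[off_diag] - 1, R_low[off_diag] - 1), 1)
       return Q


   def _scale(n, k):
       # Normalization shared by trustworthiness and continuity.
       if k < 1 or 2 * n - 3 * k - 1 <= 0:
           raise ValueError("need k >= 1 and 2 * n_samples - 3 * k - 1 > 0")
       return 2.0 / (n * k * (2 * n - 3 * k - 1))


   def trustworthiness_sketch(Q, k):
       # Intrusions: embedded rank <= k but original rank > k (block
       # Q[k:, :k]), each weighted by how far the original rank exceeds k.
       n = Q.shape[0] + 1
       scale = _scale(n, k)
       excess = np.arange(k + 1, n) - k
       penalty = (Q[k:, :k] * excess[:, None]).sum()
       return float(1.0 - scale * penalty)


   def continuity_sketch(Q, k):
       # Extrusions: original rank <= k but embedded rank > k (block
       # Q[:k, k:]), each weighted by how far the embedded rank exceeds k.
       n = Q.shape[0] + 1
       scale = _scale(n, k)
       excess = np.arange(k + 1, n) - k
       penalty = (Q[:k, k:] * excess[None, :]).sum()
       return float(1.0 - scale * penalty)


   # Usage: project random 3-D points onto their first two coordinates.
   rng = np.random.RandomState(0)
   X = rng.rand(10, 3)
   Q = coranking_sketch(X, X[:, :2])
   print(Q.sum(axis=1))  # every row sums to n_samples = 10
   print(trustworthiness_sketch(Q, k=3), continuity_sketch(Q, k=3))

Rows of ``Q`` index the original-space rank and columns the embedded rank, so
intrusions occupy the lower-left block ``Q[k:, :k]`` and extrusions the
upper-right block ``Q[:k, k:]``. Building dense ``(n_samples, n_samples)``
distance and rank matrices keeps the sketch short at the cost of quadratic
memory.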