coco_pipe.dim_reduction.evaluation.metrics
==========================================

.. py:module:: coco_pipe.dim_reduction.evaluation.metrics

.. autoapi-nested-parse::

   Rank-based dimensionality reduction quality metrics.

   This module provides co-ranking-matrix metrics for comparing high-dimensional
   data with a low-dimensional embedding. The implementations are reducer-agnostic
   and operate directly on NumPy arrays.

   Functions
   ---------
   compute_coranking_matrix
       Compute the co-ranking matrix between the original and embedded spaces.
   trustworthiness
       Measure how well embedded neighbors were already close in the original space.
   continuity
       Measure how well original neighbors remain neighbors after embedding.
   lcmc
       Compute the local continuity meta-criterion.
   compute_mrre
       Compute mean relative rank errors for intrusions and extrusions.
   shepard_diagram_data
       Sample pairwise distances for Shepard-diagram visualization.

   .. rubric:: References

   .. [1] Lee, J. A., & Verleysen, M. (2009). Quality assessment of
          dimensionality reduction: Rank-based criteria. Neurocomputing.

   Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)


Functions
---------

.. autoapisummary::

   coco_pipe.dim_reduction.evaluation.metrics.compute_coranking_matrix
   coco_pipe.dim_reduction.evaluation.metrics.trustworthiness
   coco_pipe.dim_reduction.evaluation.metrics.continuity
   coco_pipe.dim_reduction.evaluation.metrics.lcmc
   coco_pipe.dim_reduction.evaluation.metrics.compute_mrre
   coco_pipe.dim_reduction.evaluation.metrics.shepard_diagram_data


Module Contents
---------------

.. py:function:: compute_coranking_matrix(X: numpy.ndarray, X_emb: numpy.ndarray) -> numpy.ndarray

   Compute the co-ranking matrix between two sample spaces.

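   The co-ranking matrix ``Q`` counts, for each pair of ranks, the ordered point
   pairs ``(i, j)`` in which ``j`` has one rank among the neighbors of ``i`` in
   the original space and the other rank in the embedding. Each point is
   excluded from its own neighbor ranking, so ranks run from ``1`` to
   ``n_samples - 1``.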

   :param X: Original high-dimensional data.
   :type X: np.ndarray of shape (n_samples, n_features)
   :param X_emb: Low-dimensional embedding of the same samples.
   :type X_emb: np.ndarray of shape (n_samples, n_components)

   :returns: Integer co-ranking matrix where ``Q[k, l]`` corresponds to ranks
             ``k + 1`` and ``l + 1`` in the original and embedded spaces.
   :rtype: np.ndarray of shape (n_samples - 1, n_samples - 1)

   :raises ValueError: If the inputs are not two-dimensional, do not share the same sample
       count, or contain fewer than two samples.

   .. seealso::

      :obj:`trustworthiness`
          Compute intrusion-based neighborhood preservation.

      :obj:`continuity`
          Compute extrusion-based neighborhood preservation.

      :obj:`lcmc`
          Compute the local continuity meta-criterion.

   .. rubric:: Examples

   >>> import numpy as np
   >>> X = np.array([[0.0], [1.0], [2.0]])
   >>> X_emb = np.array([[0.0], [2.0], [4.0]])
   >>> Q = compute_coranking_matrix(X, X_emb)
   >>> Q.shape
   (2, 2)
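
The counting described above can be sketched in plain NumPy. This is a minimal illustration, not the module's own implementation: the name ``coranking_sketch``, the Euclidean metric, and the index-order tie-breaking are all assumptions made for the example.

```python
import numpy as np


def coranking_sketch(X, X_emb):
    """Illustrative co-ranking matrix (hypothetical helper, not the module's code)."""
    n = X.shape[0]

    def neighbor_ranks(A):
        # Pairwise Euclidean distances, shape (n, n). Assumed metric.
        diff = A[:, None, :] - A[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        # Push each point's self-distance to the end of its sorted row.
        np.fill_diagonal(dist, np.inf)
        # Stable sort: ties are broken by sample index (an assumption).
        order = np.argsort(dist, axis=1, kind="stable")
        # ranks[i, j] = 0-based neighbor rank of j as seen from i.
        ranks = np.empty((n, n), dtype=np.intp)
        ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
        return ranks

    rho = neighbor_ranks(X)      # ranks in the original space
    r = neighbor_ranks(X_emb)    # ranks in the embedding
    off_diag = ~np.eye(n, dtype=bool)
    Q = np.zeros((n - 1, n - 1), dtype=np.int64)
    # Count every ordered pair (i, j), i != j, in its (rho, r) cell.
    np.add.at(Q, (rho[off_diag], r[off_diag]), 1)
    return Q


X = np.array([[0.0], [1.0], [2.0]])
X_emb = np.array([[0.0], [2.0], [4.0]])
Q = coranking_sketch(X, X_emb)
```

For these three collinear points the sketch gives ``[[3, 0], [0, 3]]``: the embedding only rescales distances, so every pair keeps its rank and all counts fall on the diagonal. Each row and each column sums to ``n_samples``.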


.. py:function:: trustworthiness(Q: numpy.ndarray, k: int) -> float

   Compute trustworthiness from a co-ranking matrix.

   Trustworthiness penalizes intrusions, i.e. points that appear among the
   ``k`` nearest neighbors in the embedding but were farther away in the
   original space.

   :param Q: Co-ranking matrix.
   :type Q: np.ndarray of shape (n_samples - 1, n_samples - 1)
   :param k: Neighborhood size. The normalization used by trustworthiness requires
             ``2 * n_samples - 3 * k - 1 > 0``.
   :type k: int

   :returns: Trustworthiness score in ``[0, 1]``. Higher is better.
   :rtype: float

   :raises ValueError: If ``Q`` is invalid or if ``k`` falls outside the valid domain.

   .. seealso::

      :obj:`continuity`
          Complementary extrusion-based metric.

      :obj:`compute_coranking_matrix`
          Construct the required co-ranking matrix.

   .. rubric:: Examples

   >>> import numpy as np
   >>> Q = np.diag([1, 1, 1, 1])
   >>> trustworthiness(Q, k=1)
   1.0
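
As a rough guide to what the score measures, the sketch below evaluates the usual trustworthiness formula on a co-ranking matrix. It assumes the standard rank-weighted penalty with normalization ``2 / (n * k * (2 * n - 3 * k - 1))``, which matches the constraint on ``k`` stated above; ``trustworthiness_sketch`` is a hypothetical name and the module's own code may differ in detail.

```python
import numpy as np


def trustworthiness_sketch(Q, k):
    """Illustrative trustworthiness from a co-ranking matrix (hypothetical helper)."""
    n = Q.shape[0] + 1
    if k < 1 or 2 * n - 3 * k - 1 <= 0:
        raise ValueError("k is outside the valid domain")
    # Intrusions: original rank > k (rows k onward, 0-based indexing),
    # embedded rank <= k (first k columns).
    block = Q[k:, :k]
    # The penalty grows with how far beyond k the original rank lies: 1, 2, ...
    weights = np.arange(1, n - k)
    penalty = float((weights[:, None] * block).sum())
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


n = 5
perfect = n * np.eye(n - 1)              # every pair keeps its rank
reversed_ = n * np.fliplr(np.eye(n - 1)) # nearest becomes farthest
t_perfect = trustworthiness_sketch(perfect, k=1)
t_reversed = trustworthiness_sketch(reversed_, k=1)
```

Under these assumptions a rank-preserving embedding scores ``1.0``, while fully reversing the neighbor order of five samples drives the score to ``0.0`` at ``k=1``.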


.. py:function:: continuity(Q: numpy.ndarray, k: int) -> float

   Compute continuity from a co-ranking matrix.

   Continuity penalizes extrusions, i.e. points that are among the
   ``k`` nearest neighbors in the original space but are pushed farther away in
   the embedding.

   :param Q: Co-ranking matrix.
   :type Q: np.ndarray of shape (n_samples - 1, n_samples - 1)
   :param k: Neighborhood size. The normalization used by continuity requires
             ``2 * n_samples - 3 * k - 1 > 0``.
   :type k: int

   :returns: Continuity score in ``[0, 1]``. Higher is better.
   :rtype: float

   :raises ValueError: If ``Q`` is invalid or if ``k`` falls outside the valid domain.

   .. seealso::

      :obj:`trustworthiness`
          Complementary intrusion-based metric.

      :obj:`compute_coranking_matrix`
          Construct the required co-ranking matrix.

   .. rubric:: Examples

   >>> import numpy as np
   >>> Q = np.diag([1, 1, 1, 1])
   >>> continuity(Q, k=1)
   1.0
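
The extrusion penalty can be sketched the same way. The example assumes the usual continuity formula, in which each extruded pair is weighted by how far its embedded rank exceeds ``k``; ``continuity_sketch`` is a hypothetical name, not part of the module.

```python
import numpy as np


def continuity_sketch(Q, k):
    """Illustrative continuity from a co-ranking matrix (hypothetical helper)."""
    n = Q.shape[0] + 1
    if k < 1 or 2 * n - 3 * k - 1 <= 0:
        raise ValueError("k is outside the valid domain")
    # Extrusions: original rank <= k (first k rows),
    # embedded rank > k (columns k onward, 0-based indexing).
    block = Q[:k, k:]
    # The penalty grows with how far beyond k the embedded rank lies: 1, 2, ...
    weights = np.arange(1, n - k)
    penalty = float((block * weights[None, :]).sum())
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


n = 5
perfect = n * np.eye(n - 1)
reversed_ = n * np.fliplr(np.eye(n - 1))
c_perfect = continuity_sketch(perfect, k=1)
c_reversed = continuity_sketch(reversed_, k=1)
```

Transposing ``Q`` swaps the roles of the two spaces, so with this formula the same computation on ``Q.T`` penalizes intrusions instead of extrusions.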


.. py:function:: lcmc(Q: numpy.ndarray, k: int) -> float

   Compute the local continuity meta-criterion (LCMC).

   :param Q: Co-ranking matrix.
   :type Q: np.ndarray of shape (n_samples - 1, n_samples - 1)
   :param k: Neighborhood size.
   :type k: int

   :returns: LCMC score. Higher is better.
   :rtype: float

   :raises ValueError: If ``Q`` is invalid or if ``k`` falls outside the valid domain.

   .. seealso::

      :obj:`trustworthiness`
          Intrusion-based neighborhood preservation score.

      :obj:`continuity`
          Extrusion-based neighborhood preservation score.

   .. rubric:: Examples

   >>> import numpy as np
   >>> Q = np.diag([1, 1, 1, 1])
   >>> isinstance(lcmc(Q, k=1), float)
   True
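
For orientation, the sketch below follows the definition in Lee and Verleysen (2009): the fraction of shared ``k``-nearest neighbors, minus the overlap ``k / (n_samples - 1)`` expected from a random embedding. Whether the module normalizes in exactly this way is an assumption, and ``lcmc_sketch`` is a hypothetical name.

```python
import numpy as np


def lcmc_sketch(Q, k):
    """Illustrative LCMC from a co-ranking matrix (hypothetical helper)."""
    n = Q.shape[0] + 1
    if not 1 <= k <= n - 1:
        raise ValueError("k is outside the valid domain")
    # Pairs that are k-nearest neighbors in both spaces sit in the
    # upper-left k x k block; divide by the k * n pairs considered.
    q_nx = Q[:k, :k].sum() / (k * n)
    # Subtract the overlap expected from a random embedding.
    return float(q_nx - k / (n - 1))


n = 5
perfect = n * np.eye(n - 1)
score_k1 = lcmc_sketch(perfect, k=1)
score_k4 = lcmc_sketch(perfect, k=n - 1)
```

Under this definition a perfect embedding of five samples scores ``0.75`` at ``k=1``, and the score falls to ``0.0`` once ``k`` covers all other points, because the chance baseline then equals the observed overlap.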


.. py:function:: compute_mrre(Q: numpy.ndarray, k: int) -> Tuple[float, float]

   Compute mean relative rank errors (MRRE).

   Both intrusion and extrusion MRRE are returned. These are error metrics, so
   lower values are better and ``0`` indicates perfect rank preservation.

   :param Q: Co-ranking matrix.
   :type Q: np.ndarray of shape (n_samples - 1, n_samples - 1)
   :param k: Neighborhood size.
   :type k: int

   :returns: ``(mrre_intrusion, mrre_extrusion)``.
   :rtype: tuple[float, float]

   :raises ValueError: If ``Q`` is invalid or if ``k`` falls outside the valid domain.

   .. seealso::

      :obj:`trustworthiness`
          Intrusion-sensitive preservation score.

      :obj:`continuity`
          Extrusion-sensitive preservation score.

   .. rubric:: Examples

   >>> import numpy as np
   >>> Q = np.diag([1, 1, 1, 1])
   >>> compute_mrre(Q, k=1)
   (0.0, 0.0)


.. py:function:: shepard_diagram_data(X: numpy.ndarray, X_embedded: numpy.ndarray, sample_size: int = 1000, random_state: Optional[int] = None) -> Tuple[numpy.ndarray, numpy.ndarray]

   Compute sampled pairwise distances for a Shepard diagram.

   :param X: Original high-dimensional data.
   :type X: np.ndarray of shape (n_samples, n_features)
   :param X_embedded: Low-dimensional embedding of the same samples.
   :type X_embedded: np.ndarray of shape (n_samples, n_components)
   :param sample_size: Number of samples to keep before computing pairwise distances. If
                       ``sample_size`` is at least ``n_samples``, all samples are used.
   :type sample_size: int, default=1000
   :param random_state: Random seed used when subsampling.
   :type random_state: int, optional

   :returns: Pairwise distances in the original and embedded spaces.
   :rtype: tuple[np.ndarray, np.ndarray]

   :raises ValueError: If the inputs are invalid or if ``sample_size <= 1``.

   .. seealso::

      :obj:`compute_coranking_matrix`
          Rank-based comparison of the same two spaces.

   .. rubric:: Examples

   >>> import numpy as np
   >>> X = np.random.RandomState(0).rand(10, 3)
   >>> X_emb = X[:, :2]
   >>> d_orig, d_emb = shepard_diagram_data(X, X_emb, sample_size=5, random_state=0)
   >>> len(d_orig) == len(d_emb)
   True
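
The subsample-then-compare procedure described above might look like the following. This is a sketch under stated assumptions: Euclidean distances, subsampling without replacement through ``numpy.random.default_rng``, and flattened upper-triangle output are all choices made for illustration, and ``shepard_sketch`` is a hypothetical name.

```python
import numpy as np


def shepard_sketch(X, X_emb, sample_size=1000, random_state=None):
    """Illustrative Shepard-diagram data (hypothetical helper, not the module's code)."""
    if sample_size <= 1:
        raise ValueError("sample_size must be greater than 1")
    n = X.shape[0]
    if sample_size < n:
        # Draw the same subset of rows from both spaces.
        rng = np.random.default_rng(random_state)
        idx = rng.choice(n, size=sample_size, replace=False)
        X, X_emb = X[idx], X_emb[idx]
    m = X.shape[0]
    # Indices of the strict upper triangle: each unordered pair once.
    iu = np.triu_indices(m, k=1)

    def pairwise(A):
        diff = A[:, None, :] - A[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))[iu]

    return pairwise(X), pairwise(X_emb)


X = np.random.RandomState(0).rand(10, 3)
X_emb = X[:, :2]
d_orig, d_emb = shepard_sketch(X, X_emb, sample_size=5, random_state=0)
```

With five sampled points, each returned array holds the ``5 * 4 / 2 = 10`` pairwise distances. Plotting ``d_emb`` against ``d_orig`` gives the Shepard diagram; points near the diagonal indicate well-preserved distances.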