coco_pipe.dim_reduction.analysis
================================

.. py:module:: coco_pipe.dim_reduction.analysis

.. autoapi-nested-parse::

   Feature Attribution and Analysis
   ================================

   Pure attribution and interpretability utilities for dimensionality reduction.

   This module is intentionally separate from the preservation-focused evaluation
   stack. The functions here answer a different question:

   - ``evaluate_embedding(...)`` in :mod:`coco_pipe.dim_reduction.evaluation`
     asks whether an embedding preserves structure well.
   - This module (``analysis.py``) asks which input features appear to drive an
     embedding.

   The public API is explicit and array-first:

   - ``correlate_features(...)`` computes feature-to-dimension correlations.
   - ``perturbation_importance(...)`` measures embedding sensitivity to shuffled
     features.
   - ``gradient_importance(...)`` computes encoder saliency for supported
     torch-based reducers.
   - ``interpret_features(...)`` is a pure backend that runs one or more of
     these analyses and returns their normalized payloads together with tidy
     records intended for future manager and report integration.

   Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)


Functions
---------

.. autoapisummary::

   coco_pipe.dim_reduction.analysis.correlate_features
   coco_pipe.dim_reduction.analysis.perturbation_importance
   coco_pipe.dim_reduction.analysis.gradient_importance
   coco_pipe.dim_reduction.analysis.interpret_features


Module Contents
---------------

.. py:function:: correlate_features(X_orig: numpy.ndarray, X_emb: numpy.ndarray, feature_names: Sequence[str]) -> Dict[str, Dict[str, float]]

   Compute Spearman correlations between original features and embedding axes.

   :param X_orig: Original data with shape ``(n_samples, n_features)``.
   :type X_orig: np.ndarray
   :param X_emb: Embedded data with shape ``(n_samples, n_dimensions)``.
   :type X_emb: np.ndarray
   :param feature_names: Feature names aligned with the columns of ``X_orig``.
   :type feature_names: sequence of str

   :returns: Nested mapping ``{dimension_name: {feature_name: rho}}`` with
             dimension names ``"Dimension 1"``, ``"Dimension 2"``, and so on.
             Within each dimension, features are sorted by descending absolute
             correlation.
   :rtype: dict

   :raises ValueError: If ``X_orig`` or ``X_emb`` is not 2D, if sample counts do not match,
       or if ``feature_names`` has the wrong length.

   .. rubric:: Notes

   Constant features or constant embedding dimensions can yield undefined
   Spearman coefficients. These are reported as ``0.0`` to keep the output
   stable and sortable.

   .. seealso::

      :obj:`perturbation_importance`
          Model-agnostic feature importance by embedding perturbation.

      :obj:`gradient_importance`
          Encoder saliency for supported torch-based reducers.

      :obj:`interpret_features`
          Higher-level backend that packages correlation and importance outputs.

   .. rubric:: Examples

   >>> import numpy as np
   >>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
   >>> X_emb = np.array([[0.0, 0.5], [1.0, 0.0], [2.0, 0.5]])
   >>> result = correlate_features(X, X_emb, feature_names=["f1", "f2"])
   >>> sorted(result)
   ['Dimension 1', 'Dimension 2']

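As an illustration of the Notes above, the following NumPy-only sketch computes Spearman's rho as the Pearson correlation of average ranks, reports ``0.0`` for constant columns, and orders features by descending absolute correlation. It is a hypothetical stand-in, not the implementation of ``correlate_features``: the helper names are invented for this example, and the ``"Dimension k"`` keys simply mirror the doctest output above.

```python
from typing import Dict, Sequence

import numpy as np


def _average_ranks(x: np.ndarray) -> np.ndarray:
    """Rank values 1..n, giving tied values their average rank."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Average the ranks inside each group of tied values.
    _, inverse = np.unique(x, return_inverse=True)
    sums = np.bincount(inverse, weights=ranks)
    counts = np.bincount(inverse)
    return (sums / counts)[inverse]


def _spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman's rho; returns 0.0 when either input is constant."""
    ra = _average_ranks(a)
    rb = _average_ranks(b)
    ra -= ra.mean()
    rb -= rb.mean()
    denom = np.sqrt((ra * ra).sum() * (rb * rb).sum())
    if denom == 0.0:  # constant column: rho is undefined, report 0.0
        return 0.0
    return float((ra * rb).sum() / denom)


def spearman_feature_map(
    X_orig: np.ndarray, X_emb: np.ndarray, feature_names: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Hypothetical sketch of a feature-to-dimension Spearman map."""
    if X_orig.ndim != 2 or X_emb.ndim != 2:
        raise ValueError("X_orig and X_emb must be 2D")
    if X_orig.shape[0] != X_emb.shape[0]:
        raise ValueError("X_orig and X_emb must have the same number of samples")
    if len(feature_names) != X_orig.shape[1]:
        raise ValueError("feature_names must match the columns of X_orig")

    result: Dict[str, Dict[str, float]] = {}
    for d in range(X_emb.shape[1]):
        corrs = {
            name: _spearman(X_orig[:, j], X_emb[:, d])
            for j, name in enumerate(feature_names)
        }
        # Order features by descending |rho| within this dimension.
        ordered = sorted(corrs.items(), key=lambda item: abs(item[1]), reverse=True)
        result[f"Dimension {d + 1}"] = dict(ordered)
    return result
```

``scipy.stats.spearmanr`` computes the same coefficient for non-constant inputs; the hand-rolled ranking only keeps the sketch dependency-free.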

.. py:function:: perturbation_importance(model: Any, X: numpy.ndarray, feature_names: Sequence[str], X_emb: numpy.ndarray, n_repeats: int = 5, random_state: Optional[int] = None) -> Dict[str, float]

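   Compute model-agnostic feature importance by shuffling each feature in turn
   and measuring how far the re-embedded data moves from the reference
   embedding ``X_emb``.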

   :param model: Fitted reducer or estimator exposing ``transform(X)``.
   :type model: Any
   :param X: Input data with shape ``(n_samples, n_features)``.
   :type X: np.ndarray
   :param feature_names: Feature names aligned with the columns of ``X``.
   :type feature_names: sequence of str
   :param X_emb: Explicit embedding of ``X`` used as the perturbation reference.
   :type X_emb: np.ndarray
   :param n_repeats: Number of independent shuffles per feature.
   :type n_repeats: int, default=5
   :param random_state: Random seed for reproducible shuffling.
   :type random_state: int, optional

   :returns: Mapping of feature name to normalized importance score. Scores sum to 1
             when the perturbation signal is nonzero; otherwise all scores are 0.
   :rtype: dict

   :raises ValueError: If ``X`` is not 2D, if ``X_emb`` does not align with ``X`` along the
       sample axis, or if ``feature_names`` has the wrong length.

   .. seealso::

      :obj:`correlate_features`
          Cheap feature-to-dimension interpretation based on correlations.

      :obj:`gradient_importance`
          Encoder saliency for supported torch-based reducers.

      :obj:`interpret_features`
          Higher-level backend that packages correlation and importance outputs.

   .. rubric:: Examples

   >>> import numpy as np
   >>> class MockReducer:
   ...     def transform(self, X):
   ...         return X[:, :2]
   >>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
   >>> X_emb = X[:, :2]
   >>> scores = perturbation_importance(
   ...     MockReducer(),
   ...     X,
   ...     feature_names=["f1", "f2"],
   ...     X_emb=X_emb,
   ...     n_repeats=1,
   ...     random_state=0,
   ... )
   >>> sorted(scores)
   ['f1', 'f2']

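The shuffle-and-compare procedure can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the library's code: the function name ``shuffle_importance`` is hypothetical, and the displacement measure (mean per-sample Euclidean distance between the re-embedded data and ``X_emb``) is an assumption. Normalization follows the return contract described above: scores sum to 1 unless no shuffle moves the embedding.

```python
from typing import Any, Dict, Optional, Sequence

import numpy as np


def shuffle_importance(
    model: Any,
    X: np.ndarray,
    feature_names: Sequence[str],
    X_emb: np.ndarray,
    n_repeats: int = 5,
    random_state: Optional[int] = None,
) -> Dict[str, float]:
    """Hypothetical sketch: shuffle each column, re-embed, measure displacement."""
    if X.ndim != 2:
        raise ValueError("X must be 2D")
    if X_emb.shape[0] != X.shape[0]:
        raise ValueError("X_emb must have one row per sample in X")
    if len(feature_names) != X.shape[1]:
        raise ValueError("feature_names must match the columns of X")

    rng = np.random.default_rng(random_state)
    raw = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and every other column.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            emb_perm = np.asarray(model.transform(X_perm))
            # ASSUMPTION: importance = mean per-sample Euclidean shift from X_emb.
            raw[j] += np.linalg.norm(emb_perm - X_emb, axis=1).mean()
        raw[j] /= n_repeats

    total = raw.sum()
    # Normalize to sum 1, or return all zeros if nothing moved the embedding.
    scores = raw / total if total > 0 else np.zeros_like(raw)
    return {name: float(s) for name, s in zip(feature_names, scores)}
```

Averaging over ``n_repeats`` shuffles reduces the variance of any single permutation, and a seeded ``numpy.random.Generator`` makes the scores reproducible for a fixed ``random_state``.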

.. py:function:: gradient_importance(wrapper: Any, X: numpy.ndarray, feature_names: Optional[Sequence[str]] = None) -> Dict[str, Any]

   Compute encoder saliency by differentiating embedding magnitude with respect
   to the input.

   :param wrapper: Fitted encoder-based reducer wrapper exposing
                   ``get_pytorch_module()``.
   :type wrapper: Any
   :param X: Input array. The sample axis is assumed to be axis 0. Remaining axes are
             treated as feature dimensions.
   :type X: np.ndarray
   :param feature_names: Feature names for 2D inputs. Named outputs are only supported when the
                         reduced saliency is one-dimensional.
   :type feature_names: sequence of str, optional

   :returns: For one-dimensional reduced saliency with names, returns a mapping of
             feature name to normalized importance score. For higher-dimensional
             saliency, returns ``{"importance_matrix": scores}``.
   :rtype: dict

   :raises ValueError: If ``X`` has fewer than 2 dimensions, or if ``feature_names`` is
       incompatible with the reduced saliency shape.

   .. rubric:: Notes

   This function assumes an encoder-based torch wrapper whose
   ``get_pytorch_module()`` returns a module with an ``encoder`` submodule.

   .. seealso::

      :obj:`perturbation_importance`
          Model-agnostic importance that only requires ``transform``.

      :obj:`correlate_features`
          Cheap feature-to-dimension interpretation from explicit embeddings.

      :obj:`interpret_features`
          Higher-level backend that packages gradient and perturbation outputs.

   .. rubric:: Examples

   >>> import numpy as np
   >>> class Encoder:
   ...     def __call__(self, X):
   ...         return X
   >>> class MockModule:
   ...     def __init__(self):
   ...         self.encoder = Encoder()
   ...     def eval(self):
   ...         return None
   ...     def parameters(self):
   ...         return iter(())
   >>> class MockWrapper:
   ...     def get_pytorch_module(self):
   ...         return MockModule()
   >>> X = np.array([[1.0, 2.0], [3.0, 4.0]])
   >>> result = gradient_importance(MockWrapper(), X)
   >>> isinstance(result, dict)
   True

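The torch implementation is not reproduced here. To make "differentiating embedding magnitude with respect to the input" concrete without a torch dependency, the sketch below assumes a hypothetical linear encoder ``z = W @ x``, for which the gradient of the L2 norm has the closed form ``W.T @ z / ||z||``. Averaging absolute per-sample gradients over samples and normalizing the result to sum to 1 are further assumptions, and the helper names are invented for this example.

```python
from typing import Dict, Sequence

import numpy as np


def norm_gradient(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Per-sample gradient of ||W x||_2 w.r.t. x for a hypothetical linear encoder."""
    Z = X @ W.T                                    # (n_samples, n_dims)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    # Where z == 0 the norm is not differentiable; use a zero (sub)gradient.
    safe = np.where(norms == 0.0, 1.0, norms)
    return (Z / safe) @ W                          # (n_samples, n_features)


def linear_saliency(
    W: np.ndarray, X: np.ndarray, feature_names: Sequence[str]
) -> Dict[str, float]:
    """ASSUMPTION: saliency = mean |gradient| per feature, normalized to sum 1."""
    saliency = np.abs(norm_gradient(W, X)).mean(axis=0)
    total = saliency.sum()
    if total > 0:
        saliency = saliency / total
    return {name: float(s) for name, s in zip(feature_names, saliency)}
```

For a real encoder, autograd yields the same quantity: create the input tensor with ``requires_grad=True``, reduce the encoder output to a scalar (for example the sum of per-sample norms), call ``backward()``, and read the input tensor's ``.grad``.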

.. py:function:: interpret_features(X: numpy.ndarray, *, X_emb: Optional[numpy.ndarray] = None, model: Optional[Any] = None, analyses: Optional[Sequence[str]] = None, feature_names: Optional[Sequence[str]] = None, method_name: str = 'embedding', n_repeats: int = 5, random_state: Optional[int] = None) -> Dict[str, Any]

   Run one or more feature interpretation analyses.

   :param X: Original input data.
   :type X: np.ndarray
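   :param X_emb: Explicit embedding of ``X``. Required for correlation analysis;
                 for perturbation analysis it also serves as the reference
                 embedding.
   :type X_emb: np.ndarray, optional
   :param model: Fitted reducer or model used by the perturbation and gradient
                 importance analyses.
   :type model: Any, optional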
   :param analyses: Analyses to compute. ``None`` defaults to ``("correlation",)``.
   :type analyses: sequence of {"correlation", "perturbation", "gradient"}, optional
   :param feature_names: Feature names aligned with ``X`` when the requested analysis returns
                         feature-keyed outputs.
   :type feature_names: sequence of str, optional
   :param method_name: Display name written into the returned analysis records.
   :type method_name: str, default="embedding"
   :param n_repeats: Number of permutations per feature for perturbation importance.
   :type n_repeats: int, default=5
   :param random_state: Random seed for perturbation importance.
   :type random_state: int, optional

   :returns: Dictionary with keys:

             - ``analysis``: nested analysis payloads
             - ``records``: tidy analysis records as ``list[dict]``
   :rtype: dict

   :raises ValueError: If a requested analysis is unsupported, if its required
       inputs (``X_emb`` or ``model``) are missing, or if it needs feature names
       that were not provided.

   .. rubric:: Notes

   This function is a pure interpretation backend for manager, report, or
   visualization workflows. It does not fit models, compute embeddings, or
   mutate reducer state.

   .. seealso::

      :obj:`correlate_features`
          Feature-to-dimension interpretation from explicit embeddings.

      :obj:`perturbation_importance`
          Model-agnostic importance based on shuffled features.

      :obj:`gradient_importance`
          Encoder saliency for supported torch-based reducers.

   .. rubric:: Examples

   >>> import numpy as np
   >>> class MockReducer:
   ...     def transform(self, X):
   ...         return X[:, :2]
   >>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
   >>> X_emb = X[:, :2]
   >>> result = interpret_features(
   ...     X,
   ...     X_emb=X_emb,
   ...     model=MockReducer(),
   ...     analyses=["correlation", "perturbation"],
   ...     feature_names=["f1", "f2"],
   ...     n_repeats=1,
   ...     random_state=0,
   ... )
   >>> sorted(result)
   ['analysis', 'records']
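
The exact layout of ``analysis`` and ``records`` is not specified above. The following sketch shows one way a dispatcher of this kind could validate requests and flatten feature-keyed payloads into tidy rows. Everything in it is an assumption: the required-input table is inferred from the signatures of the three analysis functions, and the function names and record fields (``method``, ``analysis``, ``dimension``, ``feature``, ``value``) are hypothetical. Matrix-valued payloads such as ``{"importance_matrix": scores}`` are not handled.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

# ASSUMPTION: inputs each analysis needs, inferred from the function signatures.
REQUIRED_INPUTS: Dict[str, Tuple[str, ...]] = {
    "correlation": ("X_emb",),
    "perturbation": ("model", "X_emb"),
    "gradient": ("model",),
}


def resolve_analyses(
    analyses: Optional[Sequence[str]], *, X_emb: Any = None, model: Any = None
) -> Tuple[str, ...]:
    """Default to correlation; fail fast on unsupported or under-specified requests."""
    requested = tuple(analyses) if analyses is not None else ("correlation",)
    provided = {"X_emb": X_emb, "model": model}
    for name in requested:
        if name not in REQUIRED_INPUTS:
            raise ValueError(f"Unsupported analysis: {name!r}")
        missing = [key for key in REQUIRED_INPUTS[name] if provided[key] is None]
        if missing:
            raise ValueError(f"Analysis {name!r} requires: {', '.join(missing)}")
    return requested


def to_records(
    analysis: Dict[str, Dict[str, Any]], method_name: str = "embedding"
) -> List[Dict[str, Any]]:
    """Hypothetical flattening of feature-keyed payloads into tidy rows."""
    records: List[Dict[str, Any]] = []
    for analysis_name, payload in analysis.items():
        for key, value in payload.items():
            if isinstance(value, dict):
                # Nested payload, e.g. correlation: dimension -> feature -> score.
                for feature, score in value.items():
                    records.append({"method": method_name, "analysis": analysis_name,
                                    "dimension": key, "feature": feature,
                                    "value": float(score)})
            else:
                # Flat payload, e.g. perturbation: feature -> score.
                records.append({"method": method_name, "analysis": analysis_name,
                                "dimension": None, "feature": key,
                                "value": float(value)})
    return records
```

Keeping validation separate from flattening means a missing ``model`` or ``X_emb`` is reported before any analysis runs.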