coco_pipe.dim_reduction.analysis
================================

.. py:module:: coco_pipe.dim_reduction.analysis

.. autoapi-nested-parse::

   Feature Attribution and Analysis
   ================================

   Pure attribution and interpretability utilities for dimensionality reduction.

   This module is intentionally separate from the preservation-focused evaluation
   stack. The functions here answer a different question:

   - ``evaluate_embedding(...)`` in :mod:`coco_pipe.dim_reduction.evaluation` asks
     whether an embedding preserves structure well.
   - ``analysis.py`` asks which input features appear to drive an embedding.

   The public surface is explicit and array-first:

   - ``correlate_features(...)`` computes feature-to-dimension correlations.
   - ``perturbation_importance(...)`` measures embedding sensitivity to shuffled
     features.
   - ``gradient_importance(...)`` computes encoder saliency for supported
     torch-based reducers.
   - ``interpret_features(...)`` is a pure backend that combines one or more of
     these analyses and returns normalized payloads plus tidy records for future
     manager/report integration.

   Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)


Functions
---------

.. autoapisummary::

   coco_pipe.dim_reduction.analysis.correlate_features
   coco_pipe.dim_reduction.analysis.perturbation_importance
   coco_pipe.dim_reduction.analysis.gradient_importance
   coco_pipe.dim_reduction.analysis.interpret_features


Module Contents
---------------

.. py:function:: correlate_features(X_orig: numpy.ndarray, X_emb: numpy.ndarray, feature_names: Sequence[str]) -> Dict[str, Dict[str, float]]

   Compute Spearman correlations between original features and embedding axes.

   :param X_orig: Original data with shape ``(n_samples, n_features)``.
   :type X_orig: np.ndarray
   :param X_emb: Embedded data with shape ``(n_samples, n_dimensions)``.
   :type X_emb: np.ndarray
   :param feature_names: Feature names aligned with the columns of ``X_orig``.
   :type feature_names: sequence of str

   :returns: Nested mapping of dimension names to feature-correlation mappings,
             sorted by descending absolute correlation magnitude within each
             dimension.
   :rtype: dict

   :raises ValueError: If ``X_orig`` or ``X_emb`` is not 2D, if sample counts do
       not match, or if ``feature_names`` has the wrong length.

   .. rubric:: Notes

   Constant features or constant embedding dimensions can yield undefined
   Spearman coefficients. These are reported as ``0.0`` to keep the output
   stable and sortable.

   .. seealso::

      :obj:`perturbation_importance`
          Model-agnostic feature importance by embedding perturbation.

      :obj:`gradient_importance`
          Encoder saliency for supported torch-based reducers.

      :obj:`interpret_features`
          Higher-level backend that packages correlation and importance outputs.

   .. rubric:: Examples

   >>> import numpy as np
   >>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
   >>> X_emb = np.array([[0.0, 0.5], [1.0, 0.0], [2.0, 0.5]])
   >>> result = correlate_features(X, X_emb, feature_names=["f1", "f2"])
   >>> sorted(result)
   ['Dimension 1', 'Dimension 2']


.. py:function:: perturbation_importance(model: Any, X: numpy.ndarray, feature_names: Sequence[str], X_emb: numpy.ndarray, n_repeats: int = 5, random_state: Optional[int] = None) -> Dict[str, float]

   Compute model-agnostic feature importance by feature shuffling.

   :param model: Fitted reducer or estimator exposing ``transform(X)``.
   :type model: Any
   :param X: Input data with shape ``(n_samples, n_features)``.
   :type X: np.ndarray
   :param feature_names: Feature names aligned with the columns of ``X``.
   :type feature_names: sequence of str
   :param X_emb: Explicit embedding of ``X`` used as the perturbation reference.
   :type X_emb: np.ndarray
   :param n_repeats: Number of independent shuffles per feature.
   :type n_repeats: int, default=5
   :param random_state: Random seed for reproducible shuffling.
   :type random_state: int, optional

   :returns: Mapping of feature name to normalized importance score.
             Scores sum to 1 when the perturbation signal is nonzero; otherwise
             all scores are 0.
   :rtype: dict

   :raises ValueError: If ``X`` is not 2D, if ``X_emb`` does not align with ``X``
       along the sample axis, or if ``feature_names`` has the wrong length.

   .. seealso::

      :obj:`correlate_features`
          Cheap feature-to-dimension interpretation based on correlations.

      :obj:`gradient_importance`
          Encoder saliency for supported torch-based reducers.

      :obj:`interpret_features`
          Higher-level backend that packages correlation and importance outputs.

   .. rubric:: Examples

   >>> import numpy as np
   >>> class MockReducer:
   ...     def transform(self, X):
   ...         return X[:, :2]
   >>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
   >>> X_emb = X[:, :2]
   >>> scores = perturbation_importance(
   ...     MockReducer(),
   ...     X,
   ...     feature_names=["f1", "f2"],
   ...     X_emb=X_emb,
   ...     n_repeats=1,
   ...     random_state=0,
   ... )
   >>> sorted(scores)
   ['f1', 'f2']


.. py:function:: gradient_importance(wrapper: Any, X: numpy.ndarray, feature_names: Optional[Sequence[str]] = None) -> Dict[str, Any]

   Compute encoder saliency by differentiating embedding magnitude w.r.t. input.

   :param wrapper: Fitted encoder-based reducer wrapper exposing
                   ``get_pytorch_module()``.
   :type wrapper: Any
   :param X: Input array. The sample axis is assumed to be axis 0. Remaining axes
             are treated as feature dimensions.
   :type X: np.ndarray
   :param feature_names: Feature names for 2D inputs. Named outputs are only
                         supported when the reduced saliency is one-dimensional.
   :type feature_names: sequence of str, optional

   :returns: For one-dimensional reduced saliency with names, returns a mapping
             of feature name to normalized importance score. For
             higher-dimensional saliency, returns
             ``{"importance_matrix": scores}``.
   :rtype: dict

   :raises ValueError: If ``X`` has fewer than 2 dimensions, or if
       ``feature_names`` is incompatible with the reduced saliency shape.

   .. rubric:: Notes

   This function assumes an encoder-based torch wrapper that exposes
   ``get_pytorch_module()`` and an ``encoder`` submodule.

   .. seealso::

      :obj:`perturbation_importance`
          Model-agnostic importance that only requires ``transform``.

      :obj:`correlate_features`
          Cheap feature-to-dimension interpretation from explicit embeddings.

      :obj:`interpret_features`
          Higher-level backend that packages gradient and perturbation outputs.

   .. rubric:: Examples

   >>> import numpy as np
   >>> class Encoder:
   ...     def __call__(self, X):
   ...         return X
   >>> class MockModule:
   ...     def __init__(self):
   ...         self.encoder = Encoder()
   ...     def eval(self):
   ...         return None
   ...     def parameters(self):
   ...         return iter(())
   >>> class MockWrapper:
   ...     def get_pytorch_module(self):
   ...         return MockModule()
   >>> X = np.array([[1.0, 2.0], [3.0, 4.0]])
   >>> result = gradient_importance(MockWrapper(), X)
   >>> isinstance(result, dict)
   True


.. py:function:: interpret_features(X: numpy.ndarray, *, X_emb: Optional[numpy.ndarray] = None, model: Optional[Any] = None, analyses: Optional[Sequence[str]] = None, feature_names: Optional[Sequence[str]] = None, method_name: str = 'embedding', n_repeats: int = 5, random_state: Optional[int] = None) -> Dict[str, Any]

   Run one or more feature interpretation analyses.

   :param X: Original input data.
   :type X: np.ndarray
   :param X_emb: Explicit embedding used by correlation-based analysis.
   :type X_emb: np.ndarray, optional
   :param model: Fitted reducer or model used by importance analyses.
   :type model: Any, optional
   :param analyses: Analyses to compute. ``None`` defaults to
                    ``("correlation",)``.
   :type analyses: sequence of {"correlation", "perturbation", "gradient"}, optional
   :param feature_names: Feature names aligned with ``X`` when the requested
                         analysis returns feature-keyed outputs.
   :type feature_names: sequence of str, optional
   :param method_name: Display name written into the returned analysis records.
   :type method_name: str, default="embedding"
   :param n_repeats: Number of permutations per feature for perturbation
                     importance.
   :type n_repeats: int, default=5
   :param random_state: Random seed for perturbation importance.
   :type random_state: int, optional

   :returns: Dictionary with keys:

             - ``analysis``: nested analysis payloads
             - ``records``: tidy analysis records as ``list[dict]``
   :rtype: dict

   :raises ValueError: If a requested analysis is unsupported, missing required
       inputs, or lacks required feature names.

   .. rubric:: Notes

   This function is a pure interpretation backend for manager, report, or
   visualization workflows. It does not fit models, compute embeddings, or
   mutate reducer state.

   .. seealso::

      :obj:`correlate_features`
          Feature-to-dimension interpretation from explicit embeddings.

      :obj:`perturbation_importance`
          Model-agnostic importance based on shuffled features.

      :obj:`gradient_importance`
          Encoder saliency for supported torch-based reducers.

   .. rubric:: Examples

   >>> import numpy as np
   >>> class MockReducer:
   ...     def transform(self, X):
   ...         return X[:, :2]
   >>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
   >>> X_emb = X[:, :2]
   >>> result = interpret_features(
   ...     X,
   ...     X_emb=X_emb,
   ...     model=MockReducer(),
   ...     analyses=["correlation", "perturbation"],
   ...     feature_names=["f1", "f2"],
   ...     n_repeats=1,
   ...     random_state=0,
   ... )
   >>> sorted(result)
   ['analysis', 'records']
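
As a rough illustration of the normalization rule documented for ``perturbation_importance`` (scores sum to 1 when the perturbation signal is nonzero, otherwise all scores are 0), the following dependency-free sketch shuffles one feature column at a time and measures how far the embedding moves from the reference. It is not the library's implementation: the helper name ``sketch_perturbation_importance``, the mean squared displacement metric, and the use of plain lists instead of NumPy arrays are all assumptions made for illustration.

```python
import random


def sketch_perturbation_importance(
    transform, X, feature_names, X_emb, n_repeats=5, random_state=None
):
    """Hypothetical sketch of shuffle-based feature importance.

    ``transform`` maps a list of rows to a list of embedded rows.
    ``X_emb`` is the reference embedding of the unshuffled ``X``.
    """
    rng = random.Random(random_state)
    n_samples = len(X)
    raw = {}
    for j, name in enumerate(feature_names):
        total = 0.0
        for _ in range(n_repeats):
            # Shuffle only column j; every other column stays in place.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [
                list(row[:j]) + [value] + list(row[j + 1:])
                for row, value in zip(X, column)
            ]
            emb = transform(X_perm)
            # Assumed metric: mean squared displacement from the reference.
            total += sum(
                (a - b) ** 2
                for new_row, ref_row in zip(emb, X_emb)
                for a, b in zip(new_row, ref_row)
            ) / n_samples
        raw[name] = total / n_repeats

    signal = sum(raw.values())
    if signal == 0.0:
        # No feature moved the embedding: report all zeros.
        return {name: 0.0 for name in feature_names}
    # Otherwise normalize so the scores sum to 1.
    return {name: value / signal for name, value in raw.items()}
```

With a toy ``transform`` that keeps only the first column, shuffling the second column cannot change the embedding, so its score is 0 and the remaining importance should fall on the first feature.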