coco_pipe.dim_reduction.analysis

Pure attribution and interpretability utilities for dimensionality reduction.

This module is intentionally separate from the preservation-focused evaluation stack. The functions here answer a different question:

  • evaluate_embedding(...) in coco_pipe.dim_reduction.evaluation asks whether an embedding preserves structure well.

  • analysis.py asks which input features appear to drive an embedding.

The public surface is explicit and array-first:

  • correlate_features(...) computes feature-to-dimension correlations.

  • perturbation_importance(...) measures embedding sensitivity to shuffled features.

  • gradient_importance(...) computes encoder saliency for supported torch-based reducers.

  • interpret_features(...) is a pure backend that combines one or more of these analyses and returns normalized payloads plus tidy records for future manager/report integration.

Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)

Functions

  • correlate_features(...) → Dict[str, Dict[str, float]]: Compute Spearman correlations between original features and embedding axes.

  • perturbation_importance(...) → Dict[str, float]: Compute model-agnostic feature importance by feature shuffling.

  • gradient_importance(...) → Dict[str, Any]: Compute encoder saliency by differentiating embedding magnitude w.r.t. input.

  • interpret_features(...) → Dict[str, Any]: Run one or more feature interpretation analyses.

Module Contents

coco_pipe.dim_reduction.analysis.correlate_features(X_orig: numpy.ndarray, X_emb: numpy.ndarray, feature_names: Sequence[str]) → Dict[str, Dict[str, float]]

Compute Spearman correlations between original features and embedding axes.

Parameters:
  • X_orig (np.ndarray) – Original data with shape (n_samples, n_features).

  • X_emb (np.ndarray) – Embedded data with shape (n_samples, n_dimensions).

  • feature_names (sequence of str) – Feature names aligned with the columns of X_orig.

Returns:

Nested mapping from each dimension name (for example "Dimension 1") to a mapping of feature name to Spearman correlation. Within each dimension, features are sorted by descending absolute correlation.

Return type:

dict

Raises:

ValueError – If X_orig or X_emb is not 2D, if sample counts do not match, or if feature_names has the wrong length.

Notes

Constant features or constant embedding dimensions can yield undefined Spearman coefficients. These are reported as 0.0 to keep the output stable and sortable.
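
The behaviour described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the helper names average_ranks, safe_spearman, and correlate_sketch are hypothetical, and Spearman correlation is computed here as the Pearson correlation of average ranks. Only the "Dimension N" key naming and the 0.0 fallback are taken from this page.

```python
import numpy as np


def average_ranks(values):
    """Rank values 1..n, assigning tied values their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(values.size, dtype=float)
    ranks[order] = np.arange(1, values.size + 1)
    # Average the ordinal ranks within each group of tied values.
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def safe_spearman(x, y):
    """Spearman correlation that falls back to 0.0 when it is undefined."""
    rx, ry = average_ranks(x), average_ranks(y)
    if rx.max() == rx.min() or ry.max() == ry.min():
        # A constant input has no rank variation, so the coefficient is undefined.
        return 0.0
    return float(np.corrcoef(rx, ry)[0, 1])


def correlate_sketch(X_orig, X_emb, feature_names):
    """Hypothetical stand-in for the documented output structure."""
    result = {}
    for d in range(X_emb.shape[1]):
        scores = {
            name: safe_spearman(X_orig[:, j], X_emb[:, d])
            for j, name in enumerate(feature_names)
        }
        # Sort features by descending absolute correlation within the dimension.
        ordered = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
        result[f"Dimension {d + 1}"] = dict(ordered)
    return result


# "f3" is constant, so its correlation is reported as 0.0 instead of NaN.
X = np.array([[0.0, 1.0, 5.0], [1.0, 0.0, 5.0], [2.0, 1.0, 5.0]])
X_emb = np.array([[0.0, 0.5], [1.0, 0.0], [2.0, 0.5]])
result = correlate_sketch(X, X_emb, ["f1", "f2", "f3"])
```

Replacing undefined coefficients with 0.0 keeps every value comparable under abs(), which is what makes the per-dimension sort well defined.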

See also

perturbation_importance

Model-agnostic feature importance by embedding perturbation.

gradient_importance

Encoder saliency for supported torch-based reducers.

interpret_features

Higher-level backend that packages correlation and importance outputs.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction.analysis import correlate_features
>>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
>>> X_emb = np.array([[0.0, 0.5], [1.0, 0.0], [2.0, 0.5]])
>>> result = correlate_features(X, X_emb, feature_names=["f1", "f2"])
>>> sorted(result)
['Dimension 1', 'Dimension 2']

coco_pipe.dim_reduction.analysis.perturbation_importance(model: Any, X: numpy.ndarray, feature_names: Sequence[str], X_emb: numpy.ndarray, n_repeats: int = 5, random_state: int | None = None) → Dict[str, float]

Compute model-agnostic feature importance by feature shuffling.

Parameters:
  • model (Any) – Fitted reducer or estimator exposing transform(X).

  • X (np.ndarray) – Input data with shape (n_samples, n_features).

  • feature_names (sequence of str) – Feature names aligned with the columns of X.

  • X_emb (np.ndarray) – Explicit embedding of X used as the perturbation reference.

  • n_repeats (int, default=5) – Number of independent shuffles per feature.

  • random_state (int, optional) – Random seed for reproducible shuffling.

Returns:

Mapping of feature name to normalized importance score. Scores sum to 1 when the perturbation signal is nonzero; otherwise all scores are 0.

Return type:

dict

Raises:

ValueError – If X is not 2D, if X_emb does not align with X along the sample axis, or if feature_names has the wrong length.
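
The shuffle-and-compare idea can be sketched as follows. This is an illustration under stated assumptions, not the library's code: perturbation_sketch and FirstColumnReducer are hypothetical names, and the sensitivity measure (mean squared displacement of the embedding) is an assumption, since this page does not say which distance the function uses. The normalization follows the Returns description above.

```python
import numpy as np


def perturbation_sketch(model, X, feature_names, X_emb, n_repeats=5, random_state=None):
    """Hypothetical sketch of shuffle-based feature importance."""
    rng = np.random.default_rng(random_state)
    raw = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its link with the other features.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            # Assumed sensitivity measure: mean squared embedding displacement.
            raw[j] += np.mean((model.transform(X_perm) - X_emb) ** 2)
    raw /= n_repeats
    total = raw.sum()
    # Scores sum to 1 when there is any signal; otherwise they are all 0.
    scores = raw / total if total > 0 else np.zeros_like(raw)
    return dict(zip(feature_names, scores.tolist()))


class FirstColumnReducer:
    """Toy reducer whose embedding depends only on the first feature."""

    def transform(self, X):
        return X[:, :1]


X = np.random.default_rng(0).normal(size=(20, 3))
reducer = FirstColumnReducer()
scores = perturbation_sketch(
    reducer, X, ["f1", "f2", "f3"], reducer.transform(X), n_repeats=3, random_state=0
)
```

Because the toy reducer ignores the second and third columns, shuffling them leaves the embedding unchanged and all importance is assigned to "f1".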

See also

correlate_features

Cheap feature-to-dimension interpretation based on correlations.

gradient_importance

Encoder saliency for supported torch-based reducers.

interpret_features

Higher-level backend that packages correlation and importance outputs.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction.analysis import perturbation_importance
>>> class MockReducer:
...     def transform(self, X):
...         return X[:, :2]
>>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
>>> X_emb = X[:, :2]
>>> scores = perturbation_importance(
...     MockReducer(),
...     X,
...     feature_names=["f1", "f2"],
...     X_emb=X_emb,
...     n_repeats=1,
...     random_state=0,
... )
>>> sorted(scores)
['f1', 'f2']

coco_pipe.dim_reduction.analysis.gradient_importance(wrapper: Any, X: numpy.ndarray, feature_names: Sequence[str] | None = None) → Dict[str, Any]

Compute encoder saliency by differentiating embedding magnitude w.r.t. input.

Parameters:
  • wrapper (Any) – Fitted encoder-based reducer wrapper exposing get_pytorch_module().

  • X (np.ndarray) – Input array. The sample axis is assumed to be axis 0. Remaining axes are treated as feature dimensions.

  • feature_names (sequence of str, optional) – Feature names for 2D inputs. Named outputs are only supported when the reduced saliency is one-dimensional.

Returns:

For one-dimensional reduced saliency with names, returns a mapping of feature name to normalized importance score. For higher-dimensional saliency, returns {"importance_matrix": scores}.

Return type:

dict

Raises:

ValueError – If X has fewer than 2 dimensions, or if feature_names is incompatible with the reduced saliency shape.

Notes

This function assumes an encoder-based torch wrapper that exposes get_pytorch_module() and an encoder submodule.
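
To make the saliency idea concrete without requiring torch, the sketch below substitutes central finite differences for automatic differentiation. It is an approximation of the concept, not the library's method. Several parts are assumptions: the name saliency_sketch, the eps parameter, the choice of squared L2 norm as the "embedding magnitude", the averaging of absolute gradients over samples, and the requirement that the encoder processes each sample independently. Only the two return shapes follow the Returns description above.

```python
import numpy as np


def saliency_sketch(encode, X, feature_names=None, eps=1e-5):
    """Hypothetical finite-difference stand-in for autograd saliency."""
    X = np.asarray(X, dtype=float)
    if X.ndim < 2:
        raise ValueError("X must have at least 2 dimensions.")
    flat = X.reshape(X.shape[0], -1)

    def magnitude(batch):
        # Assumed scalar target per sample: squared L2 norm of the embedding.
        z = np.asarray(encode(batch.reshape(X.shape)))
        return np.sum(z.reshape(z.shape[0], -1) ** 2, axis=1)

    grads = np.zeros_like(flat)
    for j in range(flat.shape[1]):
        step = np.zeros(flat.shape[1])
        step[j] = eps
        # Central difference of the magnitude w.r.t. input feature j.
        grads[:, j] = (magnitude(flat + step) - magnitude(flat - step)) / (2 * eps)

    saliency = np.abs(grads).mean(axis=0)
    total = saliency.sum()
    scores = saliency / total if total > 0 else saliency
    scores = scores.reshape(X.shape[1:])

    if feature_names is not None:
        if scores.ndim != 1 or len(feature_names) != scores.shape[0]:
            raise ValueError("feature_names is incompatible with the saliency shape.")
        return dict(zip(feature_names, scores.tolist()))
    return {"importance_matrix": scores}


# Linear toy encoder: magnitude = 4 * x1**2 + x2**2, so x3 has no influence.
W = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
X = np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
scores = saliency_sketch(lambda batch: batch @ W, X, feature_names=["f1", "f2", "f3"])
```

For this toy encoder the gradient is (8·x1, 2·x2, 0), so the normalized scores come out close to 0.8, 0.2, and 0.0.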

See also

perturbation_importance

Model-agnostic importance that only requires transform.

correlate_features

Cheap feature-to-dimension interpretation from explicit embeddings.

interpret_features

Higher-level backend that packages gradient and perturbation outputs.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction.analysis import gradient_importance
>>> class Encoder:
...     def __call__(self, X):
...         return X
>>> class MockModule:
...     def __init__(self):
...         self.encoder = Encoder()
...     def eval(self):
...         return None
...     def parameters(self):
...         return iter(())
>>> class MockWrapper:
...     def get_pytorch_module(self):
...         return MockModule()
>>> X = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> result = gradient_importance(MockWrapper(), X)
>>> isinstance(result, dict)
True

coco_pipe.dim_reduction.analysis.interpret_features(X: numpy.ndarray, *, X_emb: numpy.ndarray | None = None, model: Any | None = None, analyses: Sequence[str] | None = None, feature_names: Sequence[str] | None = None, method_name: str = 'embedding', n_repeats: int = 5, random_state: int | None = None) → Dict[str, Any]

Run one or more feature interpretation analyses.

Parameters:
  • X (np.ndarray) – Original input data.

  • X_emb (np.ndarray, optional) – Explicit embedding used by correlation-based analysis.

  • model (Any, optional) – Fitted reducer or model used by importance analyses.

  • analyses (sequence of {"correlation", "perturbation", "gradient"}, optional) – Analyses to compute. None defaults to ("correlation",).

  • feature_names (sequence of str, optional) – Feature names aligned with X when the requested analysis returns feature-keyed outputs.

  • method_name (str, default="embedding") – Display name written into the returned analysis records.

  • n_repeats (int, default=5) – Number of permutations per feature for perturbation importance.

  • random_state (int, optional) – Random seed for perturbation importance.

Returns:

Dictionary with keys:

  • analysis: nested analysis payloads

  • records: tidy analysis records as list[dict]

Return type:

dict

Raises:

ValueError – If a requested analysis is unsupported, is missing required inputs, or lacks required feature names.

Notes

This function is a pure interpretation backend for manager, report, or visualization workflows. It does not fit models, compute embeddings, or mutate reducer state.
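
The relationship between the nested analysis payloads and the tidy records listed under Returns can be pictured with the sketch below. The record schema is not specified on this page, so the function name to_records and the keys "method", "analysis", "dimension", "feature", and "value" are hypothetical, as is the exact nesting of the example payload.

```python
def to_records(analysis, method_name="embedding"):
    """Hypothetical flattening of nested payloads into one row per value."""
    records = []
    for analysis_name, payload in analysis.items():
        for key, value in payload.items():
            if isinstance(value, dict):
                # Correlation-style payload: {dimension: {feature: score}}.
                for feature, score in value.items():
                    records.append(
                        {
                            "method": method_name,
                            "analysis": analysis_name,
                            "dimension": key,
                            "feature": feature,
                            "value": score,
                        }
                    )
            else:
                # Importance-style payload: {feature: score}, no dimension.
                records.append(
                    {
                        "method": method_name,
                        "analysis": analysis_name,
                        "dimension": None,
                        "feature": key,
                        "value": value,
                    }
                )
    return records


# Assumed payload shape, for illustration only.
analysis = {
    "correlation": {"Dimension 1": {"f1": 1.0, "f2": 0.0}},
    "perturbation": {"f1": 0.75, "f2": 0.25},
}
records = to_records(analysis, method_name="PCA")
```

A flat list of dictionaries like this loads directly into a table, which suits the manager and report workflows mentioned above.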

See also

correlate_features

Feature-to-dimension interpretation from explicit embeddings.

perturbation_importance

Model-agnostic importance based on shuffled features.

gradient_importance

Encoder saliency for supported torch-based reducers.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction.analysis import interpret_features
>>> class MockReducer:
...     def transform(self, X):
...         return X[:, :2]
>>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
>>> X_emb = X[:, :2]
>>> result = interpret_features(
...     X,
...     X_emb=X_emb,
...     model=MockReducer(),
...     analyses=["correlation", "perturbation"],
...     feature_names=["f1", "f2"],
...     n_repeats=1,
...     random_state=0,
... )
>>> sorted(result)
['analysis', 'records']