coco_pipe.dim_reduction.analysis
Pure attribution and interpretability utilities for dimensionality reduction.
This module is intentionally separate from the preservation-focused evaluation stack. The functions here answer a different question:
- evaluate_embedding(...) in coco_pipe.dim_reduction.evaluation asks whether an embedding preserves structure well.
- analysis.py asks which input features appear to drive an embedding.
The public surface is explicit and array-first:
- correlate_features(...) computes feature-to-dimension correlations.
- perturbation_importance(...) measures embedding sensitivity to shuffled features.
- gradient_importance(...) computes encoder saliency for supported torch-based reducers.
- interpret_features(...) is a pure backend that combines one or more of these analyses and returns normalized payloads plus tidy records for future manager/report integration.
Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)
Functions
| correlate_features | Compute Spearman correlations between original features and embedding axes. |
| perturbation_importance | Compute model-agnostic feature importance by feature shuffling. |
| gradient_importance | Compute encoder saliency by differentiating embedding magnitude w.r.t. input. |
| interpret_features | Run one or more feature interpretation analyses. |
Module Contents
- coco_pipe.dim_reduction.analysis.correlate_features(X_orig: numpy.ndarray, X_emb: numpy.ndarray, feature_names: Sequence[str]) → Dict[str, Dict[str, float]]
Compute Spearman correlations between original features and embedding axes.
- Parameters:
X_orig (np.ndarray) – Original data with shape (n_samples, n_features).
X_emb (np.ndarray) – Embedded data with shape (n_samples, n_dimensions).
feature_names (sequence of str) – Feature names aligned with the columns of X_orig.
- Returns:
Nested mapping of dimension names to feature-correlation mappings, sorted by descending absolute correlation magnitude within each dimension.
- Return type:
dict
- Raises:
ValueError – If X_orig or X_emb is not 2D, if sample counts do not match, or if feature_names has the wrong length.
Notes
Constant features or constant embedding dimensions can yield undefined Spearman coefficients. These are reported as 0.0 to keep the output stable and sortable.
See also
perturbation_importance – Model-agnostic feature importance by embedding perturbation.
gradient_importance – Encoder saliency for supported torch-based reducers.
interpret_features – Higher-level backend that packages correlation and importance outputs.
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction.analysis import correlate_features
>>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
>>> X_emb = np.array([[0.0, 0.5], [1.0, 0.0], [2.0, 0.5]])
>>> result = correlate_features(X, X_emb, feature_names=["f1", "f2"])
>>> sorted(result)
['Dimension 1', 'Dimension 2']
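The behavior described above can be approximated with a short NumPy-only sketch. This is not the library's implementation: the helper names `average_ranks` and `spearman_table` are hypothetical, and the sketch only assumes what the docstring states, namely Spearman correlation (Pearson on ranks), undefined coefficients reported as 0.0, and features sorted by descending absolute correlation within each dimension.

```python
import numpy as np


def average_ranks(values):
    """Rank a 1D array, giving tied values the mean of their ranks."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)       # highest rank inside each tie group
    first = last - counts + 1      # lowest rank inside each tie group
    return ((first + last) / 2.0)[inverse]


def spearman_table(X_orig, X_emb, feature_names):
    """Hypothetical sketch of a feature-to-dimension Spearman table."""
    X_orig = np.asarray(X_orig, dtype=float)
    X_emb = np.asarray(X_emb, dtype=float)

    # Spearman correlation is Pearson correlation computed on ranks.
    ranks_feat = np.column_stack([average_ranks(col) for col in X_orig.T])
    ranks_emb = np.column_stack([average_ranks(col) for col in X_emb.T])
    ranks_feat -= ranks_feat.mean(axis=0)
    ranks_emb -= ranks_emb.mean(axis=0)

    cov = ranks_feat.T @ ranks_emb
    denom = np.outer(
        np.linalg.norm(ranks_feat, axis=0),
        np.linalg.norm(ranks_emb, axis=0),
    )
    # Constant columns have zero rank variance; report 0.0 instead of NaN.
    rho = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)

    result = {}
    for j in range(X_emb.shape[1]):
        pairs = sorted(
            zip(feature_names, rho[:, j]),
            key=lambda item: abs(item[1]),
            reverse=True,
        )
        result[f"Dimension {j + 1}"] = {name: float(value) for name, value in pairs}
    return result


X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
X_emb = np.array([[0.0, 0.5], [1.0, 0.0], [2.0, 0.5]])
table = spearman_table(X, X_emb, ["f1", "f2"])
```

Because dictionaries preserve insertion order, iterating over `table["Dimension 2"]` visits the most strongly correlated feature first.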
- coco_pipe.dim_reduction.analysis.perturbation_importance(model: Any, X: numpy.ndarray, feature_names: Sequence[str], X_emb: numpy.ndarray, n_repeats: int = 5, random_state: int | None = None) → Dict[str, float]
Compute model-agnostic feature importance by feature shuffling.
- Parameters:
model (Any) – Fitted reducer or estimator exposing transform(X).
X (np.ndarray) – Input data with shape (n_samples, n_features).
feature_names (sequence of str) – Feature names aligned with the columns of X.
X_emb (np.ndarray) – Explicit embedding of X used as the perturbation reference.
n_repeats (int, default=5) – Number of independent shuffles per feature.
random_state (int, optional) – Random seed for reproducible shuffling.
- Returns:
Mapping of feature name to normalized importance score. Scores sum to 1 when the perturbation signal is nonzero; otherwise all scores are 0.
- Return type:
dict
- Raises:
ValueError – If X is not 2D, if X_emb does not align with X along the sample axis, or if feature_names has the wrong length.
See also
correlate_features – Cheap feature-to-dimension interpretation based on correlations.
gradient_importance – Encoder saliency for supported torch-based reducers.
interpret_features – Higher-level backend that packages correlation and importance outputs.
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction.analysis import perturbation_importance
>>> class MockReducer:
...     def transform(self, X):
...         return X[:, :2]
>>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
>>> X_emb = X[:, :2]
>>> scores = perturbation_importance(
...     MockReducer(),
...     X,
...     feature_names=["f1", "f2"],
...     X_emb=X_emb,
...     n_repeats=1,
...     random_state=0,
... )
>>> sorted(scores)
['f1', 'f2']
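For intuition, the shuffling procedure can be sketched as follows. This is a minimal illustration rather than the library's code: `shuffle_importance` and `FirstColumnReducer` are hypothetical names, and the displacement metric (mean squared difference from the reference embedding) is an assumption, since the documentation does not say how sensitivity is measured. The normalization follows the documented rule: scores sum to 1 when the signal is nonzero and are all 0 otherwise.

```python
import numpy as np


def shuffle_importance(model, X, feature_names, X_emb, n_repeats=5, random_state=None):
    """Hypothetical sketch: importance = embedding displacement after shuffling a column."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    raw = np.zeros(X.shape[1])

    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its relationship with the samples.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            # Assumed metric: mean squared displacement from the reference embedding.
            raw[j] += np.mean((model.transform(X_perm) - X_emb) ** 2)
    raw /= n_repeats

    # Normalize to sum to 1, or return all zeros when nothing moved.
    total = raw.sum()
    scores = raw / total if total > 0 else np.zeros_like(raw)
    return dict(zip(feature_names, scores.tolist()))


class FirstColumnReducer:
    """Toy reducer whose embedding depends only on the first feature."""

    def transform(self, X):
        return X[:, :1]


X_demo = np.arange(60, dtype=float).reshape(20, 3)
reducer = FirstColumnReducer()
demo_scores = shuffle_importance(
    reducer, X_demo, ["f1", "f2", "f3"], reducer.transform(X_demo),
    n_repeats=3, random_state=0,
)
```

In this toy setup only `f1` affects the embedding, so shuffling `f2` or `f3` leaves the embedding unchanged and their scores stay at zero.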
- coco_pipe.dim_reduction.analysis.gradient_importance(wrapper: Any, X: numpy.ndarray, feature_names: Sequence[str] | None = None) → Dict[str, Any]
Compute encoder saliency by differentiating embedding magnitude w.r.t. input.
- Parameters:
wrapper (Any) – Fitted encoder-based reducer wrapper exposing get_pytorch_module().
X (np.ndarray) – Input array. The sample axis is assumed to be axis 0. Remaining axes are treated as feature dimensions.
feature_names (sequence of str, optional) – Feature names for 2D inputs. Named outputs are only supported when the reduced saliency is one-dimensional.
- Returns:
For one-dimensional reduced saliency with names, returns a mapping of feature name to normalized importance score. For higher-dimensional saliency, returns {"importance_matrix": scores}.
- Return type:
dict
- Raises:
ValueError – If X has fewer than 2 dimensions, or if feature_names is incompatible with the reduced saliency shape.
Notes
This function assumes an encoder-based torch wrapper that exposes get_pytorch_module() and an encoder submodule.
See also
perturbation_importance – Model-agnostic importance that only requires transform.
correlate_features – Cheap feature-to-dimension interpretation from explicit embeddings.
interpret_features – Higher-level backend that packages gradient and perturbation outputs.
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction.analysis import gradient_importance
>>> class Encoder:
...     def __call__(self, X):
...         return X
>>> class MockModule:
...     def __init__(self):
...         self.encoder = Encoder()
...     def eval(self):
...         return None
...     def parameters(self):
...         return iter(())
>>> class MockWrapper:
...     def get_pytorch_module(self):
...         return MockModule()
>>> X = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> result = gradient_importance(MockWrapper(), X)
>>> isinstance(result, dict)
True
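The idea of differentiating embedding magnitude with respect to the input can be illustrated without torch. The sketch below swaps autograd for central finite differences in NumPy, so it is a stand-in for the technique and not the library's implementation. The name `finite_difference_saliency`, the `eps` parameter, the choice of summed Euclidean norm as the magnitude, and the reduction by mean absolute gradient over samples are all assumptions. Returning the matrix form when no names are given is also an assumption.

```python
import numpy as np


def finite_difference_saliency(encode, X, feature_names=None, eps=1e-5):
    """Hypothetical sketch: saliency of embedding magnitude via central differences."""
    X = np.asarray(X, dtype=float)

    def magnitude(batch):
        # Assumed scalar objective: sum of per-sample Euclidean embedding norms.
        Z = np.asarray(encode(batch)).reshape(len(batch), -1)
        return np.linalg.norm(Z, axis=1).sum()

    # Approximate d(magnitude)/d(input) one input entry at a time.
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        up, down = X.copy(), X.copy()
        up[idx] += eps
        down[idx] -= eps
        grad[idx] = (magnitude(up) - magnitude(down)) / (2 * eps)

    # Assumed reduction: mean absolute gradient over the sample axis (axis 0).
    saliency = np.abs(grad).mean(axis=0)
    total = saliency.sum()
    scores = saliency / total if total > 0 else np.zeros_like(saliency)

    if scores.ndim == 1 and feature_names is not None:
        return dict(zip(feature_names, scores.tolist()))
    return {"importance_matrix": scores}


# Toy linear encoder: feature 1 is weighted heavily, feature 3 is ignored.
W = np.array([[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
X_demo = np.array([[1.0, 1.0, 1.0], [2.0, 1.0, 0.5]])
named = finite_difference_saliency(lambda batch: batch @ W, X_demo, ["f1", "f2", "f3"])
```

Finite differences cost two encoder evaluations per input entry, which is why a real implementation would rely on automatic differentiation instead.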
- coco_pipe.dim_reduction.analysis.interpret_features(X: numpy.ndarray, *, X_emb: numpy.ndarray | None = None, model: Any | None = None, analyses: Sequence[str] | None = None, feature_names: Sequence[str] | None = None, method_name: str = 'embedding', n_repeats: int = 5, random_state: int | None = None) → Dict[str, Any]
Run one or more feature interpretation analyses.
- Parameters:
X (np.ndarray) – Original input data.
X_emb (np.ndarray, optional) – Explicit embedding used by correlation-based analysis.
model (Any, optional) – Fitted reducer or model used by importance analyses.
analyses (sequence of {"correlation", "perturbation", "gradient"}, optional) – Analyses to compute. None defaults to ("correlation",).
feature_names (sequence of str, optional) – Feature names aligned with X when the requested analysis returns feature-keyed outputs.
method_name (str, default="embedding") – Display name written into the returned analysis records.
n_repeats (int, default=5) – Number of permutations per feature for perturbation importance.
random_state (int, optional) – Random seed for perturbation importance.
- Returns:
Dictionary with keys:
- analysis: nested analysis payloads
- records: tidy analysis records as list[dict]
- Return type:
dict
- Raises:
ValueError – If a requested analysis is unsupported, missing required inputs, or lacks required feature names.
Notes
This function is a pure interpretation backend for manager, report, or visualization workflows. It does not fit models, compute embeddings, or mutate reducer state.
See also
correlate_features – Feature-to-dimension interpretation from explicit embeddings.
perturbation_importance – Model-agnostic importance based on shuffled features.
gradient_importance – Encoder saliency for supported torch-based reducers.
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction.analysis import interpret_features
>>> class MockReducer:
...     def transform(self, X):
...         return X[:, :2]
>>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
>>> X_emb = X[:, :2]
>>> result = interpret_features(
...     X,
...     X_emb=X_emb,
...     model=MockReducer(),
...     analyses=["correlation", "perturbation"],
...     feature_names=["f1", "f2"],
...     n_repeats=1,
...     random_state=0,
... )
>>> sorted(result)
['analysis', 'records']
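The relationship between a nested payload and tidy records can be pictured with a small sketch. The actual record schema is not documented on this page, so every field name below (`method`, `analysis`, `dimension`, `feature`, `value`) and the helper `correlation_records` are hypothetical. The sketch only shows one plausible way to flatten a dimension-to-feature mapping into a list[dict] with one row per observation.

```python
def correlation_records(correlations, method_name="embedding"):
    """Hypothetical sketch: flatten a nested correlation payload into tidy rows."""
    records = []
    for dimension, by_feature in correlations.items():
        for feature, value in by_feature.items():
            records.append(
                {
                    "method": method_name,      # assumed field name
                    "analysis": "correlation",  # assumed field name
                    "dimension": dimension,
                    "feature": feature,
                    "value": float(value),
                }
            )
    return records


payload = {
    "Dimension 1": {"f1": 1.0, "f2": 0.0},
    "Dimension 2": {"f2": 1.0, "f1": 0.0},
}
rows = correlation_records(payload, method_name="demo")
```

A flat list of this kind can be passed directly to a table constructor such as `pandas.DataFrame(rows)` for reporting or plotting.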