coco_pipe.dim_reduction

Submodules

Attributes

Classes

DimReduction

Manage one dimensionality reduction workflow.

BaseReducer

Abstract base class for all dimensionality reduction implementations.

IncrementalPCAReducer

Incremental PCA reducer.

IsomapReducer

Isometric Mapping reducer.

LLEReducer

Locally Linear Embedding reducer.

MDSReducer

Multidimensional Scaling reducer.

PCAReducer

Principal Component Analysis reducer.

SpectralEmbeddingReducer

Spectral Embedding reducer.

TSNEReducer

t-SNE reducer.

Functions

interpret_features(→ Dict[str, Any])

Run one or more feature interpretation analyses.

trajectory_curvature(→ numpy.ndarray)

Calculate geometric curvature of a trajectory.

trajectory_separation(→ Dict[Tuple[str, str], ...)

Calculate time-resolved separation between labeled trajectory groups.

trajectory_speed(→ numpy.ndarray)

Calculate instantaneous trajectory speed.

continuity(→ float)

Compute continuity from a co-ranking matrix.

lcmc(→ float)

Compute the local continuity meta-criterion (LCMC).

shepard_diagram_data(→ Tuple[numpy.ndarray, numpy.ndarray])

Compute sampled pairwise distances for a Shepard diagram.

trustworthiness(→ float)

Compute trustworthiness from a co-ranking matrix.

Package Contents

coco_pipe.dim_reduction.interpret_features(X: numpy.ndarray, *, X_emb: numpy.ndarray | None = None, model: Any | None = None, analyses: Sequence[str] | None = None, feature_names: Sequence[str] | None = None, method_name: str = 'embedding', n_repeats: int = 5, random_state: int | None = None) → Dict[str, Any]

Run one or more feature interpretation analyses.

Parameters:
  • X (np.ndarray) – Original input data.

  • X_emb (np.ndarray, optional) – Explicit embedding used by correlation-based analysis.

  • model (Any, optional) – Fitted reducer or model used by importance analyses.

  • analyses (sequence of {"correlation", "perturbation", "gradient"}, optional) – Analyses to compute. None defaults to ("correlation",).

  • feature_names (sequence of str, optional) – Feature names aligned with the columns of X, required when a requested analysis returns feature-keyed outputs.

  • method_name (str, default="embedding") – Display name written into the returned analysis records.

  • n_repeats (int, default=5) – Number of permutations per feature for perturbation importance.

  • random_state (int, optional) – Random seed for perturbation importance.

Returns:

Dictionary with keys:

  • analysis: nested analysis payloads

  • records: tidy analysis records as list[dict]

Return type:

dict

Raises:

ValueError – If a requested analysis is unsupported, missing required inputs, or lacks required feature names.

Notes

This function is a pure interpretation backend for manager, report, or visualization workflows. It does not fit models, compute embeddings, or mutate reducer state.

See also

correlate_features

Feature-to-dimension interpretation from explicit embeddings.

perturbation_importance

Model-agnostic importance based on shuffled features.

gradient_importance

Encoder saliency for supported torch-based reducers.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import interpret_features
>>> class MockReducer:
...     def transform(self, X):
...         return X[:, :2]
>>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
>>> X_emb = X[:, :2]
>>> result = interpret_features(
...     X,
...     X_emb=X_emb,
...     model=MockReducer(),
...     analyses=["correlation", "perturbation"],
...     feature_names=["f1", "f2"],
...     n_repeats=1,
...     random_state=0,
... )
>>> sorted(result)
['analysis', 'records']
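The "correlation" analysis relates input features to embedding axes. As a rough illustration only, the sketch below assumes it amounts to a Pearson correlation between each column of X and each column of X_emb; the helper correlation_table_sketch and its record keys ("feature", "dimension", "correlation") are hypothetical and do not describe the payload actually returned by interpret_features.

```python
import numpy as np


def correlation_table_sketch(X, X_emb, feature_names):
    """Pearson correlation of every feature with every embedding dimension (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    X_emb = np.asarray(X_emb, dtype=float)
    if X.shape[0] != X_emb.shape[0]:
        raise ValueError("X and X_emb must have the same number of samples.")
    if len(feature_names) != X.shape[1]:
        raise ValueError("feature_names must match the number of columns of X.")

    # Center each column so the matrix product below is a (scaled) covariance.
    Xc = X - X.mean(axis=0)
    Ec = X_emb - X_emb.mean(axis=0)
    # Outer product of column norms turns covariance into correlation.
    denom = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Ec, axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        # Assumption: constant columns get correlation 0 instead of NaN.
        corr = np.where(denom > 0, (Xc.T @ Ec) / denom, 0.0)

    # One tidy record per (feature, dimension) pair.
    records = [
        {"feature": name, "dimension": j, "correlation": float(corr[i, j])}
        for i, name in enumerate(feature_names)
        for j in range(corr.shape[1])
    ]
    return corr, records
```

With the example data above and X_emb equal to X, the table is the 2 x 2 identity matrix, because the two features happen to be uncorrelated.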
coco_pipe.dim_reduction.METHODS = ('PCA', 'IncrementalPCA', 'DaskPCA', 'DaskTruncatedSVD', 'Isomap', 'LLE', 'MDS',...
class coco_pipe.dim_reduction.DimReduction(method: str | coco_pipe.dim_reduction.config.BaseReducerConfig, n_components: int = 2, params: Dict[str, Any] | None = None, **kwargs)

Manage one dimensionality reduction workflow.

Parameters:
  • method (str or BaseReducerConfig) – Canonical public reducer name or a typed configuration object. Method names are exact and must match the registry, for example "PCA", "Isomap", "Pacmap", or "TopologicalAE".

  • n_components (int, default=2) – Target dimensionality when method is a string.

  • params (dict, optional) – Additional reducer keyword arguments merged into the constructor arguments when method is a string.

  • **kwargs (dict) – Runtime reducer keyword overrides. These are merged after params, so they take precedence over params on conflicting keys.

method

Canonical reducer name.

Type:

str

n_components

Target dimensionality used for the reducer instance.

Type:

int

reducer

Instantiated reducer backend.

Type:

BaseReducer

metrics_

Cached scalar evaluation summaries from the latest score() call.

Type:

dict

quality_metadata_

Cached scalar reducer metadata exposed through the reducer contract.

Type:

dict

diagnostics_

Cached non-scalar diagnostic artifacts exposed through the reducer contract or the evaluation layer.

Type:

dict

metric_records_

Cached tidy metric observations produced by the evaluator.

Type:

list of dict

interpretation_

Cached feature interpretation payloads from the latest interpret() call.

Type:

dict

interpretation_records_

Cached tidy feature-interpretation observations.

Type:

list of dict

See also

coco_pipe.dim_reduction.analysis.interpret_features

Pure interpretation backend used by interpret().

coco_pipe.dim_reduction.evaluation.core.evaluate_embedding

Pure evaluator used by score().

coco_pipe.dim_reduction.evaluation.core.MethodSelector

Post-hoc comparison and ranking over already-scored reducers.

coco_pipe.viz.dim_reduction

Plotting utilities for embeddings, metrics, and diagnostics.

Examples

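>>> import numpy as np
>>> from coco_pipe.dim_reduction import DimReduction
>>> X = np.random.RandomState(0).rand(200, 10)
>>> feature_names = [f"f{i}" for i in range(X.shape[1])]
>>> reducer = DimReduction("UMAP", n_components=2, n_neighbors=15)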
>>> embedding = reducer.fit_transform(X)
>>> scores = reducer.score(embedding, X=X)
>>> "trustworthiness" in scores["metrics"]
True
>>> interpretation = reducer.interpret(
...     X,
...     X_emb=embedding,
...     analyses=["correlation"],
...     feature_names=feature_names,
... )
>>> "correlation" in interpretation["analysis"]
True
reducer_kwargs
reducer: coco_pipe.dim_reduction.reducers.base.BaseReducer
metrics_: Dict[str, Any]
quality_metadata_: Dict[str, Any]
diagnostics_: Dict[str, Any]
metric_records_: List[Dict[str, Any]] = []
interpretation_: Dict[str, Any]
interpretation_records_: List[Dict[str, Any]] = []
property random_state: int | None

Return the random seed stored in the reducer parameters, or None if no seed is set.

property capabilities: Dict[str, Any]

Return reducer capability metadata through the manager interface.

_reset_cached_outputs() → None

Clear cached evaluation outputs.

_validate_input(X: Any) → numpy.ndarray

Validate reducer input shape and coerce to a NumPy array.

Parameters:

X (array-like or MNE object) – Input data accepted by the reducer. Objects exposing get_data() are unwrapped before validation.

Returns:

X – Validated reducer input.

Return type:

np.ndarray

Raises:

ValueError – If the input dimensionality does not match the reducer contract.

fit(X: Any, y: Any | None = None) → DimReduction

Fit the reducer on the provided data.

Parameters:
  • X (array-like or MNE object) – Input data in the reducer’s native layout.

  • y (array-like, optional) – Optional supervision forwarded to the reducer.

Returns:

self – The manager instance wrapping the fitted reducer.

Return type:

DimReduction

transform(X: Any) → numpy.ndarray

Transform new data with a fitted reducer.

Parameters:

X (array-like or MNE object) – Input data in the reducer’s native layout.

Returns:

X_emb – Reduced representation returned by the reducer.

Return type:

np.ndarray

fit_transform(X: Any, y: Any | None = None) → numpy.ndarray

Fit the reducer and return the reduced representation.

Parameters:
  • X (array-like or MNE object) – Input data in the reducer’s native layout.

  • y (array-like, optional) – Optional supervision forwarded to the reducer.

Returns:

X_emb – Reduced representation returned by the reducer.

Return type:

np.ndarray

get_components() → numpy.ndarray

Return reducer-defined component-like outputs.

Returns:

components – Component-like array exposed by the reducer.

Return type:

np.ndarray

Raises:

ValueError – If the reducer does not expose public components.

score(X_emb: numpy.ndarray, X: Any = None, n_neighbors: int = 5, metrics: List[str] | None = None, k_values: List[int] | None = None, labels: numpy.ndarray | None = None, groups: numpy.ndarray | None = None, times: numpy.ndarray | None = None, separation_method: str = 'centroid') → Dict[str, Dict[str, Any]]

Evaluate an explicit embedding against the original data.

Parameters:
  • X_emb (array-like) – Embedded data to evaluate.

  • X (array-like, optional) – Original high-dimensional data in evaluation-ready layout. This is required for standard 2D metrics and optional for native 3D trajectory metrics.

  • n_neighbors (int, default=5) – Neighborhood size k used by neighbor-based metrics.

  • metrics (list of str, optional) – Metric selectors to compute. None evaluates all metric families available for the embedding shape.

  • k_values (list of int, optional) – Neighborhood sizes used for multi-scale standard metric evaluation.

  • labels (np.ndarray, optional) – Optional labels aligned with the embedding. Used for trajectory separation when X_emb is 3D and for explicit supervised 2D metrics when requested.

  • groups (np.ndarray, optional) – Optional grouping variable aligned with the embedding. Required by grouped supervised evaluation metrics such as separation_logreg_balanced_accuracy.

  • times (np.ndarray, optional) – Optional trajectory time coordinates aligned with the trajectory length axis.

  • separation_method (str, default="centroid") – Separation definition passed to trajectory evaluation when labels are available for native 3D trajectory embeddings.

Returns:

scores – Dictionary with keys "metrics", "metadata", and "diagnostics".

Return type:

dict

Notes

score() does not infer or cache embeddings. Callers must pass X_emb explicitly. X is only required when the requested evaluation path needs the original high-dimensional samples.

interpret(X: numpy.ndarray, *, X_emb: numpy.ndarray, analyses: List[str] | None = None, feature_names: List[str] | None = None, n_repeats: int = 5, random_state: int | None = None) → Dict[str, Any]

Run feature interpretation analyses for an explicit embedding.

Parameters:
  • X (np.ndarray) – Original input data.

  • X_emb (np.ndarray) – Explicit embedding aligned with X.

  • analyses (list of {"correlation", "perturbation", "gradient"}, optional) – Interpretation analyses to compute. None defaults to ["correlation"].

  • feature_names (list of str, optional) – Feature names aligned with the columns of X when the requested interpretation returns feature-keyed outputs.

  • n_repeats (int, default=5) – Number of shuffles per feature for perturbation importance.

  • random_state (int, optional) – Random seed for perturbation importance.

Returns:

Dictionary with keys "analysis" and "records".

Return type:

dict

Notes

interpret() does not fit the reducer or compute embeddings. Callers must pass both X and X_emb explicitly.

See also

coco_pipe.dim_reduction.analysis.interpret_features

Pure interpretation backend used by this manager method.

score

Evaluate structure-preservation metrics for an explicit embedding.

Examples

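>>> import numpy as np
>>> from coco_pipe.dim_reduction import DimReduction
>>> X = np.random.RandomState(0).rand(50, 4)
>>> feature_names = [f"f{i}" for i in range(X.shape[1])]
>>> reducer = DimReduction("PCA", n_components=2)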
>>> embedding = reducer.fit_transform(X)
>>> result = reducer.interpret(
...     X,
...     X_emb=embedding,
...     analyses=["correlation"],
...     feature_names=feature_names,
... )
>>> sorted(result)
['analysis', 'records']
get_diagnostics() → Dict[str, Any]

Return cached diagnostics merged with reducer diagnostics.

Returns:

diagnostics – Diagnostic artifacts declared by the reducer contract and the evaluation layer.

Return type:

dict

get_quality_metadata() → Dict[str, Any]

Return cached scalar metadata merged with reducer metadata.

Returns:

metadata – Scalar metadata declared by the reducer contract and the evaluation layer.

Return type:

dict

get_metrics() → Dict[str, Any]

Return cached scalar metrics from the latest score() call.

get_summary() → Dict[str, Any]

Return a normalized summary payload for report and export paths.

Returns:

Plain dictionary containing the method identity, cached scalar summaries, reducer metadata, diagnostics, tidy metric records, capability flags, and cached feature interpretation payloads.

Return type:

dict

Notes

The summary does not include an embedding payload. Embeddings are handled explicitly outside the manager and must be passed directly to plotting or reporting utilities that need them.

save(path: str | pathlib.Path)

Save the underlying reducer to disk.

Parameters:

path (str or Path) – Output path for reducer persistence.

Notes

Only the reducer model is persisted. Cached manager state such as metrics and diagnostics is not included.

classmethod load(path: str | pathlib.Path, method: str) → DimReduction

Load a persisted reducer and wrap it in a fresh manager.

Parameters:
  • path (str or Path) – Path to a serialized reducer saved with save().

  • method (str) – Canonical public reducer name used to reconstruct the manager.

Returns:

Fresh manager wrapping the loaded reducer model.

Return type:

DimReduction

Notes

This restores the reducer model only. Cached manager state such as scores, diagnostics, and metric records is not persisted.

coco_pipe.dim_reduction.trajectory_curvature(traj: numpy.ndarray) → numpy.ndarray

Calculate geometric curvature of a trajectory.

Parameters:

traj (np.ndarray of shape (..., n_times, n_dims)) – Trajectory array. The second-to-last axis is interpreted as time and the last axis as coordinates.

Returns:

Curvature timecourse aligned with the input time axis.

Return type:

np.ndarray of shape (…, n_times)

Raises:

ValueError – If traj has fewer than two dimensions or fewer than two time points.

Notes

For vector-valued trajectories, curvature is computed from the first and second time derivatives, v and a, using the generalized formula

kappa = sqrt(||v||^2 ||a||^2 - (v . a)^2) / ||v||^3.

The implementation assumes uniformly spaced samples.
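The formula can be evaluated with finite differences. The sketch below is one possible implementation, assuming central differences from np.gradient with unit sample spacing; because curvature does not change under a uniform rescaling of time, no dt is needed. The name curvature_sketch is hypothetical, and the library may use a different difference scheme or edge handling.

```python
import numpy as np


def curvature_sketch(traj: np.ndarray) -> np.ndarray:
    """Curvature along the second-to-last axis of traj (hypothetical helper)."""
    traj = np.asarray(traj, dtype=float)
    if traj.ndim < 2 or traj.shape[-2] < 2:
        raise ValueError("traj needs at least two dimensions and two time points.")

    # Velocity and acceleration by central differences (one-sided at the ends).
    v = np.gradient(traj, axis=-2)
    a = np.gradient(v, axis=-2)

    v2 = np.sum(v * v, axis=-1)  # ||v||^2
    a2 = np.sum(a * a, axis=-1)  # ||a||^2
    va = np.sum(v * a, axis=-1)  # v . a

    # Clip tiny negative values caused by floating-point cancellation.
    numerator = np.sqrt(np.clip(v2 * a2 - va**2, 0.0, None))
    denominator = v2**1.5  # ||v||^3
    with np.errstate(divide="ignore", invalid="ignore"):
        # Assumption: report zero curvature where the trajectory is stationary.
        return np.where(denominator > 0, numerator / denominator, 0.0)
```

For a unit circle sampled uniformly, interior samples evaluate to 1 (the reciprocal of the radius); the first and last two samples rely on one-sided differences and may deviate.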

See also

trajectory_turning_angle

Discrete local directional change.

trajectory_tortuosity

Path inefficiency relative to net displacement.

trajectory_speed

First-order trajectory dynamics.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import trajectory_curvature
>>> t = np.linspace(0, 2 * np.pi, 100)
>>> traj = np.stack([np.cos(t), np.sin(t)], axis=1)
>>> k = trajectory_curvature(traj)
>>> k.shape
(100,)
coco_pipe.dim_reduction.trajectory_separation(traj: numpy.ndarray, labels: numpy.ndarray, method: str = 'centroid', **kwargs) → Dict[Tuple[str, str], numpy.ndarray]

Calculate time-resolved separation between labeled trajectory groups.

Parameters:
  • traj (np.ndarray of shape (n_trials, n_times, n_dims)) – Trajectory tensor containing one trajectory per trial.

  • labels (np.ndarray of shape (n_trials,)) – Class label for each trial.

  • method ({"centroid", "within_between_ratio", "mahalanobis", "distributional", "margin"}, default="centroid") – Separation definition to compute.

  • **kwargs (dict) – Additional keyword arguments forwarded to the selected separation method.

Returns:

Mapping from label pairs to separation timecourses of shape (n_times,).

Return type:

dict[tuple[str, str], np.ndarray]

Raises:

ValueError – If the inputs are invalid or if an unsupported separation method is requested.

Notes

This is the high-level separation entrypoint for trajectory-group comparison. It dispatches to the more specific separation primitives in this module.

Supported methods:

  • "centroid": Euclidean distance between label centroids.

  • "within_between_ratio": Between-centroid distance normalized by within-group dispersion.

  • "mahalanobis": Covariance-aware centroid separation.

  • "distributional": Energy-distance separation between trial clouds.

  • "margin": Nearest-cross minus nearest-within margin separation.
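The "centroid" definition can be written in a few lines of NumPy. The sketch below covers only that method; the helper name is hypothetical, and ordering label pairs by sorted string labels is an assumption that happens to agree with the ('A', 'B') key in the example.

```python
import itertools

import numpy as np


def centroid_separation_sketch(traj, labels):
    """Euclidean distance between label centroids at every time point (hypothetical helper)."""
    traj = np.asarray(traj, dtype=float)
    labels = np.asarray(labels)
    if traj.ndim != 3 or labels.shape != (traj.shape[0],):
        raise ValueError("Expected traj of shape (n_trials, n_times, n_dims) and one label per trial.")
    classes = np.unique(labels)
    if classes.size < 2:
        raise ValueError("At least two label groups are required.")

    # Average over trials within each group: one (n_times, n_dims) centroid per label.
    centroids = {str(c): traj[labels == c].mean(axis=0) for c in classes}
    separation = {}
    for a, b in itertools.combinations(sorted(centroids), 2):
        # Norm over the coordinate axis leaves an (n_times,) timecourse.
        separation[(a, b)] = np.linalg.norm(centroids[a] - centroids[b], axis=-1)
    return separation
```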

See also

trajectory_dispersion

Within-group spread used by some separation methods.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import trajectory_separation
>>> traj = np.zeros((4, 5, 2))
>>> labels = np.array(["A", "A", "B", "B"])
>>> sep = trajectory_separation(traj, labels, method="centroid")
>>> list(sep.keys())
[('A', 'B')]
coco_pipe.dim_reduction.trajectory_speed(traj: numpy.ndarray, dt: float = 1.0) → numpy.ndarray

Calculate instantaneous trajectory speed.

Parameters:
  • traj (np.ndarray of shape (..., n_times, n_dims)) – Trajectory array. The second-to-last axis is interpreted as time and the last axis as coordinates.

  • dt (float, default=1.0) – Uniform time step between consecutive samples.

Returns:

Instantaneous speed timecourse. The final value is padded with the last computed speed so that the output length matches the number of time points.

Return type:

np.ndarray of shape (…, n_times)

Raises:

ValueError – If traj has fewer than two dimensions, contains fewer than two time points, or if dt <= 0.

Notes

This function computes the norm of the first difference along the time axis, divided by dt.
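The computation above, including the padding of the final sample described under Returns, can be sketched as follows. speed_sketch is a hypothetical name; this illustrates the described behavior rather than reproducing the library's code.

```python
import numpy as np


def speed_sketch(traj, dt=1.0):
    """Instantaneous speed along the second-to-last axis (hypothetical helper)."""
    traj = np.asarray(traj, dtype=float)
    if traj.ndim < 2 or traj.shape[-2] < 2:
        raise ValueError("traj needs at least two dimensions and two time points.")
    if dt <= 0:
        raise ValueError("dt must be positive.")

    # Norm of the first difference over the coordinate axis, scaled by dt.
    step = np.linalg.norm(np.diff(traj, axis=-2), axis=-1) / dt
    # Repeat the last speed so the output has n_times samples.
    return np.concatenate([step, step[..., -1:]], axis=-1)
```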

See also

trajectory_acceleration

Second-order trajectory dynamics.

trajectory_path_length

Total or cumulative traveled distance.

trajectory_displacement

Distance from the initial state across time.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import trajectory_speed
>>> traj = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
>>> trajectory_speed(traj)
array([1., 1., 1.])
coco_pipe.dim_reduction.continuity(Q: numpy.ndarray, k: int) → float

Compute continuity from a co-ranking matrix.

Continuity penalizes extrusions, i.e. points that are among the k nearest neighbors in the original space but are pushed farther away in the embedding.

Parameters:
  • Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.

  • k (int) – Neighborhood size. The normalization used by continuity requires 2 * n_samples - 3 * k - 1 > 0.

Returns:

Continuity score in [0, 1]. Higher is better.

Return type:

float

Raises:

ValueError – If Q is invalid or if k falls outside the valid domain.

See also

trustworthiness

Complementary intrusion-based metric.

compute_coranking_matrix

Construct the required co-ranking matrix.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import continuity
>>> Q = np.diag([1, 1, 1, 1])
>>> continuity(Q, k=1)
1.0
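Continuity needs a co-ranking matrix. The sketch below builds one by brute force and evaluates continuity from it, assuming the Lee and Verleysen convention in which rows index neighbor ranks in the original space and columns index ranks in the embedding; compute_coranking_matrix may orient Q differently, and all helper names here are hypothetical. Each extrusion at embedding rank s > k is penalized by s - k, and the total is scaled by 2 / (n k (2n - 3k - 1)), the reciprocal of the largest possible penalty, which is where the domain condition on k comes from. Pairwise distances use O(n_samples^2) memory, so this only suits small inputs.

```python
import numpy as np


def rank_matrix(X):
    """R[i, j] is the rank (1 = nearest) of point j among the neighbors of point i."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    d = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    np.fill_diagonal(d, np.inf)  # push each point to the end of its own neighbor list
    order = np.argsort(d, axis=1, kind="stable")
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)[None, :]
    return ranks  # the diagonal holds rank n and is ignored below


def coranking_sketch(X, X_emb):
    """Q[r - 1, s - 1] counts pairs with original-space rank r and embedding rank s."""
    n = np.asarray(X).shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    r_high = rank_matrix(X)[off_diag]
    r_low = rank_matrix(X_emb)[off_diag]
    Q = np.zeros((n - 1, n - 1), dtype=int)
    np.add.at(Q, (r_high - 1, r_low - 1), 1)
    return Q


def continuity_sketch(Q, k):
    """Continuity from Q with rows = original-space rank, columns = embedding rank."""
    Q = np.asarray(Q, dtype=float)
    if Q.ndim != 2 or Q.shape[0] != Q.shape[1]:
        raise ValueError("Q must be a square co-ranking matrix.")
    n = Q.shape[0] + 1
    if k < 1 or 2 * n - 3 * k - 1 <= 0:
        raise ValueError("k is outside the valid domain.")
    high = np.arange(1, n)[:, None]  # row index -> rank in the original space
    low = np.arange(1, n)[None, :]   # column index -> rank in the embedding
    # Extrusions: within k in the original space but beyond k in the embedding.
    extrusions = (high <= k) & (low > k)
    penalty = np.sum((low - k) * Q * extrusions)
    return float(1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty)
```

An identity embedding yields a diagonal Q with n_samples on every diagonal entry, and therefore a continuity of 1.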
coco_pipe.dim_reduction.lcmc(Q: numpy.ndarray, k: int) → float

Compute the local continuity meta-criterion (LCMC).

Parameters:
  • Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.

  • k (int) – Neighborhood size.

Returns:

LCMC score. Higher is better.

Return type:

float

Raises:

ValueError – If Q is invalid or if k falls outside the valid domain.

See also

trustworthiness

Neighbor-preservation metric.

continuity

Neighbor-consistency metric.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import lcmc
>>> Q = np.diag([1, 1, 1, 1])
>>> isinstance(lcmc(Q, k=1), float)
True
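LCMC compares how much of each k-neighborhood survives the embedding with the overlap a random embedding would produce. A minimal sketch follows; it reads only the top-left k x k block of Q, so it does not depend on the row/column orientation, but the validity range 1 <= k < n_samples - 1 and the helper name are assumptions.

```python
import numpy as np


def lcmc_sketch(Q, k):
    """Local continuity meta-criterion from a co-ranking matrix (hypothetical helper)."""
    Q = np.asarray(Q, dtype=float)
    if Q.ndim != 2 or Q.shape[0] != Q.shape[1]:
        raise ValueError("Q must be a square co-ranking matrix.")
    n = Q.shape[0] + 1
    if not 1 <= k < n - 1:
        raise ValueError("k must satisfy 1 <= k < n_samples - 1.")
    # Average fraction of preserved k-neighbors, minus the k / (n - 1)
    # fraction expected from a random embedding.
    return float(Q[:k, :k].sum() / (n * k) - k / (n - 1))
```

For a perfect embedding of five samples (Q equal to 5 times the 4 x 4 identity), this gives 0.75 at k=1 and 0.5 at k=2, showing how the random baseline grows with k.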
coco_pipe.dim_reduction.shepard_diagram_data(X: numpy.ndarray, X_embedded: numpy.ndarray, sample_size: int = 1000, random_state: int | None = None) → Tuple[numpy.ndarray, numpy.ndarray]

Compute sampled pairwise distances for a Shepard diagram.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) – Original high-dimensional data.

  • X_embedded (np.ndarray of shape (n_samples, n_components)) – Low-dimensional embedding of the same samples.

  • sample_size (int, default=1000) – Number of samples to keep before computing pairwise distances. If sample_size is at least n_samples, all samples are used.

  • random_state (int, optional) – Random seed used when subsampling.

Returns:

Pairwise distances in the original and embedded spaces.

Return type:

tuple[np.ndarray, np.ndarray]

Raises:

ValueError – If the inputs are invalid or if sample_size <= 1.

See also

compute_coranking_matrix

Rank-based global quality summary.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import shepard_diagram_data
>>> X = np.random.RandomState(0).rand(10, 3)
>>> X_emb = X[:, :2]
>>> d_orig, d_emb = shepard_diagram_data(X, X_emb, sample_size=5, random_state=0)
>>> len(d_orig) == len(d_emb)
True
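A Shepard diagram plots embedded distances against original distances for the same pairs of samples. The sketch below assumes the two returned arrays are condensed upper-triangle distance vectors over the subsampled points; shepard_sketch is a hypothetical name, and the library's sampling RNG and output layout may differ.

```python
import numpy as np


def shepard_sketch(X, X_embedded, sample_size=1000, random_state=None):
    """Paired original/embedded distances for a Shepard diagram (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    X_embedded = np.asarray(X_embedded, dtype=float)
    if X.ndim != 2 or X_embedded.ndim != 2 or X.shape[0] != X_embedded.shape[0]:
        raise ValueError("X and X_embedded must be 2D with the same number of rows.")
    if sample_size <= 1:
        raise ValueError("sample_size must be greater than 1.")

    n = X.shape[0]
    if sample_size < n:
        # Subsample rows once and apply the same indices to both spaces.
        rng = np.random.default_rng(random_state)
        idx = rng.choice(n, size=sample_size, replace=False)
        X, X_embedded = X[idx], X_embedded[idx]

    # Upper-triangle pairs (i < j), so each pair appears exactly once.
    iu = np.triu_indices(X.shape[0], k=1)

    def condensed(A):
        diff = A[:, None, :] - A[None, :, :]
        return np.sqrt(np.sum(diff**2, axis=-1))[iu]

    return condensed(X), condensed(X_embedded)
```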
coco_pipe.dim_reduction.trustworthiness(Q: numpy.ndarray, k: int) → float

Compute trustworthiness from a co-ranking matrix.

Trustworthiness penalizes intrusions, i.e. points that appear among the k nearest neighbors in the embedding but were farther away in the original space.

Parameters:
  • Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.

  • k (int) – Neighborhood size. The normalization used by trustworthiness requires 2 * n_samples - 3 * k - 1 > 0.

Returns:

Trustworthiness score in [0, 1]. Higher is better.

Return type:

float

Raises:

ValueError – If Q is invalid or if k falls outside the valid domain.

See also

continuity

Complementary extrusion-based metric.

compute_coranking_matrix

Construct the required co-ranking matrix.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import trustworthiness
>>> Q = np.diag([1, 1, 1, 1])
>>> trustworthiness(Q, k=1)
1.0
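Trustworthiness mirrors continuity with the roles of the two spaces swapped: intrusions at original-space rank r > k are penalized by r - k. A minimal sketch, assuming rows of Q index original-space ranks and columns index embedding ranks (compute_coranking_matrix may use the transposed convention); the helper name is hypothetical.

```python
import numpy as np


def trustworthiness_sketch(Q, k):
    """Trustworthiness from Q with rows = original-space rank, columns = embedding rank."""
    Q = np.asarray(Q, dtype=float)
    if Q.ndim != 2 or Q.shape[0] != Q.shape[1]:
        raise ValueError("Q must be a square co-ranking matrix.")
    n = Q.shape[0] + 1
    if k < 1 or 2 * n - 3 * k - 1 <= 0:
        raise ValueError("k is outside the valid domain.")
    high = np.arange(1, n)[:, None]  # row index -> rank in the original space
    low = np.arange(1, n)[None, :]   # column index -> rank in the embedding
    # Intrusions: within k in the embedding but beyond k in the original space.
    intrusions = (high > k) & (low <= k)
    penalty = np.sum((high - k) * Q * intrusions)
    return float(1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty)
```

A single intrusion in a four-sample toy matrix lowers the score to 0.875, whereas a single extrusion leaves it at 1, since extrusions are only penalized by continuity.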
class coco_pipe.dim_reduction.BaseReducer(n_components: int = 2, **kwargs)

Bases: abc.ABC

Abstract base class for all dimensionality reduction implementations.

This class defines the standard interface that all reducers must implement and is safe to subclass for custom reducers. It provides built-in support for model persistence (save/load) using joblib.

For custom reducers operating on nonstandard data layouts, override capabilities so the manager layer can route validation, scoring, plotting, and reporting correctly.

Parameters:
  • n_components (int, default=2) – Target dimensionality of the reduced representation.

  • **kwargs (dict) – Additional keyword arguments stored in params and typically forwarded to the wrapped estimator or backend implementation.

n_components

Target dimensionality of the reduced representation.

Type:

int

params

Additional reducer parameters captured at initialization time.

Type:

dict

model

Underlying fitted model object, such as a scikit-learn estimator or a scientific computing backend. This attribute should be populated by fit.

Type:

Any

Notes

The capabilities property returns a plain dictionary consumed by the manager and evaluation layers. Custom reducers should declare supported diagnostics and scalar metadata explicitly through this mapping. Common keys include:

  • input_ndim : expected dimensionality of the input container

  • input_layout : semantic layout name such as "standard"

  • has_transform : whether transform is supported

  • has_inverse_transform : whether inverse transforms are available

  • has_components : whether PCA-like components are exposed

  • supported_diagnostics : names returned by get_diagnostics

  • supported_metadata : scalar metadata names returned by get_quality_metadata

  • has_native_plot : whether the reducer exposes its own plotting path

  • is_linear : whether the reducer is linear

  • is_stochastic : whether repeated runs can vary without a fixed seed

Examples

>>> from sklearn.decomposition import PCA
>>> from coco_pipe.dim_reduction import BaseReducer
>>>
>>> class CustomPCAReducer(BaseReducer):
...     @property
...     def capabilities(self):
...         return self._merge_capabilities(
...             super().capabilities,
...             is_linear=True,
...             has_components=True,
...             supported_diagnostics=("explained_variance_ratio_",),
...         )
...
...     def fit(self, X, y=None):
...         self.model = PCA(n_components=self.n_components, **self.params)
...         self.model.fit(X)
...         return self
...
...     def transform(self, X):
...         return self.model.transform(X)
n_components = 2
params
model = None
context_: Dict[str, Any]
property name: str

Return a stable public display name for the reducer.

_filter_params(fn_or_class: Any, params: dict) → dict

Filter parameters to match the signature of a function or class.

Parameters:
  • fn_or_class (Any) – The function or class to inspect.

  • params (dict) – The parameters to filter.

Returns:

filtered_params – Parameters present in the signature. If the target accepts **kwargs or its signature cannot be inspected, the original parameter dictionary is returned unchanged.

Return type:

dict

Notes

This is a convenience helper for reducer implementations that wrap third-party estimators with partially overlapping constructor signatures.
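The filtering rule can be sketched with the standard-library inspect module. filter_params_sketch is a hypothetical stand-in for _filter_params, and treating TypeError and ValueError from inspect.signature as "cannot be inspected" is an assumption.

```python
import inspect
from typing import Any


def filter_params_sketch(fn_or_class: Any, params: dict) -> dict:
    """Keep only keys accepted by fn_or_class (hypothetical helper)."""
    try:
        signature = inspect.signature(fn_or_class)
    except (TypeError, ValueError):
        # Signature cannot be inspected (e.g. some C extensions): pass params through.
        return params
    accepted = signature.parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in accepted.values()):
        # The target takes **kwargs, so every key is acceptable.
        return params
    return {name: value for name, value in params.items() if name in accepted}
```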

_build_estimator(estimator_cls: Any, params: dict | None = None, component_param: str | None = 'n_components', **fixed_kwargs: Any) → Any

Instantiate an estimator with filtered reducer parameters.

Parameters:
  • estimator_cls (Any) – Estimator class to instantiate.

  • params (dict, optional) – Explicit parameter dictionary to filter instead of self.params.

  • component_param (str or None, default="n_components") – Name of the constructor argument receiving self.n_components. Set to None to skip injecting the component count.

  • **fixed_kwargs (dict) – Keyword arguments always forwarded to the estimator constructor.

Returns:

Instantiated estimator.

Return type:

Any

Notes

This helper assumes the wrapped backend is constructor-driven and can be configured from keyword arguments.
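Combining signature filtering, component injection, and fixed arguments gives roughly the following. This is a hypothetical sketch: n_components is passed explicitly instead of read from self, and the precedence (n_components overrides a conflicting entry in params, fixed_kwargs override everything) is assumed rather than documented.

```python
import inspect
from typing import Any, Optional


def build_estimator_sketch(
    estimator_cls: Any,
    params: Optional[dict],
    n_components: int,
    component_param: Optional[str] = "n_components",
    **fixed_kwargs: Any,
) -> Any:
    """Instantiate estimator_cls from filtered keyword arguments (hypothetical helper)."""
    kwargs = dict(params or {})
    if component_param is not None:
        # Inject the target dimensionality under the backend's own argument name.
        kwargs[component_param] = n_components
    try:
        accepted = inspect.signature(estimator_cls).parameters
    except (TypeError, ValueError):
        accepted = None  # cannot inspect: forward everything unchanged
    if accepted is not None and not any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in accepted.values()
    ):
        # Drop keys the constructor does not accept.
        kwargs = {k: v for k, v in kwargs.items() if k in accepted}
    # Fixed keyword arguments are always forwarded and win on conflicts.
    kwargs.update(fixed_kwargs)
    return estimator_cls(**kwargs)
```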

_require_fitted(method_name: str = 'transform', model: Any = None) → Any

Validate that a reducer backend has been fitted before access.

Parameters:
  • method_name (str, default="transform") – Operation requiring a fitted model.

  • model (Any, optional) – Backend model to check. Defaults to self.model.

Returns:

The validated model instance.

Return type:

Any

Raises:

RuntimeError – If no fitted model is available.

_merge_capabilities(base_caps: Dict[str, Any], **overrides: Any) → Dict[str, Any]

Return a capability mapping updated with reducer-specific overrides.

Parameters:
  • base_caps (dict) – Base capability mapping, typically super().capabilities.

  • **overrides (dict) – Reducer-specific capability values to apply.

Returns:

Capability mapping with overrides applied.

Return type:

dict

abstract fit(X: ArrayLike, y: ArrayLike | None = None) → BaseReducer

Fit the model to the data.

Parameters:
  • X (ArrayLike) – Training data. Most reducers expect (n_samples, n_features), but reducers with custom capabilities["input_layout"] may accept other layouts such as snapshot matrices or grouped trajectory tensors.

  • y (ArrayLike, optional) – Optional supervision aligned with the sample axis used by the reducer’s declared input layout.

Returns:

self – The fitted reducer instance.

Return type:

BaseReducer

Notes

Most reducers expect X to have shape (n_samples, n_features). Some reducers operate on alternative layouts and should document those layouts through capabilities.

abstract transform(X: ArrayLike) → numpy.ndarray

Apply dimensionality reduction to X.

Parameters:

X (ArrayLike) – New data to transform. Its layout should match the reducer’s declared capabilities.

Returns:

X_new – Reduced representation. The exact output shape depends on the reducer, but the last dimension usually matches n_components.

Return type:

np.ndarray

Raises:

RuntimeError – Raised by concrete implementations when transform is called before fitting or when the reducer does not support out-of-sample transforms.

fit_transform(X: ArrayLike, y: ArrayLike | None = None) → numpy.ndarray

Fit the model to data and return the transformed data.

This method usually calls fit and then transform, but reducers may override it for efficiency if the underlying algorithm supports a native combined path.

Parameters:
  • X (ArrayLike) – Training data following the reducer’s declared layout.

  • y (ArrayLike, optional) – Optional supervision aligned with the reducer’s input layout.

Returns:

X_new – Reduced representation returned by transform.

Return type:

np.ndarray

save(filepath: str | os.PathLike) → None

Persist the reducer to a file.

The default implementation serializes the reducer instance with joblib. Custom reducers should either remain joblib-serializable or override this method and load() with a custom persistence strategy.

Parameters:

filepath (str or Path) – Path to the output file.

property capabilities: Dict[str, Any]

Return reducer capability flags consumed by the manager layer.

Custom reducers with nonstandard inputs should override at least input_ndim and input_layout. Reducers exposing diagnostics or scalar quality metadata should declare them explicitly through supported_diagnostics and supported_metadata.

Returns:

Mapping of reducer capability flags.

Return type:

dict

Notes

The default capabilities describe a typical estimator that consumes (n_samples, n_features) input and exposes transform.

_attribute_dict(obj: Any, attrs: Iterable[str]) → Dict[str, Any]

Extract requested attributes from a target object into a dictionary.

This helper skips missing attributes and swallows common access errors (such as deferred scikit-learn properties), returning only what is currently available on the target.

Parameters:
  • obj (Any) – Target object to inspect.

  • attrs (iterable of str) – Attribute names to attempt to extract.

Returns:

Mapping of available attribute names to their values.

Return type:

dict
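A getattr loop with a narrow exception filter is enough to illustrate this helper. The sketch assumes AttributeError, ValueError, and TypeError are the swallowed errors; scikit-learn's NotFittedError subclasses both ValueError and AttributeError, so deferred fitted-attribute access is covered. The function name is hypothetical.

```python
from typing import Any, Dict, Iterable


def attribute_dict_sketch(obj: Any, attrs: Iterable[str]) -> Dict[str, Any]:
    """Collect the requested attributes that are currently readable on obj (hypothetical helper)."""
    found: Dict[str, Any] = {}
    for name in attrs:
        try:
            found[name] = getattr(obj, name)
        except (AttributeError, ValueError, TypeError):
            # Missing or not-yet-available attribute: skip it instead of failing.
            continue
    return found
```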

get_diagnostics() → Dict[str, Any]

Return diagnostic arrays or structured artifacts.

Diagnostics are intended for non-scalar outputs such as explained variance curves, eigenvalues, modes, graphs, or training histories. Only names declared in capabilities["supported_diagnostics"] are queried.

Returns:

diagnostics – Dictionary of diagnostic attributes declared in capabilities["supported_diagnostics"].

Return type:

dict

Raises:

RuntimeError – If the reducer has not been fitted.

get_quality_metadata() → Dict[str, Any]

Return scalar metadata about the reduction process or quality.

Typical examples include iteration counts, optimization stress, final loss values, or backend-specific convergence flags. Only names declared in capabilities["supported_metadata"] are queried.

Returns:

metadata – Dictionary containing only scalar values corresponding to keys declared in capabilities["supported_metadata"].

Return type:

dict

Raises:

RuntimeError – If the reducer has not been fitted.

get_components() → numpy.ndarray

Return reducer-defined component-like outputs.

Returns:

Reducer-defined component array.

Return type:

np.ndarray

Raises:

ValueError – If the reducer does not expose public components.

classmethod load(filepath: str | os.PathLike) → BaseReducer

Load a reducer from a file.

Parameters:

filepath (str or Path) – Path to the file to load.

Returns:

reducer – The loaded reducer instance.

Return type:

BaseReducer

Notes

This method assumes the reducer was serialized with save or a compatible joblib.dump call.

class coco_pipe.dim_reduction.IncrementalPCAReducer(n_components: int = 2, batch_size: int | None = None, **kwargs)

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Incremental PCA reducer.

This reducer wraps sklearn.decomposition.IncrementalPCA for batch-wise fitting when the full dataset is too large to process in one pass.

Parameters:
  • n_components (int, default=2) – Number of principal components to keep.

  • batch_size (int, optional) – Number of samples processed per batch.

  • **kwargs (dict) – Additional keyword arguments forwarded to IncrementalPCA after signature filtering.

batch_size

Batch size used when fitting the incremental estimator.

Type:

int or None

model

Fitted IncrementalPCA estimator after fit or partial_fit.

Type:

sklearn.decomposition.IncrementalPCA or None

See also

PCAReducer

Standard in-memory linear PCA reducer.

DaskPCAReducer

Linear PCA variant for lazy or distributed arrays.

DaskTruncatedSVDReducer

Linear factorization alternative for lazy arrays.

IsomapReducer

Nonlinear manifold learner based on geodesic distances.

TSNEReducer

Nonlinear neighborhood-preserving embedding.

UMAPReducer

Nonlinear graph-based embedding balancing local and global structure.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(100, 12)
>>> reducer = IncrementalPCAReducer(n_components=3, batch_size=25)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:10]).shape
(10, 3)
>>> stream = IncrementalPCAReducer(n_components=2, batch_size=20)
>>> _ = stream.partial_fit(X[:50])
>>> _ = stream.partial_fit(X[50:])
>>> stream.transform(X).shape
(100, 2)
property capabilities: dict

Return capability metadata for Incremental PCA.

Returns:

Capability mapping describing Incremental PCA as a linear component-based reducer.

Return type:

dict

batch_size = None
fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) IncrementalPCAReducer[source]

Fit Incremental PCA in batch mode.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

IncrementalPCAReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(30, 6)
>>> reducer = IncrementalPCAReducer(n_components=2, batch_size=10)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
partial_fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) IncrementalPCAReducer[source]

Incrementally fit the estimator on a batch of samples.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Batch of training samples.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Reducer instance after updating the incremental estimator.

Return type:

IncrementalPCAReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(40, 6)
>>> reducer = IncrementalPCAReducer(n_components=2, batch_size=20)
>>> _ = reducer.partial_fit(X[:20])
>>> _ = reducer.partial_fit(X[20:])
>>> reducer.model is not None
True
transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) numpy.ndarray[source]

Project data onto the fitted incremental PCA basis.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to project.

Returns:

Projected coordinates in component space.

Return type:

np.ndarray of shape (n_samples, n_components)

Raises:

RuntimeError – If the reducer has not been fitted.
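Projecting onto a fitted PCA basis amounts to centering with the training mean and multiplying by the transposed component matrix. When whitening is enabled, each coordinate is also divided by the square root of its explained variance. The sketch below computes those fitted quantities with a NumPy SVD so that it runs standalone. project is a hypothetical helper. The sketch assumes the fitted reducer stores the equivalent of scikit-learn's mean_, components_, and explained_variance_.

```python
import numpy as np


def project(X, mean, components, explained_variance=None):
    """Hypothetical helper: center X and project it onto orthonormal component rows."""
    Z = (np.asarray(X, dtype=float) - mean) @ components.T
    if explained_variance is not None:
        # Whitening: rescale each coordinate to unit variance on the training data.
        Z = Z / np.sqrt(explained_variance)
    return Z


# Fitted quantities, computed here from an SVD of the centered training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))
mean = X.mean(axis=0)
_, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:3]
explained_variance = S[:3] ** 2 / (len(X) - 1)

Z = project(X[:10], mean, components)                     # shape (10, 3)
Z_white = project(X, mean, components, explained_variance)
```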

get_components() numpy.ndarray[source]

Return the incremental PCA component loading matrix.

Returns:

Principal component loading matrix.

Return type:

np.ndarray

Raises:

RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.IsomapReducer(n_components: int = 2, **kwargs)[source]

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Isometric Mapping reducer.

Isomap estimates geodesic distances on a nearest-neighbor graph and then computes a low-dimensional embedding consistent with those distances.
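The two stages described above can be sketched densely in NumPy in three steps. First, build a symmetric k-nearest-neighbor graph. Next, compute shortest-path (geodesic) distances through that graph with Floyd-Warshall. Finally, embed those distances with classical MDS. This is a simplified O(n^3) illustration that assumes a small, connected dataset. scikit-learn's Isomap instead works with sparse graphs, offers several path algorithms, and performs the final step with kernel PCA. isomap_sketch is a hypothetical name.

```python
import numpy as np


def isomap_sketch(X: np.ndarray, n_neighbors: int = 5, n_components: int = 2) -> np.ndarray:
    """Hypothetical dense Isomap: kNN graph -> shortest paths -> classical MDS."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Symmetric kNN graph; missing edges are infinitely long.
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1 : n_neighbors + 1]  # column 0 is the point itself
    for i in range(n):
        G[i, nearest[i]] = D[i, nearest[i]]
        G[nearest[i], i] = D[i, nearest[i]]
    # Floyd-Warshall: geodesic distance = shortest path length through the graph.
    for k in range(n):
        G = np.minimum(G, G[:, k : k + 1] + G[k : k + 1, :])
    if np.isinf(G).any():
        raise ValueError("Neighborhood graph is disconnected; increase n_neighbors.")
    # Classical MDS: double-center the squared geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Points along three quarters of a circle: a curved 1-D manifold in 2-D.
t = np.linspace(0.0, 1.5 * np.pi, 40)
X = np.column_stack([np.cos(t), np.sin(t)])
coords = isomap_sketch(X, n_neighbors=2, n_components=1)[:, 0]
```

Distances are measured along the arc rather than across it, so the one-dimensional coordinates come out in arc order.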

Parameters:
  • n_components (int, default=2) – Number of coordinates for the manifold.

  • **kwargs (dict) – Additional keyword arguments forwarded to sklearn.manifold.Isomap after signature filtering. Common options include n_neighbors, metric, p, and eigen_solver.

model

Fitted Isomap estimator after fit.

Type:

sklearn.manifold.Isomap or None

See also

LLEReducer

Nonlinear local-neighborhood manifold embedding.

MDSReducer

Distance-preserving manifold embedding.

SpectralEmbeddingReducer

Nonlinear graph Laplacian embedding.

PCAReducer

Linear baseline for global variance preservation.

UMAPReducer

Nonlinear graph-based embedding for local and global structure.

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IsomapReducer
>>> X = np.random.rand(100, 10)
>>> reducer = IsomapReducer(n_components=2, n_neighbors=5)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:8]).shape
(8, 2)
>>> reducer.n_features_in_
10
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(100, 2)
property capabilities: dict

Return capability metadata for Isomap.

Returns:

Capability mapping describing Isomap as a nonlinear reducer with out-of-sample transform support.

Return type:

dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) IsomapReducer[source]

Fit Isomap on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

IsomapReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IsomapReducer
>>> X = np.random.rand(30, 6)
>>> reducer = IsomapReducer(n_components=2, n_neighbors=4)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) numpy.ndarray[source]

Project data into the fitted Isomap embedding space.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to project.

Returns:

Low-dimensional embedding coordinates.

Return type:

np.ndarray of shape (n_samples, n_components)

Raises:

RuntimeError – If the reducer has not been fitted.

property reconstruction_error_: float | None

Return the Isomap reconstruction error.

Returns:

Reconstruction error returned by the fitted estimator.

Return type:

float

Raises:

RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.LLEReducer(n_components: int = 2, **kwargs)[source]

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Locally Linear Embedding reducer.

LLE learns a nonlinear embedding by reconstructing each point from its local neighborhood in the input space and preserving those reconstruction weights in the low-dimensional space.
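The per-point reconstruction step above can be sketched as a small constrained least-squares problem. The goal is to find weights that sum to one and best reconstruct a point from its neighbors. The ridge term plays the role of the reg option of scikit-learn's LocallyLinearEmbedding. This standalone lle_weights helper is hypothetical, and it covers only the weight step, not the final eigenvector problem that produces the embedding.

```python
import numpy as np


def lle_weights(x: np.ndarray, neighbors: np.ndarray, reg: float = 1e-3) -> np.ndarray:
    """Hypothetical helper: weights reconstructing x from its neighbors, summing to one."""
    Z = np.asarray(neighbors, dtype=float) - x   # neighbors relative to x
    C = Z @ Z.T                                  # local Gram matrix, shape (k, k)
    k = len(Z)
    # Trace-scaled ridge keeps C invertible when k > n_features or points are collinear.
    trace = np.trace(C)
    C = C + (reg * trace if trace > 0 else reg) * np.eye(k)
    w = np.linalg.solve(C, np.ones(k))
    return w / w.sum()


# A point halfway between its two neighbors is reconstructed with equal weights.
x = np.array([1.0, 0.0, 0.0])
neighbors = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
w = lle_weights(x, neighbors)
```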

Parameters:
  • n_components (int, default=2) – Number of coordinates for the manifold.

  • **kwargs (dict) – Additional keyword arguments forwarded to sklearn.manifold.LocallyLinearEmbedding after signature filtering. Common options include n_neighbors, method, eigen_solver, and random_state.

model

Fitted LLE estimator after fit.

Type:

sklearn.manifold.LocallyLinearEmbedding or None

See also

IsomapReducer

Nonlinear geodesic-distance embedding.

MDSReducer

Distance-preserving manifold embedding.

SpectralEmbeddingReducer

Nonlinear graph Laplacian embedding.

PCAReducer

Linear baseline for global variance preservation.

UMAPReducer

Nonlinear graph-based embedding for local and global structure.

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import LLEReducer
>>> X = np.random.rand(100, 10)
>>> reducer = LLEReducer(n_components=2, n_neighbors=10, eigen_solver="dense")
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:6]).shape
(6, 2)
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(100, 2)
property capabilities: dict

Return capability metadata for LLE.

Returns:

Capability mapping describing LLE as a nonlinear reducer with out-of-sample transform support.

Return type:

dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) LLEReducer[source]

Fit LLE on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

LLEReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import LLEReducer
>>> X = np.random.rand(30, 6)
>>> reducer = LLEReducer(n_components=2, n_neighbors=5, eigen_solver="dense")
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
>>> reducer = LLEReducer(n_components=2, method="modified", n_neighbors=5)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) numpy.ndarray[source]

Project data into the fitted LLE embedding space.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to project.

Returns:

Low-dimensional embedding coordinates.

Return type:

np.ndarray of shape (n_samples, n_components)

Raises:

RuntimeError – If the reducer has not been fitted.

property reconstruction_error_: float

Return the LLE reconstruction error.

Returns:

Reconstruction error associated with the embedding.

Return type:

float

Raises:

RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.MDSReducer(n_components: int = 2, **kwargs)[source]

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Multidimensional Scaling reducer.

MDS seeks a low-dimensional representation whose pairwise distances best match the pairwise distances in the original space.

Parameters:
  • n_components (int, default=2) – Number of coordinates for the manifold.

  • **kwargs (dict) – Additional keyword arguments forwarded to sklearn.manifold.MDS after signature filtering. Common options include metric, n_init, max_iter, dissimilarity, and random_state.

model

Fitted MDS estimator after fit or fit_transform.

Type:

sklearn.manifold.MDS or None

Notes

transform is not supported because scikit-learn MDS does not provide an out-of-sample projection API.
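Because transform raises NotImplementedError here, new samples cannot be projected into an existing embedding. One common workaround is to embed the training and new samples jointly with fit_transform. The sketch below relies only on that documented behavior. NoTransformReducer and embed_with_fallback are hypothetical. The joint embedding replaces the training coordinates rather than extending them, so it is not directly comparable to a training-only fit. The same pattern applies to the other reducers in this module whose transform raises NotImplementedError.

```python
import numpy as np


class NoTransformReducer:
    """Hypothetical stand-in for a reducer (such as MDS) whose transform always raises."""

    def fit_transform(self, X):
        X = np.asarray(X, dtype=float)
        return X[:, :2] - X[:, :2].mean(axis=0)  # placeholder "embedding"

    def transform(self, X):
        raise NotImplementedError("transform is not supported for this reducer.")


def embed_with_fallback(reducer, X_train, X_new):
    """Return (train_embedding, new_embedding), embedding jointly if transform is missing."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    train_emb = reducer.fit_transform(X_train)
    try:
        return train_emb, reducer.transform(X_new)
    except NotImplementedError:
        # No out-of-sample projection: refit on stacked data and split the result.
        joint = reducer.fit_transform(np.vstack([X_train, X_new]))
        return joint[: len(X_train)], joint[len(X_train):]


rng = np.random.default_rng(0)
train_emb, new_emb = embed_with_fallback(
    NoTransformReducer(), rng.random((10, 4)), rng.random((3, 4))
)
```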

See also

IsomapReducer

Nonlinear geodesic-distance embedding.

LLEReducer

Nonlinear local-neighborhood embedding.

SpectralEmbeddingReducer

Nonlinear graph Laplacian embedding.

PCAReducer

Linear baseline for global variance preservation.

UMAPReducer

Nonlinear graph-based embedding for local and global structure.

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import MDSReducer
>>> X = np.random.rand(60, 8)
>>> reducer = MDSReducer(n_components=2, random_state=42)
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(60, 2)
>>> reducer.stress_ >= 0
True
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
property capabilities: dict

Return capability metadata for MDS.

Returns:

Capability mapping describing MDS as a nonlinear reducer without out-of-sample transform support.

Return type:

dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) MDSReducer[source]

Fit MDS on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

MDSReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import MDSReducer
>>> X = np.random.rand(25, 5)
>>> reducer = MDSReducer(n_components=2, random_state=0)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
abstract transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) numpy.ndarray[source]

Raise because scikit-learn MDS does not support out-of-sample transform.

Parameters:

X (ArrayLike) – Ignored input included for API compatibility.

Raises:

NotImplementedError – Always raised because MDS does not support transforming new data.

fit_transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) numpy.ndarray[source]

Fit MDS and return the embedding coordinates.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Embedded coordinates produced by MDS.

Return type:

np.ndarray of shape (n_samples, n_components)

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import MDSReducer
>>> X = np.random.rand(20, 4)
>>> reducer = MDSReducer(n_components=2, random_state=0)
>>> reducer.fit_transform(X).shape
(20, 2)
property stress_: float

Return the MDS stress, a measure of the mismatch between the input dissimilarities and the pairwise distances in the embedding (lower is better).

Returns:

Stress value returned by the fitted MDS model.

Return type:

float

Raises:

RuntimeError – If the reducer has not been fitted.
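For intuition, raw stress can be computed directly from an embedding and the input dissimilarities. It is the sum of squared differences over unordered pairs of samples. This standalone raw_stress helper is a hypothetical illustration. Depending on the scikit-learn version and the normalized_stress setting, stress_ may be reported in a normalized form, so compare stress values only within one configuration.

```python
import numpy as np


def raw_stress(D_input: np.ndarray, X_emb: np.ndarray) -> float:
    """Raw stress: sum over pairs i < j of (embedded distance - input dissimilarity)^2."""
    X_emb = np.asarray(X_emb, dtype=float)
    D_emb = np.sqrt(((X_emb[:, None, :] - X_emb[None, :, :]) ** 2).sum(axis=-1))
    iu = np.triu_indices(len(X_emb), k=1)  # count each unordered pair once
    return float(((D_emb[iu] - D_input[iu]) ** 2).sum())


X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # pairwise 3, 4, 5
print(raw_stress(D, X))      # 0.0: a perfect embedding
print(raw_stress(D, 2 * X))  # 50.0 = (6-3)^2 + (8-4)^2 + (10-5)^2
```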

class coco_pipe.dim_reduction.PCAReducer(n_components: int = 2, **kwargs)[source]

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Principal Component Analysis reducer.

This reducer wraps sklearn.decomposition.PCA and provides a linear low-dimensional embedding based on singular value decomposition.
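The SVD route described above can be sketched in a few lines. Center the data and take the thin SVD. The leading right singular vectors become the components, and the explained-variance ratios come from the squared singular values. pca_svd is a hypothetical helper. scikit-learn's PCA additionally fixes component signs (svd_flip) and supports several solvers.

```python
import numpy as np


def pca_svd(X: np.ndarray, n_components: int):
    """Hypothetical helper: PCA via the thin SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]                   # principal axes, (k, n_features)
    explained_variance = S ** 2 / (len(X) - 1)       # variance along every axis
    ratio = explained_variance[:n_components] / explained_variance.sum()
    scores = U[:, :n_components] * S[:n_components]  # equals (X - mean) @ components.T
    return scores, components, ratio


rng = np.random.default_rng(42)
X = rng.random((100, 10))
scores, components, ratio = pca_svd(X, n_components=2)
```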

Parameters:
  • n_components (int, default=2) – Number of principal components to keep.

  • **kwargs (dict) – Additional keyword arguments forwarded to sklearn.decomposition.PCA after signature filtering. Common options include whiten, svd_solver, and random_state.

model

Fitted PCA estimator after fit.

Type:

sklearn.decomposition.PCA or None

Notes

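PCA is deterministic with the full SVD solver. Solvers that use random initialization, such as svd_solver="randomized" (which svd_solver="auto" may select for large inputs), depend on random_state, so fix it for reproducible results.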

See also

IncrementalPCAReducer

Linear PCA variant for batch-wise fitting.

DaskPCAReducer

Linear PCA variant for lazy or distributed arrays.

DaskTruncatedSVDReducer

Linear factorization alternative for lazy arrays.

IsomapReducer

Nonlinear manifold learner based on geodesic distances.

TSNEReducer

Nonlinear neighborhood-preserving embedding.

UMAPReducer

Nonlinear graph-based embedding balancing local and global structure.

PHATEReducer

Nonlinear diffusion-based embedding for smooth trajectories.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import PCAReducer
>>> X = np.random.rand(100, 10)
>>> reducer = PCAReducer(n_components=2, random_state=42)
>>> _ = reducer.fit(X)
>>> X_reduced = reducer.transform(X)
>>> X_reduced.shape
(100, 2)
>>> reducer.explained_variance_ratio_.shape
(2,)
>>> reducer.components_.shape
(2, 10)
>>> reducer = PCAReducer(n_components=3, whiten=True)
>>> reducer.fit_transform(X).shape
(100, 3)
property capabilities: dict

Return capability metadata for PCA.

Returns:

Capability mapping describing PCA as a linear component-based reducer.

Return type:

dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) PCAReducer[source]

Fit PCA on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

PCAReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import PCAReducer
>>> X = np.random.rand(20, 5)
>>> reducer = PCAReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) numpy.ndarray[source]

Project data onto the fitted principal component basis.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to project.

Returns:

Projected coordinates in principal component space.

Return type:

np.ndarray of shape (n_samples, n_components)

Raises:

RuntimeError – If the reducer has not been fitted.

property explained_variance_ratio_: numpy.ndarray

Fraction of the total variance explained by each selected component.

Returns:

Explained variance ratio for each retained component.

Return type:

np.ndarray of shape (n_components,)

Raises:

RuntimeError – If the reducer has not been fitted.

property components_: numpy.ndarray

Principal axes in feature space.

Returns:

Principal component loading matrix.

Return type:

np.ndarray of shape (n_components, n_features)

Raises:

RuntimeError – If the reducer has not been fitted.

get_components() numpy.ndarray[source]

Return the principal component loading matrix.

Returns:

Principal component loading matrix.

Return type:

np.ndarray

Raises:

RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.SpectralEmbeddingReducer(n_components: int = 2, **kwargs)[source]

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Spectral Embedding reducer.

Spectral Embedding computes a nonlinear embedding using eigenvectors of the graph Laplacian built from the data affinity graph.
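The following is a simplified sketch of the Laplacian-eigenmap idea. It assumes a dense RBF affinity, which is one of the affinity options SpectralEmbedding accepts; the default builds a nearest-neighbor graph instead. The sketch forms the symmetric normalized Laplacian and takes its eigenvectors with the smallest eigenvalues. It then drops the trivial eigenvector and rescales the rest by the inverse square root of the degrees. spectral_embedding_sketch is hypothetical, and it uses a dense eigensolver, so it is only suitable for small inputs.

```python
import numpy as np


def spectral_embedding_sketch(X: np.ndarray, n_components: int = 2, gamma: float = 1.0) -> np.ndarray:
    """Hypothetical Laplacian-eigenmap sketch with a dense RBF affinity."""
    X = np.asarray(X, dtype=float)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-gamma * sq_dists)             # RBF affinity between every pair
    np.fill_diagonal(W, 0.0)                  # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    # Drop the trivial eigenvector (eigenvalue 0), then rescale by D^{-1/2}.
    return eigvecs[:, 1 : n_components + 1] * d_inv_sqrt[:, None]


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
coords = spectral_embedding_sketch(X, n_components=1, gamma=0.1)[:, 0]
```

On two well-separated clusters, the first nontrivial coordinate separates them by sign.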

Parameters:
  • n_components (int, default=2) – Number of coordinates for the manifold.

  • **kwargs (dict) – Additional keyword arguments forwarded to sklearn.manifold.SpectralEmbedding after signature filtering. Common options include affinity, gamma, random_state, eigen_solver, and n_neighbors.

model

Fitted spectral embedding estimator after fit or fit_transform.

Type:

sklearn.manifold.SpectralEmbedding or None

Notes

transform is not supported because scikit-learn SpectralEmbedding does not provide an out-of-sample projection API.

See also

IsomapReducer

Nonlinear geodesic-distance embedding.

LLEReducer

Nonlinear local-neighborhood embedding.

MDSReducer

Distance-preserving manifold embedding.

PCAReducer

Linear baseline for global variance preservation.

UMAPReducer

Nonlinear graph-based embedding for local and global structure.

TSNEReducer

Nonlinear neighborhood-preserving visualization method.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import SpectralEmbeddingReducer
>>> X = np.random.rand(80, 10)
>>> reducer = SpectralEmbeddingReducer(n_components=2, random_state=42)
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(80, 2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
property capabilities: dict

Return capability metadata for Spectral Embedding.

Returns:

Capability mapping describing Spectral Embedding as a nonlinear reducer without out-of-sample transform support.

Return type:

dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) SpectralEmbeddingReducer[source]

Fit Spectral Embedding on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

SpectralEmbeddingReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import SpectralEmbeddingReducer
>>> X = np.random.rand(30, 6)
>>> reducer = SpectralEmbeddingReducer(n_components=2, random_state=0)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
abstract transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) numpy.ndarray[source]

Raise because scikit-learn Spectral Embedding lacks out-of-sample transform.

Parameters:

X (ArrayLike) – Ignored input included for API compatibility.

Raises:

NotImplementedError – Always raised because Spectral Embedding does not support transforming new data.

fit_transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) numpy.ndarray[source]

Fit Spectral Embedding and return the embedding coordinates.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Embedded coordinates produced by Spectral Embedding.

Return type:

np.ndarray of shape (n_samples, n_components)

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import SpectralEmbeddingReducer
>>> X = np.random.rand(20, 4)
>>> reducer = SpectralEmbeddingReducer(n_components=2, random_state=0)
>>> reducer.fit_transform(X).shape
(20, 2)
class coco_pipe.dim_reduction.TSNEReducer(n_components: int = 2, **kwargs)[source]

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

t-SNE reducer.

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a neighborhood-preserving method designed primarily for visualization. It optimizes a low-dimensional embedding by minimizing the Kullback-Leibler divergence between the pairwise similarity distributions in the original space and in the embedding.
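The objective can be sketched as follows, under one strong simplifying assumption. Here the input similarities use a single fixed Gaussian bandwidth sigma, whereas t-SNE calibrates a separate bandwidth for each point to match the requested perplexity. Embedding similarities use the heavy-tailed Student-t kernel with one degree of freedom. tsne_kl is a hypothetical helper that only scores a given embedding; it does not optimize one.

```python
import numpy as np


def _sq_dists(Z: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances."""
    return ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)


def tsne_kl(X: np.ndarray, Y: np.ndarray, sigma: float = 1.0) -> float:
    """Hypothetical helper: KL(P || Q) for an embedding Y of X (fixed sigma, no perplexity search)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = len(X)
    # Conditional Gaussian similarities p_{j|i}, normalized per row.
    P_cond = np.exp(-_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P_cond, 0.0)
    P_cond /= P_cond.sum(axis=1, keepdims=True)
    # Symmetrized joint probabilities; P sums to 1 over all pairs.
    P = (P_cond + P_cond.T) / (2.0 * n)
    # Student-t (one degree of freedom) similarities in the embedding.
    Q = 1.0 / (1.0 + _sq_dists(Y))
    np.fill_diagonal(Q, 0.0)
    Q /= Q.sum()
    mask = P > 0  # excludes the zeroed diagonal
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 5)), rng.normal(4.0, 0.3, (10, 5))])
Y_good = X[:, :2]                       # keeps the two clusters apart
Y_bad = Y_good[np.r_[0:20:2, 1:20:2]]  # reassigns positions so the clusters get mixed
```

An embedding that keeps the two clusters apart scores a lower divergence than one that mixes their points.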

Parameters:
  • n_components (int, default=2) – Number of embedding dimensions.

  • **kwargs (dict) – Additional keyword arguments forwarded to sklearn.manifold.TSNE after signature filtering. Common options include perplexity, learning_rate, max_iter, init, and random_state.

embedding_

Learned training-set embedding after fit or fit_transform.

Type:

np.ndarray or None

model

Fitted t-SNE estimator after fit or fit_transform.

Type:

sklearn.manifold.TSNE or None

Notes

transform is not supported because scikit-learn t-SNE does not provide an out-of-sample projection API.

See also

UMAPReducer

Nonlinear graph-based embedding with transform support.

PacmapReducer

Nonlinear embedding balancing local and global structure.

TrimapReducer

Nonlinear triplet-based embedding preserving global layout.

PHATEReducer

Diffusion-based embedding for continuous trajectories.

PCAReducer

Linear baseline for global variance preservation.

IsomapReducer

Nonlinear geodesic-distance manifold embedding.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import TSNEReducer
>>> X = np.random.rand(100, 10)
>>> reducer = TSNEReducer(n_components=2, perplexity=20, random_state=42)
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(100, 2)
>>> reducer.get_quality_metadata()["kl_divergence_"] >= 0
True
>>> _ = reducer.fit(X)
>>> reducer.embedding_.shape
(100, 2)
property capabilities: dict

Return capability metadata for t-SNE.

Returns:

Capability mapping describing t-SNE as a nonlinear stochastic reducer without out-of-sample transform support.

Return type:

dict

embedding_ = None
fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) TSNEReducer[source]

Fit t-SNE on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

TSNEReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import TSNEReducer
>>> X = np.random.rand(30, 6)
>>> reducer = TSNEReducer(n_components=2, perplexity=5, max_iter=250)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
abstract transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) numpy.ndarray[source]

Raise because t-SNE does not support out-of-sample transformation.

Parameters:

X (ArrayLike) – Ignored input included for API compatibility.

Raises:

NotImplementedError – Always raised because t-SNE does not support transforming new data.

fit_transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) numpy.ndarray[source]

Fit t-SNE and return the embedding coordinates.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Embedded coordinates produced by t-SNE.

Return type:

np.ndarray of shape (n_samples, n_components)