coco_pipe.dim_reduction

Submodules

Attributes

METHODS

Classes

`DimReduction`	Manage one dimensionality reduction workflow.
`BaseReducer`	Abstract base class for all dimensionality reduction implementations.
`IncrementalPCAReducer`	Incremental PCA reducer.
`IsomapReducer`	Isometric Mapping reducer.
`LLEReducer`	Locally Linear Embedding reducer.
`MDSReducer`	Multidimensional Scaling reducer.
`PCAReducer`	Principal Component Analysis reducer.
`SpectralEmbeddingReducer`	Spectral Embedding reducer.
`TSNEReducer`	t-SNE reducer.

Functions

`interpret_features`(→ Dict[str, Any])	Run one or more feature interpretation analyses.
`trajectory_curvature`(→ numpy.ndarray)	Calculate geometric curvature of a trajectory.
`trajectory_separation`(→ Dict[Tuple[str, str], ...)	Calculate time-resolved separation between labeled trajectory groups.
`trajectory_speed`(→ numpy.ndarray)	Calculate instantaneous trajectory speed.
`continuity`(→ float)	Compute continuity from a co-ranking matrix.
`lcmc`(→ float)	Compute the local continuity meta-criterion (LCMC).
`shepard_diagram_data`(→ Tuple[numpy.ndarray, numpy.ndarray])	Compute sampled pairwise distances for a Shepard diagram.
`trustworthiness`(→ float)	Compute trustworthiness from a co-ranking matrix.

Package Contents

coco_pipe.dim_reduction.interpret_features(X: numpy.ndarray, *, X_emb: numpy.ndarray | None = None, model: Any | None = None, analyses: Sequence[str] | None = None, feature_names: Sequence[str] | None = None, method_name: str = 'embedding', n_repeats: int = 5, random_state: int | None = None) → Dict[str, Any][source]¶

Run one or more feature interpretation analyses.

Parameters:

X (np.ndarray) – Original input data.
X_emb (np.ndarray, optional) – Explicit embedding used by correlation-based analysis.
model (Any, optional) – Fitted reducer or model used by importance analyses.
analyses (sequence of {"correlation", "perturbation", "gradient"}, optional) – Analyses to compute. None defaults to ("correlation",).
feature_names (sequence of str, optional) – Feature names aligned with X when the requested analysis returns feature-keyed outputs.
method_name (str, default="embedding") – Display name written into the returned analysis records.
n_repeats (int, default=5) – Number of permutations per feature for perturbation importance.
random_state (int, optional) – Random seed for perturbation importance.

Returns:

Dictionary with keys:

analysis: nested analysis payloads
records: tidy analysis records as list[dict]

Return type:

dict

Raises:

ValueError – If a requested analysis is unsupported, is missing required inputs, or lacks required feature names.

Notes

This function is a pure interpretation backend for manager, report, or visualization workflows. It does not fit models, compute embeddings, or mutate reducer state.

See also

correlate_features: Feature-to-dimension interpretation from explicit embeddings.
perturbation_importance: Model-agnostic importance based on shuffled features.
gradient_importance: Encoder saliency for supported torch-based reducers.
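
As a rough illustration of the shuffled-feature idea behind the perturbation analysis, the sketch below permutes one column at a time and measures how far the embedding moves. The helper name perturbation_importance_sketch and the mean-squared-shift score are assumptions made for illustration; the scoring used by perturbation_importance may differ.

```python
import numpy as np


def perturbation_importance_sketch(transform, X, n_repeats=5, random_state=None):
    """Hypothetical helper: mean squared embedding shift after shuffling each feature."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    baseline = transform(X)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle only column j so its link to the samples is broken.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += np.mean((transform(X_perm) - baseline) ** 2)
    return scores / n_repeats


# A toy "reducer" that keeps the first two features and ignores the third.
X_demo = np.random.default_rng(0).normal(size=(50, 3))
importance = perturbation_importance_sketch(
    lambda A: A[:, :2], X_demo, n_repeats=3, random_state=0
)
```

Because the toy transform never reads the third feature, its score is exactly zero, while the two features that feed the embedding receive positive scores.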

Examples

>>> import numpy as np
>>> class MockReducer:
...     def transform(self, X):
...         return X[:, :2]
>>> X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
>>> X_emb = X[:, :2]
>>> result = interpret_features(
...     X,
...     X_emb=X_emb,
...     model=MockReducer(),
...     analyses=["correlation", "perturbation"],
...     feature_names=["f1", "f2"],
...     n_repeats=1,
...     random_state=0,
... )
>>> sorted(result)
['analysis', 'records']

coco_pipe.dim_reduction.METHODS = ('PCA', 'IncrementalPCA', 'DaskPCA', 'DaskTruncatedSVD', 'Isomap', 'LLE', 'MDS',...¶

class coco_pipe.dim_reduction.DimReduction(method: str | coco_pipe.dim_reduction.config.BaseReducerConfig, n_components: int = 2, params: Dict[str, Any] | None = None, **kwargs)[source]¶

Manage one dimensionality reduction workflow.

Parameters:

method (str or BaseReducerConfig) – Canonical public reducer name or a typed configuration object. Method names are exact and must match the registry, for example "PCA", "Isomap", "Pacmap", or "TopologicalAE".
n_components (int, default=2) – Target dimensionality when method is a string.
params (dict, optional) – Additional reducer keyword arguments merged into the constructor arguments when method is a string.
**kwargs (dict) – Runtime reducer keyword overrides. These are merged after params.
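
The merge order described for params and **kwargs can be pictured with plain dictionaries. This is a minimal sketch, assuming a simple "later wins" update; merge_reducer_kwargs is a hypothetical helper, not part of the package.

```python
def merge_reducer_kwargs(params=None, **kwargs):
    """Hypothetical helper mirroring the documented merge order."""
    merged = dict(params or {})  # start from the optional params dictionary
    merged.update(kwargs)        # runtime overrides are merged after params
    return merged


resolved = merge_reducer_kwargs(
    params={"n_neighbors": 10, "metric": "cosine"},
    n_neighbors=15,
)
```

Under this assumption, a key supplied both in params and as a keyword argument takes the keyword value, so resolved holds n_neighbors=15 and metric="cosine".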

method¶

Canonical reducer name.

Type:: str

n_components¶

Target dimensionality used for the reducer instance.

Type:: int

reducer¶

Instantiated reducer backend.

Type:: BaseReducer

metrics_¶

Cached scalar evaluation summaries from the latest score() call.

Type:: dict

quality_metadata_¶

Cached scalar reducer metadata exposed through the reducer contract.

Type:: dict

diagnostics_¶

Cached non-scalar diagnostic artifacts exposed through the reducer contract or the evaluation layer.

Type:: dict

metric_records_¶

Cached tidy metric observations produced by the evaluator.

Type:: list of dict

interpretation_¶

Cached feature interpretation payloads from the latest interpret() call.

Type:: dict

interpretation_records_¶

Cached tidy feature-interpretation observations.

Type:: list of dict

See also

coco_pipe.dim_reduction.analysis.interpret_features: Pure interpretation backend used by interpret().
coco_pipe.dim_reduction.evaluation.core.evaluate_embedding: Pure evaluator used by score().
coco_pipe.dim_reduction.evaluation.core.MethodSelector: Post-hoc comparison and ranking over already-scored reducers.
coco_pipe.viz.dim_reduction: Plotting utilities for embeddings, metrics, and diagnostics.

Examples

>>> reducer = DimReduction("UMAP", n_components=2, n_neighbors=15)
>>> embedding = reducer.fit_transform(X)
>>> scores = reducer.score(embedding, X=X)
>>> "trustworthiness" in scores["metrics"]
True
>>> interpretation = reducer.interpret(
...     X,
...     X_emb=embedding,
...     analyses=["correlation"],
...     feature_names=feature_names,
... )
>>> "correlation" in interpretation["analysis"]
True

reducer_kwargs¶

reducer: coco_pipe.dim_reduction.reducers.base.BaseReducer¶

metrics_: Dict[str, Any]¶

quality_metadata_: Dict[str, Any]¶

diagnostics_: Dict[str, Any]¶

metric_records_: List[Dict[str, Any]] = []¶

interpretation_: Dict[str, Any]¶

interpretation_records_: List[Dict[str, Any]] = []¶

property random_state: int | None¶: Return the random seed from parameters if any.

property capabilities: Dict[str, Any]¶: Return reducer capability metadata through the manager interface.

_reset_cached_outputs() → None[source]¶: Clear cached evaluation outputs.

_validate_input(X: Any) → numpy.ndarray[source]¶

Validate reducer input shape and coerce to a NumPy array.

Parameters:: X (array-like or MNE object) – Input data accepted by the reducer. Objects exposing get_data() are unwrapped before validation.
Returns:: X – Validated reducer input.
Return type:: np.ndarray
Raises:: ValueError – If the input dimensionality does not match the reducer contract.
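
The unwrap-then-validate behavior described above might look roughly like the following. This is a sketch under stated assumptions: validate_input_sketch, its expected_ndim argument, and FakeContainer are hypothetical names, and the real method takes the expected dimensionality from the reducer contract.

```python
import numpy as np


def validate_input_sketch(X, expected_ndim=2):
    """Hypothetical helper: unwrap get_data() containers, then check ndim."""
    if hasattr(X, "get_data"):
        X = X.get_data()  # e.g. MNE-style objects expose their array this way
    X = np.asarray(X)
    if X.ndim != expected_ndim:
        raise ValueError(f"Expected {expected_ndim}D input, got {X.ndim}D.")
    return X


class FakeContainer:
    """Hypothetical stand-in for an object exposing get_data()."""

    def get_data(self):
        return [[0.0, 1.0], [2.0, 3.0]]


validated = validate_input_sketch(FakeContainer())
```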

fit(X: Any, y: Any | None = None) → DimReduction[source]¶

Fit the reducer on the provided data.

Parameters:

X (array-like or MNE object) – Input data in the reducer’s native layout.
y (array-like, optional) – Optional supervision forwarded to the reducer.

Returns:

self – The fitted reducer.

Return type:

DimReduction

transform(X: Any) → numpy.ndarray[source]¶

Transform new data with a fitted reducer.

Parameters:: X (array-like or MNE object) – Input data in the reducer’s native layout.
Returns:: X_emb – Reduced representation returned by the reducer.
Return type:: np.ndarray

fit_transform(X: Any, y: Any | None = None) → numpy.ndarray[source]¶

Fit the reducer and return the reduced representation.

Parameters:

X (array-like or MNE object) – Input data in the reducer’s native layout.
y (array-like, optional) – Optional supervision forwarded to the reducer.

Returns:

X_emb – Reduced representation returned by the reducer.

Return type:

np.ndarray

get_components() → numpy.ndarray[source]¶

Return reducer-defined component-like outputs.

Returns:: components – Component-like array exposed by the reducer.
Return type:: np.ndarray
Raises:: ValueError – If the reducer does not expose public components.

score(X_emb: numpy.ndarray, X: Any = None, n_neighbors: int = 5, metrics: List[str] | None = None, k_values: List[int] | None = None, labels: numpy.ndarray | None = None, groups: numpy.ndarray | None = None, times: numpy.ndarray | None = None, separation_method: str = 'centroid') → Dict[str, Dict[str, Any]][source]¶

Evaluate an explicit embedding against the original data.

Parameters:

X_emb (array-like) – Embedded data to evaluate.
X (array-like, optional) – Original high-dimensional data in evaluation-ready layout. This is required for standard 2D metrics and optional for native 3D trajectory metrics.
n_neighbors (int, default=5) – Number of nearest neighbors used for neighborhood-based metrics.
metrics (list of str, optional) – Metric selectors to compute. None evaluates all metric families available for the embedding shape.
k_values (list of int, optional) – Neighborhood sizes used for multi-scale standard metric evaluation.
labels (np.ndarray, optional) – Optional labels aligned with the embedding. Used for trajectory separation when X_emb is 3D and for explicit supervised 2D metrics when requested.
groups (np.ndarray, optional) – Optional grouping variable aligned with the embedding. Required by grouped supervised evaluation metrics such as separation_logreg_balanced_accuracy.
times (np.ndarray, optional) – Optional trajectory time coordinates aligned with the trajectory length axis.
separation_method (str, default="centroid") – Separation definition passed to trajectory evaluation when labels are available for native 3D trajectory embeddings.

Returns:

scores – Dictionary with keys "metrics", "metadata", and "diagnostics".

Return type:

dict

Notes

score() does not infer or cache embeddings. Callers must pass X_emb explicitly. X is only required when the requested evaluation path needs the original high-dimensional samples.

interpret(X: numpy.ndarray, *, X_emb: numpy.ndarray, analyses: List[str] | None = None, feature_names: List[str] | None = None, n_repeats: int = 5, random_state: int | None = None) → Dict[str, Any][source]¶

Run feature interpretation analyses for an explicit embedding.

Parameters:

X (np.ndarray) – Original input data.
X_emb (np.ndarray) – Explicit embedding aligned with X.
analyses (list of {"correlation", "perturbation", "gradient"}, optional) – Interpretation analyses to compute. None defaults to ["correlation"].
feature_names (list of str, optional) – Feature names aligned with the columns of X when the requested interpretation returns feature-keyed outputs.
n_repeats (int, default=5) – Number of shuffles per feature for perturbation importance.
random_state (int, optional) – Random seed for perturbation importance.

Returns:

Dictionary with keys "analysis" and "records".

Return type:

dict

Notes

interpret() does not fit the reducer or compute embeddings. Callers must pass both X and X_emb explicitly.

See also

coco_pipe.dim_reduction.analysis.interpret_features: Pure interpretation backend used by this manager method.
score: Evaluate structure-preservation metrics for an explicit embedding.

Examples

>>> reducer = DimReduction("PCA", n_components=2)
>>> embedding = reducer.fit_transform(X)
>>> result = reducer.interpret(
...     X,
...     X_emb=embedding,
...     analyses=["correlation"],
...     feature_names=feature_names,
... )
>>> sorted(result)
['analysis', 'records']

get_diagnostics() → Dict[str, Any][source]¶

Return cached diagnostics merged with reducer diagnostics.

Returns:: diagnostics – Diagnostic artifacts declared by the reducer contract and the evaluation layer.
Return type:: dict

get_quality_metadata() → Dict[str, Any][source]¶

Return cached scalar metadata merged with reducer metadata.

Returns:: metadata – Scalar metadata declared by the reducer contract and the evaluation layer.
Return type:: dict

get_metrics() → Dict[str, Any][source]¶: Return cached scalar metrics from the latest score() call.

get_summary() → Dict[str, Any][source]¶

Return a normalized summary payload for report and export paths.

Returns:: Plain dictionary containing the method identity, cached scalar summaries, reducer metadata, diagnostics, tidy metric records, capability flags, and cached feature interpretation payloads.
Return type:: dict

Notes

The summary does not include an embedding payload. Embeddings are handled explicitly outside the manager and must be passed directly to plotting or reporting utilities that need them.

save(path: str | pathlib.Path)[source]¶

Save the underlying reducer to disk.

Parameters:: path (str or Path) – Output path for reducer persistence.

Notes

Only the reducer model is persisted. Cached manager state such as metrics and diagnostics is not included.

classmethod load(path: str | pathlib.Path, method: str) → DimReduction[source]¶

Load a persisted reducer and wrap it in a fresh manager.

Parameters:

path (str or Path) – Path to a serialized reducer saved with save().
method (str) – Canonical public reducer name used to reconstruct the manager.

Returns:

Fresh manager wrapping the loaded reducer model.

Return type:

DimReduction

Notes

This restores the reducer model only. Cached manager state such as scores, diagnostics, and metric records is not persisted.

coco_pipe.dim_reduction.trajectory_curvature(traj: numpy.ndarray) → numpy.ndarray[source]¶

Calculate geometric curvature of a trajectory.

Parameters:: traj (np.ndarray of shape (..., n_times, n_dims)) – Trajectory array. The second-to-last axis is interpreted as time and the last axis as coordinates.
Returns:: Curvature timecourse aligned with the input time axis.
Return type:: np.ndarray of shape (..., n_times)
Raises:: ValueError – If traj has fewer than two dimensions or fewer than two time points.

Notes

For vector-valued trajectories, curvature is computed from first and second derivatives using the generalized formula

sqrt(||v||^2 ||a||^2 - (v . a)^2) / ||v||^3.

The implementation assumes uniformly spaced samples.
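
The generalized formula can be checked numerically with finite differences. The sketch below is an illustration only: curvature_sketch is a hypothetical helper that uses np.gradient for both derivatives, which is an assumption about the discretization and not necessarily how trajectory_curvature computes them.

```python
import numpy as np


def curvature_sketch(traj):
    """Hypothetical helper: sqrt(|v|^2 |a|^2 - (v.a)^2) / |v|^3 along axis -2."""
    traj = np.asarray(traj, dtype=float)
    v = np.gradient(traj, axis=-2)  # first derivative, unit sample spacing
    a = np.gradient(v, axis=-2)     # second derivative
    v2 = np.sum(v * v, axis=-1)
    a2 = np.sum(a * a, axis=-1)
    va = np.sum(v * a, axis=-1)
    num = np.sqrt(np.clip(v2 * a2 - va**2, 0.0, None))
    den = np.power(v2, 1.5)
    # Stationary samples (zero velocity) get curvature 0 instead of NaN.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


t = np.linspace(0, 2 * np.pi, 200)
circle = 2.0 * np.stack([np.cos(t), np.sin(t)], axis=1)  # radius 2
k = curvature_sketch(circle)
```

Away from the two samples at each end, where one-sided differences are used, the result is 1 / radius = 0.5. Curvature does not depend on the sampling step as long as the step is uniform, which is why no dt argument is needed.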

See also

trajectory_turning_angle: Discrete local directional change.
trajectory_tortuosity: Path inefficiency relative to net displacement.
trajectory_speed: First-order trajectory dynamics.

Examples

>>> import numpy as np
>>> t = np.linspace(0, 2 * np.pi, 100)
>>> traj = np.stack([np.cos(t), np.sin(t)], axis=1)
>>> k = trajectory_curvature(traj)
>>> k.shape
(100,)

coco_pipe.dim_reduction.trajectory_separation(traj: numpy.ndarray, labels: numpy.ndarray, method: str = 'centroid', **kwargs) → Dict[Tuple[str, str], numpy.ndarray][source]¶

Calculate time-resolved separation between labeled trajectory groups.

Parameters:

traj (np.ndarray of shape (n_trials, n_times, n_dims)) – Trajectory tensor containing one trajectory per trial.
labels (np.ndarray of shape (n_trials,)) – Class label for each trial.
method ({"centroid", "within_between_ratio", "mahalanobis", "distributional", "margin"}, default="centroid") – Separation definition to compute.
**kwargs (dict) – Additional keyword arguments forwarded to the selected separation method.

Returns:

Mapping from label pairs to separation timecourses of shape (n_times,).

Return type:

dict[tuple[str, str], np.ndarray]

Raises:

ValueError – If the inputs are invalid or if an unsupported separation method is requested.

Notes

This is the high-level separation entrypoint for trajectory-group comparison. It dispatches to the more specific separation primitives in this module.

Supported methods:

"centroid": Euclidean distance between label centroids.
"within_between_ratio": Between-centroid distance normalized by within-group dispersion.
"mahalanobis": Covariance-aware centroid separation.
"distributional": Energy-distance separation between trial clouds.
"margin": Nearest-cross minus nearest-within margin separation.

See also

trajectory_dispersion: Within-group spread used by some separation methods.
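
For the "centroid" method, the computation can be sketched in a few lines of NumPy. The helper centroid_separation_sketch is hypothetical, and details such as label ordering inside each key are assumptions; it only illustrates the Euclidean distance between per-label centroids at each time point.

```python
import itertools

import numpy as np


def centroid_separation_sketch(traj, labels):
    """Hypothetical helper: per-time distance between label centroids."""
    traj = np.asarray(traj, dtype=float)  # (n_trials, n_times, n_dims)
    labels = np.asarray(labels)
    centroids = {
        str(lab): traj[labels == lab].mean(axis=0)  # (n_times, n_dims)
        for lab in np.unique(labels)
    }
    return {
        (a, b): np.linalg.norm(centroids[a] - centroids[b], axis=-1)
        for a, b in itertools.combinations(sorted(centroids), 2)
    }


traj_demo = np.zeros((4, 5, 2))
traj_demo[2:] += np.array([3.0, 4.0])  # shift the "B" trials by (3, 4)
labels_demo = np.array(["A", "A", "B", "B"])
sep_demo = centroid_separation_sketch(traj_demo, labels_demo)
```

Here the two group centroids differ by the vector (3, 4) at every time point, so the separation timecourse for ("A", "B") is constant at 5.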

Examples

>>> import numpy as np
>>> traj = np.zeros((4, 5, 2))
>>> labels = np.array(["A", "A", "B", "B"])
>>> sep = trajectory_separation(traj, labels, method="centroid")
>>> list(sep.keys())
[('A', 'B')]

coco_pipe.dim_reduction.trajectory_speed(traj: numpy.ndarray, dt: float = 1.0) → numpy.ndarray[source]¶

Calculate instantaneous trajectory speed.

Parameters:

traj (np.ndarray of shape (..., n_times, n_dims)) – Trajectory array. The second-to-last axis is interpreted as time and the last axis as coordinates.
dt (float, default=1.0) – Uniform time step between consecutive samples.

Returns:

Instantaneous speed timecourse. The final value is padded with the last computed speed so that the output length matches the number of time points.

Return type:

np.ndarray of shape (..., n_times)

Raises:

ValueError – If traj has fewer than two dimensions, contains fewer than two time points, or if dt <= 0.

Notes

This function computes the norm of the first difference along the time axis, divided by dt.
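
The first-difference computation and the final-value padding described above can be sketched as follows. speed_sketch is a hypothetical helper written from this description, not the package source.

```python
import numpy as np


def speed_sketch(traj, dt=1.0):
    """Hypothetical helper: norm of the first difference over time, divided by dt."""
    traj = np.asarray(traj, dtype=float)
    step = np.diff(traj, axis=-2)               # (..., n_times - 1, n_dims)
    speed = np.linalg.norm(step, axis=-1) / dt  # (..., n_times - 1)
    # Repeat the last value so the output has n_times entries.
    return np.concatenate([speed, speed[..., -1:]], axis=-1)


traj_demo = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
speeds = speed_sketch(traj_demo, dt=0.5)
```

Each step has length 5 and dt is 0.5, so every entry of speeds equals 10.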

See also

trajectory_acceleration: Second-order trajectory dynamics.
trajectory_path_length: Total or cumulative traveled distance.
trajectory_displacement: Distance from the initial state across time.

Examples

>>> import numpy as np
>>> traj = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
>>> trajectory_speed(traj)
array([1., 1., 1.])

coco_pipe.dim_reduction.continuity(Q: numpy.ndarray, k: int) → float[source]¶

Compute continuity from a co-ranking matrix.

Continuity penalizes extrusions, i.e. points that are among the k nearest neighbors in the original space but are pushed farther away in the embedding.

Parameters:

Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.
k (int) – Neighborhood size. The normalization used by continuity requires 2 * n_samples - 3 * k - 1 > 0.

Returns:

Continuity score in [0, 1]. Higher is better.

Return type:

float

Raises:

ValueError – If Q is invalid or if k falls outside the valid domain.
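
The extrusion penalty can be written directly on the co-ranking matrix. This sketch assumes the common convention that rows index the rank in the original space and columns the rank in the embedding, together with the standard normalization n * k * (2n - 3k - 1); continuity_sketch is a hypothetical helper and the package may use a different orientation.

```python
import numpy as np


def continuity_sketch(Q, k):
    """Hypothetical helper: 1 - 2 / (n k (2n - 3k - 1)) * weighted extrusion count."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0] + 1
    norm = n * k * (2 * n - 3 * k - 1)
    if k < 1 or norm <= 0:
        raise ValueError("k falls outside the valid domain")
    ranks = np.arange(1, n)
    weights = np.clip(ranks - k, 0, None)  # (embedding rank - k) for ranks beyond k
    # Rows 0..k-1: neighbors within the original k-neighborhood.
    penalty = (Q[:k, :] * weights[None, :]).sum()
    return 1.0 - 2.0 * penalty / norm


Q_perfect = 5 * np.eye(4)  # 5 samples, every rank preserved
Q_extruded = Q_perfect.copy()
Q_extruded[0, 0] -= 2
Q_extruded[0, 3] += 2      # two nearest neighbors pushed out to rank 4
Q_extruded[3, 3] -= 2
Q_extruded[3, 0] += 2      # keep the row and column totals unchanged
c_perfect = continuity_sketch(Q_perfect, k=1)
c_extruded = continuity_sketch(Q_extruded, k=1)
```

With n = 5 and k = 1 the normalizer is 30 and the penalty is 2 * (4 - 1) = 6, so c_extruded is 1 - 12/30 = 0.6, while the rank-preserving matrix scores 1.0.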

See also

trustworthiness: Complementary intrusion-based metric.
compute_coranking_matrix: Construct the required co-ranking matrix.

Examples

>>> import numpy as np
>>> Q = np.diag([1, 1, 1, 1])
>>> continuity(Q, k=1)
1.0

coco_pipe.dim_reduction.lcmc(Q: numpy.ndarray, k: int) → float[source]¶

Compute the local continuity meta-criterion (LCMC).

Parameters:

Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.
k (int) – Neighborhood size.

Returns:

LCMC score. Higher is better.

Return type:

float

Raises:

ValueError – If Q is invalid or if k falls outside the valid domain.

See also

trustworthiness: Neighbor-preservation metric.
continuity: Neighbor-consistency metric.

Examples

>>> import numpy as np
>>> Q = np.diag([1, 1, 1, 1])
>>> isinstance(lcmc(Q, k=1), float)
True

coco_pipe.dim_reduction.shepard_diagram_data(X: numpy.ndarray, X_embedded: numpy.ndarray, sample_size: int = 1000, random_state: int | None = None) → Tuple[numpy.ndarray, numpy.ndarray][source]¶

Compute sampled pairwise distances for a Shepard diagram.

Parameters:

X (np.ndarray of shape (n_samples, n_features)) – Original high-dimensional data.
X_embedded (np.ndarray of shape (n_samples, n_components)) – Low-dimensional embedding of the same samples.
sample_size (int, default=1000) – Number of samples to keep before computing pairwise distances. If sample_size is at least n_samples, all samples are used.
random_state (int, optional) – Random seed used when subsampling.

Returns:

Pairwise distances in the original and embedded spaces.

Return type:

tuple[np.ndarray, np.ndarray]

Raises:

ValueError – If the inputs are invalid or if sample_size <= 1.
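
A Shepard diagram pairs each original-space distance with the corresponding embedded distance. The sketch below is a minimal illustration, assuming Euclidean distances and one entry per unordered pair of retained samples; shepard_sketch is a hypothetical helper, and the package function may sample or order pairs differently.

```python
import numpy as np


def shepard_sketch(X, X_emb, sample_size=1000, random_state=None):
    """Hypothetical helper: subsample rows, then collect pairwise distances."""
    X = np.asarray(X, dtype=float)
    X_emb = np.asarray(X_emb, dtype=float)
    n = X.shape[0]
    if sample_size < n:
        rng = np.random.default_rng(random_state)
        idx = rng.choice(n, size=sample_size, replace=False)
        X, X_emb = X[idx], X_emb[idx]  # same rows in both spaces
    i, j = np.triu_indices(X.shape[0], k=1)  # each unordered pair once
    d_orig = np.linalg.norm(X[i] - X[j], axis=1)
    d_emb = np.linalg.norm(X_emb[i] - X_emb[j], axis=1)
    return d_orig, d_emb


X_demo = np.random.default_rng(0).random((10, 3))
d_orig, d_emb = shepard_sketch(X_demo, X_demo[:, :2], sample_size=5, random_state=0)
```

Five retained samples give 5 * 4 / 2 = 10 pairs. Because this toy embedding simply drops a coordinate, every embedded distance is at most the original one, so the points of the diagram lie on or below the diagonal.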

See also

compute_coranking_matrix: Rank-based global quality summary.

Examples

>>> import numpy as np
>>> X = np.random.RandomState(0).rand(10, 3)
>>> X_emb = X[:, :2]
>>> d_orig, d_emb = shepard_diagram_data(X, X_emb, sample_size=5, random_state=0)
>>> len(d_orig) == len(d_emb)
True

coco_pipe.dim_reduction.trustworthiness(Q: numpy.ndarray, k: int) → float[source]¶

Compute trustworthiness from a co-ranking matrix.

Trustworthiness penalizes intrusions, i.e. points that appear among the k nearest neighbors in the embedding but were farther away in the original space.

Parameters:

Q (np.ndarray of shape (n_samples - 1, n_samples - 1)) – Co-ranking matrix.
k (int) – Neighborhood size. The normalization used by trustworthiness requires 2 * n_samples - 3 * k - 1 > 0.

Returns:

Trustworthiness score in [0, 1]. Higher is better.

Return type:

float

Raises:

ValueError – If Q is invalid or if k falls outside the valid domain.
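
To make the intrusion penalty concrete, the sketch below builds a co-ranking matrix from raw data and evaluates trustworthiness on it. All three helpers are hypothetical. The sketch assumes Euclidean distances, no duplicate points, rows indexed by original-space rank and columns by embedding rank, and the standard normalization n * k * (2n - 3k - 1); compute_coranking_matrix may follow other conventions.

```python
import numpy as np


def pairwise_ranks(X):
    """Rank of every sample j in the distance ordering of sample i (self = 0)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    order = np.argsort(dist, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(X.shape[0])[:, None]
    ranks[rows, order] = np.arange(X.shape[0])[None, :]
    return ranks


def coranking_sketch(X, X_emb):
    """Count pairs by (original rank, embedding rank), both starting at 1."""
    n = X.shape[0]
    rho, r = pairwise_ranks(X), pairwise_ranks(X_emb)
    off_diag = ~np.eye(n, dtype=bool)
    Q = np.zeros((n - 1, n - 1), dtype=int)
    np.add.at(Q, (rho[off_diag] - 1, r[off_diag] - 1), 1)
    return Q


def trustworthiness_sketch(Q, k):
    """1 - 2 / (n k (2n - 3k - 1)) * weighted intrusion count."""
    n = Q.shape[0] + 1
    norm = n * k * (2 * n - 3 * k - 1)
    if k < 1 or norm <= 0:
        raise ValueError("k falls outside the valid domain")
    weights = np.clip(np.arange(1, n) - k, 0, None)  # (original rank - k)
    penalty = (weights[:, None] * Q[:, :k]).sum()    # embedding rank <= k
    return 1.0 - 2.0 * penalty / norm


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
Q_same = coranking_sketch(X_demo, X_demo.copy())
Q_shuffled = coranking_sketch(X_demo, rng.permutation(X_demo))
t_same = trustworthiness_sketch(Q_same, k=5)
t_shuffled = trustworthiness_sketch(Q_shuffled, k=5)
```

An embedding that preserves every rank puts all counts on the diagonal and scores 1.0, whereas pairing each sample with another sample's coordinates introduces intrusions and lowers the score.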

See also

continuity: Complementary extrusion-based metric.
compute_coranking_matrix: Construct the required co-ranking matrix.

Examples

>>> import numpy as np
>>> Q = np.diag([1, 1, 1, 1])
>>> trustworthiness(Q, k=1)
1.0

class coco_pipe.dim_reduction.BaseReducer(n_components: int = 2, **kwargs)[source]¶

Bases: abc.ABC

Abstract base class for all dimensionality reduction implementations.

This class defines the standard interface that all reducers must implement and is safe to subclass for custom reducers. It provides built-in support for model persistence (save/load) using joblib.

For custom reducers operating on nonstandard data layouts, override capabilities so the manager layer can route validation, scoring, plotting, and reporting correctly.

Parameters:

n_components (int, default=2) – Target dimensionality of the reduced representation.
**kwargs (dict) – Additional keyword arguments stored on params and typically forwarded to the wrapped estimator or backend implementation.

n_components¶

Target dimensionality of the reduced representation.

Type:: int

params¶

Additional reducer parameters captured at initialization time.

Type:: dict

model¶

Underlying fitted model object, such as a scikit-learn estimator or a scientific computing backend. This attribute should be populated by fit.

Type:: Any

Notes

The capabilities property returns a plain dictionary consumed by the manager and evaluation layers. Custom reducers should declare supported diagnostics and scalar metadata explicitly through this mapping. Common keys include:

input_ndim : expected dimensionality of the input container
input_layout : semantic layout name such as "standard"
has_transform : whether transform is supported
has_inverse_transform : whether inverse transforms are available
has_components : whether PCA-like components are exposed
supported_diagnostics : names returned by get_diagnostics
has_native_plot : whether the reducer exposes its own plotting path
is_linear : whether the reducer is linear
is_stochastic : whether repeated runs can vary without a fixed seed

Examples

>>> from sklearn.decomposition import PCA
>>> from coco_pipe.dim_reduction import BaseReducer
>>>
>>> class CustomPCAReducer(BaseReducer):
...     @property
...     def capabilities(self):
...         return self._merge_capabilities(
...             super().capabilities,
...             is_linear=True,
...             has_components=True,
...             supported_diagnostics=("explained_variance_ratio_",),
...         )
...
...     def fit(self, X, y=None):
...         self.model = PCA(n_components=self.n_components, **self.params)
...         self.model.fit(X)
...         return self
...
...     def transform(self, X):
...         return self.model.transform(X)

n_components = 2¶

params¶

model = None¶

context_: Dict[str, Any]¶

property name: str¶: Return a stable public display name for the reducer.

_filter_params(fn_or_class: Any, params: dict) → dict[source]¶

Filter parameters to match the signature of a function or class.

Parameters:

fn_or_class (Any) – The function or class to inspect.
params (dict) – The parameters to filter.

Returns:

filtered_params – Parameters present in the signature. If the target accepts **kwargs or its signature cannot be inspected, the original parameter dictionary is returned unchanged.

Return type:

dict

Notes

This is a convenience helper for reducer implementations that wrap third-party estimators with partially overlapping constructor signatures.
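
Signature-based filtering of this kind is commonly written with inspect.signature. The following is a sketch of the behavior described above, not the package source; filter_params_sketch, FakeEstimator, and accepts_anything are hypothetical names.

```python
import inspect


def filter_params_sketch(fn_or_class, params):
    """Hypothetical helper: keep only parameters the target signature accepts."""
    try:
        signature = inspect.signature(fn_or_class)
    except (TypeError, ValueError):
        return params  # signature cannot be inspected: pass everything through
    accepts_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD
        for p in signature.parameters.values()
    )
    if accepts_var_kwargs:
        return params  # the target takes **kwargs, so nothing is dropped
    return {k: v for k, v in params.items() if k in signature.parameters}


class FakeEstimator:
    """Hypothetical estimator with a narrow constructor."""

    def __init__(self, n_components=2, tol=1e-3):
        self.n_components = n_components
        self.tol = tol


def accepts_anything(**kwargs):
    return kwargs


filtered = filter_params_sketch(FakeEstimator, {"n_components": 3, "perplexity": 30})
passthrough = filter_params_sketch(accepts_anything, {"anything": 1})
```

The unknown perplexity key is dropped for FakeEstimator, while the **kwargs function receives its dictionary unchanged.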

_build_estimator(estimator_cls: Any, params: dict | None = None, component_param: str | None = 'n_components', **fixed_kwargs: Any) → Any[source]¶

Instantiate an estimator with filtered reducer parameters.

Parameters:

estimator_cls (Any) – Estimator class to instantiate.
params (dict, optional) – Explicit parameter dictionary to filter instead of self.params.
component_param (str or None, default="n_components") – Name of the constructor argument receiving self.n_components. Set to None to skip injecting the component count.
**fixed_kwargs (dict) – Keyword arguments always forwarded to the estimator constructor.

Returns:

Instantiated estimator.

Return type:

Any

Notes

This helper assumes the wrapped backend is constructor-driven and can be configured from keyword arguments.

_require_fitted(method_name: str = 'transform', model: Any = None) → Any[source]¶

Validate that a reducer backend has been fitted before access.

Parameters:

method_name (str, default="transform") – Operation requiring a fitted model.
model (Any, optional) – Backend model to check. Defaults to self.model.

Returns:

The validated model instance.

Return type:

Any

Raises:

RuntimeError – If no fitted model is available.

_merge_capabilities(base_caps: Dict[str, Any], **overrides: Any) → Dict[str, Any][source]¶

Return a capability mapping updated with reducer-specific overrides.

Parameters:

base_caps (dict) – Base capability mapping, typically super().capabilities.
**overrides (dict) – Reducer-specific capability values to apply.

Returns:

Capability mapping with overrides applied.

Return type:

dict

abstract fit(X: ArrayLike, y: ArrayLike | None = None) → BaseReducer[source]¶

Fit the model to the data.

Parameters:

X (ArrayLike) – Training data. Most reducers expect (n_samples, n_features), but reducers with custom capabilities["input_layout"] may accept other layouts such as snapshot matrices or grouped trajectory tensors.
y (ArrayLike, optional) – Optional supervision aligned with the sample axis used by the reducer’s declared input layout.

Returns:

self – The fitted reducer instance.

Return type:

BaseReducer

Notes

Most reducers expect X to have shape (n_samples, n_features). Some reducers operate on alternative layouts and should document those layouts through capabilities.

abstract transform(X: ArrayLike) → numpy.ndarray[source]¶

Apply dimensionality reduction to X.

Parameters:: X (ArrayLike) – New data to transform. Its layout should match the reducer’s declared capabilities.
Returns:: X_new – Reduced representation. The exact output shape depends on the reducer, but the last dimension usually matches n_components.
Return type:: np.ndarray
Raises:: RuntimeError – Raised by concrete implementations when transform is called before fitting or when the reducer does not support out-of-sample transforms.

fit_transform(X: ArrayLike, y: ArrayLike | None = None) → numpy.ndarray[source]¶

Fit the model to data and return the transformed data.

This method usually calls fit and then transform, but reducers may override it for efficiency if the underlying algorithm supports a native combined path.

Parameters:

X (ArrayLike) – Training data following the reducer’s declared layout.
y (ArrayLike, optional) – Optional supervision aligned with the reducer’s input layout.

Returns:

X_new – Reduced representation returned by transform.

Return type:

np.ndarray

save(filepath: str | os.PathLike) → None[source]¶

Persist the reducer to a file.

The default implementation serializes the reducer instance with joblib.dump. Custom reducers should either remain joblib-serializable or override this method and load() with a custom persistence strategy.

Parameters:: filepath (str or Path) – Path to the output file.

property capabilities: Dict[str, Any]¶

Return reducer capability flags consumed by the manager layer.

Custom reducers with nonstandard inputs should override at least input_ndim and input_layout. Reducers exposing diagnostics or scalar quality metadata should declare them explicitly through supported_diagnostics and supported_metadata.

Returns:: Mapping of reducer capability flags.
Return type:: dict

Notes

The default capabilities describe a typical estimator consuming (samples, features) input and exposing transform.

_attribute_dict(obj: Any, attrs: Iterable[str]) → Dict[str, Any][source]¶

Extract requested attributes from a target object into a dictionary.

This helper filters missing attributes and swallows common access errors (such as deferred scikit-learn properties) to return only what is currently available on the target.

Parameters:

obj (Any) – Target object to inspect.
attrs (iterable of str) – Attribute names to attempt to extract.

Returns:

Mapping of available attribute names to their values.

Return type:

dict
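
A tolerant attribute lookup like the one described can be sketched with getattr inside a try block. Which exceptions the real helper swallows is not stated here, so the pair caught below is an assumption; attribute_dict_sketch and FakeModel are hypothetical names.

```python
def attribute_dict_sketch(obj, attrs):
    """Hypothetical helper: collect only the attributes that can be read."""
    found = {}
    for name in attrs:
        try:
            found[name] = getattr(obj, name)
        except (AttributeError, NotImplementedError):
            continue  # missing or not-yet-available attributes are skipped
    return found


class FakeModel:
    """Hypothetical fitted model with one deferred property."""

    explained_variance_ratio_ = [0.7, 0.3]

    @property
    def deferred_(self):
        raise AttributeError("not available until fitted")


available = attribute_dict_sketch(
    FakeModel(), ["explained_variance_ratio_", "deferred_", "missing_"]
)
```

Only explained_variance_ratio_ survives: the deferred property raises on access and missing_ does not exist, so both are left out of the result.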

get_diagnostics() → Dict[str, Any][source]¶

Return diagnostic arrays or structured artifacts.

Diagnostics are intended for non-scalar outputs such as explained variance curves, eigenvalues, modes, graphs, or training histories. Only names declared in capabilities["supported_diagnostics"] are queried.

Returns:: diagnostics – Dictionary of diagnostic attributes declared in capabilities["supported_diagnostics"].
Return type:: dict
Raises:: RuntimeError – If the reducer has not been fitted.

get_quality_metadata() → Dict[str, Any][source]¶

Return scalar metadata about the reduction process or quality.

Typical examples include iteration counts, optimization stress, final loss values, or backend-specific convergence flags. Only names declared in capabilities["supported_metadata"] are queried.

Returns:: metadata – Dictionary containing only scalar values corresponding to keys declared in capabilities["supported_metadata"].
Return type:: dict
Raises:: RuntimeError – If the reducer has not been fitted.

get_components() → numpy.ndarray[source]¶

Return reducer-defined component-like outputs.

Returns:: Reducer-defined component array.
Return type:: np.ndarray
Raises:: ValueError – If the reducer does not expose public components.

classmethod load(filepath: str | os.PathLike) → BaseReducer[source]¶

Load a reducer from a file.

Parameters:: filepath (str or Path) – Path to the file to load.
Returns:: reducer – The loaded reducer instance.
Return type:: BaseReducer

Notes

This method assumes the reducer was serialized with save or a compatible joblib.dump call.

class coco_pipe.dim_reduction.IncrementalPCAReducer(n_components: int = 2, batch_size: int | None = None, **kwargs)[source]¶

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Incremental PCA reducer.

This reducer wraps sklearn.decomposition.IncrementalPCA for batch-wise fitting when the full dataset is too large to process in one pass.

Parameters:

n_components (int, default=2) – Number of principal components to keep.
batch_size (int, optional) – Number of samples processed per batch.
**kwargs (dict) – Additional keyword arguments forwarded to IncrementalPCA after signature filtering.

batch_size¶

Batch size used when fitting the incremental estimator.

Type:: int or None

model¶

Fitted IncrementalPCA estimator after fit or partial_fit.

Type:: sklearn.decomposition.IncrementalPCA or None

See also

PCAReducer: Standard in-memory linear PCA reducer.
DaskPCAReducer: Linear PCA variant for lazy or distributed arrays.
DaskTruncatedSVDReducer: Linear factorization alternative for lazy arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(100, 12)
>>> reducer = IncrementalPCAReducer(n_components=3, batch_size=25)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:10]).shape
(10, 3)
>>> stream = IncrementalPCAReducer(n_components=2, batch_size=20)
>>> _ = stream.partial_fit(X[:50])
>>> _ = stream.partial_fit(X[50:])
>>> stream.transform(X).shape
(100, 2)
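
The idea behind batch-wise fitting can be illustrated with plain NumPy. The sketch below is not the algorithm used by sklearn.decomposition.IncrementalPCA (which updates an incremental SVD); it swaps in a simpler technique, accumulating running sums over batches and diagonalizing the resulting covariance at the end. The function name batchwise_pca is hypothetical and not part of coco_pipe.

```python
import numpy as np

def batchwise_pca(batches, n_components):
    """PCA from running sums, visiting each batch exactly once (illustrative only)."""
    n_seen = 0
    total = None   # running sum of samples, shape (n_features,)
    outer = None   # running sum of x x^T, shape (n_features, n_features)
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        if total is None:
            total = np.zeros(batch.shape[1])
            outer = np.zeros((batch.shape[1], batch.shape[1]))
        n_seen += batch.shape[0]
        total += batch.sum(axis=0)
        outer += batch.T @ batch
    mean = total / n_seen
    # Unbiased covariance recovered from the accumulated sums.
    cov = (outer - n_seen * np.outer(mean, mean)) / (n_seen - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return mean, eigvecs[:, order].T, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))
# Feed the data in four batches of 25 samples; the full array is never needed at once.
mean, components, variances = batchwise_pca(
    (X[i:i + 25] for i in range(0, 100, 25)), n_components=3
)
projected = (X - mean) @ components.T
```

Only the per-feature sums and the feature-by-feature outer-product sum are kept between batches, so memory use depends on n_features rather than n_samples.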

property capabilities: dict¶

Return capability metadata for Incremental PCA.

Returns:: Capability mapping describing Incremental PCA as a linear component-based reducer.
Return type:: dict

batch_size = None¶

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → IncrementalPCAReducer[source]¶

Fit Incremental PCA in batch mode.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

IncrementalPCAReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(30, 6)
>>> reducer = IncrementalPCAReducer(n_components=2, batch_size=10)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

partial_fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → IncrementalPCAReducer[source]¶

Incrementally fit the estimator on a batch of samples.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Batch of training samples.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Reducer instance after updating the incremental estimator.

Return type:

IncrementalPCAReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(40, 6)
>>> reducer = IncrementalPCAReducer(n_components=2, batch_size=20)
>>> _ = reducer.partial_fit(X[:20])
>>> _ = reducer.partial_fit(X[20:])
>>> reducer.model is not None
True

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) → numpy.ndarray[source]¶

Project data onto the fitted incremental PCA basis.

Parameters:: X (ArrayLike of shape (n_samples, n_features)) – Data to project.
Returns:: Projected coordinates in component space.
Return type:: np.ndarray of shape (n_samples, n_components)
Raises:: RuntimeError – If the reducer has not been fitted.

get_components() → numpy.ndarray[source]¶

Return the incremental PCA component loading matrix.

Returns:: Principal component loading matrix.
Return type:: np.ndarray of shape (n_components, n_features)
Raises:: RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.IsomapReducer(n_components: int = 2, **kwargs)[source]¶

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Isometric Mapping reducer.

Isomap estimates geodesic distances on a nearest-neighbor graph and then computes a low-dimensional embedding consistent with those distances.

Parameters:

n_components (int, default=2) – Number of coordinates for the manifold.
**kwargs (dict) – Additional keyword arguments forwarded to sklearn.manifold.Isomap after signature filtering. Common options include n_neighbors, metric, p, and eigen_solver.

model¶

Fitted Isomap estimator after fit.

Type:: sklearn.manifold.Isomap or None

See also

LLEReducer: Nonlinear local-neighborhood manifold embedding.
MDSReducer: Distance-preserving manifold embedding.
SpectralEmbeddingReducer: Nonlinear graph Laplacian embedding.
PCAReducer: Linear baseline for global variance preservation.
UMAPReducer: Nonlinear graph-based embedding for local and global structure.
TSNEReducer: Nonlinear neighborhood-preserving visualization method.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IsomapReducer
>>> X = np.random.rand(100, 10)
>>> reducer = IsomapReducer(n_components=2, n_neighbors=5)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:8]).shape
(8, 2)
>>> reducer.n_features_in_
10
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(100, 2)
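
The three steps named in the class description (neighbor graph, geodesic distances, distance-consistent embedding) can be sketched in NumPy as follows. This is a minimal illustration under simplifying assumptions (dense distance matrix, Floyd-Warshall shortest paths, classical MDS), not the implementation used by sklearn.manifold.Isomap; the name isomap_sketch is hypothetical.

```python
import numpy as np

def isomap_sketch(X, n_neighbors, n_components):
    """Toy Isomap: kNN graph -> shortest paths -> classical MDS."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Keep only edges to each point's nearest neighbors (symmetrized).
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        graph[i, nearest[i]] = dist[i, nearest[i]]
        graph[nearest[i], i] = dist[i, nearest[i]]
    # Floyd-Warshall shortest paths approximate geodesic distances.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    if not np.isfinite(graph).all():
        raise ValueError("neighbor graph is disconnected; increase n_neighbors")
    # Classical MDS on the geodesic distances.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (graph ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    embedding = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
    return embedding, graph

# Points on a half circle: the endpoints are 2.0 apart in a straight line,
# but close to pi apart when measured along the arc.
t = np.linspace(0.0, np.pi, 40)
arc = np.column_stack([np.cos(t), np.sin(t)])
embedding, geodesic = isomap_sketch(arc, n_neighbors=2, n_components=1)
```

On this arc the graph distance between the two endpoints is close to the arc length rather than the straight-line distance, which is what lets the one-dimensional embedding unroll the curve.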

property capabilities: dict¶

Return capability metadata for Isomap.

Returns:: Capability mapping describing Isomap as a nonlinear reducer with out-of-sample transform support.
Return type:: dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → IsomapReducer[source]¶

Fit Isomap on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

IsomapReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IsomapReducer
>>> X = np.random.rand(30, 6)
>>> reducer = IsomapReducer(n_components=2, n_neighbors=4)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) → numpy.ndarray[source]¶

Project data into the fitted Isomap embedding space.

Parameters:: X (ArrayLike of shape (n_samples, n_features)) – Data to project.
Returns:: Low-dimensional embedding coordinates.
Return type:: np.ndarray of shape (n_samples, n_components)
Raises:: RuntimeError – If the reducer has not been fitted.

property reconstruction_error_: float | None¶

Return the Isomap reconstruction error.

Returns:: Reconstruction error returned by the fitted estimator.
Return type:: float or None
Raises:: RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.LLEReducer(n_components: int = 2, **kwargs)[source]¶

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Locally Linear Embedding reducer.

LLE learns a nonlinear embedding by reconstructing each point from its local neighborhood in the input space and preserving those reconstruction weights in the low-dimensional space.

Parameters:

n_components (int, default=2) – Number of coordinates for the manifold.
**kwargs (dict) – Additional keyword arguments forwarded to sklearn.manifold.LocallyLinearEmbedding after signature filtering. Common options include n_neighbors, method, eigen_solver, and random_state.

model¶

Fitted LLE estimator after fit.

Type:: sklearn.manifold.LocallyLinearEmbedding or None

See also

IsomapReducer: Nonlinear geodesic-distance embedding.
MDSReducer: Distance-preserving manifold embedding.
SpectralEmbeddingReducer: Nonlinear graph Laplacian embedding.
PCAReducer: Linear baseline for global variance preservation.
UMAPReducer: Nonlinear graph-based embedding for local and global structure.
TSNEReducer: Nonlinear neighborhood-preserving visualization method.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import LLEReducer
>>> X = np.random.rand(100, 10)
>>> reducer = LLEReducer(n_components=2, n_neighbors=10, eigen_solver="dense")
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:6]).shape
(6, 2)
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(100, 2)
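
The reconstruction weights mentioned in the class description can be computed for a single point with a small linear solve. The sketch below assumes the standard regularized formulation and uses a hypothetical helper name, lle_weights; it illustrates the idea only and is not the code path used by sklearn.manifold.LocallyLinearEmbedding.

```python
import numpy as np

def lle_weights(point, neighbors, reg=1e-3):
    """Weights that best reconstruct `point` from `neighbors`, summing to 1."""
    point = np.asarray(point, dtype=float)
    neighbors = np.asarray(neighbors, dtype=float)
    diffs = neighbors - point          # (k, n_features)
    gram = diffs @ diffs.T             # local Gram matrix, (k, k)
    # Regularize so the system stays solvable when k > n_features.
    gram += reg * np.trace(gram) * np.eye(len(neighbors))
    weights = np.linalg.solve(gram, np.ones(len(neighbors)))
    return weights / weights.sum()

neighbors = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
point = np.array([0.5, 0.5])
weights = lle_weights(point, neighbors)
reconstruction = weights @ neighbors   # close to `point`
```

LLE then looks for low-dimensional coordinates in which each point is still well reconstructed from the same neighbors with the same weights.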

property capabilities: dict¶

Return capability metadata for LLE.

Returns:: Capability mapping describing LLE as a nonlinear reducer with out-of-sample transform support.
Return type:: dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → LLEReducer[source]¶

Fit LLE on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

LLEReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import LLEReducer
>>> X = np.random.rand(30, 6)
>>> reducer = LLEReducer(n_components=2, n_neighbors=5, eigen_solver="dense")
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
>>> reducer = LLEReducer(n_components=2, method="modified", n_neighbors=5)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) → numpy.ndarray[source]¶

Project data into the fitted LLE embedding space.

Parameters:: X (ArrayLike of shape (n_samples, n_features)) – Data to project.
Returns:: Low-dimensional embedding coordinates.
Return type:: np.ndarray of shape (n_samples, n_components)
Raises:: RuntimeError – If the reducer has not been fitted.

property reconstruction_error_: float¶

Return the LLE reconstruction error.

Returns:: Reconstruction error associated with the embedding.
Return type:: float
Raises:: RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.MDSReducer(n_components: int = 2, **kwargs)[source]¶

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Multidimensional Scaling reducer.

MDS seeks a low-dimensional representation whose pairwise distances best match the pairwise distances in the original space.

Parameters:

n_components (int, default=2) – Number of coordinates for the manifold.
**kwargs (dict) – Additional keyword arguments forwarded to sklearn.manifold.MDS after signature filtering. Common options include metric, n_init, max_iter, dissimilarity, and random_state.

model¶

Fitted MDS estimator after fit or fit_transform.

Type:: sklearn.manifold.MDS or None

Notes

transform is not supported because scikit-learn MDS does not provide an out-of-sample projection API.

See also

IsomapReducer: Nonlinear geodesic-distance embedding.
LLEReducer: Nonlinear local-neighborhood embedding.
SpectralEmbeddingReducer: Nonlinear graph Laplacian embedding.
PCAReducer: Linear baseline for global variance preservation.
UMAPReducer: Nonlinear graph-based embedding for local and global structure.
TSNEReducer: Nonlinear neighborhood-preserving visualization method.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import MDSReducer
>>> X = np.random.rand(60, 8)
>>> reducer = MDSReducer(n_components=2, random_state=42)
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(60, 2)
>>> reducer.stress_ >= 0
True
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

property capabilities: dict¶

Return capability metadata for MDS.

Returns:: Capability mapping describing MDS as a nonlinear reducer without out-of-sample transform support.
Return type:: dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → MDSReducer[source]¶

Fit MDS on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

MDSReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import MDSReducer
>>> X = np.random.rand(25, 5)
>>> reducer = MDSReducer(n_components=2, random_state=0)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

abstract transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) → numpy.ndarray[source]¶

Raise because scikit-learn MDS does not support out-of-sample transform.

Parameters:: X (ArrayLike) – Ignored input included for API compatibility.
Raises:: NotImplementedError – Always raised because MDS does not support transforming new data.
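
Because transform always raises, code that needs coordinates for new samples has to handle the exception. One possible pattern, shown here with a hypothetical stand-in class rather than the real MDSReducer, is to fall back to refitting on the stacked data. Both NoTransformReducer and embed_with_refit are illustrative names, not part of coco_pipe.

```python
import numpy as np

class NoTransformReducer:
    """Hypothetical stand-in for a reducer without out-of-sample support."""

    def fit_transform(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Placeholder embedding: the first two centered features.
        return X[:, :2] - X[:, :2].mean(axis=0)

    def transform(self, X):
        raise NotImplementedError("out-of-sample transform is not supported")

def embed_with_refit(reducer, X_train, X_new):
    """Embed X_new, refitting on the stacked data if transform is unavailable."""
    try:
        return reducer.transform(X_new)
    except NotImplementedError:
        stacked = np.vstack([X_train, X_new])
        # Refit on everything, then keep only the rows that belong to X_new.
        return reducer.fit_transform(stacked)[len(X_train):]

rng = np.random.default_rng(0)
X_train, X_new = rng.random((20, 4)), rng.random((5, 4))
new_coords = embed_with_refit(NoTransformReducer(), X_train, X_new)
```

Refitting changes the coordinates of the training samples as well, so this fallback suits exploratory visualization better than pipelines that rely on a fixed embedding.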

fit_transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → numpy.ndarray[source]¶

Fit MDS and return the embedding coordinates.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Embedded coordinates produced by MDS.

Return type:

np.ndarray of shape (n_samples, n_components)

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import MDSReducer
>>> X = np.random.rand(20, 4)
>>> reducer = MDSReducer(n_components=2, random_state=0)
>>> reducer.fit_transform(X).shape
(20, 2)

property stress_: float¶

Return the MDS stress, a measure of the mismatch between the input dissimilarities and the pairwise distances in the embedding (lower is better).

Returns:: Stress value returned by the fitted MDS model.
Return type:: float
Raises:: RuntimeError – If the reducer has not been fitted.
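
As an illustration of what a stress value measures, the sketch below computes a raw stress between the original and embedded pairwise Euclidean distances. It assumes each pair is counted once and no normalization is applied, so the number may differ from stress_ depending on the options passed to the underlying estimator; raw_stress is a hypothetical helper.

```python
import numpy as np

def raw_stress(X, embedding):
    """Sum of squared differences between original and embedded distances."""
    def pairwise(A):
        A = np.asarray(A, dtype=float)
        return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

    upper = np.triu_indices(len(X), k=1)   # count each pair once
    gap = pairwise(X)[upper] - pairwise(embedding)[upper]
    return float(np.sum(gap ** 2))

rng = np.random.default_rng(0)
flat = np.column_stack([rng.random((15, 2)), np.zeros(15)])  # 3-D points in the plane z = 0
perfect = raw_stress(flat, flat[:, :2])          # dropping z preserves every distance
distorted = raw_stress(flat, 2.0 * flat[:, :2])  # doubling every distance does not
```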

class coco_pipe.dim_reduction.PCAReducer(n_components: int = 2, **kwargs)[source]¶

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Principal Component Analysis reducer.

This reducer wraps sklearn.decomposition.PCA and provides a linear low-dimensional embedding based on singular value decomposition.

Parameters:

n_components (int, default=2) – Number of principal components to keep.
**kwargs (dict) – Additional keyword arguments forwarded to sklearn.decomposition.PCA after signature filtering. Common options include whiten, svd_solver, and random_state.

model¶

Fitted PCA estimator after fit.

Type:: sklearn.decomposition.PCA or None

Notes

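PCA is a linear reducer and gives deterministic results unless a randomized solver is selected (for example svd_solver="randomized"), in which case random_state controls reproducibility.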

See also

IncrementalPCAReducer: Linear PCA variant for batch-wise fitting.
DaskPCAReducer: Linear PCA variant for lazy or distributed arrays.
DaskTruncatedSVDReducer: Linear factorization alternative for lazy arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.
PHATEReducer: Nonlinear diffusion-based embedding for smooth trajectories.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import PCAReducer
>>> X = np.random.rand(100, 10)
>>> reducer = PCAReducer(n_components=2, random_state=42)
>>> _ = reducer.fit(X)
>>> X_reduced = reducer.transform(X)
>>> X_reduced.shape
(100, 2)
>>> reducer.explained_variance_ratio_.shape
(2,)
>>> reducer.components_.shape
(2, 10)
>>> reducer = PCAReducer(n_components=3, whiten=True)
>>> reducer.fit_transform(X).shape
(100, 3)
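
For readers who want to see the SVD-based projection described above spelled out, here is a minimal NumPy sketch. It assumes plain centering with no whitening and is not the code path used by sklearn.decomposition.PCA; pca_svd is a hypothetical name.

```python
import numpy as np

def pca_svd(X, n_components):
    """Principal axes from the SVD of the centered data matrix."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components], singular_values[:n_components]

rng = np.random.default_rng(42)
X = rng.random((100, 10))
mean, components, singular_values = pca_svd(X, n_components=2)
X_reduced = (X - mean) @ components.T   # project onto the principal axes
```

The rows of components are orthonormal, and the length of each projected column equals the corresponding singular value.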

property capabilities: dict¶

Return capability metadata for PCA.

Returns:: Capability mapping describing PCA as a linear component-based reducer.
Return type:: dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → PCAReducer[source]¶

Fit PCA on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

PCAReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import PCAReducer
>>> X = np.random.rand(20, 5)
>>> reducer = PCAReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) → numpy.ndarray[source]¶

Project data onto the fitted principal component basis.

Parameters:: X (ArrayLike of shape (n_samples, n_features)) – Data to project.
Returns:: Projected coordinates in principal component space.
Return type:: np.ndarray of shape (n_samples, n_components)
Raises:: RuntimeError – If the reducer has not been fitted.

property explained_variance_ratio_: numpy.ndarray¶

Fraction of the total variance explained by each selected component (values between 0 and 1).

Returns:: Explained variance ratio for each retained component.
Return type:: np.ndarray of shape (n_components,)
Raises:: RuntimeError – If the reducer has not been fitted.
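
The ratio can be reproduced by hand from the eigenvalues of the sample covariance matrix. The sketch below uses synthetic data with deliberately unequal feature spreads; the data and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three features with very different spreads (standard deviations 5, 2 and 0.5).
X = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])

# Eigenvalues of the covariance matrix are the variances along the principal axes.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]   # descending order
ratio = eigvals / eigvals.sum()   # fraction of total variance per component
top_two = ratio[:2]               # what a two-component reducer would report
```

When all components are kept the ratios sum to 1; with fewer components the sum shows how much variance the reduced representation retains.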

property components_: numpy.ndarray¶

Principal axes in feature space.

Returns:: Principal component loading matrix.
Return type:: np.ndarray of shape (n_components, n_features)
Raises:: RuntimeError – If the reducer has not been fitted.

get_components() → numpy.ndarray[source]¶

Return the principal component loading matrix.

Returns:: Principal component loading matrix.
Return type:: np.ndarray of shape (n_components, n_features)
Raises:: RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.SpectralEmbeddingReducer(n_components: int = 2, **kwargs)[source]¶

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Spectral Embedding reducer.

Spectral Embedding computes a nonlinear embedding using eigenvectors of the graph Laplacian built from the data affinity graph.

Parameters:

n_components (int, default=2) – Number of coordinates for the manifold.
**kwargs (dict) – Additional keyword arguments forwarded to sklearn.manifold.SpectralEmbedding after signature filtering. Common options include affinity, gamma, random_state, eigen_solver, and n_neighbors.

model¶

Fitted spectral embedding estimator after fit or fit_transform.

Type:: sklearn.manifold.SpectralEmbedding or None

Notes

transform is not supported because scikit-learn SpectralEmbedding does not provide an out-of-sample projection API.

See also

IsomapReducer: Nonlinear geodesic-distance embedding.
LLEReducer: Nonlinear local-neighborhood embedding.
MDSReducer: Distance-preserving manifold embedding.
PCAReducer: Linear baseline for global variance preservation.
UMAPReducer: Nonlinear graph-based embedding for local and global structure.
TSNEReducer: Nonlinear neighborhood-preserving visualization method.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import SpectralEmbeddingReducer
>>> X = np.random.rand(80, 10)
>>> reducer = SpectralEmbeddingReducer(n_components=2, random_state=42)
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(80, 2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
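
The graph Laplacian construction mentioned in the class description can be sketched with NumPy. This version assumes a dense RBF affinity and the symmetric normalized Laplacian; it is a simplified illustration, not the implementation in sklearn.manifold.SpectralEmbedding, and laplacian_eigenmap is a hypothetical name.

```python
import numpy as np

def laplacian_eigenmap(X, n_components, gamma=1.0):
    """Embed points using eigenvectors of the normalized graph Laplacian."""
    X = np.asarray(X, dtype=float)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    affinity = np.exp(-gamma * sq_dist)   # RBF affinity graph
    np.fill_diagonal(affinity, 0.0)
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    laplacian = np.eye(len(X)) - inv_sqrt[:, None] * affinity * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # ascending eigenvalues
    # Skip the first (trivial) eigenvector and rescale by D^{-1/2}.
    return eigvecs[:, 1:n_components + 1] * inv_sqrt[:, None]

rng = np.random.default_rng(0)
left = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(20, 2))
right = rng.normal(loc=(3.0, 0.0), scale=0.1, size=(20, 2))
embedding = laplacian_eigenmap(np.vstack([left, right]), n_components=1)
```

With two well-separated clusters, the first non-trivial coordinate takes one sign on the first cluster and the opposite sign on the second.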

property capabilities: dict¶

Return capability metadata for Spectral Embedding.

Returns:: Capability mapping describing Spectral Embedding as a nonlinear reducer without out-of-sample transform support.
Return type:: dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → SpectralEmbeddingReducer[source]¶

Fit Spectral Embedding on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

SpectralEmbeddingReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import SpectralEmbeddingReducer
>>> X = np.random.rand(30, 6)
>>> reducer = SpectralEmbeddingReducer(n_components=2, random_state=0)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

abstract transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) → numpy.ndarray[source]¶

Raise because scikit-learn Spectral Embedding lacks out-of-sample transform.

Parameters:: X (ArrayLike) – Ignored input included for API compatibility.
Raises:: NotImplementedError – Always raised because Spectral Embedding does not support transforming new data.

fit_transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → numpy.ndarray[source]¶

Fit Spectral Embedding and return the embedding coordinates.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Embedded coordinates produced by Spectral Embedding.

Return type:

np.ndarray of shape (n_samples, n_components)

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import SpectralEmbeddingReducer
>>> X = np.random.rand(20, 4)
>>> reducer = SpectralEmbeddingReducer(n_components=2, random_state=0)
>>> reducer.fit_transform(X).shape
(20, 2)

class coco_pipe.dim_reduction.TSNEReducer(n_components: int = 2, **kwargs)[source]¶

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

t-SNE reducer.

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a neighborhood-preserving method designed primarily for visualization. It optimizes a low-dimensional embedding by matching pairwise similarities between the original space and the embedding.

Parameters:

n_components (int, default=2) – Number of embedding dimensions.
**kwargs (dict) – Additional keyword arguments forwarded to sklearn.manifold.TSNE after signature filtering. Common options include perplexity, learning_rate, max_iter, init, and random_state.

embedding_¶

Learned training-set embedding after fit or fit_transform.

Type:: np.ndarray or None

model¶

Fitted t-SNE estimator after fit or fit_transform.

Type:: sklearn.manifold.TSNE or None

Notes

transform is not supported because scikit-learn t-SNE does not provide an out-of-sample projection API.

See also

UMAPReducer: Nonlinear graph-based embedding with transform support.
PacmapReducer: Nonlinear embedding balancing local and global structure.
TrimapReducer: Nonlinear triplet-based embedding preserving global layout.
PHATEReducer: Diffusion-based embedding for continuous trajectories.
PCAReducer: Linear baseline for global variance preservation.
IsomapReducer: Nonlinear geodesic-distance manifold embedding.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import TSNEReducer
>>> X = np.random.rand(100, 10)
>>> reducer = TSNEReducer(n_components=2, perplexity=20, random_state=42)
>>> embedding = reducer.fit_transform(X)
>>> embedding.shape
(100, 2)
>>> reducer.get_quality_metadata()["kl_divergence_"] >= 0
True
>>> _ = reducer.fit(X)
>>> reducer.embedding_.shape
(100, 2)
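
The non-negativity check on kl_divergence_ above follows from how t-SNE scores an embedding: it compares two probability distributions over pairs of points with a Kullback-Leibler divergence. The sketch below computes the Student-t similarities of an embedding and the divergence between two such distributions. The target distribution P is a stand-in built from a random layout; a real t-SNE derives P from Gaussian kernels tuned to the requested perplexity. Both function names are hypothetical.

```python
import numpy as np

def student_t_similarities(embedding):
    """Joint pairwise similarities under a Student-t kernel (one degree of freedom)."""
    Y = np.asarray(embedding, dtype=float)
    sq_dist = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    kernel = 1.0 / (1.0 + sq_dist)
    np.fill_diagonal(kernel, 0.0)   # a point is not its own neighbor
    return kernel / kernel.sum()    # normalize to a probability distribution

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), summed over the entries where P is positive."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

rng = np.random.default_rng(0)
P = student_t_similarities(rng.normal(size=(30, 2)))   # stand-in target similarities
Q = student_t_similarities(rng.normal(size=(30, 2)))   # similarities of another layout
same = kl_divergence(P, P)        # identical distributions give zero divergence
different = kl_divergence(P, Q)   # any mismatch gives a positive value
```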

property capabilities: dict¶

Return capability metadata for t-SNE.

Returns:: Capability mapping describing t-SNE as a nonlinear stochastic reducer without out-of-sample transform support.
Return type:: dict

embedding_ = None¶

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → TSNEReducer[source]¶

Fit t-SNE on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

TSNEReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import TSNEReducer
>>> X = np.random.rand(30, 6)
>>> reducer = TSNEReducer(n_components=2, perplexity=5, max_iter=250)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

abstract transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) → numpy.ndarray[source]¶

Raise because t-SNE does not support out-of-sample transformation.

Parameters:: X (ArrayLike) – Ignored input included for API compatibility.
Raises:: NotImplementedError – Always raised because t-SNE does not support transforming new data.

fit_transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → numpy.ndarray[source]¶

Fit t-SNE and return the embedding coordinates.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Embedded coordinates produced by t-SNE.

Return type:

np.ndarray of shape (n_samples, n_components)