coco_pipe.dim_reduction.reducers.linear

Linear dimensionality reduction reducers.

This module provides linear projection wrappers built on top of scikit-learn and optional Dask backends. These reducers follow the shared BaseReducer contract so they can be used directly with DimReduction, reporting, and visualization utilities.

Classes

PCAReducer

Principal Component Analysis wrapper based on sklearn.decomposition.PCA.

IncrementalPCAReducer

Incremental PCA wrapper for batch-wise fitting on larger datasets.

DaskPCAReducer

Optional Dask-ML PCA wrapper for lazy or distributed arrays.

DaskTruncatedSVDReducer

Optional Dask-ML Truncated SVD wrapper for lazy or distributed arrays.

References

Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)

Module Contents

class coco_pipe.dim_reduction.reducers.linear.PCAReducer(n_components: int = 2, **kwargs)[source]

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Principal Component Analysis reducer.

This reducer wraps sklearn.decomposition.PCA and provides a linear low-dimensional embedding based on singular value decomposition.
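As a rough illustration of what an SVD-based PCA projection computes, the NumPy-only sketch below centers the data, takes a thin SVD, and projects onto the leading right singular vectors. The function name pca_via_svd is hypothetical and is not part of coco_pipe; the wrapped sklearn estimator layers solver selection, sign conventions, and optional whitening on top of this basic recipe.

```python
import numpy as np


def pca_via_svd(X, n_components=2):
    """Hypothetical helper: PCA embedding computed with a thin SVD."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                      # PCA operates on centered data
    # Rows of Vt are the principal axes, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]     # shape (n_components, n_features)
    embedding = Xc @ components.T      # shape (n_samples, n_components)
    return embedding, components, mean


rng = np.random.default_rng(0)
X = rng.random((100, 10))
embedding, components, mean = pca_via_svd(X, n_components=2)
```

The embedding columns come out ordered by decreasing variance, which is why the first component is usually the one plotted on the x-axis.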

Parameters:
  • n_components (int, default=2) – Number of principal components to keep.

  • **kwargs (dict) – Additional keyword arguments forwarded to sklearn.decomposition.PCA after signature filtering. Common options include whiten, svd_solver, and random_state.
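The documentation does not spell out how signature filtering is implemented. One plausible reading is that keyword arguments not named in the backend constructor are dropped before the estimator is built. The sketch below shows that idea with the standard inspect module; filter_kwargs and ToyEstimator are hypothetical names used only for illustration, and the real reducer may warn or raise instead of dropping keys silently.

```python
import inspect


class ToyEstimator:
    """Stand-in for a backend estimator such as sklearn's PCA."""

    def __init__(self, n_components=2, whiten=False, random_state=None):
        self.n_components = n_components
        self.whiten = whiten
        self.random_state = random_state


def filter_kwargs(estimator_cls, kwargs):
    """Hypothetical helper: keep only kwargs named in the constructor."""
    params = inspect.signature(estimator_cls.__init__).parameters
    return {k: v for k, v in kwargs.items() if k in params and k != "self"}


user_kwargs = {"whiten": True, "random_state": 42, "not_an_option": 1}
accepted = filter_kwargs(ToyEstimator, user_kwargs)
model = ToyEstimator(n_components=2, **accepted)
```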

model

Fitted PCA estimator after fit.

Type:

sklearn.decomposition.PCA or None

Notes

This is a deterministic linear reducer unless a randomized solver is used (for example svd_solver="randomized"); in that case, pass random_state to get reproducible results.
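To see why a randomized solver needs a seed, the sketch below implements a simplified random range finder in NumPy. It is not the algorithm sklearn uses (that solver is more elaborate), and randomized_axes is a hypothetical name. It only shows that the random test matrix is the sole source of run-to-run variation.

```python
import numpy as np


def randomized_axes(X, n_components, seed, oversample=5):
    """Hypothetical helper: approximate principal axes via a random sketch."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Random test matrix: the only source of nondeterminism.
    omega = rng.standard_normal((Xc.shape[1], n_components + oversample))
    Q, _ = np.linalg.qr(Xc @ omega)          # orthonormal basis for the range
    _, _, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    return Vt[:n_components]


X = np.random.default_rng(0).random((200, 30))
same_a = randomized_axes(X, 2, seed=42)
same_b = randomized_axes(X, 2, seed=42)   # same seed, same result
other = randomized_axes(X, 2, seed=7)     # different seed, different sketch
```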

See also

IncrementalPCAReducer

Linear PCA variant for batch-wise fitting.

DaskPCAReducer

Linear PCA variant for lazy or distributed arrays.

DaskTruncatedSVDReducer

Linear factorization alternative for lazy arrays.

IsomapReducer

Nonlinear manifold learner based on geodesic distances.

TSNEReducer

Nonlinear neighborhood-preserving embedding.

UMAPReducer

Nonlinear graph-based embedding balancing local and global structure.

PHATEReducer

Nonlinear diffusion-based embedding for smooth trajectories.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import PCAReducer
>>> X = np.random.rand(100, 10)
>>> reducer = PCAReducer(n_components=2, random_state=42)
>>> _ = reducer.fit(X)
>>> X_reduced = reducer.transform(X)
>>> X_reduced.shape
(100, 2)
>>> reducer.explained_variance_ratio_.shape
(2,)
>>> reducer.components_.shape
(2, 10)
>>> reducer = PCAReducer(n_components=3, whiten=True)
>>> reducer.fit_transform(X).shape
(100, 3)

property capabilities: dict

Return capability metadata for PCA.

Returns:

Capability mapping describing PCA as a linear component-based reducer.

Return type:

dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) PCAReducer[source]

Fit PCA on the input data.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

PCAReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import PCAReducer
>>> X = np.random.rand(20, 5)
>>> reducer = PCAReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) numpy.ndarray[source]

Project data onto the fitted principal component basis.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to project.

Returns:

Projected coordinates in principal component space.

Return type:

np.ndarray of shape (n_samples, n_components)

Raises:

RuntimeError – If the reducer has not been fitted.
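The fit-before-transform contract can be pictured with a small stand-in class. TinyReducer below is hypothetical and far simpler than PCAReducer; it only illustrates how a guard on model can produce the documented RuntimeError, and the error message text is an assumption.

```python
import numpy as np


class TinyReducer:
    """Hypothetical reducer illustrating the fit-before-transform guard."""

    def __init__(self, n_components=2):
        self.n_components = n_components
        self.model = None              # populated by fit()

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        self.model = {"mean": mean, "components": Vt[: self.n_components]}
        return self

    def transform(self, X):
        if self.model is None:
            raise RuntimeError("Reducer must be fitted before transform().")
        X = np.asarray(X, dtype=float)
        return (X - self.model["mean"]) @ self.model["components"].T


reducer = TinyReducer(n_components=2)
try:
    reducer.transform(np.zeros((3, 5)))
    raised = False
except RuntimeError:
    raised = True

X = np.random.default_rng(0).random((20, 5))
projected = reducer.fit(X).transform(X)
```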

property explained_variance_ratio_: numpy.ndarray

Fraction of the total variance explained by each selected component, expressed as values between 0 and 1 rather than as percentages.

Returns:

Explained variance ratio for each retained component.

Return type:

np.ndarray of shape (n_components,)

Raises:

RuntimeError – If the reducer has not been fitted.
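The ratio can be reproduced by hand from the singular values of the centered data, which makes the scale of the values explicit. This is a NumPy sketch of the standard definition, not code taken from coco_pipe.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 10))
Xc = X - X.mean(axis=0)

# Singular values of the centered data give per-component variances.
S = np.linalg.svd(Xc, compute_uv=False)
explained_variance = S**2 / (X.shape[0] - 1)
ratio_all = explained_variance / explained_variance.sum()

n_components = 2
ratio = ratio_all[:n_components]   # fractions in [0, 1], not 0-100 percentages
```

Summing ratio gives the share of total variance retained by the chosen components; it reaches 1.0 only when every component is kept.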

property components_: numpy.ndarray

Principal axes in feature space.

Returns:

Principal component loading matrix.

Return type:

np.ndarray of shape (n_components, n_features)

Raises:

RuntimeError – If the reducer has not been fitted.
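Two properties of the component matrix are worth keeping in mind: its rows are orthonormal, and it maps scores back to feature space. The NumPy sketch below checks both on axes computed directly with an SVD rather than through the reducer.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 6))
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:3]                      # (n_components, n_features)

# Rows are orthonormal: components @ components.T is the identity.
gram = components @ components.T

# Projecting and mapping back yields a rank-3 approximation of the
# centered data, shifted back by the mean.
scores = (X - mean) @ components.T
X_approx = scores @ components + mean
```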

get_components() numpy.ndarray[source]

Return the principal component loading matrix.

Returns:

Principal component loading matrix.

Return type:

np.ndarray of shape (n_components, n_features)

Raises:

RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.reducers.linear.IncrementalPCAReducer(n_components: int = 2, batch_size: int | None = None, **kwargs)[source]

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Incremental PCA reducer.

This reducer wraps sklearn.decomposition.IncrementalPCA for batch-wise fitting when the full dataset is too large to hold in memory at once.
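To make the idea of batch-wise fitting concrete, the sketch below recovers principal axes from a single pass over batches by accumulating sums and outer products. This is deliberately not the algorithm used by sklearn's IncrementalPCA: it is a simpler covariance-accumulation scheme that needs memory proportional to n_features squared. The name streaming_pca_axes is hypothetical.

```python
import numpy as np


def streaming_pca_axes(batches, n_components):
    """Hypothetical helper: principal axes from one pass over batches."""
    n = 0
    total = None          # running sum of rows
    outer = None          # running sum of x x^T
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        if total is None:
            total = np.zeros(batch.shape[1])
            outer = np.zeros((batch.shape[1], batch.shape[1]))
        n += batch.shape[0]
        total += batch.sum(axis=0)
        outer += batch.T @ batch
    mean = total / n
    cov = (outer - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)         # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order].T, mean


rng = np.random.default_rng(0)
X = rng.random((100, 12))
batches = (X[i:i + 25] for i in range(0, 100, 25))   # only one batch in use at a time
axes, mean = streaming_pca_axes(batches, n_components=3)
embedding = (X - mean) @ axes.T
```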

Parameters:
  • n_components (int, default=2) – Number of principal components to keep.

  • batch_size (int, optional) – Number of samples processed per batch.

  • **kwargs (dict) – Additional keyword arguments forwarded to IncrementalPCA after signature filtering.

batch_size

Batch size used when fitting the incremental estimator.

Type:

int or None

model

Fitted IncrementalPCA estimator after fit or partial_fit.

Type:

sklearn.decomposition.IncrementalPCA or None

See also

PCAReducer

Standard in-memory linear PCA reducer.

DaskPCAReducer

Linear PCA variant for lazy or distributed arrays.

DaskTruncatedSVDReducer

Linear factorization alternative for lazy arrays.

IsomapReducer

Nonlinear manifold learner based on geodesic distances.

TSNEReducer

Nonlinear neighborhood-preserving embedding.

UMAPReducer

Nonlinear graph-based embedding balancing local and global structure.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(100, 12)
>>> reducer = IncrementalPCAReducer(n_components=3, batch_size=25)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:10]).shape
(10, 3)
>>> stream = IncrementalPCAReducer(n_components=2, batch_size=20)
>>> _ = stream.partial_fit(X[:50])
>>> _ = stream.partial_fit(X[50:])
>>> stream.transform(X).shape
(100, 2)

property capabilities: dict

Return capability metadata for Incremental PCA.

Returns:

Capability mapping describing Incremental PCA as a linear component-based reducer.

Return type:

dict

batch_size = None

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) IncrementalPCAReducer[source]

Fit Incremental PCA in batch mode.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Training data.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

IncrementalPCAReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(30, 6)
>>> reducer = IncrementalPCAReducer(n_components=2, batch_size=10)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

partial_fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) IncrementalPCAReducer[source]

Incrementally fit the estimator on a batch of samples.

Parameters:
  • X (ArrayLike of shape (n_samples, n_features)) – Batch of training samples.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Reducer instance after updating the incremental estimator.

Return type:

IncrementalPCAReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(40, 6)
>>> reducer = IncrementalPCAReducer(n_components=2, batch_size=20)
>>> _ = reducer.partial_fit(X[:20])
>>> _ = reducer.partial_fit(X[20:])
>>> reducer.model is not None
True

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) numpy.ndarray[source]

Project data onto the fitted incremental PCA basis.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Data to project.

Returns:

Projected coordinates in component space.

Return type:

np.ndarray of shape (n_samples, n_components)

Raises:

RuntimeError – If the reducer has not been fitted.

get_components() numpy.ndarray[source]

Return the incremental PCA component loading matrix.

Returns:

Principal component loading matrix.

Return type:

np.ndarray of shape (n_components, n_features)

Raises:

RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.reducers.linear.DaskPCAReducer(n_components: int = 2, svd_solver: str = 'auto', **kwargs)[source]

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Dask-ML PCA reducer for lazy or distributed data.

This reducer wraps dask_ml.decomposition.PCA. The backend is imported lazily so the rest of the package remains importable without dask-ml.
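A lazy backend import usually means deferring the import statement until the estimator is actually needed. The sketch below shows one way to do that with importlib; load_backend and its error message are hypothetical, and the module names in the demonstration are stand-ins so the example runs without dask-ml installed.

```python
import importlib


def load_backend(module_name, install_hint):
    """Hypothetical helper: import an optional dependency on first use."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this reducer. {install_hint}"
        ) from exc


# A module that certainly exists, to show the success path.
json_module = load_backend("json", "It ships with Python.")

# A deliberately nonexistent module name, to show the failure path.
try:
    load_backend("no_such_backend_for_demo", "Install the optional extra.")
    message = None
except ImportError as exc:
    message = str(exc)
```

In a reducer, a call of this kind would presumably sit inside fit, which would be consistent with fit being the method documented to raise ImportError.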

Parameters:
  • n_components (int, default=2) – Number of principal components to keep.

  • svd_solver ({"auto", "full", "tsqr", "randomized"}, default="auto") – Solver used by the Dask PCA backend.

  • **kwargs (dict) – Additional keyword arguments forwarded to dask_ml.decomposition.PCA after signature filtering.

svd_solver

Solver used when instantiating the Dask PCA estimator.

Type:

str

model

Fitted Dask PCA estimator after fit.

Type:

dask_ml.decomposition.PCA or None

Notes

This reducer requires the optional dask-ml backend.

See also

PCAReducer

Standard in-memory linear PCA reducer.

IncrementalPCAReducer

Linear PCA variant for batch-wise fitting.

DaskTruncatedSVDReducer

Linear SVD-based alternative for lazy arrays.

IsomapReducer

Nonlinear manifold learner based on geodesic distances.

TSNEReducer

Nonlinear neighborhood-preserving embedding.

UMAPReducer

Nonlinear graph-based embedding balancing local and global structure.

Examples

>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskPCAReducer
>>> X = da.from_array(np.random.rand(100, 10), chunks=(25, 10))
>>> reducer = DaskPCAReducer(n_components=2, svd_solver="tsqr")
>>> _ = reducer.fit(X)
>>> reducer.transform(X).shape
(100, 2)

property capabilities: dict

Return capability metadata for Dask PCA.

Returns:

Capability mapping describing Dask PCA as a linear component-based reducer.

Return type:

dict

svd_solver = 'auto'

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) DaskPCAReducer[source]

Fit Dask PCA on the input data.

Parameters:
  • X (ArrayLike) – Training data, typically a Dask array or a compatible array-like object accepted by the Dask backend.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

DaskPCAReducer

Raises:
  • ImportError – If dask-ml is not installed.

  • RuntimeError – If dask-ml is installed but fails during initialization.

Examples

>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskPCAReducer
>>> X = da.from_array(np.random.rand(40, 8), chunks=(20, 8))
>>> reducer = DaskPCAReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) Any[source]

Project data using the fitted Dask PCA model.

Parameters:

X (ArrayLike) – Data to project.

Returns:

Backend-specific transformed output, typically a Dask array.

Return type:

Any

Raises:

RuntimeError – If the reducer has not been fitted.
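Because the output may be lazy, downstream code that expects a NumPy array often needs to materialize it first. The helper below is a hypothetical sketch that relies only on the fact that Dask collections expose a compute() method; FakeLazyArray is a stand-in so the example runs without Dask installed.

```python
import numpy as np


def to_numpy(result):
    """Hypothetical helper: materialize lazy output as a NumPy array."""
    # Dask collections expose .compute(); eager arrays pass through.
    if hasattr(result, "compute"):
        result = result.compute()
    return np.asarray(result)


class FakeLazyArray:
    """Stand-in for a lazy array so the sketch runs without Dask."""

    def __init__(self, data):
        self._data = data

    def compute(self):
        return self._data


eager = to_numpy(np.ones((4, 2)))
lazy = to_numpy(FakeLazyArray(np.zeros((4, 2))))
```

Materializing pulls the whole embedding into memory, so it is best done only after the reduction has shrunk the data to n_components columns.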

get_components() numpy.ndarray[source]

Return the Dask PCA component loading matrix.

Returns:

Principal component loading matrix or Dask-backed equivalent.

Return type:

np.ndarray

Raises:

RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.reducers.linear.DaskTruncatedSVDReducer(n_components: int = 2, algorithm: str = 'tsqr', **kwargs)[source]

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Dask-ML Truncated SVD reducer.

This reducer wraps dask_ml.decomposition.TruncatedSVD and provides a linear projection for lazy or distributed arrays.
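The practical difference from PCA is that truncated SVD factorizes the data as given, without subtracting column means. The NumPy sketch below, which does not use dask-ml, contrasts the two on data with a large offset; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((120, 15)) + 5.0          # data with a large nonzero mean

# Truncated SVD factorizes X as given, without subtracting column means.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
svd_axes = Vt[:3]
svd_embedding = X @ svd_axes.T           # shape (120, 3)

# PCA centers first, so its leading axis generally differs.
_, _, Vt_pca = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
pca_axes = Vt_pca[:3]

# With uncentered data, the first SVD axis mostly points along the mean.
mean_dir = X.mean(axis=0) / np.linalg.norm(X.mean(axis=0))
alignment = abs(svd_axes[0] @ mean_dir)
```

If the first component should capture variance rather than the overall offset, center the data before fitting or use a PCA reducer instead.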

Parameters:
  • n_components (int, default=2) – Number of components to keep.

  • algorithm ({"tsqr", "randomized"}, default="tsqr") – SVD algorithm used by the Dask backend.

  • **kwargs (dict) – Additional keyword arguments forwarded to dask_ml.decomposition.TruncatedSVD after signature filtering.

algorithm

SVD algorithm used when instantiating the backend estimator.

Type:

str

model

Fitted TruncatedSVD estimator after fit.

Type:

dask_ml.decomposition.TruncatedSVD or None

Notes

This reducer requires the optional dask-ml backend.

See also

PCAReducer

Standard in-memory linear PCA reducer.

IncrementalPCAReducer

Linear PCA variant for batch-wise fitting.

DaskPCAReducer

Linear PCA variant for lazy or distributed arrays.

IsomapReducer

Nonlinear manifold learner based on geodesic distances.

TSNEReducer

Nonlinear neighborhood-preserving embedding.

UMAPReducer

Nonlinear graph-based embedding balancing local and global structure.

Examples

>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskTruncatedSVDReducer
>>> X = da.from_array(np.random.rand(120, 15), chunks=(30, 15))
>>> reducer = DaskTruncatedSVDReducer(n_components=3, algorithm="randomized")
>>> _ = reducer.fit(X)
>>> reducer.transform(X).shape
(120, 3)

property capabilities: dict

Return capability metadata for Dask Truncated SVD.

Returns:

Capability mapping describing Dask Truncated SVD as a linear component-based reducer.

Return type:

dict

algorithm = 'tsqr'

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) DaskTruncatedSVDReducer[source]

Fit Dask Truncated SVD on the input data.

Parameters:
  • X (ArrayLike) – Training data, typically a Dask array or compatible array-like object accepted by the backend.

  • y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

DaskTruncatedSVDReducer

Raises:
  • ImportError – If dask-ml is not installed.

  • RuntimeError – If dask-ml is installed but fails during initialization.

Examples

>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskTruncatedSVDReducer
>>> X = da.from_array(np.random.rand(40, 8), chunks=(20, 8))
>>> reducer = DaskTruncatedSVDReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) Any[source]

Project data using the fitted Dask Truncated SVD model.

Parameters:

X (ArrayLike) – Data to project.

Returns:

Backend-specific transformed output, typically a Dask array.

Return type:

Any

Raises:

RuntimeError – If the reducer has not been fitted.

get_components() numpy.ndarray[source]

Return the Truncated SVD component loading matrix.

Returns:

Component loading matrix or Dask-backed equivalent.

Return type:

np.ndarray

Raises:

RuntimeError – If the reducer has not been fitted.