coco_pipe.dim_reduction.reducers.linear

Linear dimensionality reduction reducers.

This module provides linear projection wrappers built on top of scikit-learn and optional Dask backends. These reducers follow the shared BaseReducer contract so they can be used directly with DimReduction, reporting, and visualization utilities.

Classes

PCAReducer: Principal Component Analysis wrapper based on sklearn.decomposition.PCA.
IncrementalPCAReducer: Incremental PCA wrapper for batch-wise fitting on larger datasets.
DaskPCAReducer: Optional Dask-ML PCA wrapper for lazy or distributed arrays.
DaskTruncatedSVDReducer: Optional Dask-ML Truncated SVD wrapper for lazy or distributed arrays.

Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)

Module Contents

class coco_pipe.dim_reduction.reducers.linear.PCAReducer(n_components: int = 2, **kwargs)

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Principal Component Analysis reducer.

This reducer wraps sklearn.decomposition.PCA and provides a linear low-dimensional embedding based on singular value decomposition.
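The SVD-based embedding described above can be illustrated with plain NumPy. This is a minimal sketch of the underlying linear algebra, not the code path used by PCAReducer or scikit-learn; the variable names are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
n_components = 2

# Fit: center the data, then take the SVD of the centered matrix.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:n_components]          # shape (n_components, n_features)

# Transform: project the centered data onto the leading principal axes.
X_reduced = (X - mean) @ components.T   # shape (n_samples, n_components)
```

The rows of `components` are orthonormal, and the projection equals the leading left singular vectors scaled by their singular values.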

Parameters:

n_components (int, default=2) – Number of principal components to keep.
**kwargs (dict) – Additional keyword arguments forwarded to sklearn.decomposition.PCA after signature filtering. Common options include whiten, svd_solver, and random_state.
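One plausible way to implement "signature filtering" is to inspect the target constructor and keep only the keyword arguments it accepts. The sketch below uses the standard-library inspect module. The helper `filter_kwargs` and the stand-in `make_estimator` are hypothetical names, and whether coco_pipe silently drops or reports rejected keys is not stated in this page.

```python
import inspect


def filter_kwargs(func, kwargs):
    """Keep only the keyword arguments that `func` explicitly accepts.

    Hypothetical helper for illustration; not coco_pipe's actual code.
    """
    params = inspect.signature(func).parameters
    # If the callable itself takes **kwargs, everything can be forwarded.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


def make_estimator(n_components=2, whiten=False, random_state=None):
    """Stand-in for an estimator constructor such as PCA."""
    return {
        "n_components": n_components,
        "whiten": whiten,
        "random_state": random_state,
    }


user_kwargs = {"whiten": True, "random_state": 42, "not_a_pca_option": 1}
accepted = filter_kwargs(make_estimator, user_kwargs)
estimator = make_estimator(n_components=2, **accepted)
```

Here `not_a_pca_option` is dropped before the constructor is called, so an unknown key does not raise a TypeError.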

model

Fitted PCA estimator after fit.

Type: sklearn.decomposition.PCA or None

Notes

PCA is a deterministic linear reducer unless a stochastic solver is used (for example svd_solver="randomized"); in that case, set random_state to make results reproducible.

See also

IncrementalPCAReducer: Linear PCA variant for batch-wise fitting.
DaskPCAReducer: Linear PCA variant for lazy or distributed arrays.
DaskTruncatedSVDReducer: Linear factorization alternative for lazy arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.
PHATEReducer: Nonlinear diffusion-based embedding for smooth trajectories.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import PCAReducer
>>> X = np.random.rand(100, 10)
>>> reducer = PCAReducer(n_components=2, random_state=42)
>>> _ = reducer.fit(X)
>>> X_reduced = reducer.transform(X)
>>> X_reduced.shape
(100, 2)
>>> reducer.explained_variance_ratio_.shape
(2,)
>>> reducer.components_.shape
(2, 10)
>>> reducer = PCAReducer(n_components=3, whiten=True)
>>> reducer.fit_transform(X).shape
(100, 3)

property capabilities: dict

Return capability metadata for PCA.

Returns: Capability mapping describing PCA as a linear component-based reducer.
Return type: dict

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → PCAReducer

Fit PCA on the input data.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

PCAReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import PCAReducer
>>> X = np.random.rand(20, 5)
>>> reducer = PCAReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) → numpy.ndarray

Project data onto the fitted principal component basis.

Parameters: X (ArrayLike of shape (n_samples, n_features)) – Data to project.
Returns: Projected coordinates in principal component space.
Return type: np.ndarray of shape (n_samples, n_components)
Raises: RuntimeError – If the reducer has not been fitted.
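The fitted-state contract (model is None until fit is called, and transform raises RuntimeError before that) can be sketched with a small stand-in class. `TinyReducer` is hypothetical and only mimics the documented behaviour; the error message is illustrative and not taken from coco_pipe.

```python
class TinyReducer:
    """Hypothetical stand-in that mimics the fitted/unfitted contract."""

    def __init__(self):
        self.model = None  # No estimator exists until fit() is called.

    def fit(self, X):
        self.model = {"n_features": len(X[0])}
        return self

    def transform(self, X):
        if self.model is None:
            raise RuntimeError("Call fit() before transform().")
        # Placeholder "projection": keep the first column only.
        return [row[:1] for row in X]


X = [[1.0, 2.0], [3.0, 4.0]]
reducer = TinyReducer()

# Calling transform() on an unfitted reducer fails loudly.
try:
    reducer.transform(X)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

# fit() returns the instance, so fit and transform can be chained.
projected = reducer.fit(X).transform(X)
```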

property explained_variance_ratio_: numpy.ndarray

Fraction of the total variance explained by each retained component, with each value in [0, 1].

Returns: Explained variance ratio for each retained component.
Return type: np.ndarray of shape (n_components,)
Raises: RuntimeError – If the reducer has not been fitted.
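For intuition, the explained variance ratio can be derived from the singular values of the centered data. The NumPy sketch below follows the standard PCA definition and is not taken from the reducer's source.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
n_components = 2

Xc = X - X.mean(axis=0)
S = np.linalg.svd(Xc, compute_uv=False)

# Variance along each principal axis (unbiased estimate, n_samples - 1).
explained_variance = S**2 / (X.shape[0] - 1)

# Each ratio is relative to the total variance over all axes,
# so the retained ratios sum to at most 1.
ratio = explained_variance[:n_components] / explained_variance.sum()
```

Because the denominator covers every axis, the ratios of a truncated model add up to less than 1 whenever variance remains in the discarded components.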

property components_: numpy.ndarray

Principal axes in feature space.

Returns: Principal component loading matrix.
Return type: np.ndarray of shape (n_components, n_features)
Raises: RuntimeError – If the reducer has not been fitted.

get_components() → numpy.ndarray

Return the principal component loading matrix.

Returns: Principal component loading matrix.
Return type: np.ndarray
Raises: RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.reducers.linear.IncrementalPCAReducer(n_components: int = 2, batch_size: int | None = None, **kwargs)

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Incremental PCA reducer.

This reducer wraps sklearn.decomposition.IncrementalPCA for batch-wise fitting when the full dataset is too large to fit in memory at once.

Parameters:

n_components (int, default=2) – Number of principal components to keep.
batch_size (int, optional) – Number of samples processed per batch.
**kwargs (dict) – Additional keyword arguments forwarded to IncrementalPCA after signature filtering.

batch_size

Batch size used when fitting the incremental estimator.

Type: int or None

model

Fitted IncrementalPCA estimator after fit or partial_fit.

Type: sklearn.decomposition.IncrementalPCA or None

See also

PCAReducer: Standard in-memory linear PCA reducer.
DaskPCAReducer: Linear PCA variant for lazy or distributed arrays.
DaskTruncatedSVDReducer: Linear factorization alternative for lazy arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(100, 12)
>>> reducer = IncrementalPCAReducer(n_components=3, batch_size=25)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:10]).shape
(10, 3)
>>> stream = IncrementalPCAReducer(n_components=2, batch_size=20)
>>> _ = stream.partial_fit(X[:50])
>>> _ = stream.partial_fit(X[50:])
>>> stream.transform(X).shape
(100, 2)

property capabilities: dict

Return capability metadata for Incremental PCA.

Returns: Capability mapping describing Incremental PCA as a linear component-based reducer.
Return type: dict

batch_size = None

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → IncrementalPCAReducer

Fit Incremental PCA in batch mode.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

IncrementalPCAReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(30, 6)
>>> reducer = IncrementalPCAReducer(n_components=2, batch_size=10)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

partial_fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → IncrementalPCAReducer

Incrementally fit the estimator on a batch of samples.

Parameters:

X (ArrayLike of shape (n_samples, n_features)) – Batch of training samples.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Reducer instance after updating the incremental estimator.

Return type:

IncrementalPCAReducer

Examples

>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(40, 6)
>>> reducer = IncrementalPCAReducer(n_components=2, batch_size=20)
>>> _ = reducer.partial_fit(X[:20])
>>> _ = reducer.partial_fit(X[20:])
>>> reducer.model is not None
True
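When batches come from a stream rather than an in-memory array, the same pattern can be driven by a generator. The sketch below calls scikit-learn's IncrementalPCA directly, which is the backend this reducer wraps; it assumes the reducer's partial_fit delegates to the backend in a similar way. The `iter_batches` generator is a hypothetical stand-in for reading chunks from disk.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA


def iter_batches(n_batches, batch_size, n_features, seed=0):
    """Yield synthetic batches, standing in for chunks read from disk."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        yield rng.normal(size=(batch_size, n_features))


ipca = IncrementalPCA(n_components=2)
for batch in iter_batches(n_batches=5, batch_size=20, n_features=6):
    # Each call updates the running mean and components in place.
    ipca.partial_fit(batch)

# Project unseen samples with the incrementally fitted model.
new_points = np.random.default_rng(1).normal(size=(10, 6))
projected = ipca.transform(new_points)
```

Only one batch is held in memory at a time, which is the main reason to prefer partial_fit over fit for very large inputs.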

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) → numpy.ndarray

Project data onto the fitted incremental PCA basis.

Parameters: X (ArrayLike of shape (n_samples, n_features)) – Data to project.
Returns: Projected coordinates in component space.
Return type: np.ndarray of shape (n_samples, n_components)
Raises: RuntimeError – If the reducer has not been fitted.

get_components() → numpy.ndarray

Return the incremental PCA component loading matrix.

Returns: Principal component loading matrix.
Return type: np.ndarray
Raises: RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.reducers.linear.DaskPCAReducer(n_components: int = 2, svd_solver: str = 'auto', **kwargs)

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Dask-ML PCA reducer for lazy or distributed data.

This reducer wraps dask_ml.decomposition.PCA. The backend is imported lazily so the rest of the package remains importable without dask-ml.
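A lazy import of this kind is commonly written with importlib, deferring the import until the backend is first needed. The helper name `load_optional_backend` and the error text below are hypothetical, and the real module may structure this differently.

```python
import importlib


def load_optional_backend(module_name, install_hint):
    """Import an optional dependency on first use.

    Hypothetical helper; names and message are illustrative only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this reducer. {install_hint}"
        ) from exc


# A module that exists is returned as usual.
json_module = load_optional_backend("json", "It ships with Python.")

# A missing module raises ImportError only when it is actually requested.
try:
    load_optional_backend("surely_missing_backend_xyz", "Install the extra.")
    missing_error = None
except ImportError as exc:
    missing_error = str(exc)
```

Because the import happens inside a function, importing the enclosing package never fails just because the optional dependency is absent.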

Parameters:

n_components (int, default=2) – Number of principal components to keep.
svd_solver ({"auto", "full", "tsqr", "randomized"}, default="auto") – Solver used by the Dask PCA backend.
**kwargs (dict) – Additional keyword arguments forwarded to dask_ml.decomposition.PCA after signature filtering.

svd_solver

Solver used when instantiating the Dask PCA estimator.

Type: str

model

Fitted Dask PCA estimator after fit.

Type: dask_ml.decomposition.PCA or None

Notes

This reducer requires the optional dask-ml backend.

See also

PCAReducer: Standard in-memory linear PCA reducer.
IncrementalPCAReducer: Linear PCA variant for batch-wise fitting.
DaskTruncatedSVDReducer: Linear SVD-based alternative for lazy arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.

Examples

>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskPCAReducer
>>> X = da.from_array(np.random.rand(100, 10), chunks=(25, 10))
>>> reducer = DaskPCAReducer(n_components=2, svd_solver="tsqr")
>>> _ = reducer.fit(X)
>>> reducer.transform(X).shape
(100, 2)

property capabilities: dict

Return capability metadata for Dask PCA.

Returns: Capability mapping describing Dask PCA as a linear component-based reducer.
Return type: dict

svd_solver = 'auto'

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → DaskPCAReducer

Fit Dask PCA on the input data.

Parameters:

X (ArrayLike) – Training data, typically a Dask array or a compatible array-like object accepted by the Dask backend.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

DaskPCAReducer

Raises:

ImportError – If dask-ml is not installed.
RuntimeError – If dask-ml is installed but fails during initialization.

Examples

>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskPCAReducer
>>> X = da.from_array(np.random.rand(40, 8), chunks=(20, 8))
>>> reducer = DaskPCAReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) → Any

Project data using the fitted Dask PCA model.

Parameters: X (ArrayLike) – Data to project.
Returns: Backend-specific transformed output, typically a Dask array.
Return type: Any
Raises: RuntimeError – If the reducer has not been fitted.
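Since the return type is backend-specific, downstream code that needs an in-memory array may want to normalize the result. The sketch below uses a hypothetical helper, `to_numpy`, which relies only on the fact that Dask collections expose a compute() method. `FakeLazyArray` is a stand-in so the example runs without Dask installed.

```python
import numpy as np


def to_numpy(result):
    """Return an in-memory NumPy array from an eager or lazy result.

    Hypothetical helper: objects exposing ``compute()`` (as Dask
    collections do) are evaluated first; others are converted directly.
    """
    if hasattr(result, "compute"):
        result = result.compute()
    return np.asarray(result)


class FakeLazyArray:
    """Tiny stand-in for a lazy array, used so the sketch runs without Dask."""

    def __init__(self, data):
        self._data = data

    def compute(self):
        return self._data


eager = to_numpy(np.ones((4, 2)))
lazy = to_numpy(FakeLazyArray(np.zeros((4, 2))))
```

Materializing a large distributed result pulls it into local memory, so this is best reserved for small outputs such as low-dimensional embeddings.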

get_components() → numpy.ndarray

Return the Dask PCA component loading matrix.

Returns: Principal component loading matrix or Dask-backed equivalent.
Return type: np.ndarray
Raises: RuntimeError – If the reducer has not been fitted.

class coco_pipe.dim_reduction.reducers.linear.DaskTruncatedSVDReducer(n_components: int = 2, algorithm: str = 'tsqr', **kwargs)

Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer

Dask-ML Truncated SVD reducer.

This reducer wraps dask_ml.decomposition.TruncatedSVD and provides a linear projection for lazy or distributed arrays.
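The practical difference from PCA is that a truncated SVD factorizes the data as given, without subtracting the feature means first. That is how scikit-learn's TruncatedSVD behaves, and it is assumed here to hold for the Dask-ML backend too. The NumPy sketch below contrasts the two projections on data with a nonzero mean; it illustrates the linear algebra only and does not call Dask-ML.

```python
import numpy as np

rng = np.random.default_rng(0)
# Shift the data so that the feature means are far from zero.
X = rng.normal(size=(120, 15)) + 5.0
k = 3

# Truncated SVD: factorize X as-is, with no mean subtraction.
_, _, Vt_svd = np.linalg.svd(X, full_matrices=False)
X_svd = X @ Vt_svd[:k].T

# PCA: the same factorization, but on mean-centered data.
Xc = X - X.mean(axis=0)
_, _, Vt_pca = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt_pca[:k].T
```

With uncentered data, the first truncated-SVD component mostly captures the mean offset, so centering beforehand (or choosing a PCA reducer) is worth considering when the offset is not of interest.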

Parameters:

n_components (int, default=2) – Number of components to keep.
algorithm ({"tsqr", "randomized"}, default="tsqr") – SVD algorithm used by the Dask backend.
**kwargs (dict) – Additional keyword arguments forwarded to dask_ml.decomposition.TruncatedSVD after signature filtering.

algorithm

SVD algorithm used when instantiating the backend estimator.

Type: str

model

Fitted TruncatedSVD estimator after fit.

Type: dask_ml.decomposition.TruncatedSVD or None

Notes

This reducer requires the optional dask-ml backend.

See also

PCAReducer: Standard in-memory linear PCA reducer.
IncrementalPCAReducer: Linear PCA variant for batch-wise fitting.
DaskPCAReducer: Linear PCA variant for lazy or distributed arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.

Examples

>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskTruncatedSVDReducer
>>> X = da.from_array(np.random.rand(120, 15), chunks=(30, 15))
>>> reducer = DaskTruncatedSVDReducer(n_components=3, algorithm="randomized")
>>> _ = reducer.fit(X)
>>> reducer.transform(X).shape
(120, 3)

property capabilities: dict

Return capability metadata for Dask Truncated SVD.

Returns: Capability mapping describing Dask Truncated SVD as a linear component-based reducer.
Return type: dict

algorithm = 'tsqr'

fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) → DaskTruncatedSVDReducer

Fit Dask Truncated SVD on the input data.

Parameters:

X (ArrayLike) – Training data, typically a Dask array or compatible array-like object accepted by the backend.
y (ArrayLike, optional) – Ignored. Present for API compatibility.

Returns:

Fitted reducer instance.

Return type:

DaskTruncatedSVDReducer

Raises:

ImportError – If dask-ml is not installed.
RuntimeError – If dask-ml is installed but fails during initialization.

Examples

>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskTruncatedSVDReducer
>>> X = da.from_array(np.random.rand(40, 8), chunks=(20, 8))
>>> reducer = DaskTruncatedSVDReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True

transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) → Any

Project data using the fitted Dask Truncated SVD model.

Parameters: X (ArrayLike) – Data to project.
Returns: Backend-specific transformed output, typically a Dask array.
Return type: Any
Raises: RuntimeError – If the reducer has not been fitted.

get_components() → numpy.ndarray

Return the Truncated SVD component loading matrix.

Returns: Component loading matrix or Dask-backed equivalent.
Return type: np.ndarray
Raises: RuntimeError – If the reducer has not been fitted.