coco_pipe.dim_reduction.reducers.linear
Linear dimensionality reduction reducers.
This module provides linear projection wrappers built on top of scikit-learn and optional Dask backends. These reducers follow the shared BaseReducer contract so they can be used directly with DimReduction, reporting, and visualization utilities.
Classes
- PCAReducer
Principal Component Analysis wrapper based on sklearn.decomposition.PCA.
- IncrementalPCAReducer
Incremental PCA wrapper for batch-wise fitting on larger datasets.
- DaskPCAReducer
Optional Dask-ML PCA wrapper for lazy or distributed arrays.
- DaskTruncatedSVDReducer
Optional Dask-ML Truncated SVD wrapper for lazy or distributed arrays.
Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)
Module Contents
- class coco_pipe.dim_reduction.reducers.linear.PCAReducer(n_components: int = 2, **kwargs)
Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer
Principal Component Analysis reducer.
This reducer wraps sklearn.decomposition.PCA and provides a linear low-dimensional embedding based on singular value decomposition.
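As a rough illustration of the SVD-based projection described above, the NumPy-only sketch below centers the data, takes a thin SVD, and keeps the leading right singular vectors. It is not the wrapper's implementation, and the helper name pca_via_svd is hypothetical.

```python
import numpy as np


def pca_via_svd(X, n_components=2):
    """Hypothetical helper: project X onto its leading principal axes."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on column-centered data
    # Thin SVD: Xc = U @ diag(S) @ Vt, and the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    embedding = Xc @ components.T  # equal to U[:, :k] * S[:k]
    return embedding, components, mean


rng = np.random.default_rng(0)
X = rng.random((100, 10))
embedding, components, mean = pca_via_svd(X, n_components=2)
print(embedding.shape)   # (100, 2)
print(components.shape)  # (2, 10)
```

The shapes match those reported by the PCAReducer example further down: one row per sample in the embedding and one row per component in the axis matrix.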
- Parameters:
n_components (int, default=2) – Number of principal components to keep.
**kwargs (dict) – Additional keyword arguments forwarded to sklearn.decomposition.PCA after signature filtering. Common options include whiten, svd_solver, and random_state.
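The page does not describe how signature filtering is done. One plausible reading, sketched below with the standard-library inspect module, is that keyword arguments the backend constructor does not accept are dropped before forwarding. The helper filter_kwargs and the ToyEstimator stand-in are hypothetical, and silently dropping unknown keys is an assumption.

```python
import inspect


def filter_kwargs(func, kwargs):
    """Hypothetical helper: keep only the keyword arguments func accepts."""
    params = inspect.signature(func).parameters
    # If func itself takes **kwargs, everything can be forwarded unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


class ToyEstimator:
    """Stand-in for a backend estimator such as sklearn.decomposition.PCA."""

    def __init__(self, n_components=2, whiten=False, random_state=None):
        self.n_components = n_components
        self.whiten = whiten
        self.random_state = random_state


user_kwargs = {"whiten": True, "random_state": 42, "not_a_pca_option": 1}
accepted = filter_kwargs(ToyEstimator.__init__, user_kwargs)
model = ToyEstimator(n_components=2, **accepted)
print(accepted)  # {'whiten': True, 'random_state': 42}
```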
- model
Fitted PCA estimator after fit.
- Type:
sklearn.decomposition.PCA or None
Notes
The projection is deterministic unless a randomized solver is used; in that case, pass random_state for reproducible results.
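To show where the nondeterminism of a randomized solver comes from, here is a small NumPy sketch of a random range finder followed by an exact SVD on the reduced matrix. It is a simplified stand-in, not scikit-learn's solver; the helper randomized_axes and its n_oversamples argument are hypothetical.

```python
import numpy as np


def randomized_axes(X, n_components, seed, n_oversamples=10):
    """Hypothetical helper: approximate top principal axes via a random range finder."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Random test matrix: this is the only source of nondeterminism.
    omega = rng.standard_normal((Xc.shape[1], n_components + n_oversamples))
    Q, _ = np.linalg.qr(Xc @ omega)  # orthonormal basis for the sampled range
    _, _, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    return Vt[:n_components]


X = np.random.default_rng(0).random((200, 50))
a = randomized_axes(X, 2, seed=42)
b = randomized_axes(X, 2, seed=42)
c = randomized_axes(X, 2, seed=7)  # a different seed gives a different approximation
print(np.array_equal(a, b))  # True: same seed, same axes
print(a.shape, c.shape)
```

Fixing the seed plays the same role as passing random_state to the reducer.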
See also
IncrementalPCAReducer: Linear PCA variant for batch-wise fitting.
DaskPCAReducer: Linear PCA variant for lazy or distributed arrays.
DaskTruncatedSVDReducer: Linear factorization alternative for lazy arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.
PHATEReducer: Nonlinear diffusion-based embedding for smooth trajectories.
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction import PCAReducer
>>> X = np.random.rand(100, 10)
>>> reducer = PCAReducer(n_components=2, random_state=42)
>>> _ = reducer.fit(X)
>>> X_reduced = reducer.transform(X)
>>> X_reduced.shape
(100, 2)
>>> reducer.explained_variance_ratio_.shape
(2,)
>>> reducer.components_.shape
(2, 10)
>>> reducer = PCAReducer(n_components=3, whiten=True)
>>> reducer.fit_transform(X).shape
(100, 3)
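The whiten option used in the example above rescales each component to unit variance. The NumPy sketch below reproduces that effect by hand under the usual PCA convention (divide each score column by its standard deviation); it illustrates the idea and does not call the wrapper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 10))
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 3
scores = Xc @ Vt[:k].T  # plain PCA scores
# Whitening divides each score column by its standard deviation,
# S / sqrt(n_samples - 1), so every component ends up with unit variance.
whitened = scores / (S[:k] / np.sqrt(X.shape[0] - 1))

print(scores.var(axis=0, ddof=1))    # decreasing variances
print(whitened.var(axis=0, ddof=1))  # all close to 1.0
```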
- property capabilities: dict
Return capability metadata for PCA.
- Returns:
Capability mapping describing PCA as a linear component-based reducer.
- Return type:
dict
- fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> PCAReducer
Fit PCA on the input data.
- Parameters:
X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.
- Returns:
Fitted reducer instance.
- Return type:
PCAReducer
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction import PCAReducer
>>> X = np.random.rand(20, 5)
>>> reducer = PCAReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
- transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray
Project data onto the fitted principal component basis.
- Parameters:
X (ArrayLike of shape (n_samples, n_features)) – Data to project.
- Returns:
Projected coordinates in principal component space.
- Return type:
np.ndarray of shape (n_samples, n_components)
- Raises:
RuntimeError – If the reducer has not been fitted.
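The fit-before-transform contract can be pictured with a toy class whose model attribute stays None until fit runs. TinyReducer below is hypothetical and only mirrors the behavior documented here (model is None before fitting, RuntimeError from transform); the actual BaseReducer logic may differ.

```python
import numpy as np


class TinyReducer:
    """Hypothetical minimal reducer showing a fit-before-transform guard."""

    def __init__(self, n_components=2):
        self.n_components = n_components
        self.model = None  # stays None until fit() is called

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        self.model = {"mean": mean, "components": Vt[: self.n_components]}
        return self

    def transform(self, X):
        if self.model is None:
            raise RuntimeError("Reducer must be fitted before calling transform().")
        X = np.asarray(X, dtype=float)
        return (X - self.model["mean"]) @ self.model["components"].T


reducer = TinyReducer(n_components=2)
try:
    reducer.transform(np.zeros((3, 5)))
except RuntimeError as err:
    print(f"RuntimeError: {err}")

X = np.random.default_rng(0).random((20, 5))
print(reducer.fit(X).transform(X).shape)  # (20, 2)
```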
- property explained_variance_ratio_: numpy.ndarray
Fraction of the total variance explained by each selected component (values between 0 and 1).
- Returns:
Explained variance ratio for each retained component.
- Return type:
np.ndarray of shape (n_components,)
- Raises:
RuntimeError – If the reducer has not been fitted.
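For intuition about what this property holds, the ratios can be derived from the singular values of the centered data. The sketch below uses NumPy only and follows the standard PCA definition; the 90 % threshold is an arbitrary illustration value.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 10))
Xc = X - X.mean(axis=0)
S = np.linalg.svd(Xc, compute_uv=False)

explained_variance = S**2 / (X.shape[0] - 1)            # variance along each axis
ratio = explained_variance / explained_variance.sum()   # fractions that sum to 1

k = 2
print(ratio[:k])        # values for the k retained components
print(ratio[:k].sum())  # share of total variance kept by k components

# Smallest number of components whose cumulative ratio reaches 90 %.
n_needed = int(np.searchsorted(np.cumsum(ratio), 0.90) + 1)
print(n_needed)
```

A cumulative sum of this kind is a common way to choose n_components before constructing the reducer.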
- property components_: numpy.ndarray
Principal axes in feature space.
- Returns:
Matrix whose rows are the principal axes (unit-length directions in feature space).
- Return type:
np.ndarray of shape (n_components, n_features)
- Raises:
RuntimeError – If the reducer has not been fitted.
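Two properties of a (n_components, n_features) axis matrix are worth seeing concretely: its rows are orthonormal, and it maps embeddings back to an approximation of the original features. The NumPy sketch below computes the axes directly rather than through the reducer.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 6))
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]  # shape (n_components, n_features)

# Rows are orthonormal directions in feature space.
print(np.allclose(components @ components.T, np.eye(2)))  # True

# Project, then map back to feature space (lossy when n_components < n_features).
Z = (X - mean) @ components.T
X_approx = Z @ components + mean
print(X_approx.shape)  # (50, 6)
```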
- class coco_pipe.dim_reduction.reducers.linear.IncrementalPCAReducer(n_components: int = 2, batch_size: int | None = None, **kwargs)
Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer
Incremental PCA reducer.
This reducer wraps sklearn.decomposition.IncrementalPCA for batch-wise fitting when the full dataset is too large to process in one pass.
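To make the idea of batch-wise fitting concrete, the sketch below computes PCA from running sums accumulated one batch at a time. This is a deliberately simpler technique than the incremental SVD update used by scikit-learn's IncrementalPCA (it keeps a full n_features by n_features matrix in memory), and the helper batched_pca is hypothetical.

```python
import numpy as np


def batched_pca(batches, n_components=2):
    """Hypothetical helper: PCA from per-batch running sums (not sklearn's algorithm)."""
    n = 0
    total = None    # running sum of rows
    scatter = None  # running sum of X.T @ X
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        if total is None:
            total = np.zeros(batch.shape[1])
            scatter = np.zeros((batch.shape[1], batch.shape[1]))
        n += batch.shape[0]
        total += batch.sum(axis=0)
        scatter += batch.T @ batch
    mean = total / n
    cov = (scatter - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order].T, eigvals[order]


rng = np.random.default_rng(0)
X = rng.random((100, 12))
batches = (X[i:i + 25] for i in range(0, 100, 25))  # four batches of 25 rows
mean, components, variances = batched_pca(batches, n_components=3)
print(components.shape)  # (3, 12)
```

Only one batch is held in memory at a time, which is the property that makes batch-wise fitting attractive for large datasets.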
- Parameters:
n_components (int, default=2) – Number of principal components to keep.
batch_size (int, optional) – Number of samples processed per batch.
**kwargs (dict) – Additional keyword arguments forwarded to IncrementalPCA after signature filtering.
- batch_size
Batch size used when fitting the incremental estimator.
- Type:
int or None
- model
Fitted IncrementalPCA estimator after fit or partial_fit.
- Type:
sklearn.decomposition.IncrementalPCA or None
See also
PCAReducer: Standard in-memory linear PCA reducer.
DaskPCAReducer: Linear PCA variant for lazy or distributed arrays.
DaskTruncatedSVDReducer: Linear factorization alternative for lazy arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(100, 12)
>>> reducer = IncrementalPCAReducer(n_components=3, batch_size=25)
>>> _ = reducer.fit(X)
>>> reducer.transform(X[:10]).shape
(10, 3)
>>> stream = IncrementalPCAReducer(n_components=2, batch_size=20)
>>> _ = stream.partial_fit(X[:50])
>>> _ = stream.partial_fit(X[50:])
>>> stream.transform(X).shape
(100, 2)
- property capabilities: dict
Return capability metadata for Incremental PCA.
- Returns:
Capability mapping describing Incremental PCA as a linear component-based reducer.
- Return type:
dict
- batch_size = None
- fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> IncrementalPCAReducer
Fit Incremental PCA in batch mode.
- Parameters:
X (ArrayLike of shape (n_samples, n_features)) – Training data.
y (ArrayLike, optional) – Ignored. Present for API compatibility.
- Returns:
Fitted reducer instance.
- Return type:
IncrementalPCAReducer
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(30, 6)
>>> reducer = IncrementalPCAReducer(n_components=2, batch_size=10)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
- partial_fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> IncrementalPCAReducer
Incrementally fit the estimator on a batch of samples.
- Parameters:
X (ArrayLike of shape (n_samples, n_features)) – Batch of training samples.
y (ArrayLike, optional) – Ignored. Present for API compatibility.
- Returns:
Reducer instance after updating the incremental estimator.
- Return type:
IncrementalPCAReducer
Examples
>>> import numpy as np
>>> from coco_pipe.dim_reduction import IncrementalPCAReducer
>>> X = np.random.rand(40, 6)
>>> reducer = IncrementalPCAReducer(n_components=2, batch_size=20)
>>> _ = reducer.partial_fit(X[:20])
>>> _ = reducer.partial_fit(X[20:])
>>> reducer.model is not None
True
- transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray
Project data onto the fitted incremental PCA basis.
- Parameters:
X (ArrayLike of shape (n_samples, n_features)) – Data to project.
- Returns:
Projected coordinates in component space.
- Return type:
np.ndarray of shape (n_samples, n_components)
- Raises:
RuntimeError – If the reducer has not been fitted.
- class coco_pipe.dim_reduction.reducers.linear.DaskPCAReducer(n_components: int = 2, svd_solver: str = 'auto', **kwargs)
Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer
Dask-ML PCA reducer for lazy or distributed data.
This reducer wraps dask_ml.decomposition.PCA. The backend is imported lazily so the rest of the package remains importable without dask-ml.
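A lazy import of an optional backend is commonly written as a small helper that defers the import until the backend is actually needed and raises a descriptive ImportError otherwise. The sketch below shows that general pattern with the standard library; load_backend is a hypothetical name, and the package's own mechanism may differ.

```python
import importlib


def load_backend(module_name, install_hint):
    """Hypothetical helper: import an optional backend only when it is needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this reducer. {install_hint}"
        ) from err


# A standard-library module stands in for the optional backend here.
json_backend = load_backend("json", "It ships with Python.")
print(json_backend.dumps({"n_components": 2}))  # {"n_components": 2}

try:
    load_backend("backend_that_is_not_installed", "Install the optional extra.")
except ImportError as err:
    print(err)
```

Because the import happens inside a function, importing the surrounding module never fails just because the optional dependency is missing.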
- Parameters:
n_components (int, default=2) – Number of principal components to keep.
svd_solver ({"auto", "full", "tsqr", "randomized"}, default="auto") – Solver used by the Dask PCA backend.
**kwargs (dict) – Additional keyword arguments forwarded to dask_ml.decomposition.PCA after signature filtering.
- svd_solver
Solver used when instantiating the Dask PCA estimator.
- Type:
str
- model
Fitted Dask PCA estimator after fit.
- Type:
dask_ml.decomposition.PCA or None
Notes
This reducer requires the optional dask-ml backend.
See also
PCAReducer: Standard in-memory linear PCA reducer.
IncrementalPCAReducer: Linear PCA variant for batch-wise fitting.
DaskTruncatedSVDReducer: Linear SVD-based alternative for lazy arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.
Examples
>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskPCAReducer
>>> X = da.from_array(np.random.rand(100, 10), chunks=(25, 10))
>>> reducer = DaskPCAReducer(n_components=2, svd_solver="tsqr")
>>> _ = reducer.fit(X)
>>> reducer.transform(X).shape
(100, 2)
- property capabilities: dict
Return capability metadata for Dask PCA.
- Returns:
Capability mapping describing Dask PCA as a linear component-based reducer.
- Return type:
dict
- svd_solver = 'auto'
- fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> DaskPCAReducer
Fit Dask PCA on the input data.
- Parameters:
X (ArrayLike) – Training data, typically a Dask array or a compatible array-like object accepted by the Dask backend.
y (ArrayLike, optional) – Ignored. Present for API compatibility.
- Returns:
Fitted reducer instance.
- Return type:
DaskPCAReducer
- Raises:
ImportError – If dask-ml is not installed.
RuntimeError – If dask-ml is installed but fails during initialization.
Examples
>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskPCAReducer
>>> X = da.from_array(np.random.rand(40, 8), chunks=(20, 8))
>>> reducer = DaskPCAReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
- transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> Any
Project data using the fitted Dask PCA model.
- Parameters:
X (ArrayLike) – Data to project.
- Returns:
Backend-specific transformed output, typically a Dask array.
- Return type:
Any
- Raises:
RuntimeError – If the reducer has not been fitted.
- class coco_pipe.dim_reduction.reducers.linear.DaskTruncatedSVDReducer(n_components: int = 2, algorithm: str = 'tsqr', **kwargs)
Bases: coco_pipe.dim_reduction.reducers.base.BaseReducer
Dask-ML Truncated SVD reducer.
This reducer wraps dask_ml.decomposition.TruncatedSVD and provides a linear projection for lazy or distributed arrays.
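The practical difference between a truncated SVD projection and PCA is that truncated SVD factorizes the data as given, without centering the columns first. The in-memory NumPy sketch below contrasts the two on shifted data; it stands in for the Dask backend, which applies the same algebra to chunked arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((120, 15)) + 5.0  # shift so the column means are far from zero
k = 3

# Truncated SVD: factorize X as-is and keep the top-k right singular vectors.
_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)
Z_svd = X @ Vt_raw[:k].T

# PCA: identical algebra, but applied to the column-centered matrix.
Xc = X - X.mean(axis=0)
_, _, Vt_centered = np.linalg.svd(Xc, full_matrices=False)
Z_pca = Xc @ Vt_centered[:k].T

print(Z_svd.shape, Z_pca.shape)          # (120, 3) (120, 3)
print(np.abs(Z_svd.mean(axis=0))[0])     # large: the first axis mostly tracks the mean
print(np.abs(Z_pca.mean(axis=0)).max())  # close to 0: PCA scores are centered
```

If the features have large offsets, centering them beforehand (or using the PCA reducer) keeps the first component from being dominated by the mean.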
- Parameters:
n_components (int, default=2) – Number of components to keep.
algorithm ({"tsqr", "randomized"}, default="tsqr") – SVD algorithm used by the Dask backend.
**kwargs (dict) – Additional keyword arguments forwarded to dask_ml.decomposition.TruncatedSVD after signature filtering.
- algorithm
SVD algorithm used when instantiating the backend estimator.
- Type:
str
- model
Fitted TruncatedSVD estimator after fit.
- Type:
dask_ml.decomposition.TruncatedSVD or None
Notes
This reducer requires the optional dask-ml backend.
See also
PCAReducer: Standard in-memory linear PCA reducer.
IncrementalPCAReducer: Linear PCA variant for batch-wise fitting.
DaskPCAReducer: Linear PCA variant for lazy or distributed arrays.
IsomapReducer: Nonlinear manifold learner based on geodesic distances.
TSNEReducer: Nonlinear neighborhood-preserving embedding.
UMAPReducer: Nonlinear graph-based embedding balancing local and global structure.
Examples
>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskTruncatedSVDReducer
>>> X = da.from_array(np.random.rand(120, 15), chunks=(30, 15))
>>> reducer = DaskTruncatedSVDReducer(n_components=3, algorithm="randomized")
>>> _ = reducer.fit(X)
>>> reducer.transform(X).shape
(120, 3)
- property capabilities: dict
Return capability metadata for Dask Truncated SVD.
- Returns:
Capability mapping describing Dask Truncated SVD as a linear component-based reducer.
- Return type:
dict
- algorithm = 'tsqr'
- fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: coco_pipe.dim_reduction.reducers.base.ArrayLike | None = None) -> DaskTruncatedSVDReducer
Fit Dask Truncated SVD on the input data.
- Parameters:
X (ArrayLike) – Training data, typically a Dask array or compatible array-like object accepted by the backend.
y (ArrayLike, optional) – Ignored. Present for API compatibility.
- Returns:
Fitted reducer instance.
- Return type:
DaskTruncatedSVDReducer
- Raises:
ImportError – If dask-ml is not installed.
RuntimeError – If dask-ml is installed but fails during initialization.
Examples
>>> import dask.array as da
>>> import numpy as np
>>> from coco_pipe.dim_reduction import DaskTruncatedSVDReducer
>>> X = da.from_array(np.random.rand(40, 8), chunks=(20, 8))
>>> reducer = DaskTruncatedSVDReducer(n_components=2)
>>> _ = reducer.fit(X)
>>> reducer.model is not None
True
- transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> Any
Project data using the fitted Dask Truncated SVD model.
- Parameters:
X (ArrayLike) – Data to project.
- Returns:
Backend-specific transformed output, typically a Dask array.
- Return type:
Any
- Raises:
RuntimeError – If the reducer has not been fitted.