coco_pipe.dim_reduction.reducers.linear
=======================================

.. py:module:: coco_pipe.dim_reduction.reducers.linear

.. autoapi-nested-parse::

   Linear dimensionality reduction reducers.

   This module provides linear projection wrappers built on top of scikit-learn
   and optional Dask backends. These reducers follow the shared `BaseReducer`
   contract so they can be used directly with `DimReduction`, reporting, and
   visualization utilities.

   Classes
   -------
   PCAReducer
       Principal Component Analysis wrapper based on `sklearn.decomposition.PCA`.
   IncrementalPCAReducer
       Incremental PCA wrapper for batch-wise fitting on larger datasets.
   DaskPCAReducer
       Optional Dask-ML PCA wrapper for lazy or distributed arrays.
   DaskTruncatedSVDReducer
       Optional Dask-ML Truncated SVD wrapper for lazy or distributed arrays.

   .. rubric:: References

   .. [1] Pearson, K. (1901). "On Lines and Planes of Closest Fit to Systems of
      Points in Space". Philosophical Magazine, 2(11), 559-572.
   .. [2] Hotelling, H. (1933). "Analysis of a complex of statistical variables
      into principal components". Journal of Educational Psychology, 24(6),
      417-441.
   .. [3] Scikit-learn PCA documentation:
      https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

   Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)


Classes
-------

.. autoapisummary::

   coco_pipe.dim_reduction.reducers.linear.PCAReducer
   coco_pipe.dim_reduction.reducers.linear.IncrementalPCAReducer
   coco_pipe.dim_reduction.reducers.linear.DaskPCAReducer
   coco_pipe.dim_reduction.reducers.linear.DaskTruncatedSVDReducer


Module Contents
---------------

.. py:class:: PCAReducer(n_components: int = 2, **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`

   Principal Component Analysis reducer.

   This reducer wraps `sklearn.decomposition.PCA` and provides a linear
   low-dimensional embedding based on singular value decomposition.

   :param n_components: Number of principal components to keep.
   :type n_components: int, default=2
   :param \*\*kwargs: Additional keyword arguments forwarded to
      `sklearn.decomposition.PCA` after signature filtering. Common options
      include `whiten`, `svd_solver`, and `random_state`.
   :type \*\*kwargs: dict

   .. attribute:: model

      Fitted PCA estimator after `fit`.

      :type: sklearn.decomposition.PCA or None

   .. rubric:: Notes

   This is a deterministic linear reducer unless a randomized solver is used.

   .. seealso::

      :obj:`IncrementalPCAReducer`
          Linear PCA variant for batch-wise fitting.

      :obj:`DaskPCAReducer`
          Linear PCA variant for lazy or distributed arrays.

      :obj:`DaskTruncatedSVDReducer`
          Linear factorization alternative for lazy arrays.

      :obj:`IsomapReducer`
          Nonlinear manifold learner based on geodesic distances.

      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving embedding.

      :obj:`UMAPReducer`
          Nonlinear graph-based embedding balancing local and global structure.

      :obj:`PHATEReducer`
          Nonlinear diffusion-based embedding for smooth trajectories.

   .. rubric:: Examples

   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import PCAReducer
   >>> X = np.random.rand(100, 10)
   >>> reducer = PCAReducer(n_components=2, random_state=42)
   >>> _ = reducer.fit(X)
   >>> X_reduced = reducer.transform(X)
   >>> X_reduced.shape
   (100, 2)
   >>> reducer.explained_variance_ratio_.shape
   (2,)
   >>> reducer.components_.shape
   (2, 10)
   >>> reducer = PCAReducer(n_components=3, whiten=True)
   >>> reducer.fit_transform(X).shape
   (100, 3)

   .. py:property:: capabilities
      :type: dict

      Return capability metadata for PCA.

      :returns: Capability mapping describing PCA as a linear component-based
         reducer.
      :rtype: dict

   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> PCAReducer

      Fit PCA on the input data.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: PCAReducer

      .. rubric:: Examples

      >>> import numpy as np
      >>> from coco_pipe.dim_reduction import PCAReducer
      >>> X = np.random.rand(20, 5)
      >>> reducer = PCAReducer(n_components=2)
      >>> _ = reducer.fit(X)
      >>> reducer.model is not None
      True

   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

      Project data onto the fitted principal component basis.

      :param X: Data to project.
      :type X: ArrayLike of shape (n_samples, n_features)

      :returns: Projected coordinates in principal component space.
      :rtype: np.ndarray of shape (n_samples, n_components)

      :raises RuntimeError: If the reducer has not been fitted.

   .. py:property:: explained_variance_ratio_
      :type: numpy.ndarray

      Percentage of variance explained by each selected component.

      :returns: Explained variance ratio for each retained component.
      :rtype: np.ndarray of shape (n_components,)

      :raises RuntimeError: If the reducer has not been fitted.

   .. py:property:: components_
      :type: numpy.ndarray

      Principal axes in feature space.

      :returns: Principal component loading matrix.
      :rtype: np.ndarray of shape (n_components, n_features)

      :raises RuntimeError: If the reducer has not been fitted.

   .. py:method:: get_components() -> numpy.ndarray

      Return the principal component loading matrix.

      :returns: Principal component loading matrix.
      :rtype: np.ndarray

      :raises RuntimeError: If the reducer has not been fitted.


.. py:class:: IncrementalPCAReducer(n_components: int = 2, batch_size: Optional[int] = None, **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`

   Incremental PCA reducer.

   This reducer wraps `sklearn.decomposition.IncrementalPCA` for batch-wise
   fitting when the full dataset is too large to process in one pass.

   :param n_components: Number of principal components to keep.
   :type n_components: int, default=2
   :param batch_size: Number of samples processed per batch.
   :type batch_size: int, optional
   :param \*\*kwargs: Additional keyword arguments forwarded to `IncrementalPCA`
      after signature filtering.
   :type \*\*kwargs: dict

   .. attribute:: batch_size

      Batch size used when fitting the incremental estimator.

      :type: int or None

   .. attribute:: model

      Fitted IncrementalPCA estimator after `fit` or `partial_fit`.

      :type: sklearn.decomposition.IncrementalPCA or None

   .. seealso::

      :obj:`PCAReducer`
          Standard in-memory linear PCA reducer.

      :obj:`DaskPCAReducer`
          Linear PCA variant for lazy or distributed arrays.

      :obj:`DaskTruncatedSVDReducer`
          Linear factorization alternative for lazy arrays.

      :obj:`IsomapReducer`
          Nonlinear manifold learner based on geodesic distances.

      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving embedding.

      :obj:`UMAPReducer`
          Nonlinear graph-based embedding balancing local and global structure.

   .. rubric:: Examples

   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import IncrementalPCAReducer
   >>> X = np.random.rand(100, 12)
   >>> reducer = IncrementalPCAReducer(n_components=3, batch_size=25)
   >>> _ = reducer.fit(X)
   >>> reducer.transform(X[:10]).shape
   (10, 3)
   >>> stream = IncrementalPCAReducer(n_components=2, batch_size=20)
   >>> _ = stream.partial_fit(X[:50])
   >>> _ = stream.partial_fit(X[50:])
   >>> stream.transform(X).shape
   (100, 2)

   .. py:property:: capabilities
      :type: dict

      Return capability metadata for Incremental PCA.

      :returns: Capability mapping describing Incremental PCA as a linear
         component-based reducer.
      :rtype: dict

   .. py:attribute:: batch_size
      :value: None

   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> IncrementalPCAReducer

      Fit Incremental PCA in batch mode.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: IncrementalPCAReducer

      .. rubric:: Examples

      >>> import numpy as np
      >>> from coco_pipe.dim_reduction import IncrementalPCAReducer
      >>> X = np.random.rand(30, 6)
      >>> reducer = IncrementalPCAReducer(n_components=2, batch_size=10)
      >>> _ = reducer.fit(X)
      >>> reducer.model is not None
      True

   .. py:method:: partial_fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> IncrementalPCAReducer

      Incrementally fit the estimator on a batch of samples.

      :param X: Batch of training samples.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Reducer instance after updating the incremental estimator.
      :rtype: IncrementalPCAReducer

      .. rubric:: Examples

      >>> import numpy as np
      >>> from coco_pipe.dim_reduction import IncrementalPCAReducer
      >>> X = np.random.rand(40, 6)
      >>> reducer = IncrementalPCAReducer(n_components=2, batch_size=20)
      >>> _ = reducer.partial_fit(X[:20])
      >>> _ = reducer.partial_fit(X[20:])
      >>> reducer.model is not None
      True

   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

      Project data onto the fitted incremental PCA basis.

      :param X: Data to project.
      :type X: ArrayLike of shape (n_samples, n_features)

      :returns: Projected coordinates in component space.
      :rtype: np.ndarray of shape (n_samples, n_components)

      :raises RuntimeError: If the reducer has not been fitted.

   .. py:method:: get_components() -> numpy.ndarray

      Return the incremental PCA component loading matrix.

      :returns: Principal component loading matrix.
      :rtype: np.ndarray

      :raises RuntimeError: If the reducer has not been fitted.


.. py:class:: DaskPCAReducer(n_components: int = 2, svd_solver: str = 'auto', **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`

   Dask-ML PCA reducer for lazy or distributed data.

   This reducer wraps `dask_ml.decomposition.PCA`. The backend is imported
   lazily so the rest of the package remains importable without `dask-ml`.

   :param n_components: Number of principal components to keep.
   :type n_components: int, default=2
   :param svd_solver: Solver used by the Dask PCA backend.
   :type svd_solver: {"auto", "full", "tsqr", "randomized"}, default="auto"
   :param \*\*kwargs: Additional keyword arguments forwarded to
      `dask_ml.decomposition.PCA` after signature filtering.
   :type \*\*kwargs: dict

   .. attribute:: svd_solver

      Solver used when instantiating the Dask PCA estimator.

      :type: str

   .. attribute:: model

      Fitted Dask PCA estimator after `fit`.

      :type: dask_ml.decomposition.PCA or None

   .. rubric:: Notes

   This reducer requires the optional `dask-ml` backend.

   .. seealso::

      :obj:`PCAReducer`
          Standard in-memory linear PCA reducer.

      :obj:`IncrementalPCAReducer`
          Linear PCA variant for batch-wise fitting.

      :obj:`DaskTruncatedSVDReducer`
          Linear SVD-based alternative for lazy arrays.

      :obj:`IsomapReducer`
          Nonlinear manifold learner based on geodesic distances.

      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving embedding.

      :obj:`UMAPReducer`
          Nonlinear graph-based embedding balancing local and global structure.

   .. rubric:: Examples

   >>> import dask.array as da
   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import DaskPCAReducer
   >>> X = da.from_array(np.random.rand(100, 10), chunks=(25, 10))
   >>> reducer = DaskPCAReducer(n_components=2, svd_solver="tsqr")
   >>> _ = reducer.fit(X)
   >>> reducer.transform(X).shape
   (100, 2)

   .. py:property:: capabilities
      :type: dict

      Return capability metadata for Dask PCA.

      :returns: Capability mapping describing Dask PCA as a linear
         component-based reducer.
      :rtype: dict

   .. py:attribute:: svd_solver
      :value: 'auto'

   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> DaskPCAReducer

      Fit Dask PCA on the input data.

      :param X: Training data, typically a Dask array or a compatible
         array-like object accepted by the Dask backend.
      :type X: ArrayLike
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: DaskPCAReducer

      :raises ImportError: If `dask-ml` is not installed.
      :raises RuntimeError: If `dask-ml` is installed but fails during
         initialization.

      .. rubric:: Examples

      >>> import dask.array as da
      >>> import numpy as np
      >>> from coco_pipe.dim_reduction import DaskPCAReducer
      >>> X = da.from_array(np.random.rand(40, 8), chunks=(20, 8))
      >>> reducer = DaskPCAReducer(n_components=2)
      >>> _ = reducer.fit(X)
      >>> reducer.model is not None
      True

   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> Any

      Project data using the fitted Dask PCA model.

      :param X: Data to project.
      :type X: ArrayLike

      :returns: Backend-specific transformed output, typically a Dask array.
      :rtype: Any

      :raises RuntimeError: If the reducer has not been fitted.

   .. py:method:: get_components() -> numpy.ndarray

      Return the Dask PCA component loading matrix.

      :returns: Principal component loading matrix or Dask-backed equivalent.
      :rtype: np.ndarray

      :raises RuntimeError: If the reducer has not been fitted.


.. py:class:: DaskTruncatedSVDReducer(n_components: int = 2, algorithm: str = 'tsqr', **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`

   Dask-ML Truncated SVD reducer.

   This reducer wraps `dask_ml.decomposition.TruncatedSVD` and provides a
   linear projection for lazy or distributed arrays.

   :param n_components: Number of components to keep.
   :type n_components: int, default=2
   :param algorithm: SVD algorithm used by the Dask backend.
   :type algorithm: {"tsqr", "randomized"}, default="tsqr"
   :param \*\*kwargs: Additional keyword arguments forwarded to
      `dask_ml.decomposition.TruncatedSVD` after signature filtering.
   :type \*\*kwargs: dict

   .. attribute:: algorithm

      SVD algorithm used when instantiating the backend estimator.

      :type: str

   .. attribute:: model

      Fitted TruncatedSVD estimator after `fit`.

      :type: dask_ml.decomposition.TruncatedSVD or None

   .. rubric:: Notes

   This reducer requires the optional `dask-ml` backend.

   .. seealso::

      :obj:`PCAReducer`
          Standard in-memory linear PCA reducer.

      :obj:`IncrementalPCAReducer`
          Linear PCA variant for batch-wise fitting.

      :obj:`DaskPCAReducer`
          Linear PCA variant for lazy or distributed arrays.

      :obj:`IsomapReducer`
          Nonlinear manifold learner based on geodesic distances.

      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving embedding.

      :obj:`UMAPReducer`
          Nonlinear graph-based embedding balancing local and global structure.

   .. rubric:: Examples

   >>> import dask.array as da
   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import DaskTruncatedSVDReducer
   >>> X = da.from_array(np.random.rand(120, 15), chunks=(30, 15))
   >>> reducer = DaskTruncatedSVDReducer(n_components=3, algorithm="randomized")
   >>> _ = reducer.fit(X)
   >>> reducer.transform(X).shape
   (120, 3)

   .. py:property:: capabilities
      :type: dict

      Return capability metadata for Dask Truncated SVD.

      :returns: Capability mapping describing Dask Truncated SVD as a linear
         component-based reducer.
      :rtype: dict

   .. py:attribute:: algorithm
      :value: 'tsqr'

   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> DaskTruncatedSVDReducer

      Fit Dask Truncated SVD on the input data.

      :param X: Training data, typically a Dask array or compatible array-like
         object accepted by the backend.
      :type X: ArrayLike
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: DaskTruncatedSVDReducer

      :raises ImportError: If `dask-ml` is not installed.
      :raises RuntimeError: If `dask-ml` is installed but fails during
         initialization.

      .. rubric:: Examples

      >>> import dask.array as da
      >>> import numpy as np
      >>> from coco_pipe.dim_reduction import DaskTruncatedSVDReducer
      >>> X = da.from_array(np.random.rand(40, 8), chunks=(20, 8))
      >>> reducer = DaskTruncatedSVDReducer(n_components=2)
      >>> _ = reducer.fit(X)
      >>> reducer.model is not None
      True

   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> Any

      Project data using the fitted Dask Truncated SVD model.

      :param X: Data to project.
      :type X: ArrayLike

      :returns: Backend-specific transformed output, typically a Dask array.
      :rtype: Any

      :raises RuntimeError: If the reducer has not been fitted.

   .. py:method:: get_components() -> numpy.ndarray

      Return the Truncated SVD component loading matrix.

      :returns: Component loading matrix or Dask-backed equivalent.
      :rtype: np.ndarray

      :raises RuntimeError: If the reducer has not been fitted.
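
Each reducer documented on this page forwards extra keyword arguments to its
backend estimator "after signature filtering". This page does not show how that
filtering is done, so the following is only a minimal sketch of one plausible
approach, using the standard-library ``inspect`` module. The names
``filter_kwargs`` and ``ToyEstimator`` are hypothetical stand-ins for
illustration and are not part of ``coco_pipe``.

.. code-block:: python

   import inspect


   def filter_kwargs(func, kwargs):
       """Keep only the keyword arguments that ``func`` can accept.

       Hypothetical helper; an assumption about what "signature filtering"
       could mean, not the coco_pipe implementation.
       """
       params = inspect.signature(func).parameters
       # A callable that declares **kwargs accepts every keyword, so keep all.
       if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
           return dict(kwargs)
       # Otherwise keep only names that can legally be passed by keyword.
       accepted = {
           name
           for name, p in params.items()
           if p.kind in (
               inspect.Parameter.POSITIONAL_OR_KEYWORD,
               inspect.Parameter.KEYWORD_ONLY,
           )
       }
       return {k: v for k, v in kwargs.items() if k in accepted}


   class ToyEstimator:
       """Stand-in for a backend estimator class (hypothetical)."""

       def __init__(self, n_components=2, whiten=False, random_state=None):
           self.n_components = n_components
           self.whiten = whiten
           self.random_state = random_state


   # "perplexity" is not a ToyEstimator parameter, so it is dropped.
   user_kwargs = {"whiten": True, "random_state": 42, "perplexity": 30}
   clean = filter_kwargs(ToyEstimator, user_kwargs)
   model = ToyEstimator(n_components=3, **clean)

Under this assumption, a keyword the backend does not recognise is dropped
rather than raising ``TypeError`` at construction time. The trade-off is that a
misspelled option would be ignored silently, so a real implementation might
prefer to warn about dropped keys.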