coco_pipe.dim_reduction.reducers.linear
=======================================

.. py:module:: coco_pipe.dim_reduction.reducers.linear

.. autoapi-nested-parse::

   Linear dimensionality reduction wrappers.

   This module provides linear projection wrappers built on top of scikit-learn
   and optional Dask backends. These reducers follow the shared `BaseReducer`
   contract so they can be used directly with `DimReduction`, reporting, and
   visualization utilities.

   Classes
   -------
   PCAReducer
       Principal Component Analysis wrapper based on `sklearn.decomposition.PCA`.
   IncrementalPCAReducer
       Incremental PCA wrapper for batch-wise fitting on large datasets.
   DaskPCAReducer
       Optional Dask-ML PCA wrapper for lazy or distributed arrays.
   DaskTruncatedSVDReducer
       Optional Dask-ML Truncated SVD wrapper for lazy or distributed arrays.

   .. rubric:: References

   .. [1] Pearson, K. (1901). "On Lines and Planes of Closest Fit to Systems of
          Points in Space". Philosophical Magazine, 2(11), 559-572.
   .. [2] Hotelling, H. (1933). "Analysis of a complex of statistical variables
          into principal components". Journal of Educational Psychology, 24(6),
          417-441.
   .. [3] Scikit-learn PCA documentation:
          https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

   Author: Hamza Abdelhedi (hamza.abdelhedi@umontreal.ca)


Classes
-------

.. autoapisummary::

   coco_pipe.dim_reduction.reducers.linear.PCAReducer
   coco_pipe.dim_reduction.reducers.linear.IncrementalPCAReducer
   coco_pipe.dim_reduction.reducers.linear.DaskPCAReducer
   coco_pipe.dim_reduction.reducers.linear.DaskTruncatedSVDReducer


Module Contents
---------------

.. py:class:: PCAReducer(n_components: int = 2, **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`


   Principal Component Analysis reducer.

   This reducer wraps `sklearn.decomposition.PCA` and provides a linear
   low-dimensional embedding based on singular value decomposition.
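
   As a rough illustration of the idea (not the wrapped estimator itself), the
   SVD view of PCA can be sketched in plain NumPy: center the data, factor it,
   and keep the leading right singular vectors. The helper name
   `pca_svd_sketch` is hypothetical and is not part of this package.

   .. code-block:: python

      import numpy as np

      def pca_svd_sketch(X, n_components=2):
          """Return (mean, components) for a minimal SVD-based PCA."""
          X = np.asarray(X, dtype=float)
          mean = X.mean(axis=0)
          # Economy SVD of the centered data; rows of Vt are principal axes,
          # ordered by decreasing singular value.
          _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
          return mean, Vt[:n_components]

      rng = np.random.default_rng(0)
      X = rng.random((100, 10))
      mean, components = pca_svd_sketch(X, n_components=2)

   The rows of `components` are orthonormal, which is why projecting onto
   them is a plain matrix product.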

   :param n_components: Number of principal components to keep.
   :type n_components: int, default=2
   :param \*\*kwargs: Additional keyword arguments forwarded to
                      `sklearn.decomposition.PCA` after signature filtering. Common options
                      include `whiten`, `svd_solver`, and `random_state`.
   :type \*\*kwargs: dict
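
   The exact filtering rule is not documented here. One plausible reading is
   that keyword arguments the backend constructor does not accept are dropped
   before forwarding. A minimal sketch under that assumption follows;
   `filter_kwargs` and `FakeEstimator` are hypothetical stand-ins, not names
   from this package or from scikit-learn.

   .. code-block:: python

      import inspect

      def filter_kwargs(callable_obj, kwargs):
          """Keep only keyword arguments that callable_obj's signature accepts."""
          params = inspect.signature(callable_obj).parameters
          # If the callable itself takes **kwargs, nothing needs to be dropped.
          if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
              return dict(kwargs)
          return {k: v for k, v in kwargs.items() if k in params}

      class FakeEstimator:
          """Stand-in for a backend estimator; not a real library class."""

          def __init__(self, n_components=2, whiten=False):
              self.n_components = n_components
              self.whiten = whiten

      # "unknown" is not in the constructor signature, so it is filtered out.
      accepted = filter_kwargs(FakeEstimator, {"whiten": True, "unknown": 1})
      model = FakeEstimator(n_components=2, **accepted)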

   .. attribute:: model

      Fitted PCA estimator after `fit`.

      :type: sklearn.decomposition.PCA or None

   .. rubric:: Notes

   This linear reducer is deterministic unless a randomized solver is used; in
   that case, pass `random_state` for reproducible results.

   .. seealso::

      :obj:`IncrementalPCAReducer`
          Linear PCA variant for batch-wise fitting.

      :obj:`DaskPCAReducer`
          Linear PCA variant for lazy or distributed arrays.

      :obj:`DaskTruncatedSVDReducer`
          Linear factorization alternative for lazy arrays.

      :obj:`IsomapReducer`
          Nonlinear manifold learner based on geodesic distances.

      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving embedding.

      :obj:`UMAPReducer`
          Nonlinear graph-based embedding balancing local and global structure.

      :obj:`PHATEReducer`
          Nonlinear diffusion-based embedding for smooth trajectories.

   .. rubric:: Examples

   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import PCAReducer
   >>> X = np.random.rand(100, 10)
   >>> reducer = PCAReducer(n_components=2, random_state=42)
   >>> _ = reducer.fit(X)
   >>> X_reduced = reducer.transform(X)
   >>> X_reduced.shape
   (100, 2)
   >>> reducer.explained_variance_ratio_.shape
   (2,)
   >>> reducer.components_.shape
   (2, 10)
   >>> reducer = PCAReducer(n_components=3, whiten=True)
   >>> reducer.fit_transform(X).shape
   (100, 3)


   .. py:property:: capabilities
      :type: dict


      Return capability metadata for PCA.

      :returns: Capability mapping describing PCA as a linear component-based
                reducer.
      :rtype: dict


   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> PCAReducer

      Fit PCA on the input data.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: PCAReducer

      .. rubric:: Examples

      >>> import numpy as np
      >>> from coco_pipe.dim_reduction import PCAReducer
      >>> X = np.random.rand(20, 5)
      >>> reducer = PCAReducer(n_components=2)
      >>> _ = reducer.fit(X)
      >>> reducer.model is not None
      True


   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

      Project data onto the fitted principal component basis.

      :param X: Data to project.
      :type X: ArrayLike of shape (n_samples, n_features)

      :returns: Projected coordinates in principal component space.
      :rtype: np.ndarray of shape (n_samples, n_components)

      :raises RuntimeError: If the reducer has not been fitted.
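
      Conceptually, the projection amounts to centering new data with the
      training mean and multiplying by the transposed component matrix. The
      NumPy sketch below assumes no whitening and does not call this class;
      all variable names are illustrative.

      .. code-block:: python

         import numpy as np

         rng = np.random.default_rng(1)
         X_train = rng.random((50, 6))
         X_new = rng.random((5, 6))

         # "Fit": training mean and the two leading principal axes.
         mean = X_train.mean(axis=0)
         _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
         components = Vt[:2]

         # "Transform": center with the *training* mean, then project.
         Z_new = (X_new - mean) @ components.T
         Z_train = (X_train - mean) @ components.T

      Reusing the training mean matters: centering `X_new` with its own mean
      would place it in a different coordinate system than the fitted basis.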


   .. py:property:: explained_variance_ratio_
      :type: numpy.ndarray


      Fraction of the total variance explained by each retained component.

      :returns: Explained variance ratio for each retained component.
      :rtype: np.ndarray of shape (n_components,)

      :raises RuntimeError: If the reducer has not been fitted.
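
      For intuition, these ratios can be reproduced from the singular values
      of the centered data. This is a NumPy sketch of the arithmetic, assuming
      the usual sample-variance normalization by ``n_samples - 1``; it does
      not call this property.

      .. code-block:: python

         import numpy as np

         rng = np.random.default_rng(2)
         X = rng.random((200, 5))
         Xc = X - X.mean(axis=0)

         # Squared singular values of the centered data, scaled by (n - 1),
         # are the variances along each principal axis.
         s = np.linalg.svd(Xc, compute_uv=False)
         explained_variance = s**2 / (X.shape[0] - 1)
         ratio = explained_variance / explained_variance.sum()

         # A 2-component reducer would report the first two entries.
         top2 = ratio[:2]

      Over all components the ratios sum to one, so the sum of the retained
      entries indicates how much variance the embedding keeps.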


   .. py:property:: components_
      :type: numpy.ndarray


      Principal axes in feature space.

      :returns: Principal component loading matrix.
      :rtype: np.ndarray of shape (n_components, n_features)

      :raises RuntimeError: If the reducer has not been fitted.


   .. py:method:: get_components() -> numpy.ndarray

      Return the principal component loading matrix.

      :returns: Principal component loading matrix.
      :rtype: np.ndarray

      :raises RuntimeError: If the reducer has not been fitted.


.. py:class:: IncrementalPCAReducer(n_components: int = 2, batch_size: Optional[int] = None, **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`


   Incremental PCA reducer.

   This reducer wraps `sklearn.decomposition.IncrementalPCA` for batch-wise
   fitting when the full dataset is too large to hold in memory at once.
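
   To show why batch-wise fitting is possible at all, the sketch below
   accumulates running sums over batches and derives principal axes from the
   resulting covariance matrix. This is a deliberately simplified stand-in and
   not the update rule that `IncrementalPCA` uses; `batchwise_pca_sketch` is a
   hypothetical helper.

   .. code-block:: python

      import numpy as np

      def batchwise_pca_sketch(batches, n_components=2):
          """PCA from running sums; a simplification, not IncrementalPCA."""
          n, total, outer = 0, None, None
          for batch in batches:
              batch = np.asarray(batch, dtype=float)
              if total is None:
                  d = batch.shape[1]
                  total, outer = np.zeros(d), np.zeros((d, d))
              # Only these small accumulators persist between batches.
              n += batch.shape[0]
              total += batch.sum(axis=0)
              outer += batch.T @ batch
          mean = total / n
          cov = (outer - n * np.outer(mean, mean)) / (n - 1)
          # eigh returns ascending eigenvalues; reorder so leading axes come first.
          eigvals, eigvecs = np.linalg.eigh(cov)
          order = np.argsort(eigvals)[::-1][:n_components]
          return mean, eigvecs[:, order].T

      rng = np.random.default_rng(3)
      X = rng.random((100, 12))
      batches = (X[i:i + 25] for i in range(0, 100, 25))
      mean, components = batchwise_pca_sketch(batches, n_components=3)

   Memory use here depends on the number of features rather than the number
   of samples, which is the property that makes streaming fits attractive.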

   :param n_components: Number of principal components to keep.
   :type n_components: int, default=2
   :param batch_size: Number of samples processed per batch.
   :type batch_size: int, optional
   :param \*\*kwargs: Additional keyword arguments forwarded to `IncrementalPCA` after
                      signature filtering.
   :type \*\*kwargs: dict

   .. attribute:: batch_size

      Batch size used when fitting the incremental estimator.

      :type: int or None

   .. attribute:: model

      Fitted IncrementalPCA estimator after `fit` or `partial_fit`.

      :type: sklearn.decomposition.IncrementalPCA or None

   .. seealso::

      :obj:`PCAReducer`
          Standard in-memory linear PCA reducer.

      :obj:`DaskPCAReducer`
          Linear PCA variant for lazy or distributed arrays.

      :obj:`DaskTruncatedSVDReducer`
          Linear factorization alternative for lazy arrays.

      :obj:`IsomapReducer`
          Nonlinear manifold learner based on geodesic distances.

      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving embedding.

      :obj:`UMAPReducer`
          Nonlinear graph-based embedding balancing local and global structure.

   .. rubric:: Examples

   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import IncrementalPCAReducer
   >>> X = np.random.rand(100, 12)
   >>> reducer = IncrementalPCAReducer(n_components=3, batch_size=25)
   >>> _ = reducer.fit(X)
   >>> reducer.transform(X[:10]).shape
   (10, 3)
   >>> stream = IncrementalPCAReducer(n_components=2, batch_size=20)
   >>> _ = stream.partial_fit(X[:50])
   >>> _ = stream.partial_fit(X[50:])
   >>> stream.transform(X).shape
   (100, 2)


   .. py:property:: capabilities
      :type: dict


      Return capability metadata for Incremental PCA.

      :returns: Capability mapping describing Incremental PCA as a linear
                component-based reducer.
      :rtype: dict


   .. py:attribute:: batch_size
      :value: None


   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> IncrementalPCAReducer

      Fit Incremental PCA in batch mode.

      :param X: Training data.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: IncrementalPCAReducer

      .. rubric:: Examples

      >>> import numpy as np
      >>> from coco_pipe.dim_reduction import IncrementalPCAReducer
      >>> X = np.random.rand(30, 6)
      >>> reducer = IncrementalPCAReducer(n_components=2, batch_size=10)
      >>> _ = reducer.fit(X)
      >>> reducer.model is not None
      True


   .. py:method:: partial_fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> IncrementalPCAReducer

      Incrementally fit the estimator on a batch of samples.

      :param X: Batch of training samples.
      :type X: ArrayLike of shape (n_samples, n_features)
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Reducer instance after updating the incremental estimator.
      :rtype: IncrementalPCAReducer

      .. rubric:: Examples

      >>> import numpy as np
      >>> from coco_pipe.dim_reduction import IncrementalPCAReducer
      >>> X = np.random.rand(40, 6)
      >>> reducer = IncrementalPCAReducer(n_components=2, batch_size=20)
      >>> _ = reducer.partial_fit(X[:20])
      >>> _ = reducer.partial_fit(X[20:])
      >>> reducer.model is not None
      True


   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> numpy.ndarray

      Project data onto the fitted incremental PCA basis.

      :param X: Data to project.
      :type X: ArrayLike of shape (n_samples, n_features)

      :returns: Projected coordinates in component space.
      :rtype: np.ndarray of shape (n_samples, n_components)

      :raises RuntimeError: If the reducer has not been fitted.


   .. py:method:: get_components() -> numpy.ndarray

      Return the incremental PCA component loading matrix.

      :returns: Principal component loading matrix.
      :rtype: np.ndarray

      :raises RuntimeError: If the reducer has not been fitted.


.. py:class:: DaskPCAReducer(n_components: int = 2, svd_solver: str = 'auto', **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`


   Dask-ML PCA reducer for lazy or distributed data.

   This reducer wraps `dask_ml.decomposition.PCA`. The backend is imported
   lazily so the rest of the package remains importable without `dask-ml`.
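
   A lazy optional import of this kind is commonly written by deferring the
   import until `fit` is called. The sketch below shows that general pattern
   only; `load_optional_backend` and `LazyReducerSketch` are hypothetical
   names, and the actual implementation may differ.

   .. code-block:: python

      import importlib

      def load_optional_backend(module_name, hint):
          """Import module_name on demand and raise a helpful ImportError."""
          try:
              return importlib.import_module(module_name)
          except ImportError as exc:
              raise ImportError(
                  f"{module_name!r} is required for this reducer. {hint}"
              ) from exc

      class LazyReducerSketch:
          """Stand-in reducer; constructing it never imports the backend."""

          def __init__(self, backend="dask_ml.decomposition"):
              self.backend = backend
              self.model = None

          def fit(self, X):
              # The import happens here, not at module import time.
              module = load_optional_backend(self.backend, "Try installing dask-ml.")
              self.model = module  # a real reducer would build an estimator here
              return self

      # Construction succeeds even if the backend is not installed.
      reducer = LazyReducerSketch()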

   :param n_components: Number of principal components to keep.
   :type n_components: int, default=2
   :param svd_solver: Solver used by the Dask PCA backend.
   :type svd_solver: {"auto", "full", "tsqr", "randomized"}, default="auto"
   :param \*\*kwargs: Additional keyword arguments forwarded to `dask_ml.decomposition.PCA`
                      after signature filtering.
   :type \*\*kwargs: dict
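
   The ``"tsqr"`` option refers to a tall-and-skinny QR factorization, which
   suits arrays chunked along the sample axis. The NumPy sketch below shows
   the two-level idea only, not the Dask implementation; `tsqr_svd_sketch` is
   a hypothetical helper.

   .. code-block:: python

      import numpy as np

      def tsqr_svd_sketch(X, n_blocks=4):
          """Singular values and right singular vectors via a two-level QR."""
          blocks = np.array_split(np.asarray(X, dtype=float), n_blocks, axis=0)
          # Level 1: an independent QR per row block (the parallel part).
          rs = [np.linalg.qr(block, mode="r") for block in blocks]
          # Level 2: QR of the stacked, much smaller R factors.
          r = np.linalg.qr(np.vstack(rs), mode="r")
          # The small R shares its singular values and right vectors with X.
          _, s, vt = np.linalg.svd(r)
          return s, vt

      rng = np.random.default_rng(4)
      X = rng.random((100, 10))
      s, vt = tsqr_svd_sketch(X)

   Only the small R factors are combined, so no single step needs the full
   array in one place.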

   .. attribute:: svd_solver

      Solver used when instantiating the Dask PCA estimator.

      :type: str

   .. attribute:: model

      Fitted Dask PCA estimator after `fit`.

      :type: dask_ml.decomposition.PCA or None

   .. rubric:: Notes

   This reducer requires the optional `dask-ml` backend.

   .. seealso::

      :obj:`PCAReducer`
          Standard in-memory linear PCA reducer.

      :obj:`IncrementalPCAReducer`
          Linear PCA variant for batch-wise fitting.

      :obj:`DaskTruncatedSVDReducer`
          Linear SVD-based alternative for lazy arrays.

      :obj:`IsomapReducer`
          Nonlinear manifold learner based on geodesic distances.

      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving embedding.

      :obj:`UMAPReducer`
          Nonlinear graph-based embedding balancing local and global structure.

   .. rubric:: Examples

   >>> import dask.array as da
   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import DaskPCAReducer
   >>> X = da.from_array(np.random.rand(100, 10), chunks=(25, 10))
   >>> reducer = DaskPCAReducer(n_components=2, svd_solver="tsqr")
   >>> _ = reducer.fit(X)
   >>> reducer.transform(X).shape
   (100, 2)


   .. py:property:: capabilities
      :type: dict


      Return capability metadata for Dask PCA.

      :returns: Capability mapping describing Dask PCA as a linear component-based
                reducer.
      :rtype: dict


   .. py:attribute:: svd_solver
      :value: 'auto'


   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> DaskPCAReducer

      Fit Dask PCA on the input data.

      :param X: Training data, typically a Dask array or a compatible array-like
                object accepted by the Dask backend.
      :type X: ArrayLike
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: DaskPCAReducer

      :raises ImportError: If `dask-ml` is not installed.
      :raises RuntimeError: If `dask-ml` is installed but fails during initialization.

      .. rubric:: Examples

      >>> import dask.array as da
      >>> import numpy as np
      >>> from coco_pipe.dim_reduction import DaskPCAReducer
      >>> X = da.from_array(np.random.rand(40, 8), chunks=(20, 8))
      >>> reducer = DaskPCAReducer(n_components=2)
      >>> _ = reducer.fit(X)
      >>> reducer.model is not None
      True


   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> Any

      Project data using the fitted Dask PCA model.

      :param X: Data to project.
      :type X: ArrayLike

      :returns: Backend-specific transformed output, typically a Dask array.
      :rtype: Any

      :raises RuntimeError: If the reducer has not been fitted.


   .. py:method:: get_components() -> numpy.ndarray

      Return the Dask PCA component loading matrix.

      :returns: Principal component loading matrix or Dask-backed equivalent.
      :rtype: np.ndarray

      :raises RuntimeError: If the reducer has not been fitted.


.. py:class:: DaskTruncatedSVDReducer(n_components: int = 2, algorithm: str = 'tsqr', **kwargs)

   Bases: :py:obj:`coco_pipe.dim_reduction.reducers.base.BaseReducer`


   Dask-ML Truncated SVD reducer.

   This reducer wraps `dask_ml.decomposition.TruncatedSVD` and provides a
   linear projection for lazy or distributed arrays.
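
   Unlike PCA, truncated SVD is generally applied to the data as given, with
   no mean subtraction. The NumPy sketch below contrasts the two under that
   assumption; it does not call the Dask backend, and all names are
   illustrative.

   .. code-block:: python

      import numpy as np

      rng = np.random.default_rng(5)
      X = rng.random((120, 15)) + 3.0  # strong non-zero mean on purpose

      # Truncated SVD: factor X itself, with no centering.
      _, _, Vt = np.linalg.svd(X, full_matrices=False)
      Z_svd = X @ Vt[:3].T

      # PCA for comparison: the same steps on centered data.
      Xc = X - X.mean(axis=0)
      _, _, Vt_c = np.linalg.svd(Xc, full_matrices=False)
      Z_pca = Xc @ Vt_c[:3].T

   With uncentered data, the first SVD component mostly tracks the mean
   offset, so centering beforehand may be worthwhile when PCA-like behavior
   is expected.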

   :param n_components: Number of components to keep.
   :type n_components: int, default=2
   :param algorithm: SVD algorithm used by the Dask backend.
   :type algorithm: {"tsqr", "randomized"}, default="tsqr"
   :param \*\*kwargs: Additional keyword arguments forwarded to
                      `dask_ml.decomposition.TruncatedSVD` after signature filtering.
   :type \*\*kwargs: dict
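
   For the ``"randomized"`` option, the general technique is a randomized
   range finder followed by a small exact SVD. The NumPy sketch below shows
   that technique in its simplest form, without power iterations, and is not
   the backend's code; `randomized_svd_sketch` and its parameters are
   hypothetical.

   .. code-block:: python

      import numpy as np

      def randomized_svd_sketch(X, n_components, n_oversamples=5, seed=0):
          """Approximate leading singular triplets via a random range finder."""
          rng = np.random.default_rng(seed)
          k = n_components + n_oversamples
          # Sample the column space of X with a random Gaussian test matrix.
          Y = X @ rng.standard_normal((X.shape[1], k))
          Q, _ = np.linalg.qr(Y)
          # Solve the small problem in the reduced basis, then map back.
          U_small, s, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
          U = Q @ U_small
          return U[:, :n_components], s[:n_components], Vt[:n_components]

      rng = np.random.default_rng(6)
      # Exactly rank-3 data, so three components capture it fully.
      X = rng.random((120, 3)) @ rng.random((3, 15))
      U, s, Vt = randomized_svd_sketch(X, n_components=3)

   On data that is only approximately low rank, the result is an
   approximation whose quality depends on the oversampling and on how fast
   the singular values decay.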

   .. attribute:: algorithm

      SVD algorithm used when instantiating the backend estimator.

      :type: str

   .. attribute:: model

      Fitted TruncatedSVD estimator after `fit`.

      :type: dask_ml.decomposition.TruncatedSVD or None

   .. rubric:: Notes

   This reducer requires the optional `dask-ml` backend.

   .. seealso::

      :obj:`PCAReducer`
          Standard in-memory linear PCA reducer.

      :obj:`IncrementalPCAReducer`
          Linear PCA variant for batch-wise fitting.

      :obj:`DaskPCAReducer`
          Linear PCA variant for lazy or distributed arrays.

      :obj:`IsomapReducer`
          Nonlinear manifold learner based on geodesic distances.

      :obj:`TSNEReducer`
          Nonlinear neighborhood-preserving embedding.

      :obj:`UMAPReducer`
          Nonlinear graph-based embedding balancing local and global structure.

   .. rubric:: Examples

   >>> import dask.array as da
   >>> import numpy as np
   >>> from coco_pipe.dim_reduction import DaskTruncatedSVDReducer
   >>> X = da.from_array(np.random.rand(120, 15), chunks=(30, 15))
   >>> reducer = DaskTruncatedSVDReducer(n_components=3, algorithm="randomized")
   >>> _ = reducer.fit(X)
   >>> reducer.transform(X).shape
   (120, 3)


   .. py:property:: capabilities
      :type: dict


      Return capability metadata for Dask Truncated SVD.

      :returns: Capability mapping describing Dask Truncated SVD as a linear
                component-based reducer.
      :rtype: dict


   .. py:attribute:: algorithm
      :value: 'tsqr'


   .. py:method:: fit(X: coco_pipe.dim_reduction.reducers.base.ArrayLike, y: Optional[coco_pipe.dim_reduction.reducers.base.ArrayLike] = None) -> DaskTruncatedSVDReducer

      Fit Dask Truncated SVD on the input data.

      :param X: Training data, typically a Dask array or compatible array-like
                object accepted by the backend.
      :type X: ArrayLike
      :param y: Ignored. Present for API compatibility.
      :type y: ArrayLike, optional

      :returns: Fitted reducer instance.
      :rtype: DaskTruncatedSVDReducer

      :raises ImportError: If `dask-ml` is not installed.
      :raises RuntimeError: If `dask-ml` is installed but fails during initialization.

      .. rubric:: Examples

      >>> import dask.array as da
      >>> import numpy as np
      >>> from coco_pipe.dim_reduction import DaskTruncatedSVDReducer
      >>> X = da.from_array(np.random.rand(40, 8), chunks=(20, 8))
      >>> reducer = DaskTruncatedSVDReducer(n_components=2)
      >>> _ = reducer.fit(X)
      >>> reducer.model is not None
      True


   .. py:method:: transform(X: coco_pipe.dim_reduction.reducers.base.ArrayLike) -> Any

      Project data using the fitted Dask Truncated SVD model.

      :param X: Data to project.
      :type X: ArrayLike

      :returns: Backend-specific transformed output, typically a Dask array.
      :rtype: Any

      :raises RuntimeError: If the reducer has not been fitted.


   .. py:method:: get_components() -> numpy.ndarray

      Return the Truncated SVD component loading matrix.

      :returns: Component loading matrix or Dask-backed equivalent.
      :rtype: np.ndarray

      :raises RuntimeError: If the reducer has not been fitted.