coco_pipe.report.quality
========================

.. py:module:: coco_pipe.report.quality

.. autoapi-nested-parse::

   Data Quality Checks
   ===================

   Functions for performing automated quality checks on data containers.
   These checks are used by the Report engine to flag potential issues.


Classes
-------

.. autoapisummary::

   coco_pipe.report.quality.CheckResult


Functions
---------

.. autoapisummary::

   coco_pipe.report.quality.check_missingness
   coco_pipe.report.quality.check_constant_columns
   coco_pipe.report.quality.check_outliers_zscore
   coco_pipe.report.quality.check_flatline


Module Contents
---------------

.. py:class:: CheckResult

   Result of a data quality check.

   .. attribute:: check_name

      Name of the check (e.g., "Missing Values").

      :type: str

   .. attribute:: status

      "OK", "WARN", or "FAIL".

      :type: str

   .. attribute:: message

      Human-readable description of the issue.

      :type: str

   .. attribute:: severity

      0 (Info) to 10 (Critical).

      :type: int

   .. attribute:: metric_name

      Name of the metric evaluated (e.g., "missing_pct").

      :type: str, optional

   .. attribute:: metric_value

      Value of the metric.

      :type: float, optional

   .. rubric:: Examples

   >>> res = CheckResult("Missingness", "FAIL", "Too many NaNs", 9)
   >>> res.is_issue
   True


   .. py:attribute:: check_name
      :type: str


   .. py:attribute:: status
      :type: str


   .. py:attribute:: message
      :type: str


   .. py:attribute:: severity
      :type: int


   .. py:attribute:: metric_name
      :type: Optional[str]
      :value: None


   .. py:attribute:: metric_value
      :type: Optional[float]
      :value: None


   .. py:property:: is_issue
      :type: bool

      Return True if status is WARN or FAIL.


.. py:function:: check_missingness(df: Union[pandas.DataFrame, numpy.ndarray], threshold_warn: float = 0.01, threshold_fail: float = 0.2) -> CheckResult

   Check for missing values (NaNs).

   :param df: The data to check.
   :type df: DataFrame or ndarray
   :param threshold_warn: Ratio of NaNs to trigger a warning. Default 0.01 (1%).
   :type threshold_warn: float
   :param threshold_fail: Ratio of NaNs to trigger a failure. Default 0.20 (20%).
   :type threshold_fail: float

   :returns: Quality check result.
   :rtype: CheckResult

   .. rubric:: Examples

   >>> data = np.array([1, 2, np.nan, 4])
   >>> check_missingness(data, threshold_warn=0.1)
   CheckResult(check_name='Missingness', status='FAIL', ...)


.. py:function:: check_constant_columns(df: Union[pandas.DataFrame, numpy.ndarray]) -> List[CheckResult]

   Check for columns/features with zero variance.

   :param df: The data to check.
   :type df: DataFrame or ndarray

   :returns: List of findings. Empty if no constant columns found.
   :rtype: List[CheckResult]

   .. rubric:: Examples

   >>> df = pd.DataFrame({'a': [1,1,1], 'b': [1,2,3]})
   >>> check_constant_columns(df)
   [CheckResult(check_name='Constant Features', ...)]


.. py:function:: check_outliers_zscore(df: Union[pandas.DataFrame, numpy.ndarray], sigma: float = 5.0) -> Optional[CheckResult]

   Check for extreme values (> sigma).

   Uses a simple global Z-score approach.

   :param df: Data to check.
   :type df: DataFrame or ndarray
   :param sigma: Z-score threshold. Default 5.0.
   :type sigma: float

   :returns: CheckResult if outliers found, else None.
   :rtype: Optional[CheckResult]


.. py:function:: check_flatline(signal: numpy.ndarray, threshold: float = 1e-10) -> CheckResult

   Check if signal is effectively dead (flatline).

   :param signal: 1D signal array or flattened data.
   :type signal: ndarray
   :param threshold: Standard deviation threshold. Default 1e-10.
   :type threshold: float

   :returns: Result indicating if signal is flat.
   :rtype: CheckResult
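
The checks documented above can be illustrated with a minimal, self-contained sketch. It is not the ``coco_pipe`` implementation. ``SketchResult`` and the ``sketch_*`` functions are hypothetical stand-ins that mirror the documented fields and signatures. The inclusive thresholds, the severity numbers, the message strings, the status assigned to each finding, and the one-finding-per-column behavior are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class SketchResult:
    """Hypothetical stand-in carrying the fields documented for CheckResult."""
    check_name: str
    status: str  # "OK", "WARN", or "FAIL"
    message: str
    severity: int  # 0 (Info) to 10 (Critical)
    metric_name: Optional[str] = None
    metric_value: Optional[float] = None

    @property
    def is_issue(self) -> bool:
        return self.status in ("WARN", "FAIL")


def sketch_missingness(data, threshold_warn=0.01, threshold_fail=0.2) -> SketchResult:
    # Assumption: the NaN ratio is taken over all cells; thresholds are inclusive.
    ratio = float(np.isnan(np.asarray(data, dtype=float)).mean())
    if ratio >= threshold_fail:
        status, severity = "FAIL", 9
    elif ratio >= threshold_warn:
        status, severity = "WARN", 5
    else:
        status, severity = "OK", 0
    msg = f"{ratio:.1%} of values are NaN"
    return SketchResult("Missingness", status, msg, severity, "missing_pct", ratio)


def sketch_constant_columns(data) -> List[SketchResult]:
    # Assumption: one finding per zero-variance column, NaNs ignored.
    arr = np.asarray(data, dtype=float)
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)
    return [
        SketchResult("Constant Features", "WARN", f"Column {j} is constant", 5, "std", 0.0)
        for j in range(arr.shape[1])
        if np.nanstd(arr[:, j]) == 0.0
    ]


def sketch_outliers_zscore(data, sigma=5.0) -> Optional[SketchResult]:
    # "Global" here means a single mean and std over every value in the container.
    arr = np.asarray(data, dtype=float).ravel()
    std = np.nanstd(arr)
    if not std > 0:  # zero or NaN spread: Z-scores are undefined
        return None
    z = np.abs((arr - np.nanmean(arr)) / std)
    n_out = int((z > sigma).sum())  # NaN comparisons evaluate to False
    if n_out == 0:
        return None
    msg = f"{n_out} value(s) beyond {sigma} sigma"
    return SketchResult("Outliers", "WARN", msg, 5, "n_outliers", float(n_out))


def sketch_flatline(signal, threshold=1e-10) -> SketchResult:
    # A signal counts as flat when its standard deviation falls below the threshold.
    std = float(np.nanstd(np.asarray(signal, dtype=float)))
    if std < threshold:
        return SketchResult("Flatline", "FAIL", "Signal is flat", 9, "std", std)
    return SketchResult("Flatline", "OK", "Signal varies", 0, "std", std)


data = np.array([1.0, 2.0, np.nan, 4.0])
print(sketch_missingness(data, threshold_warn=0.1).status)  # FAIL, since 0.25 >= 0.2
```

Note that a global Z-score over ``n`` values cannot exceed ``(n - 1) / sqrt(n)``, so a threshold of 5.0 can only be crossed when the container holds more than about 27 values. On small inputs the outlier check returns ``None`` regardless of how extreme a single value looks.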