coco_pipe.report.quality
Functions for performing automated quality checks on data containers. These checks are used by the Report engine to flag potential issues.
Classes
CheckResult | Result of a data quality check.
Functions
check_missingness | Check for missing values (NaNs).
check_constant_columns | Check for columns/features with zero variance.
check_outliers_zscore | Check for extreme values (> sigma).
check_flatline | Check if signal is effectively dead (flatline).
Module Contents
- class coco_pipe.report.quality.CheckResult
Result of a data quality check.
- check_name
Name of the check (e.g., “Missing Values”).
- Type:
str
- status
“OK”, “WARN”, or “FAIL”.
- Type:
str
- message
Human-readable description of the issue.
- Type:
str
- severity
0 (Info) to 10 (Critical).
- Type:
int
- metric_name
Name of the metric evaluated (e.g., “missing_pct”).
- Type:
str, optional
- metric_value
Value of the metric.
- Type:
float, optional
Examples
>>> res = CheckResult("Missingness", "FAIL", "Too many NaNs", 9)
>>> res.is_issue
True
- check_name: str
- status: str
- message: str
- severity: int
- metric_name: str | None = None
- metric_value: float | None = None
- property is_issue: bool
Return True if status is WARN or FAIL.
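The fields and the `is_issue` property described above can be sketched as a small dataclass. This is an illustration only: `CheckResultSketch` is a hypothetical stand-in, and treating the real class as a dataclass is an assumption based on the documented attribute defaults and the positional construction in the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResultSketch:
    """Hypothetical stand-in mirroring the documented CheckResult fields."""

    check_name: str
    status: str  # "OK", "WARN", or "FAIL"
    message: str
    severity: int  # 0 (Info) to 10 (Critical)
    metric_name: Optional[str] = None
    metric_value: Optional[float] = None

    @property
    def is_issue(self) -> bool:
        # Documented behaviour: True when status is WARN or FAIL.
        return self.status in ("WARN", "FAIL")


# Illustrative results; the messages and metric values are made up.
results = [
    CheckResultSketch("Flatline", "OK", "Signal varies", 0),
    CheckResultSketch("Constant Features", "WARN", "Column 'a' is constant", 5),
    CheckResultSketch("Missingness", "FAIL", "Too many NaNs", 9, "missing_pct", 0.25),
]

# Keep only the issues and list the most severe first.
issues = sorted(
    (r for r in results if r.is_issue),
    key=lambda r: r.severity,
    reverse=True,
)
for r in issues:
    print(f"[{r.status}] {r.check_name}: {r.message}")
```

Filtering on `is_issue` and sorting by `severity` is one plausible way a report could order its findings; how the Report engine actually consumes results is not stated here.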
- coco_pipe.report.quality.check_missingness(df: pandas.DataFrame | numpy.ndarray, threshold_warn: float = 0.01, threshold_fail: float = 0.2) -> CheckResult
Check for missing values (NaNs).
- Parameters:
df (DataFrame or ndarray) – The data to check.
threshold_warn (float) – Ratio of NaNs to trigger a warning. Default 0.01 (1%).
threshold_fail (float) – Ratio of NaNs to trigger a failure. Default 0.20 (20%).
- Returns:
Quality check result.
- Return type:
CheckResult
Examples
>>> data = np.array([1, 2, np.nan, 4])
>>> check_missingness(data, threshold_warn=0.1)
CheckResult(check_name='Missingness', status='FAIL', ...)
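The two-threshold logic can be sketched with NumPy as follows. This is not the library's implementation: `missing_ratio_status` is a hypothetical helper, and the use of inclusive comparisons (`>=`) at the thresholds is an assumption, since the documentation does not say how boundary values are treated.

```python
import numpy as np


def missing_ratio_status(values, threshold_warn=0.01, threshold_fail=0.2):
    """Hypothetical helper: return (NaN ratio, status) for numeric data."""
    arr = np.asarray(values, dtype=float)
    # Fraction of all entries that are NaN; empty input counts as 0.0.
    ratio = float(np.isnan(arr).mean()) if arr.size else 0.0
    # Assumption: thresholds are inclusive.
    if ratio >= threshold_fail:
        status = "FAIL"
    elif ratio >= threshold_warn:
        status = "WARN"
    else:
        status = "OK"
    return ratio, status


# One NaN out of four values is a ratio of 0.25, above the 0.20 fail threshold.
ratio, status = missing_ratio_status([1, 2, np.nan, 4], threshold_warn=0.1)
print(ratio, status)
```

This matches the documented example, where a single NaN in four values yields a `FAIL` status regardless of the raised warning threshold.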
- coco_pipe.report.quality.check_constant_columns(df: pandas.DataFrame | numpy.ndarray) -> List[CheckResult]
Check for columns/features with zero variance.
- Parameters:
df (DataFrame or ndarray) – The data to check.
- Returns:
List of findings. Empty if no constant columns found.
- Return type:
List[CheckResult]
Examples
>>> df = pd.DataFrame({'a': [1,1,1], 'b': [1,2,3]})
>>> check_constant_columns(df)
[CheckResult(check_name='Constant Features', ...)]
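A minimal sketch of zero-variance detection on a 2D array is shown below. `constant_column_indices` is a hypothetical helper, not part of coco_pipe, and comparing each column's minimum with its maximum (rather than computing a variance) is a choice made for this sketch.

```python
import numpy as np


def constant_column_indices(data):
    """Hypothetical helper: indices of columns whose values never change."""
    arr = np.asarray(data, dtype=float)
    if arr.ndim == 1:
        # Treat a 1D input as a single column.
        arr = arr.reshape(-1, 1)
    # A column is constant exactly when its minimum equals its maximum.
    # This avoids rounding error that a computed variance can introduce.
    constant = arr.min(axis=0) == arr.max(axis=0)
    return np.flatnonzero(constant).tolist()


# Column 0 is constant, column 1 varies (same data as the example above).
table = np.array([[1, 1], [1, 2], [1, 3]])
print(constant_column_indices(table))
```

In this sketch a column containing NaN is never reported as constant, because comparisons with NaN are false; how the real function treats NaNs is not documented here.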
- coco_pipe.report.quality.check_outliers_zscore(df: pandas.DataFrame | numpy.ndarray, sigma: float = 5.0) -> CheckResult | None
Check for extreme values (> sigma). Uses a simple global Z-score approach.
- Parameters:
df (DataFrame or ndarray) – Data to check.
sigma (float) – Z-score threshold. Default 5.0.
- Returns:
CheckResult if outliers found, else None.
- Return type:
Optional[CheckResult]
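One way to read "global Z-score" is that a single mean and standard deviation are computed over all values, rather than per column. The sketch below follows that reading. `count_global_outliers` is a hypothetical helper, and the choices to ignore NaNs, use the population standard deviation, and compare absolute Z-scores are assumptions, not documented behaviour.

```python
import numpy as np


def count_global_outliers(data, sigma=5.0):
    """Hypothetical helper: count values with |z| > sigma over all entries."""
    arr = np.asarray(data, dtype=float).ravel()
    arr = arr[~np.isnan(arr)]  # assumption: NaNs are ignored
    if arr.size == 0:
        return 0
    std = arr.std()  # population standard deviation (ddof=0)
    if std == 0:
        return 0  # constant data has no outliers
    z = np.abs(arr - arr.mean()) / std
    return int((z > sigma).sum())


# 100 zeros and one large spike: only the spike exceeds 5 sigma.
spiky = np.concatenate([np.zeros(100), [1000.0]])
print(count_global_outliers(spiky))
```

In this sketch no value can have an absolute Z-score above sqrt(n - 1) for n finite values, so with the default `sigma` of 5.0 an input of fewer than 27 values can never be flagged, however extreme one of them is.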
- coco_pipe.report.quality.check_flatline(signal: numpy.ndarray, threshold: float = 1e-10) -> CheckResult
Check if signal is effectively dead (flatline).
- Parameters:
signal (ndarray) – 1D signal array or flattened data.
threshold (float) – Standard deviation threshold. Default 1e-10.
- Returns:
Result indicating if signal is flat.
- Return type:
CheckResult
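The standard-deviation test can be sketched as below. `is_flat` is a hypothetical helper rather than the library function, and the strict comparison (`<`) against the threshold is an assumption.

```python
import numpy as np


def is_flat(signal, threshold=1e-10):
    """Hypothetical helper: True when the signal's std is below threshold."""
    arr = np.asarray(signal, dtype=float).ravel()
    # Assumption: strict comparison against the threshold.
    return bool(arr.std() < threshold)


dead = np.full(50, 3.7)                         # constant, non-zero offset
alive = np.sin(np.linspace(0.0, 2 * np.pi, 100))  # one period of a sine wave

print(is_flat(dead), is_flat(alive))
```

A small positive threshold, rather than an exact comparison with zero, tolerates the rounding error that can make the computed standard deviation of a constant floating-point signal slightly non-zero.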