coco_pipe.report.quality

Functions for performing automated quality checks on data containers. These checks are used by the Report engine to flag potential issues.

Classes

CheckResult

Result of a data quality check.

Functions

check_missingness(→ CheckResult)

Check for missing values (NaNs).

check_constant_columns(→ List[CheckResult])

Check for columns/features with zero variance.

check_outliers_zscore(→ Optional[CheckResult])

Check for extreme values (Z-score greater than sigma).

check_flatline(→ CheckResult)

Check if signal is effectively dead (flatline).

Module Contents

class coco_pipe.report.quality.CheckResult

Result of a data quality check.

check_name

Name of the check (e.g., “Missing Values”).

Type:

str

status

“OK”, “WARN”, or “FAIL”.

Type:

str

message

Human-readable description of the issue.

Type:

str

severity

0 (Info) to 10 (Critical).

Type:

int

metric_name

Name of the metric evaluated (e.g., “missing_pct”).

Type:

str, optional

metric_value

Value of the metric.

Type:

float, optional

Examples

>>> res = CheckResult("Missingness", "FAIL", "Too many NaNs", 9)
>>> res.is_issue
True

check_name: str
status: str
message: str
severity: int
metric_name: str | None = None
metric_value: float | None = None
property is_issue: bool

Return True if status is WARN or FAIL.
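The fields and the is_issue property described above could be modeled with a dataclass. The following is a minimal sketch, not the coco_pipe source: the class name CheckResultSketch is hypothetical, and only the documented fields and the documented is_issue rule are reproduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResultSketch:
    """Hypothetical stand-in mirroring the documented CheckResult fields."""

    check_name: str
    status: str  # "OK", "WARN", or "FAIL"
    message: str
    severity: int  # 0 (Info) to 10 (Critical)
    metric_name: Optional[str] = None
    metric_value: Optional[float] = None

    @property
    def is_issue(self) -> bool:
        # Documented rule: True when status is WARN or FAIL.
        return self.status in ("WARN", "FAIL")


res = CheckResultSketch("Missingness", "FAIL", "Too many NaNs", 9)
ok = CheckResultSketch("Missingness", "OK", "No NaNs found", 0)
```

Here res.is_issue is True and ok.is_issue is False, so a report could filter a list of results down to those worth displaying.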

coco_pipe.report.quality.check_missingness(df: pandas.DataFrame | numpy.ndarray, threshold_warn: float = 0.01, threshold_fail: float = 0.2) → CheckResult

Check for missing values (NaNs).

Parameters:
  • df (DataFrame or ndarray) – The data to check.

  • threshold_warn (float) – Ratio of NaNs to trigger a warning. Default 0.01 (1%).

  • threshold_fail (float) – Ratio of NaNs to trigger a failure. Default 0.20 (20%).

Returns:

Quality check result.

Return type:

CheckResult

Examples

>>> data = np.array([1, 2, np.nan, 4])
>>> check_missingness(data, threshold_warn=0.1)
CheckResult(check_name='Missingness', status='FAIL', ...)
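The example reports FAIL because one of four values is NaN, a ratio of 0.25, which exceeds the default threshold_fail of 0.2. The threshold logic can be sketched in plain Python as below. This is an illustration under assumptions, not the library's implementation: the helper name missing_status is hypothetical, it takes a flat list rather than a DataFrame or ndarray, and it assumes a ratio at or above a threshold triggers that status (the library may use a strict comparison).

```python
import math


def missing_status(values, threshold_warn=0.01, threshold_fail=0.2):
    """Return (nan_ratio, status) for a flat sequence of numbers.

    Hypothetical helper; assumes ">=" comparisons against the thresholds.
    """
    if not values:
        # Nothing to check, so nothing is missing.
        return 0.0, "OK"
    n_missing = sum(1 for v in values if math.isnan(v))
    ratio = n_missing / len(values)
    if ratio >= threshold_fail:
        return ratio, "FAIL"
    if ratio >= threshold_warn:
        return ratio, "WARN"
    return ratio, "OK"


ratio, status = missing_status([1.0, 2.0, float("nan"), 4.0], threshold_warn=0.1)
```

The failure threshold is tested first so that a ratio exceeding both thresholds is reported as FAIL rather than WARN.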

coco_pipe.report.quality.check_constant_columns(df: pandas.DataFrame | numpy.ndarray) → List[CheckResult]

Check for columns/features with zero variance.

Parameters:

df (DataFrame or ndarray) – The data to check.

Returns:

List of findings. Empty if no constant columns found.

Return type:

List[CheckResult]

Examples

>>> df = pd.DataFrame({'a': [1,1,1], 'b': [1,2,3]})
>>> check_constant_columns(df)
[CheckResult(check_name='Constant Features', ...)]
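In the example, column 'a' holds the same value in every row, so it has zero variance, while 'b' varies. A dependency-free sketch of that detection follows. It is an assumption-laden illustration rather than the library code: the helper name constant_columns is hypothetical, columns are passed as a dict of lists instead of a DataFrame, and "zero variance" is approximated by exact equality of all values.

```python
def constant_columns(columns):
    """Return the names of columns whose values are all identical.

    Hypothetical helper: `columns` maps a column name to a list of values.
    Empty columns are skipped because they carry no evidence either way.
    """
    constant = []
    for name, values in columns.items():
        if len(values) > 0 and all(v == values[0] for v in values):
            constant.append(name)
    return constant


found = constant_columns({"a": [1, 1, 1], "b": [1, 2, 3]})
```

With this input, found contains only 'a', mirroring the single finding in the documented example.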

coco_pipe.report.quality.check_outliers_zscore(df: pandas.DataFrame | numpy.ndarray, sigma: float = 5.0) → CheckResult | None

Check for extreme values (Z-score greater than sigma). Uses a simple global Z-score approach.

Parameters:
  • df (DataFrame or ndarray) – Data to check.

  • sigma (float) – Z-score threshold. Default 5.0.

Returns:

CheckResult if outliers found, else None.

Return type:

Optional[CheckResult]
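One way to read "global Z-score" is that a single mean and standard deviation are computed over all values, instead of one pair per column. The sketch below follows that reading. It is not the library's implementation: the helper name count_global_outliers is hypothetical, and the use of the absolute Z-score and the population standard deviation are assumptions.

```python
import statistics


def count_global_outliers(values, sigma=5.0):
    """Count values whose absolute Z-score exceeds `sigma`.

    Hypothetical helper; mean and std are computed once over all values.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # All values are identical, so no value can be an outlier.
        return 0
    return sum(1 for v in values if abs(v - mean) / std > sigma)


# 100 baseline samples plus one extreme spike.
n_outliers = count_global_outliers([0.0] * 100 + [1000.0])
```

A count of zero would correspond to the documented None return, and any positive count to a CheckResult describing the outliers.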

coco_pipe.report.quality.check_flatline(signal: numpy.ndarray, threshold: float = 1e-10) → CheckResult

Check if signal is effectively dead (flatline).

Parameters:
  • signal (ndarray) – 1D signal array or flattened data.

  • threshold (float) – Standard deviation threshold below which the signal is treated as flat. Default 1e-10.

Returns:

Result indicating if signal is flat.

Return type:

CheckResult
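The flatline test reduces to comparing the standard deviation of the signal with a very small threshold. A minimal sketch, assuming a strict "less than" comparison and the population standard deviation; the helper name is_flatline is hypothetical and it takes a plain list rather than an ndarray:

```python
import statistics


def is_flatline(signal, threshold=1e-10):
    """Return True if the signal's standard deviation is below `threshold`.

    Hypothetical helper; assumes a non-empty 1D sequence of numbers.
    """
    return statistics.pstdev(signal) < threshold


dead = is_flatline([0.5] * 100)            # constant signal
alive = is_flatline([0.0, 1.0, 0.0, 1.0])  # signal with visible variation
```

A near-zero threshold such as 1e-10 flags only signals with essentially no variation, so a low-amplitude but live signal is not reported as dead.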