Hyfe CoughMonitor Suite (CMS) V3+ Evidence Dossier

Inter- and intra-labeler agreement

The reliability of any cough detection system depends on how accurately and objectively the underlying phenomenon, cough, can be detected and quantified in the first place, including by the human annotators who produce the reference labels. Accordingly, Hyfe has worked closely with external researchers to quantify intra- and inter-labeler agreement in the labeling of real-world cough sounds.

The results of these collaborations are reported in the following publications:

Chaccour C, Sánchez-Olivieri I, Siegel S, et al. Validation and accuracy of the Hyfe cough monitoring system: a multicenter clinical study. Sci Rep. 2025;15:880. https://doi.org/10.1038/s41598-025-85341-3
Sanchez-Olivieri I, Rudd M, Gabaldon-Figueira JC, et al. Performance evaluation of human cough annotators: optimal metrics and sex differences. BMJ Open Respir Res. 2023;10:e001942. https://doi.org/10.1136/bmjresp-2023-001942

In summary:

Intra-labeler (intrarater) agreement was very high, for example Pearson’s r ≈ 0.98 (Sanchez-Olivieri et al., BMJ Open Respiratory Research, 2023).
Inter-labeler (interrater) agreement was also high, for example Pearson’s r ≈ 0.96.
Both studies demonstrate strong intrarater and interrater agreement in cough-event annotation, supporting the reliability of the human ground truth in Hyfe’s datasets.
Sánchez-Olivieri et al. show that agreement depends on the unit of analysis (cough seconds vs. cough counts), so the choice of unit is an important methodological factor when annotations are compared.
Together, these findings give confidence that Hyfe’s labeling protocols are reproducible and consistent enough to support high-quality algorithm training and validation workflows.
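To illustrate how a Pearson-based agreement figure is computed, and why the unit of analysis matters, the following is a minimal sketch, not the method used in the cited studies. It assumes (hypothetically) that each annotator's labels are stored as lists of cough onset times in seconds, one list per recording, and that a "cough second" is a whole-second window, aligned to the recording start, containing at least one labeled cough. The function names and the toy data are invented for illustration and are not results from either study.

```python
"""Sketch: agreement between two cough annotators under two units of analysis.

Hypothetical assumptions (not taken from the cited studies):
- labels are cough onset times in seconds, one list per recording;
- a "cough second" is a whole-second window, aligned to the recording
  start, that contains at least one labeled cough onset.
"""
from collections.abc import Callable
from math import floor, sqrt
from statistics import mean

Recording = list[float]  # hypothetical: cough onset times (s) in one recording


def cough_count(onsets: Recording) -> int:
    """Number of individual cough events labeled in one recording."""
    return len(onsets)


def cough_seconds(onsets: Recording) -> int:
    """Number of distinct whole-second windows containing >= 1 cough onset."""
    return len({floor(t) for t in onsets})


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one sequence is constant")
    return sxy / sqrt(sxx * syy)


def agreement(
    labeler_a: list[Recording],
    labeler_b: list[Recording],
    unit: Callable[[Recording], int],
) -> float:
    """Pearson r between two labelers' per-recording totals under `unit`."""
    a = [unit(rec) for rec in labeler_a]
    b = [unit(rec) for rec in labeler_b]
    return pearson_r(a, b)


# Made-up toy annotations of the same three recordings by two labelers.
LABELER_A: list[Recording] = [
    [0.2, 0.7, 5.1, 12.4],
    [3.3],
    [1.0, 1.4, 1.8, 20.0, 40.5],
]
LABELER_B: list[Recording] = [
    [0.2, 5.1, 12.4],
    [3.3, 9.9],
    [1.0, 1.5, 20.0, 40.5, 40.8],
]

if __name__ == "__main__":
    for name, unit in (("cough counts", cough_count), ("cough seconds", cough_seconds)):
        print(f"{name}: r = {agreement(LABELER_A, LABELER_B, unit):.3f}")
```

On this toy input the two labelers disagree about coughs that fall within the same second (for example 0.2 s vs. 0.7 s), so their totals agree better when measured in cough seconds than in cough counts. Note that Pearson's r measures linear association rather than absolute agreement: a labeler who systematically labeled twice as many coughs would still obtain r = 1.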