Hyfe CoughMonitor Suite (CMS) V3+ Evidence Dossier

Algorithm Development: Principles and Practice

This document describes the principles, processes, and best practices used in the development of the Hyfe cough detection algorithm.

Overview of the Hyfe Cough Detection Algorithm

The Hyfe cough detection algorithm is an on-device machine learning system that runs on a wrist-worn smartwatch. Its purpose is to detect cough events from a stream of real-time, continuous audio, and generate corresponding timestamps. The algorithm is optimized for:

The system continuously acquires audio from an onboard microphone and processes it through a two-step algorithm. First, a lightweight feature-extraction pipeline identifies onset/explosive events. Second, an ML classifier categorizes each event as cough or non-cough. When a cough signature is detected, the system stores a timestamp locally.
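The two-step flow can be illustrated with a minimal sketch. It is not Hyfe's implementation: the energy-jump onset rule, the `onset_ratio` parameter, the `CoughEvent` type, and the `classifier` callable are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class CoughEvent:
    timestamp_s: float  # seconds from the start of the stream


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame."""
    return sum(x * x for x in frame) / len(frame)


def detect_coughs(
    frames: Sequence[Sequence[float]],
    frame_duration_s: float,
    classifier: Callable[[Sequence[float]], bool],
    onset_ratio: float = 4.0,  # assumed threshold, not from the source
) -> List[CoughEvent]:
    """Stage 1 flags candidate onsets; stage 2 classifies each candidate."""
    events: List[CoughEvent] = []
    prev_energy = None
    for i, frame in enumerate(frames):
        energy = frame_energy(frame)
        # Stage 1 (assumed rule): a sharp energy jump over the previous frame.
        is_onset = prev_energy is not None and energy > onset_ratio * max(prev_energy, 1e-9)
        prev_energy = energy
        # Stage 2: the classifier runs only on frames that passed stage 1.
        if is_onset and classifier(frame):
            events.append(CoughEvent(timestamp_s=i * frame_duration_s))
    return events
```

Running the inexpensive onset check first means the classifier is invoked only on candidate frames, which is one plausible way to keep the average compute cost low on a wearable.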

Principles Guiding Algorithm Development

Although the algorithm is not a medical device, its development process followed well-established machine learning and medical device software practices:

Dataset Composition and Data Provenance

Algorithm development requires a large collection of raw acoustic data representing both cough and non-cough events across a variety of environmental and physiological contexts. The training/development dataset consists of continuous audio recordings collected from individuals wearing or using an audio capture device (the ID206 device itself, phones, or third-party wrist-worn audio recorders) in both real-world and controlled settings. Data sources included:

Strict controls governed data provenance:

High-quality labels are critical, as the algorithm’s performance depends directly on the accuracy and consistency of ground truth. Accordingly, the generation of labels is governed by a rigorous multi-step process:

Train/Test Split

The dataset used for algorithm training was partitioned using best practices that prevent overfitting and ensure generalization:

The classifier training dataset consisted of over 500,000 snippets derived from the labeling process. Snippets with ambiguous or low-confidence labels (i.e., those labeled as “far” or “not sure”) were excluded entirely from training, reducing the risk of label noise affecting model performance. In testing, these snippets were retained, and accuracy statistics were computed both with and without them.
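The handling of ambiguous labels can be sketched as follows. The snippet layout is hypothetical: the `tag` and `is_cough` fields, and the idea that ambiguous snippets still carry a reference label in testing, are assumptions made for illustration. Only the tag values “far” and “not sure” come from the text.

```python
AMBIGUOUS_TAGS = {"far", "not sure"}  # tag values named in the text


def drop_ambiguous(snippets):
    """Keep only snippets whose tag is not ambiguous (as done for training)."""
    return [s for s in snippets if s["tag"] not in AMBIGUOUS_TAGS]


def accuracy(snippets, predict):
    """Fraction of snippets where predict(snippet) matches the reference label."""
    if not snippets:
        raise ValueError("no snippets to score")
    correct = sum(predict(s) == s["is_cough"] for s in snippets)
    return correct / len(snippets)


def report(test_snippets, predict):
    """Accuracy on the test set with and without ambiguous snippets."""
    return {
        "including_ambiguous": accuracy(test_snippets, predict),
        "excluding_ambiguous": accuracy(drop_ambiguous(test_snippets), predict),
    }
```

Reporting both figures shows how much of any error is concentrated in snippets that human labelers themselves found hard to judge.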

The “holdout” dataset consists of data with the following characteristics and provenance:

Model Architecture

The model architecture was optimized for:

Hyperparameters were tuned through systematic experimentation, always against the validation set and never against the test set.
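The discipline of tuning on the validation set only can be sketched with a single illustrative hyperparameter. The decision threshold, the `(score, is_cough)` pair format, and the function names below are assumptions, not details from the source.

```python
def accuracy_at(threshold, scored):
    """Accuracy when every score >= threshold is predicted as a cough.

    `scored` is a list of (score, is_cough) pairs (hypothetical format).
    """
    return sum((score >= threshold) == is_cough for score, is_cough in scored) / len(scored)


def select_threshold(candidates, validation):
    """Pick the candidate with the best validation accuracy; test data is never consulted."""
    return max(candidates, key=lambda t: accuracy_at(t, validation))


def final_report(candidates, validation, test):
    """Tune on validation, then score the held-out test set once."""
    best = select_threshold(candidates, validation)
    return {"threshold": best, "test_accuracy": accuracy_at(best, test)}
```

Because the test set plays no part in `select_threshold`, the final test accuracy is not inflated by the tuning process.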

The algorithm was built specifically for deployment on a constrained wearable device. This meant that strict conditions had to be met in the following areas:

Privacy by Design

The cough detection algorithm was developed using a strict privacy-by-design approach in which the device never stores, transmits, or makes accessible any raw audio. All sound captured by the microphone is processed immediately through a lightweight feature-extraction pipeline, and only the resulting cough/non-cough decision and timestamp are retained. No audio files, waveforms, or spectral representations are written to persistent storage, and no audio leaves the device at any stage. This eliminates the possibility of reconstructing speech, background conversations, or other sensitive sounds, protecting users from inadvertent capture of personally identifiable information. By architecting the system so that the algorithm’s inputs exist only ephemerally and the outputs contain no acoustic content, Hyfe ensures that continuous monitoring can occur safely in intimate, everyday environments without compromising user privacy.
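The ephemeral-input idea can be sketched as follows. This is a conceptual illustration only, not the device firmware: the `EphemeralCoughLogger` class, the `is_cough` callable, and the explicit zeroing of the buffer are assumptions introduced here.

```python
class EphemeralCoughLogger:
    """Retains cough timestamps only; audio samples never outlive process()."""

    def __init__(self, is_cough):
        self._is_cough = is_cough  # hypothetical callable: frame -> bool
        self.timestamps = []       # the only state that persists

    def process(self, frame, timestamp_s):
        """Classify one mutable audio frame, then wipe it in place."""
        try:
            if self._is_cough(frame):
                self.timestamps.append(timestamp_s)
        finally:
            # Overwrite the buffer so no samples remain after this call,
            # even if the detector raised an exception.
            for i in range(len(frame)):
                frame[i] = 0.0
```

The stored output is a list of numbers with no acoustic content, so nothing in it could be used to reconstruct what the microphone heard.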

In the algorithm’s output:

Noise Handling and Robustness Measures

Real-world audio contains substantial variability. To address this, the algorithm incorporates:

Training data was intentionally curated to include both challenging and typical environments, enabling stronger generalization.

Adherence to Industry Best Practices

The development of the Hyfe cough detection algorithm followed widely recognized industry best practices for machine learning systems specifically, and software development more generally. These practices emphasize reproducibility, transparency of development processes, data governance, and robustness in real-world deployment. Hyfe adhered to these principles in the following ways:

  1. All data used for algorithm development was collected, labeled, versioned, and stored under controlled procedures, ensuring traceability and full auditability of the training corpus.
  2. The algorithm was developed using strict separation of training, validation, and hold-out test datasets, preventing data leakage and guaranteeing unbiased model evaluation.
  3. Hyfe followed best practices for documentation of model architectures, preprocessing steps, feature definitions, and iterative retraining, enabling reproducibility and controlled lifecycle management.
  4. The dataset was constructed to reflect the broad variability in user demographics, environments, and cough types expected in the real world, aligning with best practices for generalizability.
  5. The model was designed and tested according to the principles of reliable embedded ML deployment: resource-aware design, explainable inference pathways, and deterministic behavior under constrained hardware conditions.
  6. Hyfe Inc. applied privacy-by-design principles and risk-based development considerations to ensure the algorithm behaves safely under environmental unpredictability, individual diversity, and general background noise.

Collectively, these elements demonstrate alignment with modern development frameworks such as human-centered design principles, edge-AI safety guidelines, and emerging regulatory expectations for trustworthy and transparent machine-learning systems.