Provider-Specific Optimization Improves Wesper Lab Automated Scoring Accuracy

Antonio Artur Moura, M.S., Chelsie Rohrscheib, Ph.D.

Last reviewed: May 15, 2026


In brief

Wesper Lab is a Type III home sleep apnea test (HSAT) that uses an AI-driven automated scoring algorithm.

Out of the box, the algorithm shows strong agreement with provider scoring (Pearson r = 0.978, mean bias 1.83 events/hour).

Provider-specific optimization, which trains the algorithm on a sample of an individual provider’s prior scoring, further improves alignment with that provider’s scoring patterns (correlation +0.41 percentage points, p = 0.029; limits-of-agreement width narrowed by 1.30 events/hour, p = 0.019).

The work suggests automated scoring can be tuned to provider preferences, reducing manual correction time and improving scoring workflow efficiency.


Why This Matters

Automated scoring is meant to give clinicians their time back. But when an algorithm doesn’t match how a provider scores, every study comes back to the queue for manual correction — and the time savings disappear. Provider-specific optimization is a step toward automated scoring that adapts to the clinician, not the other way around. For sleep medicine programs running at volume, that’s the difference between AI as a help and AI as homework.

Abstract

Automated scoring algorithms are widely used in home sleep apnea testing (HSAT) to improve efficiency and standardization in sleep medicine. However, automated scoring systems are typically designed to apply standardized AASM scoring guidelines and therefore may not fully capture subtle differences in scoring practices among individual providers. When automated outputs differ from a provider’s scoring preferences, additional manual corrections may be required during study review. Tailoring automated scoring configurations to individual providers may improve alignment with provider scoring patterns and streamline clinical workflows.

This study evaluated whether provider-specific optimization of the Wesper Lab Home Sleep Apnea Test (HSAT) automated scoring algorithm improves agreement with provider-scored apnea–hypopnea index (AHI). Wesper Lab sessions scored between January 2025 and February 2026 were analyzed. Providers were eligible if they had at least 50 studies with a manual respiratory event scoring rate of ≥50%. Eleven providers met inclusion criteria, contributing between 79 and 503 studies each. For each provider, 30 studies were randomly selected as a training pool for algorithm optimization, and the remaining studies were reserved for evaluation.
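The per-provider split described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the function name, the study identifiers, and the fixed random seed are assumptions, and the eligibility check covers only the minimum study count.

```python
import random

TRAINING_POOL_SIZE = 30  # training pool size stated in the study description
MIN_STUDIES = 50         # minimum qualifying studies per provider

def split_provider_studies(study_ids, seed=0):
    """Randomly pick a training pool; hold out the rest for evaluation.

    `seed` is an assumption added here so the split is reproducible.
    """
    if len(study_ids) < MIN_STUDIES:
        raise ValueError("provider does not meet the minimum study count")
    rng = random.Random(seed)
    shuffled = list(study_ids)
    rng.shuffle(shuffled)
    return shuffled[:TRAINING_POOL_SIZE], shuffled[TRAINING_POOL_SIZE:]

# Hypothetical provider with 79 studies, the smallest count in the cohort.
train, evaluation = split_provider_studies([f"study-{i}" for i in range(79)])
print(len(train), len(evaluation))  # 30 49
```

Holding the evaluation studies out of the training pool keeps the agreement metrics from being measured on the same studies used for tuning.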

Baseline agreement between automated and provider scoring was high: the default configuration showed strong correlation with provider-scored AHI (Pearson r = 0.978), a mean bias of 1.83 events/hour, and a Bland–Altman limits-of-agreement width of 16.37 events/hour. Provider-specific optimization further improved agreement. Mean Pearson correlation increased by 0.41 percentage points (t = 2.14, p = 0.029). Mean bias showed a trend toward improvement (−0.80 events/hour; t = −1.71, p = 0.059) that did not reach the conventional 0.05 significance threshold. Limits-of-agreement width decreased by 1.30 events/hour (t = −2.38, p = 0.019).
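This summary does not name the statistical test used. The reported t statistics and p-values are, however, consistent with a one-sided paired t-test across the eleven providers (10 degrees of freedom). The sketch below assumes that test; the function names are illustrative, and the upper-tail probability is approximated by numerically integrating the Student t density.

```python
import math

def paired_t(before, after):
    """Paired t statistic and degrees of freedom for per-provider metrics."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean / (sd / math.sqrt(n)), n - 1

def t_pdf(x, df):
    """Student t probability density."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def one_sided_p(t_stat, df, steps=2000):
    """Tail probability P(T >= |t|), using Simpson's rule on [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Reported t statistics, checked under the assumed 10 degrees of freedom.
for t_stat in (2.14, -1.71, -2.38):
    print(t_stat, round(one_sided_p(t_stat, 10), 3))
# 2.14 0.029
# -1.71 0.059
# -2.38 0.019
```

Under this assumption, a two-sided test would roughly double each p-value, which is worth keeping in mind when reading the mean-bias result.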

These findings indicate that the Wesper Lab automated scoring algorithm already achieves strong agreement with provider scoring, and that provider-specific optimization can further improve alignment with individual scoring patterns. Such optimization may reduce the need for manual corrections and improve the efficiency of clinical scoring workflows.

Introduction

Artificial intelligence–enabled automated scoring algorithms are increasingly integrated into home sleep apnea testing (HSAT) systems to improve the efficiency and scalability of sleep diagnostics [1, 2]. These algorithms automatically detect respiratory events and compute clinically relevant indices such as the apnea–hypopnea index (AHI), substantially reducing the time required for manual scoring of sleep studies [3].
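As a reminder of what the index measures, AHI is the number of apneas plus hypopneas per hour. The sketch below is illustrative only: the function name is hypothetical, and the choice of denominator (recording time, a common convention for home tests without sleep staging) is an assumption rather than a statement about how Wesper Lab computes it.

```python
def apnea_hypopnea_index(apneas, hypopneas, recording_hours):
    """Events per hour; `recording_hours` is an assumed denominator."""
    if recording_hours <= 0:
        raise ValueError("recording_hours must be positive")
    return (apneas + hypopneas) / recording_hours

# Hypothetical night: 24 apneas and 36 hypopneas over 6 hours.
print(apnea_hypopnea_index(24, 36, 6.0))  # 10.0
```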

Wesper Lab is a Type III HSAT that incorporates an artificial intelligence–driven automated scoring algorithm to detect obstructive apneas, hypopneas, and central apneas in accordance with the respiratory event definitions outlined in the American Academy of Sleep Medicine (AASM) scoring manual [1, 4, 5]. The system records thoracic and abdominal respiratory effort signals and derives airflow using a summed respiratory signal analogous to respiratory inductance plethysmography, allowing the algorithm to distinguish obstructive events from central events based on the presence or absence of respiratory effort.

The diagnostic performance of the Wesper Lab automated scoring algorithm has previously been validated against both in-laboratory polysomnography (PSG) and expert rescoring of raw HSAT signals. In those validation analyses, the algorithm demonstrated strong agreement with expert scoring, indicating that the algorithm performs at a high level relative to human scorers [1]. In routine clinical workflows, however, automated scoring outputs are typically reviewed and, when necessary, adjusted by providers prior to final interpretation in accordance with AASM scoring practices. Because individual providers may interpret borderline respiratory events or artifact signals slightly differently, automated scoring outputs may require manual adjustments to align with a provider’s preferred scoring approach.

Rather than attempting to standardize provider scoring behavior, an alternative approach is to optimize automated scoring configurations to align with the scoring patterns of individual providers. By learning from a subset of previously scored studies, the automated algorithm may be tuned to better reflect how a specific provider scores respiratory events. Such alignment has the potential to reduce the number of manual scoring edits required during study review and improve overall scoring workflow efficiency.
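To make the idea of tuning concrete, here is a deliberately simplified sketch. It is not the Wesper Lab optimization method, which this summary does not describe. It assumes a hypothetical setup in which each candidate event carries a confidence score and a single detection threshold is chosen to minimize the mean absolute AHI difference against the provider on the training pool. All names and data are invented for illustration.

```python
def auto_ahi(event_scores, hours, threshold):
    """AHI when only candidate events scoring at or above `threshold` count."""
    return sum(1 for score in event_scores if score >= threshold) / hours

def tune_threshold(training_pool, candidates):
    """Pick the candidate threshold with the lowest mean absolute AHI error."""
    def mean_abs_error(threshold):
        errors = [
            abs(auto_ahi(s["event_scores"], s["hours"], threshold) - s["provider_ahi"])
            for s in training_pool
        ]
        return sum(errors) / len(errors)
    return min(candidates, key=mean_abs_error)

# Two hypothetical training studies scored by one provider.
pool = [
    {"event_scores": [0.9, 0.8, 0.6, 0.4], "hours": 1.0, "provider_ahi": 3.0},
    {"event_scores": [0.95, 0.7, 0.65, 0.3, 0.2], "hours": 1.0, "provider_ahi": 3.0},
]
print(tune_threshold(pool, [0.3, 0.5, 0.7]))  # 0.5
```

A real configuration would likely involve more than one parameter, but the principle is the same: fit on the provider's previously scored studies, then check agreement on studies held out from that fit.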

The objective of this study was therefore to evaluate whether provider-specific optimization of the Wesper Lab HSAT automated scoring algorithm improves agreement with provider-scored AHI and further aligns automated outputs with provider scoring patterns.

Frequently Asked Questions

What is automated scoring in home sleep apnea testing?

Automated scoring uses an algorithm to detect respiratory events (obstructive apneas, hypopneas, and central apneas) and compute clinical indices such as the apnea–hypopnea index (AHI) without a human scorer reviewing every signal manually. Most modern home sleep apnea testing (HSAT) systems use automated scoring to improve efficiency and standardization.

How accurate is the Wesper Lab automated scoring algorithm?

In this study, the default Wesper Lab automated scoring configuration showed a Pearson correlation of 0.978 with provider-scored AHI, a mean bias of 1.83 events/hour, and a Bland–Altman limits-of-agreement width of 16.37 events/hour. The algorithm has also been previously validated against in-lab polysomnography (PSG) and expert manual rescoring.
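For readers unfamiliar with these three metrics, the sketch below shows one common way to compute them from paired AHI values. Two conventions here are assumptions, since this summary does not spell them out: bias is taken as automated minus provider AHI, and the limits-of-agreement width is taken as the distance between the upper and lower 95% limits (2 × 1.96 × the standard deviation of the differences). The sample values are invented.

```python
import math

def agreement_metrics(auto_ahi, provider_ahi):
    """Return (Pearson r, mean bias, limits-of-agreement width)."""
    n = len(auto_ahi)
    mean_a = sum(auto_ahi) / n
    mean_p = sum(provider_ahi) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(auto_ahi, provider_ahi))
    var_a = sum((a - mean_a) ** 2 for a in auto_ahi)
    var_p = sum((p - mean_p) ** 2 for p in provider_ahi)
    r = cov / math.sqrt(var_a * var_p)

    # Bland-Altman: analyze the per-study differences.
    diffs = [a - p for a, p in zip(auto_ahi, provider_ahi)]
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    loa_width = 2 * 1.96 * sd  # upper limit minus lower limit
    return r, bias, loa_width

# Hypothetical paired AHI values for four studies.
r, bias, width = agreement_metrics([6, 12, 18, 32], [5, 10, 15, 30])
print(round(bias, 2))  # 2.0
```

A high correlation alone does not guarantee agreement, which is why bias and limits-of-agreement width are reported alongside it.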

What is provider-specific optimization?

Provider-specific optimization adapts the automated scoring algorithm to the scoring patterns of an individual provider, using a training sample of that provider’s previously scored studies. This allows the algorithm to better reflect how a specific provider interprets borderline respiratory events or artifact signals.

Does provider-specific optimization actually improve scoring accuracy?

In this study of eleven providers, provider-specific optimization further improved agreement between automated and provider-scored AHI. Mean Pearson correlation increased by 0.41 percentage points (p = 0.029), and Bland–Altman limits of agreement width decreased by 1.30 events/hour (p = 0.019). Mean bias showed a trend toward improvement that did not reach conventional significance.

What does this mean for clinical workflow?

Better automated-to-provider agreement means fewer manual scoring corrections are needed during review. For high-volume sleep medicine programs, this can meaningfully reduce per-study review time and standardize the experience providers have when interpreting Wesper Lab studies.


References

Full reference list is included with the peer-reviewed manuscript. Key methods reference: Berry RB, Quan SF, Abreu AR, et al. The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specifications. Version 3. Darien (IL): American Academy of Sleep Medicine; 2023. Additional references available in the full paper download.
