A Greater Manchester NHS trust has suspended its artificial intelligence radiology triage system after an internal audit revealed the tool was systematically deprioritising urgent scans flagged outside daytime hours — including suspected stroke cases — during the very periods when radiology staffing was at its thinnest. The discovery, confirmed to TheAIPulse this week, has reignited concerns about whether clinical AI is being validated against the messy realities of NHS shift patterns, or merely against the clean, well-curated datasets used in vendor demonstrations.
While the trust has declined to name the supplier, citing a contractual review, it confirmed that the tool had been live across its acute imaging pathway for several months before the pattern was identified. No patient harm has so far been formally attributed to the issue, but a clinical safety investigation is ongoing.
What the audit found
The triage system was designed to rank incoming CT and MRI scans by clinical urgency, helping overstretched radiologists work through their lists in priority order. In theory, a suspected acute stroke should sit at the very top of any queue, because the benefit of thrombolysis and thrombectomy falls with every minute of delay.
Instead, the trust’s audit found that scans submitted between roughly 10pm and 7am were consistently assigned lower urgency scores than comparable scans submitted during the day. In several reviewed cases, suspected stroke scans flagged overnight were ranked below routine daytime studies — a reversal of the intended logic.
The effect was subtle enough to evade casual notice. Reporting radiologists, often working solo on night cover, tended to trust the queue order they were presented with. It was only when an audit cross-referenced timestamps against final diagnoses that the temporal bias became visible.
“This wasn’t a model that was wrong in an obvious, headline way,” said Dr Priya Nandakumar, a clinical AI safety researcher at the fictional Northern Institute for Health Informatics. “It was a model that was quietly wrong at exactly the wrong time. Overnight is when you have the fewest eyes on the system, so a bias that pushes urgent cases down the list is most dangerous precisely when it’s least likely to be caught.”
How a time-of-day bias creeps in
Experts say the most plausible explanation lies in the training data. Triage models are frequently trained on historical scan records, where urgency labels are inferred from how quickly a scan was reported and acted upon. But report turnaround times are themselves shaped by staffing — daytime scans get read faster simply because more radiologists are on shift.
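If a model learns that fast-reported scans are ‘more urgent’, it can absorb the hospital’s own operational constraints as if they were clinical signal. The result is a feedback loop: the AI encodes the night-time slowdown as lower clinical priority, pushes overnight scans further down the queue, and so slows their reporting even more, a pattern that would be reinforced if the tool were ever retrained on its own deployment data.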
- Confounded labels: urgency proxies based on response time reflect staffing, not just clinical need.
- Temporal sampling gaps: fewer overnight scans in training data mean weaker performance in that window.
- Validation blind spots: test sets balanced for pathology but not for time-of-day mask the problem entirely.
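To make the confounding concrete, the toy sketch below assumes a hospital where sicker patients are genuinely read sooner, but overnight scans carry a fixed extra delay from thinner staffing. If urgency labels are derived from turnaround time, an overnight suspected stroke can end up labelled less urgent than a routine daytime study. Every function name, delay and severity value here is a hypothetical assumption for illustration, not data from the trust or the vendor.

```python
from dataclasses import dataclass


# Hypothetical illustration only: names and numbers are assumptions,
# not figures from the trust, the vendor or any real dataset.
@dataclass
class Scan:
    hour: int             # hour of submission, 0-23
    true_severity: float  # clinical urgency a model *should* learn, 0-1


def is_overnight(hour: int) -> bool:
    """Treat 22:00-06:59 as overnight, roughly mirroring the audit's window."""
    return hour >= 22 or hour < 7


def turnaround_minutes(scan: Scan) -> float:
    """Assumed reporting delay: sicker patients are read sooner, but
    thinner overnight staffing adds a fixed delay regardless of severity."""
    clinical_delay = 120 * (1 - scan.true_severity)  # up to 2h for routine scans
    staffing_delay = 90 if is_overnight(scan.hour) else 0
    return clinical_delay + staffing_delay


def proxy_urgency(scan: Scan) -> float:
    """Urgency label inferred from speed of reporting: faster means 'more urgent'."""
    return 1 / (1 + turnaround_minutes(scan) / 60)


# A suspected stroke at 3am versus a routine study at 2pm.
stroke_night = Scan(hour=3, true_severity=0.95)
routine_day = Scan(hour=14, true_severity=0.3)

for label, scan in [("overnight stroke", stroke_night), ("daytime routine", routine_day)]:
    print(f"{label}: true severity {scan.true_severity}, "
          f"proxy label {proxy_urgency(scan):.3f}")
```

Under these assumptions the routine daytime scan receives the higher proxy label, even though its true severity is far lower. Any model trained to reproduce such labels inherits the staffing pattern, not the clinical one.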
“Vendors love to show you an AUC on a balanced test set,” said Tom Hargreaves, a health technology analyst at the fictional consultancy Meridian Clinical Advisory. “But a balanced test set is a fantasy. Real hospitals have rhythms — Friday nights, bank holidays, the 3am lull. If you don’t stress-test against those rhythms, you’re validating against a hospital that doesn’t exist.”
Questions for procurement and regulation
The case lands amid a broader push to deploy AI across NHS imaging to ease radiologist shortages. The Royal College of Radiologists has repeatedly warned of significant workforce gaps, and triage tools are often pitched as a way to make scarce expertise go further.
But the Greater Manchester incident underlines a gap between how such tools are sold and how they behave under operational stress. Clinical AI in the UK falls under medical device regulation overseen by the MHRA, yet critics argue that approval processes still lean heavily on retrospective performance metrics rather than prospective, time-aware monitoring.
“Post-market surveillance is the weak link,” Dr Nandakumar argued. “A device can pass its initial evaluation and then drift, or reveal a bias that only shows up once it’s embedded in a live workflow. Trusts need continuous, stratified auditing — by time, by site, by patient group — not a one-off sign-off.”
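The kind of stratified check Dr Nandakumar describes can be sketched in a few lines. The example below assumes audit records that pair each scan's submission hour and model urgency score with whether the final diagnosis confirmed an urgent finding, much as the trust's audit cross-referenced timestamps against diagnoses. It compares the average score given to confirmed-urgent cases across time bands and flags any band that trails the best one. The band boundaries, the `tolerance` threshold and the sample records are all hypothetical; this is not the trust's actual audit method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical audit record: (submission hour, model urgency score 0-1,
# final diagnosis confirmed an urgent finding). Illustrative only.
Record = tuple[int, float, bool]


def time_band(hour: int) -> str:
    """Bucket submission hours into bands; the boundaries are an assumption."""
    if 7 <= hour < 18:
        return "day"
    if 18 <= hour < 22:
        return "evening"
    return "overnight"


def stratified_scores(records: list[Record]) -> dict[str, float]:
    """Mean model score for confirmed-urgent cases, grouped by time band."""
    by_band: dict[str, list[float]] = defaultdict(list)
    for hour, score, confirmed_urgent in records:
        if confirmed_urgent:
            by_band[time_band(hour)].append(score)
    return {band: mean(scores) for band, scores in by_band.items()}


def flag_disparity(summary: dict[str, float], tolerance: float = 0.1) -> list[str]:
    """Bands whose mean score for urgent cases trails the best band by more than tolerance."""
    if not summary:
        return []
    best = max(summary.values())
    return sorted(band for band, s in summary.items() if best - s > tolerance)


# Made-up records for illustration: urgent cases scored lower overnight.
records: list[Record] = [
    (9, 0.91, True), (14, 0.88, True), (19, 0.85, True),
    (23, 0.62, True), (3, 0.58, True),
    (11, 0.20, False), (2, 0.15, False),
]

summary = stratified_scores(records)
print(summary)
print("bands needing review:", flag_disparity(summary))
```

Run continuously on live data rather than once at sign-off, the same grouping could be repeated with a site or patient-group key in place of `time_band`.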
The trust said it had reverted to its previous manual triage process and would not reinstate the tool until independent validation against its own shift patterns was complete.
What this means
The suspension is a cautionary tale rather than a verdict against AI triage as a concept. The core lesson is that clinical AI cannot be validated in the abstract: a model is only as safe as the real-world conditions it is tested against, and NHS conditions vary dramatically by hour, day and site. For trusts rushing to plug workforce gaps with automation, the Greater Manchester case is a reminder to demand time-stratified evidence, build in continuous auditing, and treat ‘passed the test set’ as the beginning of safety assurance — not the end.