Somewhere in the last decade, healthcare AI settled on an architecture. The sensor collects data. The data goes to the cloud. The cloud runs inference. The decision comes back down. It is a reasonable default, it scales well, and it works reliably in controlled clinical environments.
The problem is that healthcare does not happen in controlled environments.
It happens in semi-rural homes where broadband drops twice a week. In sheltered housing where Wi-Fi is shared across thirty residents. At 3am when no carer is present and the nearest clinical contact is a phone line. The patients who stand to benefit most from continuous AI-powered monitoring are, as a demographic, the patients least likely to have stable connectivity. Cloud-first clinical AI was optimised for an infrastructure that many of those patients simply do not have. This is not a minor calibration issue. It is a structural assumption baked into most healthcare AI deployments, and the field has not seriously reckoned with it.
I have spent twenty years building infrastructure at scale: LinkedIn’s APAC data centres, ByteDance’s global edge estate, financial trading systems where latency is measured in microseconds. Across all of it, I have watched the same pattern play out in different industries at different times. An architecture designed for controlled conditions gets deployed into uncontrolled ones. It appears to work. It passes every test, every demo, every pilot review. Then it encounters the conditions it was not designed for, and it fails quietly. The failure is invisible because the system does not flag it. It just stops being useful.
Healthcare AI is at that inflection point now.
The connectivity gap nobody is measuring
Ofcom’s Connected Nations reports consistently document significant broadband reliability gaps in rural areas and among older households. Meanwhile, NHS England’s digital inclusion research shows that people aged 75 and over and those in lower socioeconomic groups are both least likely to have reliable home internet access and most likely to be managing multiple chronic conditions that would benefit from remote monitoring. The overlap between “patient who most needs continuous monitoring” and “patient least likely to have stable connectivity” is substantial and largely unacknowledged in how the industry designs its systems.
The consequence follows a predictable shape. A patient misses a dose at 11pm. The monitoring device cannot reach the cloud. Connectivity returns at 6am. The clinical system receives the event seven hours late and triggers an alert. By then, the intervention window has closed. The carer gets a notification about a problem that is now historical. The system registers no error. No alarm fired. No exception was raised. From the platform’s perspective, everything worked.
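One way to make that failure visible is to have the receiving system compare when an event happened with when it arrived, rather than treating every arrival as current. The sketch below is illustrative only: the `DoseEvent` type, the `classify_alert` function, and the two-hour intervention window are assumptions made for the example, not values taken from any clinical guidance or from a specific platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DoseEvent:
    patient_id: str
    occurred_at: datetime  # when the device observed the missed dose
    received_at: datetime  # when the backend actually received the event


# Assumption: a hypothetical two-hour window in which an alert is still useful.
INTERVENTION_WINDOW = timedelta(hours=2)


def classify_alert(event: DoseEvent,
                   window: timedelta = INTERVENTION_WINDOW) -> str:
    """Label an event by how late it arrived, instead of alerting blindly."""
    delay = event.received_at - event.occurred_at
    if delay < timedelta(0):
        # Arrival before occurrence means a device or server clock is wrong.
        return "clock_error"
    if delay <= window:
        return "actionable"
    # Arrived after the window closed: surface it as a delivery failure,
    # not as a fresh clinical alert.
    return "stale"


# The scenario above: dose missed at 11pm, connectivity returns at 6am.
late_event = DoseEvent(
    patient_id="example-patient",
    occurred_at=datetime(2024, 1, 1, 23, 0),
    received_at=datetime(2024, 1, 2, 6, 0),
)
print(classify_alert(late_event))
```

A check like this does not rescue the missed intervention, but it turns a silent failure into a recorded one that can be counted across the deployment population.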
That is what a silent failure looks like in clinical AI.
Why cloud-first made sense, and why it stopped
Cloud-first was the right default for most AI applications a decade ago, and for many it still is. Cloud inference is cheaper to operate, easier to update, simpler to audit, and scales horizontally without hardware constraints. For consumer and enterprise AI, those advantages are usually decisive.
Clinical IoMT (Internet of Medical Things) is different in three specific ways. The cost of a missed event is asymmetrically high. The target patient population is concentrated in exactly the demographic with the lowest connectivity reliability. And the regulatory trajectory is moving firmly toward real-world performance accountability. The MHRA’s evolving guidance on Software as a Medical Device and the FDA’s parallel AI/ML device framework both point toward accountability for how systems perform across the actual deployment population, not just in controlled pilots. A system that works in a hospital trial and degrades silently in a home care setting is not going to satisfy that standard.
The field built cloud-first because cloud-first was the sensible default. It became a problem when it was applied without asking whether the default fit the use case.
What the correct architecture requires
When I designed the ML pipeline for Adhicine, the IoMT medication adherence platform I co-founded, the connectivity constraint forced a different set of decisions from the start. The system had to run inference on-device, operate without cloud contact, synchronise cleanly when connectivity returned, and do so without introducing duplicate or missing records in a patient’s clinical history. In a medication adherence context, a duplicate “dose taken” entry can conceal a genuine missed dose. The stakes made every architectural choice non-negotiable.
The platform runs three models in production, all on-device. An LSTM network trained on each individual patient’s adherence history handles timing adaptation, learning what “late” looks like for a specific patient relative to “missed” rather than applying a static schedule uniformly. A random forest classifier ingesting live IoMT telemetry produces missed-dose predictions up to four hours in advance. That horizon separates a preventive alert from a historical notification, which is the difference between a useful system and an expensive audit log. An autoencoder handles anomaly detection, separating genuine missed doses from sensor noise.
That last component matters more than it might seem. The research on clinical alert fatigue is sobering: studies in acute care settings have documented false positive alarm rates exceeding 85% in some environments, with systematic desensitisation as the result. Clinicians stop responding to alerts that are usually meaningless. The same erosion happens in home monitoring systems, just more slowly and with less visibility. An IoMT platform that cannot separate signal from noise will be switched off, quietly, by the carers who gave up trusting it.
The harder engineering problem was not inference. It was data integrity. When a device operates autonomously offline and synchronises later, you need a protocol that makes duplicate records structurally impossible. This is what led to our filing of UK Patent GB2520176.5 for a dual-acknowledgement offline-tolerant synchronisation protocol: events are queued in non-volatile memory with checksum validation, and both device and backend must acknowledge receipt before any event is committed on either side. Connectivity interruption is a normal operating condition, not a failure mode. Building for that assumption changes the architecture from the ground up.
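The general idea can be sketched in a few lines. What follows is a simplified illustration of a queue with checksums and a two-step acknowledgement, not the patented protocol itself: the `Device` and `Backend` classes, their method names, and the `link_up` callback are all hypothetical, and in-memory dictionaries stand in for the non-volatile storage a real device would use.

```python
import hashlib
import json
from dataclasses import dataclass, field


def checksum(payload: dict) -> str:
    """Stable SHA-256 digest of an event payload."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class Backend:
    staged: dict = field(default_factory=dict)     # received, not yet confirmed
    committed: dict = field(default_factory=dict)  # the clinical record

    def receive(self, event_id: str, payload: dict, digest: str) -> bool:
        """First acknowledgement: the backend has an intact copy."""
        if checksum(payload) != digest:
            return False  # corrupted in transit; the device will retry
        if event_id not in self.committed:
            self.staged[event_id] = payload  # keyed by ID, so retries overwrite
        return True

    def confirm(self, event_id: str) -> bool:
        """Second acknowledgement: the device saw ours, so commit."""
        if event_id in self.staged:
            self.committed[event_id] = self.staged.pop(event_id)
        return event_id in self.committed


@dataclass
class Device:
    queue: dict = field(default_factory=dict)  # stand-in for non-volatile memory
    committed: set = field(default_factory=set)

    def record(self, event_id: str, payload: dict) -> None:
        if event_id not in self.committed:
            self.queue.setdefault(event_id, (payload, checksum(payload)))

    def sync(self, backend: Backend, link_up=lambda: True) -> None:
        """Push queued events; an interrupted sync leaves them queued."""
        for event_id, (payload, digest) in list(self.queue.items()):
            if not link_up():
                return
            if not backend.receive(event_id, payload, digest):
                continue
            if not link_up():
                return  # backend has only staged the event; retry later
            if backend.confirm(event_id):
                self.committed.add(event_id)
                del self.queue[event_id]


device, backend = Device(), Backend()
device.record("dose-001", {"patient": "example", "status": "taken"})

# Connectivity drops between the two acknowledgements.
states = iter([True, False])
device.sync(backend, link_up=lambda: next(states))
print(len(backend.committed), len(device.queue))

# Connectivity returns; the retry commits the event exactly once.
device.sync(backend)
print(len(backend.committed), len(device.queue))
```

Because every step is keyed by event ID and safe to repeat, an interruption at any point leaves the event either queued on the device or committed once on both sides, never duplicated.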
What the field should do next
Edge-first IoMT architecture is not a solution for one product category. It is what the field should adopt as a default for any clinical monitoring application where the target population includes elderly patients, people in lower-income settings, or anyone in a geography with unreliable broadband. The regulatory pressure is moving in this direction regardless. Real-world performance requirements are becoming the standard, and cloud-first architectures that fail silently under real-world conditions will not meet them.
The practical starting point is a set of questions that should be standard in any clinical IoMT architecture review. What is the expected connectivity reliability across the actual deployment population? What happens to inference, alerting, and data integrity when connectivity drops for six hours? How does the system behave across the hardware variants in real patient homes, not the reference hardware in the lab?
These are not difficult questions. The industry has simply not been asking them consistently, because the architecture that works for the easy cases also passes the easy tests. The patients it fails are not in the pilot cohort. They are the people the system was supposed to reach.
That is the blindspot. Fixing it starts with the architecture review, not the algorithm.




