Researchers from MIT and Empirical Health recently developed a foundation model that predicts medical conditions using 3 million person-days of Apple Watch data.
The study, titled “JETS: A Self-Supervised Joint Embedding Time Series Foundation Model for Behavioral Data in Healthcare,” has been accepted to a workshop at NeurIPS. It adapts the Joint-Embedding Predictive Architecture (JEPA), an approach proposed by Yann LeCun in which a model learns to infer the meaning of missing data rather than reconstruct the data itself. In other words, the model predicts what the missing parts represent, not their raw values.
Meta previously released its I-JEPA model in 2023. At that time, Meta stated LeCun’s vision was to create “machines that can learn internal models of how the world works so that they can learn much more quickly, plan how to accomplish complex tasks, and readily adapt to unfamiliar situations.” LeCun has since left Meta to focus on world models, which he argues are critical for achieving Artificial General Intelligence (AGI).
The JETS model applies JEPA’s joint-embedding approach to irregular multivariate time-series data, such as long-term wearable data where measurements like heart rate, sleep, and activity may appear inconsistently or with large gaps. The study used a longitudinal dataset from 16,522 individuals, totaling approximately 3 million person-days. For each individual, the researchers recorded 63 distinct time-series metrics at a daily or lower resolution. These metrics fall into five domains:
- Cardiovascular health: Metrics related to heart function.
- Respiratory health: Metrics pertaining to breathing.
- Sleep: Data on sleep patterns.
- Physical activity: Information on movement and exercise.
- General statistics: Broader health-related measurements.
Only 15% of the participants had labeled medical histories, which would have left 85% of the data unusable for traditional supervised learning. JETS addressed this by first learning from the complete dataset through self-supervised pre-training and then fine-tuning on the labeled subset. The researchers converted each observation into a token built from a triplet of day, value, and metric type. The tokens were then masked and encoded, and the result was fed through a predictor to anticipate the embeddings of the missing patches.
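The tokenization and masking steps can be illustrated with a minimal Python sketch. It is not the authors' code: the metric names, the 30% mask ratio, and the helper names `tokenize` and `split_masked` are assumptions made for illustration, and the encoder and predictor networks are left out entirely.

```python
import random
from typing import NamedTuple


class Token(NamedTuple):
    day: int      # day index within a person's record
    value: float  # measured value for that day
    metric: str   # metric type (names here are hypothetical)


def tokenize(records):
    """Flatten {metric: {day: value}} into (day, value, metric) triplets.

    Missing readings (None) are skipped, so irregular series simply
    produce fewer tokens instead of padded or imputed values.
    """
    tokens = [
        Token(day, value, metric)
        for metric, series in records.items()
        for day, value in series.items()
        if value is not None
    ]
    return sorted(tokens, key=lambda t: (t.day, t.metric))


def split_masked(tokens, mask_ratio=0.3, seed=0):
    """Randomly split tokens into visible context and masked targets."""
    if not tokens:
        return [], []
    rng = random.Random(seed)
    n_masked = max(1, round(len(tokens) * mask_ratio))
    masked_idx = set(rng.sample(range(len(tokens)), n_masked))
    context = [t for i, t in enumerate(tokens) if i not in masked_idx]
    targets = [t for i, t in enumerate(tokens) if i in masked_idx]
    return context, targets


# Toy record with gaps: sleep is missing on day 2, steps exist only on day 1.
records = {
    "resting_heart_rate": {0: 58.0, 1: 60.0, 3: 57.0},
    "sleep_hours": {0: 7.5, 2: None, 3: 6.0},
    "step_count": {1: 8400.0},
}
tokens = tokenize(records)
context, targets = split_masked(tokens)
print(len(tokens), len(context), len(targets))
```

In a JEPA-style setup, an encoder would embed the context tokens and a predictor would try to match the embeddings of the target tokens, so the training loss is computed in embedding space rather than on the raw values.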
The researchers evaluated JETS against baseline models, including a Transformer-based version of JETS, using AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve). These metrics assess how well a model discriminates between positive and negative cases. JETS achieved the following AUROC scores for various conditions:
- High blood pressure: 86.8%
- Atrial flutter: 70.5%
- Chronic fatigue syndrome: 81%
- Sick sinus syndrome: 86.8%
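To make the AUROC figures concrete, here is a small Python sketch of what the metric measures: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The labels and scores are made-up toy values, not data from the study, and the pairwise loop is written for clarity rather than speed.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = has the condition, 0 = does not.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6]

result = auroc(labels, scores)
# Rescaling the scores keeps their order, so the AUROC is unchanged.
rescaled = auroc(labels, [s / 10 for s in scores])
print(result, rescaled)
```

Because only the ordering of the scores matters, a model can reach a high AUROC without its scores being calibrated probabilities.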
AUROC and AUPRC indicate how effectively a model ranks or prioritizes likely cases; they are not a direct measure of accuracy. The study demonstrates a method for extracting insights from incomplete or irregular data collected by consumer wearables, even when wear times are intermittent. Some health metrics were recorded as little as 0.4% of the time, while others appeared in 99% of daily readings.





