Beyond sensor data: Foundation models of behavioral data from wearables

2025-08-21 14:39 23054 arxiv.org

Abstract: Wearable devices record physiological and behavioral signals that can improve health predictions. While foundation models are increasingly used for such predictions, they have been primarily applied to low-level sensor data, despite behavioral data often being more informative due to their alignment with physiologically relevant timescales and quantities. We develop foundation models of such behavioral signals using over 2.5B hours of wearable data from 162K individuals, systematically optimizing architectures and tokenization strategies for this unique dataset. Evaluated on 57 health-related tasks, our model shows strong performance across diverse real-world applications including individual-level classification and time-varying health state prediction. The model excels in behavior-driven tasks like sleep prediction, and improves further when combined with representations of raw sensor data. These results underscore the importance of tailoring foundation model design to wearables and demonstrate the potential to enable new health applications.

From: Joseph Futoma
[v1] Mon, 30 Jun 2025 19:01:00 UTC (2,155 KB)

Comments

By brandonb 2025-08-21 15:08, 4 replies

I worked on one of the first wearable foundation models in 2018. The innovation of this 2025 paper from Apple is moving up to a higher level of abstraction: instead of training on raw sensor data (PPG, accelerometer), it trains on a time series of behavioral biomarkers derived from that data (e.g., HRV and resting heart rate).

They find high accuracy in detecting many conditions: diabetes (83%), heart failure (90%), sleep apnea (85%), etc.

By teiferer 2025-08-22 6:19, 1 reply

What is an "accuracy" of 83%? Do 83% of predicted diabetes cases actually have diabetes? Or did 83% of those who have diabetes get diagnosed as such? It's about precision vs. recall. You can improve one by sacrificing the other. Boiling it down to one number is hard.
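To make the precision vs. recall trade-off concrete, here is a minimal sketch. The labels, scores, and thresholds are made up purely for illustration and have nothing to do with the paper's data; `precision_recall` is a hypothetical helper, not a library function.

```python
def precision_recall(y_true, scores, threshold):
    """Return (precision, recall) when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1          # correctly flagged case
        elif predicted and not label:
            fp += 1          # false alarm
        elif not predicted and label:
            fn += 1          # missed case
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels (1 = has the condition) and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

strict = precision_recall(y_true, scores, threshold=0.65)   # few, confident positives
lenient = precision_recall(y_true, scores, threshold=0.35)  # more positives flagged

print("strict  (precision, recall):", strict)
print("lenient (precision, recall):", lenient)
```

On this toy data the strict threshold flags only true cases but misses one of them, while the lenient threshold catches every case at the cost of one false alarm. Same model, same scores, two different "accuracy" stories.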

By topaz0 2025-08-22 11:58, 1 reply

They use the area under the receiver operating characteristic curve (AUROC), which is a pretty standard way to boil that down to one number.

By teiferer 2025-08-23 7:05

Ah, thanks for the pointer!

https://en.m.wikipedia.org/wiki/Receiver_operating_character...

So, 83% is actually not that great, given that you can achieve 50% by guessing randomly.
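A minimal sketch of why 50% is the floor, assuming the usual pairwise reading of AUROC: the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counted as half. The data is synthetic and the 10% prevalence is an arbitrary assumption; `auroc` is a hand-written helper, not the paper's evaluation code.

```python
import random


def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Tiny hand-checkable case: 3 of the 4 positive/negative pairs are ordered correctly.
auc_small = auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])

# Synthetic population with an assumed 10% prevalence (illustration only).
rng = random.Random(0)
y_true = [1 if rng.random() < 0.1 else 0 for _ in range(2000)]
random_scores = [rng.random() for _ in y_true]   # scores unrelated to the labels
perfect_scores = [float(y) for y in y_true]      # scores that separate perfectly

auc_random = auroc(y_true, random_scores)
auc_perfect = auroc(y_true, perfect_scores)

print("small example:", auc_small)
print("random guessing:", round(auc_random, 3))
print("perfect ranking:", auc_perfect)
```

Random scores land near 0.5 regardless of how rare the condition is, which is one reason AUROC is preferred over plain accuracy on imbalanced data.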

By crorella 2025-08-21 18:32, 2 replies

Insurance and health insurance companies must be super interested in this research and its applications.

By jeron 2025-08-21 19:28, 2 replies

I'm sure they're also interested in the data. Imagine raising premiums based on conditions they detect from your wearables. That's why it's of utmost importance to secure biometric data.

By brandonb 2025-08-21 19:39, 1 reply

At least in the US, health insurers can’t raise rates or deny coverage based on pre-existing conditions. That was a major part of the Affordable Care Act.

By abenga 2025-08-21 20:17, 1 reply

The ACA will not survive the next couple of years.

By daveguy 2025-08-22 13:16, 1 reply

That's what they said in 2016.

By abenga 2025-08-22 16:33, 1 reply

This time he has control over all the arms of government + the support of much of the private sector though. The last time there was at least some push-back.

By daveguy 2025-08-24 15:40

They had the exact same thing last time too. Control over all arms + support of much of the private sector. We told them to go fuck themselves then and we will again.

By apwell23 2025-08-22 19:52

How would that work? I pay a flat rate through my employer.

By autoexec 2025-08-22 0:48, 1 reply

There are so many companies across many industries who are salivating at the thought of everyone using wearables to monitor their "health" and getting their hands on that data. Including law enforcement, lawyers, and other government agencies.

By teiferer 2025-08-22 6:20

It's industry leaders that are salivating the most.

By throwaway314155 2025-08-21 17:30, 1 reply

Had the phrase "foundation model" become a term of art yet?

By brandonb 2025-08-21 18:02, 1 reply

By 2018, the concept was definitely in the air since you had GPT-1 (2018) and BERT (2018). You could argue even Word2Vec (2013) had the core concept of pre-training on an unsupervised or self-supervised objective leading to performance on a downstream semantic task. However, the phrase "foundation model" wasn't coined until 2021, to my knowledge.

By throwaway314155 2025-08-23 0:53

I guess I just find the whole "foundation model" phrasing to be designed to pat the backs of the "winners", who would of course be those with the most money. I'm sure there are foundation models from groups that aren't e.g. OpenAI, but the origins felt egotistical, and asserting that you made one prior to the phrase's inception only feels more so.

Had you merely called it an early instance of pretraining, I'd be fine with it.

By puppymaster 2025-08-21 15:42, 2 replies

Reminds me of the advice of Jim Simons of Renaissance when it comes to data science: sort first, then regress.

By dinobones 2025-08-21 16:18, 2 replies

Not sort in the literal way, right?

https://stats.stackexchange.com/questions/185507/what-happen...

By clickety_clack 2025-08-21 17:24, 1 reply

The guy was sorting the X separately from y? That can’t be real.

By falcor84 2025-08-21 19:58

"Nothing is foolproof to a sufficiently talented fool"

By tomrod 2025-08-21 18:01

Not every day you find pseudo permutation in the wild!
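For anyone wondering what goes wrong when X and y are sorted independently, a minimal sketch on synthetic data: two unrelated Gaussian samples look almost perfectly correlated once each is sorted on its own, because sorting throws away the pairing between observations. The sample size and seed are arbitrary assumptions, and `pearson` is a hand-written helper.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(1000)]
y = [rng.gauss(0, 1) for _ in range(1000)]  # generated independently of x

r_paired = pearson(x, y)                  # original pairing: no real relationship
r_sorted = pearson(sorted(x), sorted(y))  # each column sorted on its own

print("correlation with original pairing:", round(r_paired, 3))
print("correlation after sorting separately:", round(r_sorted, 3))
```

The "relationship" after sorting is just the two sorted sequences rising together, so any regression fit to it describes the sort order rather than the data.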

By LPisGood 2025-08-21 17:39, 2 replies

Is anyone else surprised by how poor the results are in the vast majority of cases? The foundation model, which had access to sensor data and behavioral biomarkers, actually _underperformed_ the baseline predictor that uses only nonspecific demographic data in almost 10 areas.

In fact, even when the wearable foundation model was better, it was only marginally better.

I was expecting much more dramatic improvements with such rich data available.

By bumby 2025-08-22 1:56

I wonder how much of that is driven by poorly performing behavioral models. There was an HN article a few weeks back where the accuracy at determining whether someone was awake or asleep was only about 70%. I would guess that the secondary behavioral data used in this paper (like cardiovascular fitness) are much harder to predict from raw sensor data than being awake or asleep.

By Herring 2025-08-21 20:04

I worked with similar data in grad school. I'm not surprised. You can have a lot of data, but sometimes the signal (or signal quality) just isn't present in that haystack, and there's nothing you can do about it.

Sometimes you just have to use ultrasound or MRI or stick a camera in the body, because everything else might as well be reading tea leaves, and people generally demand very high accuracy when it comes to their health.

By rsanek 2025-08-21 20:06

Cool way of integrating the two approaches. For those on mobile, I created an infographic that's a bit more accessible: https://studyvisuals.com/artificial-intelligence/beyond-sens...
