Apple Watch 'black box' algorithms unreliable for medical research [u]

Apple's use of algorithms to analyze data may be an issue for medical research, after a Harvard professor discovered inconsistencies in data from one Apple Watch accessed at different times.

One of the benefits of mobile devices and wearable devices like the Apple Watch is that improvements can be made in software. In medical research, this may not necessarily be a good thing, and has prompted one study to rethink its methodology.

According to JP Onnela, an associate professor of biostatistics at the Harvard T.H. Chan School of Public Health, these changes can produce inconsistencies in data collection. The inconsistencies can appear even when the same data is analyzed at different points in time.

While Onnela typically prefers using research-grade devices for data collection for studies, The Verge reports a collaboration with the department of neurosurgery at Brigham and Women's Hospital prompted an examination of consumer hardware. Specifically, the study's team wanted to check how different the results from commercial products like the Apple Watch could be in terms of accuracy.

Two sets of daily heart rate variability data were collected from the same Apple Watch, both covering the same period from December 2018 until September 2020. Although the sets were retrieved on different dates, September 5, 2020, and April 15, 2021, they should have been identical because they covered the same timeframe, yet differences were discovered.
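For readers curious how such a check might look in practice, below is a minimal sketch of comparing two exports that cover the same dates. It is not the study's method, which the article does not describe. The function name `compare_exports`, the dictionary layout, and every number in it are hypothetical and made up purely for illustration.

```python
from datetime import date


def compare_exports(first, second, tolerance=1e-9):
    """Return the days on which two exports of the same period disagree.

    Each export is assumed to be a dict mapping a date to one daily
    heart rate variability value. A day missing from either export
    also counts as a mismatch.
    """
    mismatches = {}
    for day in sorted(first.keys() | second.keys()):
        a = first.get(day)
        b = second.get(day)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches[day] = (a, b)
    return mismatches


# Made-up values for illustration only, not data from the study.
export_sept_2020 = {
    date(2018, 12, 1): 42.0,
    date(2018, 12, 2): 55.5,
    date(2018, 12, 3): 48.1,
}
export_april_2021 = {
    date(2018, 12, 1): 42.0,
    date(2018, 12, 2): 51.2,
    date(2018, 12, 3): 48.1,
}

diffs = compare_exports(export_sept_2020, export_april_2021)
for day, (earlier, later) in diffs.items():
    print(day.isoformat(), earlier, later)
```

If the device reported the same values both times, the result would be empty. Any entry in it marks a day where the two retrievals disagree about data that should not have changed.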

It is thought that tweaks by Apple to algorithms used in the Apple Watch changed how the data was interpreted before being collected.

"These algorithms are what we would call black boxes - they're not transparent. So it's impossible to know what's in them," said Onnela. "What was surprising was how different they are. This is probably the cleanest example that I have seen of this phenomenon."

The changes are a concern for scientific researchers, who want minimal change or variance in how devices record or report the same sets of data. Small changes may not be a problem for typical users, but for researchers who require consistency, Onnela says "that's the concern."

The findings caused the team to shift away from using consumer hardware and back to medical-grade devices. Onnela proposes that the Apple Watch and other wearable items should only be used if raw data is available or if researchers can be informed of when algorithm changes occur.

The Apple Watch and other Apple hardware have been used for medical studies in the past, sometimes as the primary device. In April, Apple partnered with the University of Washington to study how the Apple Watch could be used to predict illnesses such as the flu or the coronavirus.

Stanford University also looked into whether an iPhone and Apple Watch could be used to remotely assess a heart disease patient's frailty, in a study funded by Apple. Researchers found there was a slight dip in accuracy in at-home testing versus in-clinic versions, though it was put down to "out-of-clinic variability" rather than Apple's sensors.

Update: Apple later told The Verge that algorithm changes are not retroactively applied to past data. The company had no explanation for the discrepancy found by Onnela, but suggested issues might arise when using third-party apps to export data.