
Future iPhone or AirPods could watch and learn a user's unique gestures

Machine Learning could bring touch-sensitive controls to the basic AirPods and more

Apple wants a future iPhone or AirPods to watch the user and learn their movements with Machine Learning, so that swiping on any device or surface can trigger a reaction like turning the volume up or down.

It sounds like a parlor trick. In a newly-granted patent, Apple wants devices to be able to pretend to have touch-sensitive controls. It stresses that this is for any device, but the repeated example is of earbuds where this could mean having a volume control where there is no volume control.

The patent is called "Machine-Learning Based Gesture Recognition," but gestures are only part of it. Apple's idea is that a device such as a wearable earbud could have other sensors, typically an optical or proximity one, but perhaps a temperature sensor or a motion one.

"However, an earbud and/or earphones may not include a touch sensor for detecting touch inputs and/or touch gestures," says Apple, "in view of size/space constraints, power constraints, and/or manufacturing costs."

If you haven't got a touch sensor today, that's limiting. But that's not stopping Apple.

"Nonetheless, it may be desirable to allow devices that do not include touch sensors, such as earbuds, to detect touch input and/or touch gestures from users," it continues. "[This proposal] enables a device that does not include a touch sensor to detect touch input and/or touch gestures from users by utilizing inputs received via one or more non-touch sensors included in the device."

So maybe your earbud has a microphone and it could pick up the sound of you tapping your finger on the device. Or it has an optical sensor and your finger blocks the light as you go to stroke the device.
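To make that concrete, here is a minimal sketch in Swift of the kind of hand-written rule the patent wants to move beyond: a tap only counts when the microphone and the optical sensor agree. Every name and threshold in it is invented for illustration, not taken from Apple's filing.

```swift
import Foundation

// Hypothetical readings from sensors an earbud already has. None of these
// names come from Apple's patent; they are for illustration only.
struct SensorSnapshot {
    let micAmplitude: Double    // loudness of a possible finger tap, 0...1
    let opticalLevel: Double    // 0 = sensor fully covered, 1 = unobstructed
}

// A naive hand-written rule: only call it a tap when the microphone hears a
// spike *and* the optical sensor is covered at the same moment. The patent's
// point is that a trained model would learn this kind of correlation itself.
func looksLikeTap(_ s: SensorSnapshot) -> Bool {
    s.micAmplitude > 0.7 && s.opticalLevel < 0.2
}

print(looksLikeTap(SensorSnapshot(micAmplitude: 0.85, opticalLevel: 0.1))) // true
```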

Machine Learning can be trained to react when multiple sensors are registering a change

Apple is being very thorough about just what sensors could conceivably be used — and used in conjunction with one another. "The sensors... may include one or more sensors for detecting device motion, user biometric information (e.g., heartrate), sound, light, wind, and/or generally any environmental input," it says.

"For example," it continues, "the sensors... may include one or more of an accelerometer for detecting device acceleration, one or more microphones for detecting sound and/or an optical sensor for detecting light."

The point is that on its own, any one sensor could be wrong. You might be scratching your ear right next to the microphone, for instance. Or the light is blocked because you're leaning against a wall.

Apple's idea, and the reason this involves Machine Learning, is that it is the combination of sensors that can work together. "[The] inputs detected by the optical sensor, accelerometer, and microphone may individually and/or collectively be indicative of a touch input," says the patent.

"After training, the machine learning model generates a set of output predictions corresponding to predicted gesture(s)," continues Apple. So ML can learn that a scratching sound on the microphone isn't enough by itself, but when light into an optical sensor is blocked, something's happening.

As ever with patents, the descriptions are more about how something is detected than what will then be done with the information. In this case, though, Apple does say that after such "predictions are generated, a policy may be applied to the predictions to determine whether to indicate an action for the wireless audio output device 104 to perform."

So if ML thinks a change in, say, three sensors, is significant, it can pass that on to software. That software can then, for one example, raise or lower volume on a device.
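Here is a hedged sketch of that policy step, assuming it amounts to little more than a confidence threshold plus a lookup from predicted gesture to action. The gesture labels continue the toy example above, and the action names are invented, not Apple's.

```swift
import Foundation

// Same toy gesture labels as in the sketch above; actions are invented here.
enum Gesture: String { case tap, swipeUp, swipeDown, noGesture }
enum Action { case volumeUp, volumeDown, playPause, ignore }

// The "policy" the patent mentions, sketched as: take the most confident
// prediction, ignore it unless it clears a threshold, then map it to an action.
func policy(_ predictions: [Gesture: Double], threshold: Double = 0.8) -> Action {
    guard let best = predictions.max(by: { $0.value < $1.value }),
          best.value >= threshold else {
        return .ignore   // the model isn't sure enough, so do nothing
    }
    switch best.key {
    case .swipeUp:   return .volumeUp
    case .swipeDown: return .volumeDown
    case .tap:       return .playPause
    case .noGesture: return .ignore
    }
}

print(policy([.swipeUp: 0.92, .tap: 0.05, .swipeDown: 0.02, .noGesture: 0.01])) // volumeUp
```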

Multiple sensors across multiple devices can be combined.

It sounds like Apple is cramming many sensors into devices and has had to decide which ones to leave out for lack of space. But by extension, if ML can learn from all of the sensors in a device, it can surely learn from every device a user has.

Consequently, if an AirPod detects a certain sound but the Apple Watch does not, then that sound is happening next to the earbud. It is therefore that much more likely that the user wants to do something with the AirPod.
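A minimal sketch of that cross-device reasoning, assuming the devices simply compare how loudly each one heard the tap-like sound. The device names, readings, and margin are all illustrative assumptions, not details from the patent.

```swift
import Foundation

// One reading per nearby device: how strongly it registered the tap-like sound.
struct DeviceReading {
    let name: String
    let tapLoudness: Double   // 0...1
}

// Attribute the gesture to whichever device heard the sound clearly loudest;
// if no device stands out by the margin, treat the result as ambiguous.
func likelyTarget(of readings: [DeviceReading], margin: Double = 0.4) -> String? {
    let sorted = readings.sorted { $0.tapLoudness > $1.tapLoudness }
    guard let loudest = sorted.first else { return nil }
    if sorted.count == 1 || loudest.tapLoudness - sorted[1].tapLoudness >= margin {
        return loudest.name
    }
    return nil
}

let readings = [
    DeviceReading(name: "AirPod", tapLoudness: 0.9),
    DeviceReading(name: "Apple Watch", tapLoudness: 0.2)
]
print(likelyTarget(of: readings) ?? "ambiguous")   // AirPod
```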

This invention is credited to eight inventors. They include Timothy S. Paek, whose previous work for Apple includes having Siri take notes when you're on the phone.



2 Comments

tht 23 Years · 5654 comments

One of my cutesy ideas for Airpods is "jaw detection". ;)

An in-ear device can tell when your jaw moves. So, uh, if you have a device that counts how many times you've bitten, it could warn you that you've eaten enough. It could be used as a secondary sensor for Siri conversations. By knowing that you are speaking, not someone else, it could know when to listen to you and not someone else, in say a loud room full of people.

And if people haven't been thinking about it already, eye+hand tracking on Macs, iPads and even iPhones should really be considered a possibility.

byronl 4 Years · 377 comments

tht said:
One of my cutesy ideas for Airpods is "jaw detection". ;)

An in-ear device can tell when your jaw moves. So, uh, if you have a device that counts how many times you've bitten, it could warn you that you've eaten enough. It could be used as a secondary sensor for Siri conversations. By knowing that you are speaking, not someone else, it could know when to listen to you and not someone else, in say a loud room full of people.

And if people haven't been thinking about it already, eye+hand tracking on Macs, iPads and even iPhones should really be considered a possibility.

Don't the AirPods already know when you talk? They showed it at WWDC; I think that since the microphones are beam forming, they can detect that the voice is yours.

As for the eye and hand tracking, I do really like the idea of apps and buttons lighting up when you look at them, a la Vision Pro, but won't this cause too much power draw and maybe require too much processing for such a small feature?