The University of Illinois Urbana-Champaign (UIUC) is working with Apple and other tech giants on the Speech Accessibility Project, which aims to improve voice recognition systems for people whose speech patterns and disabilities current versions have trouble understanding.
While often derided for mishearing a user's request, voice recognition systems for digital assistants like Siri have become more accurate over the years, aided by developments such as on-device recognition. A new project aims to push that accuracy further by focusing on people with speech impediments and disabilities.
Partnering with Apple, Amazon, Google, Meta, and Microsoft, as well as non-profits, UIUC's Speech Accessibility Project will try to expand the range of speech patterns that voice recognition systems can understand. This includes a focus on speech affected by diseases and disabilities, including Lou Gehrig's disease (amyotrophic lateral sclerosis, or ALS), Parkinson's disease, cerebral palsy, and Down syndrome.
In some cases, speech recognition could provide quality-of-life improvements to users whose conditions inhibit movement, but when those conditions also affect the voice, they can limit the technology's effectiveness.
Under the Speech Accessibility Project, samples will be collected from individuals "representing a diversity of speech patterns" to create a private, de-identified dataset. That dataset, which will focus on American English at first, could then be used to train machine learning models to better recognize these speech patterns.
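The project has not said which models or toolchains its partners will use. Purely as a rough illustration of how a pooled, de-identified dataset could be applied, here is a minimal sketch of adapting an existing open-source recognizer (torchaudio's pretrained wav2vec 2.0 English model, chosen here as an assumption, not as the project's method) on individual audio/transcript pairs:

```python
import torch
import torchaudio

# Pretrained English wav2vec 2.0 ASR model from torchaudio (an assumed stand-in;
# the Speech Accessibility Project has not specified its models or frameworks).
bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model()
labels = bundle.get_labels()               # CTC vocabulary: blank, '|', letters...
char_to_idx = {c: i for i, c in enumerate(labels)}

ctc_loss = torch.nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def fine_tune_step(waveform: torch.Tensor, transcript: str) -> float:
    """One gradient step on a single de-identified (audio, transcript) sample.

    waveform: shape (1, num_samples), already resampled to bundle.sample_rate.
    transcript: plain-text transcription of the utterance.
    """
    model.train()
    emissions, out_lengths = model(waveform)                          # (1, frames, vocab)
    log_probs = torch.log_softmax(emissions, dim=-1).transpose(0, 1)  # (frames, 1, vocab)

    # Map the transcript into the model's character vocabulary ('|' marks spaces).
    chars = [char_to_idx[c] for c in transcript.upper().replace(" ", "|") if c in char_to_idx]
    targets = torch.tensor([chars], dtype=torch.long)

    input_lengths = out_lengths if out_lengths is not None else torch.tensor([log_probs.size(0)])
    target_lengths = torch.tensor([len(chars)])

    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the partner companies would train at far larger scale with their own pipelines; the point is only that a shared, de-identified dataset is what makes this kind of adaptation possible.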
The involvement of a wide array of tech companies with virtual assistants or speech recognition features in their products could help speed up progress. Rather than working separately and duplicating one another's results, the teams can collaborate directly through the project.
"Speech interfaces should be available to everybody, and that includes people with disabilities," said Mark Hasegawa-Johnson, a professor at UIUC. "This task has been difficult because it requires a lot of infrastructure, ideally the kind that can be supported by leading technology companies, so we've created a uniquely interdisciplinary team with expertise in linguistics, speech, AI, security, and privacy."
Comments
Tough problem to solve. But the CS work at UIUC has long had a reputation for quality, so I'm confident good progress will be made.
This is one of those initiatives that has clear benefits on the face of it, but it will benefit all of us as well. Most everyone has an accent that gives Siri trouble on occasion.
I work in the field, and it has often occurred to me that some speech adaptations could be made by training Siri (or any other speech-to-text application) on an individual basis. Someone's skillful and experienced communication partner could work phrase by phrase, e.g. "When John says something that sounds like this [….] it means this […….]".
Working on [disabled] accents-in-general is likely to be a very fruitful tack, but John’s unique [Down Syndrome] accent may call for a significant amount of fine-tuning. A hybrid approach could be useful here.
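As a toy illustration of the hybrid approach this commenter describes (nothing here comes from the project or any vendor; the function and phrase data are hypothetical), a per-user table of corrections learned phrase by phrase could be applied on top of whatever a generic recognizer outputs:

```python
from difflib import SequenceMatcher

def build_user_lexicon(pairs):
    """pairs: list of (what_the_recognizer_heard, what_the_speaker_meant)."""
    return dict(pairs)

def adapt_transcript(raw_text: str, lexicon: dict, threshold: float = 0.8) -> str:
    """Replace the generic recognizer's output with the speaker's intended phrase
    when a learned correction is a close enough fuzzy match."""
    for heard, meant in lexicon.items():
        if SequenceMatcher(None, raw_text.lower(), heard.lower()).ratio() >= threshold:
            return meant
    return raw_text

# Corrections supplied phrase by phraseature by a communication partner (made-up examples).
lexicon = build_user_lexicon([
    ("one more gain", "once more again"),
    ("turn on the lie", "turn on the light"),
])

print(adapt_transcript("turn on the lie", lexicon))  # -> "turn on the light"
```

The general model handles the broad [disabled] accent, while the per-user lexicon catches the phrases that still come out wrong for one specific speaker.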