Speech recognition systems from major tech companies have a harder time understanding words spoken by black people than the same words spoken by white people, a new study finds.
These types of systems are commonly used in digital assistants like Siri, as well as tools like closed captioning and hands-free controls. But, as with any machine learning system, their accuracy is only as good as the data they are trained on.
Automated speech recognition (ASR) systems developed by Amazon, Apple, Google, IBM and Microsoft tend to have higher error rates when transcribing speech from African Americans than from white Americans, according to a Stanford University study published in Proceedings of the National Academy of Sciences.
Researchers compared human-made transcripts of 115 interviews with transcripts of the same recordings produced by speech recognition tools. Of those, 73 conversations were with black speakers, while 42 were with white speakers.
The team found that the "average word error rate" was nearly twice as high when the ASR systems transcribed black speakers (35%) as when they transcribed white speakers (19%).
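For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the machine transcript into the human one, divided by the number of words in the human transcript. The sketch below illustrates that standard definition in Python. It is not the researchers' code: the function name, the lowercasing step and the sample sentences are assumptions made for illustration, and the study may have normalized text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    `reference` is the human transcript and `hypothesis` is the machine one.
    Lowercasing and whitespace splitting are simplifying assumptions.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped by the machine
                curr[j - 1] + 1,     # extra word inserted by the machine
                prev[j - 1] + cost,  # match, or one word substituted for another
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word and one wrong word out of five.
human = "turn on the kitchen lights"
machine = "turn on kitchen light"
print(word_error_rate(human, machine))
```

By this measure, a rate of 35% means roughly one error for every three words spoken, and 35% against 19% works out to a ratio of about 1.8.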
To rule out differences in vocabulary and dialect, the researchers also matched speakers by gender and age and compared cases in which they said the same words. Even then, they found error rates nearly twice as high for black speakers as for white ones.
"Given that the phrases themselves have identical text, these results suggest that racial disparities in ASR performance are related to differences in pronunciation and prosody— including rhythm, pitch, syllable accenting, vowel duration, and lenition— between white and black speakers," the study reads.
Error rates tended to be higher for African American men than for African American women, though there was a similar gap between white men and white women. Accuracy was worst for speakers who made heavy use of African American Vernacular English (AAVE).
Of course, machine learning systems can't be biased the same way people can. But if there's a lack of diversity in the data they are trained on, that's going to show up in their accuracy and performance. The study concludes that the primary issue seems to be a lack of audio data from black speakers when training machine learning models.
It's worth noting that the researchers used a custom-designed iOS app that leveraged Apple's free speech recognition technology, and it isn't clear whether Siri uses that exact machine-learning model. The tests were also conducted last spring, so the models may have changed since then.
While the study looked specifically at black and white speakers, digital assistants can also have a harder time interpreting other accents.
A 2018 story by The Washington Post found that digital assistants like Alexa or Google Assistant have a harder time understanding people with accents of all kinds. Generally, speakers from the West Coast — where most tech giants are located — were the best understood.
And in 2019, U.S. federal researchers found widespread evidence of racial bias in nearly 200 facial recognition algorithms, underscoring that a lack of diverse data sets can cause similar problems across many types of machine-learning platforms.