In a bid to enhance its in-house Alexa voice recognition technology, Amazon reportedly employs thousands of people around the world to listen to, transcribe and annotate audio clips recorded by Echo devices.
Citing sources who have worked on the project, Bloomberg reports that Amazon tasks outside contractors, as well as full-time Amazon employees, with combing through snippets of audio from Echo devices to help train Alexa, the company's voice-enabled assistant.
The clips include both explicit Alexa commands and background conversations. Echo devices constantly scan audio for a trigger phrase — "Alexa," "Echo" or "computer" — which, when detected, activates the assistant and initiates a connection to Amazon's servers. Audio recording begins shortly thereafter.
"By default, Echo devices are designed to detect only your chosen wake word (Alexa, Amazon, Computer or Echo). The device detects the wake word by identifying acoustic patterns that match the wake word. No audio is stored or sent to the cloud unless the device detects the wake word (or Alexa is activated by pressing a button)," an Amazon spokesperson told AppleInsider.
With Echo devices in homes and offices, workers inevitably hear users discussing people by name or come across customers talking about sensitive topics like banking information, the report said. When snippets contain personal information, workers are instructed to mark the clip as "critical data" and move on.
Amazon does not inform customers that recordings of Alexa conversations are heard by employees, though the company's website does note that user requests are used to "train our speech recognition and natural language understanding systems."
According to people who have worked on the project, the job is typically mundane, with one employee saying he was tasked with parsing mentions of "Taylor Swift." Other recordings verge on the intrusive; two workers said they heard what they believed to be an instance of sexual assault, the report said.
Amazon reportedly constructed policies to deal with distressing audio clips, but two employees who sought help when they stumbled upon such snippets were told it was not Amazon's responsibility to take action. Employees have taken to sharing unsettling — or amusing — clips on an internal chat system.
The report notes Amazon's review system strips identifying information like a user's full name and address from the clips, but leaves the customer's first name and product serial number intact.
"We take the security and privacy of our customers' personal information seriously," Amazon said in a statement to the publication. "We only annotate an extremely small sample of Alexa voice recordings in order [to] improve the customer experience. For example, this information helps us train our speech recognition and natural language understanding systems, so Alexa can better understand your requests, and ensure the service works well for everyone."
With human intervention, Amazon is able to teach Alexa's software to recognize and respond to certain words and phrases. Like most voice-recognition technology, Alexa is powered by artificial intelligence, but the system requires training by human operators, in this case workers based out of Boston, Costa Rica, India and Romania. These people can listen to up to 1,000 clips per day, the report said.
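For a sense of what that human intervention involves in practice, here is a hedged sketch of the review loop the report describes: a reviewer attaches a transcript and intent label to each clip, and clips containing personal information are flagged as "critical data" and set aside. The Clip and Annotation structures and every field name are assumptions for illustration, not Amazon's internal tooling.

```python
# Hedged sketch of the human-review loop the report describes; all types and
# field names are assumptions for illustration, not Amazon's internal tooling.

from dataclasses import dataclass

@dataclass
class Clip:
    serial_number: str   # per the report, the device serial number stays attached
    first_name: str      # ... as does the customer's first name
    audio_ref: str       # pointer to the stored recording

@dataclass
class Annotation:
    transcript: str = ""
    intent: str = ""
    critical_data: bool = False  # set when the clip contains personal information

def review(clip: Clip, heard_text: str, intent: str, contains_pii: bool) -> Annotation:
    if contains_pii:
        # Reviewers are instructed to mark such clips and move on, not transcribe them.
        return Annotation(critical_data=True)
    # Otherwise the transcript and intent label become training data for the
    # speech recognition and natural language understanding systems.
    return Annotation(transcript=heard_text, intent=intent)

if __name__ == "__main__":
    clip = Clip(serial_number="SN-0001", first_name="Alex", audio_ref="clip-42.wav")
    print(review(clip, "play taylor swift", "PlayMusic", contains_pii=False))
```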
Apple, too, employs a human review process to improve Siri. In a security white paper (PDF link), Apple notes Siri saves voice recordings "so that the recognition system can utilize them to better understand the user's voice." The recordings are stripped of identifiable information, assigned a random device identifier and saved for six months, over which time the system can tap into the data for learning purposes. Following the six-month period, the identifier is erased and the clip is saved "for use by Apple in improving and developing Siri for up to two years."
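Read literally, that schedule breaks into two phases: roughly six months with a random device identifier attached, then up to two further years anonymized. The small helper below illustrates that timeline only as worded in the white paper's description; the day counts are approximations and nothing here reflects Apple's actual implementation.

```python
from datetime import date, timedelta

# Rough timeline of the Siri retention schedule described above; the day counts
# are approximations of the white paper's wording, not Apple's implementation.

IDENTIFIED_PERIOD = timedelta(days=183)        # ~6 months with a random device identifier
ANONYMIZED_PERIOD = timedelta(days=2 * 365)    # up to ~2 years beyond that, identifier erased

def retention_state(recorded: date, today: date) -> str:
    age = today - recorded
    if age <= IDENTIFIED_PERIOD:
        return "identified"      # tied to a random device identifier
    if age <= IDENTIFIED_PERIOD + ANONYMIZED_PERIOD:
        return "anonymized"      # identifier erased, clip kept for improving Siri
    return "expired"

if __name__ == "__main__":
    print(retention_state(date(2019, 1, 1), date(2019, 4, 12)))   # identified
    print(retention_state(date(2019, 1, 1), date(2020, 4, 12)))   # anonymized
    print(retention_state(date(2019, 1, 1), date(2022, 4, 12)))   # expired
```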
Machine learning, both on-device and in the cloud, is also used to tweak Apple's voice recognition technology, including "Hey Siri" wake word pronunciation.
Updated with statement from Amazon.
Comments
Next up, Facebook watches you through the Portal in your bedroom.
Can you hear me now?
I'm trying to decide whether I care about this or not. My bank records our telephone conversations, and those are much more detailed than an Alexa or Siri request, and are directly associated with my personal information. This seems pretty benign.
The only real risk to me is embarrassment, but it's not likely someone I know personally is ever going to hear me saying or doing something I'd wish they hadn't. It's true that I don't want an Amazon contractor hearing my passwords or financial codes, but in the absence of a way for them to determine exactly who I am, even that information is essentially useless to them.
The lack of transparency and anonymity is troubling. The article mentioned “banking information” but I wonder how many people are actually using explicit information about their account to the point it could be compromised. My guess is zero to not very many. I mean, when was the last time you came home and said, “Hey, honey! I just wanted to let you know that I deposited that $10,000 into our Bank of America checking account # 520439203949!”?
I'll stick with HomePod thank you very much! Worth every penny!