In a bid to enhance its in-house Alexa voice recognition technology, Amazon reportedly employs thousands of people around the world to listen to, transcribe and annotate audio clips recorded by Echo devices.
Citing sources who have worked on the project, Bloomberg reports that Amazon tasks outside contractors, as well as full-time Amazon employees, with combing through snippets of audio from Echo devices to help train Alexa, the company's voice-enabled assistant.
The clips include both explicit Alexa commands and background conversations. Echo devices constantly scan audio for a trigger phrase ("Alexa," "Amazon," "Echo" or "computer") which, when detected, activates the assistant and initiates a connection to Amazon's servers. Audio recording begins shortly thereafter.
"By default, Echo devices are designed to detect only your chosen wake word (Alexa, Amazon, Computer or Echo). The device detects the wake word by identifying acoustic patterns that match the wake word. No audio is stored or sent to the cloud unless the device detects the wake word (or Alexa is activated by pressing a button)," an Amazon spokesperson told AppleInsider.
With Echo devices in homes and offices, workers unsurprisingly hear users discussing people by name or come across customers talking about sensitive topics such as banking information, the report said. When snippets contain personal information, workers are instructed to mark the clip as "critical data" and move on.
Amazon does not inform customers that recordings of Alexa conversations are heard by employees, though the company's website does note user requests are applied to "train our speech recognition and natural language understanding systems."
According to people who have worked on the project, the job is typically uninteresting, with one employee saying he was tasked with parsing mentions of "Taylor Swift." Other recordings border on what can be considered intrusion. Two workers said they heard what they believe to be an instance of sexual assault, the report said.
Amazon reportedly constructed policies to deal with distressing audio clips, but two employees who sought help when they stumbled upon such snippets were told it was not Amazon's responsibility to take action. Employees have taken to sharing unsettling — or amusing — clips on an internal chat system.
The report notes Amazon's review system strips identifying information like a user's full name and address from the clips, but leaves the customer's first name and product serial number intact.
"We take the security and privacy of our customers' personal information seriously," Amazon said in a statement to the publication. "We only annotate an extremely small sample of Alexa voice recordings in order [to] improve the customer experience. For example, this information helps us train our speech recognition and natural language understanding systems, so Alexa can better understand your requests, and ensure the service works well for everyone."
With human intervention, Amazon is able to teach Alexa's software to recognize and respond to certain words and phrases. Like most voice-recognition technology, Alexa is powered by artificial intelligence, but the system requires training by human operators, in this case workers based in Boston, Costa Rica, India and Romania. Each of these workers can listen to up to 1,000 clips per day, the report said.
Apple, too, employs a human review process to improve Siri. In a security white paper, Apple notes Siri saves voice recordings "so that the recognition system can utilize them to better understand the user's voice." The recordings are stripped of identifiable information, assigned a random device identifier and saved for six months, during which time the system can tap into the data for learning purposes. Following the six-month period, the identifier is erased and the clip is saved "for use by Apple in improving and developing Siri for up to two years."
Machine learning, both on-device and in the cloud, is also used to tweak Apple's voice recognition technology, including "Hey Siri" wake word pronunciation.
Updated with statement from Amazon.