Data collected by Apple to improve its voice-driven Siri service is anonymized and kept on the company's servers for up to two years before it is discarded.
The disclosure was made by Apple to Wired after privacy advocates called on the company to reveal exactly what information it knows and keeps about users. Apple spokeswoman Trudy Muller said the anonymized data is collected solely to improve the service, and that the company takes customer privacy very seriously.
Much of the work for Siri is done remotely, which is why the personal assistant software available on iPhone, iPad and iPod touch requires a data connection to operate.
Voice clips stored by Apple are tagged with a random number that represents the user who recorded them. The number is not associated with an Apple ID, email address, or anything else that could easily identify the user personally.
After six months, the random number is no longer associated with the saved clip, but the audio file may be saved for up to two years in total for what Wired said are "testing and product improvement purposes."
However, if a user turns off Siri on their device, their randomized identifier is deleted, along with any data associated with it.
The fact that Siri data must be sent to Apple before it can provide results has been a concern for security advocates, as well as some companies. For example, last year it was revealed that security-conscious IBM barred the use of Siri on its corporate networks, out of concern that sensitive information could leak.
58 Comments
Wasn't there a thing a few years ago, where Google kept all their search queries associated with an anonymous id, and they released a database to a university researcher, and she was able to identify people and contact them just by going through all the queries associated with a particular id? Sometimes just the fact of linking things together has a de-anonymising effect.
This isn't unexpected. However, I do wish Apple would use something like the [URL=http://en.wikipedia.org/wiki/Harvard_sentences]Harvard Sentences[/URL] or some other derivation so that you can train Siri to know how you speak so it can tie what you say to various phonemes. [quote name="AppleInsider" url="/t/157073/apple-reveals-it-keeps-anonymized-siri-data-for-up-to-2-years#post_2313624"]However, if a user turns off Siri on their device, their randomized identifier is deleted, along with any data associated with it.[/quote] I wonder if this is accurate as stated. I wouldn't think the data associated with it is deleted, but rather just the identifier that ties it to that device. From a user's perspective it's gone, but I would think it's still on Apple's servers.
If you're worried about it, don't use it.
[quote name="ascii" url="/t/157073/apple-reveals-it-keeps-anonymized-siri-data-for-up-to-2-years#post_2313629"]Wasn't there a thing a few years ago, where Google kept all their search queries associated with an anonymous id, and they released a database to a university researcher, and she was able to identify people and contact them just by going through all the queries associated with a particular id? Sometimes just the fact of linking things together has a de-anonymising effect.[/quote] Yeah, I do recall that now. Hopefully Apple's service is more anonymous, but since we're talking bits of speech it's unlikely to get looked over at some point by the public or sold.
I'm not sure how this is much of a shock: we already knew it was a cloud service, and Apple's terms already stated that info was stored. The main comfort for me: unlike SOME companies I could name, Apple doesn't make essentially all its profit by selling such "anonymized" personal info! Apple makes its profit by making its users happy, and selling the Siri data would not achieve that! As for Google and DE-anonymizing... I am NEVER logged into Google. I use DoNotTrack. I use the Google opt-out extension. I block third-party cookies. Yet when I searched Google for specific car models last week, I almost immediately received spam about those same cars! Now, I don't think Google turned around and directly sold my info... but they are part of the chain, which is not anonymous at all! Clearly third parties have profiles on me including private email addresses. I don't think Siri plays any part in that kind of personal profile-building. (This is all the more puzzling because I thought Google said they'd stop sharing search terms in their referrals. Really not sure how this kind of thing happens. Something similar happened to my friend with a medical condition: she did some searches on Google, and shortly afterward, entirely unrelated sites were showing relevant banner ads! She too is never logged into Google.) Bottom line... privacy worries me. Siri doesn't.