No, Google Bard is not trained on Gmail data

Google's large language model tool, Bard, says that it was trained on Gmail data, but Google has denied that this is the case.

Bard is a generative AI tool built on a large language model (LLM), which produces responses based on its large training dataset. Like ChatGPT and similar tools, it isn't actually intelligent and will often get things wrong, which is referred to as "hallucinating."

A tweet from Kate Crawford, author and principal researcher at Microsoft Research, shows a Bard response suggesting Gmail was included in its dataset. This would be a clear violation of user privacy, if true.

Umm, anyone a little concerned that Bard is saying its training dataset includes... Gmail?

I'm assuming that's flat out wrong, otherwise Google is crossing some serious legal boundaries. pic.twitter.com/0muhrFeZEA
— Kate Crawford (@katecrawford) March 21, 2023

But Google's Workspace Twitter account responded, stating that Bard is an early experiment and will make mistakes, and confirmed that the model was not trained with information gleaned from Gmail. The pop-up on the Bard website also warns users that Bard will not always get queries right.

These generative AI tools aren't anywhere near foolproof, and users with access often try to pull out information that would otherwise be hidden. Queries such as Crawford's can sometimes provide useful information, but in this case, Bard got it wrong.

Generative AI and LLMs have become a popular topic in the tech community. While these systems are impressive, they are also filled with early problems.


Users are urged, even by Google itself, to fall back on web search to verify any response an LLM like Bard provides. While it might be interesting to see what Bard will say, its answers are not guaranteed to be accurate.

17 Comments


DAalseth 7 Years · 3232 comments

Let’s see:
I don’t trust Bard to get the right answer.
I don’t trust Google to tell the truth.

6 Likes · 0 Dislikes

chasm 11 Years · 3714 comments

I think it would have been more prudent to say that Google denies Bard was trained on Gmail data, as Google’s track record on honesty is … not something you should take as definitive, let’s just say.

Out of the mouths of babes … or bards, in this case …

5 Likes · 0 Dislikes

genovelle 17 Years · 1481 comments

chasm said:

I think it would have been more prudent to say that Google denies Bard was trained on Gmail data, as Google’s track record on honesty is … not something you should take as definitive, let’s just say.
Out of the mouths of babes … or bards, in this case …

So please tell me when they actually stopped scanning emails for data. They said they would stop a year after being caught doing it, but didn’t at that point. There have been no announcements since claiming they had stopped. This may be a carefully worded statement where Gmail wasn’t used directly, but a repository of data gleaned from all of their sources was.