Over 1.2B profiles found in unsecured server shows severity of data collection by tech firms

Nov 22, 2019

The discovery of an unprotected data store containing 1.2 billion records of personal information gleaned from data brokerage services offers a glimpse into not only the kind of resources scammers and hackers can acquire about a large number of potential targets, but also the amount of data online services share with or sell to other entities.

A still from an Apple marketing campaign about the privacy security offered by iPhone and iOS

It isn't a secret that online services like Google, Facebook, and many apps take advantage of their users' data to serve advertising to them, which usually includes creating a profile for each person and potentially tracking them as they use other services and browse the Internet. It is also well known that the same data can circulate and be collected by some firms to create vast marketing databases, making those databases potentially quite valuable to acquire in a data breach.

Wired reports that in October, dark web researcher Vinny Troia discovered a data store on an unsecured server hosted on Google's cloud infrastructure. The store held approximately 4 terabytes of personal data, amounting to about 1.2 billion records, compiled into databases.

The exposed data didn't include any sensitive information, like payment details or passwords, but did contain lots of basic data that could have been scraped from social media, such as names, home and cellular phone numbers, and links to individual social media profiles. Approximately 50 million unique phone numbers were found on the store, as well as 622 million email addresses.

Troia reported the existence of the store to the FBI, and the server and its data were pulled offline within a few hours. Troia found the server during a search with researcher Bob Diachenko using the scanning services BinaryEdge and Shodan, so only the server's IP address was discovered. There is no way of knowing who compiled the collection, only that it was easy to find and easy to acquire data from.

Because its creator cannot be determined, it is also not possible to know exactly what the store was used for, be it by criminals or by a larger company with exceptionally poor security. Although the store sat on a server hosted on Google's cloud services, it is unlikely Google itself created the cache; it is far more likely that someone paying for Google's cloud services set up the server instead.

Sourcing Questions

What is known is that the data is made up of four datasets, with three seemingly from one data broker called People Data Labs, while the other may have come from Oxydata.

People Data Labs suggests the server's creator used one of its "enrichment products" along with other services to compile the collection. "Once a customer receives data from us, or any other data providers, the data is on their servers and the security is their responsibility," advised co-founder Sean Thorne.

Troia believes it is unlikely the data was taken from PDL in a breach, as it would be easy enough to simply pay for the data in the first place. An alternative would have been to sign up for PDL's free trial service, which provides 1,000 consumer profiles per month. A thousand burner accounts could then yield a million profiles in a short space of time, provided there are no duplicates.

Though it is doubtful either firm suffered a breach, and both insist that clients secure the data and sign agreements not to resell it, neither PDL nor Oxydata is able to enforce its customers' security. That leaves open the possibility of staggeringly poor security by a client.

An Even Bigger Issue

"What stands out about this incident is the sheer volume of data that's been collected and how it's been aggregated, stored, and commercialized without the knowledge of the data owners," said security researcher and operator of HaveIBeenPwned Troy Hunt, noting his own personal data was found in the store. "We're definitely seeing more data than ever circulating," which Hunt believes is not just from breaches, but also from data being "taken by other services, duplicated, then breached again."

The sheer amount of data being compiled and seemingly acquired with ease highlights not only how much data is at risk from regularly reported breaches, but also how much tech companies have compiled about their users. The creation of marketing profiles has helped refine the advertising campaigns and boost the revenue of companies like Google, but at the expense of user privacy.

In some cases, this has resulted in major scandals, the biggest involving Cambridge Analytica, which misused data sourced from Facebook for political purposes.

Apple is seemingly one of the few companies attempting to take a stand against the practice. CEO Tim Cook often refers to privacy as a fundamental human right, and the company has created advertising campaigns hammering home the message to customers.

The company has taken steps to anonymize data in a variety of ways, minimizing what it collects to the bare essentials needed to perform an operation, while also attempting to protect users against other firms' tracking efforts. This includes Safari's Intelligent Tracking Prevention, which blocks a vast number of online tracking systems, and Sign In with Apple, a privacy-focused alternative to the Facebook and Google sign-in systems that have dominated online services so far.

Cook has also suggested to the U.S. Congress there should be some form of privacy legislation to protect against data brokers, including how data is collected and stored.
