Over 48 Million Users' Information Left Publicly Accessible

Given the recent Facebook and Cambridge Analytica scandal, users of social media platforms, not just Facebook, should be considering what information they allow corporations to access. For those still wondering how much of their information is left online, the incident described below may help inform that decision.

An article published by ZDNet revealed that information concerning 48 million users was left publicly accessible via an Amazon Web Services (AWS) S3 bucket, according to an UpGuard security researcher who discovered the data on February 28. The company responsible for the apparent oversight is LocalBlox, a company that scrapes data from public web profiles. The company quickly corrected the oversight once contacted by the UpGuard researcher.

LocalBlox

Since its founding in 2010, the Washington-based firm has focused on collecting data from publicly accessible sources, such as the social networks Facebook, Twitter, and LinkedIn and the real estate site Zillow, to name a few, in order to build profiles. To do this, the company says it:

“automatically crawls, discovers, extracts, indexes, maps and augments data in a variety of formats from the web and from exchange networks.”

One could wonder what all this information can possibly be used for. According to the company, the goal is a so-called “true 360 degree people view” designed to “marry work-life and personal-life individual data to generate combined intelligence.” One of the most common uses of such profiles is targeted marketing: the data is analyzed to determine how best to persuade a target market.

Even before the recent Facebook saga, some had questioned the ethics and legality of such practices. Scraping, which can be defined as extracting data from websites, copying it, and sending it to a central database, all too often means that data held by widely used websites is targeted by unknown third parties seeking to monetize it. In such cases, both the targeted website, such as Facebook, and any affected users are victimized, as personal information entrusted to the social network is snatched up for the benefit of a platform of which no one is aware.
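
To make the definition of scraping above more concrete, here is a minimal Python sketch. It is only an illustration: the sample page, its `profile-` class names, and the `central_db` list standing in for a central database are all invented for this example, and nothing here reflects how LocalBlox or any other company actually collects data.

```python
from html.parser import HTMLParser

# Hypothetical public profile page; the markup and class names are invented.
SAMPLE_PAGE = """
<html><body>
  <h1 class="profile-name">Jane Example</h1>
  <span class="profile-title">Data Analyst</span>
  <span class="profile-city">Springfield</span>
</body></html>
"""


class ProfileParser(HTMLParser):
    """Collects text from elements whose class starts with 'profile-'."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field name for the element being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if css_class.startswith("profile-"):
            self._current = css_class[len("profile-"):]

    def handle_data(self, data):
        # Only keep text that sits inside a tagged profile element.
        if self._current and data.strip():
            self.fields[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def scrape(page_html):
    """Extract the profile fields from one page of HTML."""
    parser = ProfileParser()
    parser.feed(page_html)
    return parser.fields


central_db = []  # stand-in for the scraper's central database
central_db.append(scrape(SAMPLE_PAGE))
print(central_db)
```

A real scraper would repeat this step across many pages and sites, which is how individual public profiles end up consolidated in a single store.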

The Discovery

In the report published by UpGuard, it was revealed that LocalBlox left the massive store of information on a public but unlisted Amazon S3 storage bucket without a password, allowing anyone to download its contents. The bucket, called “lbdumps”, contained an archive that unpacked to a single file over 1.2 terabytes in size. The file listed 48 million individual records, scraped from public profiles, consolidated, and then stitched together. Within the file, names, physical addresses, dates of birth, (LinkedIn) job history, Twitter handles, and in some cases, IP and email addresses could be found.
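
The report does not say how the bucket came to be publicly readable, so the following Python sketch is only an illustration of one common cause: a bucket policy that grants read access to everyone. The function name `allows_anonymous_read` and the bucket name `example-bucket` are hypothetical, and the check is deliberately simplified (it ignores `Condition` blocks, `NotPrincipal`, and access control lists).

```python
import json


def allows_anonymous_read(policy_json):
    """Return True if any Allow statement grants object reads to everyone."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be in a list
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if isinstance(principal, dict):
            principal = principal.get("AWS")
        principals = principal if isinstance(principal, list) else [principal]
        actions = stmt.get("Action", [])
        actions = actions if isinstance(actions, list) else [actions]
        reads = any(a.lower() in ("s3:getobject", "s3:*", "*") for a in actions)
        if "*" in principals and reads:
            return True
    return False


# Hypothetical policy that makes every object in the bucket world-readable.
public_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
})

print(allows_anonymous_read(public_policy))
```

A policy like the one above means that anyone who learns or guesses the bucket name can download its contents, which is why an unlisted bucket is not a protected one.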

The discovery was made by Chris Vickery, director of cyber risk research at security firm UpGuard, who has developed a reputation as an ethical data breach hunter. The report further revealed that the data scraped from Facebook might have been collected using the social network's search feature that allowed users to find profiles based on an email address, a feature that Facebook recently discontinued in light of the Cambridge Analytica scandal.

A reporter at ZDNet contacted LocalBlox's chief technology officer, Ashfaq Rahman, by phone for a response. Rahman claimed that Vickery had hacked the bucket rather than it being accidentally exposed online. This seems to fly in the face of Vickery’s earned reputation. To further muddy the proverbial waters, Rahman also disputed the 48 million figure, saying that most of the data was fabricated and used for internal tests, but he would not give a percentage.

Social Media Giants Certainly Aware of the Problem

Facebook, LinkedIn, and Twitter all stipulate in their terms of service that the scraping of public pages is forbidden. While the companies forbid such perceived abuse, a recent decision by a US federal judge contradicts their wishes. The court ruled that Microsoft's LinkedIn cannot block third-party web scrapers from scraping data from publicly available profiles. The ruling, which was published on August 14, 2017, follows a lawsuit filed by startup HiQ Labs against LinkedIn after LinkedIn issued a cease and desist letter to prevent the startup from scraping data. HiQ argued that the company would likely go under without access to its primary data source.

In his ruling, Judge Edward Chen specifically called out LinkedIn's “broad interpretation” of the CFAA, which, “if adopted, could profoundly impact open access to the Internet, a result that Congress could not have intended when it enacted the CFAA over three decades ago.” The CFAA, or the Computer Fraud and Abuse Act, is both a criminal law and a statute that creates a private right of action, allowing private individuals and companies to sue to recover damages caused by violations of the law. The legal arguments supporting the judge’s decision are better discussed by experts, but the practical result of the decision is that data published in public profiles does not fall under copyright or privacy protection laws.

Meanwhile, the European Union will begin enforcing the General Data Protection Regulation (GDPR) on May 25 of this year. The legislation is designed to reform data security and will affect how companies handle the data of individuals living in the EU. One of the central ideas of the legislation is a culture of legitimate use. Companies will have to ask users’ permission to use personal data, while also supplying a legitimate reason for needing to use that information. This should put an end to storage mines holding dormant data: as soon as the legitimate interest in using the information has expired, that information will have to be erased. Such legislation should help social media platforms protect their users’ data.