Many major social networking sites are leaking information that allows third-party advertising and tracking companies to associate the Web browsing habits of users with a specific person, researchers warn.
That’s the conclusion of a study on the leakage of personally identifiable information on social networks done at AT&T Labs and the Worcester Polytechnic Institute.
The findings (PDF document), which appear to have received scant public attention so far, were presented by the study’s two researchers at a conference in Barcelona more than a month ago. Earlier this week, civil liberties group Electronic Frontier Foundation (EFF) referred to the study in a blog post.
The research, by Craig Wills of Worcester Polytechnic and Balachander Krishnamurthy of AT&T, presents “some interesting technical details” on how social networking sites are leaking personal data, the EFF blog post said.
“In some cases, the leakage may be unintentional, but in others, there is clever and surreptitious anti-privacy engineering at work,” the EFF said.
Wills told Computerworld that he and Krishnamurthy surveyed 12 of the biggest social networks for the study. They discovered that 11 of them were leaking personal identity information to third parties, including data aggregators, which track and aggregate user viewing habits for targeted ad-serving purposes.
What the study shows is that most users of social networking sites are vulnerable to having identity information from their profiles associated with the tracking cookies used by data aggregators, he said.
The information allows aggregators to relatively easily scoop up personal data from a user’s social network page and to track that user’s movements across multiple Web sites.
While aggregators have typically claimed that a person’s movement on the Internet is tracked just as an anonymous IP address, the information from social networking sites allows them to attach a unique identity to each profile, Wills said.
What is not known, however, is whether data aggregators are actually recording any of the personal identity information being relayed to them from social media sites, Wills said.
He said personal identity data or unique identifiers that point to a person’s real identity are often relayed by social networking sites to third parties via so-called HTTP referrer headers. A referrer header basically tells the server receiving a request the URL of the page that linked to, or embedded, the requested resource.
In the case of the social networks surveyed, all of the URLs being relayed via such HTTP headers included the user’s unique identifier, he said.
When a user’s page is being loaded on such sites, third-party tracking and advertising services that have a relationship with the site get not only the data from their tracking cookies but also the data containing the user’s unique identifier from the HTTP header, he said.
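The mechanism Wills describes can be illustrated with a minimal Python sketch. It is not taken from the study: the host names, the `id` query parameter and the `link_identity` helper are all hypothetical, and real sites format their profile URLs in different ways. It only shows how a third-party server that receives both its own tracking cookie and a referrer header could pair the two.

```python
from urllib.parse import urlparse, parse_qs


def link_identity(request_headers, id_param="id"):
    """Return (tracking_cookie, profile_id) as a third party could derive them.

    `id_param` is an assumed name for the query parameter that carries
    the user's unique identifier in the profile URL.
    """
    # The browser sends the URL of the embedding page in the Referer header.
    referer = request_headers.get("Referer", "")
    query = parse_qs(urlparse(referer).query)
    profile_ids = query.get(id_param, [])
    profile_id = profile_ids[0] if profile_ids else None
    # The same request also carries the third party's own tracking cookie.
    cookie = request_headers.get("Cookie")
    return cookie, profile_id


# Hypothetical request for an ad or tracking object embedded in a profile page.
headers = {
    "Host": "ads.tracker.example",
    "Referer": "http://social.example/profile.php?id=12345",
    "Cookie": "track=abc123",
}
print(link_identity(headers))  # ('track=abc123', '12345')
```

In this sketch the pairing requires nothing beyond reading two headers from a single request, which is why the researchers describe the association as relatively easy to make.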
Another way in which identity data is leaked to third-party providers is when a social networking site contains objects from a server that appears to be part of the site, but in reality belongs to a third party.
At least two of the social networks surveyed were relaying personal identity data to such hidden third-party servers, the report said. In addition, five of the 12 social networks surveyed were leaking unique user identifiers via so-called Request-URIs, which identify pages or objects on a Web site.
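A Request-URI leak differs from a referrer leak in that the identifier sits in the URL requested from the third party itself. As a rough sketch of how such leaks might be spotted in a log of requests made while a profile page loads, the hypothetical `find_leaks` helper below flags URLs sent to other hosts that contain a known user identifier. The host names and the identifier are invented for illustration, and the simple host-name comparison is an assumption, not the researchers’ method.

```python
from urllib.parse import urlparse


def find_leaks(request_urls, first_party_host, user_id):
    """Return the URLs sent to other hosts whose path or query contains user_id."""
    leaks = []
    for url in request_urls:
        parts = urlparse(url)
        host = parts.hostname or ""
        # Treat the site itself and its subdomains as first party.
        is_first_party = (
            host == first_party_host or host.endswith("." + first_party_host)
        )
        if not is_first_party and user_id in (parts.path + "?" + parts.query):
            leaks.append(url)
    return leaks


# Hypothetical requests observed while loading one profile page.
urls = [
    "http://social.example/profile?id=12345",
    "http://ads.tracker.example/pixel.gif?uid=12345&page=home",
    "http://cdn.other.example/logo.png",
]
print(find_leaks(urls, "social.example", "12345"))
```

A check based on host names alone would miss the hidden third-party servers described above, because those servers appear under the social network’s own domain.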
“We don’t know what the specific practice [is] of a third-party tracking site” when it comes to using the information, Wills said. “But this information is available to them. It is particularly worrisome because third-party aggregators are creeping into a lot of sites that you and I visit.”
EFF staff technologist Peter Eckersley noted in the blog post that there appears to be no easy way for users of such sites to avoid being tracked in this fashion.
To mitigate the risk, users of social networking sites need to disable Flash cookies and ensure that all other cookies are deleted when the browser is closed, Eckersley wrote.
Certain Firefox extensions are also available that allow users to control when third-party sites can include content or run code in their browsers, and plug-ins are available to help them opt out of targeted advertising cookies, he wrote.
But the steps can be hard to follow and can limit browser functionality. “We’re fearful that the vast majority of Internet users will continue to be tracked by dozens of companies—companies they’ve never heard of, companies they have no relationship with, companies they would never choose to trust with their most private thoughts and reading habits,” he wrote.