AOL has apparently released details of Internet searches performed over a period of three months by hundreds of thousands of its subscribers, raising privacy concerns.
The data, apparently made available for research purposes, is no longer available at the Web site where it was posted, but details of the data were cited by technology blog site TechCrunch, and the page linking to it was cached by Google’s search engine.
The cached copy of the page said the data comprised about 19 million Web searches performed by 658,000 users from March through May. The page warned of sexually explicit language in some of the queries, and said of the data, “This collection is distributed for noncommercial research use only.” The page contained a link to a compressed copy of the data archive.
The page asked researchers using the data to cite a research paper (PDF) entitled “A Picture of Search” based on the data, which names two AOL employees as co-authors.
AOL officials in London said Monday morning that they are aware of the issue. They had no further comment, and referred queries to the company’s U.S. headquarters. Reached in the U.S., company officials did not have an immediate comment.
The release of such information poses serious privacy concerns. Major search engine companies fought a request for similar data on user searches last year by the U.S. Department of Justice.
The U.S. government wanted to use the data to check the effectiveness of a federal law aimed at restricting minors’ access to harmful material. In January it filed a motion with the court to compel Google to comply with its subpoena and turn over a “random sample” of 1 million Web site addresses found in its search engine index.
It also asked the company for the text of all queries filed on the search engine during a specific week. America Online, Yahoo and Microsoft’s MSN were also subpoenaed, and complied to varying degrees.
The alleged release of AOL’s data has sparked concern over how the data might be used now that it has spread widely. While the original page is gone, the data has since been made available on several other Web sites.
The data is valuable from a market research perspective, said David Bradshaw, principal analyst at Ovum. Normally, similar kinds of data sets are only released to trusted researchers, not the general public, he said.
Even then, the resulting research is released as a batch of aggregated statistics, masking signs of individual users’ behavior, he said.
“I do think this was foolhardy at best and a complete disaster or worse for AOL,” Bradshaw said. “If I were an AOL user, I’d be up in arms.”
The researchers who used the data wrote in an introduction that user IDs were replaced with an anonymous number. However, observers are expressing concern about whether users could be tracked based on their queries.
The data also contains the time when a particular query was executed. If a user clicked on a result, the rank of the item was recorded, along with the domain portion of the URL (uniform resource locator).
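The concern is easier to see with a small illustration. The sketch below is not AOL's file format or the researchers' code; the field names and the sample rows are invented for illustration, based only on the fields described above (an anonymous number, the query, the time, the rank of a clicked result and its domain). It shows that because the same number stays attached to every query a user makes, grouping on that number rebuilds each user's search history.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class SearchRecord(NamedTuple):
    """One logged search. Field names are hypothetical, not AOL's."""
    anon_id: int                 # number substituted for the real user ID
    query: str                   # text the user typed
    query_time: str              # when the query was executed
    item_rank: Optional[int]     # rank of the clicked result, if any
    click_domain: Optional[str]  # domain of the clicked URL, if any


def group_by_user(records):
    """Collect every query made under the same anonymous number."""
    history = defaultdict(list)
    for record in records:
        history[record.anon_id].append(record.query)
    return dict(history)


# Invented sample rows, for illustration only.
SAMPLE = [
    SearchRecord(1001, "plumbers in springfield", "2006-03-01 09:15:00", None, None),
    SearchRecord(1002, "cheap flights", "2006-03-01 09:20:00", 1, "example.com"),
    SearchRecord(1001, "dr. smith springfield clinic", "2006-03-02 18:40:00", 2, "example.org"),
]

histories = group_by_user(SAMPLE)
for anon_id, queries in histories.items():
    print(anon_id, queries)
```

In this made-up sample, the two queries logged under 1001 end up side by side, and together they hint at a location even though no name appears anywhere in the records.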
The release of the AOL data prompted numerous comments on blog entries dedicated to the issue.
Ben Noble of Aberystwyth, Wales, wrote in a blog posting that the data is anonymous enough that “there’s still an amount of deniability, but it’s appalling that anyone should be put in the position of having to deny anything.”
Noble wrote that AOL could possess a file linking anonymous users with their real ID and their searches.