Twitter has overhauled the back-end infrastructure of its search engine, boosting its speed and capacity to index posts, process queries and deliver results, while making the system more stable and better suited for the addition of new features.
Twitter transferred its search engine to the new platform in recent weeks, after working on the new back-end system for about six months, according to the company.
Twitter’s search engine had been running on a MySQL-based system from Summize, a company Twitter acquired in mid-2008, but scaling that system had become difficult.
The engineering team in charge of the project decided to rebuild the search engine on a different technology: Lucene, the open-source text search library written in Java.
Twitter modified several aspects of Lucene, including its garbage collection, query termination, posting lists, and data structures and algorithms. The result is a search engine built on an inverted index that scales further and performs better than its predecessor.
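For readers unfamiliar with the terms, an inverted index maps each word to a "posting list" of the documents that contain it, and early query termination means a search stops as soon as it has enough results. The following is a minimal Python sketch of those two ideas only. It is not Twitter's or Lucene's code: the class name, the method names and the choice to walk posting lists newest-first are illustrative assumptions.

```python
from collections import defaultdict


class TinyInvertedIndex:
    """Toy inverted index (hypothetical, for illustration only)."""

    def __init__(self):
        # term -> posting list of document IDs, in arrival order (oldest first)
        self.postings = defaultdict(list)
        self.docs = []

    def add(self, text):
        """Index one document and return its ID."""
        doc_id = len(self.docs)
        self.docs.append(text)
        # set() ensures each term is recorded once per document
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)
        return doc_id

    def search(self, term, limit=2):
        """Return up to `limit` of the newest documents containing `term`."""
        hits = []
        # Assumption: newest-first traversal, so recent posts are found first
        for doc_id in reversed(self.postings.get(term.lower(), [])):
            hits.append(doc_id)
            if len(hits) == limit:
                break  # early termination: enough results, stop scanning
        return [self.docs[d] for d in hits]


index = TinyInvertedIndex()
for post in ["lucene is fast", "search with lucene", "hello world", "lucene scales"]:
    index.add(post)

print(index.search("lucene"))
```

In this sketch a lookup touches only the posting list for the queried word rather than every stored post, which is the property that lets an inverted index keep queries fast as the collection grows.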
Twitter fields 12,000 search queries per second, or more than 1 billion per day, and “tweets” become part of its search index less than 10 seconds after they are posted.
“We estimate that we're only using about 5 percent of the available backend resources, which means we have a lot of headroom. Our new indexer could also index roughly 50 times more Tweets per second than we currently get,” Twitter official Michael Busch wrote in a blog post.
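The figures above can be checked with simple arithmetic. The short Python sketch below converts the per-second query rate into a daily total and turns the 5 percent utilization estimate into a rough headroom multiple. The headroom figure assumes capacity scales linearly with resource use, which is an assumption made here for illustration and not a claim from Twitter.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

queries_per_second = 12_000  # rate quoted in the article
queries_per_day = queries_per_second * SECONDS_PER_DAY

backend_utilization = 0.05  # "about 5 percent" from Busch's estimate
# Assumption: load the back end could absorb scales linearly with resources
headroom_factor = 1 / backend_utilization

print(f"{queries_per_day:,} queries per day")
print(f"roughly {headroom_factor:.0f}x current load")
```

The daily total comes to 1,036,800,000 queries, consistent with the "more than 1 billion per day" figure.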
Twitter will contribute its modifications and improvements back to the Lucene project.
Although Twitter makes its index of “tweets” available to external search engines like Google and Microsoft’s Bing, its internal search engine is a key component of its microblogging service.
In addition to being the preferred vehicle among private citizens, public figures and companies for broadcasting short status updates, Twitter has become an increasingly valued repository of real-time data, tapped for following news, trends and collective musings.
To maximize the value of this “tweet” repository, the company needs a search engine that is fast, comprehensive and scalable. The extensive overhaul of its search technology shows that Twitter recognizes the importance of its internal search capabilities.