Wikia: Open-source search engine coming in Q4
Wikipedia founder Jimmy Wales is targeting the fourth quarter of this year for the unveiling of an open-source search engine that he hopes could challenge the dominance of market leaders Google and Yahoo.
The project is being run through Wikia, a for-profit company founded by Wales that seeks to apply a model similar to that of Wikipedia, the community-written and community-edited encyclopedia. He hopes to provide the tools and technology to allow programmers across the Internet to collaborate on the development and testing of a search engine and make the results freely available.
“The essential core principles are that I think search is now a fundamental part of the infrastructure of the Internet and it’s really fundamental to society as a whole and therefore as citizens of the world we should be concerned about it being a secretive black box,” he said.
Efforts to create open search engines aren’t new, but one of the stumbling blocks they face is the difficulty of running large-scale tests of the search algorithm, said Wales. The algorithm is the code that sits at the heart of the search engine and is responsible for its accuracy or lack thereof.
“To create a full-scale crawling spider of the Web actually requires a great deal of investment in hardware,” he said. Wikia is planning to provide resources to enable full-scale crawling of the World Wide Web so the software can be fully tested and tuned.
The project is still in the planning stages, and Wales expects that the first test version, due this year, will help programmers spot bugs that occur with real-world usage and speed up the development process.
“Probably what we’ll do is launch something in the fourth quarter of this year with a really big warning ‘It sucks, we know it sucks, it’s experimental, don’t panic. This is just an experiment to show what could be and now we’re going to start working to see how we could make it better,’” he said.
The project is already attracting attention, not just from engineers who want to lend a hand but also from companies that already offer search engines.
“We are getting a lot of interest from second-tier search players who are really interested in some of the alternatives that might be available. If you’re not one of the top three or four you’ve got to really wonder how could you ever catch up with Google and their billions of dollars. This provides kind of a level playing field where lots of people can contribute,” said Wales.
Wales cites a story he heard about research done on search results. In the study, users were presented with results from Google and Yahoo but with the brand names switched, so the Google name was above the Yahoo results and vice versa. In most cases users picked the Google-branded results as better, he said.
“To me this shows a great vulnerability if Google is only competing on brand image,” said Wales. “If good quality search results are becoming a commodity, if the problem of search is in some sense solved then if I can make that free it really changes the structure of competition on the Internet.”
“We’ll give away all the technology, all the data. Release everything under a free license because in my view the idea has been very solidly proven wrong by Wikipedia that in order to out-compete on the Internet you need to have a walled garden of special content no one else has. Wikipedia gives everything away. You can download the database and put up a clone of Wikipedia tomorrow. All that means is that more people find out about the brand, more people drive traffic back,” he said.
Even if multiple competitors spring up using the Wikia search engine, he’s hopeful the Wikia site can still grab a 2 percent or 3 percent share of the search market, which would make it “a pretty decent little business.”
But just how successful the search engine is will depend on more than just its accuracy. The search algorithm is the most closely guarded secret of companies like Google because it determines how high a particular site is ranked. If the algorithm is public, site owners could precisely tune their sites for high rankings, and that’s sure to attract the attention of spammers. Wales hopes a community effort will result in a blacklist of abusers to keep the results clean.
“If published algorithms make it too easy for spammers to game the system then we’ve got a real problem and my whole idea won’t work,” he said.