Twitter’s persistent and disruptive service outages have entered a second week as the company scrambles to bring its site availability back to acceptable levels.
After multiple incidents brought Twitter.com and its platform for third-party applications down several times last week, the company said on Friday that it had identified the causes and had taken concrete steps to resolve the problem.
Specifically, Twitter blamed errors in planning, monitoring and configuring its internal network, and said that in response it had doubled the capacity of that network, sharpened its monitoring and improved its load balancing.
“By bringing the monitoring of our internal network in line with the rest of the systems at Twitter, we’ll be able to grow our capacity well ahead of user growth. Furthermore, by doubling our internal network capacity and rebalancing load across the internal network, we’re better prepared to serve today’s tweets and beyond,” wrote Jean-Paul Cozzatti from Twitter's engineering team on the company’s official blog.
However, problems continued throughout the weekend and into Monday morning, as acknowledged on the official Twitter Status blog, with the site returning its notorious “fail whale” error message.
Not even at its halfway point yet, June is already the worst month in terms of downtime for Twitter since October of last year, according to Web performance monitoring company Pingdom. So far this month, Twitter has been down for 3 hours and 3 minutes.
Twitter, launched in March 2006, had frequent and lengthy outages in 2007 and the first half of 2008, but then steadily improved its site uptime by beefing up and revamping its systems. In 2009, it had very solid months but also bad ones, like August, when it was down for more than 6 hours, according to Pingdom.
“If you look at the type of outages out there, they seem to be largely related to relatively new or fast-growing services. Often fast changes are harder to manage, especially by small, new startup teams that have not yet built lots of operational discipline and maturity,” said IDC analyst Al Hilwa.
Such services often face patterns of use they don’t fully understand and workload peaks of unknown scope for which they don’t have defined response plans, he said via e-mail. “This is also a by-product of new architectures often used as the back-end of the cloud services supporting these types of new social networks or Web sites,” Hilwa said.
The predominant architectures to handle such scale require factoring the workload on many engines and having them collaborate as a distributed system. In the long run such architectures will mature, but for now operators are clearly challenged to provide the right level of robustness, he said.
“Finally, the main ways to achieve higher levels of availability involve spending money on redundant engines, which of course drives up the costs and is particularly challenging for start-up ventures or high-growth businesses, some of which are still trying to figure out what the revenue model will look like,” he said.
This year, Twitter’s worst month had been January, with 89 minutes of downtime, but things have taken a turn for the worse in June. The difference now is that in April Twitter launched its Promoted Tweets advertising program, a key to its revenue-generating strategy.
Repeated outages can’t help Twitter’s efforts to lure big-name corporate marketers. These companies will expect a certain level of system stability before committing to spending on an advertising campaign, especially since Promoted Tweets ads will be generated and posted in the same manner and with the same format as regular Twitter posts.
Twitter launched Promoted Tweets with a limited number of partners like Starbucks and Best Buy. Twitter hasn’t responded to a request seeking an update on the status of the program.
Twitter has been growing dramatically in the past two years, becoming the preferred tool for individuals to provide updates on their personal lives and for companies and public figures to promote themselves, their brands and products. Users posted about 2 million messages on Twitter in May, according to Pingdom.
Twitter has also been very popular with external developers, who have created more than 50,000 Twitter applications.
Updated at 10:25 p.m. PT with comments from an analyst.