Skype offers more details of 'perfect storm' outage
The situation that prevented millions of people from accessing Skype’s Internet telephony service late last week was a “perfect storm” and should not recur, the company said Tuesday.
The company initially attributed the problem, which began on Aug. 16, to the near-simultaneous rebooting of millions of computers, as Skype users running the Windows operating system attempted to reconnect to the service after downloading a series of routine software patches from Microsoft’s Windows Update service.
Skype’s service relies on some of its users’ computers to act as “supernodes,” routing traffic for other, less well-connected users. But as Skype customers tried to reconnect, many of those supernodes were themselves in the process of rebooting. The remaining supernodes were soon overwhelmed because a bug prevented the company’s software from efficiently allocating the available network resources.
Users were skeptical of this explanation, because Microsoft regularly issues patches that may cause Windows computers to reboot, and this has not caused problems for Skype before. Microsoft releases software updates on the second Tuesday of each month, a day known to systems administrators as “patch Tuesday.”
Skype spokesman Villu Arak offered a more detailed explanation of the outage on Tuesday: Last week’s problems were the result of a “perfect storm” in which exceptionally high traffic through the service coincided with a shortage of supernodes in the service’s peer-to-peer network, a shortage brought on by the Windows Update process.
The company did not offer an explanation for the high traffic, but accepted full responsibility for the software problem.
“Skype and Microsoft engineers went through the list of patches that had been pushed out,” Arak wrote. “We ruled each one out as a possible cause for Skype’s problems. We also walked through the standard Windows Update process to understand it better and to ensure that nothing in the process had changed from the past (and nothing had).”
The catastrophic effect on Skype’s service was entirely Skype’s fault, a result of its software being unable to deal with simultaneous high load and supernode rebooting, according to Arak.
On Aug. 17, the day after the problems began, Skype released a new version of its software client for Windows to correct the problem. That update should behave better the next time high traffic coincides with a scarcity of supernodes, he said.
Skype had released updated versions of its software client for Windows, Mac and Linux between July’s patch Tuesday and last week’s outage, but the changes made in those updates were not responsible for the problem, according to company spokeswoman Imogen Bailey.