Link 1,100 Power Mac G5s together and it will make the news. Link 1,100 Power Mac G5s together to create the third most powerful academic computing facility in the world, and it will make headlines worldwide and earn project director Dr. Srinidhi Varadarajan the adulation of Macintosh enthusiasts.
To close the first night of the O’Reilly Mac OS X Conference, Varadarajan gave a presentation on how his team settled on the G5 for their system and what they had to do to get it running. To begin, Varadarajan told the audience why they wanted to build a Terascale Computing Facility at Virginia Tech.
“To use one of the Department of Energy computers, you have to write a grant to get time,” Varadarajan said. “You use it, usually in about a month, and then you have to start again, essentially retarding the process of research.”
Since Virginia Tech has a world-class computational sciences and engineering program, Varadarajan said that he wanted to build world-class computational facilities to complement that program. The problem is that, while he wanted a world-class system, academia generally doesn’t have a budget to match. So Varadarajan envisioned a system based on off-the-shelf processors bound together with an extremely fast off-the-shelf backbone.
To build the system, Varadarajan said that he and his team began by negotiating with Dell for machines based on 64-bit Intel Itanium 2 processors. The key for Varadarajan was price versus performance. After going back and forth with Dell, Varadarajan said that the negotiations fell through. He then evaluated 64-bit processors from AMD, IBM and HP. But IBM said that the PowerPC 970 was still months away, and AMD and HP gave Varadarajan quotes in the $9 million to $11 million range, well over his budget. Before Apple announced the G5, Varadarajan was in a tough spot.
On June 23, Apple announced the G5. Varadarajan said that he contacted Apple on June 26 about the possibility of using the G5 for the Terascale Computing Facility. During those talks, Apple representatives asked Varadarajan how long he had been a Mac user.
“I had to tell them I’d never used the Mac,” Varadarajan said. “I’m probably one of the few people who came to the platform by reading the kernel manual.”
Nevertheless, the G5 had exactly what Varadarajan was looking for. In addition to being a 64-bit processor, the PowerPC 970 has two double-precision floating point units, and each can retire a fused multiply-add (two floating point operations) per clock cycle, for a peak of four operations per cycle. Floating point performance is the most critical factor in scientific computing, and the Power Mac G5, equipped with two PowerPC 970s running at 2GHz, can complete a theoretical peak of 16 billion floating point operations per second.
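As a back-of-the-envelope check, the per-node and whole-cluster peaks follow directly from those figures (this assumes, as stated above, that each FPU retires one fused multiply-add per cycle):

```python
# Theoretical peak floating point performance, assuming each
# PowerPC 970 FPU retires one fused multiply-add (2 operations)
# per clock cycle.
FPUS_PER_CPU = 2
OPS_PER_FPU_PER_CYCLE = 2      # fused multiply-add counts as 2 flops
CLOCK_HZ = 2e9                 # 2 GHz
CPUS_PER_NODE = 2              # dual-processor Power Mac G5
NODES = 1100

flops_per_node = FPUS_PER_CPU * OPS_PER_FPU_PER_CYCLE * CLOCK_HZ * CPUS_PER_NODE
cluster_peak = flops_per_node * NODES

print(f"per node: {flops_per_node / 1e9:.0f} GFlops")      # 16 GFlops
print(f"cluster peak: {cluster_peak / 1e12:.1f} TFlops")   # 17.6 TFlops
```

That 17.6-teraflop theoretical ceiling is what the facility's real-world benchmarks are measured against.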
Within weeks, Varadarajan had ordered 1,100 dual-processor Power Mac G5s from the Apple Store, and Apple employees were providing his team with extensive technical advice. The computers arrived at Virginia Tech between September 5 and 11. The Terascale Computing Facility made its first calculations on September 23, and by October 1 Varadarajan said that his team was making performance optimizations to the system. He expects the facility to be available for full use by January 2004.
Why so fast? Varadarajan said that designing and building quickly actually helps system designers on a budget. If a designer waits a year and a half to build a system after it’s designed, then all of the technology inside is a year and a half old and the university has lost a year and a half of potential productivity. Rapid deployment was one of Varadarajan’s primary goals in building the Terascale Computing Facility.
While the G5 had much of what Varadarajan wanted in a system, it didn’t have everything. In order for that many G5s to work together efficiently, Varadarajan needed a super high bandwidth network to link all of the systems together. The Gigabit Ethernet that ships standard on the G5 was far too slow for Varadarajan’s needs; in the Terascale Computing Facility, it works as a secondary communications network between the G5s.
The primary communication between the 1,100 G5s in the system comes from modified InfiniBand cards in the first PCI-X slot of each G5. These cards, specially designed by Mellanox, feature extremely low latency of less than 10 microseconds and an individual bandwidth that approaches the theoretical bandwidth of the PCI-X bus, about 1,250 MBytes per second. The whole network is set up in a fat-tree topology with a total switching capacity of 46.02 Terabits per second, allowing all of the processors in the system to communicate and distribute computational loads efficiently.
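A simple latency-plus-bandwidth cost model, using the figures above with illustrative message sizes, shows why the sub-10-microsecond latency matters as much as the raw bandwidth: small messages are dominated entirely by latency, not link speed.

```python
# Time to send one message over a single link, modeled as
# latency + size / bandwidth. Link figures are from the text;
# the message sizes are illustrative.
LATENCY_S = 10e-6         # ~10 microseconds, per the Mellanox cards
BANDWIDTH_BPS = 1.25e9    # ~1,250 MBytes per second per link

def transfer_time(message_bytes):
    """Estimated one-way transfer time in seconds."""
    return LATENCY_S + message_bytes / BANDWIDTH_BPS

for size in (1_000, 1_000_000):   # a 1 KB and a 1 MB message
    print(f"{size:>9} bytes: {transfer_time(size) * 1e6:.1f} microseconds")
```

For a 1 KB message the wire time is under a microsecond, so latency accounts for over 90% of the total; only at large message sizes does bandwidth become the limiting factor.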
The facility uses off-the-shelf G5s complete with their hard drives and Radeon graphics cards. The aluminum cases are housed in specially designed racks. Over 100 student volunteers installed the Mellanox InfiniBand cards, connected the copper InfiniBand cables and connected the Gigabit Ethernet cables.
Where do you put 1,100 G5s? The answer is not anywhere you want. The facility is housed in 3,000 square feet of Virginia Tech’s 9,000 square foot data center. Varadarajan said that the team also had to build a new cooling system, as the existing air conditioning would have had to move air through the floor at speeds exceeding 60 miles per hour to meet the cooling demand. The new system works like a distributed refrigerator: chilled liquid is fed to smaller air-driven air conditioning units placed throughout the facility. Without it, Varadarajan said, the temperature in the facility would climb past 100 degrees within two minutes and components would be damaged within several minutes. Virginia Tech also built a UPS and a 1.5 Megawatt backup diesel generator for the facility.
The system cost $5.2 million for the G5s, racks, cables, and InfiniBand cards. Virginia Tech spent an additional $2 million on facilities: $1 million for the air conditioning system and $1 million for the UPS and generator.
In addition to all of the hardware, Varadarajan and his team had to develop and optimize software to run the Terascale Computing Facility. Varadarajan ported MVAPICH, an implementation of the MPI message-passing interface over InfiniBand, to Mac OS X to run the system, and made specific optimizations for the G5’s cache memory management. The team also ported several other Unix applications to manage and benchmark the system. Working with experts from all over the world, Varadarajan and his team are optimizing the system for scientific calculations.
The Terascale Computing Facility can solve systems of equations with 500,000 variables, which involves a matrix with 500,000 values on a side; storing such a matrix in double precision requires about two Terabytes of memory. Running these calculations, the facility’s latest benchmark is 9.555 teraflops, and Varadarajan hopes to pass the 10-teraflop mark with further optimization.
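The memory figure follows from simple arithmetic: a dense 500,000-by-500,000 matrix of 8-byte double-precision values works out as follows (the even per-node split is an illustrative simplification, not how the real workload is necessarily distributed):

```python
# Memory needed to store a dense 500,000 x 500,000 matrix of
# 8-byte double-precision values, and the share per machine if
# it were spread evenly across the 1,100 nodes.
N = 500_000
BYTES_PER_DOUBLE = 8
NODES = 1100

total_bytes = N * N * BYTES_PER_DOUBLE
per_node_bytes = total_bytes / NODES

print(f"total: {total_bytes / 1e12:.0f} TB")          # 2 TB
print(f"per node: {per_node_bytes / 1e9:.2f} GB")     # ~1.82 GB
```

No single machine of the era could hold such a matrix, which is precisely why the problem must be distributed across the whole cluster.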
The only drawback to using off-the-shelf components for the Terascale Facility is reliability. Varadarajan said that even a reliable server, say one that failed for a few minutes every two years, would produce daily failures when 1,100 such computers were acting in concert. To deal with the problem, his team added fault tolerance: calculations are migrated away from failed components, or blocks of components, to working parts of the system, so a failure neither brings down the entire system nor threatens ongoing calculations.
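A minimal sketch of that migration idea, with hypothetical names and structure (the article does not describe the facility's actual scheduler): a coordinator tracks which work units each node owns, and when nodes fail, their units are handed to the least-loaded survivors.

```python
# Hypothetical sketch of migrating work away from failed nodes.
# Names and structure are illustrative, not the actual Terascale
# Computing Facility mechanism.

def reassign(assignments, failed_nodes):
    """Move work units owned by failed nodes onto surviving nodes,
    giving each orphaned unit to the currently least-loaded survivor."""
    survivors = {node: list(units) for node, units in assignments.items()
                 if node not in failed_nodes}
    orphaned = [u for node in failed_nodes for u in assignments.get(node, [])]
    for unit in orphaned:
        target = min(survivors, key=lambda node: len(survivors[node]))
        survivors[target].append(unit)
    return survivors

assignments = {"node0": [0, 1], "node1": [2, 3], "node2": [4]}
healthy = reassign(assignments, failed_nodes={"node1"})
print(healthy)  # node1's units 2 and 3 now live on surviving nodes
```

The key property is that no work unit is lost when a node drops out; in a real system the orphaned units would be restarted from their last checkpoint rather than from scratch.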
Varadarajan said that many in the academic and other communities have expressed an interest in creating their own G5 superclusters or even cloning the Terascale system. Once Virginia Tech’s facility is up and running, Varadarajan said that he will place all of the documentation on how his team created their system online for others to review and implement.
“We hope a lot more of these will come up,” Varadarajan said. “We already have several contacts who basically want clones of the system, so expect to see a lot more G5 clusters from now on.”