If all things were equal and IBM Corp. made its systems as accessible as Dell Inc. and Hewlett-Packard Co. do theirs, the IBM Power5 processor could bury Intel Corp.’s Itanium 2. First introduced last summer, the Power5 is a one-two punch, a triumph of engineering from a company that excels not only in processor design but also in the submicron science of chip manufacturing and packaging.
The Power5 is plenty fast, of course. But it can also be viewed as IBM’s first serious attempt to meet customers’ needs beyond speed. The Power5 offers improved power efficiency and terrific scalability, supports non-IBM operating systems (including Linux and Windows), and delivers partitioning and virtualization unmatched by current Intel technology.
The Power5 also foreshadows a new generation of 64-bit, PowerPC-based workstations and servers from IBM’s longtime partner in Power, Apple Computer Inc. And IBM recently pulled an unexpected move for a company built on patents by publishing the Power architecture and tools under an open license.
Power5’s influence reaches beyond IBM’s primary base of well-heeled customers in several ways. Although IBM also sells Itanium 2, Opteron, and Xeon servers, the company seems clearly intent on putting Power5 systems in the hands of Linux and Windows administrators. Whether that makes sense will be up to customers, but the sheer technical muscle of Power5 and the faltering fortunes of the Itanium architecture demand that IBM’s flagship processor take a trip under our microscope.
IBM has consistently attracted the brightest minds, the kind of engineers who deserve the moniker “computer scientist.” In the 1970s, these scientists cooked up a processor architecture that was built for performance: the IBM 801, the original RISC processor. The 801’s legacy lives on in the IBM Power series of enterprise-class processors.
The major difference between a RISC processor and a CISC processor, such as Intel’s x86, can be viewed as a tug-of-war between programmers and chip designers. CISC processors are designed to make application developers’ lives easier by reducing common operations to single, long-executing native instructions, giving CISC a reputation as a slow but friendly design. Viewed in that light, RISC is fast and unfriendly. Each of its simple instructions serves a very narrow purpose, executes quickly, and parallelizes exceptionally well. RISC requires patient, gifted programmers and meticulously optimized compilers; RISC’s success attests to an abundance of both.
The best known Power5 attribute is its integration of two discrete RISC cores on a single chip. Announcements from Advanced Micro Devices Inc., Intel, and Sun Microsystems Inc. regarding upcoming multicore processors focused attention on this aspect of Power5, but multicore was also a feature of its predecessors, Power4 and Power4+. According to IBM, Power5 is fully compatible with Power4 executables. The wonder of multicore is that it delivers the pipe dream of more speed in less space without a marked increase in heat. But as you’ll see, multicore is not simply SMP on a chip.
For one thing, the Power5’s cores share a very fast Level 2 cache. The speed and quantity of cache is a factor in the performance of all microprocessors. (The evolution of the x86 shows Intel to be utterly cache-obsessed.) With simple instructions flying through a RISC CPU so rapidly, the cache’s efficiency in reducing the number of trips to RAM becomes the key to the whole design.
The Power5’s Level 2 cache totals just under 2MB. With a shared cache, data fetched by one core is immediately available to the other, increasing the likelihood that fetching the next program instruction or block of data won’t require a trip to performance-killing RAM. But the shared cache also makes it more likely that both cores will try to access the cache at the same time, which they cannot do.
IBM implemented a cache-contention stopgap, splitting the Level 2 cache into three segments. This design permits quasi-simultaneous access to cache as long as both cores are hitting different cache segments. IBM has another creative solution to the Level 2 cache-contention issue: a ponderous 36MB external Level 3 cache. Each core owns its Level 3 cache exclusively, so there’s no possibility for conflict between cores. Although Level 3 cache isn’t nearly as fast as Level 2, Level 3 is much faster than main memory, and Power5’s design makes the connection between its core and its associated Level 3 cache a direct link. We consider IBM’s reworking of the Level 3 cache design to be one of the top design wins in Power5.
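To make the three-segment idea concrete, here is a minimal sketch of how two cores can avoid colliding in a sliced cache. It is an illustration only: the 128-byte line size and the modulo slice-selection rule are our assumptions for the example, not IBM’s documented hashing scheme, and the function names are hypothetical.

```python
# Toy model of a three-way sliced Level 2 cache.
# LINE_BYTES and the modulo hash are assumptions made for illustration;
# they are not taken from IBM documentation.
LINE_BYTES = 128   # assumed cache-line size
NUM_SLICES = 3     # the three Level 2 segments described above


def l2_slice(address: int) -> int:
    """Map a physical address to one of the three slices (hypothetical hash)."""
    return (address // LINE_BYTES) % NUM_SLICES


def can_proceed_together(addr_core0: int, addr_core1: int) -> bool:
    """Both cores can be served at once only if they hit different slices."""
    return l2_slice(addr_core0) != l2_slice(addr_core1)


if __name__ == "__main__":
    # Adjacent cache lines land in different slices, so no conflict.
    print(can_proceed_together(0x0000, 0x0080))
    # Lines three apart land in the same slice, so one core must wait.
    print(can_proceed_together(0x0000, 0x0180))
```

Under this simple model, sequential accesses from the two cores tend to spread across slices, which is the behavior the segmented design is meant to encourage.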
Another substantial Power5 gain is its on-chip memory controllers. Each Power5 core has its own controller and is capable of managing a dedicated block of main memory. This has a huge impact on overall performance, as we’ve seen in comparing the memory throughput of Opteron and Xeon, for example. And in Power5’s case, the design fits with IBM’s strategy of multilevel parallelization.
Two is not enough
Power5 isn’t just dual-core; it also implements SMT (Simultaneous Multi-Threading), which gives each core the capability of executing instructions from two threads simultaneously, under certain conditions. SMT is similar to Intel’s HTT (Hyper-Threading Technology) but with distinct advantages: the “certain conditions” are broader, and the processor dynamically analyzes and prioritizes threads to make parallel execution more efficient (much more efficient, we think). Although it’s difficult to isolate in testing, Power5’s implementation should outgun the maximum 30 percent boost that Intel projects for HTT.
Power5 adds two basic, but much-needed, thread-prioritization schemes. Dynamic Resource Balancing attempts to keep instruction streams flowing smoothly by analyzing the behavior of threads and by sidelining code that could slow down an SMT stream. For example, instructions that must be executed in sequence to derive an accurate result can lock that thread in the processor for a time. Power5 tries to predict this and run simpler instructions until there’s room to execute the sequence without clogging SMT.
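The balancing idea can be sketched in a few lines. This is a toy software model, not the hardware’s actual logic: the queue capacity, the 60 percent threshold, and every name in it are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    name: str
    queue_entries: int  # shared-queue slots the thread holds (hypothetical metric)


def pick_thread_to_decode(t0: ThreadState, t1: ThreadState,
                          capacity: int = 20,
                          hog_fraction: float = 0.6) -> ThreadState:
    """Choose which thread may decode this cycle.

    A thread holding more than hog_fraction of the shared queue is treated
    as clogging the pipeline and is sidelined in favor of the other thread.
    Both parameters are assumptions, not published Power5 values.
    """
    limit = capacity * hog_fraction
    hog0 = t0.queue_entries > limit
    hog1 = t1.queue_entries > limit
    if hog0 and not hog1:
        return t1
    if hog1 and not hog0:
        return t0
    # Neither thread is hogging: favor the one with less work in flight.
    return t0 if t0.queue_entries <= t1.queue_entries else t1


if __name__ == "__main__":
    stalled = ThreadState("stalled", queue_entries=15)
    light = ThreadState("light", queue_entries=3)
    print(pick_thread_to_decode(stalled, light).name)
```

The point of the sketch is the policy, not the numbers: a thread stuck on a long dependent sequence gives up decode opportunities until its backlog drains.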
In another awesome design gain, Power5’s adjustable thread priority gives OSes, drivers, and applications the capability of assigning an arbitrary priority level to each thread. This application-defined thread priority is factored into Dynamic Resource Balancing calculations and is used more broadly to determine the length of time a thread remains active in the CPU. It also gives operating systems an easy way to control power conservation.
If you’ve got a lot of high-priority threads running, the box will run hot. But as the OS knocks thread priorities down, the CPU will run more idle cycles and therefore run cooler. If you knock all thread priorities down to their lowest level, the CPU goes into a sleeplike low-power mode. That’s the simplest approach to power management we can imagine.
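A rough way to picture the effect of thread priority is as a split of decode cycles between the two threads on a core. The sketch below is a simplified model under stated assumptions: the 1-to-7 priority range, the power-of-two weighting, and the 1-in-64 low-power rate are illustrative choices of ours, and the function name is hypothetical. It also idles the core only when both threads sit at the lowest level, which is coarser than the gradual cooling described above.

```python
from fractions import Fraction

LOWEST, HIGHEST = 1, 7  # assumed priority range for this model


def decode_shares(prio_a: int, prio_b: int):
    """Return (share_a, share_b, idle_share) of decode cycles.

    Assumed model: the lower-priority thread gets 1 cycle out of every
    2 ** (|a - b| + 1); if both threads are at LOWEST, the core mostly idles.
    """
    for p in (prio_a, prio_b):
        if not LOWEST <= p <= HIGHEST:
            raise ValueError(f"priority {p} outside {LOWEST}..{HIGHEST}")
    if prio_a == prio_b == LOWEST:
        slow = Fraction(1, 64)  # assumed low-power decode rate
        return slow, slow, 1 - 2 * slow
    window = 2 ** (abs(prio_a - prio_b) + 1)
    low = Fraction(1, window)
    high = 1 - low
    if prio_a >= prio_b:
        return high, low, Fraction(0)
    return low, high, Fraction(0)


if __name__ == "__main__":
    print(decode_shares(4, 4))  # equal priorities split the core evenly
    print(decode_shares(6, 2))  # the high-priority thread dominates
    print(decode_shares(1, 1))  # both at lowest: mostly idle cycles
```

In this model an OS that wants a cooler box only has to lower priorities; no separate power-management interface is involved, which matches the simplicity the design is aiming for.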
Finally, Power5 uses what it knows about the facilities needed by each RISC instruction to, in essence, power down portions of the chip that aren’t needed at that moment. This potentially puts a new spin on Power’s infamous power and heat problems. It certainly seems simpler than OS-driven power management schemes such as those employed by x86 processors.
You might never notice
On technology alone, Power5 is positioned to rule. But unbelievable as it might seem to the many Itanium 2 skeptics who share their opinions with InfoWorld, the majority of observers have already called the Itanium 2/Power5 contest in Intel’s favor.
That’s an odd assessment because, in this case, IBM is pulling an Intel on Intel. RISC owns the Unix market, Unix owns the midrange to high-end market, and Intel doesn’t do RISC. It’s out in the cold on those multimillion-dollar, big-iron purchase orders. Intel is effectively locked out unless it can convince buyers that Itanium 2 obsoletes RISC. Will Intel be able to break in? We think it’ll take years for the Itanium to push RISC aside, and while it’s breaking in, Power and Sparc will continue to evolve.
What makes this hard to call is that IBM wants Intel’s market as much as Intel wants IBM’s. IBM is selling Power5 servers for US$5,000 with Linux preinstalled. Go back up and scan the specs to understand why a $5,000 Power5 server might be nice to have around.
Analysts etching headstones for Power note that IBM’s chip business isn’t making money. But its systems business is, and now those two units are one. That’s a smart move: Make chips for systems you sell; build systems around the chips you’re making. Releasing the design and tools to the public is smart, too. Every open licensee is a potential manufacturing customer, and unencumbered intellectual property is going to flow in from geniuses not on IBM’s payroll.
These are good strategies for cozying up to the entry market. If only IBM didn’t have to deal with customers. Big Blue has never been able to bring to the low end of its catalog the brand polish and customer trust that Dell and HP enjoy in spades. The great work IBM’s engineers have done is gated by the company’s poor marketing. In all likelihood, if you’re not running IBM gear now, you’ll never look at a Power5 server regardless of the price.
IBM has intentionally hitched Power5’s success to Linux at the entry level. But it’s hard to extract added value from software that the public believes it can download for free, and Linux is an OS that buyers don’t tend to purchase new hardware to run. In other words, Linux won’t sell Power5 entry servers. At $5,000 to $6,000, IBM’s least expensive Power5 server isn’t cheap enough to compete with a dirt-cheap Opteron or Xeon EM64T (Extended Memory 64 Technology) server running Linux.
On the other hand, big Unix iron sells itself, and customers will always buy more of what they’re already using. They’ll buy what their solution consultants advise. IBM exceeds all others in its ability to fawn over major accounts. You cannot pry a customer loose from IBM hardware at the midrange and up. So the overall message on Power5 will be garbled to the press and the public at large, but the suits in the field bypass IBM’s marketing. In IBM-to-customer relationships, you can’t beat IBM.
Power5’s got just about everything: speed, simplicity, innovation, seamless backward compatibility, a mature development toolset, and the backing of a technological giant. It’s an unrivaled engineering achievement, created by what may be the world’s smartest engineers. If IBM’s marketing ever matches the intelligence of its engineering, watch out, Intel.
Tom Yager is technical director of the InfoWorld Test Center. Tomorrow Tom will look at Apple’s relationship with IBM.