At the recent Hot Chips conference in Palo Alto, AMD outlined its upcoming 12-core server processor, codenamed "Magny-Cours." The new CPU will arrive in 2010, and will fit into the same power envelope as the existing six-core Istanbul processor. But to make that happen, AMD had to make some compromises.
Built on AMD's 45nm SOI process, Magny-Cours uses the same basic core microarchitecture as the current Shanghai quad-core server processor, so any improvement in per-thread performance will have to come from better system design.
The basic idea behind Magny-Cours is simple: take two six-core Istanbul processors, downclock them a bit to reduce power, and squeeze them into a multichip module (MCM) so that they can fit into a single socket. By using an MCM, AMD will be able to fit 12 cores into the same thermal and power envelope as Istanbul.
For system architecture reasons, AMD's MCM picture is a little more complex than Intel's was, because each Istanbul chip has its own on-die dual-channel DDR3 memory controller, along with four HyperTransport links. Obviously, you can't push each chip's full interconnect bandwidth through a single socket, so AMD had to cut out some links. The company's MCM 2.0 design exposes four HT ports and four DDR3 memory ports on each MCM, two of each per chip. Of each chip's two HT ports, one is x16 and the other is x8. The two chips are connected inside the module by a x16 HT link.
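The port counts above are easy to tally by hand, but a small model makes the per-chip versus per-module split explicit. The following is only an illustrative sketch: the class and field names (`Die`, `Module`, `external_ht_widths`) are invented for this example, and it assumes that the x16 and x8 links described above are the ones that leave the package, with the chip-to-chip x16 link counted separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    """One six-core chip inside the module (hypothetical model)."""
    external_ht_widths: tuple  # widths of the HT links that leave the package
    memory_channels: int       # DDR3 channels driven by this chip


@dataclass(frozen=True)
class Module:
    """A two-chip MCM (hypothetical model)."""
    dies: tuple
    internal_ht_width: int     # width of the chip-to-chip HT link

    def external_ht_ports(self):
        # Number of HT links exposed by the whole module.
        return sum(len(d.external_ht_widths) for d in self.dies)

    def external_ht_width(self):
        # Combined width of all exposed HT links.
        return sum(sum(d.external_ht_widths) for d in self.dies)

    def memory_channels(self):
        # Number of DDR3 channels exposed by the whole module.
        return sum(d.memory_channels for d in self.dies)


# Figures taken from the description above: one x16 and one x8 link per chip,
# two memory channels per chip, and a x16 link between the two chips.
die = Die(external_ht_widths=(16, 8), memory_channels=2)
mcm = Module(dies=(die, die), internal_ht_width=16)

print(mcm.external_ht_ports(), mcm.memory_channels(), mcm.external_ht_width())
```

Under those assumptions the module exposes four HT ports and four memory channels, matching the totals AMD gave.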
Even with four HT links and four memory channels to keep the MCM fed, 12 cores is still a lot to pack into a single socket, and bandwidth starvation is a concern. To help alleviate the bandwidth pressure, AMD made a very smart tradeoff with Istanbul in the form of HT Assist, and this tradeoff carries over to Magny-Cours, where it's even more necessary.
One of the big challenges in multiprocessor system design is keeping the various processors' caches in sync with one another. Solutions to this problem all involve some amount of communication among the processors, and this "snoop" traffic eats up valuable bus bandwidth. The solution that AMD has adopted with Istanbul and Magny-Cours involves setting aside 1MB of each chip's 6MB cache, leaving 5MB for data, to store a directory of the contents of the other chips' caches. By consulting this local directory, each chip can avoid broadcasting a significant number of snoop requests to the other chips.
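To see why a directory cuts snoop traffic, it helps to count probes in a toy model. The sketch below is not AMD's implementation. It assumes a simplified scheme in which every memory address has a "home" chip whose directory records which chips currently cache that line, and it counts one probe per chip that has to be asked about a line on a cache miss. All of the names (`CoherenceModel`, `home_of`, `count_probes`) and the access pattern are made up for illustration.

```python
class CoherenceModel:
    """Toy model that counts snoop probes with and without a directory."""

    def __init__(self, n_chips, use_directory):
        self.n_chips = n_chips
        self.use_directory = use_directory
        self.caches = [set() for _ in range(n_chips)]   # line addresses held per chip
        self.directories = [{} for _ in range(n_chips)]  # per home chip: address -> holders
        self.probes_sent = 0

    def home_of(self, address):
        # Assumption: addresses are interleaved across chips.
        return address % self.n_chips

    def read(self, chip_id, address):
        if address in self.caches[chip_id]:
            return  # local hit, no coherence traffic needed
        home = self.home_of(address)
        # Holder bookkeeping is kept in both modes for simplicity;
        # only the directory mode actually consults it.
        holders = self.directories[home].setdefault(address, set())
        if self.use_directory:
            # Probe only the chips the directory says hold the line.
            targets = holders - {chip_id}
        else:
            # No directory: broadcast a probe to every other chip.
            targets = set(range(self.n_chips)) - {chip_id}
        self.probes_sent += len(targets)
        self.caches[chip_id].add(address)
        holders.add(chip_id)


def count_probes(use_directory, n_chips=4, n_lines=100):
    model = CoherenceModel(n_chips, use_directory)
    # First touch: each line is read by one chip while nobody else holds it.
    for address in range(n_lines):
        model.read(address % n_chips, address)
    # A few lines are then read by a second chip (true sharing).
    for address in range(0, n_lines, 10):
        model.read((address + 1) % n_chips, address)
    return model.probes_sent


print("broadcast probes:", count_probes(use_directory=False))
print("directory probes:", count_probes(use_directory=True))
```

In this model the broadcast version probes every other chip on every miss, while the directory version sends probes only for lines that another chip really holds. The price is the cache capacity given up to store the directory, which is the tradeoff described above.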