Wednesday, January 19, 2011

Single, Dual and Triple Channel RAM

In response to an article by Mark about how he configured his new Mac Pro for TTFN's video system, I posted the following comment. After re-reading it, I figured it has wider appeal than just that thread, so I have re-posted it below. Please note that Mark had configured his dual-CPU Mac Pro with one 4GB and two 1GB modules per CPU, which is what prompted my reply. The following information applies equally to Intel-based Mac Pro systems (and possibly other Mac systems) and to Intel-based PCs and servers.


G'day Mark,

I need to bring you up to speed on how RAM performance works with Intel chipsets and Nehalem/Westmere CPUs as it seems you're a little confused here.

Your article doesn't mention which chipset or CPUs are in your Mac Pro, so I'll need to be a little more general than I could be if I knew more about your particular system.

The exact type of CPU (and chipset) determines whether the system supports single/dual channel RAM or single/dual/triple channel RAM, with performance increasing as you go up. To reach maximum performance, the banks need to be filled with identical RAM modules (at least as far as timing and access patterns go - it is best/easiest to use identical modules). Now, as each CPU contains the RAM controller that addresses its directly attached RAM, each CPU can deliver single, dual or triple channel RAM performance, depending on how that CPU's RAM is configured.

As the Mac Pro has 4 RAM slots per CPU and 4 doesn't fit nicely into 3, what ends up happening is that anyone who wants RAM performance over RAM size leaves the 4th slot empty. How does this work exactly?

We'll work through a single CPU in the explanation below because each CPU manages its own attached RAM. If you have a dual CPU system, you need to do the same for each CPU...

First, and most importantly, all of the information below involves *IDENTICAL* RAM modules. You cannot mix and match size, speed, refresh rates or anything else unless you want to drop back to single channel speeds. This is important.

So, assuming you install 1 * 1GB module in the first bank, you'll get 1X RAM speeds - i.e. the CPU can address the single RAM module at its maximum speed - Single Channel. This may sound ideal, however, the CPU can issue data requests much faster than the RAM can handle them, resulting in the CPU being bottlenecked by the RAM module. To get faster than this, we need Dual Channel.

To run RAM in Dual Channel mode, you need identical RAM modules in the first and second slots. Then the CPU will access *each* RAM module at its maximum speed and interleave access requests - so it will access the first module, then the second module, then the first, second, first... This will result in the RAM access speeds being around double the speed of using a single module. (If you install 2 * non-identical modules, you'll have 2 different banks of Single Channel RAM, which will be accessed at 1X RAM speeds.) To get faster again...

Triple Channel RAM is the next step after Dual Channel - there are 3 * identical RAM modules installed in the first 3 slots, resulting in the CPU talking 1, 2, 3, 1, 2, 3, 1, 2, 3, 1... to the RAM modules, allowing the CPU to access the whole RAM subsystem at around 3X the speed of a single module. This is nice! :) (If the modules aren't identical, then you have 3 banks of 1X RAM speed, not 1 bank of 3X RAM speed, resulting in an overall reduction in the performance of the RAM subsystem.)
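
The 1, 2, 1, 2... and 1, 2, 3, 1, 2, 3... patterns described above can be sketched with a very simplified model in Python. This is only an illustration: the `BLOCK_SIZE` granularity and the function names are assumptions made for the example, not details of how any particular Intel memory controller is implemented.

```python
# Simplified model of channel interleaving: consecutive blocks of memory
# are spread round-robin across the populated channels.
# BLOCK_SIZE is an assumed interleave granularity, not a documented value.
BLOCK_SIZE = 64  # bytes (assumption for illustration)


def channel_for_address(address: int, channels: int) -> int:
    """Return the 1-based channel that would serve this byte address."""
    if channels < 1:
        raise ValueError("need at least one channel")
    return (address // BLOCK_SIZE) % channels + 1


def access_pattern(num_blocks: int, channels: int) -> list[int]:
    """Channels touched when reading num_blocks consecutive blocks."""
    return [
        channel_for_address(block * BLOCK_SIZE, channels)
        for block in range(num_blocks)
    ]


if __name__ == "__main__":
    # Single, Dual and Triple Channel, reading seven consecutive blocks.
    for channels in (1, 2, 3):
        print(channels, access_pattern(7, channels))
```

With three channels, consecutive blocks land on modules 1, 2, 3, 1, 2, 3..., so each module only has to serve every third request.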

Now, if you add a 4th identical module, the CPU will drop back to 2 * Dual Channel banks of RAM. This maximizes the amount of installed RAM at the cost of around 33% of the overall speed compared to Triple Channel mode (2X rather than 3X), while still running at 2X the RAM speed of a single module. This can obviously have its benefits (i.e. more RAM).
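
The rules of thumb so far (one module, matched pairs, matched triples, four matched modules, or anything mismatched) can be summarised in a small Python sketch. The `Module` type and `channel_mode` function are hypothetical names made up for this example, and the `speed_mhz` field is just a stand-in for "speed, timings and everything else"; the function only encodes the rules as described in this post, for a single CPU.

```python
from typing import NamedTuple


class Module(NamedTuple):
    size_gb: int
    speed_mhz: int  # stand-in for speed, timings and everything else


def channel_mode(modules: list[Module]) -> str:
    """Apply the post's rules of thumb to the modules attached to one CPU."""
    if not modules:
        raise ValueError("no RAM installed")
    if len(modules) > 4:
        raise ValueError("this sketch assumes 4 slots per CPU")
    identical = len(set(modules)) == 1
    # A lone module, or any mismatch, means Single Channel speeds.
    if len(modules) == 1 or not identical:
        return "single"
    # 2 matched -> Dual, 3 matched -> Triple, 4 matched -> 2 * Dual banks.
    return {2: "dual", 3: "triple", 4: "dual"}[len(modules)]


if __name__ == "__main__":
    matched = Module(size_gb=1, speed_mhz=1066)
    print(channel_mode([matched] * 3))
    print(channel_mode([matched] * 4))
    print(channel_mode([Module(4, 1066), matched, matched]))
```

The last call models a 4GB, 1GB, 1GB layout on one CPU, which by these rules falls back to Single Channel.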

So, if in Triple Channel mode, you're getting around 9600 MB/sec in RAM subsystem performance, you'll get around 6200 MB/sec in Dual Channel mode (using either one pair or two pairs of identical modules) and around 3100 MB/sec in Single Channel mode.
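
As a quick sanity check on those example figures, the ratios can be worked out in a few lines of Python. The numbers are simply the illustrative ones quoted above, not measurements of any specific machine, and the helper names are made up for this sketch.

```python
# Figures quoted in the post (MB/sec); illustrative only, not measurements.
QUOTED_MB_PER_SEC = {"single": 3100, "dual": 6200, "triple": 9600}


def speedup_over_single(mode: str) -> float:
    """How many times faster a mode is than Single Channel."""
    return QUOTED_MB_PER_SEC[mode] / QUOTED_MB_PER_SEC["single"]


def loss_versus_triple(mode: str) -> float:
    """Fraction of Triple Channel bandwidth given up by running in `mode`."""
    return 1 - QUOTED_MB_PER_SEC[mode] / QUOTED_MB_PER_SEC["triple"]


if __name__ == "__main__":
    for mode in ("single", "dual", "triple"):
        print(
            f"{mode:>6}: {speedup_over_single(mode):.2f}X single, "
            f"{loss_versus_triple(mode):.0%} below triple"
        )
```

On these figures Dual Channel works out at 2X and Triple Channel at a little over 3X the Single Channel rate, and dropping from Triple to Dual gives up roughly a third of the bandwidth.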

Clearly, running 3 * identical RAM modules in Triple Channel mode gives the fastest possible RAM subsystem performance whilst sacrificing only a little (25%, i.e. one slot in four) of the maximum RAM able to be installed in the system.

Now, you *CAN* run 3 * 4GB modules on one CPU and 3 * 1GB modules on the second CPU and still have each CPU access its own RAM in Triple Channel mode. However, this leaves the whole RAM subsystem a little unbalanced: the 2nd CPU needs to ask the 1st CPU for data held in the 1st CPU's RAM more often than it would were the RAM balanced (say 3 * 4GB or 3 * 2GB modules on each CPU). That makes the whole system a little slower, in general, depending on what's running on each CPU.
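
To get a rough feel for the imbalance, here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that data is spread evenly over all installed RAM and that each CPU is equally likely to touch any of it; real workloads and operating systems will behave differently. The `remote_share` function is a hypothetical helper, not part of any library.

```python
def remote_share(ram_per_cpu_gb: list[float]) -> list[float]:
    """For each CPU, the fraction of all installed RAM that is attached
    to another CPU (and so needs a trip across to that CPU to reach).

    Assumes data is spread evenly across all installed RAM.
    """
    total = sum(ram_per_cpu_gb)
    if total <= 0:
        raise ValueError("no RAM installed")
    return [1 - local / total for local in ram_per_cpu_gb]


if __name__ == "__main__":
    unbalanced = [3 * 4, 3 * 1]  # 12GB on CPU 1, 3GB on CPU 2
    balanced = [3 * 4, 3 * 4]    # 12GB on each CPU
    for label, layout in (("unbalanced", unbalanced), ("balanced", balanced)):
        shares = [round(share, 2) for share in remote_share(layout)]
        print(label, layout, shares)
```

Under that assumption, the 2nd CPU in the unbalanced layout finds about 80% of the installed RAM sitting behind the 1st CPU, compared with 50% when both CPUs have the same amount.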

Of course, RAM subsystem performance is only one component of overall system performance. However, optimizing each component to work well and in balance with the other subsystems will result in the best performance at the best price point. For example, it is no use running a crazy fast HDD subsystem if you're running only Single Channel RAM, as the RAM will bottleneck the fast HDD subsystem.

So, right now, as you're running unmatched (non-identical) RAM modules in the first 2 banks, your Mac Pro will suffer in its RAM performance as it will be running only at Single Channel speeds - and this has been done to each CPU, so both CPUs are accessing their RAM at Single Channel speeds (around 3100 MB/sec in the example listed above, whereas they could be performing at around 9600 MB/sec if running in Triple Channel mode).

I hope this helps clear up how Intel-based Nehalem/Westmere systems access their RAM and enables you to get the maximum performance from your Mac Pro setup.


The Outspoken Wookie

1 comment:

Chris Knight said...

And this is even before you look at any memory rank limitations the chipset may have.