Wednesday, June 18, 2014

CPU Cores, NUMA Nodes and Performance Issues

I have a client who has been suffering from performance issues on a Remote Desktop Server guest that's running on a Hyper-V server.  I suppose some details may help here:

Original Configuration
Dell PowerEdge T410 Server
2 * Intel Xeon E5649 CPUs @ 2.53GHz
4 * 8GB 1333MHz DDR3 modules (32GB total)
Windows Server 2008 R2 Standard SP1 as the Hyper-V Host OS
 - Windows SBS 2011 (Server 2008 R2 SP1) as a Hyper-V Guest
 - Windows Server 2008 R2 SP1 (Remote Desktop and LOB Server) as a Hyper-V Guest

Current Configuration
Dell PowerEdge T410 Server
2 * Intel Xeon E5649 CPUs @ 2.53GHz
8 * 8GB 1333MHz DDR3 modules (64GB total)
Windows Server 2012 R2 Standard as the Hyper-V Host OS
 - Original Windows SBS 2011 (Server 2008 R2 SP1) as a Hyper-V Guest
 - Original Windows Server 2008 R2 SP1 as a Hyper-V Guest
 - New Windows SBS 2011 (Server 2008 R2 SP1) as a Hyper-V Guest (will replace original instance)
 - New Windows Server 2008 R2 SP1 (RDS) as a Hyper-V Guest (will replace original instance(1))
 - New Windows Server 2008 R2 SP1 (LOB) as a Hyper-V Guest
 - New Windows Server 2012 R2 (LOB) as a Hyper-V Guest

Now, we took this particular client over recently and they have been suffering various performance-related issues as well as LOB-related issues since the new system was installed (Aug-Sep, 2012). We'll just speak about the performance-related issues here...

This system has always been under-performing, sluggish and unstable. None of those are good things. We found a few causes for some of the issues, but realistically we felt the best result would be achieved by upgrading the RAM in the server, rebuilding all the servers (software) and adding some more for application isolation purposes - we're not fans of doing what was originally done here (running LOB applications on an SBS 2011 box) or what was then tried as a fix (running LOB applications on a Remote Desktop Server). We decided to run with Server 2012 R2 as the host OS, both because it is the current Windows Server release and because its Hyper-V implementation gives us a lot more options, such as live exports and much improved Hyper-V replication.

The one remaining major issue, after the RAM and Host OS upgrade, is still the sluggish performance of the original 2008 R2 RDS guest. It was topping its CPU out (in the guest) whilst barely using any host CPU resources (16%). Yes, the latest Hyper-V Integration Components are installed (for those wondering).

So, under the original Hyper-V Host (i.e. 2008 R2), there were 4 Virtual Processors assigned to each guest, which is the maximum number of Virtual Processors that any guest can have under 2008 R2 Hyper-V.

Now, as this is a 2*6-core host (i.e. 12 real cores, or a total of 24 Logical Processors including HyperThreading), we assigned 8 virtual processors to the original RDS guest and 4 virtual processors to the original SBS guest, then moved on to other things such as building the new servers. Apparently, that's not all we needed to do - the SBS box was running fine using 16% of the host CPU resources; the RDS box, however, was CPU-starved.

After a fair bit of investigation, fiddling, Googling, asking questions of people such as Kyle Rosenthal from WindowsPCGuy and general head scratching, hair pulling and frustration (all-round, from both the client and ourselves) I found the issue earlier this afternoon.

But first, things that COULD well have been the issue, but weren't:

1. I thought I bumped the CPU count up but hadn't
2. The host was actually flat-lining its CPU
3. I needed to install the latest Hyper-V Integration components
4. I needed to reinstall the latest Hyper-V Integration components over the top of the existing (latest) components (yet to find a way to actually achieve this)
5. I needed to uninstall and reinstall the latest components (again, yet to find a way to uninstall them)
6. I needed to drop the number of assigned virtual processors from 8 back to 4, reboot, then bump from 4 to 8 and reboot again
7. I needed to drop back to a single virtual processor, reboot, then up to 8 and reboot again

And now what I found to be the actual issue: "msconfig" seems to have been run in that 2008 R2 RDS virtual guest and, under Boot/Advanced options, the "Number of processors" setting had been limited to 4. I first thought about something like this after seeing that Device Manager showed all 8 virtual processors, but Task Manager/Perfmon only showed 4. So I had a look in "msconfig" and lo and behold - there was a limit of 4 CPUs set. I unchecked this option, rebooted and amazingly (well, OK, not really), all 8 CPUs were showing.

So, for good measure, I increased this to 12 virtual processors, rebooted again, and all 12 were showing in Task Manager. WOOHOO!!!
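
For anyone wanting to check for this without clicking through msconfig: the "Number of processors" box is stored as the numproc value in the guest's boot configuration, which "bcdedit /enum" lists. Below is a minimal Python sketch of that check. The function names and the sample output are made up for illustration (the sample is not captured from this client's server), and the exact column layout of real bcdedit output may differ slightly.

```python
import re

def find_numproc_limit(bcdedit_output):
    """Return the processor limit found in `bcdedit /enum` text, or None if none is set."""
    for line in bcdedit_output.splitlines():
        # bcdedit prints one "name   value" pair per line
        match = re.match(r"^\s*numproc\s+(\d+)\s*$", line, flags=re.IGNORECASE)
        if match:
            return int(match.group(1))
    return None

def boot_limit_hides_cpus(assigned_vcpus, bcdedit_output):
    """True if the boot configuration caps the OS below the assigned vCPU count."""
    limit = find_numproc_limit(bcdedit_output)
    return limit is not None and limit < assigned_vcpus

# Hypothetical sample output, for illustration only.
SAMPLE_OUTPUT = """\
Windows Boot Loader
-------------------
identifier              {current}
description             Windows Server 2008 R2
numproc                 4
"""

if __name__ == "__main__":
    print("numproc limit:", find_numproc_limit(SAMPLE_OUTPUT))
    print("hides CPUs from an 8 vCPU guest:", boot_limit_hides_cpus(8, SAMPLE_OUTPUT))
```

On a real guest you would feed this the text from an elevated "bcdedit /enum {current}" instead of the sample. Running "bcdedit /deletevalue numproc" from an elevated prompt should have the same effect as unchecking the box in msconfig, followed by a reboot.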

Once you go past 12 virtual processors (in this dual 6-core server), NUMA comes into play. NUMA (Non-Uniform Memory Access) is a memory architecture in which a processor (in hardware) or a virtual machine (in software) can access its local memory faster than remote memory - in hardware, "remote" memory is memory connected directly to the bus of a different physical CPU. Now, NUMA not only comes into play when you start adding a large number of virtual processors to a guest, but also when you give a guest more RAM than is physically attached to one CPU - you get an approximately 20% performance hit when crossing a NUMA boundary. Because of this performance hit, you can actually get a performance reduction by adding too many virtual processors and/or too much RAM to a virtual guest.
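
To put rough numbers on that for this server, here is a small Python sketch of the arithmetic. It assumes one NUMA node per physical CPU and RAM populated evenly across both CPUs, and it ignores the memory the host keeps for itself, so treat the per-node figures as upper bounds. The class and function names are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Host:
    sockets: int            # physical CPUs; assumed to be one NUMA node each
    cores_per_socket: int
    threads_per_core: int   # 2 with HyperThreading enabled
    total_ram_gb: int

    @property
    def logical_processors(self):
        return self.sockets * self.cores_per_socket * self.threads_per_core

    @property
    def lps_per_node(self):
        return self.cores_per_socket * self.threads_per_core

    @property
    def ram_per_node_gb(self):
        # Assumes the DIMMs are split evenly between the sockets.
        return self.total_ram_gb / self.sockets

def spans_numa_nodes(host, vcpus, ram_gb):
    """True if a guest of this size cannot fit inside a single NUMA node."""
    return vcpus > host.lps_per_node or ram_gb > host.ram_per_node_gb

# The T410 from this post: 2 * 6-core CPUs with HyperThreading, 64GB RAM.
t410 = Host(sockets=2, cores_per_socket=6, threads_per_core=2, total_ram_gb=64)

if __name__ == "__main__":
    print("Logical processors:", t410.logical_processors)
    print("Per node:", t410.lps_per_node, "LPs,", t410.ram_per_node_gb, "GB")
    print("12 vCPU / 16GB guest spans nodes:", spans_numa_nodes(t410, 12, 16))
    print("16 vCPU / 16GB guest spans nodes:", spans_numa_nodes(t410, 16, 16))
```

Under those assumptions, a 12 virtual processor guest still fits inside one node, while anything above 12 virtual processors (or above roughly 32GB of RAM) has to span both.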

Microsoft has some information on NUMA that states (basically) that the maximum memory in a NUMA node is the amount of physical RAM divided by the number of logical processors. That information seems to be rather outdated when you use current multi-core CPUs. If you're looking for some more updated information regarding NUMA node boundaries, I strongly suggest having a read of the article on Aidan Finn's blog, which refers to this blog post.


The Outspoken Wookie
