Understanding vNUMA (Virtual Non-Uniform Memory Access)
Sometime last year, I had a conversation about some interesting concepts around vSphere design, and vNUMA came up along with some considerations for large virtual machines. It's funny how a conversation that starts on one topic can lead somewhere completely different, address some of the issues, or even expose constraints that we might otherwise miss.
As I am preparing for the VCAP-DCD/DCA, I figured: why not create a post explaining the basics of UMA, NUMA, vNUMA, and SMP? I find these details important to understand because they play a big part in designing VMware vSphere environments.
SYMMETRIC MULTIPROCESSING (SMP)
To keep it simple, SMP architecture allows the processors in a multiprocessor server to share a single bus and memory while being controlled by a single operating system. In other words, applications that are optimized for multi-threading can take advantage of servers equipped with multiple processors. Most modern Unix and Windows operating systems support SMP. The same idea applies to CPUs with multiple cores: each core is treated as a separate CPU, giving the added benefit of multi-threading to applications and operating systems that support it.
However, as you've probably realized, sharing the memory and bus can lead to performance issues as we add more processors, because they all contend for the same path to memory.
[Figure: SMP architecture within a server]
Uniform Memory Access (UMA), which may also be known as Shared Memory Architecture (SMA), is where all the CPUs within a single server share the same memory uniformly, so every CPU sees the same memory access latency.
NON-UNIFORM MEMORY ACCESS (NUMA)
NUMA architecture works by grouping CPUs and memory together into what we call a NUMA node. The key benefit of NUMA architecture in multiprocessor servers is reduced memory latency and better application memory performance, because each CPU is grouped with memory close to it.
Memory within the same NUMA node is local to that node's CPU and provides dedicated memory access for it. Every time a CPU accesses memory in a different NUMA node, it is considered remote access. Remote access means higher latency, and higher latency can translate into reduced application memory performance.
[Figure: NUMA architecture]
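The local versus remote distinction can be captured in a small model. This is a minimal sketch, not a measurement: the latency values `LOCAL_NS` and `REMOTE_NS` and the `NumaNode` type are hypothetical, chosen only to show that the more accesses go to a remote node, the higher the average latency.

```python
from dataclasses import dataclass

# Hypothetical latencies in nanoseconds, for illustration only;
# real values depend on the CPU generation and the interconnect.
LOCAL_NS = 80.0
REMOTE_NS = 130.0


@dataclass
class NumaNode:
    """A hypothetical NUMA node: a group of cores plus its local memory."""
    node_id: int
    cores: int
    memory_gb: int


def access_latency(cpu_node: NumaNode, memory_node: NumaNode) -> float:
    """Latency of one access: local if CPU and memory share a node, otherwise remote."""
    return LOCAL_NS if cpu_node.node_id == memory_node.node_id else REMOTE_NS


def average_latency(local_fraction: float) -> float:
    """Weighted average latency when `local_fraction` of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * LOCAL_NS + (1.0 - local_fraction) * REMOTE_NS


if __name__ == "__main__":
    node0 = NumaNode(0, cores=8, memory_gb=64)
    node1 = NumaNode(1, cores=8, memory_gb=64)
    print(access_latency(node0, node0))  # local access
    print(access_latency(node0, node1))  # remote access
    print(average_latency(1.0), average_latency(0.5))
```

With these made-up numbers, a workload that keeps all of its memory local averages 80 ns per access, while one that splits accesses evenly between local and remote memory averages 105 ns.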
It is also important to understand why NUMA plays a key role when it comes to virtualization. In fact, Frank Denneman wrote a great article on this, which I will link here. He also has an article on NUMA scheduling; if you're interested, you can find it here.
VIRTUAL NON-UNIFORM MEMORY ACCESS
In virtualized environments, the hypervisor has to manage compute resources really well for the sake of virtual machine performance. Remember, we can place many virtual machines on one host (that's the idea behind server virtualization), but those virtual machines still need to run just as well as they would on a physical host. Virtual Non-Uniform Memory Access allows for better virtual machine placement. When we refer to placement, we're referring to the alignment of a virtual machine with the NUMA nodes.
With vNUMA enabled, the hypervisor maps out a reference NUMA topology of the underlying system and then presents this topology to the virtual machine, so the guest operating system and NUMA-aware applications can schedule threads and allocate memory with node locality in mind.
[Figure: vNUMA disabled]
As you can see, the virtual machine above does not really know about NUMA. There is no reference topology of the underlying NUMA environment, so the virtual machine will use any CPU and memory for its operating system and applications.
Here's another diagram, this time with vNUMA enabled.
As you can see from the second image, the virtual machine aligns nicely with the NUMA node, allowing for better performance. Where this gets interesting is with large virtual machines; by large, I mean virtual machines that have more vCPUs than there are physical cores in a single NUMA node of the host. We will talk about that next.
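To make "large" concrete, here is a minimal sketch that checks whether a VM fits inside one physical NUMA node. The function name and the host figures in the usage note are hypothetical, and the check counts physical cores only (hyperthreads are ignored) and treats node memory as a simple capacity.

```python
def fits_in_numa_node(vm_vcpus: int, vm_memory_gb: int,
                      cores_per_node: int, memory_per_node_gb: int) -> bool:
    """Return True if the VM's vCPUs and memory both fit inside one physical NUMA node.

    Assumption: only physical cores count toward node capacity (no hyperthreads).
    """
    return vm_vcpus <= cores_per_node and vm_memory_gb <= memory_per_node_gb


if __name__ == "__main__":
    # Hypothetical host: each NUMA node has 8 cores and 64 GB of local memory.
    print(fits_in_numa_node(8, 32, cores_per_node=8, memory_per_node_gb=64))   # fits
    print(fits_in_numa_node(16, 32, cores_per_node=8, memory_per_node_gb=64))  # too many vCPUs
    print(fits_in_numa_node(4, 128, cores_per_node=8, memory_per_node_gb=64))  # too much memory
```

A VM that fails either condition has to span more than one node, which is exactly the case where exposing the topology to the guest starts to matter.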
For vNUMA support, the host must run vSphere 5.0 or later and the virtual machine must use virtual hardware version 8 or later.
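As a rough sketch, those requirements can be written as a simple check. The function name is hypothetical, and the vCPU threshold is an assumption based on the documented default of the `numa.vcpu.min` advanced setting (9 vCPUs); other factors that can affect whether a guest sees a virtual NUMA topology are deliberately ignored here.

```python
# Assumption: by default a virtual NUMA topology is only generated once a VM
# has at least this many vCPUs (the default value of numa.vcpu.min).
DEFAULT_NUMA_VCPU_MIN = 9


def vnuma_exposed(vsphere_major: int, hw_version: int, vcpus: int,
                  numa_vcpu_min: int = DEFAULT_NUMA_VCPU_MIN) -> bool:
    """Hypothetical, simplified check of whether a guest would see a vNUMA topology."""
    # Requirements stated in the text: vSphere 5+ and virtual hardware version 8+.
    supported = vsphere_major >= 5 and hw_version >= 8
    return supported and vcpus >= numa_vcpu_min


if __name__ == "__main__":
    print(vnuma_exposed(5, 8, 16))                   # supported and large enough
    print(vnuma_exposed(4, 7, 16))                   # platform too old
    print(vnuma_exposed(6, 13, 8))                   # below the default threshold
    print(vnuma_exposed(6, 13, 8, numa_vcpu_min=4))  # threshold lowered
```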
DEALING WITH LARGE VIRTUAL MACHINES (MONSTER VM)
One of the main reasons for vNUMA is to handle large virtual machines that have more vCPUs than a single NUMA node can provide. For example, say you have NUMA nodes with 1 CPU and 8 cores each, but your monster virtual machine requires 16 vCPUs (configured as 1 virtual socket with 16 cores). Without vNUMA enabled, the guest sees these resources as one big pool of CPU and memory, so it will use any available CPU and memory regardless of latency. The outcome of such placement could be poor application performance.
By enabling vNUMA, large virtual machines are presented with a topology that reflects the underlying NUMA nodes, so their workloads can be placed intelligently.
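For the 16-vCPU example on 8-core nodes, one intuitive way to think about the presented topology is to split the VM into the fewest virtual NUMA nodes that each fit inside a physical node. The sketch below does just that with a hypothetical function; it is a simplified illustration, not ESXi's actual sizing logic, which is more involved and varies between versions.

```python
import math


def split_into_vnuma_nodes(vm_vcpus: int, cores_per_node: int) -> list[int]:
    """Spread a VM's vCPUs as evenly as possible over the fewest virtual NUMA
    nodes such that no virtual node has more vCPUs than a physical node has cores."""
    if vm_vcpus < 1 or cores_per_node < 1:
        raise ValueError("vCPU and core counts must be positive")
    node_count = math.ceil(vm_vcpus / cores_per_node)
    base, extra = divmod(vm_vcpus, node_count)
    # The first `extra` virtual nodes receive one additional vCPU.
    return [base + 1 if i < extra else base for i in range(node_count)]


if __name__ == "__main__":
    print(split_into_vnuma_nodes(16, 8))  # the monster VM from the example
    print(split_into_vnuma_nodes(10, 4))  # uneven split across three nodes
    print(split_into_vnuma_nodes(4, 8))   # fits in a single node
```

Splitting evenly rather than filling one node completely keeps each virtual node roughly the same size, so no single physical node carries most of the VM's load.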
Refer to the two diagrams above for an illustration of vNUMA disabled versus enabled.
vNUMA plays an important part when dealing with large virtual machines. It's important to understand the inner workings of this architecture because, let's be realistic, virtual machine and application performance depends on a lot more than raw memory and CPU capacity: there has to be architecture in the background, like NUMA, that optimizes the communication between the virtual machines and the CPUs and memory of the servers.