Full Introduction to NUMA (Non-Uniform Memory Access) [MiniTool Wiki]
What Is NUMA?
NUMA stands for Non-Uniform Memory Access. It is a computer memory design in which the microprocessors of a multiprocessing system are grouped into clusters that share memory locally, which improves performance and makes the system easier to expand.
Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor or memory shared between processors). The benefits of NUMA are limited to particular workloads, notably on servers, where data is often closely associated with certain tasks or users.
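The effect of locality can be illustrated with a simple weighted average. The sketch below is only a toy model: the function name and the two latency figures are assumptions chosen for illustration, not measurements of any real machine.

```python
def average_access_ns(local_fraction, local_ns, remote_ns):
    """Expected latency of one memory access in a two-tier NUMA model."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, picked only to make the arithmetic easy to follow.
LOCAL_NS = 100.0
REMOTE_NS = 200.0

for fraction in (1.0, 0.75, 0.5):
    latency = average_access_ns(fraction, LOCAL_NS, REMOTE_NS)
    print(f"{fraction:.0%} local accesses: {latency:.0f} ns on average")
```

With these assumed numbers, a workload that keeps all of its data local averages 100 ns per access, while one that finds only half of its data locally averages 150 ns. This is why workloads whose data stays close to the task that uses it benefit most.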
NUMA is used in symmetric multiprocessing (SMP) systems. An SMP system is a "tightly coupled", "share everything" system in which multiple processors running under a single operating system access each other's memory over a common bus or "interconnect" path.
The usual limitation of SMP is that, as microprocessors are added, the shared bus or data path becomes overloaded and turns into a performance bottleneck. NUMA adds an intermediate level of memory shared among a few microprocessors, so that not every data access has to travel over the main bus.
NUMA can be thought of as a "cluster in a box". A cluster typically consists of four microprocessors (for instance, four Pentium microprocessors) interconnected on a local bus (such as a Peripheral Component Interconnect bus) to a shared memory (called the "L3 cache") on a single motherboard, which may also be called a card.
This unit can be combined with similar units to form a symmetric multiprocessing system in which a common SMP bus interconnects all of the clusters. Such systems usually contain 16 to 256 microprocessors. To an application running in an SMP system, all of the individual processor memories look like a single memory.
When a processor looks for data at a certain memory address, it first looks in the L1 cache on the microprocessor itself, then in the somewhat larger L2 cache nearby, and then in the third level of cache provided by the NUMA configuration, before seeking the data in "remote memory" located near other microprocessors. NUMA treats each cluster as a "node" in an interconnected network and maintains a hierarchical view of the data on all nodes.
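The lookup order described above can be sketched as a search through an ordered list of levels. This is a deliberately simplified model: each level is a plain dictionary, and the level names, addresses, and values are hypothetical.

```python
def lookup(address, levels):
    """Search each level in order and return (level_name, value) for the first hit."""
    for name, store in levels:
        if address in store:
            return name, store[address]
    raise KeyError(address)


# Hypothetical contents, ordered from nearest to farthest from the processor.
levels = [
    ("L1 cache", {0x10: "a"}),
    ("L2 cache", {0x20: "b"}),
    ("node-level L3 cache", {0x30: "c"}),
    ("remote memory", {0x40: "d"}),
]

for addr in (0x10, 0x30, 0x40):
    name, value = lookup(addr, levels)
    print(f"address {addr:#x} found in {name}: {value}")
```

Real hardware also copies data into the nearer levels on a miss and evicts older entries; the sketch leaves that out to keep the search order itself visible.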
Data moves over the bus between the clusters of a NUMA SMP system using Scalable Coherent Interface (SCI) technology. SCI coordinates what is called "cache coherence", that is, consistency of cached data across the nodes of the multiple clusters.
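To make the idea of cache coherence concrete, here is a minimal sketch of a generic write-invalidate scheme. It is not the SCI protocol itself: the `Directory` class, its methods, and the node names are all hypothetical, and the write-through behaviour is an assumption made to keep the example short.

```python
class Directory:
    """Toy coherence directory: tracks which nodes hold a cached copy of each address."""

    def __init__(self, memory):
        self.memory = dict(memory)  # backing store: address -> value
        self.sharers = {}           # address -> set of node ids holding a copy
        self.caches = {}            # node id -> {address: value}

    def read(self, node, address):
        cache = self.caches.setdefault(node, {})
        if address not in cache:
            # Miss: fetch from memory and record this node as a sharer.
            cache[address] = self.memory[address]
            self.sharers.setdefault(address, set()).add(node)
        return cache[address]

    def write(self, node, address, value):
        # Invalidate every other node's copy so that no stale value survives.
        for other in self.sharers.get(address, set()) - {node}:
            self.caches[other].pop(address, None)
        self.sharers[address] = {node}
        self.caches.setdefault(node, {})[address] = value
        self.memory[address] = value  # write-through (a simplifying assumption)


directory = Directory({0x100: 1})
directory.read("node0", 0x100)
directory.read("node1", 0x100)      # both nodes now cache the value 1
directory.write("node0", 0x100, 2)  # node1's copy is invalidated
print(directory.read("node1", 0x100))
```

After the write, the second node no longer holds the old value in its cache, so its next read goes back to memory and sees the update. Keeping every node's view consistent in this way is the job that the coherence mechanism performs in hardware.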
SMP and NUMA systems are commonly used for applications such as data mining and decision support systems. In these applications, processing can be divided among multiple processors that work together on a common database. Sequent, Data General, and NCR are among the companies that have produced NUMA SMP systems.
The first commercial implementation of a NUMA-based Unix system was the Symmetrical Multi Processing XPS-100 family of servers, designed by Dan Gielan of VAST Corporation for Honeywell Information Systems Italy.
Software Support of NUMA
Because NUMA greatly affects memory access performance, certain software optimizations are needed so that threads and processes can be scheduled close to the data they hold in memory.
- Silicon Graphics IRIX supports the ccNUMA architecture across 1240 CPUs with the Origin server series.
- Microsoft Windows 7 and Windows Server 2008 R2 added support for the NUMA architecture on systems with more than 64 logical cores.
- Java 7 added support for a NUMA-aware memory allocator and garbage collector.
- Linux kernel version 2.5 already included basic NUMA support, which was further improved in subsequent kernel releases.
- Linux kernel version 3.8 introduced a new NUMA foundation, which allowed the development of more effective NUMA strategies in later kernel releases.
- Linux kernel version 3.13 introduced a number of policies that aim to place a process near its memory and that handle cases such as memory pages shared between processes or the use of transparent huge pages. New sysctl settings allow NUMA balancing to be enabled or disabled and various NUMA memory balancing parameters to be configured.
- OpenSolaris models the NUMA architecture with lgroups (locality groups).
- FreeBSD added initial NUMA affinity and policy configuration in version 11.0.
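On a Linux system, some of this support can be inspected from user space. The sketch below assumes a Linux kernel that exposes NUMA topology under /sys/devices/system/node and the NUMA balancing sysctl at /proc/sys/kernel/numa_balancing; on other systems the functions simply return None or False. The helper names are hypothetical and written for this example.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel CPU list such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def node_cpus(node, sysfs_root="/sys/devices/system/node"):
    """Return the CPUs that belong to a NUMA node, or None if it is not exposed."""
    path = Path(sysfs_root) / f"node{node}" / "cpulist"
    try:
        return parse_cpulist(path.read_text())
    except OSError:
        return None


def numa_balancing_setting(path="/proc/sys/kernel/numa_balancing"):
    """Return the raw kernel.numa_balancing value as text, or None if unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def pin_to_node(node):
    """Restrict the current process to the CPUs of one node. Returns True on success."""
    cpus = node_cpus(node)
    if not cpus or not hasattr(os, "sched_setaffinity"):
        return False
    try:
        os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("kernel.numa_balancing:", numa_balancing_setting())
    print("CPUs on node 0:", node_cpus(0))
```

Note that `pin_to_node` only restricts which CPUs the process may run on. Where its memory is placed is still decided by the kernel's memory policy, so this is a starting point for experiments rather than a complete NUMA tuning tool.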
To sum up, this post has covered the definition of NUMA and how it works. It has also shown that, because NUMA affects memory access performance, operating systems and runtimes include specific optimizations to support it.