From: Wim ten Have <wim.ten.have@xxxxxxxxxx>

This patch series extends guest domain administration by adding a feature
that creates a guest with a NUMA layout, also referred to as vNUMA (virtual
NUMA).

NUMA (Non-Uniform Memory Access) is a method of configuring a cluster of
nodes within a single multiprocessing system such that each node shares its
processor-local memory with the other nodes, improving performance and the
ability of the system to be expanded.

The illustration below shows a typical 4-node NUMA system. Within this
system, each socket is equipped with its own distinct memory, and some
sockets also with I/O. Memory or I/O on a remote node can only be accessed
by communicating through the "Interconnect".

    +-------------+-------+        +-------+-------------+
    |NODE0|       |       |        |       |       |NODE3|
    |     | CPU00 | CPU03 |        | CPU12 | CPU15 |     |
    |     |       |       |        |       |       |     |
    | Mem +--- Socket0 ---<-------->--- Socket3 ---+ Mem |
    |     |       |       |        |       |       |     |
    +-----+ CPU01 | CPU02 |        | CPU13 | CPU14 |     |
    | I/O |       |       |        |       |       |     |
    +-----+-------^-------+        +-------^-------+-----+
                  |                        |
                  |      Interconnect      |
                  |                        |
    +-------------v-------+        +-------v-------------+
    |NODE1|       |       |        |       |       |NODE2|
    |     | CPU04 | CPU07 |        | CPU08 | CPU11 |     |
    |     |       |       |        |       |       |     |
    | Mem +--- Socket1 ---<-------->--- Socket2 ---+ Mem |
    |     |       |       |        |       |       |     |
    +-----+ CPU05 | CPU06 |        | CPU09 | CPU10 |     |
    | I/O |       |       |        |       |       |     |
    +-----+-------+-------+        +-------+-------+-----+

Unfortunately, NUMA architectures have some drawbacks. For example, when
data is stored in memory associated with Socket2 but is accessed by a CPU in
Socket0, that CPU has to use the interconnect to reach the memory associated
with Socket2. These interconnect hops add data access delays.

Some high-performance software takes the NUMA architecture into account by
carefully placing data in memory and pinning the processes most likely to
access that data to the CPUs with the shortest access times. Similarly, such
software can pin its I/O processes to the CPUs with the shortest access
times to I/O devices.
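A NUMA-aware program first needs to know which CPUs are local to the node
that holds its data. The Python sketch below hard-codes the topology of the
illustration (NODE<n> owns CPU<4n>..CPU<4n+3>), which is an assumption made
only for illustration; a real program would read the host topology instead.
The helper names are hypothetical.

```python
from __future__ import annotations

# Assumed topology, copied from the illustration above: four nodes with four
# CPUs each, where NODE<n> owns CPU<4n>..CPU<4n+3>. A real program would read
# this from the host (for example /sys/devices/system/node/node<n>/cpulist).
CPUS_PER_NODE = 4


def node_local_cpus(node: int) -> set[int]:
    """CPUs that reach `node`'s memory without crossing the interconnect."""
    first = node * CPUS_PER_NODE
    return set(range(first, first + CPUS_PER_NODE))


def crosses_interconnect(cpu: int, mem_node: int) -> bool:
    """True if `cpu` must go over the interconnect to reach `mem_node`'s memory."""
    return cpu not in node_local_cpus(mem_node)


if __name__ == "__main__":
    # Data placed in Socket2's memory (NODE2): CPU00 pays the interconnect
    # hop, CPU09 does not.
    print(crosses_interconnect(0, 2), crosses_interconnect(9, 2))  # True False
    # On Linux, os.sched_setaffinity(0, node_local_cpus(2)) would pin the
    # calling process to CPU08..CPU11; it raises OSError if none of those
    # CPUs are available on the machine running it.
```

Pinning to node_local_cpus(2) is exactly the "shortest access time" choice
described above for data living in Socket2's memory.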
When such NUMA-aware software runs within a guest VM, constructing the VM so
that its virtual NUMA topology mirrors the physical NUMA topology preserves
the application software's performance.

The changes brought by this patch series add a new libvirt domain element
named <vnuma> that allows dynamic 'host' or 'node' partitioning of a guest:
libvirt inspects the host capabilities and renders the best-fitting guest
XML design, holding a vNUMA topology that matches the host.

  <domain>
    ..
    <vnuma mode='host|node' distribution='contiguous|siblings|round-robin|interleave'>
      <memory unit='KiB'>524288</memory>
      <partition nodeset="1-4,^3" cells="8"/>
    </vnuma>
    ..
  </domain>

The content of this <vnuma> element causes libvirt to dynamically partition
the guest domain XML into a 'host' or 'node' NUMA model.

Under <vnuma mode='host' ... > the guest domain is automatically partitioned
according to the "host" capabilities.

Under <vnuma mode='node' ... > the guest domain is partitioned according to
the nodeset and cells given in the vnuma <partition> subelement.

The optional <vnuma> attribute distribution='type' indicates how the CPUs
are distributed over the guest NUMA cells. It can have the following values:
- 'contiguous' delivery, under which the CPUs enumerate sequentially over
  the NUMA-defined cells.
- 'siblings': CPUs are distributed over the NUMA cells matching the host
  CPU SMT model.
- 'round-robin': CPUs are distributed over the NUMA cells matching the host
  CPU topology.
- 'interleave': CPUs are interleaved one at a time over the NUMA cells.

The optional subelement <memory> specifies the memory size reserved for the
guest to dimension its <numa> <cell id> sizes. If no memory is specified,
the <vnuma> <memory> setting is taken from the guest's total memory, i.e.
the <domain> <memory> setting.

The optional subelement <partition> is only active when <vnuma mode='node'>
is in effect and allows defining the active "nodeset" and "cells" to target
under the "guest" domain.
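The 'node' partitioning and two of the distribution policies can be made
concrete with a small Python sketch that expands the example
<partition nodeset="1-4,^3" cells="8"/> together with
<memory unit='KiB'>524288</memory>. This is not the code from this series,
and all function names are hypothetical: the node-list parser only loosely
follows libvirt's nodeset syntax, and the 16 vCPUs, the even memory split
across cells and the cyclic mapping of cells onto host nodes are
assumptions. 'siblings' and 'round-robin' are left out because they depend
on host SMT and topology details not spelled out here.

```python
from __future__ import annotations


def parse_nodeset(spec: str) -> list[int]:
    """Hypothetical helper: expand a node list such as "1-4,^3" into sorted ids.

    Loosely modelled on libvirt's nodeset syntax. Items are applied left to
    right: "n" adds node n, "a-b" adds nodes a..b and "^n" removes node n.
    """
    nodes: set[int] = set()
    for item in spec.split(","):
        item = item.strip()
        if item.startswith("^"):
            nodes.discard(int(item[1:]))
        elif "-" in item:
            lo, hi = (int(x) for x in item.split("-"))
            nodes.update(range(lo, hi + 1))
        else:
            nodes.add(int(item))
    return sorted(nodes)


def cell_memory_kib(total_kib: int, cells: int) -> list[int]:
    """Assumed even split of <vnuma> <memory>; any remainder goes to the first cells."""
    base, extra = divmod(total_kib, cells)
    return [base + (1 if cell < extra else 0) for cell in range(cells)]


def contiguous(vcpus: int, cells: int) -> list[list[int]]:
    """'contiguous': each cell gets a consecutive run of vCPU ids."""
    base, extra = divmod(vcpus, cells)
    layout, first = [], 0
    for cell in range(cells):
        size = base + (1 if cell < extra else 0)
        layout.append(list(range(first, first + size)))
        first += size
    return layout


def interleave(vcpus: int, cells: int) -> list[list[int]]:
    """'interleave': vCPUs are dealt out one at a time, so vCPU v lands in cell v % cells."""
    return [list(range(cell, vcpus, cells)) for cell in range(cells)]


def place_cells(cells: int, host_nodes: list[int]) -> list[int]:
    """Assumed placement: guest cell i is backed by host_nodes[i % len(host_nodes)]."""
    return [host_nodes[i % len(host_nodes)] for i in range(cells)]


if __name__ == "__main__":
    vcpus, cells = 16, 8                       # 16 vCPUs is an assumption
    nodes = place_cells(cells, parse_nodeset("1-4,^3"))
    memory = cell_memory_kib(524288, cells)
    for cell in range(cells):
        print(f"cell {cell}: host node {nodes[cell]}, {memory[cell]} KiB, "
              f"contiguous {contiguous(vcpus, cells)[cell]}, "
              f"interleave {interleave(vcpus, cells)[cell]}")
    # first line: cell 0: host node 1, 65536 KiB, contiguous [0, 1], interleave [0, 8]
```

The two policies differ only in how vCPU ids map to cells: 'contiguous'
keeps neighbouring ids together (cell 0 gets vCPUs 0-1), while 'interleave'
spreads them (cell 0 gets vCPUs 0 and 8).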
For example, the specified attribute "nodeset" can limit the host NUMA
nodes assigned to the guest with the help of NUMA node tuning (<numatune>).
The provided "cells" attribute, in turn, defines the number of vNUMA cells
to render for the guest.

We're planning a 'virsh vnuma' command to convert existing guest domains to
one of these vNUMA models.

Wim ten Have (4):
  XML definitions for guest vNUMA and parsing routines
  qemu: driver changes adding vNUMA vCPU hotplug support
  qemu: driver changes adding vNUMA memory hotplug support
  tests: add various tests to exercise vNUMA host partitioning

 docs/formatdomain.html.in                     |  94 ++++
 docs/schemas/domaincommon.rng                 |  65 +++
 src/conf/domain_conf.c                        | 482 +++++++++++++++++-
 src/conf/domain_conf.h                        |   2 +
 src/conf/numa_conf.c                          | 241 ++++++++-
 src/conf/numa_conf.h                          |  58 ++-
 src/libvirt_private.syms                      |   8 +
 src/qemu/qemu_driver.c                        |  65 ++-
 src/qemu/qemu_hotplug.c                       |  95 +++-
 .../cpu-host-passthrough-nonuma.args          |  29 ++
 .../cpu-host-passthrough-nonuma.xml           |  19 +
 .../cpu-host-passthrough-numa-contiguous.args |  37 ++
 .../cpu-host-passthrough-numa-contiguous.xml  |  20 +
 .../cpu-host-passthrough-numa-interleave.args |  41 ++
 .../cpu-host-passthrough-numa-interleave.xml  |  19 +
 ...host-passthrough-numa-node-contiguous.args |  53 ++
 ...-host-passthrough-numa-node-contiguous.xml |  21 +
 ...host-passthrough-numa-node-interleave.args |  41 ++
 ...-host-passthrough-numa-node-interleave.xml |  22 +
 ...ost-passthrough-numa-node-round-robin.args | 125 +++++
 ...host-passthrough-numa-node-round-robin.xml |  21 +
 ...u-host-passthrough-numa-node-siblings.args |  32 ++
 ...pu-host-passthrough-numa-node-siblings.xml |  23 +
 ...cpu-host-passthrough-numa-round-robin.args |  37 ++
 .../cpu-host-passthrough-numa-round-robin.xml |  22 +
 .../cpu-host-passthrough-numa-siblings.args   |  37 ++
 .../cpu-host-passthrough-numa-siblings.xml    |  20 +
 .../cpu-host-passthrough-numa.args            |  37 ++
 .../cpu-host-passthrough-numa.xml             |  20 +
 tests/qemuxml2argvtest.c                      |  10 +
 30 files changed, 1765 insertions(+), 31 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-contiguous.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-contiguous.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-interleave.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-interleave.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-contiguous.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-contiguous.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-interleave.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-interleave.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-round-robin.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-round-robin.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-siblings.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-siblings.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-round-robin.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-round-robin.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-siblings.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-siblings.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml

-- 
2.21.0

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list