On Tue, Apr 13, 2021 at 02:38:05PM +0800, Luyao Zhong wrote:
Before this patch set, numatune had only three memory modes: strict, interleave and preferred. These memory policies are ultimately set by the mbind() system call.

A memory policy could be 'hard coded' into the kernel, but none of the above policies fits our requirement in this case. mbind() supports the default memory policy, but it requires a NULL nodemask, so restricting the allowed memory nodes has to be done by cgroups in this case. We therefore introduce a new option for mode in numatune named 'restrictive'.

  <numatune>
    <memory mode="restrictive" nodeset="1-4,^3"/>
    <memnode cellid="0" mode="restrictive" nodeset="1"/>
    <memnode cellid="2" mode="restrictive" nodeset="2"/>
  </numatune>

The config above means we only use cgroups to restrict the allowed memory nodes and do not set any specific memory policy explicitly.

For this new "restrictive" mode, there is a concrete use case: a kernel feature that is not merged yet, which we call memory tiering (https://lwn.net/Articles/802544/). If memory tiering is enabled on the host, DRAM is top-tier memory and PMEM (persistent memory) is second-tier memory; PMEM is shown as a NUMA node without CPUs. Pages can be migrated between the DRAM node and the PMEM node based on DRAM pressure and how cold/hot they are. *This memory policy* is implemented in the kernel. So we need a default mode here, but from libvirt's perspective the "default" mode is "strict", which is not the MPOL_DEFAULT (https://man7.org/linux/man-pages/man2/mbind.2.html) defined in the kernel. To make memory tiering work well, the cgroups setting is necessary, since it ensures that pages can only be migrated between the DRAM and PMEM nodes that we specified (NUMA affinity support).

Just using cgroups with multiple nodes in the nodeset lets the kernel decide on which node (out of those in the restricted set) to allocate, whereas specifying "strict" basically allocates sequentially (on the first node until it is full, then on the next one, and so on).

In short, if a user requires the default mode (MPOL_DEFAULT), meaning they want the kernel to decide the memory allocation while cgroups restrict the allowed memory nodes, the "restrictive" mode will be useful.
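
To make the nodeset syntax and the mode distinction above concrete, here is a minimal Python sketch. It is not libvirt code: the names `parse_nodeset`, `MODE_TO_MBIND_POLICY` and `plan_memory_setup` are hypothetical, the mapping of the existing modes to kernel policy names is an assumption based on the description above, and the parser only handles the simple forms used in the example (`N`, `N-M` and `^N`).

```python
# Illustrative sketch only; none of these names exist in libvirt.

# Assumed mapping from numatune mode to the mbind() policy it results in.
# "restrictive" maps to None: no mbind() call, so the kernel default applies.
MODE_TO_MBIND_POLICY = {
    "strict": "MPOL_BIND",
    "interleave": "MPOL_INTERLEAVE",
    "preferred": "MPOL_PREFERRED",
    "restrictive": None,
}


def parse_nodeset(spec):
    """Expand a nodeset such as "1-4,^3" into a sorted list of node IDs.

    Entries are applied left to right: "N" and "N-M" add nodes,
    "^N" removes a single node that was added earlier.
    """
    nodes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if part.startswith("^"):
            nodes.discard(int(part[1:]))
        elif "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError("invalid range: %s" % part)
            nodes.update(range(first, last + 1))
        else:
            nodes.add(int(part))
    return sorted(nodes)


def plan_memory_setup(mode, nodeset):
    """Describe what would be configured for a given mode and nodeset."""
    if mode not in MODE_TO_MBIND_POLICY:
        raise ValueError("unknown numatune mode: %s" % mode)
    nodes = parse_nodeset(nodeset)
    return {
        # Policy that would be passed to mbind(), or None for cgroups only.
        "mbind_policy": MODE_TO_MBIND_POLICY[mode],
        # Node list in a form suitable for a cpuset "mems" restriction.
        "allowed_nodes": ",".join(str(n) for n in nodes),
    }


if __name__ == "__main__":
    print(plan_memory_setup("restrictive", "1-4,^3"))
```

For the `<memory>` element in the example config, the sketch yields nodes 1, 2 and 4 with no mbind() policy, which is the "cgroups only" behaviour the new mode is meant to express.
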
I applied the patches locally and adjusted them for some changes that happened in the meantime. I also split the patches differently: we usually add conf, docs and schemas (driver-unrelated code), plus any possible tests, in one patch and then add support for each applicable driver in separate patches. I reworded some comments and fixed two memory leaks. I will resend the series later to see if we have everything in order. If we disagree on the naming, we can still change it before the release, but I do not think that is something that should stall the patches. Thanks.

BR,
Luyao

Luyao Zhong (3):
  docs: add docs for 'restrictive' option for mode in numatune
  schema: add 'restrictive' config option for mode in numatune
  qemu: add parser and formatter for 'restrictive' mode in numatune

 docs/formatdomain.rst | 7 +++-
 docs/schemas/domaincommon.rng | 2 +
 include/libvirt/libvirt-domain.h | 1 +
 src/conf/numa_conf.c | 9 ++++
 src/qemu/qemu_command.c | 6 ++-
 src/qemu/qemu_process.c | 27 ++++++++++++
 src/util/virnuma.c | 3 ++
 .../numatune-memnode-invalid-mode.err | 1 +
 .../numatune-memnode-invalid-mode.xml | 33 +++++++++++++++
 ...emnode-restrictive-mode.x86_64-latest.args | 38 +++++++++++++++++
 .../numatune-memnode-restrictive-mode.xml | 33 +++++++++++++++
 tests/qemuxml2argvtest.c | 2 +
 ...memnode-restrictive-mode.x86_64-latest.xml | 41 +++++++++++++++++++
 tests/qemuxml2xmltest.c | 1 +
 14 files changed, 201 insertions(+), 3 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/numatune-memnode-invalid-mode.err
 create mode 100644 tests/qemuxml2argvdata/numatune-memnode-invalid-mode.xml
 create mode 100644 tests/qemuxml2argvdata/numatune-memnode-restrictive-mode.x86_64-latest.args
 create mode 100644 tests/qemuxml2argvdata/numatune-memnode-restrictive-mode.xml
 create mode 100644 tests/qemuxml2xmloutdata/numatune-memnode-restrictive-mode.x86_64-latest.xml

--
2.25.4