On Wed, Jul 11, 2018 at 06:03:08PM +0200, Pavel Hrdina wrote:

To make it clear I'll summarize all the possible combinations and how
each should work, so that we are on the same page.

    originally: before commit [1]
    now: after commit [1] (current master)
    expect: what this patch series should fix

======= Non-NUMA guests =======

* Only one hugepage specified without any nodeset

    <memoryBacking>
      <hugepages>
        <page size='1048576' unit='KiB'/>
      </hugepages>
    </memoryBacking>

This is correct, has always worked, and we should not change it.

    originally: works
    now: works
    expect: works

* Only one hugepage specified with a nodeset

    <memoryBacking>
      <hugepages>
        <page size='1048576' unit='KiB' nodeset='0'/>
      </hugepages>
    </memoryBacking>

This is questionable since there is no guest NUMA node specified, but
on the other hand we can consider a non-NUMA guest to have exactly one
NUMA node. This was working but was broken by commit [1], which tried
to fix the case where a non-existing NUMA node is specified. Because
of that assumption we can consider this valid XML even though there is
no NUMA node specified [2]. There are two possible solutions:

  - we can leave the XML intact
  - we can silently remove the 'nodeset' attribute to 'fix' the XML
    definition

    originally: works
    now: fails
    expect: works

    <memoryBacking>
      <hugepages>
        <page size='1048576' unit='KiB' nodeset='1'/>
      </hugepages>
    </memoryBacking>

If the nodeset is != 0 it should never work because there is no guest
NUMA topology, and even if we take into account the assumption that
there is always one NUMA node, it is still invalid.

    originally: works
    now: fails
    expect: fails

* One hugepage with a specific nodeset and a second default hugepage

    <memoryBacking>
      <hugepages>
        <page size='1048576' unit='KiB' nodeset='0'/>
        <page size='2048' unit='KiB'/>
      </hugepages>
    </memoryBacking>

This was working but was 'fixed' by commit [1] because the code checks
only the first hugepage and uses only the first hugepage.
It should never have worked because it doesn't make any sense: there
is no guest NUMA node configured, and even if we take into account the
assumption that a non-NUMA guest has one NUMA node, there is no need
for the default page size.

    originally: works
    now: fails
    expect: fails

There is yet another issue with the current state of libvirt: if you
swap the order of the pages:

    <memoryBacking>
      <hugepages>
        <page size='2048' unit='KiB'/>
        <page size='1048576' unit='KiB' nodeset='0'/>
      </hugepages>
    </memoryBacking>

it will work even with current libvirt including the fix [1]. The
reason is that the code in the qemuBuildMemPathStr() function takes
into account only the first page size, so the result depends on the
order of the elements, which is wrong. We should not allow either of
these two configurations. Setting nodeset != 0 should not make any
difference.

    originally: works
    now: works
    expect: fails

====== NUMA guests ======

* Only one hugepage specified without any nodeset

    <memoryBacking>
      <hugepages>
        <page size='2048' unit='KiB'/>
      </hugepages>
    </memoryBacking>
    ...
    <cpu mode='host-passthrough' check='none'>
      <topology sockets='1' cores='4' threads='2'/>
      <numa>
        <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
        <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
      </numa>
    </cpu>

    originally: works
    now: works
    expect: works

* Only one hugepage specified with a nodeset

    <memoryBacking>
      <hugepages>
        <page size='2048' unit='KiB' nodeset='0'/>
      </hugepages>
    </memoryBacking>
    ...
    <cpu mode='host-passthrough' check='none'>
      <topology sockets='1' cores='4' threads='2'/>
      <numa>
        <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
        <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
      </numa>
    </cpu>

All possible combinations of the nodeset attribute are allowed as long
as it always refers to an existing guest NUMA node:

    <page size='2048' unit='KiB' nodeset='1'/>

or

    <page size='2048' unit='KiB' nodeset='0,1'/>

    originally: works
    now: works
    expect: works

    <memoryBacking>
      <hugepages>
        <page size='2048' unit='KiB' nodeset='2'/>
      </hugepages>
    </memoryBacking>
    ...
    <cpu mode='host-passthrough' check='none'>
      <topology sockets='1' cores='4' threads='2'/>
      <numa>
        <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
        <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
      </numa>
    </cpu>

Here an invalid guest NUMA node is specified for the hugepage.

    originally: fails
    now: fails
    expect: fails

* One hugepage with a specific nodeset and a second default hugepage

    <memoryBacking>
      <hugepages>
        <page size='1048576' unit='KiB' nodeset='0'/>
        <page size='2048' unit='KiB'/>
      </hugepages>
    </memoryBacking>
    ...
    <cpu mode='host-passthrough' check='none'>
      <topology sockets='1' cores='4' threads='2'/>
      <numa>
        <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
        <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
      </numa>
    </cpu>

There are two guest NUMA nodes for which we have the default hugepage
size configured, and for NUMA node '0' we have a different size.

    originally: works
    now: works
    expect: works

    <memoryBacking>
      <hugepages>
        <page size='1048576' unit='KiB' nodeset='0,1'/>
        <page size='2048' unit='KiB'/>
      </hugepages>
    </memoryBacking>
    ...
    <cpu mode='host-passthrough' check='none'>
      <topology sockets='1' cores='4' threads='2'/>
      <numa>
        <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
        <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
      </numa>
    </cpu>

    originally: works
    now: works
    expect: fails ???
In this situation all the guest NUMA nodes are covered by the hugepage
size with the specified nodeset attribute. The default one is not used
at all, so it is pointless here. The difference from a non-NUMA guest
is that if we change the order it will still work as expected; it does
not depend on the order of the elements. However, we might consider it
an invalid configuration because there is no need to have the default
page size configured at all.

* Multiple combinations of default and specific hugepage sizes

There are some restrictions if we use multiple page sizes:

  - there cannot be two different default hugepage sizes
  - two different page elements cannot have the same guest NUMA node
    specified in the nodeset attribute
  - hugepages are not allowed if memory backing allocation is set to
    'ondemand' (not documented)
  - hugepages are not allowed if memory backing source is set to
    'anonymous' (not documented)

I hope that there is no other configuration that we need to care
about.

Pavel

[1] <fa6bdf6afa878b8d7c5ed71664ee72be8967cdc5>
[2] <https://bugzilla.redhat.com/show_bug.cgi?id=1591235>
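P.S. The validity rules summarized above can be sketched as a small
standalone check. This is a hypothetical, simplified model, not
libvirt's actual code: the HugepageDef struct, the validateHugepages()
name, and the bitmask representation of the nodeset attribute are all
mine, and the "default page is pointless when every node is covered"
rule encodes the 'expect: fails' cases above (including the one marked
'???').

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one <page> element: a size in KiB
 * and a bitmask built from the nodeset attribute (0 == no nodeset,
 * i.e. the default page size). */
typedef struct {
    unsigned long long size_kib;
    unsigned long nodeset; /* bit i set => applies to guest NUMA node i */
} HugepageDef;

/* Returns true when the combination is valid under the rules above:
 *  - at most one default (nodeset-less) page size,
 *  - every referenced guest NUMA node must exist (nnodes == 0 means a
 *    non-NUMA guest, treated as a single implicit node 0),
 *  - no guest NUMA node claimed by two <page> elements,
 *  - a default page size is rejected as pointless when the per-node
 *    pages already cover every guest NUMA node. */
static bool
validateHugepages(const HugepageDef *pages, size_t npages, size_t nnodes)
{
    unsigned long seen = 0;       /* nodes already claimed */
    bool have_default = false;
    unsigned long valid = (nnodes == 0) ? 0x1UL /* implicit node 0 */
                                        : (1UL << nnodes) - 1;

    for (size_t i = 0; i < npages; i++) {
        if (pages[i].nodeset == 0) {
            if (have_default)            /* two default page sizes */
                return false;
            have_default = true;
            continue;
        }
        if (pages[i].nodeset & ~valid)   /* non-existing guest NUMA node */
            return false;
        if (pages[i].nodeset & seen)     /* node listed twice */
            return false;
        seen |= pages[i].nodeset;
    }

    /* Default page size present but every node already covered. */
    if (have_default && seen == valid)
        return false;

    return true;
}
```

For example, the non-NUMA nodeset='1' case fails the existing-node
check, while the "1G on node 0 plus default 2M" case on a two-node
guest passes because node 1 still needs the default size.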
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list