On Mon, Nov 30, 2020 at 11:48:28AM +0100, Michal Privoznik wrote: > On 11/30/20 11:16 AM, Daniel P. Berrangé wrote: > > On Mon, Nov 30, 2020 at 11:06:14AM +0100, Michal Privoznik wrote: > > > Currently, we configure QEMU to prealloc memory almost by > > > default. Well, by default for NVDIMMs, hugepages and if user > > > asked us to (via memoryBacking <allocation mode="immediate"/>). > > > > > > However, there are two cases where this approach is not the best: > > > > > > 1) in case when guest's NVDIMM is backed by real life NVDIMM. In > > > this case users should put <pmem/> into the <memory/> device > > > <source/>, like this: > > > > > > <memory model='nvdimm' access='shared'> > > > <source> > > > <path>/dev/pmem0</path> > > > <pmem/> > > > </source> > > > </memory> > > > > > > Instructing QEMU to do prealloc in this case means that each > > > page of the NVDIMM is "touched" (the first byte is read and > > > written back - see QEMU commit v2.9.0-rc1~26^2) which cripples > > > device wear. > > > > > > 2) if free-page-reporting is turned on. While the > > > free-page-reporting feature might not have a catchy or obvious > > > name, when enabled it instructs KVM and subsequently QEMU to > > > free pages no longer used by guest resulting in smaller memory > > > footprint. And preallocating whole memory goes against this. > > > > > > The BZ comment 11 mentions another, third case 'virtio-mem' but > > > that is not implemented in libvirt, yet. > > > > > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1894053 > > > Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> > > > --- > > > src/qemu/qemu_command.c | 11 +++++++++-- > > > .../memory-hotplug-nvdimm-pmem.x86_64-latest.args | 2 +- > > > 2 files changed, 10 insertions(+), 3 deletions(-) > > > > > > diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c > > > index 479bcc0b0c..3df8b5ac76 100644 > > > --- a/src/qemu/qemu_command.c > > > +++ b/src/qemu/qemu_command.c > > > @@ -2977,7 +2977,11 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, > > > if (discard == VIR_TRISTATE_BOOL_ABSENT) > > > discard = def->mem.discard; > > > - if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE) > > > + /* The whole point of free_page_reporting is that as soon as guest frees > > > + * any memory it is freed in the host too. Prealloc doesn't make much sense > > > + * then. */ > > > + if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE && > > > + def->memballoon->free_page_reporting != VIR_TRISTATE_SWITCH_ON) > > > prealloc = true; > > > > If the user asked for allocation == immediate, we should not be > > silently ignoring that request. Isn't the scenario described simply > > a wierd user configuration scenario and if they don't want that, then > > then they can set <allocation mode="ondemand"/> instead. > > Okay. > > > > > > if (virDomainNumatuneGetMode(def->numa, mem->targetNode, &mode) < 0 && > > > @@ -3064,7 +3068,10 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, > > > if (mem->nvdimmPath) { > > > memPath = g_strdup(mem->nvdimmPath); > > > - prealloc = true; > > > > > > > > > + /* If the NVDIMM is a real device then there's nothing to prealloc. > > > + * If anyhing, we would be only wearing off the device. */ > > > + if (!mem->nvdimmPmem) > > > + prealloc = true; > > > > I wonder if QEMU itself should take this optimization to skip its > > allocation logic ? > > Also would make sense. This is that kind of bug which lies in between > libvirt and qemu. Although, since we are worried in silently ignoring user > requests, then wouldn't this be exactly what QEMU would be doing? I mean, if > an user/libvirt put both .prealloc=yes and .pmem=yes onto cmd line then > these would cancel out, wouldn't they? The difference is that an real NVDIMM is inherantly preallocated. QEMU would not be ignoring the prealloc=yes arg - its implementation would merely be a no-op. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|