On 11/30/20 11:21 AM, Peter Krempa wrote:
On Mon, Nov 30, 2020 at 11:06:14 +0100, Michal Privoznik wrote:
Currently, we configure QEMU to prealloc memory almost by
default. Well, by default for NVDIMMs, hugepages and if user
asked us to (via memoryBacking <allocation mode="immediate"/>).
However, there are two cases where this approach is not the best:
1) in case when guest's NVDIMM is backed by real life NVDIMM. In
this case users should put <pmem/> into the <memory/> device
<source/>, like this:
<memory model='nvdimm' access='shared'>
<source>
<path>/dev/pmem0</path>
<pmem/>
</source>
</memory>
Instructing QEMU to do prealloc in this case means that each
page of the NVDIMM is "touched" (the first byte is read and
written back - see QEMU commit v2.9.0-rc1~26^2) which cripples
device wear.
2) if free-page-reporting is turned on. While the
free-page-reporting feature might not have a catchy or obvious
name, when enabled it instructs KVM and subsequently QEMU to
free pages no longer used by guest resulting in smaller memory
footprint. And preallocating whole memory goes against this.
The BZ comment 11 mentions another, third case 'virtio-mem' but
that is not implemented in libvirt, yet.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1894053
Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
---
src/qemu/qemu_command.c | 11 +++++++++--
.../memory-hotplug-nvdimm-pmem.x86_64-latest.args | 2 +-
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 479bcc0b0c..3df8b5ac76 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -2977,7 +2977,11 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps,
if (discard == VIR_TRISTATE_BOOL_ABSENT)
discard = def->mem.discard;
- if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE)
+ /* The whole point of free_page_reporting is that as soon as guest frees
+ * any memory it is freed in the host too. Prealloc doesn't make much sense
+ * then. */
+ if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE &&
+ def->memballoon->free_page_reporting != VIR_TRISTATE_SWITCH_ON)
prealloc = true;
IIUC def->mem.allocation is an explicit user-specified configuration, in
such case we should not override the user wish based on a different
configuration.
If they don't make sense together technically, we should reject the
config rather than silently changing it. If it's just semantically wrong
and pre-existing we may leave it be.
Additionally the patch is missing a test case for this scenario.
Yeah, Dan already pointed out that this is not really desired. So I will
leave this hunk out. But to address your point - would it make sense to
add a valiador check? I mean, something like:
if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE &&
def->memballoon->free_page_reporting != VIR_TRISTATE_SWITCH_ON) {
virReportError("this combination doesn't make much sense");
return -1;
}
Technically, it is possible to fully allocate memory on domain startup
and then leave QEMU to free pages (which happens as soon virtio_balloon
module is loaded), but IMO it doesn't make much sense. Semantically, at
least to me these two options are mutually exclusive.
Michal