Re: [PATCH v2 1/2] domain_validate: Account for NVDIMM label size properly when checking for memory conflicts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 21, 2024 at 05:40:05PM +0100, Michal Prívozník wrote:
On 2/20/24 17:29, Martin Kletzander wrote:
On Tue, Feb 20, 2024 at 04:53:29PM +0100, Michal Privoznik wrote:
As of v9.8.0-rc1~7 we check whether two <memory/> devices don't
overlap (since we allow setting where a <memory/> device should
be mapped to). We do this pretty straightforward, by comparing
start and end address of each <memory/> device combination.
But since only the start address is given (an exposed in the
XML), the end address is computed trivially as:

 start + mem->size * 1024

And for majority of memory device types this works. Except for
NVDIMMs. For them the <memory/> device consists of two separate
regions: 1) actual memory device, and 2) label.

Label is where NVDIMM stores some additional information like
namespaces partition and so on. But it's not mapped into the
guest the same way as actual memory device. In fact, mem->size is
a sum of both actual memory device and label sizes. And to make
things a bit worse, both sizes are subject to alignment (either
the alignsize value specified in XML, or system page size if not
specified in XML).

Therefore, to get the size of actual memory device we need to
take mem->size and substract label size rounded up to alignment.

If we don't do this we report there's an overlap between two
NVDIMMs even when in reality there's none.

Fixes: 3fd64fb0e236fc80ffa2cc977c0d471f11fc39bf
Fixes: 91f9a9fb4fc0d34ed8d7a869de3d9f87687c3618
Resolves:
https://issues.redhat.com/browse/RHEL-4452?focusedId=23805174#comment-23805174
Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
---
src/conf/domain_validate.c | 50 ++++++++++++++++++++++++++++++++++++--
1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c
index 46479f10f2..5a9398b545 100644
--- a/src/conf/domain_validate.c
+++ b/src/conf/domain_validate.c
@@ -2225,6 +2225,52 @@ virDomainHostdevDefValidate(const
virDomainHostdevDef *hostdev)
}


+/**
+ * virDomainMemoryGetMappedSize:
+ * @mem: memory device definition
+ *
+ * For given memory device definition (@mem) calculate size mapped into
+ * the guest. This is usually mem->size, except for NVDIMM where its
+ * label is mapped elsewhere.
+ *
+ * Returns: Number of bytes a memory device takes when mapped into a
+ * guest.
+ */
+static unsigned long long
+virDomainMemoryGetMappedSize(const virDomainMemoryDef *mem)
+{
+    unsigned long long ret = mem->size;
+
+    if (mem->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) {
+        unsigned long long alignsize = mem->source.nvdimm.alignsize;
+        unsigned long long labelsize = 0;
+
+        /* For NVDIMM the situation is a bit more complicated. Firstly,
+         * its <label/> is not mapped as a part of memory device, so we
+         * must subtract label size from NVDIMM size. Secondly, label is
+         * also subject to alignment so we need to round it up before
+         * subtraction. */
+
+        if (alignsize == 0) {
+            long pagesize = virGetSystemPageSizeKB();
+
+            /* If no alignment is specified in the XML, fallback to
+             * system page size alignment. */
+            if (pagesize > 0)
+                alignsize = pagesize;

I'm not very well versed in memory modules and architectures, but isn't
this restricted a bit more, or rather isn't the QEMU default slightly
different?  The following two functions suggest it might be:

qemuDomainGetMemorySizeAlignment

qemuDomainGetMemoryModuleSizeAlignment

Looking into QEMU sources, this is independent. I'm looking at
nvdimm_prepare_memory_region() declared at hw/mem/nvdimm.c in which I
can find:


   if (!dimm->hostmem) {
       error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property must be set");
       return;
   }

   mr = host_memory_backend_get_memory(dimm->hostmem);
   align = memory_region_get_alignment(mr);
   size = memory_region_size(mr);

   pmem_size = size - nvdimm->label_size;
   nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
   pmem_size = QEMU_ALIGN_DOWN(pmem_size, align);



In here, dimm->hostmem points to corresponding memory-backend-file
object with NVDIMM path (/memory/source/path in our XML). IOW, any huge
pages setting made to domain is independent of this. Then,
memory_region_get_alignment() gets alignment of that memory region,
which (looking at file_backend_memory_alloc()) corresponds to .align
attribute of the aforementioned memory-backend-file object => it
corresponds to /memory/source/alignsize in our XML or
mem->source.nvdimm.alignsize in our code. Finally, size reflects memory
size (.size attribute => /memory/target/size => mem->size).


I was more afraid about the QEMU vs. libvirt default of the alignment if
left unspecified.  But since that would only apply to niche scenarios
and this patch makes the code strictly better, so with the below diff
squashed in

Reviewed-by: Martin Kletzander <mkletzan@xxxxxxxxxx>

Long story short, my comment "label is also subject to alignment" is
misleading, as it is not. The remaining memory (after label_size was
substracted) is then aligned down.

Consider the following squashed into my commit:


diff --git i/src/conf/domain_validate.c w/src/conf/domain_validate.c
index 5a9398b545..faa7659f07 100644
--- i/src/conf/domain_validate.c
+++ w/src/conf/domain_validate.c
@@ -2247,9 +2247,10 @@ virDomainMemoryGetMappedSize(const virDomainMemoryDef *mem)

        /* For NVDIMM the situation is a bit more complicated. Firstly,
         * its <label/> is not mapped as a part of memory device, so we
-         * must subtract label size from NVDIMM size. Secondly, label is
-         * also subject to alignment so we need to round it up before
-         * subtraction. */
+         * must subtract label size from NVDIMM size. Secondly,
+         * remaining memory is then aligned again (rounded down). But
+         * for our purposes we might just round label size up and
+         * achieve the same (numeric) result. */

        if (alignsize == 0) {
            long pagesize = virGetSystemPageSizeKB();

Nice catch!

Michal

Attachment: signature.asc
Description: PGP signature

_______________________________________________
Devel mailing list -- devel@xxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxx

[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux