Re: [PATCH v2 1/2] domain_validate: Account for NVDIMM label size properly when checking for memory conflicts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/20/24 17:29, Martin Kletzander wrote:
> On Tue, Feb 20, 2024 at 04:53:29PM +0100, Michal Privoznik wrote:
>> As of v9.8.0-rc1~7 we check whether two <memory/> devices don't
>> overlap (since we allow setting where a <memory/> device should
>> be mapped to). We do this pretty straightforward, by comparing
>> start and end address of each <memory/> device combination.
>> But since only the start address is given (an exposed in the
>> XML), the end address is computed trivially as:
>>
>>  start + mem->size * 1024
>>
>> And for majority of memory device types this works. Except for
>> NVDIMMs. For them the <memory/> device consists of two separate
>> regions: 1) actual memory device, and 2) label.
>>
>> Label is where NVDIMM stores some additional information like
>> namespaces partition and so on. But it's not mapped into the
>> guest the same way as actual memory device. In fact, mem->size is
>> a sum of both actual memory device and label sizes. And to make
>> things a bit worse, both sizes are subject to alignment (either
>> the alignsize value specified in XML, or system page size if not
>> specified in XML).
>>
>> Therefore, to get the size of actual memory device we need to
>> take mem->size and substract label size rounded up to alignment.
>>
>> If we don't do this we report there's an overlap between two
>> NVDIMMs even when in reality there's none.
>>
>> Fixes: 3fd64fb0e236fc80ffa2cc977c0d471f11fc39bf
>> Fixes: 91f9a9fb4fc0d34ed8d7a869de3d9f87687c3618
>> Resolves:
>> https://issues.redhat.com/browse/RHEL-4452?focusedId=23805174#comment-23805174
>> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
>> ---
>> src/conf/domain_validate.c | 50 ++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 48 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c
>> index 46479f10f2..5a9398b545 100644
>> --- a/src/conf/domain_validate.c
>> +++ b/src/conf/domain_validate.c
>> @@ -2225,6 +2225,52 @@ virDomainHostdevDefValidate(const
>> virDomainHostdevDef *hostdev)
>> }
>>
>>
>> +/**
>> + * virDomainMemoryGetMappedSize:
>> + * @mem: memory device definition
>> + *
>> + * For given memory device definition (@mem) calculate size mapped into
>> + * the guest. This is usually mem->size, except for NVDIMM where its
>> + * label is mapped elsewhere.
>> + *
>> + * Returns: Number of bytes a memory device takes when mapped into a
>> + * guest.
>> + */
>> +static unsigned long long
>> +virDomainMemoryGetMappedSize(const virDomainMemoryDef *mem)
>> +{
>> +    unsigned long long ret = mem->size;
>> +
>> +    if (mem->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) {
>> +        unsigned long long alignsize = mem->source.nvdimm.alignsize;
>> +        unsigned long long labelsize = 0;
>> +
>> +        /* For NVDIMM the situation is a bit more complicated. Firstly,
>> +         * its <label/> is not mapped as a part of memory device, so we
>> +         * must subtract label size from NVDIMM size. Secondly, label is
>> +         * also subject to alignment so we need to round it up before
>> +         * subtraction. */
>> +
>> +        if (alignsize == 0) {
>> +            long pagesize = virGetSystemPageSizeKB();
>> +
>> +            /* If no alignment is specified in the XML, fallback to
>> +             * system page size alignment. */
>> +            if (pagesize > 0)
>> +                alignsize = pagesize;
> 
> I'm not very well versed in memory modules and architectures, but isn't
> this restricted a bit more, or rather isn't the QEMU default slightly
> different?  The following two functions suggest it might be:
> 
> qemuDomainGetMemorySizeAlignment
> 
> qemuDomainGetMemoryModuleSizeAlignment

Looking into QEMU sources, this is independent. I'm looking at
nvdimm_prepare_memory_region() declared at hw/mem/nvdimm.c in which I
can find:


    if (!dimm->hostmem) {
        error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property must be set");
        return;
    }

    mr = host_memory_backend_get_memory(dimm->hostmem);
    align = memory_region_get_alignment(mr);
    size = memory_region_size(mr);

    pmem_size = size - nvdimm->label_size;
    nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
    pmem_size = QEMU_ALIGN_DOWN(pmem_size, align);



In here, dimm->hostmem points to corresponding memory-backend-file
object with NVDIMM path (/memory/source/path in our XML). IOW, any huge
pages setting made to domain is independent of this. Then,
memory_region_get_alignment() gets alignment of that memory region,
which (looking at file_backend_memory_alloc()) corresponds to .align
attribute of the aforementioned memory-backend-file object => it
corresponds to /memory/source/alignsize in our XML or
mem->source.nvdimm.alignsize in our code. Finally, size reflects memory
size (.size attribute => /memory/target/size => mem->size).

Long story short, my comment "label is also subject to alignment" is
misleading, as it is not. The remaining memory (after label_size was
substracted) is then aligned down.

Consider the following squashed into my commit:


diff --git i/src/conf/domain_validate.c w/src/conf/domain_validate.c
index 5a9398b545..faa7659f07 100644
--- i/src/conf/domain_validate.c
+++ w/src/conf/domain_validate.c
@@ -2247,9 +2247,10 @@ virDomainMemoryGetMappedSize(const virDomainMemoryDef *mem)
 
         /* For NVDIMM the situation is a bit more complicated. Firstly,
          * its <label/> is not mapped as a part of memory device, so we
-         * must subtract label size from NVDIMM size. Secondly, label is
-         * also subject to alignment so we need to round it up before
-         * subtraction. */
+         * must subtract label size from NVDIMM size. Secondly,
+         * remaining memory is then aligned again (rounded down). But
+         * for our purposes we might just round label size up and
+         * achieve the same (numeric) result. */
 
         if (alignsize == 0) {
             long pagesize = virGetSystemPageSizeKB();

Nice catch!

Michal
_______________________________________________
Devel mailing list -- devel@xxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxx




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux