On 2/20/24 17:29, Martin Kletzander wrote: > On Tue, Feb 20, 2024 at 04:53:29PM +0100, Michal Privoznik wrote: >> As of v9.8.0-rc1~7 we check whether two <memory/> devices don't >> overlap (since we allow setting where a <memory/> device should >> be mapped to). We do this pretty straightforward, by comparing >> start and end address of each <memory/> device combination. >> But since only the start address is given (an exposed in the >> XML), the end address is computed trivially as: >> >> start + mem->size * 1024 >> >> And for majority of memory device types this works. Except for >> NVDIMMs. For them the <memory/> device consists of two separate >> regions: 1) actual memory device, and 2) label. >> >> Label is where NVDIMM stores some additional information like >> namespaces partition and so on. But it's not mapped into the >> guest the same way as actual memory device. In fact, mem->size is >> a sum of both actual memory device and label sizes. And to make >> things a bit worse, both sizes are subject to alignment (either >> the alignsize value specified in XML, or system page size if not >> specified in XML). >> >> Therefore, to get the size of actual memory device we need to >> take mem->size and substract label size rounded up to alignment. >> >> If we don't do this we report there's an overlap between two >> NVDIMMs even when in reality there's none. >> >> Fixes: 3fd64fb0e236fc80ffa2cc977c0d471f11fc39bf >> Fixes: 91f9a9fb4fc0d34ed8d7a869de3d9f87687c3618 >> Resolves: >> https://issues.redhat.com/browse/RHEL-4452?focusedId=23805174#comment-23805174 >> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> >> --- >> src/conf/domain_validate.c | 50 ++++++++++++++++++++++++++++++++++++-- >> 1 file changed, 48 insertions(+), 2 deletions(-) >> >> diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c >> index 46479f10f2..5a9398b545 100644 >> --- a/src/conf/domain_validate.c >> +++ b/src/conf/domain_validate.c >> @@ -2225,6 +2225,52 @@ virDomainHostdevDefValidate(const >> virDomainHostdevDef *hostdev) >> } >> >> >> +/** >> + * virDomainMemoryGetMappedSize: >> + * @mem: memory device definition >> + * >> + * For given memory device definition (@mem) calculate size mapped into >> + * the guest. This is usually mem->size, except for NVDIMM where its >> + * label is mapped elsewhere. >> + * >> + * Returns: Number of bytes a memory device takes when mapped into a >> + * guest. >> + */ >> +static unsigned long long >> +virDomainMemoryGetMappedSize(const virDomainMemoryDef *mem) >> +{ >> + unsigned long long ret = mem->size; >> + >> + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { >> + unsigned long long alignsize = mem->source.nvdimm.alignsize; >> + unsigned long long labelsize = 0; >> + >> + /* For NVDIMM the situation is a bit more complicated. Firstly, >> + * its <label/> is not mapped as a part of memory device, so we >> + * must subtract label size from NVDIMM size. Secondly, label is >> + * also subject to alignment so we need to round it up before >> + * subtraction. */ >> + >> + if (alignsize == 0) { >> + long pagesize = virGetSystemPageSizeKB(); >> + >> + /* If no alignment is specified in the XML, fallback to >> + * system page size alignment. */ >> + if (pagesize > 0) >> + alignsize = pagesize; > > I'm not very well versed in memory modules and architectures, but isn't > this restricted a bit more, or rather isn't the QEMU default slightly > different? The following two functions suggest it might be: > > qemuDomainGetMemorySizeAlignment > > qemuDomainGetMemoryModuleSizeAlignment Looking into QEMU sources, this is independent. I'm looking at nvdimm_prepare_memory_region() declared at hw/mem/nvdimm.c in which I can find: if (!dimm->hostmem) { error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property must be set"); return; } mr = host_memory_backend_get_memory(dimm->hostmem); align = memory_region_get_alignment(mr); size = memory_region_size(mr); pmem_size = size - nvdimm->label_size; nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size; pmem_size = QEMU_ALIGN_DOWN(pmem_size, align); In here, dimm->hostmem points to corresponding memory-backend-file object with NVDIMM path (/memory/source/path in our XML). IOW, any huge pages setting made to domain is independent of this. Then, memory_region_get_alignment() gets alignment of that memory region, which (looking at file_backend_memory_alloc()) corresponds to .align attribute of the aforementioned memory-backend-file object => it corresponds to /memory/source/alignsize in our XML or mem->source.nvdimm.alignsize in our code. Finally, size reflects memory size (.size attribute => /memory/target/size => mem->size). Long story short, my comment "label is also subject to alignment" is misleading, as it is not. The remaining memory (after label_size was substracted) is then aligned down. Consider the following squashed into my commit: diff --git i/src/conf/domain_validate.c w/src/conf/domain_validate.c index 5a9398b545..faa7659f07 100644 --- i/src/conf/domain_validate.c +++ w/src/conf/domain_validate.c @@ -2247,9 +2247,10 @@ virDomainMemoryGetMappedSize(const virDomainMemoryDef *mem) /* For NVDIMM the situation is a bit more complicated. Firstly, * its <label/> is not mapped as a part of memory device, so we - * must subtract label size from NVDIMM size. Secondly, label is - * also subject to alignment so we need to round it up before - * subtraction. */ + * must subtract label size from NVDIMM size. Secondly, + * remaining memory is then aligned again (rounded down). But + * for our purposes we might just round label size up and + * achieve the same (numeric) result. */ if (alignsize == 0) { long pagesize = virGetSystemPageSizeKB(); Nice catch! Michal _______________________________________________ Devel mailing list -- devel@xxxxxxxxxxxxxxxxx To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxx