On Wed, Feb 21, 2024 at 05:40:05PM +0100, Michal Prívozník wrote:
On 2/20/24 17:29, Martin Kletzander wrote:On Tue, Feb 20, 2024 at 04:53:29PM +0100, Michal Privoznik wrote:As of v9.8.0-rc1~7 we check whether two <memory/> devices don't overlap (since we allow setting where a <memory/> device should be mapped to). We do this pretty straightforward, by comparing start and end address of each <memory/> device combination. But since only the start address is given (an exposed in the XML), the end address is computed trivially as: start + mem->size * 1024 And for majority of memory device types this works. Except for NVDIMMs. For them the <memory/> device consists of two separate regions: 1) actual memory device, and 2) label. Label is where NVDIMM stores some additional information like namespaces partition and so on. But it's not mapped into the guest the same way as actual memory device. In fact, mem->size is a sum of both actual memory device and label sizes. And to make things a bit worse, both sizes are subject to alignment (either the alignsize value specified in XML, or system page size if not specified in XML). Therefore, to get the size of actual memory device we need to take mem->size and substract label size rounded up to alignment. If we don't do this we report there's an overlap between two NVDIMMs even when in reality there's none. Fixes: 3fd64fb0e236fc80ffa2cc977c0d471f11fc39bf Fixes: 91f9a9fb4fc0d34ed8d7a869de3d9f87687c3618 Resolves: https://issues.redhat.com/browse/RHEL-4452?focusedId=23805174#comment-23805174 Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> --- src/conf/domain_validate.c | 50 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 48 insertions(+), 2 deletions(-) diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index 46479f10f2..5a9398b545 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c @@ -2225,6 +2225,52 @@ virDomainHostdevDefValidate(const virDomainHostdevDef *hostdev) } +/** + * virDomainMemoryGetMappedSize: + * @mem: memory device definition + * + * For given memory device definition (@mem) calculate size mapped into + * the guest. This is usually mem->size, except for NVDIMM where its + * label is mapped elsewhere. + * + * Returns: Number of bytes a memory device takes when mapped into a + * guest. + */ +static unsigned long long +virDomainMemoryGetMappedSize(const virDomainMemoryDef *mem) +{ + unsigned long long ret = mem->size; + + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { + unsigned long long alignsize = mem->source.nvdimm.alignsize; + unsigned long long labelsize = 0; + + /* For NVDIMM the situation is a bit more complicated. Firstly, + * its <label/> is not mapped as a part of memory device, so we + * must subtract label size from NVDIMM size. Secondly, label is + * also subject to alignment so we need to round it up before + * subtraction. */ + + if (alignsize == 0) { + long pagesize = virGetSystemPageSizeKB(); + + /* If no alignment is specified in the XML, fallback to + * system page size alignment. */ + if (pagesize > 0) + alignsize = pagesize;I'm not very well versed in memory modules and architectures, but isn't this restricted a bit more, or rather isn't the QEMU default slightly different? The following two functions suggest it might be: qemuDomainGetMemorySizeAlignment qemuDomainGetMemoryModuleSizeAlignmentLooking into QEMU sources, this is independent. I'm looking at nvdimm_prepare_memory_region() declared at hw/mem/nvdimm.c in which I can find: if (!dimm->hostmem) { error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property must be set"); return; } mr = host_memory_backend_get_memory(dimm->hostmem); align = memory_region_get_alignment(mr); size = memory_region_size(mr); pmem_size = size - nvdimm->label_size; nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size; pmem_size = QEMU_ALIGN_DOWN(pmem_size, align); In here, dimm->hostmem points to corresponding memory-backend-file object with NVDIMM path (/memory/source/path in our XML). IOW, any huge pages setting made to domain is independent of this. Then, memory_region_get_alignment() gets alignment of that memory region, which (looking at file_backend_memory_alloc()) corresponds to .align attribute of the aforementioned memory-backend-file object => it corresponds to /memory/source/alignsize in our XML or mem->source.nvdimm.alignsize in our code. Finally, size reflects memory size (.size attribute => /memory/target/size => mem->size).
I was more afraid about the QEMU vs. libvirt default of the alignment if left unspecified. But since that would only apply to niche scenarios and this patch makes the code strictly better, so with the below diff squashed in Reviewed-by: Martin Kletzander <mkletzan@xxxxxxxxxx>
Long story short, my comment "label is also subject to alignment" is misleading, as it is not. The remaining memory (after label_size was substracted) is then aligned down. Consider the following squashed into my commit: diff --git i/src/conf/domain_validate.c w/src/conf/domain_validate.c index 5a9398b545..faa7659f07 100644 --- i/src/conf/domain_validate.c +++ w/src/conf/domain_validate.c @@ -2247,9 +2247,10 @@ virDomainMemoryGetMappedSize(const virDomainMemoryDef *mem) /* For NVDIMM the situation is a bit more complicated. Firstly, * its <label/> is not mapped as a part of memory device, so we - * must subtract label size from NVDIMM size. Secondly, label is - * also subject to alignment so we need to round it up before - * subtraction. */ + * must subtract label size from NVDIMM size. Secondly, + * remaining memory is then aligned again (rounded down). But + * for our purposes we might just round label size up and + * achieve the same (numeric) result. */ if (alignsize == 0) { long pagesize = virGetSystemPageSizeKB(); Nice catch! Michal
Attachment:
signature.asc
Description: PGP signature
_______________________________________________ Devel mailing list -- devel@xxxxxxxxxxxxxxxxx To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxx