Re: [PATCH v1 3/4] system/physmem: Largepage punch hole before reset of memory pages


 



On 10/23/24 09:30, David Hildenbrand wrote:

On 22.10.24 23:35, William Roche wrote:
From: William Roche <william.roche@xxxxxxxxxx>

When the VM reboots, a memory reset is performed calling
qemu_ram_remap() on all hwpoisoned pages.
While we take into account the recorded page sizes to repair the
memory locations, a large page also needs a hole punched in the
backend file to regenerate usable memory, cleaning the HW-poisoned
section. This is mandatory in the hugetlbfs case, for example.

Signed-off-by: William Roche <william.roche@xxxxxxxxxx>
---
  system/physmem.c | 8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/system/physmem.c b/system/physmem.c
index 3757428336..3f6024a92d 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2211,6 +2211,14 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
                  prot = PROT_READ;
                  prot |= block->flags & RAM_READONLY ? 0 : PROT_WRITE;
                  if (block->fd >= 0) {
+                    if (length > TARGET_PAGE_SIZE && fallocate(block->fd,
+                        FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+                        offset + block->fd_offset, length) != 0) {
+                        error_report("Could not recreate the file hole for "
+                                     "addr: " RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
+                                     length, addr);
+                        exit(1);
+                    }
                      area = mmap(vaddr, length, prot, flags, block->fd,
                                  offset + block->fd_offset);
                  } else {

Ah! Just what I commented to patch #3; we should be using ram_discard_range(). It might be better to avoid the mmap() completely if ram_discard_range() worked.


I think you are referring to ram_block_discard_range() here, as ram_discard_range() seems to relate to VM migration rather than VM reset.

Remapping the page is needed to get rid of the poison. So if we want to avoid the mmap(), we would have to shrink the memory address space, which could be a real problem for a VM with 1G large pages, for example. qemu_ram_remap() is used to regenerate the lost memory, and the mmap() call looks mandatory in the reset phase.



And as raised, there is the problem with memory preallocation (where we should fail if it doesn't work) and ram discards being disabled because something relies on long-term page pinning ...


Yes. Do you suggest that we add a call to qemu_prealloc_mem() for the remapped area when backend->prealloc is true?

Or, since this piece of code only runs on POSIX systems (ifndef _WIN32), maybe we could simply add the MAP_POPULATE flag to the mmap() call done in qemu_ram_remap() when the backend requires 'prealloc'? Can you confirm whether this flag is available on all systems running this code?

Unfortunately, I don't know how to get the memory backend corresponding to a given memory block. I'm not sure that MEMORY_BACKEND(block->mr) is a valid way to retrieve the backend object and its 'prealloc' property here. Could you please point me in the right direction?

I can send a new version using ram_block_discard_range(), as you suggested, to replace the direct call to fallocate(), if you think that would be better. Please let me know what other enhancements you'd like to see in this code change.
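For concreteness, the fallocate() call in the hunk above could then become something like the following. This is a sketch only, assuming ram_block_discard_range() takes the block plus the offset and length within it, with the error message shortened for illustration:

```c
/* Sketch: let ram_block_discard_range() do the hole punch (or madvise)
 * for us; offset and length are relative to the start of the block. */
if (ram_block_discard_range(block, offset, length) != 0) {
    error_report("Could not discard range for addr: " RAM_ADDR_FMT, addr);
    exit(1);
}
```

That would also keep the discard path consistent with the checks that function already performs.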

Thanks in advance,
William.




