Re: [PATCH v2 2/7] system/physmem: poisoned memory discard on reboot

On 11/12/24 12:07, David Hildenbrand wrote:
On 07.11.24 11:21, William Roche wrote:
From: William Roche <william.roche@xxxxxxxxxx>

We take into account the recorded page sizes to repair the
memory locations, calling ram_block_discard_range() to punch a hole
in the backend file when necessary and regenerate usable memory.
Fall back to unmapping/remapping the memory location(s) if the kernel
doesn't support the madvise calls used by ram_block_discard_range().
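As a standalone illustration of the madvise() path in its simplest case, private anonymous memory, here is a minimal sketch. It assumes Linux and that MADV_DONTNEED is the call that applies to this case; it is not ram_block_discard_range() itself, and anon_discard_gives_fresh_page() is a hypothetical name. After the discard, the next access faults in a fresh zero-filled page, which is what makes the location usable again.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: dirty a private anonymous page, discard it with
 * MADV_DONTNEED (assumed here to stand in for the madvise() discard),
 * and check that the next access sees a zero-filled page.
 * Returns 1 if the page reads back as zero, 0 if not, -1 on error.
 */
static int anon_discard_gives_fresh_page(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    memset(p, 0xAA, len);                 /* dirty the page */

    int ret = -1;
    if (madvise(p, len, MADV_DONTNEED) == 0) {
        /* The old page is dropped; touching it faults in a zero page. */
        ret = 1;
        for (size_t i = 0; i < len; i++) {
            if (p[i] != 0) {
                ret = 0;
                break;
            }
        }
    }
    munmap(p, len);
    return ret;
}
```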

The hugetlbfs poison case is also handled, as a hole punched with
fallocate() causes a new page to be loaded when it is first touched.
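The hole-punch behaviour can be sketched with fallocate(FALLOC_FL_PUNCH_HOLE): once the backing page is freed, the next touch faults in a new one. This is a minimal sketch assuming Linux and glibc 2.27 or later (for memfd_create()); it uses a shmem-backed memfd instead of a hugetlbfs file so that it runs without reserved huge pages, and punch_hole_gives_fresh_page() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: punch a hole in a shared, file-backed mapping and
 * check that the next access sees a freshly allocated (zeroed) page.
 * Returns 1 on success, 0 if stale data is still visible, -1 on error.
 */
static int punch_hole_gives_fresh_page(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int fd = memfd_create("punch-demo", 0);   /* shmem, not hugetlbfs */
    if (fd < 0) {
        return -1;
    }
    int ret = -1;
    unsigned char *p = MAP_FAILED;
    if (ftruncate(fd, (off_t)len) != 0) {
        goto out;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        goto out;
    }
    memset(p, 0xAA, len);                     /* populate the page */

    /* Free the backing page while keeping the file size unchanged. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, (off_t)len) != 0) {
        goto out;
    }
    ret = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {                      /* first touch: new page */
            ret = 0;
            break;
        }
    }
out:
    if (p != MAP_FAILED) {
        munmap(p, len);
    }
    close(fd);
    return ret;
}
```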

Signed-off-by: William Roche <william.roche@xxxxxxxxxx>
---
  system/physmem.c | 50 +++++++++++++++++++++++++++++-------------------
  1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index 750604d47d..dfea120cc5 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2197,27 +2197,37 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
              } else if (xen_enabled()) {
                  abort();
              } else {
-                flags = MAP_FIXED;
-                flags |= block->flags & RAM_SHARED ?
-                         MAP_SHARED : MAP_PRIVATE;
-                flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
-                prot = PROT_READ;
-                prot |= block->flags & RAM_READONLY ? 0 : PROT_WRITE;
-                if (block->fd >= 0) {
-                    area = mmap(vaddr, length, prot, flags, block->fd,
-                                offset + block->fd_offset);
-                } else {
-                    flags |= MAP_ANONYMOUS;
-                    area = mmap(vaddr, length, prot, flags, -1, 0);
-                }
-                if (area != vaddr) {
-                    error_report("Could not remap addr: "
-                                 RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
-                                 length, addr);
-                    exit(1);
+                if (ram_block_discard_range(block, offset + block->fd_offset,
+                                            length) != 0) {
+                    if (length > TARGET_PAGE_SIZE) {
+                        /* punch hole is mandatory on hugetlbfs */
+                        error_report("large page recovery failure addr: "
+                                     RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
+                                     length, addr);
+                        exit(1);
+                    }

For shared memory we really need it.

Private file-backed is weird ... because we don't know if the shared or the private page is problematic ... :(


I agree with you, and we have to decide when we should bail out if ram_block_discard_range() fails. In my view, if the discard fails and we are dealing with file-backed large pages (shared or not), we have to exit, because the fallocate() hole punch is mandatory. This is the case with hugetlbfs.

In the non-file-backed case, or the private file-backed case with standard pages, I think we can rely on mmap() to put everything back in place so that the VM reset works as expected. Am I missing any case where mmap() plus the remap handler is not sufficient and we should also bail out here?




Maybe we should just do:

if (block->fd >= 0) {
    /* mmap(MAP_FIXED) cannot reliably zap our problematic page. */
    error_report(...);
    exit(-1);
}

Or alternatively

if (block->fd >= 0 && qemu_ram_is_shared(block)) {
    /* mmap() cannot possibly zap our problematic page. */
    error_report(...);
    exit(-1);
} else if (block->fd >= 0) {
    /*
     * MAP_PRIVATE file-backed ... mmap() can only zap the private
     * page, not the shared one ... we don't know which one is
     * problematic.
     */
    warn_report(...);
}

I also agree that any file-backed shared case should bail out if the discard (fallocate) fails, no matter whether large or standard pages are used.

For file-backed private standard pages, I think that poison on the private page can be fixed with a new mmap(). As I see it, there are two cases to consider at the moment the poison is detected: either the page was dirty (meaning it was a purely private page), or it was not dirty, in which case the poison handling could replace this non-dirty page with a new copy of the file content.
In both cases, I'd say that the remap should clean up the poison.
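The dirty-page case can be checked with a small sketch that remaps a MAP_PRIVATE file mapping with MAP_FIXED, the same way the mmap() fallback in the patch does. It assumes Linux with memfd_create() (glibc 2.27 or later), and private_remap_first_byte() is a hypothetical name. It only covers a dirty private copy; it cannot show what happens when the poison sits in the page-cache page itself, which is the ambiguity raised above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: a written page in a MAP_PRIVATE file mapping is a
 * private copy-on-write copy. Re-mmap()ing the same range with MAP_FIXED
 * drops that copy, and the next read sees the file content again.
 * Returns the first byte read after the remap, or -1 on error.
 */
static int private_remap_first_byte(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int fd = memfd_create("private-demo", 0);
    if (fd < 0) {
        return -1;
    }
    int ret = -1;
    unsigned char *p = MAP_FAILED;
    if (ftruncate(fd, (off_t)len) != 0 || pwrite(fd, "F", 1, 0) != 1) {
        goto out;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        goto out;
    }
    p[0] = 'P';        /* copy-on-write: only the private copy changes */

    /* Replace the mapping in place with a fresh MAP_PRIVATE mapping. */
    void *area = mmap(p, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_FIXED, fd, 0);
    if (area != p) {
        goto out;
    }
    ret = p[0];        /* 'F' again: the page comes from the file */
out:
    if (p != MAP_FAILED) {
        munmap(p, len);
    }
    close(fd);
    return ret;
}
```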

So the condition to check when the discard fails could be something like:

    if (block->fd >= 0 && (qemu_ram_is_shared(block) ||
        (length > TARGET_PAGE_SIZE))) {
        /* punch hole is mandatory, mmap() cannot possibly zap our page */
        error_report("%spage recovery failure addr: "
                     RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
                     (length > TARGET_PAGE_SIZE) ? "large " : "",
                     length, addr);
        exit(1);
    }
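The proposed rule can also be written as a pure function, which makes the cases easy to enumerate. This is only a sketch: RecoveryAction, recovery_after_failed_discard() and its parameters are hypothetical stand-ins for block->fd, qemu_ram_is_shared(block), length and TARGET_PAGE_SIZE, and the function is meant to be consulted only after ram_block_discard_range() has failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome once ram_block_discard_range() has failed. */
typedef enum {
    RECOVER_BY_REMAP,   /* fall back to mmap(MAP_FIXED) */
    RECOVER_FATAL,      /* a hole punch was required: bail out */
} RecoveryAction;

/*
 * Sketch of the proposed rule: file-backed memory that is shared, or
 * backed by pages larger than the target page size, cannot be repaired
 * by mmap(); everything else falls back to the remap.
 */
static RecoveryAction recovery_after_failed_discard(int fd, bool shared,
                                                    size_t length,
                                                    size_t target_page_size)
{
    if (fd >= 0 && (shared || length > target_page_size)) {
        return RECOVER_FATAL;
    }
    return RECOVER_BY_REMAP;
}
```

With a 4 KiB target page, a shared file-backed block or a 2 MiB file-backed page is fatal, while a private file-backed 4 KiB page or any anonymous block falls back to the remap.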


+                    flags = MAP_FIXED;
+                    flags |= block->flags & RAM_SHARED ?
+                             MAP_SHARED : MAP_PRIVATE;
+                    flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
+                    prot = PROT_READ;
+                    prot |= block->flags & RAM_READONLY ? 0 : PROT_WRITE;
+                    if (block->fd >= 0) {
+                        area = mmap(vaddr, length, prot, flags, block->fd,
+                                    offset + block->fd_offset);
+                    } else {
+                        flags |= MAP_ANONYMOUS;
+                        area = mmap(vaddr, length, prot, flags, -1, 0);
+                    }
+                    if (area != vaddr) {
+                        error_report("Could not remap addr: "
+                                     RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
+                                     length, addr);
+                        exit(1);
+                    }
+                    memory_try_enable_merging(vaddr, length);
+                    qemu_ram_setup_dump(vaddr, length);

Can we factor the mmap hack out into a separate helper function to clean this up a bit?

Sure, I'll do that.
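One possible shape for that helper, as a hedged sketch: remap_fixed() is a hypothetical name, and it takes the RAM block attributes as plain booleans (standing in for RAM_SHARED, RAM_NORESERVE and RAM_READONLY) so that it compiles outside QEMU. It reproduces the flag and protection logic from the hunk above and reports whether the new mapping landed at vaddr.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace [vaddr, vaddr + length) with a fresh
 * mapping that has the same attributes as the original RAM block.
 * Returns true if the new mapping landed exactly at vaddr.
 */
static bool remap_fixed(void *vaddr, size_t length, bool shared,
                        bool noreserve, bool readonly, int fd, off_t offset)
{
    int flags = MAP_FIXED;
    int prot = PROT_READ;

    flags |= shared ? MAP_SHARED : MAP_PRIVATE;
    flags |= noreserve ? MAP_NORESERVE : 0;
    prot |= readonly ? 0 : PROT_WRITE;

    void *area;
    if (fd >= 0) {
        area = mmap(vaddr, length, prot, flags, fd, offset);
    } else {
        flags |= MAP_ANONYMOUS;
        area = mmap(vaddr, length, prot, flags, -1, 0);
    }
    return area == vaddr;
}
```

The caller in qemu_ram_remap() would then keep the error_report()/exit(1) on failure and the memory_try_enable_merging()/qemu_ram_setup_dump() calls on success.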




