[PATCH v7 0/6] Poisoned memory recovery on reboot

“William Roche <william.roche@xxxxxxxxxx> · Sat, 1 Feb 2025 09:57:20 +0000

From: William Roche <william.roche@xxxxxxxxxx>

Hello David,

Here is the version with the small nits corrected.
And the 'Acked-by' entries you gave me for patch 1 and 2.

 ---
This set of patches fixes several problems with hardware memory errors
impacting hugetlbfs memory backed VMs and the generic memory recovery
on VM reset.
When using hugetlbfs large pages, any large page location being impacted
by an HW memory error results in poisoning the entire page, suddenly
making a large chunk of the VM memory unusable.

The main problem that currently exists in Qemu is the lack of backend
file repair before resetting the VM memory, resulting in the impacted
memory to be silently unusable even after a VM reboot.

In order to fix this issue, we take into account the page size of the
impacted memory block when dealing with the associated poisoned page
location.

Using the page size information we also try to regenerate the memory
calling ram_block_discard_range() on VM reset when running
qemu_ram_remap(). So that a poisoned memory backed by a hugetlbfs
file is regenerated with a hole punched in this file. A new page is
loaded when the location is first touched.

In case of a discard failure we fall back to remapping the memory
location. We also have to reset the memory settings and honor the
'prealloc' attribute.

This memory setting is performed by a new remap notification mechanism
calling host_memory_backend_ram_remapped() function when a region of
a memory block is remapped.

We also enrich the messages used to report a memory error relayed to
the VM, providing an identification of memory page and its size in
case of a large page impacted.
 ----

v1 -> v2:
. I removed the kernel SIGBUS siginfo provided lsb size information
  tracking. Only relying on the RAMBlock page_size instead.
. I adapted the 3 patches you indicated me to implement the
  notification mechanism on remap.  Thank you for this code!
  I left them as Authored by you.
  But I haven't tested if the policy setting works as expected on VM
  reset, only that the replacement of physical memory works.
. I also removed the old memory setting that was kept in qemu_ram_remap()
  but this small last fix could probably be merged with your last commit.

v2 -> v3:
. dropped the size parameter from qemu_ram_remap() and determine the page
  size when adding it to the poison list, aligning the offset down to the
  pagesize. Multiple sub-pages poisoned on a large page lead to a single
  poison entry.
. introduction of a helper function for the mmap code
. adding "on lost large page <size>@<ram_addr>" to the error injection
  msg (notation used in qemu_ram_remap() too ).
  So only in the case of a large page, it looks like:
Guest MCE Memory Error at QEMU addr 0x7fc1f5dd6000 and GUEST addr 0x19fd6000 on lost large page 200000@19e00000 of type BUS_MCEERR_AR injected
. as we need the page_size value for the above message, I retrieve the
  value in kvm_arch_on_sigbus_vcpu() to pass the appropriate pointer
  to kvm_hwpoison_page_add() that doesn't need to align it anymore.
. added a similar message for the ARM platform (removing the MCE
  keyword)
. I also introduced a "fail hard" in the remap notification:
  host_memory_backend_ram_remapped()

v3 -> v4:
. Fixed some commit messages typos
. Enhanced some code comments
. Changed the discard fall back conditions to consider only anonymous
  memory
. Fixed missing some variable name changes in intermediary patches.
. Modify the error message given when an error is injected to report
  the case of a large page
. use snprintf() to generate this message
. Adding this same type of message in the ARM case too

v4->v5:
. Updated commit messages (for patches 1, 5 and 6)
. Fixed comment typo of patch 2
. Changed the fall back function parameters to match the
  ram_block_discard_range() function.
. Removed the unused case of remapping a file in this function
. add the assert(block->fd < 0) in this function too
. I merged my patch 7 with your patch 6 (we only have 6 patches now)

v5->v6:
. don't align down ram_addr on kvm_hwpoison_page_add() but create
  a new entry for each subpage reported as poisoned
. introduce similar messages about memory error as discard_range()
. introduce a function to retrieve more information about a RAMBlock
  experiencing an error than just its associated page size
. file offset as an uint64_t instead of a ram_addr_t
. changed ownership of patch 6/6

v6->v7:
. change the block location information collection function name to
  qemu_ram_block_info_from_addr()
. display the fd_offset value only when dealing with a file backend
  in kvm_hwpoison_page_add() and qemu_ram_remap()
. better placed offset alignment computation
. two empty separation lines missing

This code is scripts/checkpatch.pl clean
'make check' runs clean on both x86 and ARM.

David Hildenbrand (2):
  numa: Introduce and use ram_block_notify_remap()
  hostmem: Factor out applying settings

William Roche (4):
  system/physmem: handle hugetlb correctly in qemu_ram_remap()
  system/physmem: poisoned memory discard on reboot
  accel/kvm: Report the loss of a large memory page
  hostmem: Handle remapping of RAM

 accel/kvm/kvm-all.c       |  20 +++-
 backends/hostmem.c        | 189 +++++++++++++++++++++++---------------
 hw/core/numa.c            |  11 +++
 include/exec/cpu-common.h |  12 ++-
 include/exec/ramlist.h    |   3 +
 include/system/hostmem.h  |   1 +
 system/physmem.c          | 107 +++++++++++++++------
 target/arm/kvm.c          |   3 +
 8 files changed, 244 insertions(+), 102 deletions(-)

-- 
2.43.5