From: William Roche <william.roche@xxxxxxxxxx>

Hi David,

Here is a new version of our code. I still need to double check the mmap behavior when a memory error impacts:
- a clean page of an empty or populated file
- a page already mapped with MAP_SHARED or MAP_PRIVATE
to see whether mmap() can recover the area or not. But I wanted to provide this version to find out whether this is the kind of implementation you were expecting.

And here is a slightly updated description of the patch set:

---
This set of patches fixes several problems with hardware memory errors impacting hugetlbfs memory backed VMs.

When using hugetlbfs large pages, any large page location impacted by an HW memory error results in poisoning the entire page, suddenly making a large chunk of the VM memory unusable.

The main problem currently in QEMU is the lack of backend file repair before resetting the VM memory, so the impacted memory remains silently unusable even after a VM reboot.

To fix this issue, we take the page size of the impacted memory block into account when dealing with the associated poisoned page location. Using the page size information, we also try to regenerate the memory by calling ram_block_discard_range() on VM reset, when running qemu_ram_remap(). This way, poisoned memory backed by a hugetlbfs file is regenerated by punching a hole in that file; a new page is loaded when the location is first touched. If the discard fails, we fall back to unmapping and remapping the memory location and resetting the memory settings.

We also have to honor the 'prealloc' attribute even after a successful discard, so we reapply the memory settings in that case too. This is performed by a new remap notification mechanism calling host_memory_backend_ram_remapped() when a region of a memory block is remapped.
We also enrich the messages used to report a memory error relayed to the VM, identifying the memory page and its size when a large page is impacted.
----
v2 -> v3:
. Dropped the size parameter from qemu_ram_remap(); the page size is now determined when adding the page to the poison list, aligning the offset down to the page size. Multiple poisoned sub-pages of a large page lead to a single poison entry.
. Introduced a helper function for the mmap code.
. Added "on lost large page <size>@<ram_addr>" to the error injection message (notation also used in qemu_ram_remap()). So, only in the case of a large page, it looks like:
  qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7fc1f5dd6000 and GUEST addr 0x19fd6000 on lost large page 200000@19e00000 of type BUS_MCEERR_AR injected
. As we need the page_size value for the above message, I retrieve it in kvm_arch_on_sigbus_vcpu() and pass the appropriate pointer to kvm_hwpoison_page_add(), which doesn't need to align it anymore.
. Added a similar message for the ARM platform (removing the MCE keyword).
. Also introduced a "fail hard" in the remap notification: host_memory_backend_ram_remapped().

This code is scripts/checkpatch.pl clean.
'make check' runs fine on both x86 and Arm.
David Hildenbrand (3):
  numa: Introduce and use ram_block_notify_remap()
  hostmem: Factor out applying settings
  hostmem: Handle remapping of RAM

William Roche (4):
  hwpoison_page_list and qemu_ram_remap are based of pages
  system/physmem: poisoned memory discard on reboot
  accel/kvm: Report the loss of a large memory page
  system/physmem: Memory settings applied on remap notification

 accel/kvm/kvm-all.c       |   2 +-
 backends/hostmem.c        | 189 +++++++++++++++++++++++---------------
 hw/core/numa.c            |  11 +++
 include/exec/cpu-common.h |   3 +-
 include/exec/ramlist.h    |   3 +
 include/sysemu/hostmem.h  |   1 +
 system/physmem.c          |  90 +++++++++++++-----
 target/arm/kvm.c          |  13 +++
 target/i386/kvm/kvm.c     |  18 +++-
 9 files changed, 227 insertions(+), 103 deletions(-)

-- 
2.43.5