Re: [RFC RESEND 0/6] hugetlbfs largepage RAS project

David Hildenbrand <david@xxxxxxxxxx> · Tue, 10 Sep 2024 13:36:35 +0200

On 10.09.24 12:02, William Roche wrote:
> From: William Roche <william.roche@xxxxxxxxxx>
>
> Hi,
>
> Apologies for the noise; resending because I missed CC'ing the maintainers
> of the changed files.
>
> Hello,
>
> This is a QEMU RFC to introduce the possibility of dealing with hardware
> memory errors impacting VMs backed by hugetlbfs memory. When using
> hugetlbfs large pages, an HW memory error at any location within a large
> page results in poisoning the entire page, suddenly making a large chunk
> of the VM memory unusable.
>
> The implemented proposal is simply a memory mapping change when an HW
> error is reported to QEMU, transforming a hugetlbfs large page into a set
> of standard-sized pages. The failed large page is unmapped and a set of
> standard-sized pages is mapped in its place.
> This mechanism is triggered when QEMU receives a SIGBUS/MCE_MCEERR_Ax
> signal and the reported location corresponds to a large page.
>
> This gives the possibility to:
> - Take advantage of newer hypervisor kernels that provide a way to
>   retrieve the still-valid data from the impacted, poisoned hugetlbfs
>   large page.
>   If the backend file is MAP_SHARED, we can copy the valid data into the

How are you dealing with other consumers of the shared memory, such as
vhost-user processes, VM migration where RAM is migrated using the file
content, or VFIO, which might have these pages pinned?

In general, you cannot simply replace pages with private copies when
somebody else might be relying on these pages to go to actual guest RAM.

It sounds very hacky and incomplete at first glance.

--
Cheers,

David / dhildenb