RE: [RFC PATCH v1 1/2] mm/memory-failure: introduce global MFR policy

"Luck, Tony" <tony.luck@xxxxxxxxx> · Fri, 11 Oct 2024 19:44:34 +0000

> Something like by way of userfaultfd, the kernel provides a new/clean
> hugetlb page, copies over good data from the clean subpages and then
> presents the clean hugetlb page to the user process with an indication
> that subpage x is a substitute for the poisoned old subpage x, hence its
> data might need a refill?  I am not sure how exactly to pull this through,
> as the event is not a page fault, but I'm just wondering whether something
> like this is possible.

This requires serious levels of sophistication from the application.
If some thread still accesses the "lost" data, there's no signal that
anything went wrong. It just reads whatever data the kernel filled the
poisoned area with. For some applications there might be a data
pattern that would help track this down, but there is no general answer.

On the plus side, the amount of "lost" data need not be a page.
On Intel, the poison unit is a cache line (64 bytes). So more of the
original data can potentially be preserved. This might be useful
for applications using regular pages as well as those using huge pages.
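To put rough numbers on that, here is an illustrative Python sketch (not kernel code) comparing how much of a page survives when a single poisoned 64-byte line is discarded versus when the whole page is discarded. It assumes one poisoned line, 4 KiB and 2 MiB page sizes, and non-overlapping poisoned units; the helper name is made up for the example.

```python
CACHE_LINE = 64                  # poison granularity on Intel, per the text above
PAGE_4K = 4 * 1024
HUGE_2M = 2 * 1024 * 1024


def surviving_fraction(page_size: int, discard_unit: int, poisoned_units: int = 1) -> float:
    """Fraction of a page's data kept if recovery discards `poisoned_units`
    chunks of `discard_unit` bytes (assumed not to overlap)."""
    lost = min(page_size, discard_unit * poisoned_units)
    return (page_size - lost) / page_size


if __name__ == "__main__":
    for name, size in (("4 KiB page", PAGE_4K), ("2 MiB huge page", HUGE_2M)):
        whole = surviving_fraction(size, size)        # page-granular: whole page discarded
        line = surviving_fraction(size, CACHE_LINE)   # line-granular: only the bad line discarded
        print(f"{name}: drop page keeps {whole:.4%}, drop line keeps {line:.4%}")
```

With line-granular accounting even a 4 KiB page keeps about 98% of its contents, and the benefit grows with huge pages, which is why this could matter for both cases.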

When Linux first implemented recovery, we had hopes that applications
like databases would be able to implement their own recovery. Losing
a whole page turned out to be problematic because in some implementations
the metadata for a database entry was stored at the start of the memory
block. So the SIGBUS would provide the virtual address, but that was of
no practical use in determining which data structure(s) were affected
without a massive restructuring of the code to separate metadata
from data.
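A rough sketch of that problem: sigaction(2) documents that for BUS_MCEERR_AR/BUS_MCEERR_AO the siginfo_t fields si_addr and si_addr_lsb give the faulting address and the extent of the corruption (si_addr_lsb is log2 of the page size when a full page is lost). The Python below maps that range onto a hypothetical array of fixed-size records whose metadata header sits at the start of each record. The layout, sizes and function names are assumptions made up for illustration, not any real database's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Damage:
    record: int        # index of a record overlapping the poisoned range
    header_lost: bool  # True if the poison also covers the record's metadata


def poisoned_range(si_addr: int, si_addr_lsb: int) -> tuple[int, int]:
    """Half-open [start, end) range implied by si_addr and si_addr_lsb."""
    size = 1 << si_addr_lsb
    start = si_addr & ~(size - 1)  # low si_addr_lsb bits are not significant
    return start, start + size


def damaged_records(si_addr, si_addr_lsb, base, record_size, header_size, n_records):
    """List records in a hypothetical array at `base` touched by the poison."""
    start, end = poisoned_range(si_addr, si_addr_lsb)
    # Clip the poisoned range to the record array.
    start = max(start, base)
    end = min(end, base + n_records * record_size)
    if start >= end:
        return []
    first = (start - base) // record_size
    last = (end - 1 - base) // record_size
    damage = []
    for i in range(first, last + 1):
        rec_start = base + i * record_size
        hdr_end = rec_start + header_size
        # Does [rec_start, hdr_end) overlap [start, end)?
        damage.append(Damage(i, start < hdr_end and end > rec_start))
    return damage


if __name__ == "__main__":
    base = 0x7F0000000000        # page-aligned start of the record array
    fault = base + 0x500         # poison lands in the payload of record 1
    for lsb in (12, 6):          # whole 4 KiB page vs. one 64-byte line
        print(lsb, damaged_records(fault, lsb, base, 1024, 64, 8))
```

With page-granular reporting (si_addr_lsb = 12) every record in the page, including all of their headers, is reported damaged. With a 64-byte range only the payload of record 1 is hit and its metadata survives.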

-Tony