Hi Tony,

On Thu, Oct 3, 2024 at 3:58 PM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
> > Are you suggesting you prefer the per-VMA policy, or proposing a new
> > "per-process policy" added via prctl? By "per-process", I imagine the
> > policy to keep or offline the poisoned page will apply to all its
> > VMAs?
>
> A "per-process policy" using prctl already exists. See prctl(PR_MCE_KILL).

The policy I want is not about "whether to send SIGBUS or not" or
"when to send SIGBUS"; it is about whether to offline the error
[huge]page or keep it accessible to the process.

> Currently used to choose whether to eagerly send SIGBUS to a process
> when a memory error is discovered asynchronously by a h/w patrol scrubber.
>
> What is the use case for a per-VMA policy? Do you have some application
> that would like to use this?

Our main use case is the virtual machine monitor (VMM) and the VM. The
VMM can track the *guest* physical addresses that are affected by the
*host* physical addresses having errors. We'd like the VM to be able to
continue loading guest data from the error [huge]page. Loading the clean
portion should just work; loading the poisoned portion will be
intercepted by KVM + VMM without going down to kernel / firmware /
hardware.

>
> -Tony
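
For reference, here is a minimal sketch of how the existing prctl(PR_MCE_KILL) knob mentioned above is typically used from userspace. It only selects early vs. late SIGBUS delivery for the calling thread; it does not control whether the poisoned [huge]page is offlined, which is the policy being discussed in this thread. The helper names (mce_policy_name, set_mce_kill_policy, get_mce_kill_policy, demo_early_kill) are hypothetical and exist only for illustration.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Hypothetical helper: map a PR_MCE_KILL_GET result to a readable name. */
static const char *mce_policy_name(int policy)
{
    switch (policy) {
    case PR_MCE_KILL_EARLY:   return "early";
    case PR_MCE_KILL_LATE:    return "late";
    case PR_MCE_KILL_DEFAULT: return "default";
    default:                  return "unknown";
    }
}

/*
 * Hypothetical helper: set the memory-corruption kill policy for the
 * calling thread. Returns 0 on success, -1 with errno set on failure.
 * Unused prctl arguments must be zero, so pass them as unsigned long.
 */
static int set_mce_kill_policy(int policy)
{
    return prctl(PR_MCE_KILL, (unsigned long)PR_MCE_KILL_SET,
                 (unsigned long)policy, 0UL, 0UL);
}

/* Hypothetical helper: read back the current policy, or -1 on error. */
static int get_mce_kill_policy(void)
{
    return prctl(PR_MCE_KILL_GET, 0UL, 0UL, 0UL, 0UL);
}

/*
 * Hypothetical demo: ask for eager SIGBUS delivery, then print what the
 * kernel reports. This changes *when* SIGBUS is sent, not whether the
 * error page stays mapped.
 */
static int demo_early_kill(void)
{
    if (set_mce_kill_policy(PR_MCE_KILL_EARLY) != 0) {
        perror("prctl(PR_MCE_KILL)");
        return -1;
    }

    int policy = get_mce_kill_policy();
    if (policy < 0) {
        perror("prctl(PR_MCE_KILL_GET)");
        return -1;
    }

    printf("MCE kill policy: %s\n", mce_policy_name(policy));
    return policy;
}
```

Calling demo_early_kill() from main() early in a process is enough to try this out. Nothing in this sketch expresses a "keep the page accessible" preference, which is why a separate (per-VMA or per-process) policy is being proposed.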