On Thu, Oct 3, 2024 at 4:20 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Thu, Oct 03, 2024 at 03:45:09PM -0700, Jiaqi Yan wrote:
> > Hi Jason,
> >
> > On Wed, Oct 2, 2024 at 8:02 AM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Sep 24, 2024 at 04:39:18AM +0000, Jiaqi Yan wrote:
> > >
> > > > So far I personally prefer the global MFR policy but open to feedbacks to both
> > > > options, or new ideas.
> > >
> > > Why? It seems more natural that only processe that can handle the
> > > SIGBUS semantics would opt into them?
> >
> > Are you suggesting you prefer the per-VMA policy, or proposing a new
> > "per-process policy" added via prctl? By "per-process", I imagine the
> > policy to keep or offline the poisoned page will apply to all its
> > VMAs?
>
> I'm just asking why you "personally prefer" as the direction seems a
> bit awkward

I assume the "awkward" comes from the concern about what userspace will
do if the kernel is configured to keep poisoned pages.

Admittedly, this direction has the highest return on investment for me,
as we already have memory failure recovery and repair in userspace that
work well with poisoned pages not being offlined until the hardware is
repaired. But I don't assume that is also the case for everyone else,
so I also want to propose alternatives (limited to just a VMA, or to
memory owned by a process, and limited to their lifetime) that I hope
will work for more people.

>
> Jason