Hi, Axel,

On Wed, May 17, 2023 at 03:28:36PM -0700, Axel Rasmussen wrote:
> I do plan a v2, if for no other reason than to update the
> documentation. Happy to add a cover letter with it as well.
>
> +Jiaqi back to CC, this is one piece of a larger memory poisoning /
> recovery design Jiaqi is working on, so he may have some ideas why
> MADV_HWPOISON or MADV_PGERR will or won't work.
>
> One idea is, at least for our use case, we have to have the range be
> userfaultfd registered, because we need to intercept the first access
> and check at that point whether or not it should be poisoned. But, I
> think in principle a scheme like this could work:
>
> 1. Intercept first access with UFFD
> 2. Issue MADV_HWPOISON or MADV_PGERR or etc to put a pte denoting the
>    poisoned page in place
> 3. UFFDIO_WAKE to have the faulting thread retry, see the new entry,
>    and SIGBUS
>
> It's arguably slightly weird, since normally UFFD events are resolved
> with UFFDIO_* operations, but I don't see why it *couldn't* work.
>
> Then again I am not super familiar with MADV_HWPOISON, I will have to
> do a bit of reading to understand if its semantics are the same
> (future accesses to this address get SIGBUS).

Yes, it would be great if this could be checked before sending v2. What
you described matches exactly what I had in mind. I hope it already
works; if not, we can always discuss what is missing.

--
Peter Xu