On Tue, Nov 15, 2022 at 04:14:42PM +0100, Vlastimil Babka wrote:
> Cc'ing memory failure folks, the beginning of this subthread is here:
>
> https://lore.kernel.org/all/3a51840f6a80c87b39632dc728dbd9b5dd444cd7.1655761627.git.ashish.kalra@xxxxxxx/
>
> On 11/15/22 00:36, Kalra, Ashish wrote:
> > Hello Boris,
> >
> > On 11/2/2022 6:22 AM, Borislav Petkov wrote:
> >> On Mon, Oct 31, 2022 at 04:58:38PM -0500, Kalra, Ashish wrote:
> >>>     if (snp_lookup_rmpentry(pfn, &rmp_level)) {
> >>>            do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
> >>>            return RMP_PF_RETRY;
> >>
> >> Does this issue some halfway understandable error message why the
> >> process got killed?
> >>
> >>> Will look at adding our own recovery function for the same, but that will
> >>> again mark the pages as poisoned, right ?
> >>
> >> Well, not poisoned but PG_offlimits or whatever the mm folks agree upon.
> >> Semantically, it'll be handled the same way, ofc.
> >
> > Added a new PG_offlimits flag and a simple corresponding handler for it.
>
> One thing is, there's not enough page flags to be adding more (except
> aliases for existing) for cases that can avoid it, but as Boris says, if
> using alias to PG_hwpoison it depends what will become confused with the
> actual hwpoison.

I agree with this. Just defining PG_offlimits as an alias of PG_hwpoison
could break current hwpoison workloads. So if you decide to go forward
in this direction, you may as well add some indicator to distinguish the
new kind of leaked pages from hwpoisoned pages.

I don't remember the exact thread, but I've read a similar suggestion from
someone about using memory_failure() to make pages inaccessible in
non-memory-error use cases. I feel that it could be possible to generalize
memory_failure() into general-purpose page offlining (by renaming it to
hard_offline_page() and making memory_failure() one of its users).

Thanks,
Naoya Horiguchi
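
For illustration only, the generalization suggested above might look
roughly like the user-space C sketch below. This is not kernel code and
not a proposed patch: struct toy_page, enum offline_reason and every
toy_* function are hypothetical names made up for this example. Only the
idea comes from the message: one generic hard-offline entry point, with
the memory-error path as one of its callers, plus an indicator that keeps
leaked pages distinguishable from hwpoisoned ones.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: why a page was taken out of service. */
enum offline_reason {
	OFFLINE_NONE = 0,
	OFFLINE_HWPOISON,	/* genuine memory error */
	OFFLINE_LEAKED,		/* e.g. a page that could not be reclaimed */
};

/* Toy stand-in for struct page; not the kernel's layout. */
struct toy_page {
	unsigned long pfn;
	bool offlined;			/* models the one shared page flag */
	enum offline_reason reason;	/* the extra indicator */
};

/* Generic entry point: take the page out of service and record why. */
int toy_hard_offline_page(struct toy_page *page, enum offline_reason reason)
{
	assert(page != NULL);

	if (reason == OFFLINE_NONE)
		return -EINVAL;
	if (page->offlined)
		return -EBUSY;	/* already offlined; keep the first reason */

	page->offlined = true;
	page->reason = reason;
	return 0;
}

/* The memory-error path becomes just one user of the generic helper. */
int toy_memory_failure(struct toy_page *page)
{
	return toy_hard_offline_page(page, OFFLINE_HWPOISON);
}

/* hwpoison-specific code checks the reason, not only the flag. */
bool toy_page_hwpoisoned(const struct toy_page *page)
{
	return page->offlined && page->reason == OFFLINE_HWPOISON;
}
```

In this model a page offlined with OFFLINE_LEAKED has the shared flag set
but is not reported by toy_page_hwpoisoned(), which is the distinction
existing hwpoison users would need. Where such an indicator could actually
live in the kernel, given the shortage of page flags, is left open here.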