On Wed, Nov 16, 2022 at 04:28:11AM -0600, Kalra, Ashish wrote:
> On 11/15/2022 11:19 PM, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Tue, Nov 15, 2022 at 04:14:42PM +0100, Vlastimil Babka wrote:
> > > Cc'ing memory failure folks, the beginning of this subthread is here:
> > >
> > > https://lore.kernel.org/all/3a51840f6a80c87b39632dc728dbd9b5dd444cd7.1655761627.git.ashish.kalra@amd.com/
> > >
> > > On 11/15/22 00:36, Kalra, Ashish wrote:
> > > > Hello Boris,
> > > >
> > > > On 11/2/2022 6:22 AM, Borislav Petkov wrote:
> > > > > On Mon, Oct 31, 2022 at 04:58:38PM -0500, Kalra, Ashish wrote:
> > > > > >     if (snp_lookup_rmpentry(pfn, &rmp_level)) {
> > > > > >         do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
> > > > > >         return RMP_PF_RETRY;
> > > > >
> > > > > Does this issue some halfway understandable error message why the
> > > > > process got killed?
> > > > >
> > > > > > Will look at adding our own recovery function for the same, but that will
> > > > > > again mark the pages as poisoned, right?
> > > > >
> > > > > Well, not poisoned but PG_offlimits or whatever the mm folks agree upon.
> > > > > Semantically, it'll be handled the same way, ofc.
> > > >
> > > > Added a new PG_offlimits flag and a simple corresponding handler for it.
> > >
> > > One thing is, there are not enough page flags to be adding more (except
> > > aliases for existing ones) for cases that can avoid it, but as Boris says,
> > > if using an alias of PG_hwpoison, it depends on what will become confused
> > > with actual hwpoison.
> >
> > I agree with this.
> > Just defining PG_offlimits as an alias of PG_hwpoison could break
> > current hwpoison workloads. So if you finally decide to go forward in
> > this direction, you may as well have some indicator to distinguish the
> > new kind of leaked pages from hwpoisoned pages.
> >
> > I don't remember the exact thread, but I've read someone writing about a
> > similar kind of suggestion of using memory_failure() to make pages
> > inaccessible in non-memory-error use cases. I feel that it could be
> > possible to generalize memory_failure() as general-purpose page
> > offlining (by renaming it with
>
> But, doesn't memory_failure() also mark the pages as PG_hwpoison, and then
> using it for these leaked pages will again cause confusion with actual
> hwpoison?

Yes, so this approach might require modifying the memory_failure() code,
for example renaming PG_hwpoison to a more generic name (although some
candidate names, like PageOffline and PageIsolated, are already taken)
and/or somehow recording which kind of leaked page it is.

Thanks,
Naoya Horiguchi