On 11/15/22 19:15, Kalra, Ashish wrote:
>
> On 11/15/2022 11:24 AM, Kalra, Ashish wrote:
>> Hello Vlastimil,
>>
>> On 11/15/2022 9:14 AM, Vlastimil Babka wrote:
>>> Cc'ing memory failure folks, the beginning of this subthread is here:
>>>
>>> https://lore.kernel.org/all/3a51840f6a80c87b39632dc728dbd9b5dd444cd7.1655761627.git.ashish.kalra@amd.com/
>>>
>>> On 11/15/22 00:36, Kalra, Ashish wrote:
>>>> Hello Boris,
>>>>
>>>> On 11/2/2022 6:22 AM, Borislav Petkov wrote:
>>>>> On Mon, Oct 31, 2022 at 04:58:38PM -0500, Kalra, Ashish wrote:
>>>>>> 	if (snp_lookup_rmpentry(pfn, &rmp_level)) {
>>>>>> 		do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
>>>>>> 		return RMP_PF_RETRY;
>>>>>
>>>>> Does this issue some halfway understandable error message why the
>>>>> process got killed?
>>>>>
>>>>>> Will look at adding our own recovery function for the same, but that will
>>>>>> again mark the pages as poisoned, right?
>>>>>
>>>>> Well, not poisoned but PG_offlimits or whatever the mm folks agree upon.
>>>>> Semantically, it'll be handled the same way, ofc.
>>>>
>>>> Added a new PG_offlimits flag and a simple corresponding handler for it.
>>>
>>> One thing is, there aren't enough page flags to be adding more (except
>>> aliases for existing ones) for cases that can avoid it, but as Boris says,
>>> if using an alias to PG_hwpoison it depends what will become confused with
>>> the actual hwpoison.
>>>
>>>> But there is still added complexity of handling hugepages as part of
>>>> reclamation failures (both HugeTLB and transparent hugepages) and that
>>>> means calling more static functions in mm/memory-failure.c
>>>>
>>>> There is probably a more appropriate handler in mm/memory-failure.c:
>>>>
>>>> soft_offline_page() - this will mark the page as HWPoisoned and also has
>>>> handling for hugepages. And we can avoid adding a new page flag too.
>>>>
>>>> soft_offline_page - Soft offline a page.
>>>> Soft offline a page, by migration or invalidation, without killing
>>>> anything.
>>>>
>>>> So, this looks like a good option: call soft_offline_page() instead of
>>>> memory_failure() in case of failure to transition the page back to
>>>> HV/shared state via SNP_RECLAIM_CMD and/or the RMPUPDATE instruction.
>>>
>>> So it's a bit unclear to me what exact situation we are handling here. The
>>> original patch here seems to me to be just leaking back pages that are
>>> unsafe for further use. soft_offline_page() seems to fit that scenario of a
>>> graceful leak before something is irreparably corrupt and we page fault
>>> on it.
>>> But then in the thread you discuss PF handling and killing. So what is the
>>> case here? If we detect this need to call snp_leak_pages(), does it mean:
>>>
>>> a) nobody that could page fault at them (the guest?) is running anymore, we
>>> are tearing it down, we just can't reuse the pages further on the host
>>
>> The host can page fault on them, if anything on the host tries to write to
>> these pages. Host reads will return garbage data.
>>
>>> - seems like soft_offline_page() could work, but maybe we could just put
>>> the pages on some leaked list without a special page flag? The only thing
>>> that should matter is not to free the pages to the page allocator so they
>>> would be reused by something else.
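
To make that leaked list idea concrete, I'm thinking of something roughly
like the sketch below. Completely untested, and the list/lock names are
made up; only snp_leak_pages() is the name already used in this thread.
The only job is to pin such pages and keep them away from the page
allocator:

	#include <linux/list.h>
	#include <linux/spinlock.h>
	#include <linux/mm.h>

	static LIST_HEAD(snp_leaked_pages_list);
	static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);

	void snp_leak_pages(u64 pfn, unsigned int npages)
	{
		struct page *page = pfn_to_page(pfn);

		pr_warn("leaking %u pages starting at PFN 0x%llx\n", npages, pfn);

		spin_lock(&snp_leaked_pages_list_lock);
		while (npages--) {
			/*
			 * The extra reference makes sure the page is never
			 * freed back to the page allocator; the list is only
			 * there for accounting/debugging of what got leaked.
			 */
			get_page(page);
			list_add_tail(&page->lru, &snp_leaked_pages_list);
			page++;
		}
		spin_unlock(&snp_leaked_pages_list_lock);
	}
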
>>>
>>> b) something can still page fault at them (what?) - AFAIU can't be resolved
>>> without killing something, memory_failure() might limit the damage
>>
>> As I mentioned above, host writes will cause an RMP violation page fault.
>>
>
> And to add here, if it's a guest private page, then the above fault cannot be
> resolved, so the faulting process is terminated.

BTW would this not be mostly resolved as part of rebasing to UPM?

- host will not have these pages mapped in the first place (both kernel
  directmap and qemu userspace)
- guest will have them mapped, but I assume that the conversion from private
  to shared (that might fail?) can only happen after the guest's mappings are
  invalidated in the first place?

> Thanks,
> Ashish
>
>>
>>>>
>>>>>
>>>>>> Still waiting for some/more feedback from mm folks on the same.
>>>>>
>>>>> Just send the patch and they'll give it.
>>>>>
>>>>> Thx.
>>>>>
>>>
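
P.S. If the soft_offline_page() route is taken, I'd expect the reclaim
failure path to end up looking roughly like the sketch below. Untested,
and snp_handle_reclaim_failure() is just a made-up name for wherever the
SNP_RECLAIM_CMD/RMPUPDATE failure is detected; only the
soft_offline_page(pfn, flags) call itself is the existing
mm/memory-failure.c interface:

	#include <linux/mm.h>
	#include <linux/printk.h>

	/*
	 * The page could not be transitioned back to HV/shared state, so it
	 * must never be handed back to the page allocator. Soft offline marks
	 * it HWPoison and also copes with hugepages, without killing anything.
	 */
	static void snp_handle_reclaim_failure(unsigned long pfn)
	{
		if (soft_offline_page(pfn, 0))
			pr_err("pfn 0x%lx: failed to soft offline leaked SNP page\n", pfn);
	}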