On 2024/4/8 4:31, Oscar Salvador wrote:
>> Totally unexpected, as this commit even removed hwpoison_entry_to_pfn().
>> Obviously even until now I assumed hwpoison is accounted as pfn swap entry
>> but it's just missing..
>>
>> Since this commit didn't really change is_pfn_swap_entry() itself, I was
>> thinking maybe an older fix tag would apply, but then I noticed the old
>> code indeed should work well even if hwpoison entry is missing. For
>> example, it's a grey area on whether a hwpoisoned page should be accounted
>> in smaps. So I think the Fixes tag is correct, and thanks for fixing this.
>>
>> Reviewed-by: Peter Xu <peterx@xxxxxxxxxx>
>
> Thanks Peter

Thanks both.

>
>> Fedora stopped having DEBUG_VM for some time, but not sure about when it's
>> still in the 6.1 trees. It looks like cc stable is still reasonable from
>> that regard.
>
> Good to know, thanks for the info.
>
>> A side note is that when I'm looking at this, I went back and see why in
>> some cases we need the pfn maintained for the poisoned, then I saw the only
>> user is check_hwpoisoned_entry() who wants to do fast kills in some
>> contexts and that includes a double check on the pfns in a poisoned entry.
>> Then afaict this path is just too rarely used and buggy.
>
> Yes, unfortunately memory-failure code does not get exercised that much,
> and so there might be subtly bugs lurking in there for quite some time.

There are many memory-failure test cases, but some code paths still don't
get exercised. That's a pity. :(

>
>> A few things we may need fixing, maybe someone in the loop would have time
>> to have a look:
>>
>> - check_hwpoisoned_entry()
>>   - pte_none check is missing
>>   - all the rest swap types are missing (e.g., we want to kill the proc too
>>     if the page is during migration)

At first I thought the remaining swap types just wouldn't exist in this
code path, but on second thought it seems possible. For example, when a
page is being isolated for migration, memory_failure() will fail to isolate
it, and a second MCE event will then go to kill_accessing_process() and see
a migration swap entry.

>> - check_hwpoisoned_pmd_entry()
>>   - need similar care like above (pmd_none is covered not others)
>
> I will have a look and see what needs fixing, thanks for bringing it up.

Thanks for your time.
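
To make the cases above concrete, here is a rough userspace sketch of the
check being discussed. It is not kernel code and not a proposed patch: the
enum, the struct and the function name are all hypothetical, and it only
models which kinds of entry should be compared against the poisoned pfn.

```c
#include <stdbool.h>

/* Hypothetical model of what a page-table entry can hold; not kernel types. */
enum pte_kind {
    PTE_NONE,       /* empty entry, nothing mapped */
    PTE_PRESENT,    /* maps a page directly */
    PTE_HWPOISON,   /* hwpoison swap entry */
    PTE_MIGRATION,  /* migration swap entry */
    PTE_OTHER_SWAP  /* ordinary swap entry, carries a swap offset, not a pfn */
};

struct model_pte {
    enum pte_kind kind;
    unsigned long pfn;  /* only meaningful for the pfn-carrying kinds */
};

/*
 * Returns true when the entry refers to poisoned_pfn, i.e. when the task
 * touching it would be a candidate for a kill in this model.
 */
bool entry_hits_poisoned_pfn(struct model_pte pte, unsigned long poisoned_pfn)
{
    switch (pte.kind) {
    case PTE_NONE:
        /* The missing pte_none case: an empty entry never matches. */
        return false;
    case PTE_OTHER_SWAP:
        /* No pfn to compare against. */
        return false;
    case PTE_PRESENT:
    case PTE_HWPOISON:
    case PTE_MIGRATION:
        /* Migration is included so the second-MCE scenario is caught. */
        return pte.pfn == poisoned_pfn;
    }
    return false;
}
```

In this model a migration entry pointing at the poisoned pfn is treated the
same as a hwpoison entry, which is the behaviour the second-MCE example
seems to call for. Whether the real fix should look like this is left to
whoever picks it up.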