Re: [PATCH] mm,swapops: Update check in is_pfn_swap_entry for hwpoison entries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Apr 07, 2024 at 03:56:17PM +0200, Oscar Salvador wrote:
> On Sun, Apr 07, 2024 at 03:05:37PM +0200, Oscar Salvador wrote:
> > Tony reported that the Machine check recovery was broken in v6.9-rc1,
> > as he was hitting a VM_BUG_ON when injecting uncorrectable memory errors
> > to DRAM.
> > After some more digging and debugging on his side, he realized that this
> > went back to v6.1, with the introduction of 'commit 0d206b5d2e0d ("mm/swap: add
> > swp_offset_pfn() to fetch PFN from swap entry")'.
> > That commit, among other things, introduced swp_offset_pfn(), replacing
> > hwpoison_entry_to_pfn() in its favour.
> > 
> > The patch also introduced a VM_BUG_ON() check for is_pfn_swap_entry(),
> > but is_pfn_swap_entry() never got updated to cover hwpoison entries, which
> > means that we would hit the VM_BUG_ON whenever we would call
> > swp_offset_pfn() for such entries on environments with CONFIG_DEBUG_VM set.
> > Fix this by updating the check to cover hwpoison entries as well, and update
> > the comment while we are it.
> > 
> > Reported-by: Tony Luck <tony.luck@xxxxxxxxx>
> > Closes: https://lore.kernel.org/all/Zg8kLSl2yAlA3o5D@agluck-desk3/
> > Tested-by: Tony Luck <tony.luck@xxxxxxxxx>
> > Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")

Totally unexpected, as this commit even removed hwpoison_entry_to_pfn().
Obviously even until now I assumed hwpoison is accounted as pfn swap entry
but it's just missing..

Since this commit didn't really change is_pfn_swap_entry() itself, I was
thinking maybe an older fix tag would apply, but then I noticed the old
code indeed should work well even if hwpoison entry is missing.  For
example, it's a grey area on whether a hwpoisoned page should be accounted
in smaps.  So I think the Fixes tag is correct, and thanks for fixing this.

Reviewed-by: Peter Xu <peterx@xxxxxxxxxx>

> > Cc: <stable@xxxxxxxxxxxxxxx> # 6.1.x
> 
> I think I need to clarify why the stable.
> 
> It is my understanding that some distros ship their kernel with
> CONFIG_DEBUG_VM set by default (I think Fedora comes to my mind?).
> I am fine with backing down if people think that this is an
> overreaction.

Fedora stopped having DEBUG_VM for some time, but not sure about when it's
still in the 6.1 trees.  It looks like cc stable is still reasonable from
that regard.

A side note is that when I'm looking at this, I went back and see why in
some cases we need the pfn maintained for the poisoned, then I saw the only
user is check_hwpoisoned_entry() who wants to do fast kills in some
contexts and that includes a double check on the pfns in a poisoned entry.
Then afaict this path is just too rarely used and buggy.

A few things we may need fixing, maybe someone in the loop would have time
to have a look:

- check_hwpoisoned_entry()
  - pte_none check is missing
  - all the rest swap types are missing (e.g., we want to kill the proc too
    if the page is during migration)
- check_hwpoisoned_pmd_entry()
  - need similar care like above (pmd_none is covered not others)

I copied Naoya too.

Thanks,

-- 
Peter Xu





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux