On 2024/4/7 8:08, Luck, Tony wrote:
>> This one is against 6.1 (previous one was against v6.9-rc2):
>> Again, compile tested only
>
> Oscar.
>
> Both the 6.1 and 6.9-rc2 patches make the BUG (and subsequent issues) go away.
>
> Here's what's happening.
>
> When the machine check occurs there's a scramble from various subsystems
> to report the memory error.
>
> ghes_do_memory_failure() calls memory_failure_queue() which later
> calls memory_failure() from a kernel thread. Side note: this happens TWICE
> for each error. Not sure yet if this is a BIOS issue logging more than once.
> or some Linux issues in acpi/apei/ghes.c code.
>
> uc_decode_notifier() [called from a different kernel thread] also calls
> do_memory_failure()
>
> Finally kill_me_maybe() [called from task_work on return to the application
> when returning from the machine check handler] also calls memory_failure()
>
> do_memory_failure() is somewhat prepared for multiple reports of the same
> error. It uses an atomic test and set operation to mark the page as poisoned.
>
> First called to report the error does all the real work. Late arrivals take a
> shorter path, but may still take some action(s) depending on the "flags"
> passed in:
>
>	if (TestSetPageHWPoison(p)) {
>		pr_err("%#lx: already hardware poisoned\n", pfn);
>		res = -EHWPOISON;
>		if (flags & MF_ACTION_REQUIRED)
>			res = kill_accessing_process(current, pfn, flags);
>		if (flags & MF_COUNT_INCREASED)
>			put_page(p);
>		goto unlock_mutex;
>	}
>
> In this case the last to arrive has MF_ACTION_REQUIRED set, so calls
> kill_accessing_process() ... which is in the stack trace that led to the:
>
> kernel BUG at include/linux/swapops.h:88!
>
> I'm not sure that I fully understand your patch. I guess that it is making sure to
> handle the case that the page has already been marked as poisoned?
>
>
> Anyway ... thanks for the quick fix. I hope the above helps write a good
> commit message to get this applied and backported to stable.

Sorry for the late reply. I just got back from vacation.

>
> Tested-by: Tony Luck <tony.luck@xxxxxxxxx>

Thanks to you both.

This looks like an issue introduced by commit:

  0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")

In that commit hwpoison_entry_to_pfn() was replaced with swp_offset_pfn(),
which may not have been intended for use with hwpoison entries:

/*
 * A pfn swap entry is a special type of swap entry that always has a pfn stored
 * in the swap offset. *They are used to represent unaddressable device memory*
 * *and to restrict access to a page undergoing migration*
 */
static inline bool is_pfn_swap_entry(swp_entry_t entry)
{
	/* Make sure the swp offset can always store the needed fields */
	BUILD_BUG_ON(SWP_TYPE_SHIFT < SWP_PFN_BITS);

	return is_migration_entry(entry) || is_device_private_entry(entry) ||
	       is_device_exclusive_entry(entry);
}

I think Oscar's patch is the right fix, and it would be better to update the
comment above is_pfn_swap_entry() as well, since it does not mention hwpoison
entries.

Thanks.

>
> -Tony
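
To make the failure mode concrete, here is a small userspace model (not kernel
code) of a pfn accessor guarded by a predicate like is_pfn_swap_entry(). The
enum, the struct and all function names below are made up for illustration.
Two things are assumptions on my side rather than facts taken from this
thread: that the BUG at swapops.h:88 comes from a debug check on this
predicate in the pfn accessor, and that the fix works by letting the predicate
accept hwpoison entries too.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's special swap entry types. */
enum entry_type {
	ENTRY_PLAIN_SWAP,
	ENTRY_MIGRATION,
	ENTRY_DEVICE_PRIVATE,
	ENTRY_DEVICE_EXCLUSIVE,
	ENTRY_HWPOISON,
};

struct entry {
	enum entry_type type;
	unsigned long offset;	/* holds a pfn for the pfn-carrying types */
};

/* Models the predicate quoted above: hwpoison is not on the list. */
static bool is_pfn_entry_current(struct entry e)
{
	return e.type == ENTRY_MIGRATION ||
	       e.type == ENTRY_DEVICE_PRIVATE ||
	       e.type == ENTRY_DEVICE_EXCLUSIVE;
}

/* ASSUMED shape of the fix: hwpoison entries also count as pfn entries. */
static bool is_pfn_entry_fixed(struct entry e)
{
	return is_pfn_entry_current(e) || e.type == ENTRY_HWPOISON;
}

/*
 * Models a pfn accessor guarded by the predicate. It returns false in the
 * case where a real accessor with a debug check would BUG instead.
 */
static bool entry_to_pfn(struct entry e, bool (*is_pfn)(struct entry),
			 unsigned long *pfn)
{
	assert(pfn);
	if (!is_pfn(e))
		return false;
	*pfn = e.offset;
	return true;
}
```

With this model, a hwpoison entry is rejected by the current predicate and
accepted by the extended one, while a migration entry is accepted by both. That
matches what Tony observed: only the late caller that goes on to look up the
pfn of an already poisoned page trips the check.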