On 2025/3/7 13:44, Shuai Xue wrote:
> When an uncorrected memory error is consumed there is a race between the
> CMCI from the memory controller reporting an uncorrected error with a UCNA
> signature, and the core reporting an SRAR signature machine check when the
> data is about to be consumed.
>
> - Background: why *UN*corrected errors are tied to *C*MCI on Intel platforms [1]
>
> Prior to Icelake, memory controllers reported patrol scrub events that
> detected a previously unseen uncorrected error in memory by signaling a
> broadcast machine check with an SRAO (Software Recoverable Action Optional)
> signature in the machine check bank. This was overkill: the problem is not
> urgent, because no core is on the verge of consuming that bad data.
> It was also found that multiple SRAO UCEs may cause nested MCE interrupts
> and finally become an IERR.
>
> Hence, Intel downgraded the machine check bank signature of patrol
> scrub from SRAO to UCNA (Uncorrected, No Action required), and the signal
> changed to #CMCI. Just to add to the confusion, Linux does take an action
> (in uc_decode_notifier()) to try to offline the page despite the UC*NA*
> signature name.
>
> - Background: why #CMCI and #MCE race when poison is being consumed on Intel platforms [1]
>
> Having decided that CMCI/UCNA is the best action for patrol scrub errors,
> the memory controller uses it for reads too. But the memory controller is
> executing asynchronously from the core, and can't tell the difference
> between a "real" read and a speculative read. So it will do CMCI/UCNA if an
> error is found in any read.
>
> Thus:
>
> 1) Core is clever and thinks address A is needed soon, issues a speculative read.
> 2) Core finds it is going to use address A soon after sending the read request.
> 3) The CMCI from the memory controller is in a race with MCE from the core
>    that will soon try to retire the load from address A.
>
> Quite often (because speculation has got better) the CMCI from the memory
> controller is delivered before the core is committed to the instruction
> reading address A, so the interrupt is taken, and Linux offlines the page
> (marking it as poison).
>
> - Why user process is killed for instr case
>
> Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
> "not recovered"") tries to fix the noise message "Memory error not recovered"
> and skips duplicate SIGBUSs due to the race. But it also introduced a bug
> that kill_accessing_process() returns -EHWPOISON for instr case; as a
> result, kill_me_maybe() sends a SIGBUS to the user process.
>
> If the CMCI wins that race, the page is marked poisoned when
> uc_decode_notifier() calls memory_failure(). For dirty pages,
> memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
> converting the PTE to a hwpoison entry. As a result,
> kill_accessing_process():
>
> - calls walk_page_range(), which returns 1 regardless of whether
>   try_to_unmap() succeeds or fails,
> - calls kill_proc() to make sure a SIGBUS is sent,
> - returns -EHWPOISON to indicate that SIGBUS is already sent to the
>   process and kill_me_maybe() doesn't have to send it again.
>
> However, for clean pages, the TTU_HWPOISON flag is cleared, leaving the
> PTE unchanged and not converted to a hwpoison entry. Consequently, for
> clean pages where PTE entries are not marked as hwpoison,
> kill_accessing_process() returns -EFAULT, causing kill_me_maybe() to
> send a SIGBUS.
>
> Console log looks like this:
>
>     Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
>     Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
>     Memory failure: 0x827ca68: already hardware poisoned
>     mce: Memory error not recovered
>
> To fix it, return 0 for "corrupted page was clean", preventing an
> unnecessary SIGBUS to the user process.
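
For anyone following along, the return-code flow described above can be modelled in user space roughly as below. This is only a sketch built from the commit message, not the kernel code. The model_* names, the walk_ret/fixed parameters and the MODEL_* constants are made up for illustration, and the rule "anything other than 0 or -EHWPOISON leads to a SIGBUS from kill_me_maybe()" is my simplifying assumption (the real return value also passes through memory_failure()).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the kernel errno values. They mirror the
 * Linux numbers, but only their distinctness matters in this model.
 */
enum {
	MODEL_EFAULT    = 14,
	MODEL_EHWPOISON = 133,
};

/*
 * Model of what kill_accessing_process() returns, per the commit message.
 * walk_ret stands for the page table walk result:
 *   1 - a hwpoison PTE was found (dirty page, CMCI won the race)
 *   0 - the PTE was left unchanged (clean page, dropped)
 * fixed selects the behaviour before (false) or after (true) the patch.
 */
static int model_kill_accessing_process(int walk_ret, bool fixed)
{
	if (walk_ret > 0)
		return -MODEL_EHWPOISON;	/* SIGBUS already sent by kill_proc() */
	return fixed ? 0 : -MODEL_EFAULT;	/* clean page: nothing to kill for */
}

/*
 * Model of the decision in kill_me_maybe() (assumption): 0 means recovered,
 * -EHWPOISON means a SIGBUS was already delivered, any other error means
 * kill_me_maybe() sends the SIGBUS itself.
 */
static bool model_kill_me_maybe_sends_sigbus(int ret)
{
	return ret != 0 && ret != -MODEL_EHWPOISON;
}

/* Walk through the four combinations discussed in the commit message. */
void model_self_check(void)
{
	/* Clean page, before the fix: -EFAULT, so an unnecessary SIGBUS. */
	assert(model_kill_me_maybe_sends_sigbus(
		model_kill_accessing_process(0, false)));

	/* Clean page, after the fix: 0, so no SIGBUS. */
	assert(!model_kill_me_maybe_sends_sigbus(
		model_kill_accessing_process(0, true)));

	/* Dirty page: -EHWPOISON either way, so no second SIGBUS. */
	assert(!model_kill_me_maybe_sends_sigbus(
		model_kill_accessing_process(1, false)));
	assert(!model_kill_me_maybe_sends_sigbus(
		model_kill_accessing_process(1, true)));
}
```

Only the clean-page row changes between fixed == false and fixed == true, which matches the "return 0 for corrupted page was clean" description.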
>
> [1] https://lore.kernel.org/lkml/20250217063335.22257-1-xueshuai@xxxxxxxxxxxxxxxxx/T/#mba94f1305b3009dd340ce4114d3221fe810d1871
> Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
> Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx

Thanks for your detailed commit log. This patch looks good to me.

Acked-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>

Thanks.