
The patch titled
     Subject: x86/mce: collect error message for severities below MCE_PANIC_SEVERITY
has been added to the -mm mm-unstable branch.  Its filename is
     x86-mce-collect-error-message-for-severities-below-mce_panic_severity.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/x86-mce-collect-error-message-for-severities-below-mce_panic_severity.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
Subject: x86/mce: collect error message for severities below MCE_PANIC_SEVERITY
Date: Mon, 17 Feb 2025 14:33:31 +0800

Patch series "mm/hwpoison: Fix regressions in memory failure handling",
v2.

This series addresses three regressions identified in memory failure
handling, as discovered using ras-tools[1]:

- `./einj_mem_uc copyin -f`
- `./einj_mem_uc futex -f`
- `./einj_mem_uc instr`

The regressions in the copyin and futex cases were caused by the
replacement of `EX_TYPE_UACCESS` with `EX_TYPE_EFAULT_REG` in some
copy-from-user operations, leading to kernel panics.  The instr case
regression resulted from the PTE entry not being marked as hwpoison,
causing the system to send unnecessary SIGBUS signals.

These fixes ensure proper handling of memory errors and prevent kernel
panics and unnecessary signal dispatch.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git


This patch (of 5):

Currently, mce_no_way_out() only collects error messages when the error
severity is equal to `MCE_PANIC_SEVERITY`.  To improve diagnostics, modify
the behavior to also collect error messages when the severity is less than
`MCE_PANIC_SEVERITY`.

Link: https://lkml.kernel.org/r/20250217063335.22257-1-xueshuai@xxxxxxxxxxxxxxxxx
Link: https://lkml.kernel.org/r/20250217063335.22257-2-xueshuai@xxxxxxxxxxxxxxxxx
Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
Cc: Acked-by:Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
Cc: linmiaohe <linmiaohe@xxxxxxxxxx>
Cc: "Luck, Tony" <tony.luck@xxxxxxxxx>
Cc: Naoya Horiguchi <nao.horiguchi@xxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Ruidong Tian <tianruidong@xxxxxxxxxxxxxxxxx>
Cc: Jane Chu <jane.chu@xxxxxxxxxx>
Cc: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
Cc: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
Cc: Yazen Ghannam <yazen.ghannam@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 arch/x86/kernel/cpu/mce/core.c |   17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/cpu/mce/core.c~x86-mce-collect-error-message-for-severities-below-mce_panic_severity
+++ a/arch/x86/kernel/cpu/mce/core.c
@@ -925,12 +925,13 @@ static __always_inline void quirk_zen_if
  * Do a quick check if any of the events requires a panic.
  * This decides if we keep the events around or clear them.
  */
-static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp,
-					  struct pt_regs *regs)
+static __always_inline bool mce_no_way_out(struct mce_hw_err *err, char **msg,
+					   unsigned long *validp,
+					   struct pt_regs *regs)
 {
 	struct mce *m = &err->m;
 	char *tmp = *msg;
-	int i;
+	int i, cur_sev = MCE_NO_SEVERITY, sev;

 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
 		m->status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
@@ -945,13 +946,17 @@ static __always_inline int mce_no_way_ou
 			quirk_zen_ifu(i, m, regs);

 		m->bank = i;
-		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
+		sev = mce_severity(m, regs, &tmp, true);
+		if (sev >= cur_sev) {
 			mce_read_aux(err, i);
 			*msg = tmp;
-			return 1;
+			cur_sev = sev;
 		}
+
+		if (cur_sev == MCE_PANIC_SEVERITY)
+			return true;
 	}
-	return 0;
+	return false;
 }

 /*
_

Patches currently in -mm which might be from xueshuai@xxxxxxxxxxxxxxxxx are

x86-mce-collect-error-message-for-severities-below-mce_panic_severity.patch
x86-mce-dump-error-msg-from-severities.patch
x86-mce-add-ex_type_efault_reg-as-in-kernel-recovery-context-to-fix-copy-from-user-operations-regression.patch
mm-hwpoison-fix-incorrect-not-recovered-report-for-recovered-clean-pages.patch
mm-memory-failure-move-return-value-documentation-to-function-declaration.patch
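
For readers who want to reason about the new loop outside the kernel, the
selection logic in the hunk above can be approximated in user space.  This
is only a sketch under stated assumptions: `enum sev`, `struct bank` and
`no_way_out()` are hypothetical stand-ins, not kernel names; the only
property taken from the patch is that the panic level is the value the loop
stops on and that lower levels compare below it.  The MSR reads, the
valid-bit check and the Zen quirk are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical severity levels; only their relative order matters here. */
enum sev { SEV_NO, SEV_KEEP, SEV_AR, SEV_PANIC };

/* Hypothetical per-bank result: the graded severity and its message. */
struct bank {
	enum sev sev;
	const char *msg;
};

/*
 * Mirrors the patched loop: keep the message of the most severe bank seen
 * so far (a later bank wins a tie because the comparison is >=), and
 * return true as soon as a panic-level bank has been recorded.
 */
static bool no_way_out(const struct bank *banks, size_t n, const char **msg)
{
	enum sev cur_sev = SEV_NO;
	size_t i;

	assert(msg != NULL);

	for (i = 0; i < n; i++) {
		if (banks[i].sev >= cur_sev) {
			*msg = banks[i].msg;
			cur_sev = banks[i].sev;
		}

		if (cur_sev == SEV_PANIC)
			return true;
	}
	return false;
}
```

With banks graded KEEP, AR, KEEP the function returns false and leaves
`*msg` pointing at the AR bank's message, which is the diagnostic the old
code would not have recorded.  With a panic-level bank in the middle it
returns true at that bank and does not look at later ones.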