On Mon, Aug 28, 2023 at 09:41:55AM +0800, Shuai Xue wrote: > On 2023/8/22 09:15, Shuai Xue wrote: > > On 2023/8/21 18:50, Will Deacon wrote: > >>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c > >>> index 3fe516b32577..38e2186882bd 100644 > >>> --- a/arch/arm64/mm/fault.c > >>> +++ b/arch/arm64/mm/fault.c > >>> @@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr, > >>> } else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) { > >>> unsigned int lsb; > >>> > >>> + pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n", > >>> + current->comm, current->pid, far); > >>> lsb = PAGE_SHIFT; > >>> if (fault & VM_FAULT_HWPOISON_LARGE) > >>> lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault)); > >> > >> Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already, > >> and there's plenty of code in memory-failure.c for handling poisoned pages > >> reported by e.g. GHES. I don't think dumping extra messages in dmesg from > >> the arch code really adds anything. > > > > I see the show_unhandled_signals() will dump the stack but it rely on > > /proc/sys/debug/exception-trace be set. > > > > The memory failure is the top issue in our production cloud and also other hyperscalers. > > We have received complaints from our operations engineers and end users that processes > > are being inexplicably killed :(. Could you please consider add a message? I don't have any objection to logging this stuff somehow, I'm just not convinced that the console is the best place for that information in 2023. Is there really nothing better? Will