On Tue, Sep 19, 2023 at 10:21:27AM +0800, Shuai Xue wrote: > Hardware errors could be signaled by synchronous interrupt, e.g. when an > error is detected by a background scrubber, or signaled by synchronous > exception, e.g. when an uncorrected error is consumed. Both synchronous and > asynchronous error are queued and handled by a dedicated kthread in > workqueue. > > commit 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for > synchronous errors") keep track of whether memory_failure() work was > queued, and make task_work pending to flush out the workqueue so that the > work for synchronous error is processed before returning to user-space. > The trick ensures that the corrupted page is unmapped and poisoned. And > after returning to user-space, the task starts at current instruction which > triggering a page fault in which kernel will send SIGBUS to current process > due to VM_FAULT_HWPOISON. > > However, the memory failure recovery for hwpoison-aware mechanisms does not > work as expected. For example, hwpoison-aware user-space processes like > QEMU register their customized SIGBUS handler and enable early kill mode by > seting PF_MCE_EARLY at initialization. Then the kernel will directy notify > the process by sending a SIGBUS signal in memory failure with wrong > si_code: the actual user-space process accessing the corrupt memory > location, but its memory failure work is handled in a kthread context, so > it will send SIGBUS with BUS_MCEERR_AO si_code to the actual user-space > process instead of BUS_MCEERR_AR in kill_proc(). > > To this end, separate synchronous and asynchronous error handling into > different paths like X86 platform does: > > - valid synchronous errors: queue a task_work to synchronously send SIGBUS > before ret_to_user. > - valid asynchronous errors: queue a work into workqueue to asynchronously > handle memory failure. > - abnormal branches such as invalid PA, unexpected severity, no memory > failure config support, invalid GUID section, OOM, etc. > > Then for valid synchronous errors, the current context in memory failure is > exactly belongs to the task consuming poison data and it will send SIBBUS > with proper si_code. > > Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors") > Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx> > Tested-by: Ma Wupeng <mawupeng1@xxxxxxxxxx> > Reviewed-by: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> > Reviewed-by: Xiaofei Tan <tanxiaofei@xxxxxxxxxx> > Reviewed-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> > --- > arch/x86/kernel/cpu/mce/core.c | 9 +--- > drivers/acpi/apei/ghes.c | 84 +++++++++++++++++++++------------- > include/acpi/ghes.h | 3 -- > mm/memory-failure.c | 17 ++----- > 4 files changed, 56 insertions(+), 57 deletions(-) > ... > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 4d6e43c88489..80e1ea1cc56d 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -2163,7 +2163,9 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, > * > * Return: 0 for successfully handled the memory error, > * -EOPNOTSUPP for hwpoison_filter() filtered the error event, > - * < 0(except -EOPNOTSUPP) on failure. > + * -EHWPOISON for already sent SIGBUS to the current process with > + * the proper error info, The meaning of this comment is understood, but the sentence seems to be a little too long. Could you sort this out with bullet points (like below)? * Return values: * 0 - success * -EOPNOTSUPP - hwpoison_filter() filtered the error event. * -EHWPOISON - sent SIGBUS to the current process with the proper * error info by kill_accessing_process(). * other negative values - failure > + * other negative error code on failure. > */ > int memory_failure(unsigned long pfn, int flags) > { > @@ -2445,19 +2447,6 @@ static void memory_failure_work_func(struct work_struct *work) > } > } > > -/* > - * Process memory_failure work queued on the specified CPU. > - * Used to avoid return-to-userspace racing with the memory_failure workqueue. > - */ > -void memory_failure_queue_kick(int cpu) > -{ > - struct memory_failure_cpu *mf_cpu; > - > - mf_cpu = &per_cpu(memory_failure_cpu, cpu); > - cancel_work_sync(&mf_cpu->work); > - memory_failure_work_func(&mf_cpu->work); > -} > - The declaration of memory_failure_queue_kick() still remains in include/linux/mm.h, so you can remove it together. Thanks, Naoya Horiguchi