On Mon, Dec 18, 2023 at 02:45:21PM +0800, Shuai Xue wrote:
> Hardware errors could be signaled by asynchronous interrupt, e.g. when an
> error is detected by a background scrubber, or signaled by synchronous
> exception, e.g. when a CPU tries to access a poisoned cache line. Both
> synchronous and asynchronous errors are queued as memory_failure() work
> and handled by a dedicated kthread in a workqueue.
>
> However, memory failure recovery sends SIGBUS with the wrong BUS_MCEERR_AO
> si_code for synchronous errors in early kill mode, even when
> MF_ACTION_REQUIRED is set. The main problem is that the memory failure
> work is handled in kthread context rather than in the context of the
> user-space process that is accessing the corrupt memory location, so
> kill_proc() sends SIGBUS with the BUS_MCEERR_AO si_code to the user-space
> process instead of BUS_MCEERR_AR.
>
> To this end, queue memory_failure() as a task_work so that the current
> context in memory_failure() belongs to the process consuming the poisoned
> data, and it will send SIGBUS with the proper si_code.
>
> Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
> Tested-by: Ma Wupeng <mawupeng1@xxxxxxxxxx>
> Reviewed-by: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
> Reviewed-by: Xiaofei Tan <tanxiaofei@xxxxxxxxxx>
> Reviewed-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> ---
>  drivers/acpi/apei/ghes.c | 77 +++++++++++++++++++++++-----------------
>  include/acpi/ghes.h      |  3 --
>  mm/memory-failure.c      | 13 -------
>  3 files changed, 44 insertions(+), 49 deletions(-)
>

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>
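
For readers following the si_code discussion in the quoted changelog, here is a rough user-space sketch of why the distinction matters to the receiving process: a SIGBUS handler can tell BUS_MCEERR_AR (action required) apart from BUS_MCEERR_AO (action optional) through siginfo_t. This is not part of the patch. The helper names describe_sigbus_code() and install_sigbus_handler() are made up for illustration, and the fallback values 4 and 5 are assumed to match the Linux UAPI headers.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Fallbacks for older libc headers; assumed to match the Linux UAPI values. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical helper: map a SIGBUS si_code to a short description. */
static const char *describe_sigbus_code(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return "action required: poisoned memory was consumed";
    case BUS_MCEERR_AO:
        return "action optional: poison was found asynchronously";
    default:
        return "other SIGBUS cause";
    }
}

/* Handler restricted to async-signal-safe calls (write, _exit). */
static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
    const char *msg = describe_sigbus_code(info->si_code);

    (void)sig;
    (void)ucontext;
    if (write(STDERR_FILENO, msg, strlen(msg)) < 0 ||
        write(STDERR_FILENO, "\n", 1) < 0) {
        /* Nothing useful can be done about a failed write here. */
    }
    _exit(1);
}

/* Hypothetical helper: register the handler with SA_SIGINFO so that
 * si_code is delivered. Returns 0 on success, -1 on failure. */
static int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

A process would call install_sigbus_handler() once at startup. With the behavior described in the changelog, a synchronous poison consumption handled from kthread context would reach such a handler as BUS_MCEERR_AO, which is what the patch sets out to correct.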