On Tue, 27 Apr 2021 15:29:52 +0900
Naoya Horiguchi <nao.horiguchi@xxxxxxxxx> wrote:

> From: Tony Luck <tony.luck@xxxxxxxxx>
>
> There can be races when multiple CPUs consume poison from the same
> page. The first into memory_failure() atomically sets the HWPoison
> page flag and begins hunting for tasks that map this page. Eventually
> it invalidates those mappings and may send a SIGBUS to the affected
> tasks.
>
> But while all that work is going on, other CPUs see a "success"
> return code from memory_failure() and so they believe the error
> has been handled and continue executing.
>
> Fix by wrapping most of the internal parts of memory_failure() in
> a mutex.
>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> Reviewed-by: Borislav Petkov <bp@xxxxxxx>

Sorry to interrupt, I just thought of one thing:

This mutex does not seem to be bound to the error page. Could there be
some corner case, such as test code or a multi-poison case, that would
break this mutex?

Thanks!
Aili Yao