On Mon, Mar 08, 2021 at 02:55:04PM -0800, Luck, Tony wrote: > There can be races when multiple CPUs consume poison from the same > page. The first into memory_failure() atomically sets the HWPoison > page flag and begins hunting for tasks that map this page. Eventually > it invalidates those mappings and may send a SIGBUS to the affected > tasks. > > But while all that work is going on, other CPUs see a "success" > return code from memory_failure() and so they believe the error > has been handled and continue executing. > > Fix by wrapping most of the internal parts of memory_failure() in > a mutex. > > Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx> Thanks! Acked-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>