On Wed, Apr 21, 2021 at 09:57:26AM +0900, Naoya Horiguchi wrote:
> From: Tony Luck <tony.luck@xxxxxxxxx>
>
> There can be races when multiple CPUs consume poison from the same
> page. The first into memory_failure() atomically sets the HWPoison
> page flag and begins hunting for tasks that map this page. Eventually
> it invalidates those mappings and may send a SIGBUS to the affected
> tasks.
>
> But while all that work is going on, other CPUs see a "success"
> return code from memory_failure() and so they believe the error
> has been handled and continue executing.
>
> Fix by wrapping most of the internal parts of memory_failure() in
> a mutex.
>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> ---
>  mm/memory-failure.c | 37 ++++++++++++++++++++++++-------------
>  1 file changed, 24 insertions(+), 13 deletions(-)

Reviewed-by: Borislav Petkov <bp@xxxxxxx>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
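
For readers without the diff at hand, the idea in the quoted description can be sketched in userspace C. This is only an illustration under stated assumptions, not the code from mm/memory-failure.c: struct fake_page, poison_mutex, invalidate_mappings() and handle_poison() are hypothetical names, and a pthread mutex stands in for the kernel mutex. The point it tries to show is that a second caller cannot see "success" until the first caller has finished its work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page; NOT the kernel's struct page. */
struct fake_page {
    bool poisoned;  /* plays the role of the HWPoison page flag */
    bool handled;   /* set once the (simulated) mappings are invalidated */
};

/* One global lock serializing all poison handling (an assumption of this sketch). */
static pthread_mutex_t poison_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for the slow part: find mapping tasks, unmap, maybe signal. */
static void invalidate_mappings(struct fake_page *p)
{
    assert(p->poisoned);  /* only reached after the flag has been set */
    p->handled = true;
}

/*
 * Returns 0 once the page has been fully handled, whether by this
 * caller or by an earlier one.
 */
int handle_poison(struct fake_page *p)
{
    pthread_mutex_lock(&poison_mutex);

    if (!p->poisoned) {
        /* First caller: mark the page and do the slow work under the lock. */
        p->poisoned = true;
        invalidate_mappings(p);
    }
    /*
     * Later callers get here only after the first caller dropped the
     * lock, so the flag being set implies the work is complete.
     */

    pthread_mutex_unlock(&poison_mutex);
    return 0;
}
```

Without the lock, a second thread could observe poisoned == true while handled is still false and return early, which is the race the description refers to.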