On Wed, Jun 15, 2022 at 01:49:33PM -0700, Andrew Morton wrote:
> On Wed, 15 Jun 2022 10:34:06 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@xxxxxxx> wrote:
>
> > On Wed, Jun 15, 2022 at 05:32:09PM +0800, zhenwei pi wrote:
> > > Currently unpoison_memory(unsigned long pfn) is designed for soft
> > > poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
> > > cleared on a x86 platform once hardware memory corrupts.
> > >
> > > Unpoisoning a hardware corrupted page puts page back buddy only,
> > > the kernel has a chance to access the page with *NOT PRESENT* KPTE.
> > > This leads BUG during accessing on the corrupted KPTE.
> > >
> > > Suggested by David&Naoya, disable unpoison mechanism when a real HW error
> > > happens to avoid BUG like this:
> > >
> > > ...
> > >
> > > Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
> > > Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
> > > Cc: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> > > Cc: David Hildenbrand <david@xxxxxxxxxx>
> > > Cc: Oscar Salvador <osalvador@xxxxxxx>
> > > Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > Acked-by: David Hildenbrand <david@xxxxxxxxxx>
> > > Signed-off-by: zhenwei pi <pizhenwei@xxxxxxxxxxxxx>
> >
> > Thank you very much.
> >
> > Acked-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>
> I added cc:stable to this. But the dual Fixes: are going to confuse
> people regarding which kernel versions need the fix. Can we be more
> specific?

OK. This bug has been visible since 17fae1294ad9d (merged in the v5.8
time period), so marking "v5.8+" on the "Cc: stable" line would be
helpful.

- Naoya Horiguchi
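
For anyone reading the thread later, the approach described in the quoted
changelog (stop honouring unpoison requests once a real hardware error has
been seen) can be modelled in user space roughly as below. This is only a
sketch under assumptions: the patch body is elided above ("..."), so none of
this is the actual change. Every identifier (sketch_memory_failure,
sketch_unpoison_memory, SKETCH_SW_SIMULATED, the page array) is hypothetical,
and returning -EOPNOTSUPP is likewise an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_NR_PAGES     8UL  /* hypothetical: a tiny pretend machine */
#define SKETCH_SW_SIMULATED 0x1  /* hypothetical: error was injected by software */

/* Per-page "poisoned" state of the pretend machine. */
static bool sketch_poisoned[SKETCH_NR_PAGES];

/* Latched on the first real (non-simulated) error and never cleared. */
static bool sketch_hw_failure_seen;

/* Record a memory failure on pfn; flags says whether it was simulated. */
int sketch_memory_failure(unsigned long pfn, int flags)
{
    assert(pfn < SKETCH_NR_PAGES);

    if (!(flags & SKETCH_SW_SIMULATED))
        sketch_hw_failure_seen = true;  /* a real HW error: disable unpoison */

    sketch_poisoned[pfn] = true;
    return 0;
}

/* Try to unpoison pfn; refused for every page once a real error was seen. */
int sketch_unpoison_memory(unsigned long pfn)
{
    assert(pfn < SKETCH_NR_PAGES);

    if (sketch_hw_failure_seen)
        return -EOPNOTSUPP;  /* assumption: error code chosen for the sketch */

    sketch_poisoned[pfn] = false;
    return 0;
}
```

The flag is global rather than per page, so in this model a single real
error disables unpoison for the whole system, including pages that were only
soft-poisoned by injection. That is the conservative reading of "disable
unpoison mechanism when a real HW error happens"; a per-page scheme would be
a different design.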