On 15.06.22 10:15, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Jun 15, 2022 at 10:00:05AM +0800, zhenwei pi wrote:
>> Currently unpoison_memory(unsigned long pfn) is designed for soft
>> poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
>> cleared on a x86 platform once hardware memory corrupts.
>>
>> Unpoisoning a hardware corrupted page puts page back buddy only,
>> the kernel has a chance to access the page with *NOT PRESENT* KPTE.
>> This leads BUG during accessing on the corrupted KPTE.
>>
>> Suggested by David&Naoya, disable unpoison mechanism when a real HW error
>> happens to avoid BUG like this:
> ...
>
>>
>> Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
>> Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
>> Cc: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>> Cc: David Hildenbrand <david@xxxxxxxxxx>
>> Signed-off-by: zhenwei pi <pizhenwei@xxxxxxxxxxxxx>
>
> Cc to stable?
> I think that the current approach seems predictable to me than earlier versions,
> so I can agree with sending this to stable a little more confidently.
>
>> ---
>>  Documentation/vm/hwpoison.rst |  3 ++-
>>  drivers/base/memory.c         |  2 +-
>>  include/linux/mm.h            |  1 +
>>  mm/hwpoison-inject.c          |  2 +-
>>  mm/madvise.c                  |  2 +-
>>  mm/memory-failure.c           | 12 ++++++++++++
>>  6 files changed, 18 insertions(+), 4 deletions(-)
>>
>
> ...
>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index b85661cbdc4a..385b5e99bfc1 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>>
>>  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>>
>> +static bool hw_memory_failure;
>
> Could you set the initial value explicitly? Using a default value is good,
> but doing as the surrounding code do is better for consistency. And this
> variable can be updated only once, so adding __read_mostly macro is also fine.

No strong opinion. __read_mostly makes sense, but I assume we don't
really care about performance that much when dealing with HW errors.

With or without changes around this initialization

Acked-by: David Hildenbrand <david@xxxxxxxxxx>

--
Thanks,

David / dhildenb
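
For readers following the review comment, the flag under discussion can be modeled in userspace roughly as below. This is only a sketch under stated assumptions: the quoted patch elides the rest of the mm/memory-failure.c hunk, so the model_* function names, the MODEL_MF_SW_SIMULATED flag, and the -EOPNOTSUPP return value are hypothetical stand-ins, and __read_mostly is defined away because it is a kernel-only annotation. It shows the explicit initial value Naoya asked for and the "set once, then refuse unpoison" behaviour described in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-in: in the kernel, __read_mostly places the variable in a
 * section for rarely written data. Here it expands to nothing. */
#define __read_mostly

/* Hypothetical flag marking a software-injected (simulated) failure. */
#define MODEL_MF_SW_SIMULATED 0x1

/* Explicit initial value plus __read_mostly, as suggested in the review. */
static bool hw_memory_failure __read_mostly = false;

/* Model of the failure path: a real (non-simulated) error latches the flag.
 * The flag is never cleared again in this model. */
int model_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn; /* the page itself is not modeled */
	if (!(flags & MODEL_MF_SW_SIMULATED))
		hw_memory_failure = true;
	return 0;
}

/* Model of the unpoison path: refuse once a real hardware error was seen.
 * The error code is an assumption for this sketch. */
int model_unpoison_memory(unsigned long pfn)
{
	(void)pfn;
	if (hw_memory_failure)
		return -EOPNOTSUPP;
	return 0; /* soft-poisoned page may be unpoisoned */
}

/* Small walk-through of the intended behaviour. */
int model_demo(void)
{
	assert(model_unpoison_memory(0x1000) == 0);          /* nothing failed yet */
	model_memory_failure(0x1000, MODEL_MF_SW_SIMULATED); /* injected error */
	assert(model_unpoison_memory(0x1000) == 0);          /* still allowed */
	model_memory_failure(0x2000, 0);                     /* real HW error */
	assert(model_unpoison_memory(0x1000) != 0);          /* now disabled */
	return 0;
}
```

Because the flag is written at most once and only read afterwards, __read_mostly is a reasonable fit, although as noted above the performance benefit on an error path is likely negligible.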