From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>

HWPoisonHandlable() sometimes returns false for typical user pages due
to races with ordinary memory events, such as transfers between LRU
lists. This causes failures in hwpoison handling.

There is retry code for such cases, but it does not work because the
retry loop reaches the retry limit too quickly, before the page settles
into a handlable state. Let get_any_page() call shake_page() to fix it.
To route such pages into that retry path, __get_hwpoison_page() now
returns -EBUSY instead of 0 when the page is not handlable.

Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")
Reported-by: Tony Luck <tony.luck@xxxxxxxxx>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx # 5.13
---
 mm/memory-failure.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git v5.14-rc6/mm/memory-failure.c v5.14-rc6_patched/mm/memory-failure.c
index eefd823deb67..aa6592540f17 100644
--- v5.14-rc6/mm/memory-failure.c
+++ v5.14-rc6_patched/mm/memory-failure.c
@@ -1146,7 +1146,7 @@ static int __get_hwpoison_page(struct page *page)
 	 * unexpected races caused by taking a page refcount.
 	 */
 	if (!HWPoisonHandlable(head))
-		return 0;
+		return -EBUSY;
 
 	if (PageTransHuge(head)) {
 		/*
@@ -1199,9 +1199,14 @@ static int get_any_page(struct page *p, unsigned long flags)
 			}
 			goto out;
 		} else if (ret == -EBUSY) {
-			/* We raced with freeing huge page to buddy, retry. */
-			if (pass++ < 3)
+			/*
+			 * We raced with (possibly temporary) unhandlable
+			 * page, retry.
+			 */
+			if (pass++ < 3) {
+				shake_page(p, 1);
 				goto try_again;
+			}
 			goto out;
 		}
 	}
-- 
2.25.1