The patch titled
     Subject: mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
has been added to the -mm tree.  Its filename is
     mm-hwpoison-set-pagehwpoison-after-taking-page-lock-in-memory_failure_hugetlb.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-hwpoison-set-pagehwpoison-after-taking-page-lock-in-memory_failure_hugetlb.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-hwpoison-set-pagehwpoison-after-taking-page-lock-in-memory_failure_hugetlb.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
Subject: mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()

There is a race condition between memory_failure_hugetlb() and hugetlb
free/demotion, which can cause the PageHWPoison flag to be set on the
wrong page (one that was a hugetlb page when memory_failure() was called,
but had been removed or demoted by the time memory_failure_hugetlb() ran).
This results in the wrong processes being killed.
So set the PageHWPoison flag while holding the page lock.

Link: https://lkml.kernel.org/r/20220309091449.2753904-1-naoya.horiguchi@xxxxxxxxx
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Cc: Miaohe Lin <linmiaohe@xxxxxxxxxx>
Cc: Yang Shi <shy828301@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/memory-failure.c |   27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

--- a/mm/memory-failure.c~mm-hwpoison-set-pagehwpoison-after-taking-page-lock-in-memory_failure_hugetlb
+++ a/mm/memory-failure.c
@@ -1496,24 +1496,11 @@ static int memory_failure_hugetlb(unsign
 	int res;
 	unsigned long page_flags;
 
-	if (TestSetPageHWPoison(head)) {
-		pr_err("Memory failure: %#lx: already hardware poisoned\n",
-		       pfn);
-		res = -EHWPOISON;
-		if (flags & MF_ACTION_REQUIRED)
-			res = kill_accessing_process(current, page_to_pfn(head), flags);
-		return res;
-	}
-
-	num_poisoned_pages_inc();
-
 	if (!(flags & MF_COUNT_INCREASED)) {
 		res = get_hwpoison_page(p, flags);
 		if (!res) {
 			lock_page(head);
 			if (hwpoison_filter(p)) {
-				if (TestClearPageHWPoison(head))
-					num_poisoned_pages_dec();
 				unlock_page(head);
 				return -EOPNOTSUPP;
 			}
@@ -1535,13 +1522,16 @@ static int memory_failure_hugetlb(unsign
 	page_flags = head->flags;
 
 	if (hwpoison_filter(p)) {
-		if (TestClearPageHWPoison(head))
-			num_poisoned_pages_dec();
 		put_page(p);
 		res = -EOPNOTSUPP;
 		goto out;
 	}
 
+	if (TestSetPageHWPoison(head))
+		goto already_hwpoisoned;
+
+	num_poisoned_pages_inc();
+
 	/*
 	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
 	 * simply disable it. In order to make it work properly, we need
@@ -1567,6 +1557,13 @@ static int memory_failure_hugetlb(unsign
 out:
 	unlock_page(head);
 	return res;
+already_hwpoisoned:
+	unlock_page(head);
+	pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
+	res = -EHWPOISON;
+	if (flags & MF_ACTION_REQUIRED)
+		res = kill_accessing_process(current, page_to_pfn(head), flags);
+	return res;
 }
 
 static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
_

Patches currently in -mm which might be from naoya.horiguchi@xxxxxxx are

mm-hwpoison-remove-obsolete-comment.patch
mm-hwpoison-fix-error-page-recovered-but-reported-not-recovered.patch
mm-hwpoison-set-pagehwpoison-after-taking-page-lock-in-memory_failure_hugetlb.patch