The patch titled Subject: mm/hwpoison: disable pcp for page_handle_poison() has been added to the -mm tree. Its filename is mm-hwpoison-disable-pcp-for-page_handle_poison.patch This patch should soon appear at https://ozlabs.org/~akpm/mmots/broken-out/mm-hwpoison-disable-pcp-for-page_handle_poison.patch and later at https://ozlabs.org/~akpm/mmotm/broken-out/mm-hwpoison-disable-pcp-for-page_handle_poison.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> Subject: mm/hwpoison: disable pcp for page_handle_poison() Recent changes by patch "mm/page_alloc: allow high-order pages to be stored on the per-cpu lists" makes kernels determine whether to use pcp by pcp_allowed_order(), which breaks soft-offline for hugetlb pages. Soft-offline dissolves a migration source page, then removes it from buddy free list, so it's assumed that any subpage of the soft-offlined hugepage are recognized as a buddy page just after returning from dissolve_free_huge_page(). pcp_allowed_order() returns true for hugetlb, so this assumption is no longer true. So disable pcp during dissolve_free_huge_page() and take_page_off_buddy() to prevent soft-offlined hugepages from linking to pcp lists. Soft-offline should not be common events so the impact on performance should be minimal. And I think that the optimization of Mel's patch could benefit to hugetlb so zone_pcp_disable() is called only in hwpoison context. Link: https://lkml.kernel.org/r/20210617092626.291006-1-nao.horiguchi@xxxxxxxxx Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> Acked-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx> Cc: David Hildenbrand <david@xxxxxxxxxx> Cc: Oscar Salvador <osalvador@xxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- mm/memory-failure.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) --- a/mm/memory-failure.c~mm-hwpoison-disable-pcp-for-page_handle_poison +++ a/mm/memory-failure.c @@ -65,6 +65,19 @@ int sysctl_memory_failure_recovery __rea atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0); +static bool __page_handle_poison(struct page *page) +{ + bool ret; + + zone_pcp_disable(page_zone(page)); + ret = dissolve_free_huge_page(page); + if (!ret) + ret = take_page_off_buddy(page); + zone_pcp_enable(page_zone(page)); + + return ret; +} + static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release) { if (hugepage_or_freepage) { @@ -72,7 +85,7 @@ static bool page_handle_poison(struct pa * Doing this check for free pages is also fine since dissolve_free_huge_page * returns 0 for non-hugetlb pages as well. */ - if (dissolve_free_huge_page(page) || !take_page_off_buddy(page)) + if (!__page_handle_poison(page)) /* * We could fail to take off the target page from buddy * for example due to racy page allocation, but that's @@ -842,7 +855,7 @@ static int me_huge_page(struct page *p, */ if (PageAnon(hpage)) put_page(hpage); - if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) { + if (__page_handle_poison(p)) { page_ref_inc(p); res = MF_RECOVERED; } @@ -1287,7 +1300,7 @@ static int memory_failure_hugetlb(unsign } unlock_page(head); res = MF_FAILED; - if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) { + if (__page_handle_poison(p)) { page_ref_inc(p); res = MF_RECOVERED; } _ Patches currently in -mm which might be from naoya.horiguchi@xxxxxxx are mm-hwpoison-do-not-lock-page-again-when-me_huge_page-successfully-recovers.patch mm-hwpoison-disable-pcp-for-page_handle_poison.patch mmhwpoison-send-sigbus-with-error-virutal-address.patch mmhwpoison-send-sigbus-with-error-virutal-address-fix.patch mmhwpoison-make-get_hwpoison_page-call-get_any_page.patch