There is a potential race between __update_and_free_hugetlb_folio() and
try_memory_failure_hugetlb():

 CPU1                                   CPU2
 __update_and_free_hugetlb_folio        try_memory_failure_hugetlb
                                         spin_lock_irq(&hugetlb_lock);
                                         __get_huge_page_for_hwpoison
                                          folio_test_hugetlb
                                          -- It's still hugetlb folio.
  folio_test_hugetlb_raw_hwp_unreliable
  -- raw_hwp_unreliable flag is not set yet.
                                          folio_set_hugetlb_hwpoison
                                          -- raw_hwp_unreliable flag might be set.
                                         spin_unlock_irq(&hugetlb_lock);
  spin_lock_irq(&hugetlb_lock);
  __folio_clear_hugetlb(folio);
   -- Hugetlb flag is cleared but too late!
  spin_unlock_irq(&hugetlb_lock);

When above race occurs, raw error pages will hit pcplists/buddy. Fix this
issue by deferring folio_test_hugetlb_raw_hwp_unreliable() until
__folio_clear_hugetlb() is done. The raw_hwp_unreliable flag cannot be
set after hugetlb folio flag is cleared.

Fixes: 32c877191e02 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap")
Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
---
 mm/hugetlb.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9155144a654c..3d65b68cf78f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1705,13 +1705,6 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
 
-	/*
-	 * If we don't know which subpages are hwpoisoned, we can't free
-	 * the hugepage, so it's leaked intentionally.
-	 */
-	if (folio_test_hugetlb_raw_hwp_unreliable(folio))
-		return;
-
 	/*
 	 * If folio is not vmemmap optimized (!clear_flag), then the folio
 	 * is no longer identified as a hugetlb page.  hugetlb_vmemmap_restore_folio
@@ -1739,6 +1732,13 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 		spin_unlock_irq(&hugetlb_lock);
 	}
 
+	/*
+	 * If we don't know which subpages are hwpoisoned, we can't free
+	 * the hugepage, so it's leaked intentionally.
+	 */
+	if (folio_test_hugetlb_raw_hwp_unreliable(folio))
+		return;
+
 	/*
 	 * Move PageHWPoison flag from head page to the raw error pages,
 	 * which makes any healthy subpages reusable.
-- 
2.33.0
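
Illustration only, not part of the patch: the ordering argument above can be
sketched as a small single-threaded userspace model. Everything here is
hypothetical (struct model_folio, model_memory_failure(), model_free_path()
and the model_lock_held flag do not exist in the kernel). The model assumes
the memory-failure side always sets raw_hwp_unreliable, whereas the changelog
only says it "might" be set, and it replays the CPU2 steps at a fixed point
instead of running real threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hugetlb folio; not a kernel structure. */
struct model_folio {
	bool is_hugetlb;         /* models the hugetlb folio flag */
	bool raw_hwp_unreliable; /* models the raw_hwp_unreliable flag */
};

/* Models hugetlb_lock as a plain flag; the model is single-threaded. */
static bool model_lock_held;

static void model_lock(void)   { assert(!model_lock_held); model_lock_held = true; }
static void model_unlock(void) { assert(model_lock_held);  model_lock_held = false; }

/*
 * Models the memory-failure side (CPU2 in the diagram): under the lock,
 * the flag is set only if the folio is still identified as hugetlb.
 * Returns true if the flag was set.
 */
static bool model_memory_failure(struct model_folio *f)
{
	bool marked = false;

	model_lock();
	if (f->is_hugetlb) {
		f->raw_hwp_unreliable = true; /* assumption: always set here */
		marked = true;
	}
	model_unlock();
	return marked;
}

/*
 * Models the free path (CPU1 in the diagram).
 * check_after_clear: false = ordering before the patch, true = after it.
 * hwpoison_races_in: replay CPU2 in the window before the lock is taken.
 * Returns true if the folio would be handed back to the page allocator.
 */
static bool model_free_path(struct model_folio *f, bool check_after_clear,
			    bool hwpoison_races_in)
{
	/* Old ordering: test the flag before the hugetlb flag is cleared. */
	if (!check_after_clear && f->raw_hwp_unreliable)
		return false; /* leaked intentionally */

	if (hwpoison_races_in)
		model_memory_failure(f);

	model_lock();
	f->is_hugetlb = false; /* models __folio_clear_hugetlb() */
	model_unlock();

	/* New ordering: test the flag only after the hugetlb flag is gone. */
	if (check_after_clear && f->raw_hwp_unreliable)
		return false; /* leaked intentionally */

	/* From here on a late memory-failure event can no longer set it. */
	assert(!model_memory_failure(f));
	return true;
}
```

With the old ordering, model_free_path(&f, false, true) returns true even
though raw_hwp_unreliable ended up set, which corresponds to raw error pages
reaching pcplists/buddy. With the new ordering, model_free_path(&f, true, true)
returns false and the folio is leaked as intended.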