Re: [PATCH v6 1/2] mm,hwpoison: fix race with hugetlb page allocation

"Luck, Tony" <tony.luck@xxxxxxxxx> · Wed, 11 Aug 2021 21:28:13 -0700

On Fri, Jun 04, 2021 at 08:36:31AM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> 
> When hugetlb page fault (under overcommitting situation) and
> memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> 
>     CPU0:                           CPU1:
> 
>                                     gather_surplus_pages()
>                                       page = alloc_surplus_huge_page()
>     memory_failure_hugetlb()
>       get_hwpoison_page(page)
>         __get_hwpoison_page(page)
>           get_page_unless_zero(page)
>                                       zero = put_page_testzero(page)
>                                       VM_BUG_ON_PAGE(!zero, page)
>                                       enqueue_huge_page(h, page)
>       put_page(page)
> 
> __get_hwpoison_page() only checks the page refcount before taking an
> additional one for memory error handling, which is not enough because
> there's a time window where compound pages have non-zero refcount during
> hugetlb page initialization.
> 
> So make __get_hwpoison_page() check page status a bit more for hugetlb
> pages with get_hwpoison_huge_page(). Checking hugetlb-specific flags
> under hugetlb_lock makes sure that the hugetlb page is not transitive.
> It's notable that another new function, HWPoisonHandlable(), is helpful
> to prevent a race against other transitive page states (like a generic
> compound page just before PageHuge becomes true).

I'm seeing some strange results when doing a simple injection/recovery.

Current upstream often fails to offline the page with messages like:

	"high-order kernel page"
or
	"unknown page"

Things were working in v5.12. Broken in v5.13.

Bisect says that:

25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")

is the culprit (though it is possible that there is more than one
issue ... failure symptoms changed a bit during the bisection).

This commit doesn't revert automatically from upstream. But it
does revert from v5.13. Running with this reverted from v5.13
gives kernel that recovers normally[1] from hundreds of consecutive
error injections.

-Tony

[1] Almost normally. My test catches SIGBUS and prints the virtual
address from the siginfo_t structure. Sometimes the address is correct
other times it is NULL.