On Tue, May 25, 2021 at 09:36:05AM +0200, Oscar Salvador wrote:
> On Thu, May 20, 2021 at 07:17:17AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > So I think of inserting the check to comply with the assumption of
> > get_hwpoison_huge_page() like below:
> >
> >         ret = get_hwpoison_huge_page(head, &hugetlb);
> >         if (hugetlb)
> >                 return ret;
> >
> >         if (!PageLRU(head) && !__PageMovable(head))
> >                 return 0;
> >
> >         if (PageTransHuge(head)) {
> >                 ...
> >         }
> >
> >         if (get_page_unless_zero(head)) {
> >                 ...
> >         }
> >
> >         return 0;
>
> Hi Naoya,
>
> would you mind posting a complete draft of what it would look like?
> I am having a hard time picturing it.

OK, here's the current draft.

Thanks,
Naoya Horiguchi

---
From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
Date: Tue, 18 May 2021 23:49:18 +0900
Subject: [PATCH] mm,hwpoison: fix race with hugetlb page allocation

When hugetlb page fault (under overcommitting situation) and
memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:

    CPU0:                           CPU1:

                                    gather_surplus_pages()
                                      page = alloc_surplus_huge_page()
    memory_failure_hugetlb()
      get_hwpoison_page(page)
        __get_hwpoison_page(page)
          get_page_unless_zero(page)
                                      zero = put_page_testzero(page)
                                      VM_BUG_ON_PAGE(!zero, page)
                                      enqueue_huge_page(h, page)
      put_page(page)

__get_hwpoison_page() only checks the page refcount before taking an
additional one for memory error handling, which is wrong because there's
a time window where compound pages have non-zero refcount during
initialization.

So make __get_hwpoison_page() check page status a bit more for hugetlb pages.

Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
Reported-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx # 5.12+
---
 include/linux/hugetlb.h |  6 ++++++
 mm/hugetlb.c            | 15 +++++++++++++++
 mm/memory-failure.c     | 11 ++++++++++-
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b92f25ccef58..790ae618548d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -149,6 +149,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 						long freed);
 bool isolate_huge_page(struct page *page, struct list_head *list);
+int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
 void putback_active_hugepage(struct page *page);
 void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
 void free_huge_page(struct page *page);
@@ -339,6 +340,11 @@ static inline bool isolate_huge_page(struct page *page, struct list_head *list)
 	return false;
 }
 
+static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
+{
+	return 0;
+}
+
 static inline void putback_active_hugepage(struct page *page)
 {
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 95918f410c0f..f138bae3e302 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5847,6 +5847,21 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
 	return ret;
 }
 
+int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
+{
+	int ret = 0;
+
+	*hugetlb = false;
+	spin_lock_irq(&hugetlb_lock);
+	if (PageHeadHuge(page)) {
+		*hugetlb = true;
+		if (HPageFreed(page) || HPageMigratable(page))
+			ret = get_page_unless_zero(page);
+	}
+	spin_unlock_irq(&hugetlb_lock);
+	return ret;
+}
+
 void putback_active_hugepage(struct page *page)
 {
 	spin_lock_irq(&hugetlb_lock);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 85ad98c00fd9..4c264c4090d7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -959,8 +959,17 @@ static int page_action(struct page_state *ps, struct page *p,
 static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
+	int ret = 0;
+	bool hugetlb = false;
+
+	ret = get_hwpoison_huge_page(head, &hugetlb);
+	if (hugetlb)
+		return ret;
+
+	if (!PageLRU(head) && !__PageMovable(head))
+		return 0;
 
-	if (!PageHuge(head) && PageTransHuge(head)) {
+	if (PageTransHuge(head)) {
 		/*
 		 * Non anonymous thp exists only in allocation/free time. We
 		 * can't handle such a case correctly, so let's give it up.
-- 
2.25.1
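
For anyone who wants to poke at the logic outside the kernel, here is a rough userspace sketch of the check that get_hwpoison_huge_page() performs. It is only a model under stated assumptions, not kernel code: struct model_page, the model_* functions and the atomic_flag-based lock are all hypothetical stand-ins for struct page, the real helpers and hugetlb_lock. The point it illustrates is that a hugetlb head page whose refcount is non-zero but which is neither "freed" nor "migratable" (i.e. still being set up) is not pinned.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct model_page {
	atomic_int refcount;	/* models the page refcount */
	bool head_huge;		/* models PageHeadHuge() */
	bool freed;		/* models HPageFreed() */
	bool migratable;	/* models HPageMigratable() */
};

/* Models hugetlb_lock as a trivial spinlock (assumption for this sketch). */
static atomic_flag model_hugetlb_lock = ATOMIC_FLAG_INIT;

static void model_lock(void)
{
	while (atomic_flag_test_and_set(&model_hugetlb_lock))
		;	/* spin until the flag is released */
}

static void model_unlock(void)
{
	atomic_flag_clear(&model_hugetlb_lock);
}

/* Take a reference only if the count is not already zero. */
static int model_get_page_unless_zero(struct model_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return 1;
	}
	return 0;
}

/*
 * Mirrors the structure of get_hwpoison_huge_page() in the draft:
 * report whether the page is a hugetlb head page, and pin it only
 * when it is in a state the hugetlb code has finished setting up.
 */
static int model_get_hwpoison_huge_page(struct model_page *page, bool *hugetlb)
{
	int ret = 0;

	assert(page != NULL && hugetlb != NULL);

	*hugetlb = false;
	model_lock();
	if (page->head_huge) {
		*hugetlb = true;
		if (page->freed || page->migratable)
			ret = model_get_page_unless_zero(page);
	}
	model_unlock();
	return ret;
}
```

Calling model_get_hwpoison_huge_page() on a page with head_huge set, refcount 1 and both state flags clear leaves the refcount untouched and returns 0, which corresponds to the window between alloc_surplus_huge_page() and enqueue_huge_page() in the race above.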