On 5/18/21 4:12 PM, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>
> When hugetlb page fault (under overcommitting situation) and
> memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
>
>         CPU0:                           CPU1:
>
>                                         gather_surplus_pages()
>                                           page = alloc_surplus_huge_page()
>         memory_failure_hugetlb()
>           get_hwpoison_page(page)
>             __get_hwpoison_page(page)
>               get_page_unless_zero(page)
>                                             zero = put_page_testzero(page)
>                                             VM_BUG_ON_PAGE(!zero, page)
>                                           enqueue_huge_page(h, page)
>                                         put_page(page)
>
> __get_hwpoison_page() only checks the page refcount before taking an
> additional one for memory error handling, which is wrong because there's
> a time window where compound pages have non-zero refcount during
> initialization. So make __get_hwpoison_page() check page status a bit
> more for hugetlb pages.
>
> Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> Reported-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # 5.12+
> ---
>  include/linux/hugetlb.h |  6 ++++++
>  mm/hugetlb.c            | 15 +++++++++++++++
>  mm/memory-failure.c     |  8 +++++++-
>  3 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git v5.13-rc2/include/linux/hugetlb.h v5.13-rc2_patched/include/linux/hugetlb.h
> index b92f25ccef58..790ae618548d 100644
> --- v5.13-rc2/include/linux/hugetlb.h
> +++ v5.13-rc2_patched/include/linux/hugetlb.h
> @@ -149,6 +149,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
>  long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
>  						long freed);
>  bool isolate_huge_page(struct page *page, struct list_head *list);
> +int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
>  void putback_active_hugepage(struct page *page);
>  void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
>  void free_huge_page(struct page *page);
> @@ -339,6 +340,11 @@ static inline bool isolate_huge_page(struct page *page, struct list_head *list)
>  	return false;
>  }
>
> +static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
> +{
> +	return 0;
> +}
> +
>  static inline void putback_active_hugepage(struct page *page)
>  {
>  }
> diff --git v5.13-rc2/mm/hugetlb.c v5.13-rc2_patched/mm/hugetlb.c
> index 95918f410c0f..f138bae3e302 100644
> --- v5.13-rc2/mm/hugetlb.c
> +++ v5.13-rc2_patched/mm/hugetlb.c
> @@ -5847,6 +5847,21 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
>  	return ret;
>  }
>
> +int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
> +{
> +	int ret = 0;
> +
> +	*hugetlb = false;
> +	spin_lock_irq(&hugetlb_lock);
> +	if (PageHeadHuge(page)) {
> +		*hugetlb = true;
> +		if (HPageFreed(page) || HPageMigratable(page))
> +			ret = get_page_unless_zero(page);
> +	}
> +	spin_unlock_irq(&hugetlb_lock);
> +	return ret;
> +}
> +
>  void putback_active_hugepage(struct page *page)
>  {
>  	spin_lock_irq(&hugetlb_lock);
> diff --git v5.13-rc2/mm/memory-failure.c v5.13-rc2_patched/mm/memory-failure.c
> index 85ad98c00fd9..353c6177e489 100644
> --- v5.13-rc2/mm/memory-failure.c
> +++ v5.13-rc2_patched/mm/memory-failure.c
> @@ -959,8 +959,14 @@ static int page_action(struct page_state *ps, struct page *p,
>  static int __get_hwpoison_page(struct page *page)
>  {
>  	struct page *head = compound_head(page);
> +	int ret = 0;
> +	bool hugetlb = false;
> +
> +	ret = get_hwpoison_huge_page(head, &hugetlb);
> +	if (hugetlb)
> +		return ret;
>

Hello Naoya,

Thanks for
your continued efforts.  However, I believe the race still exists.
Unless I am mistaken, it is possible that the page is in the hugetlb
allocation path and races with __get_hwpoison_page() as follows:

CPU0:                                   CPU1:

gather_surplus_pages()
  page = alloc_surplus_huge_page()
    page = alloc_fresh_huge_page()
      page = alloc_buddy_huge_page()
                                        memory_failure_hugetlb()
                                          get_hwpoison_page(page)
                                            __get_hwpoison_page(page)
                                              get_hwpoison_huge_page()
                                              /* Note that PageHuge() is
                                                 false, so hugetlb not set */
                                              PageTransHuge(head) false
      prep_new_huge_page(page)
      /* Now PageHuge() becomes true */
                                              get_page_unless_zero(page)

I am not sure if it is possible to handle this race in the memory error
code.  I cannot think of a way to avoid potentially incrementing the ref
count on a hugetlb page as it is being created.  There is nothing
synchronizing this in the hugetlb code.

When Muchun first proposed a fix for the race, the idea was to catch the
race in the hugetlb code.  Michal suggested that the memory error code be
more careful in modifying ref counts.

I would wait a bit to see if someone has a good idea how this can be done.
We 'may' need to revisit the approach suggested by Muchun.
-- 
Mike Kravetz
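
To make the window in the diagram above concrete, here is a condensed copy
of the two hunks from the quoted patch (not the literal upstream source;
the tail of __get_hwpoison_page() is simplified), with comments added to
mark where a page that is still between alloc_buddy_huge_page() and
prep_new_huge_page() slips past the new check:

/* From the quoted mm/hugetlb.c hunk, condensed; comments are not part of the patch. */
int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
{
	int ret = 0;

	*hugetlb = false;
	spin_lock_irq(&hugetlb_lock);
	/*
	 * If CPU0 is still between alloc_buddy_huge_page() and
	 * prep_new_huge_page(), PageHeadHuge() is false here even though
	 * the page is about to become a hugetlb page, so *hugetlb stays
	 * false and the caller does not treat this as a hugetlb page.
	 */
	if (PageHeadHuge(page)) {
		*hugetlb = true;
		if (HPageFreed(page) || HPageMigratable(page))
			ret = get_page_unless_zero(page);
	}
	spin_unlock_irq(&hugetlb_lock);
	return ret;
}

/* From the quoted mm/memory-failure.c hunk, condensed; the THP handling
 * and tail-page checks of the real function are elided here. */
static int __get_hwpoison_page(struct page *page)
{
	struct page *head = compound_head(page);
	int ret = 0;
	bool hugetlb = false;

	ret = get_hwpoison_huge_page(head, &hugetlb);
	if (hugetlb)
		return ret;

	/*
	 * For the racing page, hugetlb is still false at this point, so
	 * control falls through to the generic path, which can take a
	 * reference on a hugetlb page that CPU0 is still initializing.
	 */
	if (get_page_unless_zero(head))	/* the reference the diagram ends with */
		return 1;
	return 0;
}

In other words, the new check only helps once PageHeadHuge() is already
true; a page that becomes a hugetlb page after hugetlb_lock is dropped is
still pinned through the generic path, which is the scenario shown above.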