On 4/21/21 1:33 AM, HORIGUCHI NAOYA(堀口 直也) wrote: > On Wed, Apr 21, 2021 at 10:03:34AM +0200, Michal Hocko wrote: >> [Cc Naoya] >> >> On Wed 21-04-21 14:02:59, Muchun Song wrote: >>> The possible bad scenario: >>> >>> CPU0: CPU1: >>> >>> gather_surplus_pages() >>> page = alloc_surplus_huge_page() >>> memory_failure_hugetlb() >>> get_hwpoison_page(page) >>> __get_hwpoison_page(page) >>> get_page_unless_zero(page) >>> zero = put_page_testzero(page) >>> VM_BUG_ON_PAGE(!zero, page) >>> enqueue_huge_page(h, page) >>> put_page(page) >>> >>> The refcount can possibly be increased by memory-failure or soft_offline >>> handlers, we can trigger VM_BUG_ON_PAGE and wrongly add the page to the >>> hugetlb pool list. >> >> The hwpoison side of this looks really suspicious to me. It shouldn't >> really touch the reference count of hugetlb pages without being very >> careful (and having hugetlb_lock held). > > I have the same feeling, there is a window where a hugepage is refcounted > during converting from buddy free pages into free hugepage, so refcount > alone is not enough to prevent the race. hugetlb_lock is retaken after > alloc_surplus_huge_page returns, so simply holding hugetlb_lock in > get_hwpoison_page() seems not work. Is there any status bit to show that a > hugepage is just being initialized (not in free hugepage pool or in use)? > It seems we can also race with the code that makes a compound page a hugetlb page. The memory failure code could be called after allocating pages from buddy and before setting compound page DTOR. So, the memory handling code will process it as a compound page. Just thinking that this may not be limited to the hugetlb specific memory failure handling? -- Mike Kravetz