On Tue, Jun 08, 2021 at 05:22:39PM +0800, Xu Yu wrote: > We notice that hung task happens in a conner but practical scenario when > CONFIG_PREEMPT_NONE is enabled, as follows. > > Process 0 Process 1 Process 2..Inf > split_huge_page_to_list > unmap_page > split_huge_pmd_address > __migration_entry_wait(head) > __migration_entry_wait(tail) > remap_page (roll back) > remove_migration_ptes > rmap_walk_anon > cond_resched > > Where __migration_entry_wait(tail) is occurred in kernel space, e.g., > copy_to_user in fstat, which will immediately fault again without > rescheduling, and thus occupy the cpu fully. > > When there are too many processes performing __migration_entry_wait on > tail page, remap_page will never be done after cond_resched. > > This makes __migration_entry_wait operate on the compound head page, > thus waits for remap_page to complete, whether the THP is split > successfully or roll back. > > Note that put_and_wait_on_page_locked helps to drop the page reference > acquired with get_page_unless_zero, as soon as the page is on the wait > queue, before actually waiting. So splitting the THP is only prevented > for a brief interval. > > Fixes: ba98828088ad ("thp: add option to setup migration entries during PMD split") > Suggested-by: Hugh Dickins <hughd@xxxxxxxxxx> > Signed-off-by: Gang Deng <gavin.dg@xxxxxxxxxxxxxxxxx> > Signed-off-by: Xu Yu <xuyu@xxxxxxxxxxxxxxxxx> Looks good to me: Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> But there's one quirk: if split succeed we effectively wait on wrong page to be unlocked. And it may take indefinite time if split_huge_page() was called on the head page. Maybe we should consider waking up head waiter on head page, even if it is still locked after split? Something like this (untested): diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 63ed6b25deaa..f79a38e21e53 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2535,6 +2535,9 @@ static void __split_huge_page(struct page *page, struct list_head *list, */ put_page(subpage); } + + if (page == head) + wake_up_page_bit(page, PG_locked); } int total_mapcount(struct page *page) -- Kirill A. Shutemov