On Tue, Jun 08, 2021 at 09:22:28PM +0800, Yu Xu wrote:
> On 6/8/21 8:00 PM, Kirill A. Shutemov wrote:
> > On Tue, Jun 08, 2021 at 05:22:39PM +0800, Xu Yu wrote:
> > > We notice that a hung task happens in a corner-case but practical
> > > scenario when CONFIG_PREEMPT_NONE is enabled, as follows.
> > >
> > > Process 0                       Process 1                     Process 2..Inf
> > > split_huge_page_to_list
> > >     unmap_page
> > >         split_huge_pmd_address
> > >                                 __migration_entry_wait(head)
> > >                                                               __migration_entry_wait(tail)
> > >     remap_page (roll back)
> > >         remove_migration_ptes
> > >             rmap_walk_anon
> > >                 cond_resched
> > >
> > > Here __migration_entry_wait(tail) occurs in kernel space, e.g.
> > > copy_to_user in fstat, which will immediately fault again without
> > > rescheduling, and thus fully occupy the CPU.
> > >
> > > When there are too many processes performing __migration_entry_wait on
> > > the tail page, remap_page will never be done after cond_resched.
> > >
> > > This patch makes __migration_entry_wait operate on the compound head
> > > page, and thus wait for remap_page to complete, whether the THP is
> > > split successfully or rolled back.
> > >
> > > Note that put_and_wait_on_page_locked helps to drop the page reference
> > > acquired with get_page_unless_zero, as soon as the page is on the wait
> > > queue, before actually waiting. So splitting the THP is only prevented
> > > for a brief interval.
> > >
> > > Fixes: ba98828088ad ("thp: add option to setup migration entries during PMD split")
> > > Suggested-by: Hugh Dickins <hughd@xxxxxxxxxx>
> > > Signed-off-by: Gang Deng <gavin.dg@xxxxxxxxxxxxxxxxx>
> > > Signed-off-by: Xu Yu <xuyu@xxxxxxxxxxxxxxxxx>
> >
> > Looks good to me:
> >
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> >
> > But there's one quirk: if the split succeeds, we effectively wait on the
> > wrong page to be unlocked. And it may take an indefinite time if
> > split_huge_page() was called on the head page.
>
> Inspired by you, I looked into the code, and have a new question (nothing
> to do with this patch).
>
> If we call split_huge_page_to_list on a *tail* page (in fact, I haven't
> seen that used yet),

See the ksm code for an instance.

> mm/huge_memory.c:2666 checks "VM_BUG_ON_PAGE(!PageLocked(head), head);"
> in split_huge_page_to_list(), while
>
> mm/huge_memory.c:2497 does "unlock_page(subpage)", where subpage can
> be the head in this scenario, in __split_huge_page().
>
> My confusion is
> 1) how is the pin on @subpage taken outside split_huge_page_to_list()?
> can we ever get a tail page?

This loop:

	for (i = 0; i < nr; i++) {
		struct page *subpage = head + i;
		if (subpage == page)
			continue;
		unlock_page(subpage);

		/*
		 * Subpages may be freed if there wasn't any mapping
		 * like if add_to_swap() is running on a lru page that
		 * had its mapping zapped. And freeing these pages
		 * requires taking the lru_lock so we do the put_page
		 * of the tail pages after the split is complete.
		 */
		put_page(subpage);
	}

We skip unlocking and unpinning the page split_huge_page() got called for.

>
> 2) the head page is locked outside split_huge_page_to_list(), but
> unlocked in __split_huge_page()?

If called on a tail page, yes.

-- 
 Kirill A. Shutemov
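
For reference, the sketch below shows roughly what __migration_entry_wait()
looks like with the head-page fix acked above applied: the page is resolved
from the migration entry, promoted to its compound head, pinned with
get_page_unless_zero(), and put_and_wait_on_page_locked() then drops that pin
as soon as the task is on the wait queue. This is a simplified reconstruction
from the description in the thread, not a verbatim copy of mm/migrate.c; in
particular, the exact put_and_wait_on_page_locked() signature varies between
kernel versions.

	/*
	 * Simplified sketch of __migration_entry_wait() with the fix under
	 * discussion; builds only in kernel (mm/migrate.c-like) context.
	 */
	#include <linux/mm.h>
	#include <linux/swapops.h>
	#include <linux/migrate.h>
	#include <linux/pagemap.h>
	#include <linux/sched.h>

	void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
				    spinlock_t *ptl)
	{
		pte_t pte;
		swp_entry_t entry;
		struct page *page;

		spin_lock(ptl);
		pte = *ptep;
		if (!is_swap_pte(pte))
			goto out;

		entry = pte_to_swp_entry(pte);
		if (!is_migration_entry(entry))
			goto out;

		page = migration_entry_to_page(entry);
		/*
		 * The fix under discussion: always wait on the compound head,
		 * so a fault on a tail page sleeps until remap_page() has
		 * finished, whether the split succeeded or was rolled back.
		 */
		page = compound_head(page);

		/*
		 * Take a temporary reference; put_and_wait_on_page_locked()
		 * drops it as soon as the task is on the wait queue, so the
		 * THP split is only blocked for a brief interval.
		 */
		if (!get_page_unless_zero(page))
			goto out;
		pte_unmap_unlock(ptep, ptl);
		put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE);
		return;
	out:
		pte_unmap_unlock(ptep, ptl);
	}

Because the head page stays locked until the split, or its rollback, is
complete, the faulting tasks in the scenario above actually sleep instead of
refaulting in a tight loop, which lets Process 0 reach remap_page even on a
non-preemptible kernel.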