On Wed, 2 Jun 2021, Xu Yu wrote:

> We notice that hung task happens in a corner but practical scenario when
> CONFIG_PREEMPT_NONE is enabled, as follows.
>
> Process 0                       Process 1                     Process 2..Inf
> split_huge_page_to_list
>     unmap_page
>         split_huge_pmd_address
>                                 __migration_entry_wait(head)
>                                                               __migration_entry_wait(tail)
>     remap_page (roll back)
>         remove_migration_ptes
>             rmap_walk_anon
>                 cond_resched
>
> Where __migration_entry_wait(tail) occurs in kernel space, e.g.,
> copy_to_user, which will immediately fault again without rescheduling,
> and thus fully occupy the cpu.
>
> When there are too many processes performing __migration_entry_wait on
> the tail page, remap_page will never be done after cond_resched.
>
> This relaxes __migration_entry_wait on the tail page, thus giving
> remap_page a chance to complete.
>
> Signed-off-by: Gang Deng <gavin.dg@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Xu Yu <xuyu@xxxxxxxxxxxxxxxxx>

Well caught: you're absolutely right that there's a bug there.

But isn't cond_resched() just papering over the real bug, and what it
should do is a "page = compound_head(page);" before the
get_page_unless_zero()?

How does that work out in your testing?

Hugh

> ---
>  mm/migrate.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index b234c3f3acb7..df2dc39fe566 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -301,8 +301,11 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>  	 * is zero; but we must not call put_and_wait_on_page_locked() without
>  	 * a ref. Use get_page_unless_zero(), and just fault again if it fails.
>  	 */
> -	if (!get_page_unless_zero(page))
> -		goto out;
> +	if (!get_page_unless_zero(page)) {
> +		pte_unmap_unlock(ptep, ptl);
> +		cond_resched();
> +		return;
> +	}
>  	pte_unmap_unlock(ptep, ptl);
>  	put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE);
>  	return;
> --
> 2.20.1.2432.ga663e714