On 2/6/22 22:49, Hugh Dickins wrote:
> Page migration of a VM_LOCKED page tends to fail, because when the old
> page is unmapped, it is put on the mlock pagevec with raised refcount,
> which then fails the freeze.
>
> At first I thought this would be fixed by a local mlock_page_drain() at
> the upper rmap_walk() level - which would have nicely batched all the
> munlocks of that page; but tests show that the task can too easily move
> to another cpu, leaving pagevec residue behind which fails the migration.
>
> So try_to_migrate_one() drains the local pagevec after page_remove_rmap()
> from a VM_LOCKED vma; and do the same in try_to_unmap_one(), whose
> TTU_IGNORE_MLOCK users would want the same treatment; and do the same
> in remove_migration_pte() - not important when successfully inserting
> a new page, but necessary when hoping to retry after failure.
>
> Any new pagevec runs the risk of adding a new way of stranding, and we
> might discover other corners where mlock_page_drain() or lru_add_drain()
> would now help. If the mlock pagevec raises doubts, we can easily add a
> sysctl to tune its length to 1, which reverts to synchronous operation.

Not a fan of adding new sysctls like that, as it just pushes the failures
of kernel devs onto poor admins :)

The old pagevec usage deleted by patch 1 was limited to the naturally
larger munlock_vma_pages_range() operation. The new per-cpu based one is
more general, which obviously has its advantages, but it might also bring
new corner cases. So if this turns out to be a big problem, I would
rather go back to the limited-scenario pagevec than add a sysctl?
(Sketches of the freeze check and of such a knob follow below the
quoted diff.)

> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>

Acked-by: Vlastimil Babka <vbabka@xxxxxxx>

> ---
>  mm/migrate.c | 2 ++
>  mm/rmap.c    | 4 ++++
>  2 files changed, 6 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index f4bcf1541b62..e7d0b68d5dcb 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -251,6 +251,8 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
>  			page_add_file_rmap(new, vma, false);
>  		set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
>  	}
> +	if (vma->vm_flags & VM_LOCKED)
> +		mlock_page_drain(smp_processor_id());
>
>  	/* No need to invalidate - it was non-present before */
>  	update_mmu_cache(vma, pvmw.address, pvmw.pte);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 5442a5c97a85..714bfdc72c7b 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1656,6 +1656,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  		 * See Documentation/vm/mmu_notifier.rst
>  		 */
>  		page_remove_rmap(subpage, vma, PageHuge(page));
> +		if (vma->vm_flags & VM_LOCKED)
> +			mlock_page_drain(smp_processor_id());
>  		put_page(page);
>  	}
>
> @@ -1930,6 +1932,8 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
>  		 * See Documentation/vm/mmu_notifier.rst
>  		 */
>  		page_remove_rmap(subpage, vma, PageHuge(page));
> +		if (vma->vm_flags & VM_LOCKED)
> +			mlock_page_drain(smp_processor_id());
>  		put_page(page);
>  	}
>
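As an aside, for anyone not steeped in migration internals: the "freeze"
that the commit message says fails is migration's attempt to take
exclusive ownership of the page's refcount before switching the mapping
over. Roughly paraphrased from mm/migrate.c (not the verbatim code):

	/*
	 * Rough paraphrase of the refcount freeze in mm/migrate.c's
	 * migrate_page_move_mapping(): expected_count is the number of
	 * references migration itself can account for.  A page still
	 * sitting on a per-cpu mlock pagevec carries one extra
	 * reference, so the freeze fails and the attempt is aborted
	 * with -EAGAIN, to be retried later.
	 */
	if (!page_ref_freeze(page, expected_count))
		return -EAGAIN;

This is why draining the pagevec right after page_remove_rmap(), while
still on the cpu that put the page there, is enough to unstick it.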
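And to make Hugh's "sysctl to tune its length to 1" remark concrete,
purely as a sketch - mlock_pvec, mlock_pagevec() and
sysctl_mlock_pvec_len are names I'm assuming here, not code from this
series:

	#include <linux/mm.h>
	#include <linux/pagevec.h>
	#include <linux/percpu.h>

	/* Hypothetical: admin-tunable cap on the per-cpu mlock pagevec. */
	static int sysctl_mlock_pvec_len = PAGEVEC_SIZE;

	static DEFINE_PER_CPU(struct pagevec, mlock_pvec);

	/* Assumed flush helper: mlocks and releases each queued page. */
	static void mlock_pagevec(struct pagevec *pvec);

	void mlock_page(struct page *page)
	{
		struct pagevec *pvec = &get_cpu_var(mlock_pvec);

		get_page(page);
		/*
		 * pagevec_add() returns the space left, 0 when full.
		 * With sysctl_mlock_pvec_len == 1 the second test fires
		 * on every call, flushing immediately - i.e. synchronous
		 * mlock, as the commit message suggests.
		 */
		if (!pagevec_add(pvec, page) ||
		    pagevec_count(pvec) >= sysctl_mlock_pvec_len)
			mlock_pagevec(pvec);
		put_cpu_var(mlock_pvec);
	}

But as said above, I'd rather we avoid the knob entirely.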