Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages

On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote:
> On 08/27/2018 06:46 AM, Jerome Glisse wrote:
> > On Mon, Aug 27, 2018 at 09:46:45AM +0200, Michal Hocko wrote:
> >> On Fri 24-08-18 11:08:24, Mike Kravetz wrote:
> >>> Here is an updated patch which does as you suggest above.
> >> [...]
> >>> @@ -1409,6 +1419,32 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >>>  		subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
> >>>  		address = pvmw.address;
> >>>  
> >>> +		if (PageHuge(page)) {
> >>> +			if (huge_pmd_unshare(mm, &address, pvmw.pte)) {
> >>> +				/*
> >>> +				 * huge_pmd_unshare unmapped an entire PMD
> >>> +				 * page.  There is no way of knowing exactly
> >>> +				 * which PMDs may be cached for this mm, so
> >>> +				 * we must flush them all.  start/end were
> >>> +				 * already adjusted above to cover this range.
> >>> +				 */
> >>> +				flush_cache_range(vma, start, end);
> >>> +				flush_tlb_range(vma, start, end);
> >>> +				mmu_notifier_invalidate_range(mm, start, end);
> >>> +
> >>> +				/*
> >>> +				 * The ref count of the PMD page was dropped
> >>> +				 * which is part of the way map counting
> >>> +				 * is done for shared PMDs.  Return 'true'
> >>> +				 * here.  When there is no other sharing,
> >>> +				 * huge_pmd_unshare returns false and we will
> >>> +				 * unmap the actual page and drop map count
> >>> +				 * to zero.
> >>> +				 */
> >>> +				page_vma_mapped_walk_done(&pvmw);
> >>> +				break;
> >>> +			}
> >>
> >> This still calls into notifier while holding the ptl lock. Either I am
> >> missing something or the invalidation is broken in this loop (note also
> >> for other invalidations).
> > 
> > mmu_notifier_invalidate_range() is done with the pt lock held; only the
> > start and end versions need to happen outside the pt lock.
> 
> Hi Jérôme (and anyone else having good understanding of mmu notifier API),
> 
> Michal and I have been looking at backports to stable releases.  If you look
> at the v4.4 version of try_to_unmap_one(), it does not use the
> mmu_notifier_invalidate_range_start/end interfaces. Rather, it uses the
> mmu_notifier_invalidate_page(), passing in the address of the page it
> unmapped.  This is done after releasing the ptl lock.  I'm not even sure if
> this works for huge pages, as it appears some THP supporting code was added
> to try_to_unmap_one() after v4.4.
> 
> But, we were wondering what mmu notifier interface to use in the case where
> try_to_unmap_one() unmaps a shared pmd huge page as addressed in the patch
> above.  In this case, a PUD sized area is effectively unmapped.  In the
> code/patch above we have the invalidate range (start and end as well) take
> the PUD sized area into account.
> 
> What would be the best mmu notifier interface to use where there are no
> start/end calls?
> Or, is the best solution to add the start/end calls as is done in later
> versions of the code?  If that is the suggestion, has there been any change
> in invalidate start/end semantics that we should take into account?

start/end would be the one to add; 4.4 seems broken with respect to THP
and mmu notification. Another solution is to fix the users of mmu
notifier, as there were only a handful of them back then. For instance,
properly adjusting the address to match the first address covered by the
pmd or pud, and passing down the correct page size to
mmu_notifier_invalidate_page(), would allow this to be fixed easily.
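
A minimal sketch of that address adjustment, assuming x86-64 sizes
(2 MiB PMD and 1 GiB PUD with 4 KiB base pages). covering_range() and
struct inv_range are hypothetical names for illustration, not kernel
APIs. The helper rounds an address down to the first address covered by
the pmd or pud and reports the matching end, which is the information a
size-aware invalidate call would need:

```c
#include <assert.h>

/* Assumed x86-64 sizes with 4 KiB base pages; other architectures differ. */
#define SKETCH_PMD_SIZE (1UL << 21) /* 2 MiB */
#define SKETCH_PUD_SIZE (1UL << 30) /* 1 GiB */

/* Hypothetical range type: [start, end) of the area being invalidated. */
struct inv_range {
	unsigned long start; /* first address covered by the mapping */
	unsigned long end;   /* one past the last covered address */
};

/*
 * Round addr down to the start of the naturally aligned block of
 * 'size' bytes that contains it; size must be a power of two.
 */
static struct inv_range covering_range(unsigned long addr, unsigned long size)
{
	struct inv_range r;

	assert(size != 0 && (size & (size - 1)) == 0);
	r.start = addr & ~(size - 1);
	r.end = r.start + size;
	return r;
}
```

For a shared PMD page the relevant size is the PUD size, since
unsharing it removes a whole PUD-sized area from the mm.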

This is OK because, for all its users, try_to_unmap_one() replaces the
pte/pmd/pud with an invalid entry (either a poison, migration or swap
entry) inside the function. Anyone racing would therefore synchronize on
those special entries, which is why it is fine to delay
mmu_notifier_invalidate_page() until after dropping the page table lock.
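
A toy model of that ordering, assuming a single entry and a boolean
standing in for the page table lock. struct toy_pt, toy_unmap() and the
other toy_* names are hypothetical, not kernel code. The special entry is
installed under the lock, so a racing fault that takes the same lock no
longer sees a present mapping, and the notifier call can come after the
unlock:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one page table entry; not kernel code. */
enum toy_entry { TOY_PRESENT, TOY_MIGRATION, TOY_SWAP, TOY_POISON };

struct toy_pt {
	enum toy_entry entry;        /* the pte/pmd/pud being unmapped */
	bool ptl_held;               /* stands in for the page table lock */
	int notified;                /* number of invalidate_page callbacks */
	unsigned long notified_addr; /* address passed to the last callback */
};

/* Stand-in for a secondary-MMU invalidate_page() callback. */
static void toy_invalidate_page(struct toy_pt *pt, unsigned long addr)
{
	assert(!pt->ptl_held); /* runs after the lock is dropped, as in v4.4 */
	pt->notified++;
	pt->notified_addr = addr;
}

/* Mimics the ordering in try_to_unmap_one(): special entry under the
 * lock, notification after unlocking. */
static void toy_unmap(struct toy_pt *pt, unsigned long addr,
		      enum toy_entry special)
{
	assert(special != TOY_PRESENT);
	pt->ptl_held = true;   /* take the page table lock */
	pt->entry = special;   /* install migration/swap/poison entry */
	pt->ptl_held = false;  /* drop the page table lock */
	toy_invalidate_page(pt, addr);
}

/* A racing fault reads the entry under the same lock; a special entry
 * means it must wait or retry instead of using the old page. */
static bool toy_fault_sees_present(struct toy_pt *pt)
{
	bool present;

	pt->ptl_held = true;
	present = (pt->entry == TOY_PRESENT);
	pt->ptl_held = false;
	return present;
}
```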

Adding start/end might be the solution with less code churn, as you
would only need to change try_to_unmap_one().
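
Roughly what the start/end variant would look like, again as a
userspace toy. All toy_* names and SKETCH_* sizes are hypothetical; the
callbacks only mirror the rule that invalidate_range() may run under the
ptl while the start and end calls must not. The range is widened to the
PUD-sized area before start is called, matching what the patch above
does for shared PMDs:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE (1UL << 12) /* assumed 4 KiB base pages */
#define SKETCH_PUD_SIZE  (1UL << 30) /* assumed x86-64 PUD size */

/* Hypothetical stand-in for an mm with one page table lock. */
struct toy_mm {
	bool ptl_held;
	int calls;                /* notifier callbacks seen so far */
	unsigned long start, end; /* range reported to range_start */
};

static void toy_range_start(struct toy_mm *mm, unsigned long s, unsigned long e)
{
	assert(!mm->ptl_held); /* start may sleep, so never under the ptl */
	assert(mm->calls == 0);
	mm->start = s;
	mm->end = e;
	mm->calls++;
}

static void toy_range(struct toy_mm *mm, unsigned long s, unsigned long e)
{
	/* May run with the ptl held; must stay inside [start, end). */
	assert(s >= mm->start && e <= mm->end);
	mm->calls++;
}

static void toy_range_end(struct toy_mm *mm, unsigned long s, unsigned long e)
{
	assert(!mm->ptl_held);
	assert(s == mm->start && e == mm->end);
	mm->calls++;
}

/* Sketch of a start/end-bracketed unmap of one address. */
static void toy_unmap_one(struct toy_mm *mm, unsigned long addr, bool shared_pmd)
{
	/* Unsharing a PMD page drops a whole PUD-sized area: widen first. */
	unsigned long size = shared_pmd ? SKETCH_PUD_SIZE : SKETCH_PAGE_SIZE;
	unsigned long start = addr & ~(size - 1);
	unsigned long end = start + size;

	toy_range_start(mm, start, end);
	mm->ptl_held = true;  /* take the page table lock */
	/* ... replace the entry with a migration/swap/poison entry ... */
	toy_range(mm, start, end);
	mm->ptl_held = false; /* drop the page table lock */
	toy_range_end(mm, start, end);
}
```

Widening before the start call matters because start and end must both
see the full range the secondary MMU is about to lose.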

Cheers,
Jérôme
