Re: [PATCH v4 08/14] mm/gup: grab head page refcount once for group of subpages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 31, 2021 at 01:34:04PM +0100, Joao Martins wrote:
> On 8/30/21 2:07 PM, Jason Gunthorpe wrote:
> > On Fri, Aug 27, 2021 at 07:34:54PM +0100, Joao Martins wrote:
> >> On 8/27/21 5:25 PM, Jason Gunthorpe wrote:
> >>> On Fri, Aug 27, 2021 at 03:58:13PM +0100, Joao Martins wrote:
> >>>
> >>>>  #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
> >>>>  static int __gup_device_huge(unsigned long pfn, unsigned long addr,
> >>>>  			     unsigned long end, unsigned int flags,
> >>>>  			     struct page **pages, int *nr)
> >>>>  {
> >>>> -	int nr_start = *nr;
> >>>> +	int refs, nr_start = *nr;
> >>>>  	struct dev_pagemap *pgmap = NULL;
> >>>>  	int ret = 1;
> >>>>  
> >>>>  	do {
> >>>> -		struct page *page = pfn_to_page(pfn);
> >>>> +		struct page *head, *page = pfn_to_page(pfn);
> >>>> +		unsigned long next = addr + PAGE_SIZE;
> >>>>  
> >>>>  		pgmap = get_dev_pagemap(pfn, pgmap);
> >>>>  		if (unlikely(!pgmap)) {
> >>>> @@ -2252,16 +2265,25 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
> >>>>  			ret = 0;
> >>>>  			break;
> >>>>  		}
> >>>> -		SetPageReferenced(page);
> >>>> -		pages[*nr] = page;
> >>>> -		if (unlikely(!try_grab_page(page, flags))) {
> >>>> -			undo_dev_pagemap(nr, nr_start, flags, pages);
> >>>> +
> >>>> +		head = compound_head(page);
> >>>> +		/* @end is assumed to be limited at most one compound page */
> >>>> +		if (PageHead(head))
> >>>> +			next = end;
> >>>> +		refs = record_subpages(page, addr, next, pages + *nr);
> >>>> +
> >>>> +		SetPageReferenced(head);
> >>>> +		if (unlikely(!try_grab_compound_head(head, refs, flags))) {
> >>>> +			if (PageHead(head))
> >>>> +				ClearPageReferenced(head);
> >>>> +			else
> >>>> +				undo_dev_pagemap(nr, nr_start, flags, pages);
> >>>>  			ret = 0;
> >>>>  			break;
> >>>
> >>> Why is this special cased for devmap?
> >>>
> >>> Shouldn't everything processing pud/pmds/etc use the same basic loop
> >>> that is similar in idea to the 'for_each_compound_head' scheme in
> >>> unpin_user_pages_dirty_lock()?
> >>>
> >>> Doesn't that work for all the special page type cases here?
> >>
> >> We are iterating over PFNs to create an array of base pages (regardless of page table
> >> type), rather than iterating over an array of pages to work on. 
> > 
> > That is part of it, yes, but the slow bit here is to minimally find
> > the head pages and do the atomics on them, much like the
> > unpin_user_pages_dirty_lock()
> > 
> > I would think this should be designed similar to how things work on
> > the unpin side.
> > 
> I don't think it's the same thing. The bit you say 'minimally find the
> head pages' carries a visible overhead in unpin_user_pages() as we are
> checking each of the pages belongs to the same head page -- because you
> can pass an arbritary set of pages. This does have a cost which is not
> in gup-fast right now AIUI. Whereas in our gup-fast 'handler' you
> already know that you are processing a contiguous chunk of pages.
> If anything, we are closer to unpin_user_page_range*()
> than unpin_user_pages().

Yes, that is what I mean, it is very similar to the range case as we
don't even know that a single compound spans a pud/pmd. So you end up
doing the same loop to find the compound boundaries.

Under GUP slow we can also aggregate multiple page table entires, eg a
split huge page could be procesed as a single 2M range operation even
if it is broken to 4K PTEs.
> 
> Switching to similar iteration logic to unpin would look something like
> this (still untested):
> 
>         for_each_compound_range(index, &page, npages, head, refs) {
>                 pgmap = get_dev_pagemap(pfn + *nr, pgmap);

I recall talking to DanW about this and we agreed it was unnecessary
here to hold the pgmap and should be deleted.

Jason




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux