Re: [PATCH v4 08/14] mm/gup: grab head page refcount once for group of subpages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/30/21 2:07 PM, Jason Gunthorpe wrote:
> On Fri, Aug 27, 2021 at 07:34:54PM +0100, Joao Martins wrote:
>> On 8/27/21 5:25 PM, Jason Gunthorpe wrote:
>>> On Fri, Aug 27, 2021 at 03:58:13PM +0100, Joao Martins wrote:
>>>
>>>>  #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
>>>>  static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>>>>  			     unsigned long end, unsigned int flags,
>>>>  			     struct page **pages, int *nr)
>>>>  {
>>>> -	int nr_start = *nr;
>>>> +	int refs, nr_start = *nr;
>>>>  	struct dev_pagemap *pgmap = NULL;
>>>>  	int ret = 1;
>>>>  
>>>>  	do {
>>>> -		struct page *page = pfn_to_page(pfn);
>>>> +		struct page *head, *page = pfn_to_page(pfn);
>>>> +		unsigned long next = addr + PAGE_SIZE;
>>>>  
>>>>  		pgmap = get_dev_pagemap(pfn, pgmap);
>>>>  		if (unlikely(!pgmap)) {
>>>> @@ -2252,16 +2265,25 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>>>>  			ret = 0;
>>>>  			break;
>>>>  		}
>>>> -		SetPageReferenced(page);
>>>> -		pages[*nr] = page;
>>>> -		if (unlikely(!try_grab_page(page, flags))) {
>>>> -			undo_dev_pagemap(nr, nr_start, flags, pages);
>>>> +
>>>> +		head = compound_head(page);
>>>> +		/* @end is assumed to be limited at most one compound page */
>>>> +		if (PageHead(head))
>>>> +			next = end;
>>>> +		refs = record_subpages(page, addr, next, pages + *nr);
>>>> +
>>>> +		SetPageReferenced(head);
>>>> +		if (unlikely(!try_grab_compound_head(head, refs, flags))) {
>>>> +			if (PageHead(head))
>>>> +				ClearPageReferenced(head);
>>>> +			else
>>>> +				undo_dev_pagemap(nr, nr_start, flags, pages);
>>>>  			ret = 0;
>>>>  			break;
>>>
>>> Why is this special cased for devmap?
>>>
>>> Shouldn't everything processing pud/pmds/etc use the same basic loop
>>> that is similar in idea to the 'for_each_compound_head' scheme in
>>> unpin_user_pages_dirty_lock()?
>>>
>>> Doesn't that work for all the special page type cases here?
>>
>> We are iterating over PFNs to create an array of base pages (regardless of page table
>> type), rather than iterating over an array of pages to work on. 
> 
> That is part of it, yes, but the slow bit here is to minimally find
> the head pages and do the atomics on them, much like the
> unpin_user_pages_dirty_lock()
> 
> I would think this should be designed similar to how things work on
> the unpin side.
> 
I don't think it's the same thing. The bit you say 'minimally find the
head pages' carries a visible overhead in unpin_user_pages() as we are
checking each of the pages belongs to the same head page -- because you
can pass an arbritary set of pages. This does have a cost which is not
in gup-fast right now AIUI. Whereas in our gup-fast 'handler' you
already know that you are processing a contiguous chunk of pages.
If anything, we are closer to unpin_user_page_range*()
than unpin_user_pages().

> Sweep the page tables to find a proper start/end - eg even if a
> compound is spread across multiple pte/pmd/pud/etc we should find a
> linear range of starting PFN (ie starting page*) and npages across as
> much of the page tables as we can manage. This is the same as where
> things end up in the unpin case where all the contiguous PFNs are
> grouped togeher into a range.
> 
> Then 'assign' that range to the output array which requires walking
> over each compount_head in the range and pinning it, then writing out
> the tail pages to the output struct page array.
> 
> And this approach should apply universally no matter what is under the
> pte's - ie huge pages, THPs and devmaps should all be treated the same
> way. Currently each case is different, like above which is unique to
> device_huge.
> 
Only devmap gup-fast is different IIUC.

Switching to similar iteration logic to unpin would look something like
this (still untested):

        for_each_compound_range(index, &page, npages, head, refs) {
                pgmap = get_dev_pagemap(pfn + *nr, pgmap);
                if (unlikely(!pgmap)) {
                        undo_dev_pagemap(nr, nr_start, flags, pages);
                        ret = 0;
                        break;
                }

                SetPageReferenced(head);
                if (unlikely(!try_grab_compound_head(head, refs, flags))) {
                        if (PageHead(head))
                                ClearPageReferenced(head);
                        else
                                undo_dev_pagemap(nr, nr_start, flags, pages);
                        ret = 0;
                        break;
                }

                record_subpages(page + *nr, addr,
                                addr + (refs << PAGE_SHIFT), pages + *nr);
                *(nr) += refs;
		addr += (refs << PAGE_SHIFT);
        }


But it looks to be a tidbit more complex and not really aligning with the
rest of gup-fast.

All in all, I am dealing with the fact that 1) devmap pmds/puds may not
be represented with compound pages and 2) we temporarily grab dev_pagemap reference
prior to pinning the page. Those two items is what makes this different than THPs/HugeTLB
(which do have the same logic). And thus it's what lead me to *slightly* improve
gup_device_huge().



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux