On Tue, Aug 31, 2021 at 01:34:04PM +0100, Joao Martins wrote: > On 8/30/21 2:07 PM, Jason Gunthorpe wrote: > > On Fri, Aug 27, 2021 at 07:34:54PM +0100, Joao Martins wrote: > >> On 8/27/21 5:25 PM, Jason Gunthorpe wrote: > >>> On Fri, Aug 27, 2021 at 03:58:13PM +0100, Joao Martins wrote: > >>> > >>>> #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE) > >>>> static int __gup_device_huge(unsigned long pfn, unsigned long addr, > >>>> unsigned long end, unsigned int flags, > >>>> struct page **pages, int *nr) > >>>> { > >>>> - int nr_start = *nr; > >>>> + int refs, nr_start = *nr; > >>>> struct dev_pagemap *pgmap = NULL; > >>>> int ret = 1; > >>>> > >>>> do { > >>>> - struct page *page = pfn_to_page(pfn); > >>>> + struct page *head, *page = pfn_to_page(pfn); > >>>> + unsigned long next = addr + PAGE_SIZE; > >>>> > >>>> pgmap = get_dev_pagemap(pfn, pgmap); > >>>> if (unlikely(!pgmap)) { > >>>> @@ -2252,16 +2265,25 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr, > >>>> ret = 0; > >>>> break; > >>>> } > >>>> - SetPageReferenced(page); > >>>> - pages[*nr] = page; > >>>> - if (unlikely(!try_grab_page(page, flags))) { > >>>> - undo_dev_pagemap(nr, nr_start, flags, pages); > >>>> + > >>>> + head = compound_head(page); > >>>> + /* @end is assumed to be limited at most one compound page */ > >>>> + if (PageHead(head)) > >>>> + next = end; > >>>> + refs = record_subpages(page, addr, next, pages + *nr); > >>>> + > >>>> + SetPageReferenced(head); > >>>> + if (unlikely(!try_grab_compound_head(head, refs, flags))) { > >>>> + if (PageHead(head)) > >>>> + ClearPageReferenced(head); > >>>> + else > >>>> + undo_dev_pagemap(nr, nr_start, flags, pages); > >>>> ret = 0; > >>>> break; > >>> > >>> Why is this special cased for devmap? > >>> > >>> Shouldn't everything processing pud/pmds/etc use the same basic loop > >>> that is similar in idea to the 'for_each_compound_head' scheme in > >>> unpin_user_pages_dirty_lock()? > >>> > >>> Doesn't that work for all the special page type cases here? > >> > >> We are iterating over PFNs to create an array of base pages (regardless of page table > >> type), rather than iterating over an array of pages to work on. > > > > That is part of it, yes, but the slow bit here is to minimally find > > the head pages and do the atomics on them, much like the > > unpin_user_pages_dirty_lock() > > > > I would think this should be designed similar to how things work on > > the unpin side. > > > I don't think it's the same thing. The bit you say 'minimally find the > head pages' carries a visible overhead in unpin_user_pages() as we are > checking each of the pages belongs to the same head page -- because you > can pass an arbritary set of pages. This does have a cost which is not > in gup-fast right now AIUI. Whereas in our gup-fast 'handler' you > already know that you are processing a contiguous chunk of pages. > If anything, we are closer to unpin_user_page_range*() > than unpin_user_pages(). Yes, that is what I mean, it is very similar to the range case as we don't even know that a single compound spans a pud/pmd. So you end up doing the same loop to find the compound boundaries. Under GUP slow we can also aggregate multiple page table entires, eg a split huge page could be procesed as a single 2M range operation even if it is broken to 4K PTEs. > > Switching to similar iteration logic to unpin would look something like > this (still untested): > > for_each_compound_range(index, &page, npages, head, refs) { > pgmap = get_dev_pagemap(pfn + *nr, pgmap); I recall talking to DanW about this and we agreed it was unnecessary here to hold the pgmap and should be deleted. Jason