Re: [PATCH v4 08/14] mm/gup: grab head page refcount once for group of subpages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 11, 2021 at 04:53:29PM +0100, Joao Martins wrote:
> On 10/8/21 12:54, Jason Gunthorpe wrote:

> > The only optimization that might work here is to grab the head, then
> > compute the extent of tail pages and amalgamate them. Holding a ref on
> > the head also secures the tails.
> 
> How about pmd_page(orig) / pud_page(orig) like what the rest of hugetlb/thp
> checks do? i.e. we would pass pmd_page(orig)/pud_page(orig) to __gup_device_huge()
> as an added @head argument. While keeping the same structure of counting tail pages
> between @addr .. @end if we have a head page.

The right logic is what everything else does:

	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
	refs = record_subpages(page, addr, end, pages + *nr);
	head = try_grab_compound_head(pud_page(orig), refs, flags);

If you can use this, or not, depends entirely on answering the
question of why does  __gup_device_huge() exist at all.

This I don't fully know:

1) As discussed quite a few times now, the entire get_dev_pagemap
   stuff looks usless and should just be deleted. If you care about
   optimizing this I would persue doing that as it will give the
   biggest single win.

2) It breaks up the PUD/PMD into tail pages and scans them all
   Why? Can devmap combine multiple compound_head's into the same PTE?
   Does devmap guarentee that the PUD/PMD points to the head page? (I
   assume no)

But I'm looking at this some more and I see try_get_compound_head() is
reading the compound_head with no locking, just READ_ONCE, so that
must be OK under GUP.

It still seems to me the same generic algorithm should work
everywhere, once we get rid of the get_dev_pagemap

  start_pfn = pud/pmd_pfn() + pud/pmd_page_offset(addr)
  end_pfn = start_pfn + (end - addr) // fixme
  if (THP)
     refs = end_pfn - start_pfn
  if (devmap)
     refs = 1

  do {
     page = pfn_to_page(start_pfn)
     head_page = try_grab_compound_head(page, refs, flags)
     if (pud/pmd_val() != orig)
        err

     npages = 1 << compound_order(head_page)
     npages = min(npages, end_pfn - start_pfn)
     for (i = 0, iter = page; i != npages) {
     	 *pages++ = iter;
         mem_map_next(iter, page, i)
     }

     if (devmap && npages > 2)
         grab_compound_head(head_page, npages - 1, flags)
     start_pfn += npages;
  } while (start_pfn != end_pfn)

Above needs to be cleaned up quite a bit, but is the general idea.

There is an further optimization we can put in where we can know that
'page' is still in a currently grab'd compound (eg last_page+1 = page,
not past compound_order) and defer the refcount work.

> It's interesting how THP (in gup_huge_pmd()) unilaterally computes
> tails assuming pmd_page(orig) is the head page.

I think this is an integral property of THP, probably not devmap/dax
though?

Jason



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux