On Mon, Oct 11, 2021 at 04:53:29PM +0100, Joao Martins wrote: > On 10/8/21 12:54, Jason Gunthorpe wrote: > > The only optimization that might work here is to grab the head, then > > compute the extent of tail pages and amalgamate them. Holding a ref on > > the head also secures the tails. > > How about pmd_page(orig) / pud_page(orig) like what the rest of hugetlb/thp > checks do? i.e. we would pass pmd_page(orig)/pud_page(orig) to __gup_device_huge() > as an added @head argument. While keeping the same structure of counting tail pages > between @addr .. @end if we have a head page. The right logic is what everything else does: page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); refs = record_subpages(page, addr, end, pages + *nr); head = try_grab_compound_head(pud_page(orig), refs, flags); If you can use this, or not, depends entirely on answering the question of why does __gup_device_huge() exist at all. This I don't fully know: 1) As discussed quite a few times now, the entire get_dev_pagemap stuff looks usless and should just be deleted. If you care about optimizing this I would persue doing that as it will give the biggest single win. 2) It breaks up the PUD/PMD into tail pages and scans them all Why? Can devmap combine multiple compound_head's into the same PTE? Does devmap guarentee that the PUD/PMD points to the head page? (I assume no) But I'm looking at this some more and I see try_get_compound_head() is reading the compound_head with no locking, just READ_ONCE, so that must be OK under GUP. It still seems to me the same generic algorithm should work everywhere, once we get rid of the get_dev_pagemap start_pfn = pud/pmd_pfn() + pud/pmd_page_offset(addr) end_pfn = start_pfn + (end - addr) // fixme if (THP) refs = end_pfn - start_pfn if (devmap) refs = 1 do { page = pfn_to_page(start_pfn) head_page = try_grab_compound_head(page, refs, flags) if (pud/pmd_val() != orig) err npages = 1 << compound_order(head_page) npages = min(npages, end_pfn - start_pfn) for (i = 0, iter = page; i != npages) { *pages++ = iter; mem_map_next(iter, page, i) } if (devmap && npages > 2) grab_compound_head(head_page, npages - 1, flags) start_pfn += npages; } while (start_pfn != end_pfn) Above needs to be cleaned up quite a bit, but is the general idea. There is an further optimization we can put in where we can know that 'page' is still in a currently grab'd compound (eg last_page+1 = page, not past compound_order) and defer the refcount work. > It's interesting how THP (in gup_huge_pmd()) unilaterally computes > tails assuming pmd_page(orig) is the head page. I think this is an integral property of THP, probably not devmap/dax though? Jason