On 12/9/20 4:40 AM, John Hubbard wrote: > On 12/8/20 9:28 AM, Joao Martins wrote: >> Much like hugetlbfs or THPs, we treat device pagemaps with >> compound pages like the rest of GUP handling of compound pages. >> >> Rather than incrementing the refcount every 4K, we record >> all sub pages and increment by @refs amount *once*. >> >> Performance measured by gup_benchmark improves considerably >> get_user_pages_fast() and pin_user_pages_fast(): >> >> $ gup_benchmark -f /dev/dax0.2 -m 16384 -r 10 -S [-u,-a] -n 512 -w > > "gup_test", now that you're in linux-next, actually. > > (Maybe I'll retrofit that test with getopt_long(), those options are > getting more elaborate.) > :) >> >> (get_user_pages_fast 2M pages) ~75k us -> ~3.6k us >> (pin_user_pages_fast 2M pages) ~125k us -> ~3.8k us > > That is a beautiful result! I'm very motivated to see if this patchset > can make it in, in some form. > Cool! >> >> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx> >> --- >> mm/gup.c | 67 ++++++++++++++++++++++++++++++++++++++++++-------------- >> 1 file changed, 51 insertions(+), 16 deletions(-) >> >> diff --git a/mm/gup.c b/mm/gup.c >> index 98eb8e6d2609..194e6981eb03 100644 >> --- a/mm/gup.c >> +++ b/mm/gup.c >> @@ -2250,22 +2250,68 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, >> } >> #endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */ >> >> + >> +static int record_subpages(struct page *page, unsigned long addr, >> + unsigned long end, struct page **pages) >> +{ >> + int nr; >> + >> + for (nr = 0; addr != end; addr += PAGE_SIZE) >> + pages[nr++] = page++; >> + >> + return nr; >> +} >> + >> #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE) >> -static int __gup_device_huge(unsigned long pfn, unsigned long addr, >> - unsigned long end, unsigned int flags, >> - struct page **pages, int *nr) >> +static int __gup_device_compound_huge(struct dev_pagemap *pgmap, >> + struct page *head, unsigned long sz, > > If this variable survives (I see Jason requested a reorg of this math stuff, > and I also like that idea), then I'd like a slightly better name for "sz". > Yeap. > I was going to suggest one, but then realized that I can't understand how this > works. See below... > >> + unsigned long addr, unsigned long end, >> + unsigned int flags, struct page **pages) >> +{ >> + struct page *page; >> + int refs; >> + >> + if (!(pgmap->flags & PGMAP_COMPOUND)) >> + return -1; > > btw, I'm unhappy with returning -1 here and assigning it later to a refs variable. > (And that will show up even more clearly as an issue if you attempt to make > refs unsigned everywhere!) > Yes true. The usage of @refs = -1 (therefore an int) was to differentiate when we are not in a PGMAP_COMPOUND pgmap (and so for logic to keep as today). Notice that in the PGMAP_COMPOUND case if we fail to grab the head compound page we return 0. > I'm not going to suggest anything because there are a lot of ways to structure > these routines, and I don't want to overly constrain you. Just please don't assign > negative values to any refs variables. > OK. TBH I'm a little afraid this can turn into further complexity if I have to keep the non-compound pgmap around. But I will see how I can adjust this. >> + >> + page = head + ((addr & (sz-1)) >> PAGE_SHIFT); > > If you pass in PMD_SHIFT or PUD_SHIFT for, that's a number-of-bits, isn't it? > Not a size. And if it's not a size, then sz - 1 doesn't work, does it? If it > does work, then better naming might help. I'm probably missing a really > obvious math trick here. You're right. That was a mistake on my end, indeed. But the mistake wouldn't change the logic, as the PageReference bit only applies to the head page. Joao