On Tue, Jun 13, 2023 at 05:53:45PM -0400, Peter Xu wrote:
> The acceleration of THP was done with ctx.page_mask, however it'll be
> ignored if **pages is non-NULL.
>
> The old optimization was introduced in 2013 in 240aadeedc4a ("mm:
> accelerate mm_populate() treatment of THP pages"). It didn't explain why
> we can't optimize the **pages non-NULL case. It's possible that at that
> time the major goal was for mm_populate() which should be enough back
> then.
>
> Optimize thp for all cases, by properly looping over each subpage, doing
> cache flushes, and boost refcounts / pincounts where needed in one go.
>
> This can be verified using gup_test below:
>
>   # chrt -f 1 ./gup_test -m 512 -t -L -n 1024 -r 10
>
> Before:  13992.50 ( +-8.75%)
> After:     378.50 (+-69.62%)
>
> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> ---
>  mm/gup.c | 36 +++++++++++++++++++++++++++++-------
>  1 file changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index a2d1b3c4b104..cdabc8ea783b 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1210,16 +1210,38 @@ static long __get_user_pages(struct mm_struct *mm,
>  			goto out;
>  		}
>  next_page:
> -		if (pages) {
> -			pages[i] = page;
> -			flush_anon_page(vma, page, start);
> -			flush_dcache_page(page);
> -			ctx.page_mask = 0;
> -		}
> -
>  		page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
>  		if (page_increm > nr_pages)
>  			page_increm = nr_pages;
> +
> +		if (pages) {
> +			struct page *subpage;
> +			unsigned int j;
> +
> +			/*
> +			 * This must be a large folio (and doesn't need to
> +			 * be the whole folio; it can be part of it), do
> +			 * the refcount work for all the subpages too.
> +			 * Since we already hold refcount on the head page,
> +			 * it should never fail.
> +			 *
> +			 * NOTE: here the page may not be the head page
> +			 * e.g. when start addr is not thp-size aligned.
> +			 */
> +			if (page_increm > 1)
> +				WARN_ON_ONCE(
> +				    try_grab_folio(compound_head(page),
> +						   page_increm - 1,
> +						   foll_flags) == NULL);

I'm not sure this should warn but otherwise ignore try_grab_folio()
returning NULL? This feels like a case that could come up in reality,
e.g. if folio_ref_try_add_rcu() fails, or if
!folio_is_longterm_pinnable(). (A rough sketch of what I have in mind is
at the end of this mail.)

Side-note: I _hate_ the semantics of GUP such that try_grab_folio()
(invoked, other than for huge page cases, by the GUP-fast logic) will
explicitly fail if neither FOLL_GET nor FOLL_PIN is specified,
differentiating it from try_grab_page() in this respect. (The second
sketch at the end of this mail shows the asymmetry I mean.)

This is a side-note and not relevant here, as all callers of
__get_user_pages() either explicitly set FOLL_GET if not set by the user
(in __get_user_pages_locked()) or don't set pages at all (e.g. in
faultin_vma_page_range()).

> +
> +			for (j = 0; j < page_increm; j++) {
> +				subpage = nth_page(page, j);
> +				pages[i+j] = subpage;
> +				flush_anon_page(vma, subpage, start + j * PAGE_SIZE);
> +				flush_dcache_page(subpage);
> +			}
> +		}
> +
>  		i += page_increm;
>  		start += page_increm * PAGE_SIZE;
>  		nr_pages -= page_increm;
> --
> 2.40.1
>
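
To make the first point concrete, here's a rough, entirely untested
sketch of how I'd expect the failure to be handled rather than warned
away. It reuses the names from the patch context (page, page_increm,
foll_flags, ret, the out label) plus gup_put_folio()/page_folio() from
mm/gup.c; the -EFAULT errno is just a placeholder:

	if (page_increm > 1) {
		struct folio *folio;

		/*
		 * We already hold a reference on the head page, so
		 * this should never fail in practice...
		 */
		folio = try_grab_folio(compound_head(page),
				       page_increm - 1, foll_flags);
		if (WARN_ON_ONCE(!folio)) {
			/*
			 * ...but if it somehow does, drop the ref we
			 * hold on the head page and fail hard rather
			 * than hand out subpages we never grabbed.
			 */
			gup_put_folio(page_folio(page), 1, foll_flags);
			ret = -EFAULT;	/* placeholder errno */
			goto out;
		}
	}

The key point being that, on failure, we must not go on to populate
pages[] with subpages whose refcounts/pincounts were never raised.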
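
As for the side-note, the asymmetry I mean is roughly the following
(paraphrased from memory with the accounting elided, not the exact
mm/gup.c source):

	struct folio *try_grab_folio(struct page *page, int refs,
				     unsigned int flags)
	{
		if (flags & FOLL_GET)
			return try_get_folio(page, refs);

		if (flags & FOLL_PIN) {
			/* ... pin accounting elided, returns the folio ... */
		}

		/* Neither FOLL_GET nor FOLL_PIN: explicit, warned failure. */
		WARN_ON_ONCE(1);
		return NULL;
	}

	int __must_check try_grab_page(struct page *page, unsigned int flags)
	{
		struct folio *folio = page_folio(page);

		if (flags & FOLL_GET)
			folio_ref_inc(folio);
		else if (flags & FOLL_PIN) {
			/* ... pin accounting elided ... */
		}

		/* Neither flag set: quietly "succeeds" as a no-op. */
		return 0;
	}

Two functions doing nearly the same job, one treating the flagless case
as a hard failure, the other as a successful no-op.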