Re: [PATCHv3 1/2] mm/gup: fix omission of check on FOLL_LONGTERM in get_user_pages_fast()

On 6/5/19 7:19 PM, Pingfan Liu wrote:
> On Thu, Jun 6, 2019 at 5:49 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
...
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -2196,6 +2196,26 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
>>>       return ret;
>>>  }
>>>
>>> +#ifdef CONFIG_CMA
>>> +static inline int reject_cma_pages(int nr_pinned, struct page **pages)
>>> +{
>>> +     int i;
>>> +
>>> +     for (i = 0; i < nr_pinned; i++)
>>> +             if (is_migrate_cma_page(pages[i])) {
>>> +                     put_user_pages(pages + i, nr_pinned - i);
>>> +                     return i;
>>> +             }
>>> +
>>> +     return nr_pinned;
>>> +}
>>
>> There's no point in inlining this.
> OK, will drop it in V4.
> 
>>
>> The code seems inefficient.  If it encounters a single CMA page it can
>> end up discarding a possibly significant number of non-CMA pages.  I
> The trick is that the pages are not discarded; in fact, they are still
> referenced by their ptes. We just leave the slow path to pick up the
> non-CMA pages again.
> 
>> guess that doesn't matter much, as get_user_pages(FOLL_LONGTERM) is
>> rare.  But could we avoid this (and the second pass across pages[]) by
>> checking for a CMA page within gup_pte_range()?
> It would spread the same logic across the hugetlb pte and normal pte
> paths, with no performance improvement because we still fall back to
> the slow path. So I think it may not be worth it.
> 
>>
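
To restate the fallback Pingfan describes, as I understand it:
reject_cma_pages() drops the references on everything from the first CMA
page onward, but those pages are still mapped by their ptes, so the
caller just hands the remainder to the slow path. Very roughly, as an
illustration of the flow only (not the actual patch):

	gup_pgd_range(addr, end, gup_flags, pages, &nr);	/* fast path */
	if (gup_flags & FOLL_LONGTERM)
		nr = reject_cma_pages(nr, pages);	/* may shrink nr */

	if (nr < nr_pages) {
		/* slow path re-pins everything from the first rejected page */
		start += nr << PAGE_SHIFT;
		pages += nr;
		ret = __gup_longterm_unlocked(start, nr_pages - nr,
					      gup_flags, pages);
	}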

I think the concern is: for the successful gup_fast case with no CMA
pages, this patch adds another complete loop through all the pages, in
what is supposed to be the fast case.

If the check were instead done as part of gup_pte_range(), it would be a
little more efficient for that case; something like the rough sketch
below.
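
This is a rough, untested sketch (the exact placement inside the pte loop
and the signature details are from memory):

	/* in gup_pte_range(), once the page for this pte is known: */
	page = pte_page(pte);

	/*
	 * For FOLL_LONGTERM, refuse to fast-pin CMA pages: bail out here
	 * and let the slow path deal with the rest of the range.
	 */
	if ((flags & FOLL_LONGTERM) && is_migrate_cma_page(page))
		goto pte_unmap;

That would avoid walking pages[] a second time in the common case, at the
cost of sprinkling the check into each of the fast-path walkers, which is
the duplication Pingfan is pointing out above.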

As for whether it's worth it: *probably* this is too small an effect to
measure. But in order to attempt a measurement, running fio
(https://github.com/axboe/fio) with O_DIRECT on an NVMe drive might shed
some light. Here's an fio.conf file that Jan Kara and Tom Talpey helped
me come up with for related testing:

[reader]
direct=1
ioengine=libaio
blocksize=4096
size=1g
numjobs=1
rw=read
iodepth=64
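
To use it, point fio at the NVMe device (or a file on it), e.g. something
like "fio ./fio.conf --filename=/dev/nvme0n1p1" (adjust the filename for
your setup), and compare IOPS with and without the patch applied. I'd
expect any difference to be within the noise, but at least we'd know.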



thanks,
-- 
John Hubbard
NVIDIA



