On 23/11/2023 17:22, Peter Xu wrote:
> On Thu, Nov 23, 2023 at 03:47:49PM +0000, Matthew Wilcox wrote:
>> It looks like ARM (in the person of Ryan) are going to add support for
>> something equivalent to hugepd.
>
> If it's about arm's cont_pte, then it looks ideal because this series
> didn't yet touch cont_pte, assuming it'll just work. From that aspect,
> his work may help mine, and no immediate collapsing either.

Hi,

I'm not sure I've 100% understood the crossover between this series and
my work to support arm64's contpte mappings generally for anonymous and
file-backed memory.

My approach is to transparently use contpte mappings when the core-mm
requests pte mappings that meet the requirements; it's all based around
intercepting the normal (non-hugetlb) helpers (e.g. set_ptes(),
ptep_get() and friends). There is no semantic change to the core-mm.
See [1].

It relies on 1) the page cache using large folios and 2) my
"small-sized THP" series, which starts using arbitrary-sized large
folios for anonymous memory [2].

If I've understood this conversation correctly, there is an object
called hugepd, which today is only supported by powerpc, but which
could allow the core-mm to control the mapping granularity? I can see
some value in exposing that control to the core-mm in the (very) long
term.

[1] https://lore.kernel.org/all/20231115163018.1303287-1-ryan.roberts@xxxxxxx/
[2] https://lore.kernel.org/linux-mm/20231115132734.931023-1-ryan.roberts@xxxxxxx/

Thanks,
Ryan

>
> There can be a slight performance difference which I need to measure
> for arm's cont_pte already for hugetlb, but I didn't worry much about
> that; quoting my commit message in the last patch:
>
>   There may be a slight difference in how the loops run when
>   processing GUP over a large hugetlb range on either ARM64 (e.g.
>   CONT_PMD) or RISCV (mostly its Svnapot extension on 64K huge pages):
>   each loop of __get_user_pages() will resolve one pgtable entry with
>   the patch applied, rather than relying on the size of the hugetlb
>   hstate; the latter may cover multiple entries in one loop.
>
>   However, the performance difference should hopefully not be a major
>   concern, considering that GUP just got 57edfcfd3419 ("mm/gup:
>   accelerate thp gup even for "pages != NULL""), and that's not part
>   of a performance analysis but a side dish. If the performance
>   becomes a concern, we can consider handling CONT_PTE in
>   follow_page(), for example.
>
> So IMHO it can be slightly different compared to e.g. a page fault,
> because each fault is still pretty slow as a whole if there's one
> fault for each small pte (of a large folio / cont_pte), while the
> loop in GUP is still relatively tight and short compared to a fault.
> I'd boldly guess there are more low-hanging fruits for large folios
> outside the GUP areas.
>
> In all cases, it'll be interesting to know if Ryan has worked on
> cont_pte support for gup on large folios, and whether there are any
> performance numbers to share. It's definitely good news to me because
> it means Ryan's work can also then benefit hugetlb if this series is
> merged, I just don't know how much difference there will be.
>
> Thanks,
>
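
To make the "intercepting the helpers" idea above concrete, here is a
tiny user-space model of the concept. It is only a sketch, not the
actual arm64 code from [1]: CONT_PTES, PTE_CONT, the table layout and
the alignment checks are simplified stand-ins for illustration.

/*
 * Hypothetical user-space model of the contpte approach: set_ptes()
 * transparently sets a "contiguous" hint bit when a naturally aligned
 * block of entries maps aligned, physically contiguous memory, and
 * ptep_get() hides the bit, so callers see no semantic change.
 */
#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define CONT_PTES  16UL                      /* 16 x 4K = one 64K run */
#define CONT_SIZE  (CONT_PTES * PAGE_SIZE)
#define PTE_CONT   (UINT64_C(1) << 52)       /* pretend hint bit */

typedef uint64_t pte_t;

static pte_t table[512];                     /* one level-3 page table */

/* arch hook: write @nr consecutive entries, pa advancing one page each. */
static void set_ptes(unsigned long idx, uint64_t pa, unsigned long nr)
{
	for (unsigned long i = 0; i < nr; i++)
		table[idx + i] = pa + i * PAGE_SIZE;

	/* Fold every fully covered, naturally aligned CONT_PTES block. */
	unsigned long first = (idx + CONT_PTES - 1) & ~(CONT_PTES - 1);
	unsigned long last  = (idx + nr) & ~(CONT_PTES - 1);

	for (unsigned long b = first; b < last; b += CONT_PTES) {
		if (table[b] % CONT_SIZE)    /* pa must be 64K-aligned too */
			continue;
		for (unsigned long i = 0; i < CONT_PTES; i++)
			table[b + i] |= PTE_CONT;
	}
}

/* arch hook: readers never observe the hint bit. */
static pte_t ptep_get(unsigned long idx)
{
	return table[idx] & ~PTE_CONT;
}

int main(void)
{
	set_ptes(0, 0x40000000, 16);   /* aligned block: gets folded     */
	set_ptes(20, 0x40020000, 16);  /* straddles blocks: no full fold */

	printf("pte[0]  cont=%d value=%#llx\n",
	       !!(table[0] & PTE_CONT), (unsigned long long)ptep_get(0));
	printf("pte[20] cont=%d value=%#llx\n",
	       !!(table[20] & PTE_CONT), (unsigned long long)ptep_get(20));
	return 0;
}

The point of the model is the second property: because the hint bit is
applied and stripped entirely inside the arch helpers, the core-mm
keeps operating on individual ptes and never needs to know whether a
contpte run is in use.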
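For the GUP point quoted above, the difference Peter describes is just
the loop stride: with his series, __get_user_pages() advances one
pgtable entry per iteration, whereas the hstate-based path can consume
a whole huge mapping per iteration. A schematic of that (hypothetical
numbers, not the real GUP code):

/* Loop-stride schematic: per-pte vs per-hstate iteration counts. */
#include <stdio.h>

#define PAGE_SIZE 4096UL
#define CONT_SIZE (16 * PAGE_SIZE)   /* e.g. a 64K CONT_PTE hstate */

int main(void)
{
	unsigned long start = 0, len = 16 * CONT_SIZE;   /* a 1M range */
	unsigned long iters;

	/* With the series: resolve one pgtable entry per loop. */
	iters = 0;
	for (unsigned long va = start; va < start + len; va += PAGE_SIZE)
		iters++;
	printf("per-pte loop:    %lu iterations\n", iters);   /* 256 */

	/* hstate-based path: one loop covers a whole huge mapping. */
	iters = 0;
	for (unsigned long va = start; va < start + len; va += CONT_SIZE)
		iters++;
	printf("per-hstate loop: %lu iterations\n", iters);   /* 16 */

	return 0;
}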