On 23/11/2023 17:22, Peter Xu wrote:
> On Thu, Nov 23, 2023 at 03:47:49PM +0000, Matthew Wilcox wrote:
>> It looks like ARM (in the person of Ryan) are going to add support for
>> something equivalent to hugepd.
>
> If it's about arm's cont_pte, then it looks ideal because this series
> didn't yet touch cont_pte, assuming it'll just work. From that aspect,
> his work may help mine, and no immediate collapsing either.

Hi,

I'm not sure I've 100% understood the crossover between this series and
my work to support arm64's contpte mappings generally for anonymous and
file-backed memory.

My approach is to transparently use contpte mappings when the core-mm
requests pte mappings that meet the requirements; it's all based around
intercepting the normal (non-hugetlb) helpers (e.g. set_ptes(),
ptep_get() and friends). There is no semantic change to the core-mm.
See [1].

It relies on 1) the page cache using large folios and 2) my
"small-sized THP" series, which starts using arbitrary-sized large
folios for anonymous memory [2].

If I've understood this conversation correctly, there is an object
called hugepd, which today is only supported by powerpc, but which
could allow the core-mm to control the mapping granularity? I can see
some value in exposing that control to the core-mm in the (very) long
term.

[1] https://lore.kernel.org/all/20231115163018.1303287-1-ryan.roberts@xxxxxxx/
[2] https://lore.kernel.org/linux-mm/20231115132734.931023-1-ryan.roberts@xxxxxxx/

Thanks,
Ryan

>
> There can be a slight performance difference which I need to measure
> for arm's cont_pte already for hugetlb, but I didn't worry much about
> that; quoting my commit message in the last patch:
>
>   There may be a slight difference in how the loops run when
>   processing GUP over a large hugetlb range on either ARM64 (e.g.
>   CONT_PMD) or RISCV (mostly its Svnapot extension on 64K huge pages):
>   each loop of __get_user_pages() will resolve one pgtable entry with
>   the patch applied, rather than relying on the size of the hugetlb
>   hstate; the latter may cover multiple entries in one loop.
>
>   However, the performance difference should hopefully not be a major
>   concern, considering that GUP just got 57edfcfd3419 ("mm/gup:
>   accelerate thp gup even for "pages != NULL""), and that's not part
>   of a performance analysis but a side dish. If the performance
>   becomes a concern, we can consider handling CONT_PTE in
>   follow_page(), for example.
>
> So IMHO it can be slightly different compared to e.g. a page fault,
> because each fault is still pretty slow as a whole if there's one
> fault for each small pte (of a large folio / cont_pte), while the
> loop in GUP is still relatively tight and short compared to a fault.
> I'd boldly guess there are more low-hanging fruits for large folios
> outside the GUP areas.
>
> In all cases, it'll be interesting to know if Ryan has worked on
> cont_pte support for gup on large folios, and whether there are any
> performance numbers to share. It's definitely good news to me because
> it means Ryan's work can also then benefit hugetlb if this series is
> merged, I just don't know how much difference there will be.
>
> Thanks,
>
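
To make the "intercepting the helpers" idea above concrete, here is a
tiny user-space model of the concept. It is only a sketch, not the
actual arm64 code from [1]: CONT_PTES, PTE_CONT, the table layout and
the alignment checks are simplified stand-ins for illustration.

/*
 * Hypothetical user-space model of the contpte approach: set_ptes()
 * transparently sets a "contiguous" hint bit when a naturally aligned
 * block of entries maps aligned, physically contiguous memory, and
 * ptep_get() hides the bit, so callers see no semantic change.
 */
#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define CONT_PTES  16UL                      /* 16 x 4K = one 64K run */
#define CONT_SIZE  (CONT_PTES * PAGE_SIZE)
#define PTE_CONT   (UINT64_C(1) << 52)       /* pretend hint bit */

typedef uint64_t pte_t;

static pte_t table[512];                     /* one level-3 page table */

/* arch hook: write @nr consecutive entries, pa advancing one page each. */
static void set_ptes(unsigned long idx, uint64_t pa, unsigned long nr)
{
	for (unsigned long i = 0; i < nr; i++)
		table[idx + i] = pa + i * PAGE_SIZE;

	/* Fold every fully covered, naturally aligned CONT_PTES block. */
	unsigned long first = (idx + CONT_PTES - 1) & ~(CONT_PTES - 1);
	unsigned long last  = (idx + nr) & ~(CONT_PTES - 1);

	for (unsigned long b = first; b < last; b += CONT_PTES) {
		if (table[b] % CONT_SIZE)    /* pa must be 64K-aligned too */
			continue;
		for (unsigned long i = 0; i < CONT_PTES; i++)
			table[b + i] |= PTE_CONT;
	}
}

/* arch hook: readers never observe the hint bit. */
static pte_t ptep_get(unsigned long idx)
{
	return table[idx] & ~PTE_CONT;
}

int main(void)
{
	set_ptes(0, 0x40000000, 16);   /* aligned block: gets folded     */
	set_ptes(20, 0x40020000, 16);  /* straddles blocks: no full fold */

	printf("pte[0]  cont=%d value=%#llx\n",
	       !!(table[0] & PTE_CONT), (unsigned long long)ptep_get(0));
	printf("pte[20] cont=%d value=%#llx\n",
	       !!(table[20] & PTE_CONT), (unsigned long long)ptep_get(20));
	return 0;
}

The point of the model is the second property: because the hint bit is
applied and stripped entirely inside the arch helpers, the core-mm
keeps operating on individual ptes and never needs to know whether a
contpte run is in use.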
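For the GUP point quoted above, the difference Peter describes is just
the loop stride: with his series, __get_user_pages() advances one
pgtable entry per iteration, whereas the hstate-based path can consume
a whole huge mapping per iteration. A schematic of that (hypothetical
numbers, not the real GUP code):

/* Loop-stride schematic: per-pte vs per-hstate iteration counts. */
#include <stdio.h>

#define PAGE_SIZE 4096UL
#define CONT_SIZE (16 * PAGE_SIZE)   /* e.g. a 64K CONT_PTE hstate */

int main(void)
{
	unsigned long start = 0, len = 16 * CONT_SIZE;   /* a 1M range */
	unsigned long iters;

	/* With the series: resolve one pgtable entry per loop. */
	iters = 0;
	for (unsigned long va = start; va < start + len; va += PAGE_SIZE)
		iters++;
	printf("per-pte loop:    %lu iterations\n", iters);   /* 256 */

	/* hstate-based path: one loop covers a whole huge mapping. */
	iters = 0;
	for (unsigned long va = start; va < start + len; va += CONT_SIZE)
		iters++;
	printf("per-hstate loop: %lu iterations\n", iters);   /* 16 */

	return 0;
}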