Re: [LSF/MM/BPF TOPIC] HugeTLB generic pagewalk


commit 0549e76663730235a10395a7af7ad3d3ce6e2402
Author: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
Date:   Tue Jul 2 15:51:25 2024 +0200

    powerpc/8xx: rework support for 8M pages using contiguous PTE entries

    In order to fit better with standard Linux page tables layout, add support
    for 8M pages using contiguous PTE entries in a standard page table.  Page
    tables will then be populated with 1024 similar entries and two PMD
    entries will point to that page table.

    The PMD entries also get a flag to tell it is addressing an 8M page, this
    is required for the HW tablewalk assistance.
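To make the geometry concrete, here is a small Python sketch of the arithmetic (a model, not kernel code). It assumes 4K base pages and the 4M PMD_SIZE mentioned further down in this thread; the constant names mirror the kernel's, but the helper `layout_8m()` is hypothetical and only illustrative.

```python
# Illustrative arithmetic only; assumes 4K base pages and a 4M PMD_SIZE.
PAGE_SIZE = 4 * 1024
PMD_SIZE = 4 * 1024 * 1024
SIZE_8M = 8 * 1024 * 1024

PTRS_PER_PTE = PMD_SIZE // PAGE_SIZE   # PTEs in one page table: 1024
PMDS_PER_8M = SIZE_8M // PMD_SIZE      # PMD entries spanned by one 8M page: 2
PTES_PER_8M = SIZE_8M // PAGE_SIZE     # PTE slots spanned by one 8M page: 2048


def layout_8m(addr):
    """Hypothetical helper: (first PMD index, PMD count, PTEs per table)."""
    if addr % SIZE_8M:
        raise ValueError("an 8M page must be 8M aligned")
    return addr // PMD_SIZE, PMDS_PER_8M, PTRS_PER_PTE
```

For example, `layout_8m(0x800000)` gives `(2, 2, 1024)`: the 8M page at 8M starts at PMD index 2 and spans two PMD entries, each covering one full 1024-entry page table.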

That is, we are walking a PTE table, but there is actually another PTE
table we have to modify in the same go.


It's very hard to handle that in a non-hugetlb-aware way, as it's simply
completely different from ordinary page table walking/modification today.
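A toy Python model of why this is awkward for a generic walker. It assumes, as described later in this thread, that the 8M page is backed by two PMD entries, each pointing to its own 1024-entry PTE table filled with identical copies of the huge PTE. `set_pte_generic()`, `set_huge_8m()` and `consistent()` are hypothetical stand-ins, not kernel functions, and dicts stand in for page-table entries.

```python
# Hypothetical model: two PMDs, each with its own 1024-entry PTE table.
PTRS_PER_PTE = 1024


def new_pmd_pair():
    """Two PMD slots marked as part of an 8M page, each with an empty table."""
    return [{"cont_8m": True, "table": [None] * PTRS_PER_PTE} for _ in range(2)]


def set_pte_generic(pmd, idx, pte):
    """What a generic PTE-level walker does: touch only the table under this PMD."""
    pmd["table"][idx] = pte


def set_huge_8m(pmds, pte):
    """hugetlb-aware update: every slot in *both* tables changes together."""
    for pmd in pmds:
        pmd["table"][:] = [pte] * PTRS_PER_PTE


def consistent(pmds):
    """True if all 2 * 1024 slots hold the same value."""
    first = pmds[0]["table"][0]
    return all(e == first for pmd in pmds for e in pmd["table"])
```

A walker that rewrites all 1024 entries of the table under the PMD it is currently visiting still leaves the second table stale, which is the state `consistent()` reports as broken.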

Maybe there are ideas to tackle that, and I'd be very interested in them.



But at least that 8xx change allowed us to get rid of huge page
directories (hugepd), which were even more painful IIUC.

Yes, don't get me wrong, it was a clear win to get rid of hugepd, allowing GUP and folio_walk to work in a non-hugetlb fashion: at least when all we want to do is look up which page is mapped at a given address.
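A read-only lookup only needs a single PTE, which is why it works without hugetlb awareness. The Python sketch below is a model, not kernel code; it assumes 4K base pages, a 4M PMD_SIZE, and (purely as an illustrative assumption) that every PTE of the 8M mapping stores the PFN of the first 4K page of the 8M page. `lookup_pfn()` is hypothetical.

```python
# Hypothetical model; assumes 4K base pages and a 4M PMD_SIZE.
PAGE_SIZE = 4 * 1024
PMD_SIZE = 4 * 1024 * 1024
SIZE_8M = 2 * PMD_SIZE


def lookup_pfn(pmds, addr):
    """Return the PFN of the 4K page backing addr, or None if unmapped.

    Assumes each PTE of an 8M mapping holds the PFN of the first 4K page of
    that 8M page. Only one PTE is read; the other table is never touched.
    """
    pmd = pmds[addr // PMD_SIZE]
    if pmd is None:
        return None
    pte = pmd["table"][(addr % PMD_SIZE) // PAGE_SIZE]
    if pte is None:
        return None
    # Offset of addr inside the 8M page, in 4K pages.
    return pte + (addr % SIZE_8M) // PAGE_SIZE
```

Modifying the mapping is a different story, since then both tables have to change together.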

Unfortunately, that's not what all page table walkers do.


Nevertheless, can't we turn that into a standard walk one way or another?

While we walk, we reach a PMD entry which is marked as a CONT-PMD but is
not tagged as a leaf entry, so there is a page table below it. PMD_SIZE
is 4M but the page size is 8M, so once you've walked the page table
entirely, you know you still have 4M to go, and you have to walk the
second PMD and the page table it points to.
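The walk described above can be modeled roughly as follows. This is a Python sketch, not kernel code; it assumes 4K base pages, the 4M PMD_SIZE from this thread, and PMD entries modeled as dicts with hypothetical `cont_8m`, `leaf` and `table` keys. `walk_8m()` is illustrative only.

```python
# Hypothetical model; assumes 4K base pages and a 4M PMD_SIZE.
PAGE_SIZE = 4 * 1024
PMD_SIZE = 4 * 1024 * 1024
SIZE_8M = 2 * PMD_SIZE
PTRS_PER_PTE = PMD_SIZE // PAGE_SIZE


def walk_8m(pmds, addr, visit):
    """Visit every PTE of the 8M page containing addr; return bytes covered.

    pmds is indexed by addr // PMD_SIZE. Each PMD must be marked CONT but
    not be a leaf, i.e. it has a PTE table below it.
    """
    start = addr - addr % SIZE_8M          # align down to the 8M boundary
    covered = 0
    for pmd_idx in range(start // PMD_SIZE, (start + SIZE_8M) // PMD_SIZE):
        pmd = pmds[pmd_idx]
        if not pmd["cont_8m"] or pmd["leaf"]:
            raise ValueError("expected a CONT, non-leaf PMD")
        for i, pte in enumerate(pmd["table"]):
            visit(pmd_idx * PMD_SIZE + i * PAGE_SIZE, pte)
        # One table done: after the first PMD, 4M of the 8M page remain.
        covered += PMD_SIZE
    return covered
```

Starting from any address inside the 8M page, the walker covers both PMDs and all 2048 PTE slots, which is exactly the cross-PMD step an ordinary per-PMD walker does not know to take.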

We would somehow have to fake that it is a PMD leaf, and realize that both are cont, so we can batch both PMDs. The PTE page table handling is a bit of a pain, though.

... and modifying entries is a bit of a pain as well, unless we can somehow hide all that in the powerpc pmd setters.
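A rough Python model of the "fake PMD leaf" idea, under stated assumptions: 4M PMD_SIZE, 1024-entry PTE tables, and an 8M page occupying an 8M-aligned pair of PMDs that each point to their own table. `pmd_fake_leaf_size()`, `generic_walk()` and `set_pmd_leaf_8m()` are hypothetical names; the last one stands in for an arch setter that hides the two-table update from the caller.

```python
# Hypothetical model; assumes a 4M PMD_SIZE and 1024-entry PTE tables.
PMD_SIZE = 4 * 1024 * 1024
SIZE_8M = 2 * PMD_SIZE
PTRS_PER_PTE = 1024


def pmd_fake_leaf_size(pmds, idx):
    """Pretend a CONT PMD pair is one 8M leaf; return 8M, or 0 if not a pair."""
    base = idx - idx % 2                   # first PMD of the 8M-aligned pair
    pair = pmds[base:base + 2]
    if len(pair) == 2 and all(p is not None and p["cont_8m"] for p in pair):
        return SIZE_8M
    return 0


def generic_walk(pmds, visit):
    """Generic walker: batches both PMDs of an 8M page into one leaf callback."""
    idx = 0
    while idx < len(pmds):
        size = pmd_fake_leaf_size(pmds, idx)
        if size:
            visit(idx * PMD_SIZE, size, pmds[idx]["table"][0])
            idx += size // PMD_SIZE        # skip the second PMD of the pair
        else:
            idx += 1                       # ordinary/empty PMD: not modeled here


def set_pmd_leaf_8m(pmds, idx, pte):
    """Hypothetical arch setter: one 'leaf' write for the caller, two tables filled."""
    base = idx - idx % 2
    for p in pmds[base:base + 2]:
        p["cont_8m"] = True
        p["table"][:] = [pte] * PTRS_PER_PTE
```

The generic code then sees a single 8M leaf, while the setter keeps both PTE tables in sync behind its back; the remaining pain is that every walker would need to understand this batching.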

Hm, far from ideal, at least at this stage, because we don't really support cont-pmd outside of hugetlb, and a lot of page table walkers would have to be taught to deal with cont-pmd.


By the way, I don't know whether it can help or make things worse, but
from a HW point of view there is indeed no need to replicate the PTE
entry 1024 times. Here we used a standard page table because it looked
more generic from the kernel point of view, but all the HW needs is a
single PTE located at a page-aligned address. That's what we had when we
used huge page directories (hugepd). It was even easier because both PMD
entries were pointing to the same hugepd entry, hence no need for
CONT-PTE-like management at the PTE level.
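A small Python comparison of the two layouts, counting entry writes per update. It is a model only: it assumes the cont-PTE layout uses two 1024-entry tables as discussed above, and it represents the hugepd-like layout as both PMDs referencing one shared single-entry object. The function names are hypothetical.

```python
# Hypothetical model comparing update cost of the two layouts.
PTRS_PER_PTE = 1024


def writes_cont_pte_layout(pte):
    """cont-PTE layout: two tables x 1024 identical PTEs, each written."""
    tables = [[None] * PTRS_PER_PTE for _ in range(2)]
    writes = 0
    for t in tables:
        for i in range(PTRS_PER_PTE):
            t[i] = pte
            writes += 1
    return tables, writes


def writes_shared_entry_layout(pte):
    """hugepd-like layout: both PMDs point to the same single-entry object."""
    shared = [None]            # the one page-aligned PTE the HW needs
    pmds = [shared, shared]    # both PMD entries reference the same object
    shared[0] = pte            # a single write updates what both PMDs see
    return pmds, 1
```

Under these assumptions an update costs 2048 entry writes in the first layout and one in the second, because nothing below the PMDs needs to be kept in sync.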

Ah, I see. I'll have to think about that a bit ... far from trivial.

--
Cheers,

David / dhildenb




