commit 0549e76663730235a10395a7af7ad3d3ce6e2402
Author: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
Date: Tue Jul 2 15:51:25 2024 +0200
powerpc/8xx: rework support for 8M pages using contiguous PTE entries
In order to fit better with standard Linux page tables layout, add support
for 8M pages using contiguous PTE entries in a standard page table. Page
tables will then be populated with 1024 similar entries and two PMD
entries will point to that page table.
The PMD entries also get a flag to tell it is addressing an 8M page, this
is required for the HW tablewalk assistance.
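To make the numbers in the commit message concrete, here is a minimal user-space sketch of the address arithmetic. It assumes a 4K base page size (inferred from 1024 entries covering a 4M PMD_SIZE, not stated in the commit message), and all SK_* macros and sk_* helpers are hypothetical names for illustration, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry; hypothetical names, not the kernel's macros. */
#define SK_PAGE_SHIFT    12u  /* 4K base page (assumption) */
#define SK_PMD_SHIFT     22u  /* PMD_SIZE = 4M */
#define SK_HUGE_SHIFT    23u  /* 8M huge page */
#define SK_PTRS_PER_PTE  (1u << (SK_PMD_SHIFT - SK_PAGE_SHIFT))
#define SK_PMDS_PER_HUGE (1u << (SK_HUGE_SHIFT - SK_PMD_SHIFT))

/* One page table holds 1024 entries; one 8M page spans two PMD entries. */
static_assert(SK_PTRS_PER_PTE == 1024, "one page table covers 4M");
static_assert(SK_PMDS_PER_HUGE == 2, "an 8M page spans two PMD entries");

/* Index of the PMD entry covering addr. */
static inline uint32_t sk_pmd_index(uint32_t addr)
{
	return addr >> SK_PMD_SHIFT;
}

/* Index of the PTE covering addr inside its page table. */
static inline uint32_t sk_pte_index(uint32_t addr)
{
	return (addr >> SK_PAGE_SHIFT) & (SK_PTRS_PER_PTE - 1);
}

/* Start address of the 8M page containing addr. */
static inline uint32_t sk_huge_base(uint32_t addr)
{
	return addr & ~((1u << SK_HUGE_SHIFT) - 1);
}

/* Index of the first of the two PMD entries backing that 8M page. */
static inline uint32_t sk_first_pmd_of_huge(uint32_t addr)
{
	return sk_pmd_index(sk_huge_base(addr));
}
```

For an address in the upper 4M half of an 8M page, sk_pmd_index() returns the second PMD entry of the pair while sk_first_pmd_of_huge() returns the first.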
Where we are walking a PTE table, but actually there is another PTE table we
have to modify in the same go.
Very hard to make that non-hugetlb aware, as it's simply completely different
compared to ordinary page table walking/modifications today.
Maybe there are ideas to tackle that, and I'd be very interested in them.
But at least that 8xx change allowed us to get rid of huge page
directories (hugepd) which was even more painful IIUC.
Yes, don't get me wrong, it was a clear win to get rid of hugepd,
allowing for GUP and folio_walk to work in a non-hugetlb fashion: at
least, when all we want to do is lookup which page is mapped at a given
address.
Unfortunately, that's not what all page table walkers do.
Nevertheless, can't we turn that into a standard walk one way or another?
While we walk we reach a PMD entry which is marked as a CONT-PMD, but it
is not tagged as a leaf entry, so there is a page table below. PMD_SIZE
is 4M but the page size is 8M so once you've walked the page table
entirely you know you still have 4M to go so you have to walk the second
PMD and the page table it points to.
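The walk described above can be sketched as a user-space toy model. This is not kernel code: struct toy_pmd, TOY_PMD_8M and toy_walk_pmd() are hypothetical names, and the model assumes each of the two PMD entries has its own 1024-entry page table, as the discussion suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PTRS_PER_PTE 1024u
#define TOY_PMD_8M       0x1u  /* hypothetical "addresses an 8M page" flag */

struct toy_pmd {
	uint32_t *pte_table;   /* page table this PMD entry points to */
	unsigned int flags;
};

typedef void (*toy_pte_fn)(uint32_t *pte, void *ctx);

/*
 * Visit every PTE reachable from pmd[idx]. If the entry is flagged as
 * part of an 8M page, start at the even-aligned entry of the pair and
 * walk both page tables in one go. Returns the number of PMD entries
 * consumed so that the caller can skip past them.
 */
static size_t toy_walk_pmd(struct toy_pmd *pmd, size_t idx, size_t nr_pmds,
			   toy_pte_fn fn, void *ctx)
{
	size_t span = 1;

	if (pmd[idx].flags & TOY_PMD_8M) {
		idx &= ~(size_t)1;  /* first PMD entry of the pair */
		span = 2;           /* 8M page = 2 * PMD_SIZE */
	}
	assert(idx + span <= nr_pmds);

	for (size_t p = 0; p < span; p++)
		for (size_t i = 0; i < TOY_PTRS_PER_PTE; i++)
			fn(&pmd[idx + p].pte_table[i], ctx);

	return span;
}

/* Example callbacks: count visited PTEs, or write the same value to all. */
static void toy_count(uint32_t *pte, void *ctx)
{
	(void)pte;
	(*(size_t *)ctx)++;
}

static void toy_set(uint32_t *pte, void *ctx)
{
	*pte = *(uint32_t *)ctx;
}
```

With both entries flagged, a walk starting from either PMD entry visits 2048 PTEs and reports two entries consumed; an unflagged entry is walked as an ordinary 4M range.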
We would somehow have to fake that it is a PMD leaf, and realize that
they both are cont, so we can batch both PMDs. The PTE page table
handling is a bit of a pain, though.
... and modifying entries is a bit of a pain as well; unless we can
hide all that somehow in the powerpc pmd setters.
Hm, far from ideal, at least at this stage, because we don't really
support cont-pmd outside of hugetlb, and a lot of page table walkers
must be taught to deal with cont-pmd.
By the way, I don't know if it can help or make things worse, but indeed
from a HW point of view there is no need to replicate the PTE entry 1024
times. Here we used a standard page table because it looked more generic
from the kernel's point of view, but all the HW needs is a single PTE
located at a page-aligned address. That's what we had when we used huge
page directories (hugepd). It was even easier because both PMD entries
were pointing to the same hugepd entry, hence no need for CONT-PTE-like
management at PTE level.
Ah, I see. I'll have to think about that a bit ... far from trivial.
--
Cheers,
David / dhildenb