Re: [PATCH 00/45] hugetlb pagewalk unification

Jason Gunthorpe <jgg@xxxxxxxxxx> · Mon, 8 Jul 2024 11:28:43 -0300

On Mon, Jul 08, 2024 at 10:18:30AM +0200, Oscar Salvador wrote:

> IMHO, that was a mistake to start with, but I was not around when it was
> introduced and maybe there were good reasons to deal with that the way
> it is done.

It is a trade-off: either you write out a lot of duplicated code for
every level, or you have this sort of level-agnostic design.

> But, the thing is that my ultimate goal, is for hugetlb code to be able
> to deal with PUD/PMD (pte and cont-pte is already dealt with) just like
> mm core does for THP (PUD is not supported by THP, but you get me), and
> that is not that difficult to do, as this patchset tries to prove.

IMHO we need to get to an API that can understand everything in a page
table. Having two disjoint APIs is the problematic bit.

Improving the pud/pmd/etc API is a good direction.

Nobody has explored it, but generalizing to a 'non-level' API could
also be a direction. 'non-level' means it works more like the huge API
where the level is not part of the function names but somehow the
level is encoded by the values/state/something.

This is appealing for things like page_walk where we have all these
per-level ops which are kind of pointless code duplication.
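
As a rough illustration of the 'non-level' idea, here is a toy userspace
sketch, not kernel code. Every name in it (toy_entry, toy_walk_ops,
toy_walk, ...) is made up for the example, and the 4-entry tables are
only there to keep it small. The point is that the walker has a single
callback and hands it the level as data, instead of one op per level:

```c
/* Toy userspace model of a level-agnostic walk. All names are hypothetical. */
#include <stddef.h>

enum toy_level { TOY_PTE = 0, TOY_PMD = 1, TOY_PUD = 2 };

#define TOY_ENTRIES 4	/* assumption: tiny tables, just for the demo */

/* One slot of a toy table: a leaf mapping or a pointer to a lower table. */
struct toy_entry {
	int present;
	int leaf;			/* 1 = maps memory at this level */
	struct toy_entry *table;	/* next-level table when !leaf */
};

/* A single callback; the level is an argument, not part of the name. */
struct toy_walk_ops {
	int (*leaf_entry)(enum toy_level level, struct toy_entry *e, void *priv);
};

static int toy_walk(struct toy_entry *table, enum toy_level level,
		    const struct toy_walk_ops *ops, void *priv)
{
	for (size_t i = 0; i < TOY_ENTRIES; i++) {
		struct toy_entry *e = &table[i];
		int ret;

		if (!e->present)
			continue;
		if (e->leaf || level == TOY_PTE)
			ret = ops->leaf_entry(level, e, priv);
		else
			ret = toy_walk(e->table, (enum toy_level)(level - 1),
				       ops, priv);
		if (ret)
			return ret;	/* non-zero aborts the walk */
	}
	return 0;
}

/* Example op: count leaves per level in an array indexed by level. */
static int toy_count_leaf(enum toy_level level, struct toy_entry *e, void *priv)
{
	unsigned int *counts = priv;

	(void)e;
	counts[level]++;
	return 0;
}

/* Build one PUD leaf, one PMD leaf and two PTEs, then walk from the top. */
static void toy_demo(unsigned int counts[3])
{
	struct toy_entry pte[TOY_ENTRIES] = { { 0 } };
	struct toy_entry pmd[TOY_ENTRIES] = { { 0 } };
	struct toy_entry pud[TOY_ENTRIES] = { { 0 } };
	const struct toy_walk_ops ops = { .leaf_entry = toy_count_leaf };

	pte[0].present = 1; pte[0].leaf = 1;
	pte[2].present = 1; pte[2].leaf = 1;
	pmd[0].present = 1; pmd[0].table = pte;
	pmd[1].present = 1; pmd[1].leaf = 1;	/* PMD-sized leaf */
	pud[0].present = 1; pud[0].table = pmd;
	pud[3].present = 1; pud[3].leaf = 1;	/* PUD-sized leaf */

	toy_walk(pud, TOY_PUD, &ops, counts);
}
```

A real version would have to deal with locking, per-arch entry formats
and contiguous entries, none of which this sketch attempts.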

I've been doing some experiments on the iommu page table side in both
of these directions and so far I haven't come up with anything that is
really great :\

> Of course, for hugetlb to gain the ability to operate on PUD/PMD, this
> means we need to add a fair amount of code, e.g. for operating
> hugepages on PUD level, code for markers on PUD/PMD level for
> uffd/poison stuff (only dealt with on pmd/pte atm AFAIK), swap
> functions for PUD (is_swap_pud for PUD), etc.
> Basically, almost everything we did for PMD-* stuff we need for PUD as
> well, and that will be around when THP gains support for PUD if it ever
> does (I guess that in a few years if memory capacity keeps increasing).

Right, the general pain of the mm's design is that we have to
duplicate so much stuff N-wise for each level, even though in a lot of
cases it isn't different for each level.
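
To show the kind of duplication meant here, a small sketch of helpers
that take the level as a parameter instead of existing once per level.
This is a toy model, not the kernel's entry format: the bit layout, the
demo_* names, 4 KiB base pages and 512 entries per table are all
assumptions made for the example.

```c
/* Toy model of level-parameterized helpers. All names are hypothetical. */
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12	/* assumption: 4 KiB base pages */
#define DEMO_LEVEL_BITS	9	/* assumption: 512 entries per table */

enum demo_level { DEMO_PTE = 0, DEMO_PMD = 1, DEMO_PUD = 2 };

/* Made-up entry encoding: bit 0 = present, bit 1 = swap/marker entry. */
#define DEMO_PRESENT	((uint64_t)1 << 0)
#define DEMO_SWAP	((uint64_t)1 << 1)

/* One predicate for every level, rather than a pte/pmd/pud variant each. */
static inline int demo_is_swap(uint64_t entry)
{
	return !(entry & DEMO_PRESENT) && (entry & DEMO_SWAP);
}

/* Bytes covered by a leaf at 'level'; the level is just a number here. */
static inline uint64_t demo_leaf_size(enum demo_level level)
{
	return (uint64_t)1 << (DEMO_PAGE_SHIFT +
			       DEMO_LEVEL_BITS * (unsigned int)level);
}
```

Under those assumptions demo_leaf_size() gives 4 KiB, 2 MiB and 1 GiB
for the three levels. What the sketch gives up is the type checking
that distinct per-level types provide, which is part of the trade-off.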

> I will keep working on this patchset not because of pagewalk savings,
> but because I think it will help us get hugetlb more mm-core ready,
> since the current pagewalk has to test that a hugetlb page can be
> properly read on PUD/PMD/PTE level no matter what: uffd for hugetlb on
> PUD/PMD, hwpoison entries for swp on PUD/PMD, pud invalidating, etc.

Right, it would be nice if the page walk ops didn't have to touch huge
stuff at all. pagewalk ops, as they are today, should just work with
the normal pud/pmd/pte functions in all cases.

Jason