Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs

On Thu, Jun 08, 2023 at 09:10:15PM +0100, Matthew Wilcox wrote:
> On Thu, Jun 08, 2023 at 08:34:10AM +0200, David Hildenbrand wrote:
> > On 08.06.23 02:02, David Rientjes wrote:
> > > While people have proposed 1GB THP support in the past, it was nacked, in
> > > part, because of the suggestion to just use existing 1GB support in
> > > hugetlb instead :)
> > 
> > Yes, because I still think that the use for "transparent" (for the user)
> > nowadays is very limited and not worth the complexity.
> > 
> > IMHO, what you really want is a pool of large pages (with guarantees about
> > availability and nodes) and fine control over who gets these pages. That's
> > what hugetlb provides.
> > 
> > In contrast to THP, you don't want to allow for
> > * Partially mmap, mremap, munmap, mprotect them
> > * Partially sharing them / COW'ing them
> > * Partially mixing them with other anon pages (MADV_DONTNEED + refault)
> > * Exclude them from some features (KSM/swap)
> > * (swap them out and eventually split them for that)
> > 
> > Because you don't want to get these pages PTE-mapped by the system *unless*
> > there is a real reason (HGM, hwpoison) -- you want guarantees. Once such a
> > page is PTE-mapped, you only want to collapse in place.
> > 
> > But you don't want special-HGM, you simply want the core to PTE-map them
> > like a (file) THP.
> > 
> > IMHO, getting that realized would be much easier if we didn't have to care
> > about some of the hugetlb complexity I raised (MAP_PRIVATE, PMD sharing),
> > but maybe there is a way ...
> 
> I favour a more evolutionary than revolutionary approach.  That is,
> I think it's acceptable to add new features to hugetlbfs _if_ they're
> combined with cleanup work that gets hugetlbfs closer to the main mm.
> This is why I harp on things like pagewalk that currently need special
> handling for hugetlb -- that's pointless; they should just be treated as
> large folios.  GUP handles hugetlb separately too, and I'm not sure why.

Yes, this echoes my feelings too.

Making all the special core-mm cases around hugetlb even more
complicated with HGM seems like a non-starter.

We need to get to a point where the core-mm handles all the PTE
programming and supports arbitrary-order folios in the page tables
uniformly for everyone.

hugetlb is just a special high order folio provider.

Get rid of all the special PTE formats, unique arch code, and special
code in gup.c/pagewalkers/etc that supports hugetlbfs.

I think the general path to do that is to make the core-mm and all the
hugetlb supporting arches support a core-code path for working with
high order folios in page tables.

Maybe this is demo'd & tested with a temporary/simplified hugetlbfs
uAPI. When the core MM and all the arches are ready you switch
hugetlbfs to use the new core API and delete all the page walk
special cases.

From there you can then teach the core code to do all the splitting
and whatever that you want.

Jason
