On Thu, Jun 08, 2023 at 09:10:15PM +0100, Matthew Wilcox wrote:
> On Thu, Jun 08, 2023 at 08:34:10AM +0200, David Hildenbrand wrote:
> > On 08.06.23 02:02, David Rientjes wrote:
> > > While people have proposed 1GB THP support in the past, it was nacked, in
> > > part, because of the suggestion to just use existing 1GB support in
> > > hugetlb instead :)
> >
> > Yes, because I still think that the use for "transparent" (for the user)
> > nowadays is very limited and not worth the complexity.
> >
> > IMHO, what you really want is a pool of large pages that (guarantees about
> > availability and nodes) and fine control about who gets these pages. That's
> > what hugetlb provides.
> >
> > In contrast to THP, you don't want to allow for
> > * Partially mmap, mremap, munmap, mprotect them
> > * Partially sharing then / COW'ing them
> > * Partially mixing them with other anon pages (MADV_DONTNEED + refault)
> > * Exclude them from some features KSM/swap
> > * (swap them out and eventually split them for that)
> >
> > Because you don't want to get these pages PTE-mapped by the system *unless*
> > there is a real reason (HGM, hwpoison) -- you want guarantees. Once such a
> > page is PTE-mapped, you only want to collapse in place.
> >
> > But you don't want special-HGM, you simply want the core to PTE-map them
> > like a (file) THP.
> >
> > IMHO, getting that realized much easier would be if we wouldn't have to care
> > about some of the hugetlb complexity I raised (MAP_PRIVATE, PMD sharing),
> > but maybe there is a way ...
>
> I favour a more evolutionary than revolutionary approach.  That is,
> I think it's acceptable to add new features to hugetlbfs _if_ they're
> combined with cleanup work that gets hugetlbfs closer to the main mm.
> This is why I harp on things like pagewalk that currently need special
> handling for hugetlb -- that's pointless; they should just be treated as
> large folios.  GUP handles hugetlb separately too, and I'm not sure why.

Yes, this echoes my feelings too. Making all the special core-mm cases
around hugetlb even more complicated with HGM seems like a
non-starter.

We need to get to a point where the core-mm handles all the PTE
programming and supports arbitrary order folios in the page tables
uniformly for everyone. hugetlb is just a special high order folio
provider.

Get rid of all the special PTE formats, unique arch code, and special
code in gup.c/pagewalkers/etc that supports hugetlbfs.

I think the general path to do that is to make the core-mm and all the
hugetlb supporting arches support a core-code path for working with
high order folios in page tables. Maybe this is demo'd & tested with a
temporary/simplified hugetlbfs uAPI.

When the core MM and all the arches are ready you switch hugetlbfs to
use the new core API and delete all the page walk special cases.

From there you can then teach the core code to do all the splitting
and whatever that you want.

Jason