>>> considering that 2MB THP have turned out to be quite a pain but
>>> situation has settled over time. Maybe our current code base is prepared
>>> for that much better.
>
> I am planning to refactor my code further to reduce the amount of
> the added code, since PUD THP is very similar to PMD THP. One thing
> I want to achieve is to enable split_huge_page to split any order of
> pages to a group of any lower order of pages. A lot of code in this
> patchset is replicating the same behavior of PMD THP at PUD level.
> It might be possible to deduplicate most of the code.
>
>>>
>>> Exposing that interface to the userspace is a different story of course.
>>> I do agree that we likely do not want to be very explicit about that.
>>> E.g. an interface for address space defragmentation without any more
>>> specifics sounds like a useful feature to me. It will be up to the
>>> kernel to decide which huge pages to use.
>>
>> Yes, I think one important feature would be that we don't end up placing
>> a gigantic page where only a handful of pages are actually populated
>> without green light from the application - because that's what some user
>> space applications care about (not consuming more memory than intended.
>> IIUC, this is also what this patch set does). I'm fine with placing
>> gigantic pages if it really just "defragments" the address space layout,
>> without filling unpopulated holes.
>>
>> Then, this would be mostly invisible to user space, and we really
>> wouldn't have to care about any configuration.
>
>
> I agree that the interface should be as simple as no configuration to
> most users. But I also wonder why we have hugetlbfs to allow users to
> specify different kinds of page sizes, which seems against the discussion
> above. Are we assuming advanced users should always use hugetlbfs instead
> of THPs?

Well, with hugetlbfs you get real control over which page sizes to use:
no mixture of sizes, and guarantees.
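
To illustrate the "real control" point, here is a minimal userspace sketch, assuming Linux and a glibc that exposes MAP_HUGETLB. The helper names hugetlb_size_flag() and map_hugetlb() are hypothetical and exist only for this example. The page size is requested by encoding log2 of the size into the mmap() flags via MAP_HUGE_SHIFT, as described in mmap(2). The call fails rather than falling back to another size if the pool for that size has no free pages, which is the guarantee THP does not give.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Fallback for older headers; 26 is the value documented in mmap(2). */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif

/*
 * Hypothetical helper: encode the requested huge page size, given as
 * log2(bytes), into the mmap() flag bits (21 -> 2MB, 30 -> 1GB).
 */
int hugetlb_size_flag(unsigned int log2_size)
{
	return (int)(log2_size << MAP_HUGE_SHIFT);
}

/*
 * Hypothetical helper: map len bytes of anonymous memory backed only by
 * hugetlb pages of the requested size. Returns NULL on failure, e.g. when
 * no pages of that size are reserved. There is no silent fallback to a
 * different page size.
 */
void *map_hugetlb(size_t len, unsigned int log2_size)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
		    hugetlb_size_flag(log2_size);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

For example, map_hugetlb(2UL << 20, 21) asks for a single 2MB page and map_hugetlb(1UL << 30, 30) for a single 1GB page; either returns NULL if the corresponding pool is empty.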

In some environments you might want to control which application gets
which page size. I know of database applications and hypervisors that
sometimes really want 2MB huge pages instead of 1GB huge pages. And
sometimes you really want/need 1GB huge pages (e.g., low-latency
applications, real-time KVM, ...).

Simple example: KVM with postcopy live migration

While 2MB huge pages work reasonably fine, migrating 1GB gigantic pages
on demand (via userfaultfd) is painfully slow / impractical.

-- 
Thanks,

David / dhildenb