Re: [RFC PATCH v2 00/30] 1GB PUD THP support on x86_64

David Hildenbrand <david@xxxxxxxxxx> · Mon, 5 Oct 2020 19:39:53 +0200

>>> considering that 2MB THP have turned out to be quite a pain, but the
>>> situation has settled over time. Maybe our current code base is prepared
>>> for that much better.
> 
> I am planning to refactor my code further to reduce the amount of
> added code, since PUD THP is very similar to PMD THP. One thing
> I want to achieve is to enable split_huge_page to split a page of any
> order into a group of pages of any lower order. A lot of code in this
> patchset replicates the behavior of PMD THP at the PUD level.
> It might be possible to deduplicate most of that code.
> 
>>>
>>> Exposing that interface to the userspace is a different story of course.
>>> I do agree that we likely do not want to be very explicit about that.
>>> E.g. an interface for address space defragmentation without any more
>>> specifics sounds like a useful feature to me. It will be up to the
>>> kernel to decide which huge pages to use.
>>
>> Yes, I think one important feature would be that we don't end up placing
>> a gigantic page where only a handful of pages are actually populated
>> without a green light from the application, because that's what some user
>> space applications care about (not consuming more memory than intended;
>> IIUC, this is also what this patch set does). I'm fine with placing
>> gigantic pages if it really just "defragments" the address space layout,
>> without filling unpopulated holes.
>>
>> Then, this would be mostly invisible to user space, and we really
>> wouldn't have to care about any configuration.
> 
> 
> I agree that the interface should require no configuration for
> most users. But I also wonder why we have hugetlbfs to allow users to
> specify different page sizes, which seems to run counter to the discussion
> above. Are we assuming advanced users should always use hugetlbfs instead
> of THPs?

Well, with hugetlbfs you get real control over which page sizes to use.
No mixture of page sizes, and guarantees.

In some environments you might want to control which application gets
which pagesize. I know of database applications and hypervisors that
sometimes really want 2MB huge pages instead of 1GB huge pages. And
sometimes you really want/need 1GB huge pages (e.g., low-latency
applications, real-time KVM, ...).
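A hedged illustration of that explicit control, assuming a Linux sysfs layout: each huge page size has its own separately reserved pool under /sys/kernel/mm/hugepages/hugepages-<size>kB/ (with nr_hugepages and free_hugepages files), and hugetlbfs can be mounted for one specific size via the pagesize= mount option. The sketch below only reads those pools; the helper names page_size_bytes and hugetlb_pools are hypothetical.

```python
import os
import re

# Sysfs root for hugetlb pools; each supported huge page size has its own
# subdirectory, e.g. "hugepages-2048kB" or "hugepages-1048576kB".
HUGEPAGES_SYSFS = "/sys/kernel/mm/hugepages"

_DIR_RE = re.compile(r"^hugepages-(\d+)kB$")


def page_size_bytes(dirname: str) -> int:
    """Convert a pool directory name like 'hugepages-1048576kB' to bytes."""
    m = _DIR_RE.match(dirname)
    if m is None:
        raise ValueError(f"not a hugepages directory: {dirname!r}")
    return int(m.group(1)) * 1024


def _read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())


def hugetlb_pools(root: str = HUGEPAGES_SYSFS) -> dict:
    """Return {page_size_bytes: (nr_hugepages, free_hugepages)} per pool.

    Returns an empty dict if the sysfs directory does not exist
    (e.g. no hugetlb support, or not running on Linux).
    """
    pools = {}
    if not os.path.isdir(root):
        return pools
    for name in sorted(os.listdir(root)):
        if _DIR_RE.match(name) is None:
            continue
        d = os.path.join(root, name)
        pools[page_size_bytes(name)] = (
            _read_int(os.path.join(d, "nr_hugepages")),
            _read_int(os.path.join(d, "free_hugepages")),
        )
    return pools


if __name__ == "__main__":
    # One line per page size: the pools are reserved and consumed independently.
    for size, (total, free) in sorted(hugetlb_pools().items()):
        print(f"{size >> 20} MiB pages: {total} reserved, {free} free")
```

Because each size is its own pool, an application mapping from a 1GB hugetlbfs mount never silently receives 2MB pages, which is the "no mixture" property; THP, in contrast, leaves the size choice to the kernel.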

Simple example: KVM with postcopy live migration

While 2MB huge pages work reasonably well, migrating 1GB gigantic pages
on demand (via userfaultfd) is painfully slow / impractical.
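A back-of-the-envelope sketch of why: during postcopy, a vCPU that faults on a missing page stays blocked until the whole page it touched has been copied from the source, so the minimum stall per fault scales linearly with the page size. The 10 Gbit/s link speed is an assumed figure, and protocol overhead is ignored.

```python
# Assumed link speed; real postcopy throughput also depends on protocol
# overhead and contention with the background copy of remaining memory.
LINK_BITS_PER_SEC = 10e9


def fault_stall_seconds(page_bytes: int, link_bps: float = LINK_BITS_PER_SEC) -> float:
    """Lower bound on the time to pull one page across the link on a fault."""
    return page_bytes * 8 / link_bps


for label, size in (("4 KiB", 4 << 10), ("2 MiB", 2 << 20), ("1 GiB", 1 << 30)):
    print(f"{label}: {fault_stall_seconds(size) * 1e3:.3f} ms per fault")
```

Under these assumptions a 2 MiB fault costs roughly 1.7 ms, while a single 1 GiB fault stalls the vCPU for about 0.86 s, 512 times longer.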

-- 
Thanks,

David / dhildenb