On Wed, 24 May 2023, James Houghton wrote:

> Hi everyone,
>
> If you came to the HGM session at LSF/MM/BPF, thank you!

Thank you, James, for putting together such a detailed discussion and
soliciting some great feedback.

> I want to address some of the feedback I got and restate the
> importance of HGM, especially as it relates to handling memory
> poison.

Thanks for bringing this up; I think it's a very important use case.
Adding in Naoya Horiguchi and Miaohe Lin as well.

> ## Memory poison is a problem
>
> HGM allows us to unmap poison at 4K instead of unmapping the entire
> hugetlb page. For applications that use HugeTLB, losing the entire
> hugepage can be catastrophic. For example, if a hypervisor is using
> 1G pages for guest memory, the VM will lose 1G of its physical
> address space (and even losing 2M will most likely kill the VM). If
> we can limit the poisoning to only 4K, the VM will most likely be
> able to recover. This improved recoverability applies to other
> HugeTLB users as well, like databases.

Mike, do you have feedback on how useful this would be, especially for
use cases beyond what cloud providers would find helpful?

> ## Adding a new filesystem has risks, and unification will take years
>
> Most of the feedback I got from the HGM session was to simply avoid
> adding new code to HugeTLB and instead to make a new device or
> filesystem. Creating a new device or filesystem could work, but it
> leaves existing HugeTLB users with no answer for memory poison.
> Users would need to switch to the new device/filesystem if they want
> better hwpoison handling, and it will probably take years for the
> new device/filesystem to support all the features that HugeTLB
> supports today (so beyond PUD+ mappings, we would need page table
> sharing, page struct freeing, and even private mappings/CoW).
>
> If we make a new filesystem and are unable to implement the HugeTLB
> uapi exactly with that filesystem, we will be stuck unable to remove
> HugeTLB. We would strongly like to avoid coexisting HugeTLB
> implementations (similar to cgroup v1 and cgroup v2) if at all
> possible.
>
> Instead of making a new filesystem, we could add HugeTLB-like
> features to tmpfs, such as support for gigantic page allocations
> (from bootmem or CMA, like HugeTLB). This path would work to mostly
> unify HugeTLB with tmpfs, but existing HugeTLB users would still
> have to wait many years before poison can be handled more
> efficiently. (And some users care about things like hugetlb_cgroup!)
>
> ## HGM doesn't hinder future unification
>
> HGM doesn't add any new special cases into mm code; it relies on the
> special cases that already exist to support HugeTLB. HGM also isn't
> adding a completely novel feature that can't be replicated by THPs:
> PTE-mapping of THPs is already supported.

I think this is important: there are deficiencies that HGM can fully
address (like the aforementioned smaller-granularity page poisoning,
as well as optimized live migration) while not posing an obstacle to
future unification. If not HGM, it would be great to get alignment on
what needs to be done so that we can support memory poisoning at
smaller sizes for users of 1GB pages *and* optimized live migration
for VMs backed by 1GB pages, without requiring a full unification of
the HugeTLB subsystem with the rest of core MM. While that unification
has been discussed for several years, it would be a shame if it became
a hard blocker for addressing these real deficiencies that are
actively causing pain.

> HGM solves a problem that HugeTLB users have right now:
> unnecessarily large portions of memory are poisoned.
> Unless we fix HugeTLB itself, we will have to spend years
> effectively rewriting HugeTLB and telling users to switch to the new
> system that gets built.
>
> Given all this, I think we should continue to move forward with HGM
> unless there is another feasible way to solve poisoning for existing
> HugeTLB users. Also, I encourage everyone to read the series itself
> (it's not all that complicated!).
>
> - James