Re: [Invitation] Linux MM Alignment Session on HugeTLB Core MM Convergence on Wednesday

Mike Kravetz <mike.kravetz@xxxxxxxxxx> · Thu, 15 Jun 2023 11:31:45 -0700

On 06/15/23 10:29, David Hildenbrand wrote:
> On 15.06.23 10:04, Michal Hocko wrote:
> > On Wed 14-06-23 16:04:58, Mike Kravetz wrote:
> > > On 06/12/23 18:59, David Rientjes wrote:
> > > 
> > > We did spend a good chunk of time on hugetlb/core mm unification and
> > > removing special casing.  In some (most) of these cases, the benefit of
> > > removing special cases from core mm would result in adding more code to
> > > hugetlb.  For example: proper type'ing so that hugetlb does not treat
> > > all page table entries as PTEs.  Again, I may be wrong but I think
> > > people were OK with adding more code (and even complexity) to hugetlb
> > > if it eliminated special casing in the core mm.  But, there did not
> > > seem to be a clear consensus, especially with the thought that we may
> > > need to double hugetlb code to get types right.
> > 
> > This is primarily your call as a maintainer. If you ask me, hugetlb is
> > overcomplicated in its current form already. Regressions are not really
> > rare when code is added, which is a signal we are hitting maintenance
> > cost walls. This doesn't mean further development is impossible of
> > course, but it is increasingly costly AFAICS.
> > 
> > > Unless I missed something, there was no clear direction at the end of this
> > > session.  I was hoping that we could come up with a plan to address the
> > > issues facing today's hugetlb users.  IMO, there seem to be two options:
> > > 1) Start work on hugetlb v2 with the intention that customers will need
> > >     to move to this to address their issues.
> > > 2) Incorporate functionality like HGM into existing hugetlb.
> > 
> 
> I fully agree with all that Michal said.
> 
> I'm just going to add that I don't see why anyone would look into a
> hugetlbv2 if we're going to use the motivation of "help existing users" to
> make hugetlb ever-more complicated and special. "existing users" here even
> meaning "people use hugetlb for backing VMs. Now they want to get postcopy
> working with less latency." -- which I consider partially a new use case.
> 
> So working on adding HGM and concurrently starting a hugetlbv2? I don't
> think that will happen if we decide on adding HGM and proceeding with that
> reasoning about existing users.

I agree that doing both in parallel is not going to happen.

> As expressed yesterday, I don't see a fast and clean way to make hugetlb
> significantly less special (thanks Willy for the list of odd cases).
> 
> Sure, we can talk about adding pte_t safety, but I don't really see a way
> forward to unify page table walking code that way -- there are still the
> (PT) locking, PMD sharing, PTE-cont special cases ... but sure, if anybody
> wants to work on that, why not.
> 
> That said, like Michal, I acknowledge that it is Mike's call regarding
> the hugetlb code. I, for my part, will push back on any added core-mm
> complexity that adds more special casing for hugetlb. Maybe there are easy
> ways to integrate it nicely and that is not really a concern.

And if the call on how to move forward was easy, I would have already made
a decision. :)  I really do appreciate all the input.

It is pretty clear that adding more complex special cases to core mm for
hugetlb is going to be a non-starter.  James has talked about any such special
cases for HGM in another reply.

I previously said that I am leaning toward trying to add HGM to existing
hugetlb.  This is on the condition that any addition of special cases to
the core mm would be minimal and trivial.  In addition, the added complexity
to hugetlb has to be manageable.
-- 
Mike Kravetz

> Note that while we've been discussing how HGM would already interfere with
> core-mm, we've not even started discussing how actual
> MADV_SPLIT/MADV_COLLAPSE/page poisoning ... would affect core-mm and
> require special-casing for hugetlb.
> 
> I, for my part, will explore a bit the mapcount topic (as time permits) and
> see if we can come up at least with a unified mapcount approach (e.g.,
> sub-page mapcount?). But I suspect even figuring that out will take quite a
> while already ...
> 
> -- 
> Cheers,
> 
> David / dhildenb