Re: [PATCH v2 00/12] mm: userspace hugepage collapse

"Zach O'Keefe" <zokeefe@xxxxxxxxxx> · Wed, 20 Apr 2022 18:02:14 -0700

On Wed, Apr 20, 2022 at 5:57 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Tue, Apr 19, 2022 at 3:43 PM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote:
> >
> > On Tue, Apr 19, 2022 at 1:03 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> > >
> > > >> E.g., with a very sparse memory layout, we don't want to waste
> > > >> memory by allocating memory where we actually have no page populated yet
> > > >> -- could be user space won't reuse that memory in the foreseeable
> > > >> future. With too many swap entries, we don't want to trigger an
> > > >> eventually unnecessary overhead of swapping in entries if user space
> > > >> won't access them in the foreseeable future. Something similar applies
> > > >> to max_ptes_shared, where one might just end up wasting a lot of memory
> > > >> eventually in some applications.
> > > >>
> > > >> So IMHO, with MADV_COLLAPSE we should ignore/disable any heuristics that
> > > >> try figuring out what user space might be doing. We know exactly what
> > > >> user space asks for -- and that can be documented properly.
> > > >>
> > >
> > > Just a thought, if we ever want to implement khugepaged in user space,
> > > it could theoretically obtain similar information using e.g., the
> > > pagemap. It wouldn't be race-free, but the question is if it would matter.
> > >
> > > I consider the primary use case to be giving an application more precise
> > > control over actual THP placement.
> > >
> >
> > Good point about the pagemap and agree about the primary use case -
> > I'll make that clear in v3 cover letter.
> >
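
To make the pagemap idea concrete, here is a rough sketch of how user space
might count present, swapped, and unpopulated pages in a range before
deciding whether to ask for a collapse. It assumes the documented
/proc/<pid>/pagemap layout (one native-endian 64-bit entry per virtual page,
bit 63 = present, bit 62 = swapped). The helper names (classify, summarize,
read_pagemap) and the 4096-byte default page size are illustrative
assumptions, not anything from the series, and as noted above the result is
not race-free.

```python
import struct

PAGEMAP_ENTRY_BYTES = 8   # one 64-bit entry per virtual page
PM_PRESENT = 1 << 63      # page is present in RAM
PM_SWAPPED = 1 << 62      # page is in swap


def classify(entry):
    """Classify one pagemap entry as 'present', 'swapped', or 'none'."""
    if entry & PM_PRESENT:
        return "present"
    if entry & PM_SWAPPED:
        return "swapped"
    return "none"


def summarize(entries):
    """Count how many entries fall into each class."""
    counts = {"present": 0, "swapped": 0, "none": 0}
    for entry in entries:
        counts[classify(entry)] += 1
    return counts


def read_pagemap(start, npages, page_size=4096, path="/proc/self/pagemap"):
    """Read npages raw entries starting at virtual address start (Linux only).

    page_size=4096 is an assumption; query the real page size on your system.
    """
    with open(path, "rb") as f:
        f.seek((start // page_size) * PAGEMAP_ENTRY_BYTES)
        data = f.read(npages * PAGEMAP_ENTRY_BYTES)
    n = len(data) // PAGEMAP_ENTRY_BYTES
    return list(struct.unpack(f"={n}Q", data[: n * PAGEMAP_ENTRY_BYTES]))
```

A user-space policy could compare the "none" and "swapped" counts against its
own thresholds, playing roughly the role that max_ptes_none and max_ptes_swap
play for khugepaged.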
> > > >
> > > > Sounds good to me. Would you also be in favor of decoupling allocation
> > > > semantics from khugepaged? I.e. we'll pick some default gfp flags and
> > > > not depend on /sys/kernel/mm/transparent_hugepage/khugepaged/defrag?
> > >
> > > Good question. It's not really a heuristic like that other stuff.
> > >
> > > Easy answer: we're not dealing with khugepaged, so anything in
> > > /sys/kernel/mm/transparent_hugepage/khugepaged/ shouldn't apply?
> > >
> >
> > That's what I'm thinking now too. If there are no objections, I'll
> > proceed in that direction for v3.
>
> I agree, we should not treat MADV_COLLAPSE as "userspace khugepaged"
> IMHO. It is still best effort, but it is requested by the users
> explicitly, so the kernel should trust the users' judgement and ignore
> the max_ptes_* limits, since we should assume the users know what they
> are doing and what it costs.
>

Thanks for reading and giving your thoughts, Yang. Glad to hear we are
aligned here!

I'll send out a v3 early next week. The only real change is the gfp flags,
but I want to avoid spamming folks so soon after v2.

Thanks,
Zach

> >
> > > Sure, we could have separate toggles for MADV_COLLAPSE.
> > >
> > > Maybe we simply want a dedicated syscall where we can specify additional
> > > options ... but maybe that simply over-complicates the problem.
> > >
> >
> > Thankfully process_madvise(2) has flags, and madvise(2) users can
> > always migrate to using process_madvise(2) on self. Piggy-backing off
> > madvise infrastructure for these "non-advice actions" (e.g.
> > MADV_PAGEOUT) seems to be the norm.
> >
> > Thanks as always for your time and thoughts!
> >
> > Zach
> >
> > > --
> > > Thanks,
> > >
> > > David / dhildenb
> > >
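
For anyone who wants to experiment with the best-effort semantics discussed
above, a minimal sketch of the calling side might look like the following.
It goes through plain madvise(2) via Python's mmap.madvise(), not
process_madvise(2). The numeric fallback of 25 for MADV_COLLAPSE is an
assumption taken from later mainline uapi headers and may not match a kernel
built from this series; try_collapse is a hypothetical helper name.

```python
import mmap

# Assumption: 25 is the advice value in later mainline uapi headers; it is
# used only when the running Python does not export mmap.MADV_COLLAPSE.
MADV_COLLAPSE = getattr(mmap, "MADV_COLLAPSE", 25)


def try_collapse(mapping, start, length):
    """Ask the kernel to collapse [start, start + length) of mapping.

    Returns True if madvise() succeeded and False otherwise. The request is
    best effort, so a False result is treated as an expected outcome, for
    example on a kernel that does not know this advice value.
    """
    try:
        mapping.madvise(MADV_COLLAPSE, start, length)
    except (OSError, ValueError, AttributeError):
        # OSError: the kernel rejected or could not satisfy the request.
        # ValueError: start/length fall outside the mapping.
        # AttributeError: this platform's mmap object has no madvise().
        return False
    return True
```

The caller decides what a failure means; nothing here consults the
khugepaged sysfs knobs.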