Re: [PATCH v2 00/12] mm: userspace hugepage collapse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 15, 2022 at 6:39 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Thu, Apr 14, 2022 at 05:52:43PM -0700, Zach O'Keefe wrote:
> > Hey Peter,
> >
> > Thanks for taking the time to review!
> >
> > On Thu, Apr 14, 2022 at 5:04 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > >
> > > Hi, Zach,
> > >
> > > On Thu, Apr 14, 2022 at 11:06:00AM -0700, Zach O'Keefe wrote:
> > > > process_madvise(2)
> > > >
> > > >       Performs a synchronous collapse of the native pages
> > > >       mapped by the list of iovecs into transparent hugepages.
> > > >
> > > >       Allocation semantics are the same as khugepaged, and depend on
> > > >       (1) the active sysfs settings
> > > >       /sys/kernel/mm/transparent_hugepage/enabled and
> > > >       /sys/kernel/mm/transparent_hugepage/khugepaged/defrag, and (2)
> > > >       the VMA flags of the memory range being collapsed.
> > > >
> > > >       Collapse eligibility criteria differs from khugepaged in that
> > > >       the sysfs files
> > > >       /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_[none|swap|shared]
> > > >       are ignored.
> > >
> > > The userspace khugepaged idea definitely makes sense to me, though I'm
> > > curious how the line is drown on the different behaviors here by explicitly
> > > ignoring the max_ptes_* entries.
> > >
> > > Let's assume the initiative is to duplicate a more data-aware khugepaged in
> > > the userspace, then IMHO it makes more sense to start with all the policies
> > > that applies to khugepaged already, including max_pte_*.
> > >
> > > I can understand the willingness to provide even stronger semantics here
> > > than khugepaged since the userspace could have very clear knowledge of how
> > > to provision the memories (better than a kernel scanner).  It's just that
> > > IMHO it could be slightly confusing if the new interface only partially
> > > apply the khugepaged rules.
> > >
> > > No strong opinion here.  It could already been a trade-off after the
> > > discussion from the RFC with Michal which I read..  Just curious about how
> > > you made that design decision so feel free to read it as a pure question.
> > >
> >
> > Understand your point here. The allocation and max_pte_* semantics are
> > split between khugepaged-like and fault-like, respectively - which
> > could be confusing. Originally, I proposed a MADV_F_COLLAPSE_LIMITS
> > flag to control the former's behavior, but agreed to keep things
> > simple to start, and expand the interface if/when necessary. I opted
> > to ignore max_ptes_* as the default since I envisioned that early
> > adopters would "just want it to work". One such example would be
> > backing executable text by hugepages on program load when many pages
> > haven't been demand-paged in yet.
> >
> > What do you think?
>
> I'm just slightly worried that'll make the default MADV_COLLAPSE semantics
> blurred.
>
> To me, a clean default definition for MADV_COLLAPSE would be nice, as "do
> khugepaged on this range, and with current thread context".  IMHO any
> feature bits then can be supplementing special needs, and I'll take the thp
> backing executable example to be one of the (good?) reason we'd need an
> extra flag for ignoring the max_ptes_* knobs.
>
> So personally if I were you maybe I'll start with the simple scheme of that
> (even if it won't immediately service a thing) but then add either the
> defrag or ignore_max_ptes_* as feature bits later on, with clear use case
> descriptions about why we need each of the feature flags.  IMHO numbers
> would be even more helpful when there's specific use cases on the show.
>
> Or, perhaps you think all potential MADV_COLLAPSE users should literally
> skip max_ptes_* limitations always?
>

Thanks for your time and valuable feedback here, Peter. I had a response typed
up, but after a few iterations became increasingly unsatisfied with my
own response.

I think this feature should be able to stand on its own without
consideration of a userspace khugepaged, as we have existing concrete
examples where it would be useful. In these cases, and I assume almost
all other use-cases outside userspace khugepaged, max_ptes_* should be
ignored as the fundamental assumption of MADV_COLLAPSE is that the
user knows better, and IMHO, khugepaged heuristics shouldn't tell
users they are wrong.

But this, as you mention, unsatisfactorily blurs the semantics of
MADV_COLLAPSE: "act like khugepaged here, but not here".

As such, WDYT about the reverse-side of the coin of what you proposed:
to not couple the default behavior of MADV_COLLAPSE with khugepaged at
all? I.e. Not tie the allocation semantics to
/sys/kernel/mm/transparent_hugepage/khugepaged/defrag. We can add
flags as necessary when/if a reimplementation of khugepaged in
userspace proves fruitful.

Thanks for your time and input,
Zach



> Anyway, I won't pretend I am an expert in this area. :) So please take that
> with a grain of salt.
>
> Thanks,
>
> --
> Peter Xu
>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux