Re: [RFC] mm: MADV_COLLAPSE semantics

Hey Michal and Yang,

Thanks for the feedback!

On Tue, May 24, 2022 at 1:02 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
> [...]
> Page reclaim could also cause the THP split. And it may happen at any
> time. I'm not sure how the users or callers could monitor it.

I don't have a good idea of what monitoring would look like, but this
is a great example that shows splitting can happen from underneath us
and we'll have to design accordingly.

Luckily in this example, the page is likely cold and therefore of less
interest to be backed by THPs.

On Wed, May 25, 2022 at 10:33 AM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Wed, May 25, 2022 at 1:24 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Mon 23-05-22 17:18:32, Zach O'Keefe wrote:
> > [...]
> > > Idea: MADV_COLLAPSE should respect VM_NOHUGEPAGE and "never" THP mode,
> > > but otherwise would attempt to collapse.
> >
> > I do agree that {process_}madvise should fail on VM_NOHUGEPAGE. The
> > process has explicitly noted that THP shouldn't be used on such a VMA
> > and seeing THP could be observed as not complying with that contract.
> >
> > I am not so sure about the global "never" policy, though. The global
> > policy controls _kernel_ driven THPs. As the request to collapse memory
> > comes from the userspace I do not think it should be limited by the
> > kernel policy.

Ya, I agree this would be ideal / is the cleanest. However, Peter
mentioned a non-debug example where users wouldn't be expecting THPs
after setting "never". Though, as Peter points out, I'm not sure how
many users do this with CONFIG_TRANSPARENT_HUGEPAGE=y.
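
For reference, the global mode discussed here is what
/sys/kernel/mm/transparent_hugepage/enabled reports, with the active
choice in square brackets (e.g. "always madvise [never]"). A minimal
sketch of how a userspace agent might extract that choice from the file
contents; the helper name thp_active_mode() is hypothetical and reading
the file itself is left to the caller:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing API): copy the bracketed, i.e.
 * currently active, token of a sysfs THP "enabled" line into out.
 * Returns 0 on success, -1 if no bracketed token is found or out is
 * too small to hold it.
 */
static int thp_active_mode(const char *line, char *out, size_t out_len)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t n;

	if (!open || !close)
		return -1;

	n = (size_t)(close - open - 1);	/* token length without brackets */
	if (n + 1 > out_len)
		return -1;

	memcpy(out, open + 1, n);
	out[n] = '\0';
	return 0;
}
```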

> > I also think it can be beneficial to implement userspace
> > based THP policies and exclude any kernel interference and that could be
> > achieved by global kernel "never" policy and implement the whole
> > functionality by process_madvise.

I don't have a clear picture yet, but even if we move THP collapse
policy to userspace, I imagine we'll still want an informed
application/allocator to be able to MADV_HUGEPAGE known-hot memory
and fault in THPs rather than MADV_COLLAPSE after the fact. IOW, I
don't know if we'll ever want "never". Once I get started on this
work, I plan to add some prctl(2) interface to disable khugepaged
on processes where the userspace agent has taken responsibility for
THP utilization.
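
To illustrate the MADV_HUGEPAGE-then-fault-in pattern I mean, here is a
rough sketch. It assumes a 2 MiB PMD-sized THP (as on x86-64); the
names map_hot_region() and SKETCH_HPAGE_SIZE are made up for this
example. Whether the faults are actually served with THPs depends on
the kernel configuration and the sysfs settings, so the madvise(2)
result is reported back rather than treated as fatal:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* MAP_ANONYMOUS / MADV_HUGEPAGE under strict -std */
#endif
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: 2 MiB PMD-sized huge pages (x86-64). */
#define SKETCH_HPAGE_SIZE (2UL << 20)

/*
 * Hypothetical helper: map an anonymous region, hint the hugepage-
 * aligned part with MADV_HUGEPAGE, then touch it so the page faults
 * get a chance to be served with THPs.
 *
 * Returns the aligned start of the region, or NULL if mmap() failed.
 * *advise_err is 0 if the hint was accepted, else errno from madvise().
 * The unaligned slack around the region stays mapped in this sketch.
 */
static void *map_hot_region(size_t len, int *advise_err)
{
	size_t map_len = len + SKETCH_HPAGE_SIZE;	/* slack for alignment */
	uintptr_t aligned;
	char *raw, *hot;

	*advise_err = 0;
	raw = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	aligned = ((uintptr_t)raw + SKETCH_HPAGE_SIZE - 1) &
		  ~(uintptr_t)(SKETCH_HPAGE_SIZE - 1);
	hot = (char *)aligned;

	if (madvise(hot, len, MADV_HUGEPAGE) != 0)
		*advise_err = errno;	/* e.g. EINVAL without THP support */

	memset(hot, 0, len);		/* fault the range in */
	return hot;
}
```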

> I'd prefer to respect "never" for now since it is typically used to
> disable THP globally even though the mappings are madvised
> (MADV_HUGEPAGE). IMHO I treat MADV_COLLAPSE as weaker MADV_HUGEPAGE
> (take effect for non-madvised mappings but not flip VM_NOHUGEPAGE) +
> best-effort synchronous THP collapse.

I'm likewise in favor of respecting it until proven otherwise - even
though I agree with Michal that it would be nice to not depend on the
kernel policy / sysfs settings here.
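
Spelled out, the eligibility rule this thread seems to be converging on
"for now" could look like the sketch below. This is not kernel code:
the enum, the flag values and collapse_allowed() are hypothetical
stand-ins for the VMA flags and the global sysfs mode, just to pin down
the proposed behavior.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the global sysfs THP mode. */
enum sketch_thp_mode { SKETCH_THP_ALWAYS, SKETCH_THP_MADVISE, SKETCH_THP_NEVER };

/* Hypothetical stand-ins for VM_HUGEPAGE / VM_NOHUGEPAGE. */
#define SKETCH_VM_HUGEPAGE	0x1u
#define SKETCH_VM_NOHUGEPAGE	0x2u

/*
 * Proposed MADV_COLLAPSE eligibility as discussed in this thread:
 *  - fail if the process opted the VMA out (VM_NOHUGEPAGE),
 *  - respect the global "never" mode (for now),
 *  - otherwise attempt the collapse, even on non-madvised mappings.
 */
static bool collapse_allowed(unsigned int vma_flags, enum sketch_thp_mode mode)
{
	if (vma_flags & SKETCH_VM_NOHUGEPAGE)
		return false;
	if (mode == SKETCH_THP_NEVER)
		return false;
	return true;
}
```

Dropping the second check would give the variant Michal describes,
where the global policy only governs kernel-driven THPs.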

> We could lift the restriction in the future if it turns out non
> respecting "never" is more useful.



