Hey Michal and Yang,

Thanks for the feedback!

On Tue, May 24, 2022 at 1:02 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
> [...]
> Page reclaim could also cause the THP split. And it may happen at any
> time. I'm not sure how the users or callers could monitor it.

I don't have a good idea of what monitoring would look like, but this is a
great example that shows splitting can happen from underneath us and we'll
have to design accordingly. Luckily in this example, the page is likely cold
and therefore of less interest to be backed by THPs.

On Wed, May 25, 2022 at 10:33 AM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Wed, May 25, 2022 at 1:24 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Mon 23-05-22 17:18:32, Zach O'Keefe wrote:
> > [...]
> > > Idea: MADV_COLLAPSE should respect VM_NOHUGEPAGE and "never" THP mode,
> > > but otherwise would attempt to collapse.
> >
> > I do agree that {process_}madvise should fail on VM_NOHUGEPAGE. The
> > process has explicitly noted that THP shouldn't be used on such a VMA
> > and seeing THP could be observed as not complying with that contract.
> >
> > I am not so sure about the global "never" policy, though. The global
> > policy controls _kernel_ driven THPs. As the request to collapse memory
> > comes from the userspace I do not think it should be limited by the
> > kernel policy.

Ya, I agree this would be ideal / is the cleanest. However, Peter mentioned a
non-debug example where users wouldn't be expecting THPs after setting
"never". Though, as Peter points out, I'm not sure how many users do this
with CONFIG_TRANSPARENT_HUGEPAGE=y.

> > I also think it can be beneficial to implement userspace
> > based THP policies and exclude any kernel interference and that could be
> > achieved by global kernel "never" policy and implement the whole
> > functionality by process_madvise.

I don't have a clear picture yet, but even if we move THP collapse policy to
userspace, I imagine we'll still want an informed application/allocator to be
able to MADV_HUGEPAGE known hot memory and fault in THPs rather than
MADV_COLLAPSE'ing after the fact. IOW, I don't know if we'll ever want
"never". When I get started on this work, I'm planning on some prctl(2)
interface to disable khugepaged on processes where the userspace agent has
taken responsibility for THP utilization.

> I'd prefer to respect "never" for now since it is typically used to
> disable THP globally even though the mappings are madvised
> (MADV_HUGEPAGE). IMHO I treat MADV_COLLAPSE as weaker MADV_HUGEPAGE
> (take effect for non-madvised mappings but not flip VM_NOHUGEPAGE) +
> best-effort synchronous THP collapse.

I'm likewise in favor of respecting it until proven otherwise - even though I
agree with Michal that it would be nice to not depend on the kernel policy /
sysfs settings here.

> We could lift the restriction in the future if it turns out non
> respecting "never" is more useful.
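
To make the semantics under discussion concrete, here is a rough userspace
model of the proposed check: fail on VM_NOHUGEPAGE, respect the global "never"
mode, and otherwise attempt the collapse. This is only a sketch of the idea as
stated in this thread, not kernel code. The function names and the boolean
vma_nohugepage argument are hypothetical; the only real interface assumed is
the /sys/kernel/mm/transparent_hugepage/enabled file, which marks the active
mode with square brackets.

```python
# Sketch only: models the policy proposed in this thread, not the kernel's
# actual implementation. All function names here are hypothetical.

THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(sysfs_text: str) -> str:
    """Return the bracketed token, e.g. 'madvise' from 'always [madvise] never'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {sysfs_text!r}")


def read_thp_mode(path: str = THP_ENABLED_PATH) -> str:
    """Read the global THP mode from sysfs (requires a THP-enabled kernel)."""
    with open(path) as f:
        return active_thp_mode(f.read())


def collapse_allowed(thp_mode: str, vma_nohugepage: bool) -> bool:
    """Proposed rule: respect VM_NOHUGEPAGE and 'never', otherwise try."""
    if vma_nohugepage:
        # The process explicitly opted this VMA out of THP.
        return False
    if thp_mode == "never":
        # Respected for now; the thread leaves open lifting this later.
        return False
    # 'always' and 'madvise' both permit an explicit collapse request,
    # including on mappings that were never MADV_HUGEPAGE'd.
    return True
```

If the "never" restriction were lifted later, as Michal suggests, only the
second check in collapse_allowed() would go away; the VM_NOHUGEPAGE check is
the part everyone in the thread agrees on.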