On 2/17/21 9:21 AM, Michal Hocko wrote:
> [Cc linux-api]
>
> On Tue 16-02-21 20:24:16, David Rientjes wrote:
>> Hi everybody,
>>
>> Khugepaged is slow by default, it scans at most 4096 pages every 10s.
>> That's normally fine as a system-wide setting, but some applications would
>> benefit from a more aggressive approach (as long as they are willing to
>> pay for it).
>>
>> Instead of adding priorities for eligible ranges of memory to khugepaged,
>> temporarily speeding khugepaged up for the whole system, or sharding its
>> work for memory belonging to a certain process, one approach would be to
>> allow userspace to induce hugepage collapse.
>>
>> The benefit to this approach would be that this is done in process context
>> so its cpu is charged to the process that is inducing the collapse.
>> Khugepaged is not involved.
>
> Yes, this makes a lot of sense to me.
>
>> Idea was to allow userspace to induce hugepage collapse through the new
>> process_madvise() call. This allows us to collapse hugepages on behalf of
>> current or another process for a vectored set of ranges.
>
> Yes, madvise sounds like a good fit for the purpose.

Agreed on both points.

>> This could be done through a new process_madvise() mode *or* it could be a
>> flag to MADV_HUGEPAGE since process_madvise() allows for a flag parameter
>> to be passed. For example, MADV_F_SYNC.
>
> Would this MADV_F_SYNC be applicable to other madvise modes? Most
> existing madvise modes do not seem to make much sense. We can argue that
> MADV_PAGEOUT would guarantee the range was indeed reclaimed but I am not
> sure we want to provide such a strong semantic because it can limit
> future reclaim optimizations.
>
> To me MADV_HUGEPAGE_COLLAPSE sounds like the easiest way forward.

I guess in the old madvise(2) we could create a new combo of MADV_HUGEPAGE |
MADV_WILLNEED with this semantic? But you are probably more interested in
process_madvise() anyway. There the new flag would make more sense.
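To make the "vectored set of ranges" part of the proposal concrete, here is a minimal userspace sketch of how a caller might prepare and submit such a request through process_madvise(2). It is only a sketch under stated assumptions: PMD-sized THPs are 2 MiB (x86-64 with 4 KiB base pages), a collapse would only cover fully aligned 2 MiB extents, and the helper names (pmd_align_range, build_collapse_iov, request_collapse) are hypothetical. The advice value is deliberately left as a parameter, because neither MADV_HUGEPAGE_COLLAPSE nor MADV_HUGEPAGE plus MADV_F_SYNC has a defined value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef SYS_process_madvise
/* Assumption: unified syscall number of process_madvise() (Linux 5.10+)
 * on most architectures, for userspace headers that predate it. */
#define SYS_process_madvise 440
#endif

/* Assumption: PMD-sized THP is 2 MiB, as on x86-64 with 4 KiB pages. */
#define HPAGE_PMD_SIZE ((uintptr_t)2 << 20)

/* Trim one range to its largest 2 MiB-aligned subrange. Returns 1 and
 * fills *out if at least one whole huge page fits, otherwise 0. */
int pmd_align_range(const struct iovec *in, struct iovec *out)
{
    uintptr_t start = (uintptr_t)in->iov_base;
    uintptr_t end = start + in->iov_len;
    uintptr_t astart = (start + HPAGE_PMD_SIZE - 1) & ~(HPAGE_PMD_SIZE - 1);
    uintptr_t aend = end & ~(HPAGE_PMD_SIZE - 1);

    if (aend <= astart)
        return 0;
    out->iov_base = (void *)astart;
    out->iov_len = aend - astart;
    return 1;
}

/* Build the vectored argument: keep the aligned part of each range and
 * drop ranges too small to hold a huge page. Returns entries written. */
size_t build_collapse_iov(const struct iovec *in, size_t n, struct iovec *out)
{
    size_t cnt = 0;

    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++)
        cnt += (size_t)pmd_align_range(&in[i], &out[cnt]);
    return cnt;
}

/* Submit all ranges of the target process in one process_madvise() call.
 * 'advice' is a parameter on purpose: no value exists for the collapse
 * mode under discussion (hypothetical MADV_HUGEPAGE_COLLAPSE). The flags
 * argument must currently be 0. Returns bytes advised, or -1 with errno. */
ssize_t request_collapse(int pidfd, const struct iovec *iov, size_t cnt,
                         int advice)
{
    return (ssize_t)syscall(SYS_process_madvise, pidfd, iov, cnt, advice, 0UL);
}
```

A caller would obtain the pidfd with pidfd_open(2) and, since the work happens in the caller's context, its CPU time would be charged to the caller rather than to khugepaged. Trimming to aligned extents in userspace is just one possible design choice; the kernel could equally skip unaligned tails itself.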

But there's also David H.'s proposal for MADV_POPULATE and there might be a
benefit in considering both at the same time? Should e.g. MADV_POPULATE with
MADV_HUGEPAGE have the collapse semantics? But would MADV_POPULATE be added
to process_madvise() as well? Just thinking out loud so we don't end up with
more flags than necessary; it's already confusing enough as it is.