[Cc linux-api]

On Tue 16-02-21 20:24:16, David Rientjes wrote:
> Hi everybody,
>
> Khugepaged is slow by default, it scans at most 4096 pages every 10s.
> That's normally fine as a system-wide setting, but some applications would
> benefit from a more aggressive approach (as long as they are willing to
> pay for it).
>
> Instead of adding priorities for eligible ranges of memory to khugepaged,
> temporarily speeding khugepaged up for the whole system, or sharding its
> work for memory belonging to a certain process, one approach would be to
> allow userspace to induce hugepage collapse.
>
> The benefit to this approach would be that this is done in process context
> so its cpu is charged to the process that is inducing the collapse.
> Khugepaged is not involved.

Yes, this makes a lot of sense to me.

> Idea was to allow userspace to induce hugepage collapse through the new
> process_madvise() call. This allows us to collapse hugepages on behalf of
> current or another process for a vectored set of ranges.

Yes, madvise sounds like a good fit for the purpose.

> This could be done through a new process_madvise() mode *or* it could be a
> flag to MADV_HUGEPAGE since process_madvise() allows for a flag parameter
> to be passed. For example, MADV_F_SYNC.

Would this MADV_F_SYNC be applicable to other madvise modes? It does not
seem to make much sense for most of the existing ones. We can argue that
MADV_PAGEOUT would guarantee the range was indeed reclaimed, but I am not
sure we want to provide such strong semantics because that could limit
future reclaim optimizations.

To me MADV_HUGEPAGE_COLLAPSE sounds like the easiest way forward.
-- 
Michal Hocko
SUSE Labs
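
For illustration only, here is a rough userspace sketch of what "collapse a vectored set of ranges via process_madvise()" might look like. Several things are assumptions and not part of the proposal: a 2 MiB huge page size, the helper names hpage_align_range() and collapse_ranges(), and the advice value, which the caller has to supply because this thread does not define a number for MADV_HUGEPAGE_COLLAPSE. The flags argument is passed as 0; the MADV_F_SYNC alternative discussed above would go there if it existed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: PMD-sized huge pages are 2 MiB (x86-64 with 4 KiB base pages). */
#define HPAGE_SIZE ((uintptr_t)2 << 20)

/*
 * Hypothetical helper: shrink [start, start + len) to the largest
 * sub-range whose start and end are both huge-page aligned, since only
 * fully covered huge-page-sized regions can be collapsed.
 * Returns 1 and fills *out on success, 0 if no aligned region fits.
 * Overflow of start + len is not checked in this sketch.
 */
int hpage_align_range(uintptr_t start, size_t len, struct iovec *out)
{
    uintptr_t end = start + len;
    uintptr_t lo = (start + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1); /* round up */
    uintptr_t hi = end & ~(HPAGE_SIZE - 1);                      /* round down */

    if (hi <= lo)
        return 0;
    out->iov_base = (void *)lo;
    out->iov_len = hi - lo;
    return 1;
}

/*
 * Hypothetical helper: ask the kernel to apply `advice` to `n` ranges in
 * the process referred to by `pidfd`. `advice` stands in for the proposed
 * collapse mode; its value is not defined in this thread, so the caller
 * passes it in. Returns bytes advised, or -1 with errno set.
 */
long collapse_ranges(int pidfd, const struct iovec *vec, size_t n, int advice)
{
#ifdef SYS_process_madvise
    /* flags must currently be 0; a sync flag would be passed here. */
    return syscall(SYS_process_madvise, pidfd, vec, n, advice, 0u);
#else
    (void)pidfd; (void)vec; (void)n; (void)advice;
    errno = ENOSYS;
    return -1;
#endif
}
```

Because the call runs in the context of the caller, the CPU time for the collapse would be charged to whoever invokes collapse_ranges(), which is the accounting property the proposal is after. On a kernel that does not know the advice value, the call is expected to fail with an error rather than collapse anything.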