On 16 Feb 2021, at 23:24, David Rientjes wrote:

> Hi everybody,
>
> Khugepaged is slow by default, it scans at most 4096 pages every 10s.
> That's normally fine as a system-wide setting, but some applications would
> benefit from a more aggressive approach (as long as they are willing to
> pay for it).
>
> Instead of adding priorities for eligible ranges of memory to khugepaged,
> temporarily speeding khugepaged up for the whole system, or sharding its
> work for memory belonging to a certain process, one approach would be to
> allow userspace to induce hugepage collapse.
>
> The benefit to this approach would be that this is done in process context
> so its cpu is charged to the process that is inducing the collapse.
> Khugepaged is not involved.
>
> Idea was to allow userspace to induce hugepage collapse through the new
> process_madvise() call. This allows us to collapse hugepages on behalf of
> current or another process for a vectored set of ranges.
>
> This could be done through a new process_madvise() mode *or* it could be a
> flag to MADV_HUGEPAGE since process_madvise() allows for a flag parameter
> to be passed. For example, MADV_F_SYNC.
>
> When done, this madvise call would allocate a hugepage on the right node
> and attempt to do the collapse in process context just as khugepaged would
> otherwise do.
>
> This would immediately be useful for a malloc implementation, for example,
> that has released its memory back to the system using MADV_DONTNEED and
> will subsequently refault the memory. Rather than wait for khugepaged to
> come along 30m later, for example, and collapse this memory into a
> hugepage (which could take a much longer time on a very large system), an
> alternative would be to use this process_madvise() mode to induce the
> action up front. In other words, say "I'm returning this memory to the
> application and it's going to be hot, so back it by a hugepage now rather
> than waiting until later."
>
> It would also be useful for read-only file-backed mappings for text
> segments. Khugepaged should be happy, it's just less work done by generic
> kthreads that gets charged as an overall tax to everybody.
>
> Thoughts?

The idea sounds great to me. I have one question about how it interacts
with khugepaged: will the process be excluded from khugepaged once this
process_madvise() mode is used on it? That could save khugepaged some
scanning work if someone is already actively collapsing hugepages for the
process.

--
Best Regards,
Yan Zi
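For reference, the userspace side of the quoted proposal might look roughly like the sketch below. It is only an illustration under stated assumptions: MADV_F_SYNC_HYPOTHETICAL stands in for the MADV_F_SYNC name floated in the quoted mail and is not an existing kernel flag, the 2 MiB hugepage size is assumed (x86-64 PMD size), and both helper names are made up for this example. A kernel without such a mode is expected to reject the call with an error such as EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: PMD-sized hugepages are 2 MiB (true on x86-64). */
#define HPAGE_SIZE ((size_t)2 << 20)

/* Hypothetical: placeholder for the proposed MADV_F_SYNC flag.
 * No such flag is defined by the kernel headers. */
#define MADV_F_SYNC_HYPOTHETICAL 0x1u

/*
 * Shrink [addr, addr + len) to the largest hugepage-aligned subrange,
 * since only whole hugepage-sized extents can be collapsed.
 * Returns the subrange length in bytes, or 0 if no full hugepage fits.
 * Does not guard against addr + len overflowing.
 */
size_t hpage_aligned_subrange(uintptr_t addr, size_t len, struct iovec *out)
{
    const uintptr_t mask = (uintptr_t)HPAGE_SIZE - 1;

    assert(out != NULL);

    uintptr_t start = (addr + mask) & ~mask;  /* round start up   */
    uintptr_t end = (addr + len) & ~mask;     /* round end down   */

    if (end <= start)
        return 0;

    out->iov_base = (void *)start;
    out->iov_len = end - start;
    return out->iov_len;
}

/*
 * Ask the kernel to collapse the given ranges of the process behind
 * pidfd, using the raw process_madvise() syscall (Linux 5.10+).
 * The advice/flag combination is the proposal's, not a supported one.
 * Returns the syscall result; -1 with errno set on failure.
 */
long induce_collapse(int pidfd, const struct iovec *vec, size_t vlen)
{
#ifdef SYS_process_madvise
    return syscall(SYS_process_madvise, pidfd, vec, vlen,
                   MADV_HUGEPAGE, MADV_F_SYNC_HYPOTHETICAL);
#else
    (void)pidfd; (void)vec; (void)vlen;
    errno = ENOSYS;  /* headers too old to know the syscall number */
    return -1;
#endif
}
```

A malloc implementation could call hpage_aligned_subrange() on an arena it is about to hand back to the application, then pass the resulting iovec to induce_collapse() with a pidfd for the target process, falling back to plain khugepaged behaviour when the call fails.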