On 18.02.21 14:43, Vlastimil Babka wrote:
On 2/17/21 9:21 AM, Michal Hocko wrote:
[Cc linux-api]
On Tue 16-02-21 20:24:16, David Rientjes wrote:
Hi everybody,
Khugepaged is slow by default: it scans at most 4096 pages every 10s.
That's normally fine as a system-wide setting, but some applications would
benefit from a more aggressive approach (as long as they are willing to
pay for it).
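
For reference, a minimal sketch of how one might read the current khugepaged
tunables and work out the resulting upper bound on the scan rate. It assumes
the usual sysfs directory /sys/kernel/mm/transparent_hugepage/khugepaged/ with
the pages_to_scan and scan_sleep_millisecs files; read_tunable() and
scan_rate() are names made up for this illustration.

```c
#include <stdio.h>

/* Assumed sysfs location of the khugepaged tunables. */
#define KHUGEPAGED_DIR "/sys/kernel/mm/transparent_hugepage/khugepaged/"

/* Read one numeric tunable by file name; returns -1 if it cannot be read. */
long read_tunable(const char *name)
{
	char path[256];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path), "%s%s", KHUGEPAGED_DIR, name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/*
 * Upper bound on pages scanned per second for a given setting.
 * Returns a negative value if the inputs do not give a finite bound
 * (unreadable tunable, or no sleep between scan passes).
 */
double scan_rate(long pages_to_scan, long sleep_ms)
{
	if (pages_to_scan < 0 || sleep_ms <= 0)
		return -1.0;
	return (double)pages_to_scan * 1000.0 / (double)sleep_ms;
}
```

Calling scan_rate(read_tunable("pages_to_scan"),
read_tunable("scan_sleep_millisecs")) with the defaults quoted above (4096
pages, 10000 ms) gives 409.6 pages/s, which is roughly 1.6 MiB/s with 4 KiB
base pages.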
Instead of adding priorities for eligible ranges of memory to khugepaged,
temporarily speeding khugepaged up for the whole system, or sharding its
work for memory belonging to a certain process, one approach would be to
allow userspace to induce hugepage collapse.
The benefit to this approach would be that this is done in process context
so its CPU time is charged to the process that is inducing the collapse.
Khugepaged is not involved.
Yes, this makes a lot of sense to me.
The idea was to allow userspace to induce hugepage collapse through the new
process_madvise() call. This allows us to collapse hugepages on behalf of
current or another process for a vectored set of ranges.
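
A rough sketch of what the userspace side could look like, assuming the
collapse mode ends up as a plain advice value for process_madvise(). The advice
value is deliberately left as a parameter since no number has been assigned in
this thread. fill_ranges() and advise_ranges() are hypothetical helper names,
and the raw syscall is used because a libc wrapper may not be available.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Fill vec[] with nr consecutive ranges of range_len bytes starting at base.
 * Returns the total number of bytes covered.
 */
size_t fill_ranges(struct iovec *vec, size_t nr, void *base, size_t range_len)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		vec[i].iov_base = (char *)base + i * range_len;
		vec[i].iov_len = range_len;
	}
	return nr * range_len;
}

/*
 * Apply `advice` to all ranges in the process referred to by pidfd.
 * `advice` stands in for the proposed collapse mode; no value is assumed.
 * Returns the number of bytes advised, or -1 with errno set.
 */
ssize_t advise_ranges(int pidfd, const struct iovec *vec, size_t nr, int advice)
{
#ifdef SYS_process_madvise
	/* The flags argument is passed as 0 in this sketch. */
	return syscall(SYS_process_madvise, pidfd, vec, nr, advice, 0U);
#else
	(void)pidfd; (void)vec; (void)nr; (void)advice;
	errno = ENOSYS;
	return -1;
#endif
}
```

The pidfd would come from pidfd_open(2) for the target process. Since the work
happens in the context of the caller, the CPU time for the collapse would be
charged to whoever issues the call.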
Yes, madvise sounds like a good fit for the purpose.
Agreed on both points.
This could be done through a new process_madvise() mode *or* it could be a
flag to MADV_HUGEPAGE since process_madvise() allows for a flag parameter
to be passed. For example, MADV_F_SYNC.
Would this MADV_F_SYNC be applicable to other madvise modes? For most
existing madvise modes it does not seem to make much sense. We can argue that
MADV_PAGEOUT would guarantee the range was indeed reclaimed but I am not
sure we want to provide such a strong semantic because it can limit
future reclaim optimizations.
To me MADV_HUGEPAGE_COLLAPSE sounds like the easiest way forward.
I guess in the old madvise(2) we could create a new combo of MADV_HUGEPAGE |
MADV_WILLNEED with this semantic? But you are probably more interested in
process_madvise() anyway. There the new flag would make more sense. But there's
also David H.'s proposal for MADV_POPULATE and there might be benefit in
considering both at the same time? Should e.g. MADV_POPULATE with MADV_HUGEPAGE
have the collapse semantics? But would MADV_POPULATE be added to
process_madvise() as well? Just thinking out loud so we don't end up with more
flags than necessary; it's already confusing enough as it is.
Note that madvise() eats only a single value, not flags. Combinations as
you describe are not possible.
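
To illustrate with the existing constants: the advice argument is matched
against discrete values, so OR-ing two of them simply yields another number.
The sketch below assumes the generic Linux values from <sys/mman.h> (some
architectures have used different numbers); advice_name() is a hypothetical
helper that only knows the three values relevant here.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>

/* Map an advice value to its name; only the values used below are covered. */
const char *advice_name(int advice)
{
	switch (advice) {
	case MADV_WILLNEED:
		return "MADV_WILLNEED";
	case MADV_HUGEPAGE:
		return "MADV_HUGEPAGE";
	case MADV_NOHUGEPAGE:
		return "MADV_NOHUGEPAGE";
	default:
		return "other";
	}
}

/* What the kernel would see if a caller tried to "combine" two advices. */
int combined_advice(int a, int b)
{
	return a | b;
}
```

With the generic values (MADV_WILLNEED = 3, MADV_HUGEPAGE = 14,
MADV_NOHUGEPAGE = 15), advice_name(combined_advice(MADV_HUGEPAGE,
MADV_WILLNEED)) comes out as MADV_NOHUGEPAGE, i.e. the opposite of what the
caller intended.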
Something like MADV_HUGEPAGE_COLLAPSE makes sense to me: it does not need the
mmap lock in write mode and does not modify the actual VMA, only a mapping.
--
Thanks,
David / dhildenb