Hi, Zach
Thanks for your great patchset!
Recently, We also try to collapse THP in this way, likes performance
degradation due to using too much hugepages in our scenes.
And there is a doubt about process_madvise(MADV_COLLAPSE) when we test
this patchset:. It seems that process_madvise(MADV_COLLAPSE) rely on
madvise(MADV_HUGEPAGE)? If the vma wasn't marked with 'hg',
process_madvise(MADV_COLLAPSE) will fail to collapse. And if I miss
something, please let me know.
If so, how about introducing process_madvise(MADV_HUGEPAGE) or
process_madvise(MADV_NOHUGEPAGE)? The former helps to mark the target
vma with 'hg', and the collapse process can be finished completely with
the help of other processes. the latter could let some special vma avoid
collapsing when setting 'THP=always'.
Best regards,
-wrw
On 5/5/22 5:44 AM, Zach O'Keefe wrote:
Allow MADV_COLLAPSE behavior for process_madvise(2) if caller has
CAP_SYS_ADMIN or is requesting collapse of it's own memory.
Signed-off-by: Zach O'Keefe <zokeefe@xxxxxxxxxx>
---
mm/madvise.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 638517952bd2..08c11217025a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1168,13 +1168,15 @@ madvise_behavior_valid(int behavior)
}
static bool
-process_madvise_behavior_valid(int behavior)
+process_madvise_behavior_valid(int behavior, struct task_struct *task)
{
switch (behavior) {
case MADV_COLD:
case MADV_PAGEOUT:
case MADV_WILLNEED:
return true;
+ case MADV_COLLAPSE:
+ return task == current || capable(CAP_SYS_ADMIN);
default:
return false;
}
@@ -1452,7 +1454,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
goto free_iov;
}
- if (!process_madvise_behavior_valid(behavior)) {
+ if (!process_madvise_behavior_valid(behavior, task)) {
ret = -EINVAL;
goto release_task;
}