Hey Rongwei, Thanks for taking the time to review! On Tue, May 10, 2022 at 5:49 PM Rongwei Wang <rongwei.wang@xxxxxxxxxxxxxxxxx> wrote: > > Hi, Zach > > Thanks for your great patchset! > Recently, We also try to collapse THP in this way, likes performance > degradation due to using too much hugepages in our scenes. > > And there is a doubt about process_madvise(MADV_COLLAPSE) when we test > this patchset:. It seems that process_madvise(MADV_COLLAPSE) rely on > madvise(MADV_HUGEPAGE)? If the vma wasn't marked with 'hg', > process_madvise(MADV_COLLAPSE) will fail to collapse. And if I miss > something, please let me know. > I tried to have MADV_COLLAPSE follow the same THP eligibility semantics as khugepaged and at-fault: either THP=always, or THP=madvise and the vma is marked with MADV_HUGEPAGE, as you point out. If I understand you correctly, the usefulness of process_madvise(MADV_COLLAPSE) is limited in the case where THP=madvise and a CAP_SYS_ADMIN user is requesting a collapse of behalf of another process since they don't have a way to mark the target memory as eligible (which requires VM_HUGEPAGE). If so, I think that's a valid point, and your suggestion below of a supporting MADV_[NO]HUGEPAGE for process_madvise(2) makes sense. For the sake of exploring all options, I'll mention that there was also a previous idea suggested by Yang Shi where MADV_COLLAPSE could also set VM_HUGEPAGE[1]. Since it's possible supporting MADV_[NO]HUGEPAGE for process_madivse(2) has applications outside a subsequent MADV_COLLAPSE, and since I don't see process_madvise(MADV_COLLAPSE) to be in a hot path, I'd vote in favor of your suggestion and include process_madvise(MADV_[NO]HUGEPAGE) support in v6 unless others object. Thanks again for your review and your suggestion! Zach [1] https://lore.kernel.org/linux-mm/CAHbLzkqLRBd6u3qn=KqpOhRcPZtpGXbTXLUjK1z=4d_dQ06Pvw@xxxxxxxxxxxxxx/ > If so, how about introducing process_madvise(MADV_HUGEPAGE) or > process_madvise(MADV_NOHUGEPAGE)? The former helps to mark the target > vma with 'hg', and the collapse process can be finished completely with > the help of other processes. the latter could let some special vma avoid > collapsing when setting 'THP=always'. > > Best regards, > -wrw > > On 5/5/22 5:44 AM, Zach O'Keefe wrote: > > Allow MADV_COLLAPSE behavior for process_madvise(2) if caller has > > CAP_SYS_ADMIN or is requesting collapse of it's own memory. > > > > Signed-off-by: Zach O'Keefe <zokeefe@xxxxxxxxxx> > > --- > > mm/madvise.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > index 638517952bd2..08c11217025a 100644 > > --- a/mm/madvise.c > > +++ b/mm/madvise.c > > @@ -1168,13 +1168,15 @@ madvise_behavior_valid(int behavior) > > } > > > > static bool > > -process_madvise_behavior_valid(int behavior) > > +process_madvise_behavior_valid(int behavior, struct task_struct *task) > > { > > switch (behavior) { > > case MADV_COLD: > > case MADV_PAGEOUT: > > case MADV_WILLNEED: > > return true; > > + case MADV_COLLAPSE: > > + return task == current || capable(CAP_SYS_ADMIN); > > default: > > return false; > > } > > @@ -1452,7 +1454,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, > > goto free_iov; > > } > > > > - if (!process_madvise_behavior_valid(behavior)) { > > + if (!process_madvise_behavior_valid(behavior, task)) { > > ret = -EINVAL; > > goto release_task; > > }