On 2021/3/2 4:56 AM, David Rientjes wrote:
> On Wed, 24 Feb 2021, Alex Shi wrote:
>
>>> Agreed, and happy to see that there's a general consensus for the
>>> direction. Benefit of a new madvise mode is that it can be used for
>>> madvise() as well if you are interested in only a single range of your own
>>> memory and then it doesn't need to reconcile with any of the already
>>> overloaded semantics of MADV_HUGEPAGE.
>>
>> It's a good idea to let a process deal with its own THP policy, but
>> current applications will miss the benefit without changes, and changes
>> are expensive for end users. So besides this work, could a per-memcg
>> collapse benefit apps at no cost to them? We often deploy apps in
>> cgroups on servers now.
>>
>
> Hi Alex,
>
> I'm not sure that I understand: this MADV_COLLAPSE would be possible for
> process_madvise() as well and by passing a vectored set of ranges so a
> process can do this on behalf of other processes (it's the only way that
> we could theoretically move khugepaged to userspace, although that's not
> an explicit end goal).
>

Forgive my ignorance, but I still can't figure out how a process_madvise()
caller would fill in the iovec for other processes on a common system.

>
> How would you see this working with memcg involved? I had thought this
> was entirely orthogonal to any cgroup.
>

You're right, it is independent of cgroups, and that is better. A per-cgroup
khugepaged could be an alternative, but it requires a cgroup and does not
target a specific process.

Thanks
Alex