On Tue 02-08-22 02:48:33, Zach O'Keefe wrote:
[...]
> "mm/madvise: add MADV_COLLAPSE to process_madvise()" in the v7 series
> ended with me mentioning a couple options, but ultimately I didn't
> present a solution, and no consensus was reached[1]. After taking a
> closer look, this is my proposal for what I believe to be the best
> path forward. It should be squashed into the original patch. What do you think?

If it is agreed that CAP_SYS_ADMIN is too strict a requirement then
yes, this should be squashed into the original patch. There is no real
reason to create a potential bisection headache by changing the
permission model in a later patch.

From my POV, I would agree that CAP_SYS_ADMIN is just too strict a
requirement. I didn't really have time to follow recent discussions but
I would argue that the operation is not really destructive or seriously
harmful. All applications can already have their memory (almost)
equally THP collapsed by khugepaged with the proposed process_madvise
semantic. MADV_NOHUGEPAGE and the prctl opt-out from THP are both
honored AFAIU, and the only difference is the global THP killswitch
behavior, which I do not think warrants the strongest capability,
CAP_SYS_ADMIN (especially because the killswitch doesn't really control
all kinds of THPs).

If there is a userspace agent collapsing memory and causing problems
then that can be easily fixed in userspace. And I find that easier to
do than putting the bar so high that userspace agents would be
infeasible because of CAP_SYS_ADMIN (which is a no-go in many cases as
it would allow essentially full control of other stuff).

So from a practical POV, the risk of an extended RSS is really
negligible compared to the risk of losing a potentially useful feature
for everybody else.

Just my 2c

> Thanks again,
> Zach
>
> [1] https://lore.kernel.org/linux-mm/Ys4aTRqWIbjNs1mI@xxxxxxxxxx/

-- 
Michal Hocko
SUSE Labs