Hi, Zach,

On Thu, Apr 14, 2022 at 11:06:00AM -0700, Zach O'Keefe wrote:
> process_madvise(2)
>
> Performs a synchronous collapse of the native pages
> mapped by the list of iovecs into transparent hugepages.
>
> Allocation semantics are the same as khugepaged, and depend on
> (1) the active sysfs settings
> /sys/kernel/mm/transparent_hugepage/enabled and
> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag, and (2)
> the VMA flags of the memory range being collapsed.
>
> Collapse eligibility criteria differs from khugepaged in that
> the sysfs files
> /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_[none|swap|shared]
> are ignored.

The userspace khugepaged idea definitely makes sense to me, though I'm
curious how the line is drawn between the different behaviors here by
explicitly ignoring the max_ptes_* entries.

Let's assume the initiative is to duplicate a more data-aware khugepaged
in userspace; then IMHO it makes more sense to start with all the
policies that already apply to khugepaged, including max_ptes_*.

I can understand the willingness to provide even stronger semantics here
than khugepaged, since userspace could have very clear knowledge of how
to provision memory (better than a kernel scanner). It's just that IMHO
it could be slightly confusing if the new interface only partially
applies the khugepaged rules.

No strong opinion here. It may already have been a trade-off reached
after the RFC discussion with Michal, which I read. I'm just curious
about how you made that design decision, so feel free to read it as a
pure question.

Thanks,

-- 
Peter Xu