Hey Peter,

Thanks for taking the time to review!

On Thu, Apr 14, 2022 at 5:04 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> Hi, Zach,
>
> On Thu, Apr 14, 2022 at 11:06:00AM -0700, Zach O'Keefe wrote:
> > process_madvise(2)
> >
> > Performs a synchronous collapse of the native pages
> > mapped by the list of iovecs into transparent hugepages.
> >
> > Allocation semantics are the same as khugepaged, and depend on
> > (1) the active sysfs settings
> > /sys/kernel/mm/transparent_hugepage/enabled and
> > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag, and (2)
> > the VMA flags of the memory range being collapsed.
> >
> > Collapse eligibility criteria differ from khugepaged in that
> > the sysfs files
> > /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_[none|swap|shared]
> > are ignored.
>
> The userspace khugepaged idea definitely makes sense to me, though I'm
> curious how the line is drawn on the different behaviors here by explicitly
> ignoring the max_ptes_* entries.
>
> Let's assume the initiative is to duplicate a more data-aware khugepaged in
> the userspace, then IMHO it makes more sense to start with all the policies
> that apply to khugepaged already, including max_ptes_*.
>
> I can understand the willingness to provide even stronger semantics here
> than khugepaged since the userspace could have very clear knowledge of how
> to provision the memories (better than a kernel scanner). It's just that
> IMHO it could be slightly confusing if the new interface only partially
> applies the khugepaged rules.
>
> No strong opinion here. It may already have been a trade-off after the
> discussion from the RFC with Michal, which I read. Just curious about how
> you made that design decision, so feel free to read it as a pure question.
>

I understand your point here. The allocation and max_ptes_* semantics are
split between khugepaged-like and fault-like, respectively, which could be
confusing.
Originally, I proposed a MADV_F_COLLAPSE_LIMITS flag to control whether the
max_ptes_* limits are applied, but agreed to keep things simple to start and
expand the interface if/when necessary. I opted to ignore max_ptes_* as the
default since I envisioned that early adopters would "just want it to work".
One such example would be backing executable text with hugepages on program
load, when many pages haven't been demand-paged in yet.

What do you think?

Thanks,
Zach

> Thanks,
>
> --
> Peter Xu
>
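
For anyone following along, here is a minimal sketch of how a userspace
agent might inspect the sysfs knobs named in the quoted description. It
only reads files; it does not call process_madvise(2). The helper names
thp_active_choice() and read_knob() are hypothetical and not part of the
series, and the sketch assumes the usual "always [madvise] never" layout of
the enabled file.

```c
#include <stdio.h>
#include <string.h>

/* Paths as quoted in the patch description. */
#define THP_ENABLED    "/sys/kernel/mm/transparent_hugepage/enabled"
#define KHUGEPAGED_DIR "/sys/kernel/mm/transparent_hugepage/khugepaged/"

/*
 * Hypothetical helper: copy the bracketed (active) choice out of a line
 * such as "always [madvise] never". Returns 0 on success, -1 if no
 * well-formed "[...]" is found or the result does not fit in out.
 */
int thp_active_choice(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!open || !close)
		return -1;
	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/*
 * Hypothetical helper: read one integer knob, e.g.
 * KHUGEPAGED_DIR "max_ptes_none". Returns -1 if the file is missing or
 * cannot be parsed (for example on a kernel built without THP).
 */
long read_knob(const char *path)
{
	FILE *f = fopen(path, "r");
	long val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

A caller could fgets() one line from THP_ENABLED, pass it to
thp_active_choice(), and call read_knob(KHUGEPAGED_DIR "max_ptes_none") to
see the limit that khugepaged applies and that the proposed collapse
ignores.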