>> E.g., with a very sparse memory layout, we don't want to waste
>> memory by allocating memory where we actually have no page populated yet
>> -- could be user space won't reuse that memory in the foreseeable
>> future. With too many swap entries, we don't want to trigger an
>> eventually unnecessary overhead of swapping in entries if user space
>> won't access them in the foreseeable future. Something similar applies
>> to max_ptes_shared, where one might just end up wasting a lot of memory
>> eventually in some applications.
>>
>> So IMHO, with MADV_COLLAPSE we should ignore/disable any heuristics that
>> try figuring out what user space might be doing. We know exactly what
>> user space asks for -- and that can be documented properly.
>>

Just a thought: if we ever want to implement khugepaged in user space, it
could theoretically obtain similar information using, e.g., the pagemap.
It wouldn't be race-free, but the question is whether that would matter. I
consider the primary use case to be giving an application more precise
control over actual THP placement.

>
> Sounds good to me. Would you also be in favor of decoupling allocation
> semantics from khugepaged? I.e. we'll pick some default gfp flags and
> not depend on /sys/kernel/mm/transparent_hugepage/khugepaged/defrag?

Good question. It's not really a heuristic like that other stuff. Easy
answer: we're not dealing with khugepaged, so anything in
/sys/kernel/mm/transparent_hugepage/khugepaged/ shouldn't apply?

Sure, we could have separate toggles for MADV_COLLAPSE. Maybe we simply
want a dedicated syscall where we can specify additional options ... but
maybe that simply over-complicates the problem.

-- 
Thanks,

David / dhildenb
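
To make the pagemap remark above concrete, here is a rough sketch of how a
user-space tool might count unpopulated, swapped, and shared pages in a
candidate range before asking for a collapse. It only assumes the documented
/proc/<pid>/pagemap entry layout (bit 63 present, bit 62 swapped, bit 56
exclusively mapped). The helper names summarize() and read_pagemap() are
made up for illustration, the counts only loosely mirror the max_ptes_*
tunables, and the result is racy by nature.

```python
import mmap
import struct

PAGE_SIZE = mmap.PAGESIZE
ENTRY_SIZE = 8  # pagemap holds one 64-bit entry per virtual page

PM_PRESENT = 1 << 63    # page is present in RAM
PM_SWAPPED = 1 << 62    # page is swapped out
PM_EXCLUSIVE = 1 << 56  # page is exclusively mapped (Linux 4.2+)


def summarize(entries):
    """Count none/swap/shared/present pages from raw pagemap entries.

    "shared" here means present but not exclusively mapped, which is only
    an approximation of what khugepaged counts for max_ptes_shared.
    """
    counts = {"none": 0, "swap": 0, "shared": 0, "present": 0}
    for entry in entries:
        if entry & PM_PRESENT:
            counts["present"] += 1
            if not entry & PM_EXCLUSIVE:
                counts["shared"] += 1
        elif entry & PM_SWAPPED:
            counts["swap"] += 1
        else:
            counts["none"] += 1
    return counts


def read_pagemap(pid, start, length):
    """Read raw entries for [start, start + length) of a process.

    Assumes start is page-aligned. Needs permission to read the target's
    pagemap; the snapshot may be stale by the time it is used.
    """
    first_page = start // PAGE_SIZE
    npages = (length + PAGE_SIZE - 1) // PAGE_SIZE
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(first_page * ENTRY_SIZE)
        data = f.read(npages * ENTRY_SIZE)
    # Entries are in native byte order.
    return list(struct.unpack(f"={len(data) // ENTRY_SIZE}Q", data))
```

A caller could feed summarize(read_pagemap("self", addr, 2 << 20)) into
whatever policy it likes and then decide whether to issue MADV_COLLAPSE on
that range; the 2 MiB size is an assumption that holds for x86-64 with 4 KiB
base pages.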