On Sat, Jul 08, 2023 at 12:52:18AM +0800, Yin Fengwei wrote:
> This series identified the large folio for mlock to two types:
>   - The large folio is in VM_LOCKED VMA range
>   - The large folio cross VM_LOCKED VMA boundary

This is somewhere that I think our fixation on MUST USE PMD ENTRIES
has led us astray.  Today when the arguments to mlock() cross a folio
boundary, we split the PMD entry but leave the folio intact.  That means
that we continue to manage the folio as a single entry on the LRU list.
But userspace may have no idea that we're doing this.  It may have made
several calls to mmap(), 256kB at a time; they've all been coalesced
into a single VMA and khugepaged has come along behind its back and
created a 2MB THP.  Now userspace calls mlock() and instead of treating
that as a hint that oops, maybe we shouldn't've done that, we do our
utmost to preserve the 2MB folio.

I think this whole approach needs rethinking.  IMO, anonymous folios
should not cross VMA boundaries.  Tell me why I'm wrong.
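
For concreteness, here is a rough userspace sketch of the scenario described above: eight adjacent 256kB anonymous mappings that together fill one 2MB-aligned window, followed by an mlock() of only the first chunk.  The names map_chunks_and_lock_one() and range_cuts_thp() are made up for illustration, and the 2MB THP size assumes x86-64 with 4kB base pages.  Nothing here forces the kernel to merge the mappings into one VMA or khugepaged to collapse them; it only sets up the conditions under which both can happen.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define CHUNK_SIZE (256UL * 1024)        /* size of each mmap() call */
#define THP_SIZE   (2UL * 1024 * 1024)   /* assumed PMD-sized THP (x86-64, 4kB pages) */
#define NR_CHUNKS  (THP_SIZE / CHUNK_SIZE)

/*
 * Hypothetical helper: returns 1 if [start, start + len) begins or ends
 * in the middle of a THP_SIZE-aligned block, i.e. an mlock() of that
 * range would cover only part of a 2MB folio placed there.
 */
static int range_cuts_thp(uintptr_t start, size_t len)
{
	uintptr_t end = start + len;

	return (start % THP_SIZE) != 0 || (end % THP_SIZE) != 0;
}

/*
 * Hypothetical scenario: map NR_CHUNKS adjacent 256kB regions inside a
 * 2MB-aligned window, then mlock() only the first one.  Returns the
 * window base, or NULL if an mmap() call failed.  *mlock_ret receives
 * the mlock() return value, which may be -1 under RLIMIT_MEMLOCK.
 * The mappings are deliberately left in place for the caller.
 */
static void *map_chunks_and_lock_one(int *mlock_ret)
{
	size_t reserve = 2 * THP_SIZE;   /* large enough to contain an aligned window */
	char *raw, *base;
	size_t i;

	assert(mlock_ret != NULL);

	raw = mmap(NULL, reserve, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	/* Round up to the next 2MB boundary inside the reservation. */
	base = (char *)(((uintptr_t)raw + THP_SIZE - 1) & ~(THP_SIZE - 1));

	/* Separate calls with identical flags: candidates for VMA merging. */
	for (i = 0; i < NR_CHUNKS; i++) {
		void *p = mmap(base + i * CHUNK_SIZE, CHUNK_SIZE,
			       PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
		if (p == MAP_FAILED) {
			munmap(raw, reserve);
			return NULL;
		}
	}

	/* Lock 256kB out of the 2MB window: the range ends mid-folio. */
	*mlock_ret = mlock(base, CHUNK_SIZE);
	return base;
}
```

A caller would invoke map_chunks_and_lock_one(&ret) and check both the returned pointer and ret.  From userspace's point of view it asked for eight small mappings and locked one of them; any 2MB folio spanning the locked and unlocked parts is purely the kernel's doing.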