On Wed, 10 Feb 2021, Michal Hocko wrote:
> On Wed 10-02-21 17:57:29, Michal Hocko wrote:
> > On Wed 10-02-21 16:18:50, Vlastimil Babka wrote:
> [...]
> > > And the munlock (munlock_vma_pages_range()) is slow, because it uses
> > > follow_page_mask() in a loop incrementing addresses by PAGE_SIZE, so that's
> > > always traversing all levels of page tables from scratch. Funnily enough,
> > > speeding this up was my first linux-mm series years ago. But the speedup only
> > > works if pte's are present, which is not the case for unpopulated PROT_NONE
> > > areas. That use case was unexpected back then. We should probably convert this
> > > code to a proper page table walk. If there are large areas with unpopulated pmd
> > > entries (or even higher levels) we would traverse them very quickly.
> >
> > Yes, this is a good idea. I suspect it will be little bit tricky without
> > duplicating a large part of gup page table walker.
>
> Thinking about it some more, unmap_page_range would be a better model
> for this operation.

Could do, I suppose; but I thought it was just a matter of going back to
using follow_page_mask() in munlock_vma_pages_range() (whose fear of THP
split looks overwrought, since an extra reference now prevents splitting);
and enhancing follow_page_mask() to let the no_page_table() FOLL_DUMP case
set ctx->page_mask appropriately (or perhaps it can be preset at a higher
level, without having to pass ctx so far down, dunno).

Nice little job, but I couldn't quite spare the time to do it: needs a bit
more care than I could afford (I suspect the page_increm business at the
end of munlock_vma_pages_range() is good enough while THP tails are skipped
one by one, but will need to be fixed to apply page_mask correctly to the
start - __get_user_pages()'s page_increm-entation looks superior).

Hugh
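
P.S. To make the page_increm point concrete, here is a minimal user-space
sketch (illustrative only, not the kernel code: PAGE_SHIFT, the 2MB block
size and the variable names are made up for the example). It contrasts a
munlock_vma_pages_range()-style step of 1 + page_mask, which only lands on
the right boundary when start is already aligned to the block, with an
increment of the shape __get_user_pages() uses, which steps from any offset
inside a naturally aligned block of (page_mask + 1) pages to the end of
that block.

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

int main(void)
{
	/* Pretend a naturally aligned 2MB block (512 pages) covers start */
	unsigned long page_mask = 511;			/* 2MB/4kB - 1 */
	unsigned long start = 0x200000 + 5 * PAGE_SIZE;	/* 5 pages into it */

	/* munlock-style step: only correct if start is block-aligned */
	unsigned long blind = 1 + page_mask;

	/* gup-style step: advance exactly to the end of this block */
	unsigned long exact = 1 + (~(start >> PAGE_SHIFT) & page_mask);

	printf("blind step: %lu pages -> 0x%lx (overshoots the boundary)\n",
	       blind, start + blind * PAGE_SIZE);
	printf("exact step: %lu pages -> 0x%lx (next block boundary)\n",
	       exact, start + exact * PAGE_SIZE);
	return 0;
}

With the numbers above, the blind step advances 512 pages to 0x405000,
past the 2MB boundary, while the exact step advances 507 pages and stops
at 0x400000.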