On page fault, filemap_map_pages() already retrieves a folio from the
page cache and iterates over it, which realised some savings.  This
patch series drives that further by allowing filemap to tell the MM to
map a contiguous range of pages in the folio.  This improves
performance by batching the updates to the folio's refcount and the
rmap counters.

Testing with the will-it-scale.pagefault micro-benchmark on a 48C/96T
Ice Lake box showed:
 - batched rmap brings around 15% performance gain
 - batched refcount brings around 2% performance gain

v4:
 - Add the set_ptes() architecture interface (see the sketch at the
   end of this mail)
 - Change various interfaces to take (folio, struct page *, nr)
   instead of (folio, unsigned long start, nr)
 - Remove do_set_pte() instead of keeping a compat interface
 - Add a check in set_pte_range() to ensure that large anon folios are
   not being passed in yet (David Hildenbrand)
 - Save / restore the old vmf->pte pointer instead of passing a
   different pte pointer to set_pte_range()

Matthew Wilcox (Oracle) (1):
  mm: Add generic set_ptes()

Yin Fengwei (4):
  filemap: Add filemap_map_folio_range()
  rmap: add folio_add_file_rmap_range()
  mm: Convert do_set_pte() to set_pte_range()
  filemap: Batch PTE mappings

 Documentation/filesystems/locking.rst |   2 +-
 include/linux/mm.h                    |   3 +-
 include/linux/pgtable.h               |  27 +++++++
 include/linux/rmap.h                  |   2 +
 mm/filemap.c                          | 111 ++++++++++++++++----------
 mm/memory.c                           |  31 ++++---
 mm/rmap.c                             |  60 ++++++++++----
 7 files changed, 164 insertions(+), 72 deletions(-)

-- 
2.35.1
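
As a quick illustration of the batching idea, here is a hedged sketch
of what a generic set_ptes() fallback could look like; it is not
necessarily the exact code in this series.  It assumes the
architecture encodes the PFN in the bits of pte_val() above
PAGE_SHIFT, so adding PAGE_SIZE to the value advances the entry by one
page; architectures with a different PTE layout would provide their
own set_ptes().

/*
 * Illustrative sketch only: map @nr (>= 1) consecutive pages of a
 * folio at consecutive virtual addresses with one call, so callers
 * such as set_pte_range() can batch the per-page work around it.
 */
#ifndef set_ptes
static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
		pte_t *ptep, pte_t pte, unsigned int nr)
{
	for (;;) {
		set_pte_at(mm, addr, ptep, pte);
		if (--nr == 0)
			break;
		ptep++;			/* next PTE slot */
		addr += PAGE_SIZE;	/* next virtual address */
		/* next PFN, assuming the PFN sits above PAGE_SHIFT */
		pte = __pte(pte_val(pte) + PAGE_SIZE);
	}
}
#endif

The payoff is in the callers: once the whole range is mapped in one
go, the folio's refcount can be taken once for the range (e.g. with
folio_ref_add(folio, nr)) and the file rmap accounted once per range
rather than once per page, which is where the 15% / 2% gains quoted
above come from.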