Currently, filemap_map_pages() maps at page granularity even when the
underlying folio is a large folio. Mapping at folio granularity instead
allows the refcount, rmap and mm counter updates to be batched, which
brings a performance gain. This series brings batched refcount, rmap and
mm counter updates to filemap_map_pages().

Testing with a micro benchmark like will-it-scale.pagefault on a
48C/96T IceLake test box showed:
  - batched rmap brings around 15% performance gain
  - batched refcount brings around 2% performance gain

Patch 1 enables fault around for shared file page write faults, as
David suggested here: [1]
Patch 2 updates filemap_map_pages() to map at folio granularity with
batched refcount updates.
Patches 3, 4 and 5 enable batched rmap and mm counter updates.

[1] https://lore.kernel.org/linux-mm/e14b4e9a-612d-fc02-edc0-8f3b6bcf4148@xxxxxxxxxx/

Changes from v1:
  - Update the struct page * parameter of the *_range() functions to an
    index in the folio as Matthew suggested
  - Fix indentation problems as Matthew pointed out
  - Add back the function comment as Matthew pointed out
  - Use nr_pages instead of len as Matthew pointed out
  - Add do_set_pte_range() as Matthew suggested
  - Add a function comment as Ying suggested
  - Add performance test results to the commit messages of patches 1/2/5

  Patch 1:
  - Adapt the commit message as Matthew suggested
  - Add Reviewed-by from Matthew

  Patch 3:
  - Restore the general logic of page_add_file_rmap_range() to make
    patch review easier as Matthew suggested

  Patch 5:
  - Add perf data collected to understand the reason for the performance
    gain

Yin Fengwei (5):
  mm: Enable fault around for shared file page fault
  filemap: add function filemap_map_folio_range()
  rmap: add page_add_file_rmap_range()
  mm: add do_set_pte_range()
  filemap: batched update mm counter,rmap when map file folio

 include/linux/mm.h   |   2 +
 include/linux/rmap.h |   2 +
 mm/filemap.c         | 119 ++++++++++++++++++++++++++++++-------------
 mm/memory.c          |  71 +++++++++++++++++++++-----
 mm/rmap.c            |  66 +++++++++++++++++++-----
 5 files changed, 199 insertions(+), 61 deletions(-)

-- 
2.30.2
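
P.S. For reviewers less familiar with the batching idea, below is a
minimal userspace sketch of the difference between per-page and
per-folio-range updates. It is only an illustration under simplified
assumptions, not the kernel code in this series: struct folio_model and
both helpers are hypothetical stand-ins, and the comments merely name
the kernel functions they loosely correspond to.

#include <stdio.h>

/* Hypothetical stand-in for a (large) folio's counters. */
struct folio_model {
	int refcount;	/* models folio->_refcount */
	int mapcount;	/* models the file rmap mapcount */
	int nr_pages;	/* pages backing the folio */
};

/* Page granularity: one update of each counter per mapped page. */
static void map_per_page(struct folio_model *f, long *mm_counter)
{
	for (int i = 0; i < f->nr_pages; i++) {
		f->refcount++;		/* like page_ref_inc() per page */
		f->mapcount++;		/* like page_add_file_rmap() per page */
		(*mm_counter)++;	/* like inc_mm_counter() per page */
	}
}

/* Folio granularity: one batched update covering the whole range. */
static void map_folio_range(struct folio_model *f, long *mm_counter,
			    int nr_pages)
{
	f->refcount += nr_pages;	/* like folio_ref_add(), once */
	f->mapcount += nr_pages;	/* like page_add_file_rmap_range(), once */
	*mm_counter += nr_pages;	/* like add_mm_counter(), once */
}

int main(void)
{
	struct folio_model a = { .nr_pages = 16 }, b = { .nr_pages = 16 };
	long rss_a = 0, rss_b = 0;

	map_per_page(&a, &rss_a);			/* 16 updates each */
	map_folio_range(&b, &rss_b, b.nr_pages);	/* 1 update each */

	printf("per-page:  ref=%d map=%d rss=%ld\n",
	       a.refcount, a.mapcount, rss_a);
	printf("per-folio: ref=%d map=%d rss=%ld\n",
	       b.refcount, b.mapcount, rss_b);
	return 0;
}

The end state is identical either way; the gain comes from touching each
counter (an atomic, the rmap accounting, the mm counter) once per mapped
folio range instead of once per page.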