On 3/7/2023 5:12 AM, Andrew Morton wrote:
> On Mon, 6 Mar 2023 17:22:54 +0800 Yin Fengwei <fengwei.yin@xxxxxxxxx> wrote:
>
>> This series is trying to bring batched rmap removing to
>> try_to_unmap_one(). It's expected that batched rmap removing
>> brings a performance gain over removing the rmap per page.
>>
>> ...
>>
>>  include/linux/rmap.h |   5 +
>>  mm/page_vma_mapped.c |  30 +++
>>  mm/rmap.c            | 623 +++++++++++++++++++++++++------------------
>>  3 files changed, 398 insertions(+), 260 deletions(-)
>
> As was discussed in v2's review, if no performance benefit has been
> demonstrated, why make this change?
>
I changed MADV_PAGEOUT not to split the large folio for page cache and
created a micro benchmark that mainly does the following:

        char *c = mmap(NULL, FILESIZE, PROT_READ|PROT_WRITE,
                       MAP_PRIVATE, fd, 0);

        count = 0;
        while (1) {
                unsigned long i;

                for (i = 0; i < FILESIZE; i += pgsize) {
                        cc = *(volatile char *)(c + i);
                }
                madvise(c, FILESIZE, MADV_PAGEOUT);
                count++;
        }
        munmap(c, FILESIZE);

I ran it with 96 instances + 96 files for 1 second. The test platform
was an Ice Lake machine with 48C/96T + 192G memory. The test result
(iteration count) shows a 10% improvement with this patch series.

perf shows the following:

Before the patch:
   --19.97%--try_to_unmap_one
             |
             |--12.35%--page_remove_rmap
             |          |
             |           --11.39%--__mod_lruvec_page_state
             |                     |
             |                     |--1.51%--__mod_memcg_lruvec_state
             |                     |          |
             |                     |           --0.91%--cgroup_rstat_updated
             |                     |
             |                      --0.70%--__mod_lruvec_state
             |                                |
             |                                 --0.63%--__mod_node_page_state
             |
             |--5.41%--ptep_clear_flush
             |          |
             |           --4.65%--flush_tlb_mm_range
             |                     |
             |                      --3.83%--flush_tlb_func
             |                                |
             |                                 --3.51%--native_flush_tlb_one_user
             |
             |--0.75%--percpu_counter_add_batch
             |
              --0.55%--PageHeadHuge

After the patch:
   --9.50%--try_to_unmap_one
            |
            |--6.94%--try_to_unmap_one_page.constprop.0.isra.0
            |          |
            |          |--5.07%--ptep_clear_flush
            |          |          |
            |          |           --4.25%--flush_tlb_mm_range
            |          |                     |
            |          |                      --3.44%--flush_tlb_func
            |          |                                |
            |          |                                 --3.05%--native_flush_tlb_one_user
            |          |
            |           --0.80%--percpu_counter_add_batch
            |
            |--1.22%--folio_remove_rmap_and_update_count.part.0
            |          |
            |           --1.16%--folio_remove_rmap_range
            |                     |
            |                      --0.62%--__mod_lruvec_page_state
            |
             --0.56%--PageHeadHuge

As expected, the cost of __mod_lruvec_page_state is reduced
significantly with the batched folio_remove_rmap_range. I believe the
same benefit applies to the page reclaim path as well.

Thanks.

Regards
Yin, Fengwei
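
For reference, below is a self-contained sketch of the micro benchmark
quoted above. The original snippet only shows the map/touch/MADV_PAGEOUT
loop; the file argument, the 1G FILESIZE, the O_RDWR open, and the
SIGALRM-based 1-second cutoff here are assumptions added to make it
compile and run, not details from the original report.

        /* pageout-bench.c: touch every page of a private file mapping,
         * then ask the kernel to page the whole range out again, and
         * count how many full passes complete before the alarm fires. */
        #include <fcntl.h>
        #include <signal.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #ifndef MADV_PAGEOUT
        #define MADV_PAGEOUT 21         /* from the Linux uapi headers */
        #endif

        #define FILESIZE (1UL << 30)    /* assumed: 1G per file */

        static volatile sig_atomic_t stop;

        static void alarm_handler(int sig)
        {
                stop = 1;
        }

        int main(int argc, char **argv)
        {
                long pgsize = sysconf(_SC_PAGESIZE);
                unsigned long count = 0;
                volatile char cc;
                char *c;
                int fd;

                if (argc < 2) {
                        fprintf(stderr, "usage: %s <file>\n", argv[0]);
                        return 1;
                }

                fd = open(argv[1], O_RDWR);     /* one pre-populated file per instance */
                if (fd < 0) {
                        perror("open");
                        return 1;
                }

                c = mmap(NULL, FILESIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
                if (c == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }

                signal(SIGALRM, alarm_handler);
                alarm(1);                       /* run for 1 second */

                while (!stop) {
                        unsigned long i;

                        /* Fault every page in, then push the range back out. */
                        for (i = 0; i < FILESIZE; i += pgsize)
                                cc = *(volatile char *)(c + i);
                        (void)cc;
                        madvise(c, FILESIZE, MADV_PAGEOUT);
                        count++;
                }

                printf("%lu iterations\n", count);
                munmap(c, FILESIZE);
                close(fd);
                return 0;
        }

One copy of this would be started per file (96 instances + 96 files in
the reported run), and the per-instance iteration counts summed to give
the "number count" compared before and after the series.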