Architectures like powerpc support a page access count mechanism that can
be used to better identify hot and cold pages in the system. POWER10
supports a 32-bit page access count which is incremented on page access
and decremented based on time decay. The count is incremented based on
physical address filtering and hence should count accesses made both via
page tables (mmap) and via the read/write syscalls.

This patch series updates multi-gen LRU to use this page access count
instead of the page table reference bit to classify a page into a
generation. Pages are classified into generations during the sorting
phase of reclaim. Currently the sorting phase uses generation details
stored in page flags; with this change we can avoid using page flags for
storing the generation. That frees the 3 bits in page flags used to store
the generation. Since the page access counting mechanism can also count
accesses via read/write, we can look at avoiding the tier index in page
flags as well. That should free the 2 bits in page flags used for REFS
(this is not done in this patch).

I also added a patch that did the below:

@@ -5243,7 +5243,8 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	if (list_empty(&list))
 		return scanned;
 retry:
-	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
+	reclaimed = shrink_folio_list(&list, pgdat, sc,
+				      &stat, arch_supports_page_access_count());
 	sc->nr_reclaimed += reclaimed;

The performance did improve, but that resulted in a large increase in
workingset_refault_anon. I think this is because it takes some minimum
number of accesses before a page is classified into a younger generation,
and we can see a high number of page refaults during that window.

PATCH 2 did result in some improvement on powerpc because it removes all
the additional code that is not used in page classification.

memcached:
    patch details                  Total Ops/sec
    mglru                          160821
    PATCH 2                        164572

mongodb:
    patch details                  Throughput (Ops/sec)
    mglru                          92987
    PATCH 2                        93740

Enabling the architecture-supported page access count does impact
workload performance, since updating the access count involves some
memory access overhead. Another challenge with the page access count is
determining the relative hotness between pages. I tried two methods,
density-based clustering and k-means clustering, to classify pages into
LRU generations based on sampled hotness. Doing more work during page
classification results in increased lock contention on lru_lock and hence
hurts performance.

memcached:
    patch details                  Total Ops/sec
    arch page access count         161940
    avoid folio_check_reference    171631 (but refault count increases
                                   from 2606765 -> 7793482)

mongodb:
    patch details                  Throughput (Ops/sec)
    arch page access count         92533
    avoid folio_check_reference    91105 (refault: 828951 -> 4592539)

The patch series shows that using the page access count does not result
in any regression and can keep the code simpler with respect to the
different feedback loops used during multi-gen LRU reclaim. This also
saves some bits in page->flags. It was also observed that the overhead of
counting page accesses is not that high and can be mitigated by further
tuning of the page generation classification logic. This also enables us
to start looking at using the page access count in other parts of the
Linux kernel, such as page promotion. I haven't been able to measure the
impact on page promotion yet due to hardware availability.
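
To illustrate the k-means idea mentioned above, here is a minimal
user-space sketch of one-dimensional k-means (Lloyd's algorithm) that
sorts sampled access counts into generations. It is not the code from
this series. NR_GENS, KMEANS_ITERS, nearest_centroid() and
classify_by_hotness() are hypothetical names chosen for the example, and
floating point is used only for brevity; in-kernel code would need
integer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_GENS		4	/* hypothetical: generations to sort into */
#define KMEANS_ITERS	16	/* hypothetical: upper bound on iterations */

/* Return the index of the centroid closest to count (lowest index on ties). */
static int nearest_centroid(const double *centroids, uint32_t count)
{
	int best = 0;
	double best_dist = -1.0;

	for (int k = 0; k < NR_GENS; k++) {
		double d = centroids[k] - (double)count;

		if (d < 0)
			d = -d;
		if (best_dist < 0 || d < best_dist) {
			best = k;
			best_dist = d;
		}
	}
	return best;
}

/*
 * Cluster n sampled access counts into NR_GENS groups.
 * On return gen[i] is the cluster index of counts[i]: 0 is the coldest
 * cluster and NR_GENS - 1 the hottest. Clusters may end up empty.
 */
static void classify_by_hotness(const uint32_t *counts, size_t n, int *gen)
{
	double centroids[NR_GENS];
	uint32_t min, max;

	assert(n > 0);

	min = max = counts[0];
	for (size_t i = 0; i < n; i++) {
		if (counts[i] < min)
			min = counts[i];
		if (counts[i] > max)
			max = counts[i];
		gen[i] = -1;	/* not assigned yet */
	}

	/* Spread the initial centroids evenly over [min, max], in ascending order. */
	for (int k = 0; k < NR_GENS; k++)
		centroids[k] = min + (double)(max - min) * k / (NR_GENS - 1);

	for (int iter = 0; iter < KMEANS_ITERS; iter++) {
		double sum[NR_GENS] = { 0 };
		size_t cnt[NR_GENS] = { 0 };
		int changed = 0;

		/* Assignment step: attach every sample to its nearest centroid. */
		for (size_t i = 0; i < n; i++) {
			int k = nearest_centroid(centroids, counts[i]);

			if (k != gen[i])
				changed = 1;
			gen[i] = k;
			sum[k] += counts[i];
			cnt[k]++;
		}

		/* Update step: move each non-empty centroid to its cluster mean. */
		for (int k = 0; k < NR_GENS; k++)
			if (cnt[k])
				centroids[k] = sum[k] / cnt[k];

		if (!changed)
			break;
	}
}
```

Even this simple form makes a full pass over the samples on every
iteration, which gives a feel for why doing this kind of work under
lru_lock adds contention.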

Aneesh Kumar K.V (7):
  mm: Move some code around so that next patch is simpler
  mm: Don't build multi-gen LRU page table walk code on architecture not supported
  mm: multi-gen LRU: avoid using generation stored in page flags for generation
  mm: multi-gen LRU: support different page aging mechanism
  powerpc/mm: Add page access count support
  powerpc/mm: Clear page access count on allocation
  mm: multi-gen LRU: Shrink folio list without checking for page table reference

 arch/Kconfig                          |   3 +
 arch/arm64/Kconfig                    |   1 +
 arch/powerpc/Kconfig                  |  10 +
 arch/powerpc/include/asm/hca.h        |  49 ++++
 arch/powerpc/include/asm/page.h       |   5 +
 arch/powerpc/include/asm/page_aging.h |  35 +++
 arch/powerpc/mm/Makefile              |   1 +
 arch/powerpc/mm/hca.c                 | 288 ++++++++++++++++++++
 arch/x86/Kconfig                      |   1 +
 include/linux/memcontrol.h            |   2 +-
 include/linux/mm_inline.h             |  47 +---
 include/linux/mm_types.h              |   8 +-
 include/linux/mmzone.h                |  15 +-
 include/linux/page_aging.h            |  43 +++
 include/linux/swap.h                  |   2 +-
 kernel/fork.c                         |   2 +-
 mm/Kconfig                            |   4 +
 mm/memcontrol.c                       |   2 +-
 mm/rmap.c                             |   4 +-
 mm/vmscan.c                           | 372 ++++++++++++++++++++++----
 20 files changed, 780 insertions(+), 114 deletions(-)
 create mode 100644 arch/powerpc/include/asm/hca.h
 create mode 100644 arch/powerpc/include/asm/page_aging.h
 create mode 100644 arch/powerpc/mm/hca.c
 create mode 100644 include/linux/page_aging.h

-- 
2.39.2