I would like to propose a session to discuss the ongoing work around large folio swapin, whether it's traditional swap, zswap or zram. Large folios have well-known advantages that have been discussed before: fewer page faults, batched PTE and rmap manipulation, reduced LRU list operations, and TLB coalescing (on arm64 and AMD). However, swapping in large folios has its own drawbacks, such as increased swap thrashing. I had initially sent an RFC for zswapin of large folios in [1], but it caused a regression in kernel build time due to swap thrashing, and I am confident the same is happening with zram large folio swapin (which is already merged in the kernel).

Some of the points we could discuss in the session:

- What is the right (preferably open source) benchmark to test swapin of large folios? Kernel build time in a memory-limited cgroup shows a regression, while microbenchmarks show a massive improvement; maybe there are benchmarks where TLB misses are a big factor and which would show an improvement.

- We could add something like /sys/kernel/mm/transparent_hugepage/hugepages-*kB/swapin_enabled to enable/disable large folio swapin, but it would be difficult to tune, might have different optimum values depending on the workload, and would likely be left at its default value. Is there some dynamic way to decide when to swap in large folios and when to fall back to smaller folios? The swapin_readahead swapcache path, which only supports 4K folios at the moment, has a readahead window based on hits; however, readahead is a folio flag and not a page flag, so this method can't be reused: once a large folio is swapped in, we won't get a fault on it again, so subsequent hits on the other pages of the large folio won't be recorded.

- For zswap and zram, it might be that compressing/decompressing larger blocks offsets the regression from swap thrashing, but that brings its own issues. For example, once a large folio is swapped out, swapping it back in as a large folio could fail and fall back to 4K pages, resulting in redundant decompressions.
This would also mean that large folio swapin from traditional swap isn't something we should proceed with?

- Should we even support large folio swapin at all? High swap activity usually occurs when the system/cgroup is close to running out of memory; at that point, maybe the best way forward is to just swap in 4K pages and let khugepaged [2], [3] collapse them if the surrounding pages are swapped in as well.

[1] https://lore.kernel.org/all/20241018105026.2521366-1-usamaarif642@xxxxxxxxx/
[2] https://lore.kernel.org/all/20250108233128.14484-1-npache@xxxxxxxxxx/
[3] https://lore.kernel.org/lkml/20241216165105.56185-1-dev.jain@xxxxxxx/

Thanks,
Usama