On Thu, Jan 9, 2025 at 12:06 PM Usama Arif <usamaarif642@xxxxxxxxx> wrote:
>
> I would like to propose a session to discuss the work going on
> around large folio swapin, whether it's traditional swap, zswap,
> or zram.
>
> Large folios have obvious advantages that have been discussed
> before, like fewer page faults, batched PTE and rmap manipulation,
> shorter LRU lists, and TLB coalescing (on arm64 and AMD).
> However, swapping in large folios has its own drawbacks, like
> higher swap thrashing.
> I had initially sent an RFC for zswapin of large folios in [1],
> but it causes a swap-thrashing regression in kernel build time,
> which I am confident is happening with zram large folio swapin
> as well (which is merged in the kernel).

I am obviously interested in this discussion, but unfortunately I
won't be able to make it this year. I will try to attend remotely
if possible!

> Some of the points we could discuss in the session:
>
> - What is the right (preferably open source) benchmark to test
>   swapin of large folios? Kernel build time in a limited-memory
>   cgroup shows a regression and microbenchmarks show a massive
>   improvement; maybe there are benchmarks where TLB misses are a
>   big factor and would show an improvement.
>
> - We could have something like
>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/swapin_enabled
>   to enable/disable swapin, but it is going to be difficult to
>   tune, might have different optimum values based on workloads,
>   and is likely to be left at its default value. Is there some
>   dynamic way to decide when to swap in large folios and when to
>   fall back to smaller folios? The swapin_readahead swapcache
>   path, which only supports 4K folios at the moment, has a
>   readahead window based on hits; however, readahead is a folio
>   flag and not a page flag, so this method can't be used: once a
>   large folio is swapped in, we won't get a fault, and subsequent
>   hits on other pages of the large folio won't be recorded.
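On the dynamic-policy question above: one direction (purely a userspace
sketch, not actual mm/ code) would be a hit-driven folio-order heuristic
loosely modeled on the swapin_readahead window scaling: grow the swapin
order on consecutive useful swapins and collapse straight back to order
0 on a miss, so thrashing workloads quickly fall back to 4K. The
MAX_SWAPIN_ORDER cap and next_swapin_order() helper below are made up
for illustration.

```c
/* Assumed cap for illustration: up to order-4 (64K) folios. */
#define MAX_SWAPIN_ORDER 4

/*
 * Hypothetical helper (not a kernel API): pick the folio order for the
 * next swapin based on whether the previous large swapin was "useful",
 * i.e. its subpages were actually touched.
 */
unsigned int next_swapin_order(unsigned int prev_order, int hit)
{
	if (!hit)
		return 0;		/* miss: drop to 4K to limit thrashing */
	if (prev_order >= MAX_SWAPIN_ORDER)
		return MAX_SWAPIN_ORDER;
	return prev_order + 1;	/* hit: grow the order gradually */
}
```

Of course, the hard part is the feedback signal itself: as the proposal
notes, once a large folio is mapped, touches to its other subpages never
fault, so "hits" can't be counted the way the 4K readahead path counts
them.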
>
> - For zswap and zram, it might be that larger block compression/
>   decompression offsets the regression from swap thrashing, but it
>   brings its own issues. For example, once a large folio is swapped
>   out, swapin could fail to allocate a large folio and fall back to
>   4K, resulting in redundant decompressions.
>   Would this also mean that swapin of large folios from traditional
>   swap isn't something we should proceed with?
>
> - Should we even support large folio swapin? You often have high
>   swap activity when the system/cgroup is close to running out of
>   memory; at that point, maybe the best way forward is to just swap
>   in 4K pages and let khugepaged [2], [3] collapse them if the
>   surrounding pages are swapped in as well.
>
> [1] https://lore.kernel.org/all/20241018105026.2521366-1-usamaarif642@xxxxxxxxx/
> [2] https://lore.kernel.org/all/20250108233128.14484-1-npache@xxxxxxxxxx/
> [3] https://lore.kernel.org/lkml/20241216165105.56185-1-dev.jain@xxxxxxx/
>
> Thanks,
> Usama
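On the redundant-decompression point: a tiny cost model (an assumption
for illustration, not measured data) makes the asymmetry concrete. If a
64K folio is compressed as a single block, a successful large swapin
pays one decompression, while a 4K fallback pays one whole-block
decompression per subpage fault (assuming no cache of the decompressed
block):

```c
/* Assumed for illustration: 64K folio backed by one compressed block. */
#define SUBPAGES_PER_FOLIO 16	/* 64K / 4K base pages */

/*
 * Illustrative model: decompressions needed to fault in every subpage
 * of one swapped-out 64K folio.
 */
unsigned int decompressions_to_fault_all(int swapin_as_large_folio)
{
	if (swapin_as_large_folio)
		return 1;			/* one block, one decompression */
	return SUBPAGES_PER_FOLIO;	/* each 4K fault redoes the block */
}
```

Whether the better ratio and batching of one large compression outweighs
a worst-case 16x decompression cost in the fallback path is exactly the
kind of thing the benchmark question at the top of the proposal would
need to settle.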