On 2024/2/2 02:12, Johannes Weiner wrote:
> On Thu, Feb 01, 2024 at 03:49:05PM +0000, Chengming Zhou wrote:
>> The !zswap_exclusive_loads_enabled mode will leave compressed copy in
>> the zswap tree and lru list after the folio swapin.
>>
>> There are some disadvantages in this mode:
>> 1. It's a waste of memory since there are two copies of data, one is
>>    folio, the other one is compressed data in zswap. And it's unlikely
>>    the compressed data is useful in the near future.
>>
>> 2. If that folio is dirtied, the compressed data must be not useful,
>>    but we don't know and don't invalidate the trashy memory in zswap.
>>
>> 3. It's not reclaimable from zswap shrinker since zswap_writeback_entry()
>>    will always return -EEXIST and terminate the shrinking process.
>>
>> On the other hand, the only downside of zswap_exclusive_loads_enabled
>> is a little more cpu usage/latency when compression, and the same if
>> the folio is removed from swapcache or dirtied.
>>
>> Not sure if we should accept the above disadvantages in the case of
>> !zswap_exclusive_loads_enabled, so send this out for discussion.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
>
> This is interesting.
>
> First, I will say that I never liked this config option, because it's
> nearly impossible for a user to answer this question. Much better to
> just pick a reasonable default.

Agree.

>
> What should the default be?
>
> Caching "swapout work" is helpful when the system is thrashing. Then
> recently swapped in pages might get swapped out again very soon. It
> certainly makes sense with conventional swap, because keeping a clean
> copy on the disk saves IO work and doesn't cost any additional memory.
>
> But with zswap, it's different. It saves some compression work on a
> thrashing page. But the act of keeping compressed memory contributes
> to a higher rate of thrashing. And that can cause IO in other places
> like zswap writeback and file memory.
>
> It would be useful to have an A/B test to confirm that not caching is
> better. Can you run your test with and without keeping the cache, and
> in addition to the timings also compare the deltas for pgscan_anon,
> pgscan_file, workingset_refault_anon, workingset_refault_file?

I just ran an A/B test of a kernel build in a tmpfs directory, with
memory.max=2GB (zswap writeback and shrinker_enabled both on, one 50GB
swapfile).

From the results below, exclusive mode has fewer scans and refaults.

                            zswap-invalidate-entry  zswap-invalidate-entry-exclusive
real                        63.80                   63.01
user                        1063.83                 1061.32
sys                         290.31                  266.15

                            zswap-invalidate-entry  zswap-invalidate-entry-exclusive
workingset_refault_anon     2383084.40              1976397.40
workingset_refault_file     44134.00                45689.40
workingset_activate_anon    837878.00               728441.20
workingset_activate_file    4710.00                 4085.20
workingset_restore_anon     732622.60               639428.40
workingset_restore_file     1007.00                 926.80
workingset_nodereclaim      0.00                    0.00
pgscan                      14343003.40             12409570.20
pgscan_kswapd               0.00                    0.00
pgscan_direct               14343003.40             12409570.20
pgscan_khugepaged           0.00                    0.00
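
To make the deltas easier to compare, here is a small sketch that turns the
numbers above into percent changes relative to the non-exclusive run. It is
only an illustration: the names `results`, `relative_change` and `summarize`
are made up for this example, and the counters that are zero in both runs
are left out to avoid dividing by zero.

```python
# Values copied from the two result tables above, as
# (zswap-invalidate-entry, zswap-invalidate-entry-exclusive).
# Counters that are 0.00 in both runs are omitted (no meaningful ratio).
results = {
    "real": (63.80, 63.01),
    "user": (1063.83, 1061.32),
    "sys": (290.31, 266.15),
    "workingset_refault_anon": (2383084.40, 1976397.40),
    "workingset_refault_file": (44134.00, 45689.40),
    "workingset_activate_anon": (837878.00, 728441.20),
    "workingset_activate_file": (4710.00, 4085.20),
    "workingset_restore_anon": (732622.60, 639428.40),
    "workingset_restore_file": (1007.00, 926.80),
    "pgscan": (14343003.40, 12409570.20),
    "pgscan_direct": (14343003.40, 12409570.20),
}


def relative_change(baseline, candidate):
    """Percent change of candidate relative to baseline (negative = lower)."""
    return (candidate - baseline) / baseline * 100.0


def summarize(table):
    """Map each counter name to its percent change, exclusive vs. baseline."""
    return {name: relative_change(a, b) for name, (a, b) in table.items()}


if __name__ == "__main__":
    for name, pct in summarize(results).items():
        print(f"{name:28s} {pct:+7.2f}%")
```

Read this way, sys time, anon refaults and pgscan all come out lower in
exclusive mode, while workingset_refault_file is the one counter that is
slightly higher.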