[PATCH v2 0/6] mm/zswap: optimize zswap lru list

Changes in v2:
- Add comment above zswap_invalidate() to mention that large folio
  swap slot is not covered for now, per Yosry.
- Add comment about locking behaviour of LRU_STOP, per Yosry.
- Add the theory details and supportive testing results on why we
  choose the exclusive load as the default for zswap, per Johannes.
- Collect tags.
- Link to v1: https://lore.kernel.org/r/20240201-b4-zswap-invalidate-entry-v1-0-56ed496b6e55@xxxxxxxxxxxxx

Hi all,

This series was motivated by observing the zswap lru list shrinking
and noticing some unexpected failure cases in zswap_writeback_entry(),
counted with:

bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'

Some failures are -ENOMEM: when a swap entry is freed to the
per-cpu swap pool, its zswap entry is not invalidated/dropped.
The shrinker then encounters these trashy zswap entries, which
can't be reclaimed, and returns -ENOMEM.

So move the invalidation ahead to the point where the swap entry
is freed to the per-cpu swap pool, since there is no benefit in
leaving trashy zswap entries on the zswap tree and lru list.
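
A minimal sketch of the idea, assuming the hook lands in
free_swap_slot() and that zswap_invalidate() is changed to take the
swp_entry_t (as the diffstat below suggests); illustrative, not the
literal patch:

void free_swap_slot(swp_entry_t entry)
{
	struct swap_slots_cache *cache;

	/*
	 * Drop the zswap copy now: the freed slot may sit in the
	 * per-cpu cache for a while, and a stale zswap entry would
	 * only make the shrinker fail with -ENOMEM. Large folio
	 * swap slots are not covered for now.
	 */
	zswap_invalidate(entry);

	cache = raw_cpu_ptr(&swp_slots);
	/* ... existing slot-caching logic unchanged ... */
}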

Another case is -EEXIST, seen mostly with
!zswap_exclusive_loads_enabled, where a swapped-in folio leaves
its compressed copy on the tree and lru list. That copy can't be
reclaimed until the folio is removed from the swapcache.

Switching to zswap_exclusive_loads_enabled mode invalidates the
entry at folio swapin time. Its drawback is that if the folio
stays clean in the swapcache and is swapped out again, we need to
compress it again. Please see the commit for details on why we
choose exclusive load as the default for zswap.
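
For reference, the exclusive load path in zswap_load() boils down
to something like the following sketch (simplified; the helper
names follow current mm/zswap.c but take them as illustrative):

	/* after successful decompression in zswap_load() */
	zswap_invalidate_entry(tree, entry);	/* free the compressed copy */
	folio_mark_dirty(folio);		/* recompress on next swapout */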

Another optimization for -EEXIST is adding LRU_STOP, which supports
terminating the shrinking process early to avoid evicting a warmer
region.
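
Roughly, LRU_STOP is a new entry in enum lru_status
(include/linux/list_lru.h), returned from shrink_memcg_cb() when
writeback hits a folio that is already in the swapcache, i.e. when
the walk has reached the warmer region; a sketch (the callback-side
condition is illustrative):

enum lru_status {
	LRU_REMOVED,		/* item removed from list */
	LRU_REMOVED_RETRY,	/* item removed, but lock has been
				   dropped and reacquired */
	LRU_ROTATE,		/* item referenced, give another pass */
	LRU_SKIP,		/* item cannot be locked, skip */
	LRU_RETRY,		/* item not freeable, may drop the lock
				   internally, but has to return locked */
	LRU_STOP,		/* stop lru list walking, may drop the lock
				   internally, but has to return locked */
};

/* in shrink_memcg_cb(), roughly */
if (writeback_result == -EEXIST && encountered_page_in_swapcache)
	return LRU_STOP;	/* warm region reached, stop shrinking */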

Testing used a kernel build in tmpfs with one 50GB swapfile and
zswap shrinker_enabled on, with memory.max set to 2GB.

                mm-unstable   zswap-optimize
real               63.90s       63.25s
user             1064.05s     1063.40s
sys               292.32s      270.94s

The main optimization is in sys CPU time, about a 7% improvement.

Thanks for review and comments!

Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
---
Chengming Zhou (6):
      mm/zswap: add more comments in shrink_memcg_cb()
      mm/zswap: invalidate zswap entry when swap entry free
      mm/zswap: stop lru list shrinking when encounter warm region
      mm/zswap: remove duplicate_entry debug value
      mm/zswap: only support zswap_exclusive_loads_enabled
      mm/zswap: zswap entry doesn't need refcount anymore

 include/linux/list_lru.h |   2 +
 include/linux/zswap.h    |   4 +-
 mm/Kconfig               |  16 ------
 mm/list_lru.c            |   3 ++
 mm/swap_slots.c          |   3 ++
 mm/swapfile.c            |   1 -
 mm/zswap.c               | 136 ++++++++++++++++-------------------------------
 7 files changed, 56 insertions(+), 109 deletions(-)
---
base-commit: 3a92c45e4ba694381c46994f3fde0d8544a2088b
change-id: 20240201-b4-zswap-invalidate-entry-b77dea670325

Best regards,
-- 
Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
