Re: [PATCH v3 1/2] zswap: implement a second chance algorithm for dynamic zswap shrinker

Johannes Weiner <hannes@xxxxxxxxxxx> · Wed, 7 Aug 2024 07:03:41 -0400



On Mon, Aug 05, 2024 at 04:22:42PM -0700, Nhat Pham wrote:
> Current zswap shrinker's heuristics to prevent overshrinking is brittle
> and inaccurate, specifically in the way we decay the protection size
> (i.e making pages in the zswap LRU eligible for reclaim).
> 
> We currently decay protection aggressively in zswap_lru_add() calls.
> This leads to the following unfortunate effect: when a new batch of
> pages enter zswap, the protection size rapidly decays to below 25% of
> the zswap LRU size, which is way too low.
> 
> We have observed this effect in production, when experimenting with the
> zswap shrinker: the rate of shrinking shoots up massively right after a
> new batch of zswap stores. This is somewhat the opposite of what we want
> originally - when new pages enter zswap, we want to protect both these
> new pages AND the pages that are already protected in the zswap LRU.
> 
> Replace existing heuristics with a second chance algorithm
> 
> 1. When a new zswap entry is stored in the zswap pool, its referenced
>    bit is set.
> 2. When the zswap shrinker encounters a zswap entry with the referenced
>    bit set, give it a second chance - only flips the referenced bit and
>    rotate it in the LRU.
> 3. If the shrinker encounters the entry again, this time with its
>    referenced bit unset, then it can reclaim the entry.
> 
> In this manner, the aging of the pages in the zswap LRUs are decoupled
> from zswap stores, and picks up the pace with increasing memory pressure
> (which is what we want).
> 
> The second chance scheme allows us to modulate the writeback rate based
> on recent pool activities. Entries that recently entered the pool will
> be protected, so if the pool is dominated by such entries the writeback
> rate will reduce proportionally, protecting the workload's workingset.On
> the other hand, stale entries will be written back quickly, which
> increases the effective writeback rate.
> 
> The referenced bit is added at the hole after the `length` field of
> struct zswap_entry, so there is no extra space overhead for this
> algorithm.
> 
> We will still maintain the count of swapins, which is consumed and
> subtracted from the lru size in zswap_shrinker_count(), to further
> penalize past overshrinking that led to disk swapins. The idea is that
> had we considered this many more pages in the LRU active/protected, they
> would not have been written back and we would not have had to swapped
> them in.
> 
> To test this new heuristics, I built the kernel under a cgroup with
> memory.max set to 2G, on a host with 36 cores:
> 
> With the old shrinker:
> 
> real: 263.89s
> user: 4318.11s
> sys: 673.29s
> swapins: 227300.5
> 
> With the second chance algorithm:
> 
> real: 244.85s
> user: 4327.22s
> sys: 664.39s
> swapins: 94663
> 
> (average over 5 runs)
> 
> We observe an 1.3% reduction in kernel CPU usage, and around 7.2%
> reduction in real time. Note that the number of swapped in pages
> dropped by 58%.
> 
> Suggested-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Signed-off-by: Nhat Pham <nphamcs@xxxxxxxxx>

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>