On Sat, Jun 8, 2024 at 8:53 AM Takero Funaki <flintglass@xxxxxxxxx> wrote:
>
> This series addresses two issues and introduces a minor improvement in
> zswap global shrinker:

By the way, what is your current setup? This global shrinker loop
should only be run when the global pool limit is hit. That *never*
happens to us in production, even with the zswap shrinker disabled.

The default pool limit is 20% of memory, which is quite a lot,
especially if anonymous memory is well-compressed and/or has a lot of
zero pages (which do not count towards the limit).

>
> 1. Fix the memcg iteration logic that breaks iteration on offline memcgs.
> 2. Fix the error path that aborts on expected error codes.
> 3. Add proactive shrinking at 91% full, for 90% accept threshold.
>
> These patches need to be applied in this order to avoid potential loops
> caused by the first issue. Patch 3 can be applied independently, but the
> two issues must be resolved to ensure the shrinker can evict pages.
>
> Previously, the zswap pool could be filled with old pages that the
> shrinker failed to evict, leading to zswap rejecting new pages. With
> this series applied, the shrinker will continue to evict pages until the
> pool reaches the accept_thr_percent threshold proactively, as
> documented, and maintain the pool to keep recent pages.
>
> As a side effect of changes in the hysteresis logic, zswap will no
> longer reject pages under the max pool limit.
>
> With this series, reclaims smaller than the proactive shrinking amount
> finish instantly and trigger background shrinking. Admins can check if
> new pages are buffered by zswap by monitoring the pool_limit_hit
> counter.
>
> Changes since v0:
> mm: zswap: fix global shrinker memcg iteration
> - Drop and reacquire spinlock before skipping a memcg.
> - Add some comment to clarify the locking mechanism.
> mm: zswap: proactive shrinking before pool size limit is hit
> - Remove unneeded check before scheduling work.
> - Change shrink start threshold to accept_thr_percent + 1%.
>
> Now it starts shrinking at accept_thr_percent + 1%. Previously, the
> threshold was at the midpoint of 100% to accept_threshold.
>
> If a workload needs 10% space to buffer the average reclaim amount, with
> the previous patch, it required setting the accept_thr_percent to 80%.
> For 50%, it became 0%, which is not acceptable and unclear for admins.
> We can use the accept percent as the shrink threshold directly, but that
> sounds like the shrinker would be called too frequently around the
> accept threshold. I added 1% as a minimum gap to the shrink threshold.
>
> ----
>
> Takero Funaki (3):
>   mm: zswap: fix global shrinker memcg iteration
>   mm: zswap: fix global shrinker error handling logic
>   mm: zswap: proactive shrinking before pool size limit is hit
>
>  Documentation/admin-guide/mm/zswap.rst |  17 ++-
>  mm/zswap.c                             | 172 ++++++++++++++++++-------
>  2 files changed, 136 insertions(+), 53 deletions(-)
>
> --
> 2.43.0
>
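
To make the threshold arithmetic in the cover letter concrete, here is a
rough userspace sketch of how I read it. It is not the code in mm/zswap.c:
the function names are made up for illustration, the cap at 100% is my own
assumption, and only the percentages (20% pool limit, 90% accept threshold,
the +1% gap, and the v0 midpoint rule) come from the text above.

```c
#include <assert.h>

/* Hypothetical helper: number of pages equal to `percent` of `total`.
 * Note: total * percent may overflow for very large totals on 32-bit. */
static unsigned long pages_at_percent(unsigned long total, unsigned int percent)
{
	assert(percent <= 100);
	return total * percent / 100;
}

/* Global pool limit, e.g. max_pool_percent = 20 (the default cited above). */
static unsigned long pool_limit_pages(unsigned long totalram_pages,
				      unsigned int max_pool_percent)
{
	return pages_at_percent(totalram_pages, max_pool_percent);
}

/* Level the shrinker evicts down to: accept_thr_percent of the pool limit. */
static unsigned long accept_pages(unsigned long limit_pages,
				  unsigned int accept_thr_percent)
{
	return pages_at_percent(limit_pages, accept_thr_percent);
}

/* v1 rule from the cover letter: start shrinking at accept_thr_percent + 1%.
 * Capping at 100% is an assumption of this sketch. */
static unsigned long shrink_start_pages_v1(unsigned long limit_pages,
					   unsigned int accept_thr_percent)
{
	unsigned int pct = accept_thr_percent + 1;

	if (pct > 100)
		pct = 100;
	return pages_at_percent(limit_pages, pct);
}

/* v0 rule from the cover letter: midpoint between 100% and the accept
 * threshold. */
static unsigned long shrink_start_pages_v0(unsigned long limit_pages,
					   unsigned int accept_thr_percent)
{
	return pages_at_percent(limit_pages, (100 + accept_thr_percent) / 2);
}
```

With 1,000,000 pages of RAM and the defaults, the pool limit is 200,000
pages, the accept level is 180,000, and the v1 rule starts shrinking at
182,000. Under the v0 midpoint rule, getting a 10% buffer below the limit
required accept_thr_percent = 80, and a 50% buffer required 0, which is
the problem the changelog describes.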