On Sun, Jul 7, 2024 at 2:32 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> Seems that patches 1 & 2 might be worthy of backporting into earlier
> kernels? Could you please provide a description of the
> userspace-visible effects of the bugs so the desirability of such an
> action can be better understood?

Patches 1 and 2 partially resolve an issue in the zswap global shrinker that leads to performance degradation on small systems. However, the fix uncovers another issue, which is addressed in patches 3 to 6. Backporting only the two patches is therefore a tradeoff that may degrade performance in some cases, and I am not sure whether that is acceptable.

The visible issue is described in the cover letter:

> Visible issue to resolve
> -------------------------------
> The visible issue with the current global shrinker is that pageout/in
> operations from active processes are slow when zswap is near its max
> pool size. This is particularly significant on small memory systems
> where total swap usage exceeds what zswap can store. This results in old
> pages occupying most of the zswap pool space, with recent pages using
> the swap disk directly.
>
> Root cause of the issue
> -------------------------------
> This issue is caused by zswap maintaining the pool size near 100%. Since
> the shrinker fails to shrink the pool to accept_threshold_percent, zswap
> rejects incoming pages more frequently than it should. The rejected
> pages are directly written to disk while zswap protects old pages from
> eviction, leading to slow pageout/in performance for recent pages.

Patches 1 and 2 partially resolve the issue by fixing the shrinker's iteration logic. With the two patches applied, the zswap shrinker starts evicting pages once the pool limit is hit, as described in the current zswap documentation. However, this fix alone may not improve performance, because it lacks the proactive shrinking, implemented in patch 3, that frees space before the pool limit is hit.
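A toy userspace model may help illustrate the root cause. This is a hypothetical sketch, not kernel code: struct pool_model, pool_rejects() and simulate_stores() are made-up names. It assumes a simplified version of the accept_threshold_percent behavior described above, where the pool is marked full when it hits its limit and keeps rejecting stores until it drops to the accept threshold. The "working" shrinker is idealized as instantly evicting old pages down to that threshold on each rejection.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of zswap store acceptance (not the kernel's code).
 * All sizes are in pages. */
struct pool_model {
	unsigned long cur;        /* pages currently stored in the pool */
	unsigned long max;        /* pool limit (from max_pool_percent) */
	unsigned long accept_thr; /* accept threshold (from accept_threshold_percent) */
	bool full;                /* set at max, cleared only at or below accept_thr */
};

/* Returns true if the next store would be rejected and go to the swap disk. */
static bool pool_rejects(struct pool_model *p)
{
	if (p->cur >= p->max)
		p->full = true;
	else if (p->full && p->cur <= p->accept_thr)
		p->full = false;
	return p->full;
}

/* Attempts n stores and returns how many were rejected. If shrinker_works
 * is false, the shrinker evicts nothing (as with the broken global
 * shrinker); otherwise it is idealized as evicting old pages down to
 * accept_thr on every rejection. */
static unsigned long simulate_stores(struct pool_model *p, unsigned long n,
				     bool shrinker_works)
{
	unsigned long rejected = 0;

	assert(p->accept_thr < p->max);
	for (unsigned long i = 0; i < n; i++) {
		if (!pool_rejects(p)) {
			p->cur++;		/* page stored in zswap */
			continue;
		}
		rejected++;			/* page written directly to disk */
		if (shrinker_works)
			p->cur = p->accept_thr;	/* old pages written back */
	}
	return rejected;
}
```

With max = 100 pages, accept_thr = 90 and 200 stores into an empty pool, the model rejects all 100 stores made after the limit is hit when the shrinker evicts nothing, versus 10 with the idealized shrinker. It does not model proactive shrinking (patch 3) or the writeback/pageout contention discussed below.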
Unfortunately, the fix uncovers another issue, described in the bottom half of the cover letter. Because the shrinker performs writeback concurrently with the pageout of rejected pages, it unnecessarily delays actual memory reclaim. The first issue masked the second by effectively disabling global shrinker writeback. I think the second issue occurs only under severe memory pressure, but it may degrade pageout performance, as shown in the benchmark at the bottom of the cover letter.