On Wed, Mar 20, 2024 at 05:50:53AM -0400, Johannes Weiner wrote: > On Wed, Mar 20, 2024 at 02:08:22AM +0000, Yosry Ahmed wrote: > > Currently, the number of protected zswap entries corresponding to an > > lruvec are incremented every time we swapin a page. > > Correct. This is the primary signal that the shrinker is being too > aggressive in moving entries to disk and should slow down...? Right, I think that's where I was confused. See below :) > > > This happens regardless of whether or not the page originated in > > zswap. Hence, swapins from disk will lead to increasing protection > > on potentially stale zswap entries. Furthermore, the increased > > shrinking protection can lead to more pages skipping zswap and going > > to disk, eventually leading to even more swapins from disk and > > starting a vicious circle. > > How does shrinker protection affect zswap stores? > > On the contrary, I would expect this patch to create a runaway > shrinker. The more aggressively it moves entries out to disk, the > lower the rate of zswap loads, the more aggressively it moves more > entries out to disk. I think I found the source of my confusion. As you described, the intention of the protection is to detect that we are doing writeback more aggressively than we should. This is indicated by swapins (especially those from disk). However, this assumes that pages swapped in from disk had gone there by shrinking/writeback. What I was focused on was pages that end up on disk because of the zswap limit. In this case, a disk swapin is actually a sign that we swapped hot pages to disk instead of putting them in zswap, which probably indicates that we should shrink zswap more aggressively to make room for these pages. I guess the general assumption is that with the shrinker at play, pages skipping zswap due to the limit becomes the unlikely scenario -- which makes sense. However, pages that end up on disk because they skipped zswap should have a higher likelihood of being swapped in, because they are hotter by definition. Ideally, we would be able to tell during swapin if this page was sent to disk due to writeback or skipping zswap and increase or decrease protection accordingly -- but this isn't the case. So perhaps the answer is to decrease the protection when pages skip zswap instead? Or is this not necessary because we kick off a background shrinker anyway? IIUC the latter will stop once we reach the acceptance threshold, but it might be a good idea to increase shrinking beyond that under memory pressure. Also, in an ideal world I guess it would be better to distinguish swapins from disk vs. zswap? Swapins from disk should lead to a bigger increase in protection IIUC (assuming they happened due to writeback). Sorry if my analysis is still off, I am still familiarizing myself with the shrinker heuristics :) > > > Instead, only increase the protection when pages are loaded from zswap. > > This also has a nice side effect of removing zswap_folio_swapin() and > > replacing it with a static helper that is only called from zswap_load(). > > > > No problems were observed in practice, this was found through code > > inspection. > > This is missing test results :) I intentionally wanted to send out the patch first to get some feedback, knowing that I probably missed something. In retrospect, this should have been an RFC patch. Patch 2 should be irrelevant tho, I only bundled them because they touch the same area.