On Thu, Jan 18, 2024 at 9:39 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Thu, Jan 18, 2024 at 09:06:43AM -0800, Yosry Ahmed wrote:
> > > > > On a different note, I wonder if it would help to perform synchronous
> > > > > reclaim here instead. With our current design, the zswap store failure
> > > > > (due to global limit hit) would leave the incoming page going to swap
> > > > > instead, creating an LRU inversion. Not sure if that's ideal.
> > > >
> > > > The global shrink path keeps reclaiming until zswap can accept again
> > > > (by default, that means reclaiming 10% of the total limit). I think
> > > > this is too expensive to be done synchronously.
> > >
> > > That thresholding code is a bit weird right now.
> > >
> > > It wakes the shrinker and rejects at the same time. We're guaranteed
> > > to see rejections, even if the shrinker has no trouble flushing some
> > > entries a split second later.
> > >
> > > It would make more sense to wake the shrinker at e.g. 95% full and
> > > have it run until 90%.
> > >
> > > But with that in place we also *should* do synchronous reclaim once we
> > > hit 100%. Just enough to make room for the store. This is important to
> > > catch the case where reclaim rate exceeds swapout rate. Rejecting and
> > > going to swap means the reclaimer will be throttled down to IO rate
> > > anyway, and the app latency isn't any worse. But this way we keep the
> > > pipeline alive, and keep swapping out the oldest zswap entries,
> > > instead of rejecting and swapping what would be the hottest ones.
> >
> > I fully agree with the thresholding code being weird, and with waking
> > up the shrinker before the pool is full. What I don't understand is
> > how we can do synchronous reclaim when we hit 100% and still respect
> > the acceptance threshold :/
> >
> > Are you proposing we change the semantics of the acceptance threshold
> > to begin with?
>
> I kind of am. It's worth looking at the history of this knob.
>
> It was added in 2020 by 45190f01dd402112d3d22c0ddc4152994f9e1e55, and
> from the changelogs and the code in this patch I do not understand how
> this was supposed to work.
>
> It also *didn't* work for very basic real world applications. See
> Domenico's follow-up (e0228d590beb0d0af345c58a282f01afac5c57f3), which
> effectively reverted it to get halfway reasonable behavior.
>
> If there are no good usecases for this knob, then I think it makes
> sense to phase it out again.

I am always nervous about removing/altering user-visible knobs, but if
you think it's fine then I am all for it.

I think it makes more sense to start writeback early to avoid the
whole situation if possible, and synchronously reclaim a little bit if
we hit 100%. The proactive writeback should also reduce the amount of
synchronous IO we need to do in reclaim, so we may see some latency
improvements.
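
For illustration, here is a minimal sketch of the hysteresis being
proposed: wake the background shrinker at ~95% full, let it write back
down to ~90%, and at 100% reclaim just enough synchronously to admit
the incoming store instead of rejecting it. The helper names
(zswap_pool_pct_full, zswap_reclaim_one_entry, the watermark macros)
are hypothetical stand-ins, not actual zswap symbols, and shrink_wq /
shrink_work only mirror the existing zswap workqueue by name; this is
a sketch of the proposal, not current kernel code.

/* Hypothetical watermarks for the hysteresis described above. */
#define ZSWAP_WAKE_PCT   95	/* wake the async shrinker here... */
#define ZSWAP_TARGET_PCT 90	/* ...and let it run down to here */

/*
 * Called on the store path. Returns true if the store may proceed.
 * zswap_pool_pct_full() and zswap_reclaim_one_entry() are assumed
 * helpers: pool fullness as a percentage of the global limit, and
 * writeback of one entry from the LRU tail (0 on success).
 */
static bool zswap_check_limits(void)
{
	int pct = zswap_pool_pct_full();

	/* Proactive: kick off background writeback before we are full. */
	if (pct >= ZSWAP_WAKE_PCT)
		queue_work(shrink_wq, &shrink_work);

	/*
	 * At 100%, synchronously make room for this one store rather
	 * than rejecting it and sending a hotter page to swap (the
	 * LRU inversion discussed above).
	 */
	if (pct >= 100)
		return zswap_reclaim_one_entry() == 0;

	return true;
}

/* The background worker writes back until the low watermark. */
static void shrink_worker(struct work_struct *w)
{
	while (zswap_pool_pct_full() > ZSWAP_TARGET_PCT) {
		if (zswap_reclaim_one_entry())
			break;	/* writeback failed; back off */
	}
}

The point of the two watermarks is that the shrinker is always woken
before the limit is hit, so rejections (and the synchronous fallback)
only happen when reclaim genuinely outpaces writeback, not as a
routine side effect of crossing a single threshold.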