On Sat, Jul 13, 2024 at 8:02 Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> >
> > I agree this does not follow LRU, but I think the LRU priority
> > inversion is unavoidable once the pool limit is hit.
> > The accept_thr_percent should be lowered to reduce the probability of
> > LRU inversion if it matters. (This is why I implemented the proactive
> > shrinker.)
>
> And yet, in your own benchmark it fails to prevent that, no? I think
> you lower it all the way down to 50%.
>
> >
> > When the writeback throughput is slower than memory usage grows,
> > zswap_store() will have to reject pages sooner or later.
> > If we evict the oldest stored pages synchronously before rejecting a
> > new page (rotating the pool to keep LRU order), it will affect latency
> > depending on how much writeback is required to store the new page. If
> > the oldest pages were compressed well, we would have to evict too many
> > pages to store a warmer page, which blocks reclaim progress.
> > Fragmentation in the zspool may also increase the required writeback
> > amount.
> > We cannot both maintain LRU priority and keep pageout latency low.
>
> Hmm yeah, I guess this is fair. Looks like there is not a lot of
> choice, if you want to maintain decent pageout latency...
>
> I could suggest that you have a budgeted zswap writeback on zswap
> store - i.e. if the pool is full, then try to zswap writeback until we
> have enough space or the budget is reached. But that feels like
> even more engineering - the IO priority approach might even be easier
> at that point LOL.
>
> Oh well, global shrinker delay it is :)
>
> >
> > Additionally, zswap_writeback_entry() is slower than direct pageout. I
> > assume this is because the shrinker performs 4KB IO synchronously. I am
> > seeing that shrinking throughput is limited to disk IOPS * 4KB, while
> > much higher throughput can be achieved by disabling zswap. Direct
> > pageout can be faster than zswap writeback, possibly because of bio
> > optimization or sequential allocation of swap.
>
> Hah, this is interesting!
>
> I wonder though, if the solution here is to perform some sort of
> batching for zswap writeback.
>
> BTW, what is the type of the storage device you are using for swap? Is
> it SSD or HDD etc?
>

It was tested on an Azure VM with SSD-backed storage. Total IOPS was
capped at 4K by the VM host, so the maximum throughput of the global
shrinker was around 16 MB/s.
Proactive shrinking cannot prevent pool_limit_hit, since memory
allocation can be on the order of GB/s. (The benchmark script allocates
2 GB sequentially, which was compressed to 1.3 GB, while the zswap pool
was limited to 200 MB.)

> > >
> > > Have you experimented with synchronous reclaim in the case the pool is
> > > full? All the way to the acceptance threshold is too aggressive of
> > > course - you might need to find something in between :)
> > >
> >
> > I don't get what the expected situation is.
> > The benchmark of patch 6 is performing synchronous reclaim in the case
> > the pool is full, since bulk memory allocation (writes to mmapped
> > space) is much faster than the writeback throughput. The zswap pool is
> > filled instantly at the beginning of the benchmark runs. The
> > accept_thr_percent is not significant for the benchmark, I think.
>
> No. I meant synchronous reclaim as in triggering zswap writeback
> within the zswap store path, to make space for the incoming new zswap
> pages. But you already addressed it above :)
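
For concreteness, the budgeted writeback-on-store you describe could
look roughly like the sketch below. This is only an illustration, not
code from this series: zswap_pool_over_limit() and zswap_evict_oldest()
are hypothetical placeholders for the existing pool-limit check and for
writing back the LRU-oldest entry.

/*
 * Untested sketch of "budgeted writeback on store".
 * zswap_pool_over_limit() and zswap_evict_oldest() are hypothetical
 * placeholders (not in the current tree) standing in for the pool-limit
 * check and for synchronously writing back the LRU-oldest entry.
 */
static bool zswap_make_room(void)
{
	int budget = 16;	/* arbitrary per-store writeback budget */

	while (zswap_pool_over_limit()) {
		if (budget-- <= 0)
			break;		/* budget exhausted */
		if (!zswap_evict_oldest())
			break;		/* nothing left to evict */
	}

	/* Store the incoming page only if eviction freed enough space. */
	return !zswap_pool_over_limit();
}

Even with a budget, a well-compressed or fragmented pool may need many
evictions per incoming page, so the store can still end up rejecting -
which is exactly the latency trade-off discussed above.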
>
> >
> > >
> > >
> > > I wonder if this contention would show up in PSI metrics
> > > (/proc/pressure/io, or the cgroup variants if you use them). Maybe
> > > correlate reclaim counters (pgscan, zswpout, pswpout, zswpwb etc.)
> > > with IO pressure to show the pattern, i.e. the contention problem was
> > > there before, and is now resolved? :)
> >
> > Unfortunately, I could not find a reliable metric other than elapsed
> > time. It seems PSI does not distinguish stalls for rejected pageout
> > from stalls for shrinker writeback.
> > For counters, this issue affects latency but does not increase the
> > number of pageins/pageouts. Is there any better way to observe the
> > origin of the contention?
> >
> > Thanks.
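
For the record, one crude way to collect the correlation suggested
above would be to sample /proc/pressure/io next to the relevant
/proc/vmstat counters during the benchmark. A rough, illustrative
userspace sketch (not part of the series; the counter names are the
ones mentioned above, and the "pgscan" prefix also matches the
pgscan_* variants):

/*
 * Print /proc/pressure/io next to the swap/zswap counters from
 * /proc/vmstat once per second, so IO stall time can be lined up
 * with pgscan/zswpout/pswpout/zswpwb activity over a benchmark run.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void dump_matching(const char *path, const char *prefix)
{
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		if (!prefix || !strncmp(line, prefix, strlen(prefix)))
			fputs(line, stdout);
	fclose(f);
}

int main(void)
{
	static const char *keys[] = { "pgscan", "pswpout", "zswpout", "zswpwb" };
	unsigned int i;

	for (;;) {
		dump_matching("/proc/pressure/io", NULL);
		for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
			dump_matching("/proc/vmstat", keys[i]);
		puts("---");
		fflush(stdout);
		sleep(1);
	}
	return 0;
}

As noted above, though, PSI lumps stalls from rejected pageout and
shrinker writeback together, so this only shows the aggregate pattern,
not which path caused it.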