On Sat, Jul 13, 2024 at 8:02 Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> >
> > I agree this does not follow LRU, but I think the LRU priority
> > inversion is unavoidable once the pool limit is hit.
> > The accept_thr_percent should be lowered to reduce the probability of
> > LRU inversion if it matters. (This is why I implemented the proactive
> > shrinker.)
>
> And yet, in your own benchmark it fails to prevent that, no? I think
> you lower it all the way down to 50%.
>
> >
> > When the writeback throughput is slower than memory usage grows,
> > zswap_store() will have to reject pages sooner or later.
> > If we evict the oldest stored pages synchronously before rejecting a
> > new page (rotating the pool to keep LRU order), it will affect latency
> > depending on how much writeback is required to store the new page. If
> > the oldest pages were compressed well, we would have to evict too many
> > pages to store a warmer page, which blocks reclaim progress.
> > Fragmentation in the zspool may also increase the required writeback
> > amount.
> > We cannot both maintain LRU priority and keep pageout latency low.
>
> Hmm yeah, I guess this is fair. Looks like there is not a lot of
> choice, if you want to maintain decent pageout latency...
>
> I could suggest that you have a budgeted zswap writeback on zswap
> store - i.e. if the pool is full, then try to zswap writeback until we
> have enough space or the budget is reached. But that feels like
> even more engineering - the IO priority approach might even be easier
> at that point LOL.
>
> Oh well, global shrinker delay it is :)
>
> >
> > Additionally, zswap_writeback_entry() is slower than direct pageout. I
> > assume this is because the shrinker performs 4KB IO synchronously. I am
> > seeing that shrinking throughput is limited to disk IOPS * 4KB, while
> > much higher throughput can be achieved by disabling zswap. Direct
> > pageout can be faster than zswap writeback, possibly because of bio
> > optimization or sequential allocation of swap.
>
> Hah, this is interesting!
>
> I wonder though, if the solution here is to perform some sort of
> batching for zswap writeback.
>
> BTW, what is the type of the storage device you are using for swap? Is
> it SSD or HDD etc?
>

It was tested on an Azure VM with SSD-backed storage. Total IOPS was
capped at 4K by the VM host, so the maximum throughput of the global
shrinker was around 16 MB/s.
Proactive shrinking cannot prevent pool_limit_hit, since memory
allocation can be on the order of GB/s. (The benchmark script allocates
2 GB sequentially, which was compressed to 1.3 GB, while the zswap pool
was limited to 200 MB.)

> > >
> > > Have you experimented with synchronous reclaim in the case the pool is
> > > full? All the way to the acceptance threshold is too aggressive of
> > > course - you might need to find something in between :)
> > >
> >
> > I don't get what the expected situation is.
> > The benchmark of patch 6 is performing synchronous reclaim in the case
> > the pool is full, since bulk memory allocation (writes to mmapped
> > space) is much faster than the writeback throughput. The zswap pool is
> > filled instantly at the beginning of the benchmark runs. The
> > accept_thr_percent is not significant for the benchmark, I think.
>
> No. I meant synchronous reclaim as in triggering zswap writeback
> within the zswap store path, to make space for the incoming new zswap
> pages. But you already addressed it above :)
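
For concreteness, the budgeted writeback-on-store you describe could
look roughly like the sketch below. This is only an illustration, not
code from this series: zswap_pool_over_limit() and zswap_evict_oldest()
are hypothetical placeholders for the existing pool-limit check and for
writing back the LRU-oldest entry.

/*
 * Untested sketch of "budgeted writeback on store".
 * zswap_pool_over_limit() and zswap_evict_oldest() are hypothetical
 * placeholders (not in the current tree) standing in for the pool-limit
 * check and for synchronously writing back the LRU-oldest entry.
 */
static bool zswap_make_room(void)
{
	int budget = 16;	/* arbitrary per-store writeback budget */

	while (zswap_pool_over_limit()) {
		if (budget-- <= 0)
			break;		/* budget exhausted */
		if (!zswap_evict_oldest())
			break;		/* nothing left to evict */
	}

	/* Store the incoming page only if eviction freed enough space. */
	return !zswap_pool_over_limit();
}

Even with a budget, a well-compressed or fragmented pool may need many
evictions per incoming page, so the store can still end up rejecting -
which is exactly the latency trade-off discussed above.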
>
> >
> > >
> > >
> > > I wonder if this contention would show up in PSI metrics
> > > (/proc/pressure/io, or the cgroup variants if you use them). Maybe
> > > correlate reclaim counters (pgscan, zswpout, pswpout, zswpwb etc.)
> > > with IO pressure to show the pattern, i.e. the contention problem was
> > > there before, and is now resolved? :)
> >
> > Unfortunately, I could not find a reliable metric other than elapsed
> > time. It seems PSI does not distinguish stalls for rejected pageout
> > from stalls for shrinker writeback.
> > For counters, this issue affects latency but does not increase the
> > number of pageins/pageouts. Is there any better way to observe the
> > origin of the contention?
> >
> > Thanks.
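
For the record, one crude way to collect the correlation suggested
above would be to sample /proc/pressure/io next to the relevant
/proc/vmstat counters during the benchmark. A rough, illustrative
userspace sketch (not part of the series; the counter names are the
ones mentioned above, and the "pgscan" prefix also matches the
pgscan_* variants):

/*
 * Print /proc/pressure/io next to the swap/zswap counters from
 * /proc/vmstat once per second, so IO stall time can be lined up
 * with pgscan/zswpout/pswpout/zswpwb activity over a benchmark run.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void dump_matching(const char *path, const char *prefix)
{
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		if (!prefix || !strncmp(line, prefix, strlen(prefix)))
			fputs(line, stdout);
	fclose(f);
}

int main(void)
{
	static const char *keys[] = { "pgscan", "pswpout", "zswpout", "zswpwb" };
	unsigned int i;

	for (;;) {
		dump_matching("/proc/pressure/io", NULL);
		for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
			dump_matching("/proc/vmstat", keys[i]);
		puts("---");
		fflush(stdout);
		sleep(1);
	}
	return 0;
}

As noted above, though, PSI lumps stalls from rejected pageout and
shrinker writeback together, so this only shows the aggregate pattern,
not which path caused it.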