Re: [PATCH v2 0/6] mm: zswap: global shrinker fix and proactive shrink

Yosry Ahmed <yosryahmed@xxxxxxxxxx> · Tue, 16 Jul 2024 19:53:09 -0700

[..]
>
> > My concern is that we are knowingly (and perhaps unnecessarily)
> > creating an LRU inversion here - preferring swapping out the rejected
> > pages over the colder pages in the zswap pool. Shouldn't it be the
> > other way around? For instance, can we spiral into the following
> > scenario:
> >
> > 1. zswap pool becomes full.
> > 2. Memory is still tight, so anonymous memory will be reclaimed. zswap
> > keeps rejecting incoming pages, and putting a hold on the global
> > shrinker.
> > 3. The pages that are swapped out are warmer than the ones stored in
> > the zswap pool, so they will be more likely to be swapped in (which,
> > IIUC, will also further delay the global shrinker).
> >
> > and the cycle keeps going on and on?
>
> I agree this does not follow LRU, but I think the LRU priority
> inversion is unavoidable once the pool limit is hit.
> The accept_thr_percent should be lowered to reduce the probability of
> LRU inversion if it matters. (it is why I implemented proactive
> shrinker.)

Why?

Let's take a step back. You are suggesting that we throttle zswap
writeback to allow reclaim to swapout warmer pages to swap device. As
Nhat said, we are proliferating LRU inversion instead of fixing it.

I think I had a similar discussion with Johannes about this before,
and we discussed that if zswap becomes full, we should instead
throttle reclaim and allow zswap writeback to proceed (i.e. the
opposite of what this series is doing). This would be similar to how
we throttle reclaim today to wait for dirty pages to be written back.

This should reduce/fix the LRU inversion instead of proliferating it,
and it should reduce the total amout of IO as colder pages should go
to disk while warmer pages go to zswap. I am wondering if we can reuse
the reclaim_throttle() mechanism here.

One concern I have is that we will also throttle file pages if we use
reclaim_throttle(), since I don't see per-type throttling there. This
could be fine, since we similarly throttle zswap reclaim if there are
too many dirty file pages. I am not super familiar with reclaim
throttling, so maybe I missed something obvious or there is a better
way, but I believe that from a high level this should be the right way
to go.

I actually think if we do this properly, and throttle reclaim when
zswap becomes full, we may be able to drop the acceptance hysteresis
and rely on the throttling mechanism to make sure we stop reclaim
until we free up enough space in zswap to avoid consistently hitting
the limit, but this could be a future extension.

Johannes, any thoughts here?

Anyway, since patches 1-2 are independent of the rest of the series,
feel free to send them separately, and we can continue the discussion
on the best way forward for the rest of the series.