Re: [PATCH v1 0/3] mm: zswap: global shrinker fix and proactive shrink

Nhat Pham <nphamcs@xxxxxxxxx> · Thu, 13 Jun 2024 08:22:29 -0700

On Sat, Jun 8, 2024 at 8:53 AM Takero Funaki <flintglass@xxxxxxxxx> wrote:
>
> This series addresses two issues and introduces a minor improvement in
> zswap global shrinker:
>
> 1. Fix the memcg iteration logic that breaks iteration on offline memcgs.
> 2. Fix the error path that aborts on expected error codes.
> 3. Add proactive shrinking at 91% full, for 90% accept threshold.
>

Taking a step back from the correctness conversation, could you
include in the changelog of the patches and cover letter a realistic
scenario, along with user space-visible metrics that show (ideally all
4, but at least some of the following):

1. A user problem (that affects performance, or usability, etc.) is happening.

2. The root cause is what we are trying to fix (for e.g in patch 1, we
are skipping over memcgs unnecessarily in the global shrinker loop).

3. The fix alleviates the root cause in b)

4. The userspace-visible problem goes away or is less serious.

I have already hinted in a previous response, but global shrinker is
rarely triggered in production. There are lots of factors that would
prevent this from triggering:

1. Max zswap pool size 20% of memory by default, which is a lot.

2. Swapfile size also limits the size of the amount of data storable
in the zswap pool.

3. Other cgroup constraints (memory.max, memory.zswap.max,
memory.swap.max) also limit a cgroup's zswap usage.

I do agree that patch 1 at least is fixing a problem, and probably
patch 2 too but please justify why we are investing in the extra
complexity to fix this problem in the first place.