Re: [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration

Nhat Pham <nphamcs@xxxxxxxxx> · Thu, 13 Jun 2024 09:08:05 -0700

On Sat, Jun 8, 2024 at 8:53 AM Takero Funaki <flintglass@xxxxxxxxx> wrote:
>
> This patch fixes an issue where the zswap global shrinker stopped
> iterating through the memcg tree.
>
> The problem was that `shrink_worker()` would stop iterating when a memcg
> was being offlined and restart from the tree root.  Now, it properly
> handles the offlining memcg and continues shrinking with the next memcg.
>
> This patch also modified handing of the lock for offlined memcg cleaner
> to adapt the change in the iteration, and avoid negligibly rare skipping
> of a memcg from shrink iteration.
>

Honestly, I think this version is even more complicated than v0 :)

Hmmm how about this:

do {
    spin_lock(&zswap_pools_lock);
    do {
        /* no offline caller has been called to memcg yet */
        if (memcg == zswap_next_shrink) {
            zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);

        memcg = zswap_next_shrink;
        if (!memcg && ++failure > MAX_FAILURE) {
            spin_unlock(&zswap_pools_lock);
            return;
        }

    } while (!zswap_next_shrink || !mem_cgroup_tryget_online(zswap_next_shrink))
    spin_unlock(&zswap_pools_lock);

    /* proceed with reclaim */
} while (...)

This should achieve all the goals right?

1. No restarting from the top, unless we have traversed the entire hierarchy.

2. No skipping over zswap_next_shrink updated by the memcg offline cleaner.

3. No leaving offlined zswap_next_shrink hanging (and blocking memcg
killing indefinitely).

4. No long running loop unnecessarily. If you want to be extra safe,
we can put a max number of retries on the offline memcg case too (and
restart from the top).

5. At the end of the inner loop, you are guaranteed to hold at least
one reference to memcg (and perhaps 2, if zswap_next_shrink is not
updated by the cleaner).

5. At the end of the inner loop, the selected memcg is known to not
have its cleaner started yet (since it is online with zswap_pools_lock
held). Our selection will undo the cleaner and hold the memcg hostage
forever.

Is there anything that I'm missing? We are not dropping the lock for
the entirety of the inner loop, but honestly I'm fine with this trade
off, for the sake of simplicity :)

If you want it to be even more readable, feel free to refactor the
inner loop into a helper function (but it's probably redundant since
it's only used once).