[..]

> > > @@ -1167,25 +1189,6 @@ static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
> > >  		return SHRINK_STOP;
> > >  	}
> > >
> > > -	nr_protected =
> > > -		atomic_long_read(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> > > -	lru_size = list_lru_shrink_count(&zswap_list_lru, sc);
> > > -
> > > -	/*
> > > -	 * Abort if we are shrinking into the protected region.
> > > -	 *
> > > -	 * This short-circuiting is necessary because if we have too many multiple
> > > -	 * concurrent reclaimers getting the freeable zswap object counts at the
> > > -	 * same time (before any of them made reasonable progress), the total
> > > -	 * number of reclaimed objects might be more than the number of unprotected
> > > -	 * objects (i.e the reclaimers will reclaim into the protected area of the
> > > -	 * zswap LRU).
> > > -	 */
> > > -	if (nr_protected >= lru_size - sc->nr_to_scan) {
> > > -		sc->nr_scanned = 0;
> > > -		return SHRINK_STOP;
> > > -	}
> > > -
> >
> > Do we need a similar mechanism to protect against concurrent shrinkers
> > quickly consuming nr_swapins?
>
> Not for nr_swapins consumption per se. The original reason I included
> this (racy) check was so that concurrent reclaimers do not disrespect
> the protection scheme - we had no guarantee that we wouldn't just
> reclaim into the protected region (and technically, even with this
> racy check, we still didn't). With the second chance scheme, a
> "protected" page (i.e. one with its referenced bit set) would not be
> reclaimed right away - a shrinker encountering it would have to "age"
> it first (by unsetting the referenced bit), so the intended protection
> is enforced.
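>
> To make the aging step concrete, here is a simplified sketch of the
> LRU walk callback (eliding the locking and the actual writeback path,
> so treat the reclaim return code as illustrative):
>
>     static enum lru_status shrink_memcg_cb(struct list_head *item,
>                     struct list_lru_one *l, spinlock_t *lock, void *arg)
>     {
>             struct zswap_entry *entry =
>                     container_of(item, struct zswap_entry, lru);
>
>             /*
>              * Second chance: a recently loaded ("protected") entry still
>              * has its referenced bit set. Do not reclaim it right away -
>              * clear the bit and rotate the entry on the LRU. It only
>              * becomes reclaimable once a shrinker sees it again with the
>              * bit cleared.
>              */
>             if (entry->referenced) {
>                     entry->referenced = false;
>                     return LRU_ROTATE;
>             }
>
>             /* ... write the entry back and reclaim it as before ... */
>             return LRU_REMOVED;
>     }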
>
> That said, I do believe we need a mechanism to limit the concurrency
> here. The number of pages aged/reclaimed should scale (linearly?
> proportionally?) with the reclaim pressure, i.e. more reclaimers ==
> more pages reclaimed/aged, so the current behavior is desired.
> However, at some point, if we have more shrinkers than there is work
> to assign to them, we might be unnecessarily wasting resources (and
> potentially building up the nr_deferred counter that we discussed in
> v1 of the patch series). Additionally, we might be over-shrinking in
> a very short amount of time, without giving the system a chance to
> react and provide feedback (through swapins/refaults) to the memory
> reclaimers.
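>
> One hypothetical shape this could take (just a sketch to make the idea
> concrete - zswap_shrinker_runners and ZSWAP_MAX_SHRINKER_CONCURRENCY
> are made-up names here, and picking the actual bound is the hard
> part):
>
>     static atomic_t zswap_shrinker_runners = ATOMIC_INIT(0);
>
>     static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
>                     struct shrink_control *sc)
>     {
>             unsigned long ret = 0;
>
>             /* Bail out early if enough shrinkers are already running. */
>             if (atomic_inc_return(&zswap_shrinker_runners) >
>                             ZSWAP_MAX_SHRINKER_CONCURRENCY) {
>                     atomic_dec(&zswap_shrinker_runners);
>                     sc->nr_scanned = 0;
>                     return SHRINK_STOP;
>             }
>
>             /* ... existing scan logic, setting ret ... */
>
>             atomic_dec(&zswap_shrinker_runners);
>             return ret;
>     }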
>
> But let's do this as follow-up work :) It seems orthogonal to what we
> have here.

Agreed, as long as the data shows we don't regress by removing this
part, I am fine with doing this as follow-up work.

> > > -	 * Subtract the lru size by an estimate of the number of pages
> > > -	 * that should be protected.
> > > +	 * Subtract the lru size by the number of pages that are recently swapped
> >
> > nit: I don't think "subtract by" is correct, it's usually "subtract
> > from". So maybe "Subtract the number of pages that are recently
> > swapped in from the lru size"? Also, should we remain consistent
> > about mentioning that these are disk swapins throughout all the
> > comments to keep things clear?
>
> Yeah, I should be clearer here - it should be swapped in from disk,
> or more generally (accurately?) swapped in from the backing swap
> device (but the latter can change once we decouple swap from zswap).
> Or maybe swapped in from the secondary tier?
>
> Let's just not overthink it and go with swapped in from disk for now :)
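>
> With that wording, the counting side (in zswap_shrinker_count()) would
> read something like this (a sketch - assuming the counter ends up
> named nr_disk_swapins, and ignoring how it is decayed/consumed):
>
>             lru_size = list_lru_shrink_count(&zswap_list_lru, sc);
>             nr_swapins = atomic_long_read(
>                     &lruvec->zswap_lruvec_state.nr_disk_swapins);
>
>             /*
>              * Subtract the number of pages recently swapped in from disk
>              * from the lru size: had we protected that many pages at the
>              * tail of zswap's LRU, those loads would have been served
>              * from zswap instead of incurring disk IO.
>              */
>             nr_freeable = lru_size > nr_swapins ?
>                             lru_size - nr_swapins : 0;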

Agreed :) I will take a look at the new version soon, thanks for
working on this.