[..]

> > > @@ -1167,25 +1189,6 @@ static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
> > >  		return SHRINK_STOP;
> > >  	}
> > >
> > > -	nr_protected =
> > > -		atomic_long_read(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> > > -	lru_size = list_lru_shrink_count(&zswap_list_lru, sc);
> > > -
> > > -	/*
> > > -	 * Abort if we are shrinking into the protected region.
> > > -	 *
> > > -	 * This short-circuiting is necessary because if we have too many multiple
> > > -	 * concurrent reclaimers getting the freeable zswap object counts at the
> > > -	 * same time (before any of them made reasonable progress), the total
> > > -	 * number of reclaimed objects might be more than the number of unprotected
> > > -	 * objects (i.e the reclaimers will reclaim into the protected area of the
> > > -	 * zswap LRU).
> > > -	 */
> > > -	if (nr_protected >= lru_size - sc->nr_to_scan) {
> > > -		sc->nr_scanned = 0;
> > > -		return SHRINK_STOP;
> > > -	}
> > > -
> >
> > Do we need a similar mechanism to protect against concurrent shrinkers
> > quickly consuming nr_swapins?
>
> Not for nr_swapins consumption per se. The original reason I included
> this (racy) check was so that concurrent reclaimers do not disrespect
> the protection scheme - we had no guarantee that we wouldn't just
> reclaim into the protected region (and technically, even with this
> racy check, we still didn't). With the second chance scheme, a
> "protected" page (i.e. one with its referenced bit set) would not be
> reclaimed right away - a shrinker encountering it would have to "age"
> it first (by unsetting the referenced bit), so the intended protection
> is enforced.
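>
> To make the aging step concrete, here is a simplified sketch of the
> LRU walk callback (eliding the locking and the actual writeback path,
> so treat the reclaim return code as illustrative):
>
>     static enum lru_status shrink_memcg_cb(struct list_head *item,
>                     struct list_lru_one *l, spinlock_t *lock, void *arg)
>     {
>             struct zswap_entry *entry =
>                     container_of(item, struct zswap_entry, lru);
>
>             /*
>              * Second chance: a recently loaded ("protected") entry still
>              * has its referenced bit set. Do not reclaim it right away -
>              * clear the bit and rotate the entry on the LRU. It only
>              * becomes reclaimable once a shrinker sees it again with the
>              * bit cleared.
>              */
>             if (entry->referenced) {
>                     entry->referenced = false;
>                     return LRU_ROTATE;
>             }
>
>             /* ... write the entry back and reclaim it as before ... */
>             return LRU_REMOVED;
>     }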
>
> That said, I do believe we need a mechanism to limit the concurrency
> here. The number of pages aged/reclaimed should scale (linearly?
> proportionally?) with the reclaim pressure, i.e. more reclaimers ==
> more pages reclaimed/aged, so the current behavior is desired.
> However, at some point, if we have more shrinkers than there is work
> to assign to them, we might be unnecessarily wasting resources (and
> potentially building up the nr_deferred counter that we discussed in
> v1 of the patch series). Additionally, we might be over-shrinking in
> a very short amount of time, without giving the system a chance to
> react and provide feedback (through swapins/refaults) to the memory
> reclaimers.
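>
> One hypothetical shape this could take (just a sketch to make the idea
> concrete - zswap_shrinker_runners and ZSWAP_MAX_SHRINKER_CONCURRENCY
> are made-up names here, and picking the actual bound is the hard
> part):
>
>     static atomic_t zswap_shrinker_runners = ATOMIC_INIT(0);
>
>     static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
>                     struct shrink_control *sc)
>     {
>             unsigned long ret = 0;
>
>             /* Bail out early if enough shrinkers are already running. */
>             if (atomic_inc_return(&zswap_shrinker_runners) >
>                             ZSWAP_MAX_SHRINKER_CONCURRENCY) {
>                     atomic_dec(&zswap_shrinker_runners);
>                     sc->nr_scanned = 0;
>                     return SHRINK_STOP;
>             }
>
>             /* ... existing scan logic, setting ret ... */
>
>             atomic_dec(&zswap_shrinker_runners);
>             return ret;
>     }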
>
> But let's do this as follow-up work :) It seems orthogonal to what we
> have here.

Agreed, as long as the data shows we don't regress by removing this
part, I am fine with doing this as follow-up work.

> > > -	 * Subtract the lru size by an estimate of the number of pages
> > > -	 * that should be protected.
> > > +	 * Subtract the lru size by the number of pages that are recently swapped
> >
> > nit: I don't think "subtract by" is correct, it's usually "subtract
> > from". So maybe "Subtract the number of pages that are recently
> > swapped in from the lru size"? Also, should we remain consistent
> > about mentioning that these are disk swapins throughout all the
> > comments to keep things clear?
>
> Yeah, I should be clearer here - it should be swapped in from disk,
> or more generally (accurately?) swapped in from the backing swap
> device (but the latter can change once we decouple swap from zswap).
> Or maybe swapped in from the secondary tier?
>
> Let's just not overthink it and go with swapped in from disk for now :)
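>
> With that wording, the counting side (in zswap_shrinker_count()) would
> read something like this (a sketch - assuming the counter ends up
> named nr_disk_swapins, and ignoring how it is decayed/consumed):
>
>             lru_size = list_lru_shrink_count(&zswap_list_lru, sc);
>             nr_swapins = atomic_long_read(
>                     &lruvec->zswap_lruvec_state.nr_disk_swapins);
>
>             /*
>              * Subtract the number of pages recently swapped in from disk
>              * from the lru size: had we protected that many pages at the
>              * tail of zswap's LRU, those loads would have been served
>              * from zswap instead of incurring disk IO.
>              */
>             nr_freeable = lru_size > nr_swapins ?
>                             lru_size - nr_swapins : 0;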

Agreed :) I will take a look at the new version soon, thanks for
working on this.