On Tue, Jul 23, 2024 at 6:51 Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> On Fri, Jul 19, 2024 at 9:41 PM Takero Funaki <flintglass@xxxxxxxxx> wrote:
> >
> > This patch fixes zswap global shrinker that did not shrink zpool as
> > expected.
> >
> > The issue it addresses is that `shrink_worker()` did not distinguish
> > between unexpected errors and expected error codes that should be
> > skipped, such as when there is no stored page in a memcg. This led to
> > the shrinking process being aborted on the expected error codes.
>
> The code itself seems reasonable to me, but may I ask you to document
> (as a comment) all the expected v.s unexpected cases? i.e when do we
> increment (or not increment) the failure counter?
>

In addition to the commit log changes suggested by Yosry, I am adding
comments that specify which memcgs are (not) candidates for writeback
and what counts as a failure.

-	/* global reclaim will select cgroup in a round-robin fashion.
+	/*
+	 * Global reclaim will select cgroup in a round-robin fashion from all
+	 * online memcgs, but memcgs that have no pages in zswap and
+	 * writeback-disabled memcgs (memory.zswap.writeback=0) are not
+	 * candidates for shrinking.
+	 *
+	 * Shrinking will be aborted if we encounter the following
+	 * MAX_RECLAIM_RETRIES times:
+	 * - No writeback-candidate memcgs found in a memcg tree walk.
+	 * - Shrinking a writeback-candidate memcg failed.
 	 *
 	 * We save iteration cursor memcg into zswap_next_shrink,
 	 * which can be modified by the offline memcg cleaner

and the reasons to (not) increment progress:

@@ -1387,10 +1407,20 @@ static void shrink_worker(struct work_struct *w)
 		/* drop the extra reference */
 		mem_cgroup_put(memcg);

-		if (ret == -EINVAL)
-			break;
+		/*
+		 * There are no writeback-candidate pages in the memcg.
+		 * This is not an issue as long as we can find another memcg
+		 * with pages in zswap. Skip this without incrementing progress
+		 * and failures.
+		 */
+		if (ret == -ENOENT)
+			continue;
+
 		if (ret && ++failures == MAX_RECLAIM_RETRIES)
 			break;
+
+		/* completed writeback or incremented failures */
+		++progress;
 resched:

> My understanding is, we only increment the failure counter if we fail
> to reclaim from a selected memcg that is non-empty and
> writeback-enabled, or if we go a full tree walk without making any
> progress. Is this correct?
>

Yes, that's the expected behavior.
Please let me know if there is more appropriate wording.

Thanks.
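
To make the counting rules above easier to check, here is a rough
userspace model of them. It is only a sketch, not the kernel code:
struct fake_memcg, simulate_shrink() and the array-based tree walk are
made up for illustration, the value 16 for MAX_RECLAIM_RETRIES is an
assumption of the sketch, and locking, reference counting and the real
zswap size threshold are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_RECLAIM_RETRIES 16	/* assumed value for this sketch */

/* Hypothetical stand-in for a memcg: the code that shrinking it returns. */
struct fake_memcg {
	int shrink_ret;	/* 0: wrote back, -ENOENT: not a candidate, else: error */
};

/*
 * Walk the memcgs round-robin until `target` writebacks have completed or
 * the failure limit is reached. Returns the number of failures counted and
 * stores the number of completed writebacks in *written.
 */
static int simulate_shrink(const struct fake_memcg *memcgs, size_t n,
			   int target, int *written)
{
	int failures = 0, progress = 0;
	size_t cursor = 0;

	assert(written != NULL);
	*written = 0;

	while (*written < target) {
		int ret;

		if (cursor == n) {
			/* End of a tree walk: a walk without progress is a failure. */
			cursor = 0;
			if (!progress && ++failures == MAX_RECLAIM_RETRIES)
				break;
			progress = 0;
			continue;
		}

		ret = memcgs[cursor++].shrink_ret;

		/* Not a writeback candidate: skip, no progress, no failure. */
		if (ret == -ENOENT)
			continue;

		/* Shrinking a candidate memcg failed. */
		if (ret && ++failures == MAX_RECLAIM_RETRIES)
			break;

		if (!ret)
			++*written;

		/* completed writeback or incremented failures */
		++progress;
	}

	return failures;
}
```

In this model, a set of memcgs that all return -ENOENT is given up on
after MAX_RECLAIM_RETRIES empty tree walks, while a single shrinkable
memcg among empty ones keeps the loop going with no failures counted.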