[..]
> > >  static void shrink_worker(struct work_struct *w)
> > >  {
> > >         struct zswap_pool *pool = container_of(w, typeof(*pool),
> > >                                                 shrink_work);
> > > +       struct mem_cgroup *memcg;
> > >         int ret, failures = 0;
> > >
> > > +       /* global reclaim will select cgroup in a round-robin fashion. */
> > >         do {
> > > -               ret = zswap_reclaim_entry(pool);
> > > -               if (ret) {
> > > -                       zswap_reject_reclaim_fail++;
> > > -                       if (ret != -EAGAIN)
> > > +               spin_lock(&zswap_pools_lock);
> > > +               pool->next_shrink = mem_cgroup_iter(NULL, pool->next_shrink, NULL);
> > > +               memcg = pool->next_shrink;
> > > +
> > > +               /*
> > > +                * We need to retry if we have gone through a full round trip, or if we
> > > +                * got an offline memcg (or else we risk undoing the effect of the
> > > +                * zswap memcg offlining cleanup callback). This is not catastrophic
> > > +                * per se, but it will keep the now offlined memcg hostage for a while.
> > > +                *
> > > +                * Note that if we got an online memcg, we will keep the extra
> > > +                * reference in case the original reference obtained by mem_cgroup_iter
> > > +                * is dropped by the zswap memcg offlining callback, ensuring that the
> > > +                * memcg is not killed when we are reclaiming.
> > > +                */
> > > +               if (!memcg) {
> > > +                       spin_unlock(&zswap_pools_lock);
> > > +                       if (++failures == MAX_RECLAIM_RETRIES)
> > >                                 break;
> > > +
> > > +                       goto resched;
> > > +               }
> > > +
> > > +               if (!mem_cgroup_online(memcg)) {
> > > +                       /* drop the reference from mem_cgroup_iter() */
> > > +                       mem_cgroup_put(memcg);
> >
> > Probably better to use mem_cgroup_iter_break() here?
>
> mem_cgroup_iter_break(NULL, memcg) seems to perform the same thing, right?

Yes, but it's better to break the iteration with the documented API
(e.g. in case mem_cgroup_iter_break() changes to do extra work later).

> >
> > Also, I don't see mem_cgroup_tryget_online() being used here (where I
> > expected it to be used), did I miss it?
>
> Oh shoot, yeah, that was a typo - it should be
> mem_cgroup_tryget_online(). Let me send a fix for that.
> >
> > > +                       pool->next_shrink = NULL;
> > > +                       spin_unlock(&zswap_pools_lock);
> > > +
> > >                         if (++failures == MAX_RECLAIM_RETRIES)
> > >                                 break;
> > > +
> > > +                       goto resched;
> > >                 }
> > > +               spin_unlock(&zswap_pools_lock);
> > > +
> > > +               ret = shrink_memcg(memcg);
> > We just checked for online-ness above, and then shrink_memcg() checks
> > it again. Is this intentional?
>
> Hmm, these two checks serve two different purposes. The check above
> is mainly to prevent accidentally undoing the offline cleanup callback
> during the memcg selection step. Inside shrink_memcg(), we check
> online-ness again to prevent reclaiming from offlined memcgs - which
> would in effect trigger reclaim of the parent memcg.

Right, but two checks in such close proximity don't buy us much,
especially since the memcg's online-ness can change right after the
check inside shrink_memcg() anyway - it's best-effort either way.
Anyway, it shouldn't matter much. We can leave it.

> >
> > > +               /* drop the extra reference */
> >
> > Where does the extra reference come from?
>
> The extra reference is from mem_cgroup_tryget_online(). We get two
> references in the dance above - one from mem_cgroup_iter() (which can
> be dropped) and one extra from mem_cgroup_tryget_online(). I kept the
> second one in case the first one was dropped by the zswap memcg
> offlining callback, but after reclaiming it is safe to just drop it.

Right. I was confused by the missing mem_cgroup_tryget_online().
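To put the two points above together (the mem_cgroup_iter_break()
suggestion and the mem_cgroup_tryget_online() typo fix), the selection
step could end up looking roughly like this - just a sketch of the
discussion, not the actual follow-up patch:

		spin_lock(&zswap_pools_lock);
		pool->next_shrink = mem_cgroup_iter(NULL, pool->next_shrink, NULL);
		memcg = pool->next_shrink;

		/* full round trip: count a failure and retry */
		if (!memcg) {
			spin_unlock(&zswap_pools_lock);
			if (++failures == MAX_RECLAIM_RETRIES)
				break;
			goto resched;
		}

		/*
		 * Take the extra reference only if the memcg is still
		 * online; otherwise end the walk with the documented
		 * API instead of a bare mem_cgroup_put().
		 */
		if (!mem_cgroup_tryget_online(memcg)) {
			/* drops the reference from mem_cgroup_iter() */
			mem_cgroup_iter_break(NULL, memcg);
			pool->next_shrink = NULL;
			spin_unlock(&zswap_pools_lock);
			if (++failures == MAX_RECLAIM_RETRIES)
				break;
			goto resched;
		}
		spin_unlock(&zswap_pools_lock);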
>
> >
> > > +               mem_cgroup_put(memcg);
> > > +
> > > +               if (ret == -EINVAL)
> > > +                       break;
> > > +               if (ret && ++failures == MAX_RECLAIM_RETRIES)
> > > +                       break;
> > > +
> > > +resched:
> > >                 cond_resched();
> > >         } while (!zswap_can_accept());
> > > -       zswap_pool_put(pool);
> > >  }
> > >
> > >  static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
[..]
> > > @@ -1240,15 +1395,15 @@ bool zswap_store(struct folio *folio)
> > >                         zswap_invalidate_entry(tree, dupentry);
> > >         }
> > >         spin_unlock(&tree->lock);
> > > -
> > > -       /*
> > > -        * XXX: zswap reclaim does not work with cgroups yet. Without a
> > > -        * cgroup-aware entry LRU, we will push out entries system-wide based on
> > > -        * local cgroup limits.
> > > -        */
> > >         objcg = get_obj_cgroup_from_folio(folio);
> > > -       if (objcg && !obj_cgroup_may_zswap(objcg))
> > > -               goto reject;
> > > +       if (objcg && !obj_cgroup_may_zswap(objcg)) {
> > > +               memcg = get_mem_cgroup_from_objcg(objcg);
> > Do we need a reference here? IIUC, this is folio_memcg() and the folio
> > is locked, so folio_memcg() should remain stable, no?
>
> Hmmm, obj_cgroup_may_zswap() also holds a reference to the objcg's
> memcg, so I just followed the pattern to be safe.

Perhaps it's less clear inside obj_cgroup_may_zswap(). We can actually
pass the folio to obj_cgroup_may_zswap(), add a debug check that the
folio is locked, and avoid getting the ref there as well. That can be
done separately.

Perhaps Johannes can shed some light on this, if there's a different
reason why getting a ref there is needed. For this change, I think the
refcount manipulation is unnecessary.

> >
> > Same for the call below.
> >
> > > +               if (shrink_memcg(memcg)) {
> > > +                       mem_cgroup_put(memcg);
> > > +                       goto reject;
> > > +               }
> > > +               mem_cgroup_put(memcg);
> > > +       }
> > >
> > >         /* reclaim space if needed */
> > >         if (zswap_is_full()) {
[..]
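For the last point, a minimal sketch of what the store path could look
like without the get/put pair, assuming the reasoning above holds,
i.e. the folio lock really does keep folio_memcg() stable (and alive)
for the duration of the call - illustrative only:

	objcg = get_obj_cgroup_from_folio(folio);
	if (objcg && !obj_cgroup_may_zswap(objcg)) {
		/*
		 * The folio is locked, so folio_memcg() cannot change
		 * under us and keeps the memcg alive; no extra
		 * reference should be needed.
		 */
		memcg = folio_memcg(folio);
		if (shrink_memcg(memcg))
			goto reject;
	}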