On Tue, Oct 31, 2023 at 6:33 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote: > > On Tue, Oct 31, 2023 at 6:26 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote: > > > > cc-ing Johannes, Roman, Shakeel, Muchun since you all know much more > > about memory controller + list_lru reparenting logic than me. > > > > There seems to be a race between memcg offlining and zswap’s > > cgroup-aware LRU implementation: > > > > CPU0 CPU1 > > zswap_lru_add() mem_cgroup_css_offline() > > get_mem_cgroup_from_objcg() > > memcg_offline_kmem() > > memcg_reparent_objcgs() > > memcg_reparent_list_lrus() > > memcg_reparent_list_lru() > > memcg_reparent_list_lru_node() > > list_lru_add() > > memcg_list_lru_free() > > > > > > Essentially: on CPU0, zswap gets the memcg from the entry's objcg > > (before the objcgs are reparented). Then it performs list_lru_add() > > after the list_lru entries reparenting (memcg_reparent_list_lru_node()) > > step. If the list_lru of the memcg being offlined has not been freed > > (i.e before the memcg_list_lru_free() call), then the list_lru_add() > > call would succeed - but the list will be freed soon after. The new > > zswap entry as a result will not be subjected to future reclaim > > attempt. IOW, this list_lru_add() call is effectively swallowed. And > > worse, there might be a crash when we invalidate the zswap_entry in the > > future (which will perform a list_lru removal). > > > > Within get_mem_cgroup_from_objcg(), none of the following seem > > sufficient to prevent this race: > > > > 1. Perform the objcg-to-memcg lookup inside a rcu_read_lock() > > section. > > 2. Checking if the memcg is freed yet (with css_tryget()) (what > > we're currently doing in this patch series). > > 3. Checking if the memcg is still online (with css_tryget_online()) > > The memcg can still be offlined down the line. > > > > > > I've discussed this privately with Johannes, and it seems like the > > cleanest solution here is to move the reparenting logic down to release > > stage. That way, when get_mem_cgroup_from_objcg() returns, > > zswap_lru_add() is given an memcg that is reparenting-safe (until we > > drop the obtained reference). > > The objcgs hold refs to the memcg, which are dropped during > reparenting. How can we do reparenting in the release stage, which > IIUC happens after all refs are dropped? Oh I meant just the list_lru reparenting. The list_lru themselves don't hold any ref I believe, right? Then it's safe to perform this at the release stage. (also, I think I might have messed up the encoding for the email above. Let me know if people cannot view it, and I'll resend it :( )