On Tue, Oct 31, 2023 at 8:07 PM Muchun Song <muchun.song@xxxxxxxxx> wrote: > > > > > On Nov 1, 2023, at 09:26, Nhat Pham <nphamcs@xxxxxxxxx> wrote: > > > > cc-ing Johannes, Roman, Shakeel, Muchun since you all know much more > > about memory controller + list_lru reparenting logic than me. > > > > There seems to be a race between memcg offlining and zswap’s > > cgroup-aware LRU implementation: > > > > CPU0 CPU1 > > zswap_lru_add() mem_cgroup_css_offline() > > get_mem_cgroup_from_objcg() > > memcg_offline_kmem() > > memcg_reparent_objcgs() > > memcg_reparent_list_lrus() > > memcg_reparent_list_lru() > > memcg_reparent_list_lru_node() > > list_lru_add() > > memcg_list_lru_free() > > > > > > Essentially: on CPU0, zswap gets the memcg from the entry's objcg > > (before the objcgs are reparented). Then it performs list_lru_add() > > after the list_lru entries reparenting (memcg_reparent_list_lru_node()) > > step. If the list_lru of the memcg being offlined has not been freed > > (i.e before the memcg_list_lru_free() call), then the list_lru_add() > > call would succeed - but the list will be freed soon after. The new > > No worries. list_lru_add() will add the object to the lru list of > the parent of the memcg being offlined, because the ->kmemcg_id of the > memcg being offlined will be changed to its parent's ->kmemcg_id before memcg_reparent_list_lru(). > Ohhh that is subtle. Thanks for pointing this out, Muchun! In that case, I think Yosry is right after all! We don't even need to get a reference to the memcg: rcu_read_lock(); memcg = obj_cgroup_memcg(objcg); list_lru_add(); rcu_read_unlock(); As long as we're inside this rcu section, we're guaranteed to get an un-freed memcg. Now it could be offlined etc., but as Muchun has pointed out, the list_lru_add() call will still does the right thing - it will either add the new entry to the parent list if this happens after the kmemcg_id update, or the child list before the list_lru reparenting action. Both of these scenarios are fine. > Muchun, > Thanks > > > zswap entry as a result will not be subjected to future reclaim > > attempt. IOW, this list_lru_add() call is effectively swallowed. And > > worse, there might be a crash when we invalidate the zswap_entry in the > > future (which will perform a list_lru removal). > > > > Within get_mem_cgroup_from_objcg(), none of the following seem > > sufficient to prevent this race: > > > > 1. Perform the objcg-to-memcg lookup inside a rcu_read_lock() > > section. > > 2. Checking if the memcg is freed yet (with css_tryget()) (what > > we're currently doing in this patch series). > > 3. Checking if the memcg is still online (with css_tryget_online()) > > The memcg can still be offlined down the line. > > > > > > I've discussed this privately with Johannes, and it seems like the > > cleanest solution here is to move the reparenting logic down to release > > stage. That way, when get_mem_cgroup_from_objcg() returns, > > zswap_lru_add() is given an memcg that is reparenting-safe (until we > > drop the obtained reference). >