> On Nov 1, 2023, at 09:26, Nhat Pham <nphamcs@xxxxxxxxx> wrote: > > cc-ing Johannes, Roman, Shakeel, Muchun since you all know much more > about memory controller + list_lru reparenting logic than me. > > There seems to be a race between memcg offlining and zswap’s > cgroup-aware LRU implementation: > > CPU0 CPU1 > zswap_lru_add() mem_cgroup_css_offline() > get_mem_cgroup_from_objcg() > memcg_offline_kmem() > memcg_reparent_objcgs() > memcg_reparent_list_lrus() > memcg_reparent_list_lru() > memcg_reparent_list_lru_node() > list_lru_add() > memcg_list_lru_free() > > > Essentially: on CPU0, zswap gets the memcg from the entry's objcg > (before the objcgs are reparented). Then it performs list_lru_add() > after the list_lru entries reparenting (memcg_reparent_list_lru_node()) > step. If the list_lru of the memcg being offlined has not been freed > (i.e before the memcg_list_lru_free() call), then the list_lru_add() > call would succeed - but the list will be freed soon after. The new No worries. list_lru_add() will add the object to the lru list of the parent of the memcg being offlined, because the ->kmemcg_id of the memcg being offlined will be changed to its parent's ->kmemcg_id before memcg_reparent_list_lru(). Muchun, Thanks > zswap entry as a result will not be subjected to future reclaim > attempt. IOW, this list_lru_add() call is effectively swallowed. And > worse, there might be a crash when we invalidate the zswap_entry in the > future (which will perform a list_lru removal). > > Within get_mem_cgroup_from_objcg(), none of the following seem > sufficient to prevent this race: > > 1. Perform the objcg-to-memcg lookup inside a rcu_read_lock() > section. > 2. Checking if the memcg is freed yet (with css_tryget()) (what > we're currently doing in this patch series). > 3. Checking if the memcg is still online (with css_tryget_online()) > The memcg can still be offlined down the line. > > > I've discussed this privately with Johannes, and it seems like the > cleanest solution here is to move the reparenting logic down to release > stage. That way, when get_mem_cgroup_from_objcg() returns, > zswap_lru_add() is given an memcg that is reparenting-safe (until we > drop the obtained reference).