> On Nov 2, 2023, at 01:44, Nhat Pham <nphamcs@xxxxxxxxx> wrote: > > On Tue, Oct 31, 2023 at 8:07 PM Muchun Song <muchun.song@xxxxxxxxx> wrote: >> >> >> >>> On Nov 1, 2023, at 09:26, Nhat Pham <nphamcs@xxxxxxxxx> wrote: >>> >>> cc-ing Johannes, Roman, Shakeel, Muchun since you all know much more >>> about memory controller + list_lru reparenting logic than me. >>> >>> There seems to be a race between memcg offlining and zswap’s >>> cgroup-aware LRU implementation: >>> >>> CPU0 CPU1 >>> zswap_lru_add() mem_cgroup_css_offline() >>> get_mem_cgroup_from_objcg() >>> memcg_offline_kmem() >>> memcg_reparent_objcgs() >>> memcg_reparent_list_lrus() >>> memcg_reparent_list_lru() >>> memcg_reparent_list_lru_node() >>> list_lru_add() >>> memcg_list_lru_free() >>> >>> >>> Essentially: on CPU0, zswap gets the memcg from the entry's objcg >>> (before the objcgs are reparented). Then it performs list_lru_add() >>> after the list_lru entries reparenting (memcg_reparent_list_lru_node()) >>> step. If the list_lru of the memcg being offlined has not been freed >>> (i.e before the memcg_list_lru_free() call), then the list_lru_add() >>> call would succeed - but the list will be freed soon after. The new >> >> No worries. list_lru_add() will add the object to the lru list of >> the parent of the memcg being offlined, because the ->kmemcg_id of the >> memcg being offlined will be changed to its parent's ->kmemcg_id before memcg_reparent_list_lru(). >> > > Ohhh that is subtle. Thanks for pointing this out, Muchun! > > In that case, I think Yosry is right after all! We don't even need to get > a reference to the memcg: > > rcu_read_lock(); > memcg = obj_cgroup_memcg(objcg); > list_lru_add(); > rcu_read_unlock(); > > As long as we're inside this rcu section, we're guaranteed to get > an un-freed memcg. Now it could be offlined etc., but as Muchun has Right. Thanks. > pointed out, the list_lru_add() call will still does the right thing - it will > either add the new entry to the parent list if this happens after the > kmemcg_id update, or the child list before the list_lru reparenting > action. Both of these scenarios are fine. > >> Muchun, >> Thanks >> >>> zswap entry as a result will not be subjected to future reclaim >>> attempt. IOW, this list_lru_add() call is effectively swallowed. And >>> worse, there might be a crash when we invalidate the zswap_entry in the >>> future (which will perform a list_lru removal). >>> >>> Within get_mem_cgroup_from_objcg(), none of the following seem >>> sufficient to prevent this race: >>> >>> 1. Perform the objcg-to-memcg lookup inside a rcu_read_lock() >>> section. >>> 2. Checking if the memcg is freed yet (with css_tryget()) (what >>> we're currently doing in this patch series). >>> 3. Checking if the memcg is still online (with css_tryget_online()) >>> The memcg can still be offlined down the line. >>> >>> >>> I've discussed this privately with Johannes, and it seems like the >>> cleanest solution here is to move the reparenting logic down to release >>> stage. That way, when get_mem_cgroup_from_objcg() returns, >>> zswap_lru_add() is given an memcg that is reparenting-safe (until we >>> drop the obtained reference).