On Sat, Oct 01, 2022 at 03:38:43PM +0300, Alexander Fedorov wrote:
> On 30.09.2022 21:26, Roman Gushchin wrote:
> > On Fri, Sep 30, 2022 at 02:06:48PM +0000, Alexander Fedorov wrote:
> >> 1) First CPU:
> >> css_killed_work_fn() -> mem_cgroup_css_offline() ->
> >> drain_all_stock() -> obj_stock_flush_required()
> >>         if (stock->cached_objcg) {
> >>
> >> This check sees a non-NULL pointer for *another* CPU's `memcg_stock`
> >> instance.
> >>
> >> 2) Second CPU:
> >> css_free_rwork_fn() -> __mem_cgroup_free() -> free_percpu() ->
> >> obj_cgroup_uncharge() -> drain_obj_stock()
> >> It frees `cached_objcg` pointer in its own `memcg_stock` instance:
> >>         struct obj_cgroup *old = stock->cached_objcg;
> >>         < ... >
> >>         obj_cgroup_put(old);
> >>         stock->cached_objcg = NULL;
> >>
> >> 3) First CPU continues after the 'if' check and re-reads the pointer
> >> again, now it is NULL and dereferencing it leads to kernel panic:
> >> static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
> >>                                      struct mem_cgroup *root_memcg)
> >> {
> >>         < ... >
> >>         if (stock->cached_objcg) {
> >>                 memcg = obj_cgroup_memcg(stock->cached_objcg);
> >
> > Great catch!
> >
> > I'm not sure about switching to rcu primitives though. In all other cases
> > stock->cached_objcg is accessed only from a local cpu, so using rcu_*
> > function is an overkill.
> >
> > How's something about this? (completely untested)
>
> Tested READ_ONCE() patch and it works. Thank you!
> But are rcu primitives an overkill?
> For me they are documenting how actually complex is synchronization here.

I agree, however rcu primitives will add unnecessary barriers on hot paths.
In this particular case most accesses to stock->cached_objcg are done from
a local cpu, so no rcu primitives are needed. So in my opinion using a
READ_ONCE() is preferred.

Thanks!
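
For readers following along, here is a minimal userspace sketch of the
read-once pattern being discussed. It is not the patch from this thread.
The struct definitions are simplified stand-ins, READ_ONCE() is
approximated with a volatile load (using the GCC/Clang __typeof__
extension), and stock_caches_memcg() is a hypothetical helper name.
The point is only that the pointer is loaded once into a local, so the
NULL check and the dereference see the same value:

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace approximation of the kernel's READ_ONCE(): one volatile load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Simplified stand-ins for the kernel structures (assumptions). */
struct mem_cgroup { int id; };
struct obj_cgroup { struct mem_cgroup *memcg; };
struct memcg_stock_pcp { struct obj_cgroup *cached_objcg; };

static struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
{
	return objcg->memcg;
}

/*
 * Hypothetical helper: does this stock cache an objcg belonging to
 * 'target'? The pointer is snapshotted once, so a concurrent
 * "stock->cached_objcg = NULL" on another CPU cannot slip in between
 * the NULL check and the dereference.
 */
static bool stock_caches_memcg(struct memcg_stock_pcp *stock,
			       struct mem_cgroup *target)
{
	struct obj_cgroup *objcg = READ_ONCE(stock->cached_objcg);

	if (!objcg)
		return false;

	return obj_cgroup_memcg(objcg) == target;
}
```

Note that the snapshot only removes the second load of the pointer. It
says nothing about how long the object it points to stays valid, which
is a separate question from the NULL dereference described above.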