On Fri, Aug 18, 2023 at 3:19 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > > On Fri, Aug 18, 2023 at 11:44:45AM -0700, Yosry Ahmed wrote: > > On Fri, Aug 18, 2023 at 11:35 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > > > > > > On Fri, Aug 18, 2023 at 10:45:56AM -0700, Yosry Ahmed wrote: > > > > On Fri, Aug 18, 2023 at 10:35 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > > > > > On Fri, Aug 18, 2023 at 07:56:37AM -0700, Yosry Ahmed wrote: > > > > > > If this happens it seems possible for this to happen: > > > > > > > > > > > > cpu #1 cpu#2 > > > > > > css_put() > > > > > > /* css_free_rwork_fn is queued */ > > > > > > rcu_read_lock() > > > > > > mem_cgroup_from_id() > > > > > > mem_cgroup_id_remove() > > > > > > /* access memcg */ > > > > > > > > > > I don't quite see how that'd possible. IDR uses rcu_assign_pointer() > > > > > during deletion, which inserts the necessary barriering. My > > > > > understanding is that this should always be safe: > > > > > > > > > > rcu_read_lock() (writer serialization, in this case ref count == 0) > > > > > foo = idr_find(x) idr_remove(x) > > > > > if (foo) kfree_rcu(foo) > > > > > LOAD(foo->bar) > > > > > rcu_read_unlock() > > > > > > > > How does a barrier inside IDR removal protect against the memcg being > > > > freed here though? > > > > > > > > If css_put() is executed out-of-order before mem_cgroup_id_remove(), > > > > the memcg can be freed even before mem_cgroup_id_remove() is called, > > > > right? > > > > > > css_put() can start earlier, but it's not allowed to reorder the rcu > > > callback that frees past the rcu_assign_pointer() in idr_remove(). > > > > > > This is what RCU and its access primitives guarantees. It ensures that > > > after "unpublishing" the pointer, all concurrent RCU-protected > > > accesses to the object have finished, and the memory can be freed. > > > > I am not sure I understand, this is the scenario I mean: > > > > cpu#1 cpu#2 cpu#3 > > css_put() > > /* schedule free */ > > rcu_read_lock() > > idr_remove() > > mem_cgroup_from_id() > > > > /* free memcg */ > > /* use memcg */ > > idr_remove() cannot be re-ordered after scheduling the free. Think > about it, this is the common rcu-freeing pattern: > > rcu_assign_pointer(p, NULL); > call_rcu(rh, free_pointee); > > on the write side, and: > > rcu_read_lock(); > pointee = rcu_dereference(p); > if (pointee) > do_stuff(pointee); > rcu_read_unlock(); > > on the read side. > > In our case, the rcu_assign_pointer() is in idr_remove(). And the > rcu_dereference() is in mem_cgroup_from_id() -> idr_find() -> > radix_tree_lookup() -> radix_tree_descend(). > > So if we find the memcg in the idr under rcu lock, the cgroup rcu work > is guaranteed to not run until the lock is dropped. If we don't find > it, it may or may not have already run. Yeah I missed the implicit barrier there, thanks for bearing with me. I think Shakeel might have found the actual problem here (see his response).