On Thu, Aug 8, 2024 at 7:29 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> On 08/07, Andrii Nakryiko wrote:
> >
> > So, any ideas how we can end up with "corrupted" root on lockless
> > lookup with rb_find_rcu()?
>
> I certainly can't help ;) I know ABSOLUTELY NOTHING about rb or any
> other tree.
>
> But,
>
> > This seems to be the very first lockless
> > RB-tree lookup use case in the tree,
>
> Well, latch_tree_find() is supposed to be rcu-safe afaics, and
> __lt_erase() is just rb_erase(). So it is not the 1st use case.
>
> See also the "Notes on lockless lookups" comment in lib/rbtree.c.
>
> So it seems that rb_erase() is supposed to be rcu-safe. However
> it uses __rb_change_child(), not __rb_change_child_rcu().
>

While trying to mitigate the crash locally I noticed
__rb_change_child() and manually changed all the cases to
__rb_change_child_rcu(). That didn't help :) But I think your guess
about sharing rcu and rb_node is the right one, so hopefully that will
solve the issue.

> Not that I think this can explain the problem, and on x86
> __smp_store_release() is just WRITE_ONCE, but looks confusing...
>
> Oleg.
>
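
To illustrate the "sharing rcu and rb_node" guess, here is a minimal userspace sketch. It assumes the guess refers to an object whose tree linkage and RCU callback head live in the same union, so that queueing the object for a deferred free overwrites the pointers a lockless reader may still be following. None of this is the actual kernel code: every type and function name with a _sketch suffix is a hypothetical stand-in, and the struct layouts are simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct rb_node (assumed layout). */
struct rb_node_sketch {
	unsigned long parent_color;
	struct rb_node_sketch *right;
	struct rb_node_sketch *left;
};

/* Simplified stand-in for the kernel's struct rcu_head (assumed layout). */
struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *head);
};

/* Hypothetical object: tree linkage and RCU head share the same storage. */
struct obj_sketch {
	int key;
	union {
		struct rb_node_sketch node;
		struct rcu_head_sketch rcu;
	};
};

/* Placeholder callback; a real one would free the object. */
static void free_cb_sketch(struct rcu_head_sketch *head)
{
	(void)head;
}

/* Stand-in for queueing a deferred free: it writes into the rcu head. */
static void queue_free_sketch(struct obj_sketch *o)
{
	o->rcu.next = NULL;
	o->rcu.func = free_cb_sketch;
}

/*
 * Returns 1 if queueing the callback changed the bytes that a reader
 * still walking the tree would interpret as node linkage, 0 otherwise.
 */
static int linkage_clobbered_sketch(void)
{
	struct obj_sketch a = { .key = 1 }, b = { .key = 2 };
	struct rb_node_sketch before;

	/* Pretend 'a' is linked into a tree with 'b' as parent and right child. */
	a.node.parent_color = (unsigned long)&b.node;
	a.node.right = &b.node;
	a.node.left = NULL;
	before = a.node;

	/* The object is erased and handed to the deferred-free machinery. */
	queue_free_sketch(&a);

	return memcmp(&before, &a.node, sizeof(before)) != 0;
}
```

In this sketch the rcu head's next and func fields land on top of parent_color and right, so a reader that reached the node before it was erased would follow a callback pointer as if it were a child. If that is what is happening, giving the rcu head its own storage outside the union would avoid the overlap at the cost of a larger object.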