On Thu, Aug 8, 2024 at 9:58 AM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Aug 8, 2024 at 3:20 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> >
> > On 08/07, Andrii Nakryiko wrote:
> > >
> > >  struct uprobe {
> > > -       struct rb_node          rb_node;        /* node in the rb tree */
> > > +       union {
> > > +               struct rb_node          rb_node;        /* node in the rb tree */
> > > +               struct rcu_head         rcu;            /* mutually exclusive with rb_node */
> >
> > Andrii, I am sorry.
> >
> > I suggested this in reply to 3/8 before I read
> > [PATCH 7/8] uprobes: perform lockless SRCU-protected uprobes_tree lookup
> >
> > I have no idea if rb_erase() is rcu-safe or not, but this union certainly
> > doesn't look right if we use rb_find_rcu/etc.
> >
>
> Ah, because put_uprobe() might be fast enough to remove uprobe from
> the tree, process delayed_uprobe_remove() and then enqueue
> uprobe_free_rcu() callback (which would use rcu field here,
> overwriting rb_node), while we are still doing a lockless lookup,
> finding this overwritten rb_node. Good catch, if that's the case (and
> I'm testing all this right now), then it's an easy fix.
>
> It would also explain why I initially didn't get any crashes for
> lockless RB-tree lookup with uprobe-stress (I was really surprised
> that I "missed" the crash initially).
>
> Thanks!

I can confirm that the crash went away. Previously it was crashing
after a few minutes, but now it's running for almost an hour with no
problem. Phew, I was worried there for a bit, but it seems like we are
back to the "everything is fine" state.

Okay, I'll incorporate this fix and synchronize_srcu() locally, will
give it a few more days, maybe Peter will want to take another look.
Will send a new revision early next week.

> >
> > Yes, this version doesn't include the SRCU-protected uprobes_tree changes,
> > but still...
> >
> > Oleg.
> >
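
For anyone following along, the hazard with the union can be reproduced in a small userspace sketch. This is only an illustration, not kernel code: `rb_node_mock`, `rcu_head_mock`, `mock_call_rcu()` and the two `uprobe_*` structs are hypothetical stand-ins with simplified layouts, and the "lockless reader" is modeled simply by comparing the rb_node bytes before and after the callback is enqueued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types (simplified layouts). */
struct rb_node_mock {
	uintptr_t parent_color;
	struct rb_node_mock *right;
	struct rb_node_mock *left;
};

struct rcu_head_mock {
	struct rcu_head_mock *next;
	void (*func)(struct rcu_head_mock *head);
};

/* Layout from the quoted hunk: rcu shares storage with rb_node. */
struct uprobe_union {
	union {
		struct rb_node_mock rb_node;
		struct rcu_head_mock rcu;
	};
	long offset;
};

/* Alternative layout (an assumption, not taken from the thread). */
struct uprobe_separate {
	struct rb_node_mock rb_node;
	struct rcu_head_mock rcu;
	long offset;
};

static void free_cb(struct rcu_head_mock *head)
{
	(void)head;
}

/* Models only the part of enqueueing a callback that writes rcu_head. */
static void mock_call_rcu(struct rcu_head_mock *head,
			  void (*func)(struct rcu_head_mock *head))
{
	head->next = NULL;
	head->func = func;
}

/* Returns 1 if enqueueing the callback changed the rb_node bytes. */
int union_layout_clobbers_rb_node(void)
{
	struct uprobe_union u;
	struct rb_node_mock left, right, before;

	memset(&u, 0, sizeof(u));
	u.rb_node.parent_color = 1;
	u.rb_node.left = &left;
	u.rb_node.right = &right;
	before = u.rb_node;

	/* Both members start at the same address inside the union. */
	assert((void *)&u.rb_node == (void *)&u.rcu);

	mock_call_rcu(&u.rcu, free_cb);
	return memcmp(&before, &u.rb_node, sizeof(before)) != 0;
}

/* Same sequence with separate fields; rb_node must stay intact. */
int separate_layout_clobbers_rb_node(void)
{
	struct uprobe_separate s;
	struct rb_node_mock left, right, before;

	memset(&s, 0, sizeof(s));
	s.rb_node.parent_color = 1;
	s.rb_node.left = &left;
	s.rb_node.right = &right;
	before = s.rb_node;

	mock_call_rcu(&s.rcu, free_cb);
	return memcmp(&before, &s.rb_node, sizeof(before)) != 0;
}
```

Calling `union_layout_clobbers_rb_node()` from a test `main()` reports that the node's link fields changed once the callback was queued, which is what a concurrent lockless walker could trip over. `separate_layout_clobbers_rb_node()` reports no change. The thread does not spell out the exact fix, so the separate-field layout here is just one plausible shape for it.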