On Wed, Aug 7, 2024 at 6:30 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > On 07/31, Andrii Nakryiko wrote: > > > > Andrii Nakryiko (6): > > uprobes: revamp uprobe refcounting and lifetime management > > uprobes: protected uprobe lifetime with SRCU > > uprobes: get rid of enum uprobe_filter_ctx in uprobe filter callbacks > > uprobes: travers uprobe's consumer list locklessly under SRCU > > protection > > uprobes: perform lockless SRCU-protected uprobes_tree lookup > > uprobes: switch to RCU Tasks Trace flavor for better performance > > > > Peter Zijlstra (2): > > rbtree: provide rb_find_rcu() / rb_find_add_rcu() > > perf/uprobe: split uprobe_unregister() > > I see nothing wrong in 1-7. LGTM. > So, I don't know how it slipped the first time, because I tested overnight with uprobe-stress, perhaps I adjusted some parameters (more threads or different ratio of threads, not sure by now), but it turns out that lockless RB-tree traversal actually crashes after a few minutes of running uprobe-stress. I'll post details separately later today, but I suspect that rb_find_rcu() and rb_find_add_rcu() is not enough to make it safe. Hopefully someone can help me figure this out. > But since you are going to send the new version, I'd like to apply V2 > and then try to re-check the resulting code. Yes, I was waiting for more of Peter's comments, but I guess I'll just send a v2 today. I'll probably include the SRCU+timeout logic for return_instances, and maybe lockless VMA parts as well. Those won't be yet for landing, but it's probably useful to see everything end-to-end. Given the hiccup with lockless uprobes_tree lookup, we should land that change, but the first 4 patches hopefully are in good enough shape to apply and reduce the amount of patches that need to be juggled around. > > As for 8/8 - I leave it to you and Peter. I'd prefer SRCU though ;) Honestly curious, why the preference? > > Oleg. > BTW, while you are here :) What can you say about current->sighand->siglock use in handle_singlestep()? Is there any way to avoid that (or at least not have to take it every single time a single-stepped uprobe is triggered?). This turned out to be a huge issue for single-stepped uprobe when scaling to multiple CPUs even with all the other things (all the SRCU and lockless VMA lookup) taken care of.