On Wed, Mar 13, 2024 at 6:20 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > I forgot everything about this code, plus it has changed a lot since > I looked at it many years ago, but ... > > I think this change is fine but the changelog looks a bit confusing > (overcomplicated) to me. It's a new piece of code and logic, so I tried to do my due diligence and argue why I think it's fine. I'll drop the overcomplicated explanation, as I agree with you that it's inherently racy even without my changes (and use-after-free safety is provided with uprobe->register_rwsem independent from all this). > > On 03/12, Andrii Nakryiko wrote: > > > > This patch adds a speculative check before grabbing that rwlock. If > > nr_systemwide is non-zero, lock is skipped and event is passed through. > > From examining existing logic it looks correct and safe to do. If > > nr_systemwide is being modified under rwlock in parallel, we have to > > consider basically just one important race condition: the case when > > nr_systemwide is dropped from one to zero (from > > trace_uprobe_filter_remove()) under filter->rwlock, but > > uprobe_perf_filter() raced and saw it as >0. > > Unless I am totally confused, there is nothing new. Even without > this change trace_uprobe_filter_remove() can clear nr_systemwide > right after uprobe_perf_filter() drops filter->rwlock. > > And of course, trace_uprobe_filter_add() can change nr_systemwide > from 0 to 1. In this case uprobe_perf_func() can "wrongly" return > UPROBE_HANDLER_REMOVE but we can't avoid this and afaics this is > fine even if handler_chain() does unapply_uprobe(), uprobe_perf_open() > will do uprobe_apply() after that, we can rely on ->register_rwsem. > yep, agreed > > In case we speculatively read nr_systemwide as zero, while it was > > incremented in parallel, we'll proceed to grabbing filter->rwlock and > > re-doing the check, this time in lock-protected and non-racy way. > > See above... > > > So I think uprobe_perf_filter() needs filter->rwlock only to iterate > the list, it can check nr_systemwide lockless and this means that you > can also remove the same check in __uprobe_perf_filter(), other callers > trace_uprobe_filter_add/remove check it themselves. > makes sense, will do > > > --- a/kernel/trace/trace_uprobe.c > > +++ b/kernel/trace/trace_uprobe.c > > @@ -1351,6 +1351,10 @@ static bool uprobe_perf_filter(struct uprobe_consumer *uc, > > tu = container_of(uc, struct trace_uprobe, consumer); > > filter = tu->tp.event->filter; > > > > + /* speculative check */ > > + if (READ_ONCE(filter->nr_systemwide)) > > + return true; > > + > > read_lock(&filter->rwlock); > > ret = __uprobe_perf_filter(filter, mm); > > read_unlock(&filter->rwlock); > > ACK, > > but see above. I think the changelog should be simplified and the > filter->nr_systemwide check in __uprobe_perf_filter() should be > removed. But I won't insist and perhaps I missed something... > I think you are right, I'll move the check > Oleg. >