On Sat, Oct 26, 2024 at 8:48 AM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> The grace period used internally within tracepoint.c:release_probes()
> uses call_rcu() to batch waiting for quiescence of old probe arrays,
> rather than using the tracepoint_synchronize_unregister() which blocks
> while waiting for quiescence.
>
> With the introduction of faultable syscall tracepoints, this causes
> use-after-free issues reproduced with syzkaller.
>
> Fix this by using the appropriate call_rcu() or call_rcu_tasks_trace()
> before invoking the rcu_free_old_probes callback. This can be chosen
> using the tracepoint_is_syscall() API.
>
> A similar issue exists in bpf use of call_rcu(). Fixing this is left to
> a separate change.
>
> Reported-by: syzbot+b390c8062d8387b6272a@xxxxxxxxxxxxxxxxxxxxxxxxx
> Fixes: a363d27cdbc2 ("tracing: Allow system call tracepoints to handle page faults")
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> Cc: Michael Jeanson <mjeanson@xxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> Cc: Yonghong Song <yhs@xxxxxx>
> Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
> Cc: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
> Cc: bpf@xxxxxxxxxxxxxxx
> Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> Cc: Jordan Rife <jrife@xxxxxxxxxx>
> ---
> Changes since v0:
> - Introduce tracepoint_call_rcu(),
> - Fix bpf_link_free() use of call_rcu as well.
>
> Changes since v1:
> - Use tracepoint_call_rcu() for bpf_prog_put as well.
>
> Changes since v2:
> - Do not cover bpf changes in the same commit, let bpf developers
>   implement it.
> ---
>  kernel/tracepoint.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 5658dc92f5b5..47569fb06596 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -106,13 +106,16 @@ static void rcu_free_old_probes(struct rcu_head *head)
>  	kfree(container_of(head, struct tp_probes, rcu));
>  }
>
> -static inline void release_probes(struct tracepoint_func *old)
> +static inline void release_probes(struct tracepoint *tp, struct tracepoint_func *old)
>  {
>  	if (old) {
>  		struct tp_probes *tp_probes = container_of(old,
>  			struct tp_probes, probes[0]);
>
> -		call_rcu(&tp_probes->rcu, rcu_free_old_probes);
> +		if (tracepoint_is_syscall(tp))
> +			call_rcu_tasks_trace(&tp_probes->rcu, rcu_free_old_probes);

Should this be a call_rcu_tasks_trace() -> call_rcu() chain instead of
just call_rcu_tasks_trace()? While currently call_rcu_tasks_trace()
implies an RCU grace period (as evidenced by rcu_trace_implies_rcu_gp()
being hardcoded right now to return true), this might not always be the
case in the future, so it's best to have a guarantee that, regardless
of whether the tracepoint is sleepable or not, we'll always have an RCU
grace period, and for a sleepable tracepoint *also* an RCU Tasks Trace
grace period.

> +		else
> +			call_rcu(&tp_probes->rcu, rcu_free_old_probes);
>  	}
>  }
>
> @@ -334,7 +337,7 @@ static int tracepoint_add_func(struct tracepoint *tp,
>  		break;
>  	}
>
> -	release_probes(old);
> +	release_probes(tp, old);
>  	return 0;
>  }
>
> @@ -405,7 +408,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
>  		WARN_ON_ONCE(1);
>  		break;
>  	}
> -	release_probes(old);
> +	release_probes(tp, old);
>  	return 0;
>  }
>
> --
> 2.39.5
>
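To make the suggestion concrete, the chaining could look something like
the sketch below. This is only a sketch, not a tested patch: the
intermediate callback rcu_tasks_trace_free_old_probes() is a name I am
inventing here for illustration, and reusing the same rcu_head to queue
the second callback from inside the first relies on the usual rule that
an rcu_head may be re-posted once its callback has been invoked.

```c
/* Sketch: chain a Tasks Trace grace period into a regular RCU grace
 * period before freeing, so both GPs are guaranteed for syscall
 * tracepoints even if rcu_trace_implies_rcu_gp() ever stops being true.
 * rcu_tasks_trace_free_old_probes() is a hypothetical helper, not part
 * of the posted patch.
 */
static void rcu_free_old_probes(struct rcu_head *head)
{
	kfree(container_of(head, struct tp_probes, rcu));
}

static void rcu_tasks_trace_free_old_probes(struct rcu_head *head)
{
	/* The Tasks Trace GP has elapsed; now also wait for a regular
	 * RCU GP before freeing the old probe array.
	 */
	call_rcu(head, rcu_free_old_probes);
}

static inline void release_probes(struct tracepoint *tp,
				  struct tracepoint_func *old)
{
	if (old) {
		struct tp_probes *tp_probes = container_of(old,
			struct tp_probes, probes[0]);

		if (tracepoint_is_syscall(tp))
			call_rcu_tasks_trace(&tp_probes->rcu,
					     rcu_tasks_trace_free_old_probes);
		else
			call_rcu(&tp_probes->rcu, rcu_free_old_probes);
	}
}
```

An alternative would be to keep the single call_rcu_tasks_trace() and
guard it with a check on rcu_trace_implies_rcu_gp(), falling back to
the chained form only when that returns false, which is roughly the
pattern some bpf freeing paths use.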