On 2024-10-28 15:19, Mathieu Desnoyers wrote:
On 2024-10-27 21:22, Andrii Nakryiko wrote:
On Sat, Oct 26, 2024 at 8:48 AM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
The grace period used internally within tracepoint.c:release_probes()
uses call_rcu() to batch waiting for quiescence of old probe arrays,
rather than using the tracepoint_synchronize_unregister() which blocks
while waiting for quiescence.
With the introduction of faultable syscall tracepoints, this causes
use-after-free issues reproduced with syzkaller.
Fix this by using the appropriate call_rcu() or call_rcu_tasks_trace()
before invoking the rcu_free_old_probes callback. This can be chosen
using the tracepoint_is_syscall() API.
A similar issue exists in bpf use of call_rcu(). Fixing this is left to
a separate change.
Reported-by: syzbot+b390c8062d8387b6272a@xxxxxxxxxxxxxxxxxxxxxxxxx
Fixes: a363d27cdbc2 ("tracing: Allow system call tracepoints to
handle page faults")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Cc: Michael Jeanson <mjeanson@xxxxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
Cc: Yonghong Song <yhs@xxxxxx>
Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
Cc: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
Cc: bpf@xxxxxxxxxxxxxxx
Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
Cc: Jordan Rife <jrife@xxxxxxxxxx>
---
Changes since v0:
- Introduce tracepoint_call_rcu(),
- Fix bpf_link_free() use of call_rcu as well.
Changes since v1:
- Use tracepoint_call_rcu() for bpf_prog_put as well.
Changes since v2:
- Do not cover bpf changes in the same commit, let bpf developers
implement it.
---
kernel/tracepoint.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 5658dc92f5b5..47569fb06596 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -106,13 +106,16 @@ static void rcu_free_old_probes(struct rcu_head
*head)
kfree(container_of(head, struct tp_probes, rcu));
}
-static inline void release_probes(struct tracepoint_func *old)
+static inline void release_probes(struct tracepoint *tp, struct
tracepoint_func *old)
{
if (old) {
struct tp_probes *tp_probes = container_of(old,
struct tp_probes, probes[0]);
- call_rcu(&tp_probes->rcu, rcu_free_old_probes);
+ if (tracepoint_is_syscall(tp))
+ call_rcu_tasks_trace(&tp_probes->rcu,
rcu_free_old_probes);
should this be call_rcu_tasks_trace() -> call_rcu() chain instead of
just call_rcu_tasks_trace()? While currently call_rcu_tasks_trace()
implies RCU GP (as evidenced by rcu_trace_implies_rcu_gp() being
hardcoded right now to returning true), this might not always be the
case in the future, so it's best to have a guarantee that regardless
of sleepable or not, we'll always have have RCU GP, and for sleepable
tracepoint *also* RCU Tasks Trace GP.
Given that faultable tracepoints only use RCU tasks trace for the
read-side and do not rely on preempt disable, I don't see why we would
need to chain both grace periods there ?
Hi Andrii,
AFAIU, your question above is rooted in the way bpf does its sleepable
program grace periods (chaining RCU tasks trace + RCU GP), e.g.:
bpf_map_free_mult_rcu_gp
bpf_link_defer_dealloc_mult_rcu_gp
and
bpf_link_free:
/* schedule BPF link deallocation; if underlying BPF program
* is sleepable, we need to first wait for RCU tasks trace
* sync, then go through "classic" RCU grace period
*/
This is introduced in commit 1a80dbcb2db ("bpf: support deferring bpf_link dealloc to after RCU grace period")
which has a bit more information in the commit message, but what I'm not seeing
is an explanation of *why* chaining RCU tasks trace and RCU grace periods is
needed for sleepable bpf programs. What am I missing ?
As far as tracepoint.c release_probes() is concerned, just waiting for
RCU tasks trace before freeing memory of faultable tracepoints is
sufficient.
Thanks,
Mathieu
Thanks,
Mathieu
+ else
+ call_rcu(&tp_probes->rcu, rcu_free_old_probes);
}
}
@@ -334,7 +337,7 @@ static int tracepoint_add_func(struct tracepoint
*tp,
break;
}
- release_probes(old);
+ release_probes(tp, old);
return 0;
}
@@ -405,7 +408,7 @@ static int tracepoint_remove_func(struct
tracepoint *tp,
WARN_ON_ONCE(1);
break;
}
- release_probes(old);
+ release_probes(tp, old);
return 0;
}
--
2.39.5
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com