On Wed, Oct 23, 2024 at 8:21 AM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> On 2024-10-23 11:14, Alexei Starovoitov wrote:
> > On Wed, Oct 23, 2024 at 7:56 AM Jordan Rife <jrife@xxxxxxxxxx> wrote:
> >>
> >> Mathieu's patch alone does not seem to be enough to prevent the
> >> use-after-free issue reported by syzbot.
> >>
> >> Link: https://lore.kernel.org/bpf/67121037.050a0220.10f4f4.000f.GAE@xxxxxxxxxx/T/#u
> >>
> >> I reran the repro script with his patch applied to my tree and was
> >> still able to get the same KASAN crash to occur.
> >>
> >> In this case, when bpf_link_free() is invoked it kicks off three
> >> instances of call_rcu*.
> >>
> >> bpf_link_free()
> >>   ops->release()
> >>     bpf_raw_tp_link_release()
> >>       bpf_probe_unregister()
> >>         tracepoint_probe_unregister()
> >>           tracepoint_remove_func()
> >>             release_probes()
> >>               call_rcu() [1]
> >>   bpf_prog_put()
> >>     __bpf_prog_put()
> >>       bpf_prog_put_deferred()
> >>         __bpf_prog_put_noref()
> >>           call_rcu() [2]
> >>   call_rcu() [3]
> >>
> >> With Mathieu's patch, [1] is chained with call_rcu_tasks_trace(),
> >> making the grace period sufficiently long to safely free the probe
> >> itself. The callbacks for [2] and [3] may be invoked before the
> >> call_rcu_tasks_trace() grace period has elapsed, leading to the link
> >> or the program itself being freed while still in use. I was able to
> >> prevent any crashes with the patch below, which also chains
> >> call_rcu_tasks_trace() and call_rcu() at [2] and [3].
> >>
> >> ---
> >>  kernel/bpf/syscall.c | 24 ++++++++++--------------
> >>  1 file changed, 10 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >> index 59de664e580d..5290eccb465e 100644
> >> --- a/kernel/bpf/syscall.c
> >> +++ b/kernel/bpf/syscall.c
> >> @@ -2200,6 +2200,14 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
> >>          bpf_prog_free(aux->prog);
> >>  }
> >>
> >> +static void __bpf_prog_put_tasks_trace_rcu(struct rcu_head *rcu)
> >> +{
> >> +        if (rcu_trace_implies_rcu_gp())
> >> +                __bpf_prog_put_rcu(rcu);
> >> +        else
> >> +                call_rcu(rcu, __bpf_prog_put_rcu);
> >> +}
> >> +
> >>  static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
> >>  {
> >>          bpf_prog_kallsyms_del_all(prog);
> >> @@ -2212,10 +2220,7 @@ static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
> >>                  btf_put(prog->aux->attach_btf);
> >>
> >>          if (deferred) {
> >> -                if (prog->sleepable)
> >> -                        call_rcu_tasks_trace(&prog->aux->rcu, __bpf_prog_put_rcu);
> >> -                else
> >> -                        call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
> >> +                call_rcu_tasks_trace(&prog->aux->rcu, __bpf_prog_put_tasks_trace_rcu);
> >>          } else {
> >>                  __bpf_prog_put_rcu(&prog->aux->rcu);
> >>          }
> >> @@ -2996,24 +3001,15 @@ static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
> >>  static void bpf_link_free(struct bpf_link *link)
> >>  {
> >>          const struct bpf_link_ops *ops = link->ops;
> >> -        bool sleepable = false;
> >>
> >>          bpf_link_free_id(link->id);
> >>          if (link->prog) {
> >> -                sleepable = link->prog->sleepable;
> >>                  /* detach BPF program, clean up used resources */
> >>                  ops->release(link);
> >>                  bpf_prog_put(link->prog);
> >>          }
> >>          if (ops->dealloc_deferred) {
> >> -                /* schedule BPF link deallocation; if underlying BPF program
> >> -                 * is sleepable, we need to first wait for RCU tasks trace
> >> -                 * sync, then go through "classic" RCU grace period
> >> -                 */
> >> -                if (sleepable)
> >> -                        call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
> >> -                else
> >> -                        call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
> >> +                call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
> >
> > This patch is completely wrong.

Actually, I suspect Jordan's patch works, but we're not going to
penalize all bpf progs for that. This patch is a non-starter.

> > Looks like Mathieu's patch broke the bpf program contract somewhere.
>
> My patch series introduced this in the probe:
>
> #define __BPF_DECLARE_TRACE_SYSCALL(call, proto, args)                  \
> static notrace void                                                     \
> __bpf_trace_##call(void *__data, proto)                                 \
> {                                                                       \
>         might_fault();                                                  \
>         preempt_disable_notrace();                                      \
>         CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args)); \
>         preempt_enable_notrace();                                       \
> }
>
> to ensure we call the bpf program from a preempt-off context.
>
> > The tracepoint type bpf programs must be called with rcu_read_lock held.
>
> Calling the bpf program with preempt off is equivalent. __DO_TRACE()
> calls the probes with preempt_disable_notrace() in the normal case.
>
> > Looks like it's not happening anymore.
>
> The issue here is not about the context in which the bpf program runs;
> that is still preempt off. The problem is the expectation that a
> call_rcu() grace period is enough to delay reclaim after unregistration
> of the tracepoint. Now that __DO_TRACE() uses rcu_read_lock_trace() to
> protect the RCU dereference, that is no longer sufficient, at least for
> syscall tracepoints.

I don't see how preempt_disable vs preempt_disable_notrace, or
preemption in general, is relevant here. The prog dereference needs to
happen under rcu_read_lock. But since you've switched to
rcu_read_lock_trace(), this part:

> >> bpf_link_free()
> >>   ops->release()
> >>     bpf_raw_tp_link_release()
> >>       bpf_probe_unregister()
> >>         tracepoint_probe_unregister()
> >>           tracepoint_remove_func()
> >>             release_probes()
> >>               call_rcu() [1]

is probably incorrect. It should be call_rcu_tasks_trace()?
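Roughly something like this in kernel/tracepoint.c, reusing the
rcu_trace_implies_rcu_gp() trick from Jordan's patch above (untested
sketch, and I'm going from memory on the tp_probes layout):

static void rcu_free_old_probes(struct rcu_head *head)
{
        kfree(container_of(head, struct tp_probes, rcu));
}

/* By the time this callback runs, a tasks-trace GP has elapsed.
 * Piggyback on it when it implies a classic GP, otherwise chain
 * one more call_rcu() before freeing.
 */
static void rcu_tasks_trace_free_old_probes(struct rcu_head *head)
{
        if (rcu_trace_implies_rcu_gp())
                rcu_free_old_probes(head);
        else
                call_rcu(head, rcu_free_old_probes);
}

static void release_probes(struct tracepoint_func *old)
{
        if (old) {
                struct tp_probes *tp_probes = container_of(old,
                        struct tp_probes, probes[0]);

                call_rcu_tasks_trace(&tp_probes->rcu,
                                     rcu_tasks_trace_free_old_probes);
        }
}

Ideally only the syscall tracepoints would pay the extra tasks-trace
grace period and the rest would keep plain call_rcu(), but that needs
a way to tell the two apart at unregister time.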
In addition, all tracepoints (both sleepable and not) should deref
__data under a normal rcu_read_lock() before passing that pointer into
__bpf_trace_##call, because both the bpf link and the prog are rcu
protected.

I don't have the patches in front of me, so I'm guessing a bit.
So pls remind me where all these patches are.
What tree/branch are we talking about?
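FWIW, my mental model of the probe call site after your series is
roughly the following (simplified from include/linux/tracepoint.h and
written from memory, so correct me if the details are off):

/* Regular tracepoint: the funcs/__data deref and the probe call run
 * under preempt-off, which a classic call_rcu()/synchronize_rcu()
 * grace period waits for.
 */
preempt_disable_notrace();
it_func_ptr = rcu_dereference_raw(tp->funcs);
if (it_func_ptr)
        __traceiter_##name(it_func_ptr->data, args);  /* __data */
preempt_enable_notrace();

/* Syscall tracepoint: the probe may fault/sleep, so only a
 * tasks-trace read-side section is held around the deref and call.
 * A classic call_rcu() GP on the unregister/free side no longer
 * waits for these readers -- hence the use-after-free.
 */
rcu_read_lock_trace();
it_func_ptr = rcu_dereference_raw(tp->funcs);
if (it_func_ptr)
        __traceiter_##name(it_func_ptr->data, args);
rcu_read_unlock_trace();

If that's accurate, anything reachable via that __data pointer (the
probe array, the bpf link, the prog) has to survive a tasks-trace
grace period and not just a classic one, which is what [1], [2] and
[3] above are missing.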