Mathieu, can you look at this? [ more below ] On Mon, 21 Oct 2024 18:23:47 +0000 Jordan Rife <jrife@xxxxxxxxxx> wrote: > I performed a bisection and this issue starts with commit a363d27cdbc2 > ("tracing: Allow system call tracepoints to handle page faults") which > introduces this change. > > > + * > > + * With @syscall=0, the tracepoint callback array dereference is > > + * protected by disabling preemption. > > + * With @syscall=1, the tracepoint callback array dereference is > > + * protected by Tasks Trace RCU, which allows probes to handle page > > + * faults. > > */ > > #define __DO_TRACE(name, args, cond, syscall) \ > > do { \ > > @@ -204,11 +212,17 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) > > if (!(cond)) \ > > return; \ > > \ > > - preempt_disable_notrace(); \ > > + if (syscall) \ > > + rcu_read_lock_trace(); \ > > + else \ > > + preempt_disable_notrace(); \ > > \ > > __DO_TRACE_CALL(name, TP_ARGS(args)); \ > > \ > > - preempt_enable_notrace(); \ > > + if (syscall) \ > > + rcu_read_unlock_trace(); \ > > + else \ > > + preempt_enable_notrace(); \ > > } while (0) > > Link: https://lore.kernel.org/bpf/20241009010718.2050182-6-mathieu.desnoyers@xxxxxxxxxxxx/ > > I reproduced the bug locally by running syz-execprog inside a QEMU VM. > > > ./syz-execprog -repeat=0 -procs=5 ./repro.syz.txt > > I /think/ what is happening is that with this change preemption may now > occur leading to a scenario where the RCU grace period is insufficient > in a few places where call_rcu() is used. In other words, there are a > few scenarios where call_rcu_tasks_trace() should be used instead to > prevent a use-after-free bug when a preempted tracepoint call tries to > access a program, link, etc. that was freed. It seems the syzkaller > program induces page faults while attaching raw tracepoints to > sys_enter making preemption more likely to occur. > > kernel/tracepoint.c > =================== > > ... > > static inline void release_probes(struct tracepoint_func *old) > > { > > ... > > call_rcu(&tp_probes->rcu, rcu_free_old_probes); <-- Here Have you tried just changing this one to call_rcu_tasks_trace()? -- Steve > > ... > > } > > ... > > kernel/bpf/syscall.c > ==================== > > static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred) > > { > > bpf_prog_kallsyms_del_all(prog); > > btf_put(prog->aux->btf); > > module_put(prog->aux->mod); > > kvfree(prog->aux->jited_linfo); > > kvfree(prog->aux->linfo); > > kfree(prog->aux->kfunc_tab); > > if (prog->aux->attach_btf) > > btf_put(prog->aux->attach_btf); > > > > if (deferred) { > > if (prog->sleepable) <------ HERE: New condition needed? > > call_rcu_tasks_trace(&prog->aux->rcu, __bpf_prog_put_rcu); > > else > > call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu); > > } else { > > __bpf_prog_put_rcu(&prog->aux->rcu); > > } > > } > > > > static void bpf_link_free(struct bpf_link *link) > > { > > const struct bpf_link_ops *ops = link->ops; > > bool sleepable = false; > > > > bpf_link_free_id(link->id); > > if (link->prog) { > > sleepable = link->prog->sleepable; > > /* detach BPF program, clean up used resources */ > > ops->release(link); > > bpf_prog_put(link->prog); > > } > > if (ops->dealloc_deferred) { > > /* schedule BPF link deallocation; if underlying BPF program > > * is sleepable, we need to first wait for RCU tasks trace > > * sync, then go through "classic" RCU grace period > > */ > > if (prog->sleepable) <------ HERE: New condition needed? > > call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp); > > else > > call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp); > > } else if (ops->dealloc) > > ops->dealloc(link); > > } > > After patching things locally to ensure that call_rcu_tasks_trace() is > always used in these three places I was unable to induce a KASAN bug > to occur whereas before it happened pretty much every time I ran > ./sys-execprog within a minute or so. > > I'm a bit unsure about the actual conditions under which > call_rcu_tasks_trace() should be used here though. Should there perhaps > be another condition such as `preemptable` which is used to determine > if call_rcu_tasks_trace() or call_rcu() should be used to free > links/programs? Is there any harm in just using call_rcu_tasks_trace() > every time in combination with rcu_trace_implies_rcu_gp() like it is > in bpf_link_defer_dealloc_mult_rcu_gp()? > > > static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)? > > { > > if (rcu_trace_implies_rcu_gp()) > > bpf_link_defer_dealloc_rcu_gp(rcu); > > else > > call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp); > > } > > - Jordan