On Mon, Apr 22, 2024 at 4:20 PM Siddharth Chintamaneni
<sidchintamaneni@xxxxxxxxx> wrote:
>
> On Mon, 22 Apr 2024 at 13:13, Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Mon, Apr 22, 2024 at 12:47 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > >
> > > On Sun, Apr 21, 2024 at 07:43:36PM -0400, Siddharth Chintamaneni wrote:
> > > > This patch is to prevent deadlocks when multiple bpf
> > > > programs are attached to queued_spin_locks functions. This issue is similar
> > > > to what is already discussed[1] before with the spin_lock helpers.
> > > >
> > > > The addition of notrace macro to the queued_spin_locks
> > > > has been discussed[2] when bpf_spin_locks are introduced.
> > > >
> > > > [1] https://lore.kernel.org/bpf/CAE5sdEigPnoGrzN8WU7Tx-h-iFuMZgW06qp0KHWtpvoXxf1OAQ@xxxxxxxxxxxxxx/#r
> > > > [2] https://lore.kernel.org/all/20190117011629.efxp7abj4bpf5yco@ast-mbp/t/#maf05c4d71f935f3123013b7ed410e4f50e9da82c
> > > >
> > > > Fixes: d83525ca62cf ("bpf: introduce bpf_spin_lock")
> > > > Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@xxxxxx>
> > > > ---
> > > >  kernel/locking/qspinlock.c                    |  2 +-
> > > >  .../bpf/prog_tests/tracing_failure.c          | 24 +++++++++++++++++++
> > > >  .../selftests/bpf/progs/tracing_failure.c     |  6 +++++
> > > >  3 files changed, 31 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > > > index ebe6b8ec7cb3..4d46538d8399 100644
> > > > --- a/kernel/locking/qspinlock.c
> > > > +++ b/kernel/locking/qspinlock.c
> > > > @@ -313,7 +313,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
> > > >  * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -'  :
> > > >  * queue     :         ^--'                          :
> > > >  */
> > > > -void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> > > > +notrace void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> > >
> > > we did the same for bpf spin lock helpers, which is fine, but I wonder
> > > removing queued_spin_lock_slowpath from traceable functions could break
> > > some scripts (even though many probably use contention tracepoints..)
> > >
> > > maybe we could have a list of helpers/kfuncs that could call spin lock
> > > and deny bpf program to load/attach to queued_spin_lock_slowpath
> > > if it calls anything from that list
> >
> > We can filter out many such functions, but the possibility of deadlock
> > will still exist.
> > Adding notrace here won't help much,
> > since there are tracepoints in there: trace_contention_begin/end
> > which are quite useful and we should still allow bpf to use them.
> > I think the only bullet proof way is to detect deadlocks at runtime.
> > I'm working on such "try hard to spin_lock but abort if it deadlocks."
>
> I agree with the point that notracing all the functions will not
> resolve the issue. I could also find a scenario where BPF programs
> will end up in a deadlock easily by using bpf_map_pop_elem and
> bpf_map_push_elem helper functions called from two different BPF
> programs accessing the same map. Here are some issues raised by syzbot
> [2, 3].

ringbuf and stackqueue maps should probably be fixed now
similar to hashmap's __this_cpu_inc_return(*(htab->map_locked...) approach.
Both ringbuf and queue_stack can handle failure to lock.
That will address the issue spotted by these 2 syzbot reports.
Could you work on such patches?
The full run-time solution will take time to land
and may be too big to be backportable.
I'll certainly cc you on the patches when I send them.

> I also believe that a BPF program can end up in a deadlock scenario
> without any assistance from the second BPF program, like described
> above. The runtime solution sounds like a better fit to address this
> problem, unless there is a BPF program that should definitely run for
> the performance or security of the system (like an LSM hook or a
> nested scheduling type program as mentioned here [1]).

Right.
Certain bpf progs like tcp-bpf don't even have a recursion run-time counter,
because they have to nest and it's safe to nest.

> In those cases, the user assumes that these BPF programs will always
> trigger. So, to address these types of issues, we are currently
> working on a helper's function callgraph based approach so that the
> verifier gets the ability to make a decision during load time on
> whether to load it or not, ensuring that if a BPF program is attached,
> it will be triggered.

callgraph approach? Could you share more details?

>
> [1] https://lore.kernel.org/all/a15f6a20-c902-4057-a1a9-8259a225bb8b@xxxxxxxxx/
> [2] https://lore.kernel.org/lkml/0000000000004aa700061379547e@xxxxxxxxxx/
> [3] https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@xxxxxxxxxx/
>
> PS. We are following a similar approach to solve the stackoverflow
> problem with nesting.

Not following.
The stack overflow issue is being fixed by not using the kernel stack.
So each bpf prog will consume a tiny bit of stack to save
frame pointer, return address, and callee regs.