Re: [PATCH bpf-next v2] Add notrace to queued_spin_lock_slowpath

Siddharth Chintamaneni <sidchintamaneni@xxxxxxxxx> · Mon, 22 Apr 2024 19:20:18 -0400

On Mon, 22 Apr 2024 at 13:13, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Apr 22, 2024 at 12:47 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> >
> > On Sun, Apr 21, 2024 at 07:43:36PM -0400, Siddharth Chintamaneni wrote:
> > > This patch is to prevent deadlocks when multiple bpf
> > > programs are attached to queued_spin_locks functions. This issue is similar
> > > to what is already discussed[1] before with the spin_lock helpers.
> > >
> > > The addition of notrace macro to the queued_spin_locks
> > > has been discussed[2] when bpf_spin_locks are introduced.
> > >
> > > [1] https://lore.kernel.org/bpf/CAE5sdEigPnoGrzN8WU7Tx-h-iFuMZgW06qp0KHWtpvoXxf1OAQ@xxxxxxxxxxxxxx/#r
> > > [2] https://lore.kernel.org/all/20190117011629.efxp7abj4bpf5yco@ast-mbp/t/#maf05c4d71f935f3123013b7ed410e4f50e9da82c
> > >
> > > Fixes: d83525ca62cf ("bpf: introduce bpf_spin_lock")
> > > Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@xxxxxx>
> > > ---
> > >  kernel/locking/qspinlock.c                    |  2 +-
> > >  .../bpf/prog_tests/tracing_failure.c          | 24 +++++++++++++++++++
> > >  .../selftests/bpf/progs/tracing_failure.c     |  6 +++++
> > >  3 files changed, 31 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > > index ebe6b8ec7cb3..4d46538d8399 100644
> > > --- a/kernel/locking/qspinlock.c
> > > +++ b/kernel/locking/qspinlock.c
> > > @@ -313,7 +313,7 @@ static __always_inline u32  __pv_wait_head_or_lock(struct qspinlock *lock,
> > >   * contended             :    (*,x,y) +--> (*,0,0) ---> (*,0,1) -'  :
> > >   *   queue               :         ^--'                             :
> > >   */
> > > -void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> > > +notrace void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> >
> > we did the same for bpf spin lock helpers, which is fine, but I wonder
> > removing queued_spin_lock_slowpath from traceable functions could break
> > some scripts (even though many probably use contention tracepoints..)
> >
> > maybe we could have a list of helpers/kfuncs that could call spin lock
> > and deny bpf program to load/attach to queued_spin_lock_slowpath
> > if it calls anything from that list
>
> We can filter out many such functions, but the possibility of deadlock
> will still exist.
> Adding notrace here won't help much,
> since there are tracepoints in there: trace_contention_begin/end
> which are quite useful and we should still allow bpf to use them.
> I think the only bullet proof way is to detect deadlocks at runtime.
> I'm working on such "try hard to spin_lock but abort if it deadlocks."

I agree with the point that notracing all the functions will not
resolve the issue. I could also find a scenario where BPF programs
will end up in a deadlock easily by using bpf_map_pop_elem and
bpf_map_push_elem helper functions called from two different BPF
programs accessing the same map. Here are some issues raised by syzbot
[2, 3].

 I also believe that a BPF program can end up in a deadlock scenario
without any assistance from the second BPF program, like described
above. The runtime solution sounds like a better fit to address this
problem, unless there is a BPF program that should definitely run for
the performance or security of the system (like an LSM hook or a
nested scheduling type program as mentioned here [1]).

In those cases, the user assumes that these BPF programs will always
trigger. So, to address these types of issues, we are currently
working on a helper's function callgraph based approach so that the
verifier gets the ability to make a decision during load time on
whether to load it or not, ensuring that if a BPF program is attached,
it will be triggered.

[1] https://lore.kernel.org/all/a15f6a20-c902-4057-a1a9-8259a225bb8b@xxxxxxxxx/
[2] https://lore.kernel.org/lkml/0000000000004aa700061379547e@xxxxxxxxxx/
[3] https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@xxxxxxxxxx/

PS. We are following a similar approach to solve the stackoverflow
problem with nesting.

Thanks,
Siddharth