Re: [RFC PATCH] bpf: Prevent recursive deadlocks in BPF programs attached to spin lock helpers using fentry/fexit

On Sun, 4 Feb 2024 at 14:09, Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 2/2/24 4:21 PM, Siddharth Chintamaneni wrote:
> > On Tue, 30 Jan 2024 at 04:25, Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> >> On Wed, Jan 24, 2024 at 10:43:32AM -0500, Siddharth Chintamaneni wrote:
> >>> While we were working on some experiments with the BPF trampoline, we
> >>> came across a possible deadlock scenario.
> >>>
> >>> A deadlock happens when two nested BPF programs try to acquire the
> >>> same lock, i.e., if a BPF program is attached using fexit to
> >>> bpf_spin_lock or using fentry to bpf_spin_unlock, and it then
> >>> attempts to acquire the same lock as the previous BPF program, a
> >>> deadlock arises.
> >>>
> >>> Here is an example:
> >>>
> >>> SEC("fentry/bpf_spin_unlock")
> >>> int BPF_PROG(fentry_2)
> >>> {
> >>>    bpf_spin_lock(&x->lock);
> >>>    bpf_spin_unlock(&x->lock);
> >>>    return 0;
> >>> }
> >>>
> >>> SEC("fentry/xxx")
> >>> int BPF_PROG(fentry_1)
> >>> {
> >>>    bpf_spin_lock(&x->lock);
> >>>    bpf_spin_unlock(&x->lock);
> >>>    return 0;
> >>> }
> >> hi,
> >> looks like a valid issue, could you add a selftest for that?
> > Hello,
> > I have added a selftest for the deadlock scenario.
> >
> >> I wonder if we could restrict just the programs that use bpf_spin_lock/bpf_spin_unlock
> >> helpers? I'm not sure there's any useful use case for tracing spin lock helpers,
> >> but I think we should at least try this before we deny it completely.
> >>
> > If we only restrict programs (attached to the spin lock helpers) that
> > themselves use the bpf_spin_lock/unlock helpers, there could still be a
> > scenario where a helper function called from within such a program has
> > another BPF program attached to it that tries to acquire the same lock.
> >
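To make that nesting concrete, here is a rough, hypothetical sketch
(the program names, the map, and the traced helper are made up for
illustration; this is not from the patch or the selftest). prog_a never
uses the spin lock helpers itself, so a usage-based restriction would
still load it, yet a second program attached to a helper it calls can
take the same lock while it is held:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct val {
	struct bpf_spin_lock lock;
	int data;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct val);
} lock_map SEC(".maps");

/* Would pass a "does it call bpf_spin_lock/unlock?" check: it does not. */
SEC("fexit/bpf_spin_lock")
int BPF_PROG(prog_a)
{
	/* Runs while the lock taken by the traced bpf_spin_lock() call is
	 * still held; calls another helper that can itself be traced. */
	bpf_ktime_get_ns();
	return 0;
}

SEC("fentry/bpf_ktime_get_ns")
int BPF_PROG(prog_b)
{
	int key = 0;
	struct val *v = bpf_map_lookup_elem(&lock_map, &key);

	/* If this is the same lock the traced call just acquired,
	 * this spins forever -> deadlock. */
	if (v) {
		bpf_spin_lock(&v->lock);
		bpf_spin_unlock(&v->lock);
	}
	return 0;
}

char LICENSE[] SEC("license") = "GPL";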
> >>> To prevent these cases, a simple fix could be adding these helpers to
> >>> the denylist in the verifier. This fix will prevent such BPF programs
> >>> from being loaded by the verifier.
> >>>
> >>> Previously, a similar solution was proposed to prevent recursion:
> >>> https://lore.kernel.org/lkml/20230417154737.12740-2-laoar.shao@xxxxxxxxx/
> >> the difference is that __rcu_read_lock/__rcu_read_unlock are called unconditionally
> >> (always) when executing a bpf tracing probe, while the problem you described above
> >> only affects programs calling the spin lock helpers (on the same spin lock)
> >>
> >>> Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@xxxxxx>
> >>> ---
> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>> index 65f598694d55..8f1834f27f81 100644
> >>> --- a/kernel/bpf/verifier.c
> >>> +++ b/kernel/bpf/verifier.c
> >>> @@ -20617,6 +20617,10 @@ BTF_ID(func, preempt_count_sub)
> >>>   BTF_ID(func, __rcu_read_lock)
> >>>   BTF_ID(func, __rcu_read_unlock)
> >>>   #endif
> >>> +#if defined(CONFIG_DYNAMIC_FTRACE)
> >> why the CONFIG_DYNAMIC_FTRACE dependency?
> > As we described in the selftests, nesting of multiple BPF programs can
> > only happen with fentry/fexit programs when DYNAMIC_FTRACE is enabled.
> > When DYNAMIC_FTRACE is disabled, a BPF program cannot be attached to
> > any helper functions.
> >> jirka
> >>
> >>> +BTF_ID(func, bpf_spin_lock)
> >>> +BTF_ID(func, bpf_spin_unlock)
> >>> +#endif
> >>>   BTF_SET_END(btf_id_deny)
>
> Currently, we already have 'notrace' on bpf_spin_lock
> and bpf_spin_unlock:
>
> notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
> {
>          __bpf_spin_lock_irqsave(lock);
>          return 0;
> }
> notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
> {
>          __bpf_spin_unlock_irqrestore(lock);
>          return 0;
> }
>
> But unfortunately, the BPF_CALL_* macros put notrace on the static
> inline functions ____bpf_spin_lock()/____bpf_spin_unlock(), and not
> on the actual bpf_spin_lock()/bpf_spin_unlock() functions.
>
> I think the following is a better fix and reflects the original
> intention:

My bad, I tested the fix with the notrace macro incorrectly and didn't
realize that the attribute ends up on the inlined inner function. You
are right, and I agree that the proposed solution fixes this and
restores the original intent.
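To spell it out for the archives, here is a simplified, hand-expanded
sketch (not the literal preprocessor output; the argument padding and
__force casts are trimmed) of what "notrace BPF_CALL_1(bpf_spin_lock,
...)" becomes. The attribute binds to the always-inline inner helper,
while the outer bpf_spin_lock() symbol that fentry/fexit attach to
stays traceable:

/* simplified expansion of notrace BPF_CALL_1(bpf_spin_lock, ...) */
notrace static __always_inline
u64 ____bpf_spin_lock(struct bpf_spin_lock *lock);   /* notrace lands here */

typedef u64 (*btf_bpf_spin_lock)(struct bpf_spin_lock *lock);

u64 bpf_spin_lock(u64 lock, u64 r2, u64 r3, u64 r4, u64 r5);
u64 bpf_spin_lock(u64 lock, u64 r2, u64 r3, u64 r4, u64 r5) /* still traceable */
{
	return ((btf_bpf_spin_lock)____bpf_spin_lock)((struct bpf_spin_lock *)lock);
}

static __always_inline
u64 ____bpf_spin_lock(struct bpf_spin_lock *lock)
{
	__bpf_spin_lock_irqsave(lock);
	return 0;
}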



>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index fee070b9826e..779f8ee71607 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -566,6 +566,25 @@ static inline bool insn_is_zext(const struct bpf_insn *insn)
>   #define BPF_CALL_4(name, ...)  BPF_CALL_x(4, name, __VA_ARGS__)
>   #define BPF_CALL_5(name, ...)  BPF_CALL_x(5, name, __VA_ARGS__)
>
> +#define NOTRACE_BPF_CALL_x(x, name, ...)                                              \
> +       static __always_inline                                                 \
> +       u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__));   \
> +       typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
> +       notrace u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__));         \
> +       notrace u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__))          \
> +       {                                                                      \
> +               return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
> +       }                                                                      \
> +       static __always_inline                                                 \
> +       u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__))
> +
> +#define NOTRACE_BPF_CALL_0(name, ...)  NOTRACE_BPF_CALL_x(0, name, __VA_ARGS__)
> +#define NOTRACE_BPF_CALL_1(name, ...)  NOTRACE_BPF_CALL_x(1, name, __VA_ARGS__)
> +#define NOTRACE_BPF_CALL_2(name, ...)  NOTRACE_BPF_CALL_x(2, name, __VA_ARGS__)
> +#define NOTRACE_BPF_CALL_3(name, ...)  NOTRACE_BPF_CALL_x(3, name, __VA_ARGS__)
> +#define NOTRACE_BPF_CALL_4(name, ...)  NOTRACE_BPF_CALL_x(4, name, __VA_ARGS__)
> +#define NOTRACE_BPF_CALL_5(name, ...)  NOTRACE_BPF_CALL_x(5, name, __VA_ARGS__)
> +
>   #define bpf_ctx_range(TYPE, MEMBER)                                            \
>          offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1
>   #define bpf_ctx_range_till(TYPE, MEMBER1, MEMBER2)                             \
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 4db1c658254c..87136e27a99a 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -334,7 +334,7 @@ static inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock)
>          __this_cpu_write(irqsave_flags, flags);
>   }
>
> -notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
> +NOTRACE_BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
>   {
>          __bpf_spin_lock_irqsave(lock);
>          return 0;
> @@ -357,7 +357,7 @@ static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock)
>          local_irq_restore(flags);
>   }
>
> -notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
> +NOTRACE_BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
>   {
>          __bpf_spin_unlock_irqrestore(lock);
>          return 0;
>
>
> Macros NOTRACE_BPF_CALL_*() could be consolidated with BPF_CALL_*(), but I think
> a separate macro might be easier to understand.
>
> > Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@xxxxxx>
> > ---
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 65f598694d55..ffc2515195f1 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -20617,6 +20617,10 @@ BTF_ID(func, preempt_count_sub)
> >   BTF_ID(func, __rcu_read_lock)
> >   BTF_ID(func, __rcu_read_unlock)
> >   #endif
> > +#ifdef CONFIG_DYNAMIC_FTRACE
> > +BTF_ID(func, bpf_spin_lock)
> > +BTF_ID(func, bpf_spin_unlock)
> > +#endif
> >   BTF_SET_END(btf_id_deny)
> >
> >   static bool can_be_sleepable(struct bpf_prog *prog)
> [...]



