On Sun, 4 Feb 2024 at 14:09, Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 2/2/24 4:21 PM, Siddharth Chintamaneni wrote:
> > On Tue, 30 Jan 2024 at 04:25, Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> >> On Wed, Jan 24, 2024 at 10:43:32AM -0500, Siddharth Chintamaneni wrote:
> >>> While we were working on some experiments with BPF trampolines, we came
> >>> across a deadlock scenario that could happen.
> >>>
> >>> A deadlock happens when two nested BPF programs try to acquire the
> >>> same lock, i.e., if a BPF program is attached using fexit to
> >>> bpf_spin_lock or using fentry to bpf_spin_unlock, and it then
> >>> attempts to acquire the same lock as the previous BPF program, a
> >>> deadlock situation arises.
> >>>
> >>> Here is an example:
> >>>
> >>> SEC("fentry/bpf_spin_unlock")
> >>> int fentry_2()
> >>> {
> >>>         bpf_spin_lock(&x->lock);
> >>>         bpf_spin_unlock(&x->lock);
> >>> }
> >>>
> >>> SEC("fentry/xxx")
> >>> int fentry_1()
> >>> {
> >>>         bpf_spin_lock(&x->lock);
> >>>         bpf_spin_unlock(&x->lock);
> >>> }
> >>
> >> hi,
> >> looks like a valid issue, could you add a selftest for that?
> >
> > Hello,
> > I have added a selftest for the deadlock scenario.
> >
> >> I wonder if we could restrict just the programs that use the
> >> bpf_spin_lock/bpf_spin_unlock helpers? I'm not sure there's any useful
> >> use case for tracing the spin lock helpers, but I think we should at
> >> least try this before we deny it completely
> >>
> > If we only restrict programs (attached to the spinlock helpers) that
> > themselves use the bpf_spin_lock/unlock helpers, there could still be a
> > scenario where another helper function called within the program has a
> > BPF program attached to it that tries to acquire the same lock.
> >
> >>> To prevent these cases, a simple fix could be adding these helpers to
> >>> the denylist in the verifier. This fix will prevent such BPF programs
> >>> from being loaded by the verifier.
> >>>
> >>> Previously, a similar solution was proposed to prevent recursion:
> >>> https://lore.kernel.org/lkml/20230417154737.12740-2-laoar.shao@xxxxxxxxx/
> >>
> >> the difference is that __rcu_read_lock/__rcu_read_unlock are called
> >> unconditionally (always) when executing a bpf tracing probe, while the
> >> problem you describe above exists only for programs calling the spin
> >> lock helpers (on the same spin lock)
> >>
> >>> Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@xxxxxx>
> >>> ---
> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>> index 65f598694d55..8f1834f27f81 100644
> >>> --- a/kernel/bpf/verifier.c
> >>> +++ b/kernel/bpf/verifier.c
> >>> @@ -20617,6 +20617,10 @@ BTF_ID(func, preempt_count_sub)
> >>>  BTF_ID(func, __rcu_read_lock)
> >>>  BTF_ID(func, __rcu_read_unlock)
> >>>  #endif
> >>> +#if defined(CONFIG_DYNAMIC_FTRACE)
> >>
> >> why the CONFIG_DYNAMIC_FTRACE dependency?
> >
> > As we described in the selftests, nesting of multiple BPF programs
> > could only happen with fentry/fexit programs when DYNAMIC_FTRACE is
> > enabled. In other scenarios, when DYNAMIC_FTRACE is disabled, a BPF
> > program cannot be attached to any helper functions.
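To make the scenario above concrete, the selftest I mentioned boils down
to something like the sketch below. The map layout and program names here
are illustrative rather than the exact selftest source; the point is only
the nesting, where the program traced onto bpf_spin_unlock grabs the lock
that the outer program is still holding:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  struct val {
          int counter;
          struct bpf_spin_lock lock;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __uint(max_entries, 1);
          __type(key, int);
          __type(value, struct val);
  } lock_map SEC(".maps");

  /* Outer program: attached to some traced kernel function ("xxx" is a
   * placeholder), takes the lock and releases it. */
  SEC("fentry/xxx")
  int BPF_PROG(fentry_1)
  {
          int key = 0;
          struct val *x = bpf_map_lookup_elem(&lock_map, &key);

          if (!x)
                  return 0;
          bpf_spin_lock(&x->lock);
          x->counter++;
          bpf_spin_unlock(&x->lock); /* fentry_2 fires here, lock still held */
          return 0;
  }

  /* Nested program: attached to the unlock helper itself, tries to take
   * the same lock that fentry_1 has not released yet. */
  SEC("fentry/bpf_spin_unlock")
  int BPF_PROG(fentry_2)
  {
          int key = 0;
          struct val *x = bpf_map_lookup_elem(&lock_map, &key);

          if (!x)
                  return 0;
          bpf_spin_lock(&x->lock); /* deadlock: lock is held by fentry_1 */
          bpf_spin_unlock(&x->lock);
          return 0;
  }

  char _license[] SEC("license") = "GPL";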
> >> jirka
> >>
> >>> +BTF_ID(func, bpf_spin_lock)
> >>> +BTF_ID(func, bpf_spin_unlock)
> >>> +#endif
> >>>  BTF_SET_END(btf_id_deny)
>
> Currently, we already have 'notrace' marked on bpf_spin_lock
> and bpf_spin_unlock:
>
>   notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
>   {
>           __bpf_spin_lock_irqsave(lock);
>           return 0;
>   }
>   notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
>   {
>           __bpf_spin_unlock_irqrestore(lock);
>           return 0;
>   }
>
> But unfortunately, the BPF_CALL_* macros apply notrace to the static
> inline functions ____bpf_spin_lock()/____bpf_spin_unlock(), and not
> to the static functions bpf_spin_lock()/bpf_spin_unlock().
>
> I think the following is a better fix and reflects the original
> intention:

My bad, I somehow incorrectly tested the fix using the notrace macro
and didn't realize that it is because of inlining. You are right, and
I agree that the proposed solution fixes the unintended bug.

>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index fee070b9826e..779f8ee71607 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -566,6 +566,25 @@ static inline bool insn_is_zext(const struct bpf_insn *insn)
>  #define BPF_CALL_4(name, ...)	BPF_CALL_x(4, name, __VA_ARGS__)
>  #define BPF_CALL_5(name, ...)	BPF_CALL_x(5, name, __VA_ARGS__)
>
> +#define NOTRACE_BPF_CALL_x(x, name, ...)					\
> +	static __always_inline							\
> +	u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__));	\
> +	typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
> +	notrace u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__));	\
> +	notrace u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__))	\
> +	{									\
> +		return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
> +	}									\
> +	static __always_inline							\
> +	u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__))
> +
> +#define NOTRACE_BPF_CALL_0(name, ...)	NOTRACE_BPF_CALL_x(0, name, __VA_ARGS__)
> +#define NOTRACE_BPF_CALL_1(name, ...)	NOTRACE_BPF_CALL_x(1, name, __VA_ARGS__)
> +#define NOTRACE_BPF_CALL_2(name, ...)	NOTRACE_BPF_CALL_x(2, name, __VA_ARGS__)
> +#define NOTRACE_BPF_CALL_3(name, ...)	NOTRACE_BPF_CALL_x(3, name, __VA_ARGS__)
> +#define NOTRACE_BPF_CALL_4(name, ...)	NOTRACE_BPF_CALL_x(4, name, __VA_ARGS__)
> +#define NOTRACE_BPF_CALL_5(name, ...)	NOTRACE_BPF_CALL_x(5, name, __VA_ARGS__)
> +
>  #define bpf_ctx_range(TYPE, MEMBER)						\
>  	offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1
>  #define bpf_ctx_range_till(TYPE, MEMBER1, MEMBER2)				\
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 4db1c658254c..87136e27a99a 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -334,7 +334,7 @@ static inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock)
>  	__this_cpu_write(irqsave_flags, flags);
>  }
>
> -notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
> +NOTRACE_BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
>  {
>  	__bpf_spin_lock_irqsave(lock);
>  	return 0;
> @@ -357,7 +357,7 @@ static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock)
>  	local_irq_restore(flags);
>  }
>
> -notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
> +NOTRACE_BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
>  {
>  	__bpf_spin_unlock_irqrestore(lock);
>  	return 0;
>
>
> Macros NOTRACE_BPF_CALL_*() could be consolidated with BPF_CALL_*(), but I think
> a separate macro might be easier to understand.
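To spell out why the existing annotation ends up being a no-op: hand-expanding
"notrace BPF_CALL_1(bpf_spin_lock, ...)" gives roughly the following (simplified,
not the exact preprocessor output). The notrace binds to the first declaration
the macro emits, i.e. the __always_inline inner function, while the real
bpf_spin_lock() entry point that fentry/fexit can patch stays traceable:

  /* notrace lands on this prototype, where it has no effect because the
   * function is always inlined anyway ... */
  notrace static __always_inline
  u64 ____bpf_spin_lock(struct bpf_spin_lock *lock);

  /* ... while the ftrace-patchable, BTF-visible entry point gets no notrace. */
  u64 bpf_spin_lock(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
  u64 bpf_spin_lock(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
  {
          return ____bpf_spin_lock((struct bpf_spin_lock *)(unsigned long)r1);
  }

  static __always_inline
  u64 ____bpf_spin_lock(struct bpf_spin_lock *lock)
  {
          __bpf_spin_lock_irqsave(lock);
          return 0;
  }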
> >
> > Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@xxxxxx>
> > ---
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 65f598694d55..ffc2515195f1 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -20617,6 +20617,10 @@ BTF_ID(func, preempt_count_sub)
> >  BTF_ID(func, __rcu_read_lock)
> >  BTF_ID(func, __rcu_read_unlock)
> >  #endif
> > +#ifdef CONFIG_DYNAMIC_FTRACE
> > +BTF_ID(func, bpf_spin_lock)
> > +BTF_ID(func, bpf_spin_unlock)
> > +#endif
> >  BTF_SET_END(btf_id_deny)
> >
> >  static bool can_be_sleepable(struct bpf_prog *prog)
> [...]
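On the testing side: with a denylist entry like the hunk above, the rejection
happens at program load time (check_attach_btf_id() refuses the denied
attach_btf_id), so the selftest can simply assert that loading the object
fails; with the NOTRACE_BPF_CALL_1() variant the helper instead loses its
ftrace-patchable entry, so the fentry/fexit attachment cannot succeed. A
rough user-space sketch of the load-time expectation (the object file name
is a placeholder, not the actual selftest):

  #include <stdio.h>
  #include <bpf/libbpf.h>

  int main(void)
  {
          struct bpf_object *obj;
          int err;

          /* placeholder name for an object containing an fentry program
           * targeting bpf_spin_lock/bpf_spin_unlock */
          obj = bpf_object__open_file("deadlock_spin_lock.bpf.o", NULL);
          if (!obj)
                  return 1;

          /* with bpf_spin_lock/bpf_spin_unlock in btf_id_deny, the verifier
           * rejects the program, so load is expected to fail */
          err = bpf_object__load(obj);
          printf("load returned %d (failure expected)\n", err);

          bpf_object__close(obj);
          return err ? 0 : 1;
  }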