On 2/6/24 4:21 PM, Siddharth Chintamaneni wrote:
> On Sun, 4 Feb 2024 at 14:09, Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>>
>> On 2/2/24 4:21 PM, Siddharth Chintamaneni wrote:
>>> On Tue, 30 Jan 2024 at 04:25, Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>>>>
>>>> On Wed, Jan 24, 2024 at 10:43:32AM -0500, Siddharth Chintamaneni wrote:
>>>>> While we were working on some experiments with BPF trampoline, we came
>>>>> across a deadlock scenario that could happen. The deadlock happens when
>>>>> two nested BPF programs try to acquire the same lock, i.e., if a BPF
>>>>> program is attached using fexit to bpf_spin_lock or using fentry to
>>>>> bpf_spin_unlock, and it then attempts to acquire the same lock as the
>>>>> previous BPF program, a deadlock arises. Here is an example:
>>>>>
>>>>> SEC("fentry/bpf_spin_unlock")
>>>>> int fentry_2()
>>>>> {
>>>>>     bpf_spin_lock(&x->lock);
>>>>>     bpf_spin_unlock(&x->lock);
>>>>> }
>>>>>
>>>>> SEC("fentry/xxx")
>>>>> int fentry_1()
>>>>> {
>>>>>     bpf_spin_lock(&x->lock);
>>>>>     bpf_spin_unlock(&x->lock);
>>>>> }
>>>>
>>>> hi,
>>>> looks like a valid issue, could you add a selftest for that?
>>>
>>> Hello,
>>> I have added a selftest for the deadlock scenario.
>>>
>>>> I wonder if we could restrict just the programs that use the
>>>> bpf_spin_lock/bpf_spin_unlock helpers? I'm not sure there's any useful
>>>> use case for tracing the spin lock helpers, but I think we should at
>>>> least try this before we deny it completely.
>>>
>>> If we only restrict programs (attached to the spinlock helpers) that use
>>> the bpf_spin_lock/unlock helpers, there could still be a scenario where
>>> a helper function called within the program has a BPF program attached
>>> that tries to acquire the same lock.
>>>
>>>>> To prevent these cases, a simple fix could be adding these helpers to
>>>>> the denylist in the verifier. This fix will prevent such BPF programs
>>>>> from being loaded by the verifier. Previously, a similar solution was
>>>>> proposed to prevent recursion:
>>>>> https://lore.kernel.org/lkml/20230417154737.12740-2-laoar.shao@xxxxxxxxx/
>>>>
>>>> the difference is that __rcu_read_lock/__rcu_read_unlock are called
>>>> unconditionally (always) when executing a bpf tracing probe; the
>>>> problem you described above exists only for programs calling the spin
>>>> lock helpers (on the same spin lock)
>>>>
>>>>> Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@xxxxxx>
>>>>> ---
>>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>>> index 65f598694d55..8f1834f27f81 100644
>>>>> --- a/kernel/bpf/verifier.c
>>>>> +++ b/kernel/bpf/verifier.c
>>>>> @@ -20617,6 +20617,10 @@ BTF_ID(func, preempt_count_sub)
>>>>>  BTF_ID(func, __rcu_read_lock)
>>>>>  BTF_ID(func, __rcu_read_unlock)
>>>>>  #endif
>>>>> +#if defined(CONFIG_DYNAMIC_FTRACE)
>>>>
>>>> why the CONFIG_DYNAMIC_FTRACE dependency?
>>>
>>> As we described in the selftests, nesting of multiple BPF programs can
>>> only happen with fentry/fexit programs when DYNAMIC_FTRACE is enabled.
>>> In other scenarios, when DYNAMIC_FTRACE is disabled, a BPF program
>>> cannot be attached to any helper functions.
>>>
>>>> jirka
>>>>
>>>>> +BTF_ID(func, bpf_spin_lock)
>>>>> +BTF_ID(func, bpf_spin_unlock)
>>>>> +#endif
>>>>>  BTF_SET_END(btf_id_deny)
>>
>> Currently, we already have 'notrace' marked on bpf_spin_lock and
>> bpf_spin_unlock:
>>
>>   notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
>>   {
>>       __bpf_spin_lock_irqsave(lock);
>>       return 0;
>>   }
>>
>>   notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
>>   {
>>       __bpf_spin_unlock_irqrestore(lock);
>>       return 0;
>>   }
>>
>> But unfortunately, the BPF_CALL_* macros apply notrace to the static
>> inline functions ____bpf_spin_lock()/____bpf_spin_unlock(), and not to
>> the functions bpf_spin_lock()/bpf_spin_unlock() themselves. I think the
>> following is a better fix and reflects the original intention:
>
> My bad, I somehow incorrectly tested the fix using the notrace macro and
> didn't realize that it is because of inlining. You are right, and I agree
> that the proposed solution fixes the unintended bug.
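To make the inlining point concrete, here is a rough hand-expansion of what
the current "notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)"
turns into (a simplified sketch only; the real __BPF_MAP()/__BPF_REG()
argument plumbing and casts in include/linux/filter.h are elided):

  /* "notrace" prefixes the whole macro expansion, so it lands on this
   * forward declaration of the always-inlined inner helper ...
   */
  notrace static __always_inline
  u64 ____bpf_spin_lock(struct bpf_spin_lock *lock);

  typedef u64 (*btf_bpf_spin_lock)(struct bpf_spin_lock *lock);

  /* ... while the BTF-visible wrapper that fentry/fexit actually attach to
   * gets no attribute and keeps its ftrace entry.
   */
  u64 bpf_spin_lock(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
  u64 bpf_spin_lock(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
  {
          return ((btf_bpf_spin_lock)____bpf_spin_lock)(
                          (struct bpf_spin_lock *)(unsigned long)r1);
  }

  /* The body written in helpers.c ends up here; since it is always inlined
   * into the wrapper above, marking it notrace changes nothing in practice.
   */
  static __always_inline
  u64 ____bpf_spin_lock(struct bpf_spin_lock *lock)
  {
          __bpf_spin_lock_irqsave(lock);
          return 0;
  }

So the attribute never reaches the symbol the tracing programs attach to,
which is exactly what the NOTRACE_BPF_CALL_x() macro below fixes by moving
notrace onto the wrapper.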
Thanks for the confirmation. I will send a formal patch later.
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index fee070b9826e..779f8ee71607 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -566,6 +566,25 @@ static inline bool insn_is_zext(const struct bpf_insn *insn)
>>  #define BPF_CALL_4(name, ...)	BPF_CALL_x(4, name, __VA_ARGS__)
>>  #define BPF_CALL_5(name, ...)	BPF_CALL_x(5, name, __VA_ARGS__)
>>
>> +#define NOTRACE_BPF_CALL_x(x, name, ...)					\
>> +	static __always_inline							\
>> +	u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__));	\
>> +	typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
>> +	notrace u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__));	\
>> +	notrace u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__))	\
>> +	{									\
>> +		return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
>> +	}									\
>> +	static __always_inline							\
>> +	u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__))
>> +
>> +#define NOTRACE_BPF_CALL_0(name, ...)	NOTRACE_BPF_CALL_x(0, name, __VA_ARGS__)
>> +#define NOTRACE_BPF_CALL_1(name, ...)	NOTRACE_BPF_CALL_x(1, name, __VA_ARGS__)
>> +#define NOTRACE_BPF_CALL_2(name, ...)	NOTRACE_BPF_CALL_x(2, name, __VA_ARGS__)
>> +#define NOTRACE_BPF_CALL_3(name, ...)	NOTRACE_BPF_CALL_x(3, name, __VA_ARGS__)
>> +#define NOTRACE_BPF_CALL_4(name, ...)	NOTRACE_BPF_CALL_x(4, name, __VA_ARGS__)
>> +#define NOTRACE_BPF_CALL_5(name, ...)	NOTRACE_BPF_CALL_x(5, name, __VA_ARGS__)
>> +
>>  #define bpf_ctx_range(TYPE, MEMBER)						\
>>  	offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1
>>  #define bpf_ctx_range_till(TYPE, MEMBER1, MEMBER2)				\
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 4db1c658254c..87136e27a99a 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -334,7 +334,7 @@ static inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock)
>>  	__this_cpu_write(irqsave_flags, flags);
>>  }
>>
>> -notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
>> +NOTRACE_BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
>>  {
>>  	__bpf_spin_lock_irqsave(lock);
>>  	return 0;
>> @@ -357,7 +357,7 @@ static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock)
>>  	local_irq_restore(flags);
>>  }
>>
>> -notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
>> +NOTRACE_BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
>>  {
>>  	__bpf_spin_unlock_irqrestore(lock);
>>  	return 0;
>>
>> Macros NOTRACE_BPF_CALL_*() could be consolidated with BPF_CALL_*(), but
>> I think a separate macro might be easier to understand.
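For completeness, here is a rough sketch of the kind of program pair that
hits the deadlock in the first place (this is not the selftest from the
series; the program names, the map layout, and the bpf_fentry_test1 attach
point are only illustrative). With bpf_spin_unlock() built through
NOTRACE_BPF_CALL_1(), the first program below should simply fail to attach,
since the helper no longer has an ftrace location:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  struct val {
          int cnt;
          struct bpf_spin_lock lock;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __uint(max_entries, 1);
          __type(key, int);
          __type(value, struct val);
  } lock_map SEC(".maps");

  /* Fires at the entry of bpf_spin_unlock(), i.e. while fentry_outer still
   * holds the lock, and then spins on that same lock -> AA deadlock.
   */
  SEC("fentry/bpf_spin_unlock")
  int BPF_PROG(fentry_nested)
  {
          int key = 0;
          struct val *v = bpf_map_lookup_elem(&lock_map, &key);

          if (!v)
                  return 0;
          bpf_spin_lock(&v->lock);
          v->cnt++;
          bpf_spin_unlock(&v->lock);
          return 0;
  }

  /* Ordinary tracing program taking the same lock; bpf_fentry_test1 is just
   * a convenient attach point commonly used by the BPF selftests.
   */
  SEC("fentry/bpf_fentry_test1")
  int BPF_PROG(fentry_outer)
  {
          int key = 0;
          struct val *v = bpf_map_lookup_elem(&lock_map, &key);

          if (!v)
                  return 0;
          bpf_spin_lock(&v->lock);
          v->cnt++;
          bpf_spin_unlock(&v->lock);      /* fentry_nested is invoked here */
          return 0;
  }

  char _license[] SEC("license") = "GPL";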
[...]