On Tue, Mar 23, 2021 at 01:26:36PM +0100, Toke Høiland-Jørgensen wrote:
> Hi Paul
>
> Magnus and I have been debugging an issue where close() on a bpf_link
> file descriptor would hang indefinitely when the system was under load
> on a kernel compiled with CONFIG_PREEMPT=y, and it seems to be related
> to synchronize_rcu_tasks(), so I'm hoping you can help us with it.
>
> The issue is triggered reliably by loading up a system with network
> traffic (causing 100% softirq CPU load on one or more cores), and then
> attaching an freplace bpf_link and closing it again. The close() will
> hang until the network traffic load is lowered.
>
> Digging further, it appears that the hang happens in
> synchronize_rcu_tasks(), as seen by running a bpftrace script like:
>
> bpftrace -e 'kprobe:synchronize_rcu_tasks { @start = nsecs; printf("enter\n"); } kretprobe:synchronize_rcu_tasks { printf("exit after %d ms\n", (nsecs - @start) / 1000000); }'
> Attaching 2 probes...
> enter
> exit after 54 ms
> enter
> exit after 3249 ms
>
> (the two enter/exit pairs are, respectively, from an unloaded system,
> and from a loaded system where I stopped the network traffic after a
> couple of seconds).
>
> The call to synchronize_rcu_tasks() happens in bpf_trampoline_put():
>
> https://elixir.bootlin.com/linux/latest/source/kernel/bpf/trampoline.c#L376
>
> And because it does this while holding trampoline_mutex, even deferring
> the put to a worker (as a previously applied-then-reverted patch did[0])
> doesn't help: that'll fix the initial hang on close(), but any
> subsequent use of BPF trampolines will then be blocked because of the
> mutex.
>
> Also, if I just keep the network traffic running I will eventually get a
> kernel panic with:
>
> kernel:[44348.426312] Kernel panic - not syncing: hung_task: blocked tasks
>
> I've created a reproducer for the issue here:
> https://github.com/xdp-project/bpf-examples/tree/master/bpf-link-hang
>
> To compile simply do this (needs a recent llvm/clang for compiling the BPF program):
>
> $ git clone --recurse-submodules https://github.com/xdp-project/bpf-examples
> $ cd bpf-examples/bpf-link-hang
> $ make
> $ sudo ./bpf-link-hang
>
> you'll need to load up the system to trigger the hang; I'm using pktgen
> from a separate machine to do this.
>
> My question is, of course, as ever, What Is To Be Done? Is it expected
> that synchronize_rcu_tasks() can hang indefinitely on a PREEMPT system,
> or can this be fixed? And if it is expected, how can the BPF code be
> fixed so it doesn't deadlock because of this?
>
> Hoping you can help us with this - many thanks in advance! :)

Let me start with the usual question... Is the network traffic intense
enough that one of the CPUs might remain in a loop handling softirqs
indefinitely?

If so, does the (untested, probably does not build) patch below help?

Please note that this is only a diagnostic patch. It has the serious
side effect of making __do_softirq() and anything that calls it implicitly
noinstr. But it might at least be a decent starting point for a real fix.
Or might be part of the real fix, who knows?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b06be5..e21e7b0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -242,6 +242,7 @@ void rcu_softirq_qs(void)
 {
 	rcu_qs();
 	rcu_preempt_deferred_qs(current);
+	rcu_tasks_qs(current, true);
 }
 
 /*