On Thu, Dec 12, 2024 at 4:41 PM Siddharth Chintamaneni <sidchintamaneni@xxxxxxxxx> wrote:
>
> On Thu, 12 Dec 2024 at 18:58, Priya Bala Govindasamy <pgovind2@xxxxxxx> wrote:
> >
> > BPF program types like kprobe and fentry can cause deadlocks in certain
> > situations. If a function takes a lock and one of these bpf programs is
> > hooked to some point in the function's critical section, and if the
> > bpf program tries to call the same function and take the same lock it will
> > lead to deadlock. These situations have been reported in the following
> > bug reports.
> >
> > In percpu_freelist -
> > Link: https://lore.kernel.org/bpf/CAADnVQLAHwsa+2C6j9+UC6ScrDaN9Fjqv1WjB1pP9AzJLhKuLQ@xxxxxxxxxxxxxx/T/
> > Link: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@xxxxxxxxxxxxxx/T/
> > In bpf_lru_list -
> > Link: https://lore.kernel.org/bpf/CAPPBnEajj+DMfiR_WRWU5=6A7KKULdB5Rob_NJopFLWF+i9gCA@xxxxxxxxxxxxxx/T/
> > Link: https://lore.kernel.org/bpf/CAPPBnEZQDVN6VqnQXvVqGoB+ukOtHGZ9b9U0OLJJYvRoSsMY_g@xxxxxxxxxxxxxx/T/
> > Link: https://lore.kernel.org/bpf/CAPPBnEaCB1rFAYU7Wf8UxqcqOWKmRPU1Nuzk3_oLk6qXR7LBOA@xxxxxxxxxxxxxx/T/
> >
> > Similar bugs have been reported by syzbot.
> > In queue_stack_maps -
> > Link: https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@xxxxxxxxxx/
> > Link: https://lore.kernel.org/all/20240418230932.2689-1-hdanton@xxxxxxxx/T/
> > In lpm_trie -
> > Link: https://lore.kernel.org/linux-kernel/00000000000035168a061a47fa38@xxxxxxxxxx/T/
> > In ringbuf -
> > Link: https://lore.kernel.org/bpf/20240313121345.2292-1-hdanton@xxxxxxxx/T/
> >
> > Prevent kprobe and fentry bpf programs from attaching to these critical
> > sections by removing CC_FLAGS_FTRACE for percpu_freelist.o,
> > bpf_lru_list.o, queue_stack_maps.o, lpm_trie.o, ringbuf.o files.
> >
>
> I think the current solution is to use a per-CPU variable to prevent
> deadlocks. You can look at the hashmap implementation for reference.
> However, ABBA deadlocks are still possible, so to avoid these, I think
> the BPF community is working towards implementing resilient spinlocks.

Right. The resilient spinlocks are wip, but in the meantime
we need to stop the bleeding.

> I was planning to send patches for some of these bugs earlier. I'm
> wondering if per-CPU checks would still be valid once resilient
> spinlocks are introduced?

The wip patches with res_spin_lock remove these per-cpu recursion
counters from hash map and other places.
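
For readers following the thread, the per-CPU recursion check discussed above can be sketched in userspace C roughly as follows. This is only an illustration under stated assumptions, not the hash map code itself: a thread-local counter stands in for the per-CPU variable, a plain flag stands in for the spinlock, and guarded_update(), attached_hook, reentrant_hook() and demo_reentry() are hypothetical names invented for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumption: a thread-local counter stands in for a per-CPU variable. */
static _Thread_local int map_busy;

/* Assumption: a plain flag stands in for the lock. A real spinlock
 * would spin forever if the same CPU tried to take it again. */
static int lock_held;

/* Hypothetical hook slot modelling a tracing program (kprobe/fentry)
 * attached somewhere inside the critical section. */
static void (*attached_hook)(void);

/* Result of the nested call made from inside the hook. */
static int nested_result;

/* Update path guarded by the recursion counter. */
static int guarded_update(void)
{
    /* Refuse re-entry on the same CPU instead of touching the lock again. */
    if (++map_busy != 1) {
        --map_busy;
        return -EBUSY;
    }

    lock_held = 1;           /* "take" the lock */
    if (attached_hook)
        attached_hook();     /* tracing program fires in the critical section */
    lock_held = 0;           /* "release" the lock */

    --map_busy;
    return 0;
}

/* Hypothetical tracing program that calls back into the same function. */
static void reentrant_hook(void)
{
    nested_result = guarded_update();
}

/* Runs the outer call with the re-entrant hook attached and returns
 * what the nested call saw. */
static int demo_reentry(void)
{
    attached_hook = reentrant_hook;
    int outer = guarded_update();
    attached_hook = NULL;

    assert(outer == 0);      /* the outer call completes normally */
    return nested_result;    /* the nested call was rejected */
}
```

Calling demo_reentry() from a main() returns -EBUSY for the nested call while the outer call succeeds. A counter like this only catches same-CPU recursion; it does nothing for the ABBA case mentioned above, where two CPUs take two locks in opposite order.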