Re: [PATCH] bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs

Siddharth Chintamaneni <sidchintamaneni@xxxxxxxxx> · Thu, 12 Dec 2024 22:39:01 -0500

On Thu, 12 Dec 2024 at 22:26, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Dec 12, 2024 at 4:41 PM Siddharth Chintamaneni
> <sidchintamaneni@xxxxxxxxx> wrote:
> >
> > On Thu, 12 Dec 2024 at 18:58, Priya Bala Govindasamy <pgovind2@xxxxxxx> wrote:
> > >
> > > BPF program types like kprobe and fentry can cause deadlocks in certain
> > > situations. If a function takes a lock and one of these bpf programs is
> > > hooked to some point in the function's critical section, and if the
> > > bpf program tries to call the same function and take the same lock it will
> > > lead to deadlock. These situations have been reported in the following
> > > bug reports.
> > >
> > > In percpu_freelist -
> > > Link: https://lore.kernel.org/bpf/CAADnVQLAHwsa+2C6j9+UC6ScrDaN9Fjqv1WjB1pP9AzJLhKuLQ@xxxxxxxxxxxxxx/T/
> > > Link: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@xxxxxxxxxxxxxx/T/
> > > In bpf_lru_list -
> > > Link: https://lore.kernel.org/bpf/CAPPBnEajj+DMfiR_WRWU5=6A7KKULdB5Rob_NJopFLWF+i9gCA@xxxxxxxxxxxxxx/T/
> > > Link: https://lore.kernel.org/bpf/CAPPBnEZQDVN6VqnQXvVqGoB+ukOtHGZ9b9U0OLJJYvRoSsMY_g@xxxxxxxxxxxxxx/T/
> > > Link: https://lore.kernel.org/bpf/CAPPBnEaCB1rFAYU7Wf8UxqcqOWKmRPU1Nuzk3_oLk6qXR7LBOA@xxxxxxxxxxxxxx/T/
> > >
> > > Similar bugs have been reported by syzbot.
> > > In queue_stack_maps -
> > > Link: https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@xxxxxxxxxx/
> > > Link: https://lore.kernel.org/all/20240418230932.2689-1-hdanton@xxxxxxxx/T/
> > > In lpm_trie -
> > > Link: https://lore.kernel.org/linux-kernel/00000000000035168a061a47fa38@xxxxxxxxxx/T/
> > > In ringbuf -
> > > Link: https://lore.kernel.org/bpf/20240313121345.2292-1-hdanton@xxxxxxxx/T/
> > >
> > > Prevent kprobe and fentry bpf programs from attaching to these critical
> > > sections by removing CC_FLAGS_FTRACE for percpu_freelist.o,
> > > bpf_lru_list.o, queue_stack_maps.o, lpm_trie.o, ringbuf.o files.
> > >
> >
> > I think the current solution is to use a per-CPU variable to prevent
> > deadlocks. You can look at the hashmap implementation for reference.
> > However, ABBA deadlocks are still possible, so to avoid these, I think
> > the BPF community is working towards implementing resilient spinlocks.
>
> Right. The resilient spinlocks are wip, but in the meantime
> we need to stop the bleeding.
>

OK, I can resend the patches I was working on:
https://lore.kernel.org/all/202405041108.2Up5HT0H-lkp@xxxxxxxxx/T/

I remember that you shared the RFC patch set for resilient spinlocks
with me, but I didn't get a chance to check them at the time. Now that
I have more free time, I'd be happy to help you test that work if
you'd like.

> > I was planning to send patches for some of these bugs earlier. I'm
> > wondering if per-CPU checks would still be valid once resilient
> > spinlocks are introduced?
>
> The wip patches with res_spin_lock remove these per-cpu
> recursion counters from hash map and other places.
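
For anyone following the thread, here is a minimal userspace sketch of the
per-CPU recursion counter idea discussed above. It is only a model, not the
kernel code: _Thread_local stands in for a per-CPU variable, lock_word stands
in for the real spinlock, and the names map_busy, guarded_lock(),
guarded_unlock() and map_update() are made up for illustration. It is loosely
modeled on how the hash map refuses re-entry with -EBUSY.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: _Thread_local stands in for a per-CPU variable. */
static _Thread_local int map_busy;  /* recursion depth on this "CPU" */
static int lock_word;               /* toy lock: 0 = free, 1 = held */

/* Take the lock only if this "CPU" is not already inside the section. */
static int guarded_lock(void)
{
	if (++map_busy != 1) {
		/* Re-entered on the same CPU (e.g. a nested BPF program): back off. */
		--map_busy;
		return -EBUSY;
	}
	lock_word = 1;  /* stand-in for taking the real spinlock */
	return 0;
}

static void guarded_unlock(void)
{
	assert(map_busy == 1);
	lock_word = 0;  /* stand-in for releasing the real spinlock */
	--map_busy;
}

/*
 * Models a map update whose critical section is hooked by a tracing
 * program that calls the same update again (depth 0 triggers the nesting).
 */
static int map_update(int depth, int *nested_result)
{
	int err = guarded_lock();

	if (err)
		return err;
	if (depth == 0)
		*nested_result = map_update(1, nested_result);
	guarded_unlock();
	return 0;
}
```

Calling map_update(0, &r) lets the outer update succeed while the nested
call fails with -EBUSY instead of spinning on a lock it already holds. As
noted earlier in the thread, a guard like this only covers same-CPU
recursion; it does nothing for ABBA ordering between two different locks.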