On Thu, 7 Sept 2023 at 12:26, Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> +Arnaldo
>
> > Hi,
> >
> > Our bpf fuzzer, a customized Syzkaller, triggered a lockdep warning in
> > the bpf queue map in v5.15. Since queue_stack_maps.c has had no major
> > changes since v5.15, we think the bug should still exist in the latest
> > kernel. It can be triggered occasionally, and we suspect one of the
> > eBPF programs involved is the following one. We also attached the
> > lockdep warning at the end.
> >
> > #define DEFINE_BPF_MAP_NO_KEY(the_map, TypeOfMap, MapFlags, TypeOfValue, MaxEntries) \
> >         struct {                                           \
> >                 __uint(type, TypeOfMap);                   \
> >                 __uint(map_flags, (MapFlags));             \
> >                 __uint(max_entries, (MaxEntries));         \
> >                 __type(value, TypeOfValue);                \
> >         } the_map SEC(".maps");
> >
> > DEFINE_BPF_MAP_NO_KEY(map_0, BPF_MAP_TYPE_QUEUE, 0 | BPF_F_WRONLY,
> >                       struct_0, 162);
> >
> > SEC("perf_event")
> > int func(struct bpf_perf_event_data *ctx) {
> >         char v0[96] = {};
> >         uint64_t v1 = 0;
> >
> >         v1 = bpf_map_pop_elem(&map_0, v0);
> >         return 163819661;
> > }
> >
> > The program is attached to the following perf event.
> >
> > struct perf_event_attr attr_type_hw = {
> >         .type = PERF_TYPE_HARDWARE,
> >         .config = PERF_COUNT_HW_CPU_CYCLES,
> >         .sample_freq = 50,
> >         .inherit = 1,
> >         .freq = 1,
> > };
> >
> > ================================
> > WARNING: inconsistent lock state
> > 5.15.26+ #2 Not tainted
> > --------------------------------
> > inconsistent {INITIAL USE} -> {IN-NMI} usage.
> > syz-executor.5/19749 [HC1[1]:SC0[0]:HE0:SE1] takes:
> > ffff88804c9fc198 (&qs->lock){..-.}-{2:2}, at: __queue_map_get+0x31/0x250
> > {INITIAL USE} state was registered at:
> >   lock_acquire+0x1a3/0x4b0
> >   _raw_spin_lock_irqsave+0x48/0x60
> >   __queue_map_get+0x31/0x250
> >   bpf_prog_577904e86c81dead_func+0x12/0x4b4
> >   trace_call_bpf+0x262/0x5d0
> >   perf_trace_run_bpf_submit+0x91/0x1c0
> >   perf_trace_sched_switch+0x46c/0x700
> >   __schedule+0x11b5/0x24a0
> >   schedule+0xd4/0x270
> >   futex_wait_queue_me+0x25f/0x520
> >   futex_wait+0x1e0/0x5f0
> >   do_futex+0x395/0x1890
> >   __x64_sys_futex+0x1cb/0x480
> >   do_syscall_64+0x3b/0xc0
> >   entry_SYSCALL_64_after_hwframe+0x44/0xae
> > irq event stamp: 13640
> > hardirqs last enabled at (13639): [<ffffffff95eb2bf4>] _raw_spin_unlock_irq+0x24/0x40
> > hardirqs last disabled at (13640): [<ffffffff95eb2d4d>] _raw_spin_lock_irqsave+0x5d/0x60
> > softirqs last enabled at (13464): [<ffffffff93e26de5>] __sys_bpf+0x3e15/0x4e80
> > softirqs last disabled at (13462): [<ffffffff93e26da3>] __sys_bpf+0x3dd3/0x4e80
> >
> > other info that might help us debug this:
> >  Possible unsafe locking scenario:
> >
> >        CPU0
> >        ----
> >   lock(&qs->lock);
> >   <Interrupt>
> >     lock(&qs->lock);
>
> Hmm, so that lock() uses raw_spin_lock_irqsave(), which *should* be
> disabling interrupts entirely for the critical section. But I guess a
> perf hardware event can still trigger? Which seems like it would
> potentially wreak havoc with lots of things, not just this queue map
> function?
>
> No idea how to protect against this, though. Hoping Arnaldo knows? :)

The lock should probably be protected by a per-CPU integer counter,
incremented before the lock is taken and decremented after it is
released. If the counter is already non-zero, -EBUSY should be returned
instead. This is similar to what htab_lock_bucket() protects against in
kernel/bpf/hashtab.c.