> On Wed, Jun 12, 2024 at 04:32:23PM GMT, Sebastian Andrzej Siewior wrote:
> > > > The BPF program in question is attached to sched_switch. The issue seems
> > > > to be similar to a couple of syzkaller reports [1], [2], although the
> > > > latter one is about nested progs, which does not seem to be the case here.
> > > > Talking about nested progs, applying an approach similar to the one in [3],
> > > > reworked for bpf_ringbuf, eliminates the issue.
> > > >
> > > > Am I missing anything, is it a known issue? Any ideas how to address that?
> > >
> > > I haven't attached a bpf program to trace-events, so this is new to me. But if
> > > you BPF attach programs to trace-events then there might be more things
> > > that can go wrong…
> >
> > Things related to RT kernels, or something else?
>
> Let me add this to the bpf-list-to-look-at.
> Do you get more splats with CONFIG_DEBUG_ATOMIC_SLEEP=y?

Thanks. Adding CONFIG_DEBUG_ATOMIC_SLEEP gives me this:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 154, name: script
preempt_count: 3, expected: 0
RCU nest depth: 1, expected: 1
4 locks held by script/154:
 #0: ffff8881049798a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x28/0x60
 #1: ffff88813bdb2558 (&rq->__lock){-...}-{2:2}, at: __schedule+0xc4/0xca0
 #2: ffffffff83590540 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run4+0x6c/0x1e0
 #3: ffffc90007b61158 (&rb->spinlock){....}-{2:2}, at: __bpf_ringbuf_reserve+0x5a/0xf0
irq event stamp: 129370
hardirqs last enabled at (129369): [<ffffffff82216818>] _raw_spin_unlock_irq+0x28/0x50
hardirqs last disabled at (129370): [<ffffffff822084a9>] __schedule+0x5d9/0xca0
softirqs last enabled at (0): [<ffffffff81110ecb>] copy_process+0xc3b/0x2fd0
softirqs last disabled at (0): [<0000000000000000>] 0x0
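
For reference, below is a minimal sketch of the kind of program that hits this path. It is an assumed example, not the program from the report above; the event layout, map size, and names are made up for illustration. It attaches to the sched_switch tracepoint (tp_btf, matching the bpf_trace_run4 frame in the splat) and reserves an entry from a BPF ring buffer on every context switch, so bpf_ringbuf_reserve() ends up taking rb->spinlock while __schedule() already holds rq->__lock with interrupts disabled. On PREEMPT_RT, where spinlock_t is a sleeping lock, that is exactly the sleeping-lock-in-atomic-context situation the splat reports.

/*
 * Minimal illustrative sketch (assumed, not the reporter's actual program):
 * a tp_btf program on sched_switch that reserves from a BPF ring buffer.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct event {
	u32 prev_pid;
	u32 next_pid;
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024);	/* 256 KiB; must be a page-aligned power of two */
} rb SEC(".maps");

SEC("tp_btf/sched_switch")
int BPF_PROG(handle_sched_switch, bool preempt,
	     struct task_struct *prev, struct task_struct *next)
{
	struct event *e;

	/*
	 * bpf_ringbuf_reserve() takes rb->spinlock internally. The program
	 * runs from __schedule() with rq->__lock held and IRQs disabled, so
	 * on PREEMPT_RT this becomes "sleeping function called from invalid
	 * context", as in the splat above.
	 */
	e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
	if (!e)
		return 0;

	e->prev_pid = prev->pid;
	e->next_pid = next->pid;
	bpf_ringbuf_submit(e, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

This lines up with the held-locks list in the splat: #1 is rq->__lock taken in __schedule(), #2 is the RCU read section entered by bpf_trace_run4(), and #3 is rb->spinlock taken in __bpf_ringbuf_reserve().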