[RESEND] Re: [syzbot] [bpf?] WARNING: locking bug in trie_delete_elem

Hou Tao <houtao@xxxxxxxxxxxxxxx> · Fri, 1 Nov 2024 17:44:08 +0800

Hi,

On 11/1/2024 3:32 AM, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    f9f24ca362a4 Add linux-next specific files for 20241031
> git tree:       linux-next
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=1387c6f7980000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=328572ed4d152be9
> dashboard link: https://syzkaller.appspot.com/bug?extid=b506de56cbbb63148c33
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1387655f980000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11ac5540580000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/eb84549dd6b3/disk-f9f24ca3.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/beb29bdfa297/vmlinux-f9f24ca3.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/8881fe3245ad/bzImage-f9f24ca3.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+b506de56cbbb63148c33@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> =============================
> [ BUG: Invalid wait context ]
> 6.12.0-rc5-next-20241031-syzkaller #0 Not tainted
> -----------------------------
> swapper/0/0 is trying to lock:
> ffff8880261e7a00 (&trie->lock){....}-{3:3}, at: trie_delete_elem+0x96/0x6a0 kernel/bpf/lpm_trie.c:462

Sorry for the resend. The previous mail was rejected by the mail list
due to HTML content.

The warning is due to the lock for lpm_trie is a spinlock_t lock. It may
sleep under PREEMPT_RT kernel, but the bpf program has already taken a
raw_spinlock in queue_work() and the bpf program is also running inside
an interrupt handler, so lockdep warns about it. The lock should be
changed to raw_spinlock_t. Will fix it.

There have been multiple lpm trie related syzbot reports, includes:

(1) possible deadlock in get_page_from_freelist [1]
The deadlock is due to the locking of lock(&zone->lock) and
lock(&trie->lock). zone->lock comes from lpm_trie_node_alloc()

(2) possible deadlock in trie_delete_elem [2]
The deadlock is due to the recursive locking lock(&trie->lock). The
recursion comes from lpm_trie_node_alloc()

(3) possible deadlock in trie_update_elem [3]
(4) possible deadlock in stack_depot_save_flags [4]
(5) possible deadlock in get_partial_node [5]
(6) possible deadlock in deactivate_slab[6]
(7) possible deadlock in __put_partials [7]
(8) possible deadlock in debug_check_no_obj_freed [8]
issue (3)-(8) are similar with the first issue.

[1] https://syzkaller.appspot.com/bug?extid=a7f061d2d16154538c58
[2] https://syzkaller.appspot.com/bug?extid=9d95beb2a3c260622518
[3] https://syzkaller.appspot.com/bug?extid=ea624e536fee669a05cf
[4] https://syzkaller.appspot.com/bug?extid=c065d8dfbb5ad8cbdceb
[5] https://syzkaller.appspot.com/bug?extid=9045c0a3d5a7f1b119f7
[6] https://syzkaller.appspot.com/bug?extid=a4acbb99845d381e5e2f
[7] https://syzkaller.appspot.com/bug?extid=5a878c984150fad34185
[8] https://syzkaller.appspot.com/bug?extid=b12149f7ab5a8751740f

Using the bpf memory allocator for the allocation of both new node and
intermediate node will fix these reports. However, I was hesitant about
supporting the recursive lock prevention on the same CPU for lpm trie.
About fix months ago, Siddharth posted a patch set [9] to support the
recursive lock prevention for queue/stack map, so maybe I could continue
the work and also add the support for lpm trie in the same patch set.

[9]
https://lore.kernel.org/bpf/20240514124052.1240266-2-sidchintamaneni@xxxxxxxxx/
> other info that might help us debug this:
> context-{3:3}
> 5 locks held by swapper/0/0:
>  #0: ffff888020bb75c8 (&vp_dev->lock){-...}-{3:3}, at: vp_vring_interrupt drivers/virtio/virtio_pci_common.c:80 [inline]
>  #0: ffff888020bb75c8 (&vp_dev->lock){-...}-{3:3}, at: vp_interrupt+0x142/0x200 drivers/virtio/virtio_pci_common.c:113
>  #1: ffff88814174a120 (&vb->stop_update_lock){-...}-{3:3}, at: spin_lock include/linux/spinlock.h:351 [inline]
>  #1: ffff88814174a120 (&vb->stop_update_lock){-...}-{3:3}, at: stats_request+0x6f/0x230 drivers/virtio/virtio_balloon.c:438
>  #2: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
>  #2: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
>  #2: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: __queue_work+0x199/0xf50 kernel/workqueue.c:2259
>  #3: ffff8880b863dd18 (&pool->lock){-.-.}-{2:2}, at: __queue_work+0x759/0xf50
>  #4: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
>  #4: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
>  #4: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2339 [inline]
>  #4: ffffffff8e939f20 (rcu_read_lock){....}-{1:3}, at: bpf_trace_run1+0x1d6/0x520 kernel/trace/bpf_trace.c:2380
> stack backtrace:
> CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc5-next-20241031-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:94 [inline]
>  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
>  print_lock_invalid_wait_context kernel/locking/lockdep.c:4826 [inline]
>  check_wait_context kernel/locking/lockdep.c:4898 [inline]
>  __lock_acquire+0x15a8/0x2100 kernel/locking/lockdep.c:5176
>  lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
>  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>  _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
>  trie_delete_elem+0x96/0x6a0 kernel/bpf/lpm_trie.c:462
>  bpf_prog_2c29ac5cdc6b1842+0x43/0x47
>  bpf_dispatcher_nop_func include/linux/bpf.h:1290 [inline]
>  __bpf_prog_run include/linux/filter.h:701 [inline]
>  bpf_prog_run include/linux/filter.h:708 [inline]
>  __bpf_trace_run kernel/trace/bpf_trace.c:2340 [inline]
>  bpf_trace_run1+0x2ca/0x520 kernel/trace/bpf_trace.c:2380
>  trace_workqueue_activate_work+0x186/0x1f0 include/trace/events/workqueue.h:59
>  __queue_work+0xc7b/0xf50 kernel/workqueue.c:2338
>  queue_work_on+0x1c2/0x380 kernel/workqueue.c:2390
>  queue_work include/linux/workqueue.h:662 [inline]
>  stats_request+0x1a3/0x230 drivers/virtio/virtio_balloon.c:441
>  vring_interrupt+0x21d/0x380 drivers/virtio/virtio_ring.c:2595
>  vp_vring_interrupt drivers/virtio/virtio_pci_common.c:82 [inline]
>  vp_interrupt+0x192/0x200 drivers/virtio/virtio_pci_common.c:113
>  __handle_irq_event_percpu+0x29a/0xa80 kernel/irq/handle.c:158
>  handle_irq_event_percpu kernel/irq/handle.c:193 [inline]
>  handle_irq_event+0x89/0x1f0 kernel/irq/handle.c:210
>  handle_fasteoi_irq+0x48a/0xae0 kernel/irq/chip.c:720
>  generic_handle_irq_desc include/linux/irqdesc.h:173 [inline]
>  handle_irq arch/x86/kernel/irq.c:247 [inline]
>  call_irq_handler arch/x86/kernel/irq.c:259 [inline]
>  __common_interrupt+0x136/0x230 arch/x86/kernel/irq.c:285
>  common_interrupt+0xb4/0xd0 arch/x86/kernel/irq.c:278
>  </IRQ>
>  <TASK>
>  asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693
> RIP: 0010:finish_task_switch+0x1ea/0x870 kernel/sched/core.c:5201
> Code: c9 50 e8 29 05 0c 00 48 83 c4 08 4c 89 f7 e8 4d 39 00 00 0f 1f 44 00 00 4c 89 f7 e8 a0 45 69 0a e8 4b 9e 38 00 fb 48 8b 5d c0 <48> 8d bb f8 15 00 00 48 89 f8 48 c1 e8 03 49 be 00 00 00 00 00 fc
> RSP: 0018:ffffffff8e607ae8 EFLAGS: 00000282
> RAX: 467bb178e56b5700 RBX: ffffffff8e6945c0 RCX: ffffffff9a3d4903
> RDX: dffffc0000000000 RSI: ffffffff8c0ad3a0 RDI: ffffffff8c604dc0
> RBP: ffffffff8e607b30 R08: ffffffff901d03b7 R09: 1ffffffff203a076
> R10: dffffc0000000000 R11: fffffbfff203a077 R12: 1ffff110170c7e74
> R13: dffffc0000000000 R14: ffff8880b863e580 R15: ffff8880b863f3a0
>  context_switch kernel/sched/core.c:5330 [inline]
>  __schedule+0x1857/0x4c30 kernel/sched/core.c:6707
>  schedule_idle+0x56/0x90 kernel/sched/core.c:6825
>  do_idle+0x567/0x5c0 kernel/sched/idle.c:353
>  cpu_startup_entry+0x42/0x60 kernel/sched/idle.c:423
>  rest_init+0x2dc/0x300 init/main.c:747
>  start_kernel+0x47f/0x500 init/main.c:1102
>  x86_64_start_reservations+0x2a/0x30 arch/x86/kernel/head64.c:507
>  x86_64_start_kernel+0x9f/0xa0 arch/x86/kernel/head64.c:488
>  common_startup_64+0x13e/0x147
>  </TASK>
> ----------------
> Code disassembly (best guess):
>    0:	c9                   	leave
>    1:	50                   	push   %rax
>    2:	e8 29 05 0c 00       	call   0xc0530
>    7:	48 83 c4 08          	add    $0x8,%rsp
>    b:	4c 89 f7             	mov    %r14,%rdi
>    e:	e8 4d 39 00 00       	call   0x3960
>   13:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>   18:	4c 89 f7             	mov    %r14,%rdi
>   1b:	e8 a0 45 69 0a       	call   0xa6945c0
>   20:	e8 4b 9e 38 00       	call   0x389e70
>   25:	fb                   	sti
>   26:	48 8b 5d c0          	mov    -0x40(%rbp),%rbx
> * 2a:	48 8d bb f8 15 00 00 	lea    0x15f8(%rbx),%rdi <-- trapping instruction
>   31:	48 89 f8             	mov    %rdi,%rax
>   34:	48 c1 e8 03          	shr    $0x3,%rax
>   38:	49                   	rex.WB
>   39:	be 00 00 00 00       	mov    $0x0,%esi
>   3e:	00 fc                	add    %bh,%ah
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup
>
> .