[Syzkaller & bisect] There is deadlock in __bpf_ringbuf_reserve in v6.10

Hi Namhyung Kim and BPF experts,

Greetings!

There is a deadlock in __bpf_ringbuf_reserve in v6.10.

Bisection found the first bad commit:
ee042be16cb4 locking: Apply contention tracepoints in the slow path

All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/240717_170536___bpf_ringbuf_reserve
Syzkaller repro code: https://github.com/xupengfe/syzkaller_logs/blob/main/240717_170536___bpf_ringbuf_reserve/repro.c
Syzkaller repro syscall: https://github.com/xupengfe/syzkaller_logs/blob/main/240717_170536___bpf_ringbuf_reserve/repro.prog
Syzkaller report: https://github.com/xupengfe/syzkaller_logs/blob/main/240717_170536___bpf_ringbuf_reserve/repro.report
Kconfig(make olddefconfig): https://github.com/xupengfe/syzkaller_logs/blob/main/240717_170536___bpf_ringbuf_reserve/kconfig_origin
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/240717_170536___bpf_ringbuf_reserve/bisect_info.log
v6.10 bzImage: https://github.com/xupengfe/syzkaller_logs/raw/main/240717_170536___bpf_ringbuf_reserve/bzImage_0c3836482481200ead7b416ca80c68a29cfdaabd.tar.gz
Issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/240717_170536___bpf_ringbuf_reserve/0c3836482481200ead7b416ca80c68a29cfdaabd_dmesg.log

"
[   25.063013] 
[   25.063211] ============================================
[   25.063694] WARNING: possible recursive locking detected
[   25.064165] 6.10.0-0c3836482481 #1 Tainted: G        W         
[   25.064787] --------------------------------------------
[   25.065264] repro/745 is trying to acquire lock:
[   25.065693] ffffc90004f1a0d8 (&rb->spinlock){-.-.}-{2:2}, at: __bpf_ringbuf_reserve+0x386/0x460
[   25.066517] 
[   25.066517] but task is already holding lock:
[   25.067054] ffffc900018360d8 (&rb->spinlock){-.-.}-{2:2}, at: __bpf_ringbuf_reserve+0x386/0x460
[   25.067878] 
[   25.067878] other info that might help us debug this:
[   25.068504]  Possible unsafe locking scenario:
[   25.068504] 
[   25.069061]        CPU0
[   25.069301]        ----
[   25.069540]   lock(&rb->spinlock);
[   25.069879]   lock(&rb->spinlock);
[   25.070208] 
[   25.070208]  *** DEADLOCK ***
[   25.070208] 
[   25.070741]  May be due to missing lock nesting notation
[   25.070741] 
[   25.071362] 4 locks held by repro/745:
[   25.071731]  #0: ffffffff86fff388 (pcpu_alloc_mutex){+.+.}-{3:3}, at: pcpu_alloc_noprof+0xa07/0x1120
[   25.072674]  #1: ffffffff86e58de0 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x1b7/0x5a0
[   25.073493]  #2: ffffc900018360d8 (&rb->spinlock){-.-.}-{2:2}, at: __bpf_ringbuf_reserve+0x386/0x460
[   25.074359]  #3: ffffffff86e58de0 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x1b7/0x5a0
[   25.075180] 
[   25.075180] stack backtrace:
[   25.075587] CPU: 0 PID: 745 Comm: repro Tainted: G        W          6.10.0-0c3836482481 #1
[   25.076373] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   25.077661] Call Trace:
[   25.078033]  <TASK>
[   25.078253]  dump_stack_lvl+0xea/0x150
[   25.078650]  dump_stack+0x19/0x20
[   25.079003]  print_deadlock_bug+0x3c0/0x680
[   25.079417]  __lock_acquire+0x2b2a/0x5ca0
[   25.079829]  ? __pfx___lock_acquire+0x10/0x10
[   25.080270]  ? __kasan_check_read+0x15/0x20
[   25.080693]  ? __lock_acquire+0xccf/0x5ca0
[   25.081101]  lock_acquire+0x1ce/0x580
[   25.081472]  ? __bpf_ringbuf_reserve+0x386/0x460
[   25.081926]  ? __pfx_lock_acquire+0x10/0x10
[   25.082343]  ? __kasan_check_read+0x15/0x20
[   25.082770]  _raw_spin_lock_irqsave+0x52/0x80
[   25.083202]  ? __bpf_ringbuf_reserve+0x386/0x460
[   25.083920]  __bpf_ringbuf_reserve+0x386/0x460
[   25.084487]  bpf_ringbuf_reserve+0x63/0xa0
[   25.084904]  bpf_prog_9efe54833449f08e+0x2d/0x47
[   25.085383]  bpf_trace_run2+0x238/0x5a0
[   25.085784]  ? __pfx_bpf_trace_run2+0x10/0x10
[   25.086237]  ? __pfx___bpf_trace_contention_end+0x10/0x10
[   25.086779]  __bpf_trace_contention_end+0xf/0x20
[   25.087230]  __traceiter_contention_end+0x66/0xb0
[   25.087697]  trace_contention_end.constprop.0+0xdc/0x140
[   25.088207]  __pv_queued_spin_lock_slowpath+0x2a1/0xc80
[   25.088751]  ? __pfx___pv_queued_spin_lock_slowpath+0x10/0x10
[   25.089369]  ? __this_cpu_preempt_check+0x21/0x30
[   25.089833]  ? lock_acquire+0x1de/0x580
[   25.090222]  do_raw_spin_lock+0x1fb/0x280
[   25.090622]  ? __pfx_do_raw_spin_lock+0x10/0x10
[   25.091056]  ? debug_smp_processor_id+0x20/0x30
[   25.091506]  ? rcu_is_watching+0x19/0xc0
[   25.091900]  _raw_spin_lock_irqsave+0x5a/0x80
[   25.092337]  ? __bpf_ringbuf_reserve+0x386/0x460
[   25.092791]  __bpf_ringbuf_reserve+0x386/0x460
[   25.093269]  bpf_ringbuf_reserve+0x63/0xa0
[   25.093694]  bpf_prog_9efe54833449f08e+0x2d/0x47
[   25.094138]  bpf_trace_run2+0x238/0x5a0
[   25.094525]  ? __pfx_bpf_trace_run2+0x10/0x10
[   25.094963]  ? lock_acquire+0x1de/0x580
[   25.095344]  ? __pfx_lock_acquire+0x10/0x10
[   25.095766]  ? __pfx___bpf_trace_contention_end+0x10/0x10
[   25.096296]  __bpf_trace_contention_end+0xf/0x20
[   25.096755]  __traceiter_contention_end+0x66/0xb0
[   25.097245]  trace_contention_end+0xc5/0x120
[   25.097699]  __mutex_lock+0x257/0x1660
[   25.098077]  ? pcpu_alloc_noprof+0xa07/0x1120
[   25.098518]  ? __pfx___lock_acquire+0x10/0x10
[   25.098951]  ? _find_first_bit+0x95/0xc0
[   25.099340]  ? __pfx___mutex_lock+0x10/0x10
[   25.099760]  ? __this_cpu_preempt_check+0x21/0x30
[   25.100223]  ? lock_release+0x418/0x840
[   25.100638]  mutex_lock_killable_nested+0x1f/0x30
[   25.101109]  ? mutex_lock_killable_nested+0x1f/0x30
[   25.101611]  pcpu_alloc_noprof+0xa07/0x1120
[   25.102034]  ? lockdep_init_map_type+0x2df/0x810
[   25.102488]  ? __raw_spin_lock_init+0x44/0x120
[   25.102931]  ? __kasan_check_write+0x18/0x20
[   25.103352]  mm_init+0x8da/0xec0
[   25.103692]  copy_mm+0x3cf/0x2550
[   25.104040]  ? __pfx_copy_mm+0x10/0x10
[   25.104431]  ? lockdep_init_map_type+0x2df/0x810
[   25.104901]  ? __raw_spin_lock_init+0x44/0x120
[   25.105371]  copy_process+0x361c/0x6a60
[   25.105776]  ? __pfx_copy_process+0x10/0x10
[   25.106194]  ? __kasan_check_read+0x15/0x20
[   25.106607]  ? __lock_acquire+0x1a02/0x5ca0
[   25.107033]  kernel_clone+0xfd/0x8d0
[   25.107396]  ? __pfx_kernel_wait4+0x10/0x10
[   25.107811]  ? __pfx_kernel_clone+0x10/0x10
[   25.108214]  ? __this_cpu_preempt_check+0x21/0x30
[   25.108736]  ? lock_release+0x418/0x840
[   25.109144]  __do_sys_clone+0xe1/0x120
[   25.109529]  ? __pfx___do_sys_clone+0x10/0x10
[   25.109999]  __x64_sys_clone+0xc7/0x150
[   25.110375]  ? syscall_trace_enter+0x14a/0x230
[   25.110815]  x64_sys_call+0x1e76/0x20d0
[   25.111188]  do_syscall_64+0x6d/0x140
[   25.111559]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   25.112045] RIP: 0033:0x7f6219f189d7
[   25.112415] Code: 00 00 00 f3 0f 1e fa 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 39 41 89 c0 85 c0 75 2a 64 48 8b 04 25 10 00
[   25.114082] RSP: 002b:00007fff149665d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[   25.115078] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f6219f189d7
[   25.115792] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[   25.116487] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000007194fa985
[   25.117136] R10: 00007f621a028a10 R11: 0000000000000246 R12: 0000000000000000
[   25.117793] R13: 0000000000401e31 R14: 0000000000403e08 R15: 00007f621a073000
[   25.118449]  </TASK>
"

Thank you!

---

If you do not need the environment below to reproduce the problem, or if you
already have a reproduction environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64; I used v7.1.0
  // start3.sh will load the bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You can change the bzImage_xxx as you want
  // You may need to remove the line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for a different qemu version
You can use the command below to log in; there is no password for root.
ssh -p 10023 root@localhost

After logging in to the VM (virtual machine) successfully, you can build the
reproducer binary, transfer it to the VM as shown below, and reproduce the problem in the VM:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/
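
Once the reproducer has run in the VM, one way to confirm the issue is to look
for the lockdep warning line quoted in the dmesg above. The following is only a
minimal sketch: the helper name has_recursive_lock_warning is hypothetical, and
the sample input is a line copied from the log in this report. In the VM, the
output of dmesg would be piped into the helper instead.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the log text on stdin contains the
# lockdep warning line quoted in the report.
has_recursive_lock_warning() {
    grep -q "WARNING: possible recursive locking detected"
}

# Example input: one line copied from the dmesg in this report.
# In the VM this would be: dmesg | has_recursive_lock_warning
sample='[   25.063694] WARNING: possible recursive locking detected'
if printf '%s\n' "$sample" | has_recursive_lock_warning; then
    echo "lockdep warning found"
else
    echo "lockdep warning not found"
fi
```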

Build the bzImage for the target kernel:
Copy the target kconfig to kernel_src/.config
make olddefconfig
make -jx bzImage           // x should be equal to or less than the number of CPUs your PC has

Set the bzImage file name in start3.sh above to load the target kernel in the VM.
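
As a small sketch of the build step above, the -j value can be taken from nproc
(GNU coreutils) instead of being typed by hand. The function name build_jobs
and the KERNEL_SRC and KCONFIG variables are hypothetical placeholders, not part
of the report; the kernel commands are only printed here, so the snippet can be
tried outside a kernel tree.

```shell
#!/bin/sh
# Hypothetical helper: print a -j value equal to the number of CPUs,
# falling back to 1 if nproc is not available.
build_jobs() {
    nproc 2>/dev/null || echo 1
}

# Placeholder paths (assumptions): adjust to the real kernel tree and kconfig.
KERNEL_SRC="${KERNEL_SRC:-kernel_src}"
KCONFIG="${KCONFIG:-kconfig_origin}"

# Print the build commands with the job count filled in; remove the
# leading "echo" to actually run them inside a kernel source tree.
echo "cp ${KCONFIG} ${KERNEL_SRC}/.config"
echo "make -C ${KERNEL_SRC} olddefconfig"
echo "make -C ${KERNEL_SRC} -j$(build_jobs) bzImage"
```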


Tips:
If you already have qemu-system-x86_64, please ignore the information below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install

Best Regards,
Thanks!
