On Wed, Jun 19, 2024 at 10:16:20PM +0800, Zhu Yanjun wrote:
> On 2024/6/19 17:15, Leon Romanovsky wrote:
> > On Tue, Jun 18, 2024 at 11:37:18PM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: 2ccbdf43d5e7 Merge tag 'for-linus' of git://git.kernel.org..
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=179e93fe980000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=fa0ce06dcc735711
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=19ec7595e3aa1a45f623
> > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/27e64d7472ce/disk-2ccbdf43.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/e1c494bb5c9c/vmlinux-2ccbdf43.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/752498985a5e/bzImage-2ccbdf43.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+19ec7595e3aa1a45f623@xxxxxxxxxxxxxxxxxxxxxxxxx
> > >
> > > smc: removing ib device syz0
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 51 at kernel/rcu/srcutree.c:653 cleanup_srcu_struct+0x404/0x4d0 kernel/rcu/srcutree.c:653
> > > Modules linked in:
> > > CPU: 0 PID: 51 Comm: kworker/u8:3 Not tainted 6.10.0-rc3-syzkaller-00044-g2ccbdf43d5e7 #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
> > > Workqueue: ib-unreg-wq ib_unregister_work
> > > RIP: 0010:cleanup_srcu_struct+0x404/0x4d0 kernel/rcu/srcutree.c:653
> > > Code: 12 80 00 48 c7 03 00 00 00 00 48 83 c4 48 5b 41 5c 41 5d 41 5e 41 5f 5d e9 14 67 34 0a 90 0f 0b 90 eb e7 90 0f 0b 90 eb e1 90 <0f> 0b 90 eb db 90 0f 0b 90 eb 0a 90 0f 0b 90 eb 04 90 0f 0b 90 48
> > > RSP: 0018:ffffc90000bb7970 EFLAGS: 00010202
> > > RAX: 0000000000000001 RBX: ffff88802a1bc980 RCX: 0000000000000002
> > > RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffe8ffffd74c58
> > > RBP: 0000000000000001 R08: ffffe8ffffd74c5f R09: 1ffffd1ffffae98b
> > > R10: dffffc0000000000 R11: fffff91ffffae98c R12: dffffc0000000000
> > > R13: ffff88802285b5f0 R14: ffff88802285b000 R15: ffff88802a1bc800
> > > FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007fa3852cae10 CR3: 000000000e132000 CR4: 0000000000350ef0
> > > Call Trace:
> > > <TASK>
> > > ib_uverbs_release_dev+0x4e/0x80 drivers/infiniband/core/uverbs_main.c:136
> > > device_release+0x9b/0x1c0
> > > kobject_cleanup lib/kobject.c:689 [inline]
> > > kobject_release lib/kobject.c:720 [inline]
> > > kref_put include/linux/kref.h:65 [inline]
> > > kobject_put+0x231/0x480 lib/kobject.c:737
> > > remove_client_context+0xb9/0x1e0 drivers/infiniband/core/device.c:776
> > > disable_device+0x13b/0x360 drivers/infiniband/core/device.c:1282
> > > __ib_unregister_device+0x6d/0x170 drivers/infiniband/core/device.c:1475
> > > ib_unregister_work+0x19/0x30 drivers/infiniband/core/device.c:1586
> > > process_one_work kernel/workqueue.c:3231 [inline]
> > > process_scheduled_works+0xa2e/0x1830 kernel/workqueue.c:3312
> > > worker_thread+0x86d/0xd70 kernel/workqueue.c:3393
> > > kthread+0x2f2/0x390 kernel/kthread.c:389
> > > ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
> > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> > > </TASK>
> >
> > I see that this is caused by the call to ib_unregister_device_queued() in
> > response to the NETDEV_UNREGISTER event, but we don't flush anything before that.
> > How can we be sure that the ib_device is not used anymore?
>
> Hi, Leon
>
> This is the console output:
>
> https://syzkaller.appspot.com/x/log.txt?x=179e93fe980000
>
> From the above link, it seems that other devices or subsystems failed
> first, which then caused this call trace to appear. When the other problems
> occurred, the whole kernel was in a messy state. So it is not surprising
> that some problems occurred.

Which devices/subsystems failed? I grepped the log and don't see
anything suspicious before the first "------------[ cut here ]------------" line.

>
> Simply put, the root cause is not in the RDMA subsystem.
>
> I will continue to delve into this problem.
>
> Zhu Yanjun
> >
> > Thanks
>
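
For anyone who wants to repeat the check mentioned above (looking for anything suspicious in the console log before the first "cut here" marker), here is a rough sketch in Python. It is only an illustration, not what was actually run in this thread: the function name and the keyword list are assumptions and should be adjusted to whatever counts as suspicious for the log at hand.

```python
import re

# Marker the kernel prints ahead of a WARN/BUG splat.
CUT_HERE = "------------[ cut here ]------------"

# Hypothetical list of patterns worth a second look; this is an
# assumption for illustration, not an authoritative list.
SUSPICIOUS = re.compile(
    r"BUG:|WARNING:|Oops|general protection fault|Call Trace|hung task"
)


def suspicious_before_first_cut(log_text):
    """Return (line_number, line) pairs matching SUSPICIOUS that occur
    before the first 'cut here' marker in the log text."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if CUT_HERE in line:
            # Stop at the first splat; later lines are not of interest here.
            break
        if SUSPICIOUS.search(line):
            hits.append((lineno, line))
    return hits
```

To use it, save the console output from the syzkaller link locally, read it into a string, and pass it to suspicious_before_first_cut(). An empty result means none of the assumed patterns appeared before the first splat.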