On 2024/6/20 1:48, Leon Romanovsky wrote:
On Wed, Jun 19, 2024 at 10:16:20PM +0800, Zhu Yanjun wrote:
On 2024/6/19 17:15, Leon Romanovsky wrote:
On Tue, Jun 18, 2024 at 11:37:18PM -0700, syzbot wrote:
Hello,
syzbot found the following issue on:
HEAD commit: 2ccbdf43d5e7 Merge tag 'for-linus' of git://git.kernel.org..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=179e93fe980000
kernel config: https://syzkaller.appspot.com/x/.config?x=fa0ce06dcc735711
dashboard link: https://syzkaller.appspot.com/bug?extid=19ec7595e3aa1a45f623
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
Unfortunately, I don't have any reproducer for this issue yet.
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/27e64d7472ce/disk-2ccbdf43.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/e1c494bb5c9c/vmlinux-2ccbdf43.xz
kernel image: https://storage.googleapis.com/syzbot-assets/752498985a5e/bzImage-2ccbdf43.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+19ec7595e3aa1a45f623@xxxxxxxxxxxxxxxxxxxxxxxxx
smc: removing ib device syz0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 51 at kernel/rcu/srcutree.c:653 cleanup_srcu_struct+0x404/0x4d0 kernel/rcu/srcutree.c:653
Modules linked in:
CPU: 0 PID: 51 Comm: kworker/u8:3 Not tainted 6.10.0-rc3-syzkaller-00044-g2ccbdf43d5e7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
Workqueue: ib-unreg-wq ib_unregister_work
RIP: 0010:cleanup_srcu_struct+0x404/0x4d0 kernel/rcu/srcutree.c:653
Code: 12 80 00 48 c7 03 00 00 00 00 48 83 c4 48 5b 41 5c 41 5d 41 5e 41 5f 5d e9 14 67 34 0a 90 0f 0b 90 eb e7 90 0f 0b 90 eb e1 90 <0f> 0b 90 eb db 90 0f 0b 90 eb 0a 90 0f 0b 90 eb 04 90 0f 0b 90 48
RSP: 0018:ffffc90000bb7970 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88802a1bc980 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffe8ffffd74c58
RBP: 0000000000000001 R08: ffffe8ffffd74c5f R09: 1ffffd1ffffae98b
R10: dffffc0000000000 R11: fffff91ffffae98c R12: dffffc0000000000
R13: ffff88802285b5f0 R14: ffff88802285b000 R15: ffff88802a1bc800
FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa3852cae10 CR3: 000000000e132000 CR4: 0000000000350ef0
Call Trace:
<TASK>
ib_uverbs_release_dev+0x4e/0x80 drivers/infiniband/core/uverbs_main.c:136
device_release+0x9b/0x1c0
kobject_cleanup lib/kobject.c:689 [inline]
kobject_release lib/kobject.c:720 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0x231/0x480 lib/kobject.c:737
remove_client_context+0xb9/0x1e0 drivers/infiniband/core/device.c:776
disable_device+0x13b/0x360 drivers/infiniband/core/device.c:1282
__ib_unregister_device+0x6d/0x170 drivers/infiniband/core/device.c:1475
ib_unregister_work+0x19/0x30 drivers/infiniband/core/device.c:1586
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xa2e/0x1830 kernel/workqueue.c:3312
worker_thread+0x86d/0xd70 kernel/workqueue.c:3393
kthread+0x2f2/0x390 kernel/kthread.c:389
ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
I see that this is caused by the call to ib_unregister_device_queued() in
response to the NETDEV_UNREGISTER event, but we don't flush anything before it.
How can we be sure that ib_device is not used anymore?
Hi, Leon
This is the console output:
https://syzkaller.appspot.com/x/log.txt?x=179e93fe980000
From the above link, it seems that other devices or subsystems failed
first, which then caused this call trace to appear. Once the other problem
occurred, the whole kernel was in a messy state, so it is not surprising
that further problems followed.
Which devices/subsystems failed? I grepped the log and don't see
anything suspicious before the first "------------[ cut here ]------------"
line.
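
For anyone who wants to repeat that check on a saved copy of the console
output, here is a minimal sketch in Python. The keyword list and the function
name are assumptions made for illustration only; they are not taken from
syzbot or from the kernel, and what counts as "suspicious" is a judgment call.

```python
import re

# Marker that the kernel prints right before a WARN/BUG splat.
CUT_HERE = "------------[ cut here ]------------"

# Hypothetical keyword list: adjust to whatever you consider suspicious.
SUSPICIOUS = re.compile(
    r"BUG:|WARNING:|Oops|general protection fault|hung task|failed",
    re.IGNORECASE,
)


def suspicious_before_first_cut(log_text):
    """Return (line_number, line) pairs that match SUSPICIOUS and appear
    before the first "cut here" marker in the given console log text."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if CUT_HERE in line:
            break  # stop at the first splat; later lines are not scanned
        if SUSPICIOUS.search(line):
            hits.append((lineno, line.strip()))
    return hits
```

Feeding it the text of a downloaded log.txt gives the lines worth reading by
hand; an empty list only means none of the assumed keywords matched, not that
nothing went wrong earlier.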
We need a script to check this problem. It is an interesting problem.
Zhu Yanjun
Simply put, the root cause is not in the RDMA subsystem.
I will continue to delve into this problem.
Zhu Yanjun
Thanks