On 12/16/20 2:14 PM, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit: 7b1b868e Merge tag 'for-linus' of git://git.kernel.org/pub.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=1156046b500000 > kernel config: https://syzkaller.appspot.com/x/.config?x=3416bb960d5c705d > dashboard link: https://syzkaller.appspot.com/bug?extid=c9937dfb2303a5f18640 > compiler: gcc (GCC) 10.1.0-syz 20200507 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1407c287500000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10ed5f07500000 > > The issue was bisected to: > > commit 4d004099a668c41522242aa146a38cc4eb59cb1e > Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > Date: Fri Oct 2 09:04:21 2020 +0000 > > lockdep: Fix lockdep recursion Ehhh no... This is timing dependent, so probably why it ends up pinpointing something totally unrelated. > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=14e9d433500000 > final oops: https://syzkaller.appspot.com/x/report.txt?x=16e9d433500000 > console output: https://syzkaller.appspot.com/x/log.txt?x=12e9d433500000 > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+c9937dfb2303a5f18640@xxxxxxxxxxxxxxxxxxxxxxxxx > Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion") > > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441309 > RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1 > RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8 > R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0 > R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000 > ------------[ cut here ]------------ > percpu_ref_kill_and_confirm called more than once on io_ring_ctx_ref_free! > WARNING: CPU: 0 PID: 8476 at lib/percpu-refcount.c:382 percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382 > Modules linked in: > CPU: 0 PID: 8476 Comm: syz-executor389 Not tainted 5.10.0-rc7-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > RIP: 0010:percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382 > Code: 5d 08 48 8d 7b 08 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5d 48 8b 53 08 48 c7 c6 00 4b 9d 89 48 c7 c7 60 4a 9d 89 e8 c6 97 f6 04 <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 80 3c 02 > RSP: 0018:ffffc9000b94fe10 EFLAGS: 00010086 > RAX: 0000000000000000 RBX: ffff888011da4580 RCX: 0000000000000000 > RDX: ffff88801fe84ec0 RSI: ffffffff8158c835 RDI: fffff52001729fb4 > RBP: ffff88801539f000 R08: 0000000000000001 R09: ffff8880b9e2011b > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000293 > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88802de28758 > FS: 00000000014ab880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f2a7046b000 CR3: 0000000023368000 CR4: 0000000000350ef0 > Call Trace: > percpu_ref_kill include/linux/percpu-refcount.h:149 [inline] > io_ring_ctx_wait_and_kill+0x2b/0x450 fs/io_uring.c:8382 > io_uring_release+0x3e/0x50 fs/io_uring.c:8420 > __fput+0x285/0x920 fs/file_table.c:281 > task_work_run+0xdd/0x190 kernel/task_work.c:151 > tracehook_notify_resume include/linux/tracehook.h:188 [inline] > exit_to_user_mode_loop kernel/entry/common.c:164 [inline] > exit_to_user_mode_prepare+0x17e/0x1a0 kernel/entry/common.c:191 > syscall_exit_to_user_mode+0x38/0x260 kernel/entry/common.c:266 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x441309 > Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00 > RSP: 002b:00007ffed6545d38 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9 > RAX: fffffffffffffff4 RBX: 0000000000000000 RCX: 0000000000441309 > RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1 > RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8 > R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0 > R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000 What I think happens here is that we switch mode, but exit before it's done. Wonder if there's a wait to wait for that on exit. Tejun? -- Jens Axboe