Re: [syzbot] [io-uring?] WARNING in io_sq_thread

Jens Axboe <axboe@xxxxxxxxx> · Sun, 25 Aug 2024 08:40:49 -0600

On 8/24/24 9:15 PM, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    bb1b0acdcd66 Add linux-next specific files for 20240820
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1363f893980000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=49406de25a441ccf
> dashboard link: https://syzkaller.appspot.com/bug?extid=82e078bac56cae572bce
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/ebc2ae824293/disk-bb1b0acd.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/5f62bd0c0e25/vmlinux-bb1b0acd.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/ddf6d0bc053d/bzImage-bb1b0acd.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+82e078bac56cae572bce@xxxxxxxxxxxxxxxxxxxxxxxxx
> 
> ------------[ cut here ]------------
> do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff816d32e6>] prepare_to_wait+0x186/0x210 kernel/sched/wait.c:237
> WARNING: CPU: 1 PID: 5335 at kernel/sched/core.c:8556 __might_sleep+0xb9/0xe0 kernel/sched/core.c:8552
> Modules linked in:
> CPU: 1 UID: 0 PID: 5335 Comm: iou-sqp-5333 Not tainted 6.11.0-rc4-next-20240820-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
> RIP: 0010:__might_sleep+0xb9/0xe0 kernel/sched/core.c:8552
> Code: 9d 0e 01 90 42 80 3c 23 00 74 08 48 89 ef e8 3e 9d 97 00 48 8b 4d 00 48 c7 c7 c0 60 0a 8c 44 89 ee 48 89 ca e8 b8 01 f1 ff 90 <0f> 0b 90 90 eb b5 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c 70 ff ff ff
> RSP: 0018:ffffc900041e7968 EFLAGS: 00010246
> RAX: 11f47f6d1cba3d00 RBX: 1ffff110040802ec RCX: ffff888020400000
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
> RBP: ffff888020401760 R08: ffffffff8155acc2 R09: fffffbfff1cfa354
> R10: dffffc0000000000 R11: fffffbfff1cfa354 R12: dffffc0000000000
> R13: 0000000000000001 R14: 0000000000000249 R15: ffffffff8c0ab880
> FS:  00007ffbe99d66c0(0000) GS:ffff8880b9100000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffed4fbfdec CR3: 0000000024c2c000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>  __mutex_lock+0xc1/0xd70 kernel/locking/mutex.c:752
>  io_sq_thread+0x1310/0x1c40 io_uring/sqpoll.c:367
>  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>  </TASK>

For this to trigger, we'd need to return from schedule() without the
task state having been set back to TASK_RUNNING. That should not be
possible, so I'm unsure what is going on there. It does not look like
an io_uring issue, though.
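
To make the reasoning above concrete, here is a toy userspace model of
the sequence in question. It is a sketch only, not kernel code: every
toy_* name is hypothetical, and the model simply assumes that schedule()
always returns with the state at TASK_RUNNING. The two state values
match the kernel's (the report shows state=1, i.e. TASK_INTERRUPTIBLE).

```c
/* Toy userspace model of the wait/schedule/lock sequence; not kernel code. */
#include <assert.h>
#include <stdio.h>

enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };

/* Hypothetical stand-in for the bits of task_struct that matter here. */
struct toy_task {
	int state;
	int warnings;
};

/* Models prepare_to_wait(): the task marks itself as about to sleep. */
static void toy_prepare_to_wait(struct toy_task *t)
{
	t->state = TASK_INTERRUPTIBLE;
}

/* Assumption: by the time schedule() returns, the state is TASK_RUNNING. */
static void toy_schedule(struct toy_task *t)
{
	t->state = TASK_RUNNING;
}

/* Models the __might_sleep() check that produced the warning. */
static void toy_might_sleep(struct toy_task *t)
{
	assert(t != NULL);
	if (t->state != TASK_RUNNING) {
		t->warnings++;
		fprintf(stderr, "blocking op with state=%d\n", t->state);
	}
}

/* Models mutex_lock(): a blocking op, so it runs the check first. */
static void toy_mutex_lock(struct toy_task *t)
{
	toy_might_sleep(t);
}

/* One pass of the sequence; returns the number of warnings raised. */
static int toy_sq_thread_iteration(int call_schedule)
{
	struct toy_task t = { TASK_RUNNING, 0 };

	toy_prepare_to_wait(&t);
	if (call_schedule)
		toy_schedule(&t);
	toy_mutex_lock(&t);
	return t.warnings;
}
```

In this model, toy_sq_thread_iteration(1) raises no warning, while
toy_sq_thread_iteration(0) raises one. In other words, the warning only
shows up if the lock is reached with the state from prepare_to_wait()
still in place.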

-- 
Jens Axboe