On 10/22/21 05:38, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> INFO: task hung in io_wqe_worker
>
> INFO: task iou-wrk-9392:9401 blocked for more than 143 seconds.
>       Not tainted 5.15.0-rc2-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:iou-wrk-9392 state:D stack:27952 pid: 9401 ppid: 7038 flags:0x00004004
> Call Trace:
>  context_switch kernel/sched/core.c:4940 [inline]
>  __schedule+0xb44/0x5960 kernel/sched/core.c:6287
>  schedule+0xd3/0x270 kernel/sched/core.c:6366
>  schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
>  do_wait_for_common kernel/sched/completion.c:85 [inline]
>  __wait_for_common kernel/sched/completion.c:106 [inline]
>  wait_for_common kernel/sched/completion.c:117 [inline]
>  wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
>  io_worker_exit fs/io-wq.c:183 [inline]
>  io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Easily reproducible, it's stuck in

static void io_worker_exit(struct io_worker *worker)
{
	...
	wait_for_completion(&worker->ref_done);
	...
}

The reference belongs to a create_worker_cb() task_work item. It's
expected to either be executed or cancelled by io_wq_exit_workers(),
but the owner task never goes through __io_uring_cancel() (called in
do_exit()) and so never reaches io_wq_exit_workers().

Following the owner task, cat /proc/<pid>/stack:

[<0>] do_coredump+0x1d0/0x10e0
[<0>] get_signal+0x4a3/0x960
[<0>] arch_do_signal_or_restart+0xc3/0x6d0
[<0>] exit_to_user_mode_prepare+0x10e/0x190
[<0>] irqentry_exit_to_user_mode+0x9/0x20
[<0>] irqentry_exit+0x36/0x40
[<0>] exc_page_fault+0x95/0x190
[<0>] asm_exc_page_fault+0x1e/0x30

(gdb) l *(do_coredump+0x1d0-5)
0xffffffff81343ccb is in do_coredump (fs/coredump.c:469).
464
465		if (core_waiters > 0) {
466			struct core_thread *ptr;
467
468			freezer_do_not_count();
469			wait_for_completion(&core_state->startup);
470			freezer_count();

Can't say anything more at the moment, as I'm not familiar with the
coredump code.

-- 
Pavel Begunkov