On 4/2/21 10:52 AM, Pavel Begunkov wrote:
> [ 491.222908] INFO: task thread-exit:2490 blocked for more than 122 seconds.
> [ 491.222957] Call Trace:
> [ 491.222967] __schedule+0x36b/0x950
> [ 491.222985] schedule+0x68/0xe0
> [ 491.222994] schedule_timeout+0x209/0x2a0
> [ 491.223003] ? tlb_flush_mmu+0x28/0x140
> [ 491.223013] wait_for_completion+0x8b/0xf0
> [ 491.223023] io_wq_destroy_manager+0x24/0x60
> [ 491.223037] io_wq_put_and_exit+0x18/0x30
> [ 491.223045] io_uring_clean_tctx+0x76/0xa0
> [ 491.223061] __io_uring_files_cancel+0x1b9/0x2e0
> [ 491.223068] ? blk_finish_plug+0x26/0x40
> [ 491.223085] do_exit+0xc0/0xb40
> [ 491.223099] ? syscall_trace_enter.isra.0+0x1a1/0x1e0
> [ 491.223109] __x64_sys_exit+0x1b/0x20
> [ 491.223117] do_syscall_64+0x38/0x50
> [ 491.223131] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 491.223177] INFO: task iou-mgr-2490:2491 blocked for more than 122 seconds.
> [ 491.223194] Call Trace:
> [ 491.223198] __schedule+0x36b/0x950
> [ 491.223206] ? pick_next_task_fair+0xcf/0x3e0
> [ 491.223218] schedule+0x68/0xe0
> [ 491.223225] schedule_timeout+0x209/0x2a0
> [ 491.223236] wait_for_completion+0x8b/0xf0
> [ 491.223246] io_wq_manager+0xf1/0x1d0
> [ 491.223255] ? recalc_sigpending+0x1c/0x60
> [ 491.223265] ? io_wq_cpu_online+0x40/0x40
> [ 491.223272] ret_from_fork+0x22/0x30
>
> Cancel all unbound works on exit, otherwise do_exit() ->
> io_uring_files_cancel() may wait for io-wq destruction for long, e.g.
> until someone sends a SIGKILL.
>
> Suggested-by: Jens Axboe <axboe@xxxxxxxxx>
> Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
> ---
>
> Not quite happy about it as it cancels pipes and sockets, but
> is probably better than waiting.

I don't think there's any other way: if it's not bounded execution, we
have to cancel it. The same thing would happen to these requests if they
were not punted async. It's either this or "re-parenting" the requests,
if the exiting task is part of a ring that belongs to a parent.

-- 
Jens Axboe
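
For illustration only, the exit-time policy discussed above (cancel anything
with unbounded execution time, let bounded work run to completion) can be
modelled in userspace roughly as follows. This is a sketch under stated
assumptions, not the patch itself: struct work, match_unbound(),
cancel_matching() and queue_exit() are hypothetical names rather than io-wq's
actual API, and a plain array stands in for the real work lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are io-wq's real API. */
enum work_state { WORK_PENDING, WORK_DONE, WORK_CANCELLED };

struct work {
    bool unbound;           /* may block indefinitely (pipe, socket, ...) */
    enum work_state state;
};

/* Predicate type used to select which pending items get cancelled. */
typedef bool (*work_match_fn)(const struct work *w);

static bool match_unbound(const struct work *w)
{
    return w->unbound;
}

/* Cancel every pending item accepted by match; return how many were cancelled. */
static size_t cancel_matching(struct work *items, size_t n, work_match_fn match)
{
    size_t cancelled = 0;

    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (items[i].state == WORK_PENDING && match(&items[i])) {
            items[i].state = WORK_CANCELLED;
            cancelled++;
        }
    }
    return cancelled;
}

/*
 * Exit path: cancel unbound work first so the exiting task never waits on
 * it, then let the remaining (bounded) items finish.
 */
static size_t queue_exit(struct work *items, size_t n)
{
    size_t cancelled = cancel_matching(items, n, match_unbound);

    for (size_t i = 0; i < n; i++) {
        if (items[i].state == WORK_PENDING)
            items[i].state = WORK_DONE;  /* bounded work ends in finite time */
    }
    return cancelled;
}
```

Calling queue_exit() on a mix of items leaves every pending unbound item
cancelled and every pending bounded item done, so the caller has nothing left
that could block it indefinitely. Items that had already completed are left
untouched.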