Re: [RFC 5.12] io-wq: cancel unbounded works on io-wq destroy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/2/21 10:52 AM, Pavel Begunkov wrote:
> [  491.222908] INFO: task thread-exit:2490 blocked for more than 122 seconds.
> [  491.222957] Call Trace:
> [  491.222967]  __schedule+0x36b/0x950
> [  491.222985]  schedule+0x68/0xe0
> [  491.222994]  schedule_timeout+0x209/0x2a0
> [  491.223003]  ? tlb_flush_mmu+0x28/0x140
> [  491.223013]  wait_for_completion+0x8b/0xf0
> [  491.223023]  io_wq_destroy_manager+0x24/0x60
> [  491.223037]  io_wq_put_and_exit+0x18/0x30
> [  491.223045]  io_uring_clean_tctx+0x76/0xa0
> [  491.223061]  __io_uring_files_cancel+0x1b9/0x2e0
> [  491.223068]  ? blk_finish_plug+0x26/0x40
> [  491.223085]  do_exit+0xc0/0xb40
> [  491.223099]  ? syscall_trace_enter.isra.0+0x1a1/0x1e0
> [  491.223109]  __x64_sys_exit+0x1b/0x20
> [  491.223117]  do_syscall_64+0x38/0x50
> [  491.223131]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  491.223177] INFO: task iou-mgr-2490:2491 blocked for more than 122 seconds.
> [  491.223194] Call Trace:
> [  491.223198]  __schedule+0x36b/0x950
> [  491.223206]  ? pick_next_task_fair+0xcf/0x3e0
> [  491.223218]  schedule+0x68/0xe0
> [  491.223225]  schedule_timeout+0x209/0x2a0
> [  491.223236]  wait_for_completion+0x8b/0xf0
> [  491.223246]  io_wq_manager+0xf1/0x1d0
> [  491.223255]  ? recalc_sigpending+0x1c/0x60
> [  491.223265]  ? io_wq_cpu_online+0x40/0x40
> [  491.223272]  ret_from_fork+0x22/0x30
> 
> Cancel all unbound works on exit, otherwise do_exit() ->
> io_uring_files_cancel() may wait for io-wq destruction for long, e.g.
> until somewhat sends a SIGKILL.
> 
> Suggested-by: Jens Axboe <axboe@xxxxxxxxx>
> Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
> ---
> 
> Not quite happy about it as it cancels pipes and sockets, but
> is probably better than waiting.

I don't think there's any other way, if it's not bounded execution,
we have to cancel it. The same thing would happen these requests
if they were not punted async. It's either this, or "re-parenting"
the requests, if the exiting task is part of a ring that belongs
to a parent.

-- 
Jens Axboe




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux