On 12/03/2021 19:40, Jens Axboe wrote:
> On 3/12/21 12:35 PM, Pavel Begunkov wrote:
>> On 11/03/2021 23:29, Pavel Begunkov wrote:
>>> 1) The first problem is io_uring_cancel_sqpoll() ->
>>> io_uring_cancel_task_requests() basically doing park(); park(); and so
>>> hanging.
>>>
>>> 2) Another one is more subtle, when the master task is doing cancellations,
>>> but SQPOLL task submits in-between the end of the cancellation but
>>> before finish() requests taking a ref to the ctx, and so eternally
>>> locking it up.
>>>
>>> 3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll() and
>>> same io_uring_cancel_sqpoll() from the owner task, they race for
>>> tctx->wait events. And there probably more of them.
>>>
>>> Instead do SQPOLL cancellations from within SQPOLL task context via
>>> task_work, see io_sqpoll_cancel_sync(). With that we don't need temporal
>>> park()/unpark() during cancellation, which is ugly, subtle and anyway
>>> doesn't allow to do io_run_task_work() properly.
>>>
>>> io_uring_cancel_sqpoll() is called only from SQPOLL task context and
>>> under sqd locking, so all parking is removed from there. And so,
>>> io_sq_thread_[un]park() and io_sq_thread_stop() are not used now by
>>> SQPOLL task, and that spare us from some headache.
>>>
>>> Also remove ctx->sqd_list early to avoid 2). And kill tctx->sqpoll,
>>> which is not used anymore.
>>
>>
>> Looks, the chunk below somehow slipped from the patch. Not important
>> for 5.12, but can can be folded anyway
>>
>> diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
>> index 9761a0ec9f95..c24c62b47745 100644
>> --- a/include/linux/io_uring.h
>> +++ b/include/linux/io_uring.h
>> @@ -22,7 +22,6 @@ struct io_uring_task {
>>  	void *io_wq;
>>  	struct percpu_counter inflight;
>>  	atomic_t in_idle;
>> -	bool sqpoll;
>>
>>  	spinlock_t task_lock;
>>  	struct io_wq_work_list task_list;
>
> Let's do it as a separate patch instead.

Ok, I'll send it for-5.13 when it's appropriate.

-- 
Pavel Begunkov