On 11/03/2021 23:29, Pavel Begunkov wrote:
> 1) The first problem is io_uring_cancel_sqpoll() ->
> io_uring_cancel_task_requests() basically doing park(); park(); and so
> hanging.
>
> 2) Another one is more subtle, when the master task is doing cancellations,
> but SQPOLL task submits in-between the end of the cancellation but
> before finish() requests taking a ref to the ctx, and so eternally
> locking it up.
>
> 3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll() and
> same io_uring_cancel_sqpoll() from the owner task, they race for
> tctx->wait events. And there probably more of them.
>
> Instead do SQPOLL cancellations from within SQPOLL task context via
> task_work, see io_sqpoll_cancel_sync(). With that we don't need temporal
> park()/unpark() during cancellation, which is ugly, subtle and anyway
> doesn't allow to do io_run_task_work() properly.
>
> io_uring_cancel_sqpoll() is called only from SQPOLL task context and
> under sqd locking, so all parking is removed from there. And so,
> io_sq_thread_[un]park() and io_sq_thread_stop() are not used now by
> SQPOLL task, and that spare us from some headache.
>
> Also remove ctx->sqd_list early to avoid 2). And kill tctx->sqpoll,
> which is not used anymore.

Looks like the chunk below somehow slipped from the patch. Not important
for 5.12, but it can be folded in anyway.

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 9761a0ec9f95..c24c62b47745 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -22,7 +22,6 @@ struct io_uring_task {
 	void			*io_wq;
 	struct percpu_counter	inflight;
 	atomic_t		in_idle;
-	bool			sqpoll;
 
 	spinlock_t		task_lock;
 	struct io_wq_work_list	task_list;

-- 
Pavel Begunkov