On 3/12/21 12:35 PM, Pavel Begunkov wrote:
> On 11/03/2021 23:29, Pavel Begunkov wrote:
>> 1) The first problem is io_uring_cancel_sqpoll() ->
>> io_uring_cancel_task_requests() basically doing park(); park(); and so
>> hanging.
>>
>> 2) Another one is more subtle, when the master task is doing cancellations,
>> but SQPOLL task submits in-between the end of the cancellation but
>> before finish() requests taking a ref to the ctx, and so eternally
>> locking it up.
>>
>> 3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll() and
>> same io_uring_cancel_sqpoll() from the owner task, they race for
>> tctx->wait events. And there probably more of them.
>>
>> Instead do SQPOLL cancellations from within SQPOLL task context via
>> task_work, see io_sqpoll_cancel_sync(). With that we don't need temporal
>> park()/unpark() during cancellation, which is ugly, subtle and anyway
>> doesn't allow to do io_run_task_work() properly.
>>
>> io_uring_cancel_sqpoll() is called only from SQPOLL task context and
>> under sqd locking, so all parking is removed from there. And so,
>> io_sq_thread_[un]park() and io_sq_thread_stop() are not used now by
>> SQPOLL task, and that spare us from some headache.
>>
>> Also remove ctx->sqd_list early to avoid 2). And kill tctx->sqpoll,
>> which is not used anymore.
>
>
> Looks, the chunk below somehow slipped from the patch. Not important
> for 5.12, but can can be folded anyway
>
> diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
> index 9761a0ec9f95..c24c62b47745 100644
> --- a/include/linux/io_uring.h
> +++ b/include/linux/io_uring.h
> @@ -22,7 +22,6 @@ struct io_uring_task {
>  	void *io_wq;
>  	struct percpu_counter inflight;
>  	atomic_t in_idle;
> -	bool sqpoll;
>
>  	spinlock_t task_lock;
>  	struct io_wq_work_list task_list;

Let's do it as a separate patch instead.

-- 
Jens Axboe