io_wq_put_and_exit() is called from do_exit(), but FIXED_FILE requests in
io_wq aren't canceled by io_uring_cancel_generic(), which is also called
from do_exit(). Meanwhile, the io_wq IO code path may share resources with
the normal iopoll code path.

So if any HIPRI request is submitted via io_wq, that request may not get
the resources it needs to make progress, because iopoll isn't possible in
io_wq_put_and_exit().

The issue can be triggered by terminating 't/io_uring -n4 /dev/nullb0'
with default null_blk parameters.

Fix it with the following approaches:

- switch to IO_URING_F_NONBLOCK for submitting POLLED IO from io_wq, so
  that requests can be canceled when they are submitted from an exiting
  io_wq

- reap completed events before exiting io_wq, so that these completed
  requests won't hold resources and prevent other contexts from making
  progress

Closes: https://lore.kernel.org/linux-block/3893581.1691785261@xxxxxxxxxxxxxxxxxxxxxx/
Reported-by: David Howells <dhowells@xxxxxxxxxx>
Cc: Pavel Begunkov <asml.silence@xxxxxxxxx>
Cc: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
---
V3:
	- take a new approach and fix the regression on thread_exit in liburing tests
	- pass liburing tests (make runtests)
V2:
	- avoid messing up io_uring_cancel_generic() by adding one new helper for
	  canceling io_wq requests

 io_uring/io_uring.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ad636954abae..95a3d31a1ef1 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1930,6 +1930,10 @@ void io_wq_submit_work(struct io_wq_work *work)
 		}
 	}
 
+	/* It is fragile to block POLLED IO, so switch to NON_BLOCK */
+	if ((req->ctx->flags & IORING_SETUP_IOPOLL) && def->iopoll_queue)
+		issue_flags |= IO_URING_F_NONBLOCK;
+
 	do {
 		ret = io_issue_sqe(req, issue_flags);
 		if (ret != -EAGAIN)
@@ -3363,6 +3367,12 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd)
 		finish_wait(&tctx->wait, &wait);
 	} while (1);
 
+	/*
+	 * Reap events from each ctx, otherwise these requests may take
+	 * resources and prevent other contexts from being moved on.
+	 */
+	xa_for_each(&tctx->xa, index, node)
+		io_iopoll_try_reap_events(node->ctx);
 	io_uring_clean_tctx(tctx);
 	if (cancel_all) {
 		/*
-- 
2.40.1
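The second hunk's idea, reaping completions on every ring the task has used before its io_wq is torn down, can be sketched the same way. Again this is a toy model, not kernel code: a plain array replaces the tctx->xa xarray walked by xa_for_each(), sketch_reap() only loosely corresponds to io_iopoll_try_reap_events(), and the assumption that each completed-but-unreaped request pins one unit of a shared pool is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Toy shared resource; each unreaped completion pins one unit. */
struct sketch_tag_pool {
	int free_tags;
};

/* Hypothetical ring context: `completed` counts requests that finished
 * but whose completions were never reaped by iopoll. */
struct sketch_ctx {
	struct sketch_tag_pool *pool;
	int completed;
};

/* Loosely models io_iopoll_try_reap_events(): reap what has completed,
 * returning the resources those requests were holding. */
static void sketch_reap(struct sketch_ctx *ctx)
{
	ctx->pool->free_tags += ctx->completed;
	ctx->completed = 0;
}

/* Loosely models the new loop in io_uring_cancel_generic(): visit every
 * ring the task has used (an array here instead of tctx->xa). */
static void sketch_reap_all(struct sketch_ctx **ctxs, size_t nr)
{
	assert(ctxs != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		sketch_reap(ctxs[i]);
}
```

In this model, a task with two rings holding 2 and 1 unreaped completions returns 3 units to the pool, which other contexts could then use instead of waiting on completions that nobody will poll.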