[PATCH V3] io_uring: fix IO hang in io_wq_put_and_exit from do_exit()

Ming Lei <ming.lei@xxxxxxxxxx> · Fri, 8 Sep 2023 17:30:09 +0800

io_wq_put_and_exit() is called from do_exit(), but FIXED_FILE requests
in io_wq are not canceled by io_uring_cancel_generic(), which is also
called from do_exit(). Meanwhile, the io_wq IO code path may share
resources with the normal iopoll code path.

So if a HIPRI request is submitted via io_wq, it may never get the
resources it needs to make progress, because iopoll isn't possible in
io_wq_put_and_exit().

The issue can be triggered when terminating 't/io_uring -n4 /dev/nullb0'
with default null_blk parameters.

Fix it with the following two changes:

- switch to IO_URING_F_NONBLOCK for submitting POLLED IO from io_wq, so
  that requests can be canceled when submitted from an exiting io_wq

- reap completed events before exiting io_wq, so that these completed
  requests won't hold resources and prevent other contexts from moving on

Closes: https://lore.kernel.org/linux-block/3893581.1691785261@xxxxxxxxxxxxxxxxxxxxxx/
Reported-by: David Howells <dhowells@xxxxxxxxxx>
Cc: Pavel Begunkov <asml.silence@xxxxxxxxx>
Cc: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
---
V3:
	- take a new approach and fix the regression on thread_exit in
	  liburing tests
	- pass liburing tests (make runtests)
V2:
	- avoid messing up io_uring_cancel_generic() by adding a new
	  helper for canceling io_wq requests
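
Note for reviewers: the resource dependency described in the commit log
can be modeled in user space. The sketch below is only an illustration
under stated assumptions, not kernel code: toy_ctx, NR_TAGS and the
toy_*() helpers are hypothetical names, and the fixed slot array stands
in for whatever resource the io_wq and iopoll paths share. It shows that
a completed but unreaped request keeps its slot, that a non-blocking
submit fails with -EAGAIN instead of waiting, and that reaping lets
submission proceed again.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TAGS 2	/* hypothetical size of the shared resource pool */

/* REQ_FREE is 0 so a zero-initialized toy_ctx starts with all slots free */
enum req_state { REQ_FREE = 0, REQ_INFLIGHT, REQ_COMPLETED };

struct toy_ctx {
	enum req_state slot[NR_TAGS];
};

/* Non-blocking submit: take a free slot or fail with -EAGAIN. */
static int toy_submit_nonblock(struct toy_ctx *ctx)
{
	for (size_t i = 0; i < NR_TAGS; i++) {
		if (ctx->slot[i] == REQ_FREE) {
			ctx->slot[i] = REQ_INFLIGHT;
			return (int)i;
		}
	}
	return -EAGAIN;
}

/* The "device" finishes a request; its slot stays held until reaped. */
static void toy_complete(struct toy_ctx *ctx, int tag)
{
	if (tag >= 0 && tag < NR_TAGS && ctx->slot[tag] == REQ_INFLIGHT)
		ctx->slot[tag] = REQ_COMPLETED;
}

/* Reap completed requests, releasing their slots; returns the count. */
static int toy_reap(struct toy_ctx *ctx)
{
	int reaped = 0;

	for (size_t i = 0; i < NR_TAGS; i++) {
		if (ctx->slot[i] == REQ_COMPLETED) {
			ctx->slot[i] = REQ_FREE;
			reaped++;
		}
	}
	return reaped;
}
```

With every slot completed but not yet reaped, toy_submit_nonblock()
keeps returning -EAGAIN, and it succeeds again once toy_reap() has run.
A blocking submit in the same state would wait indefinitely, which is
analogous to the hang described in the commit log.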

 io_uring/io_uring.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ad636954abae..95a3d31a1ef1 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1930,6 +1930,10 @@ void io_wq_submit_work(struct io_wq_work *work)
 		}
 	}
 
+	/* It is fragile to block POLLED IO, so switch to NON_BLOCK */
+	if ((req->ctx->flags & IORING_SETUP_IOPOLL) && def->iopoll_queue)
+		issue_flags |= IO_URING_F_NONBLOCK;
+
 	do {
 		ret = io_issue_sqe(req, issue_flags);
 		if (ret != -EAGAIN)
@@ -3363,6 +3367,12 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd)
 		finish_wait(&tctx->wait, &wait);
 	} while (1);
 
+	/*
+	 * Reap events from each ctx, otherwise completed requests may hold
+	 * resources and prevent other contexts from moving on.
+	 */
+	xa_for_each(&tctx->xa, index, node)
+		io_iopoll_try_reap_events(node->ctx);
 	io_uring_clean_tctx(tctx);
 	if (cancel_all) {
 		/*
-- 
2.40.1