On Fri, Apr 14, 2023 at 03:53:13PM +0800, Ming Lei wrote:
So far io_req_complete_post() only covers DEFER_TASKRUN by completing the request via task work when the request is completed from IOWQ. However, a uring command can be completed from any context, and if io_uring is set up with DEFER_TASKRUN, the command must be completed in the submitter task's context; otherwise a waiter in IORING_ENTER_GETEVENTS can't be woken up and may hang forever. The issue can be observed when removing a ublk device, but it turns out to be a generic issue for uring command & DEFER_TASKRUN, so solve it in io_uring core code.
Thanks for sharing. This has been fine on the nvme-passthrough side, though. We usually test with the DEFER_TASKRUN option, as both fio and t/io_uring set it.
Link: https://lore.kernel.org/linux-block/b3fc9991-4c53-9218-a8cc-5b4dd3952108@xxxxxxxxx/
Reported-by: Jens Axboe <axboe@xxxxxxxxx>
Cc: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
---
 io_uring/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9083a8466ebf..9f6f92ed60b2 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1012,7 +1012,7 @@ static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
 
 void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
 {
-	if (req->ctx->task_complete && (issue_flags & IO_URING_F_IOWQ)) {
+	if (req->ctx->task_complete && req->ctx->submitter_task != current) {
 		req->io_task_work.func = io_req_task_complete;
 		io_req_task_work_add(req);
On the nvme side, we always complete in task context, so this seems a bit hard to reproduce there. But this patch ensures that task work is set up whenever it is needed, even if the caller/driver did not set it up explicitly. So it looks fine to me. FWIW, I do not see any regression in nvme tests.
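
To make the difference between the two checks concrete, here is a rough userspace model of just the predicate the patch changes. It is only a sketch: struct task, struct ring_ctx, MODEL_F_IOWQ and both function names are made-up stand-ins for illustration, not kernel definitions, and everything else io_req_complete_post() does is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not the kernel's types. */
struct task {
	int id;
};

struct ring_ctx {
	bool task_complete;            /* models ctx->task_complete (DEFER_TASKRUN) */
	const struct task *submitter;  /* models ctx->submitter_task */
};

#define MODEL_F_IOWQ 0x1u              /* stand-in for IO_URING_F_IOWQ */

/* Old check: punt to task work only when completing from io-wq. */
bool old_needs_task_work(const struct ring_ctx *ctx, unsigned issue_flags)
{
	return ctx->task_complete && (issue_flags & MODEL_F_IOWQ) != 0;
}

/*
 * New check: punt to task work whenever the completing task is not the
 * submitter, whatever context the completion came from.
 */
bool new_needs_task_work(const struct ring_ctx *ctx, const struct task *cur)
{
	return ctx->task_complete && ctx->submitter != cur;
}
```

In this model, a uring command completed by some other task without the io-wq flag is the case where the two checks disagree: the old one returns false (complete inline), the new one returns true (go through task work). When the completing task is the submitter, as in the nvme case described above, both return false, which matches not seeing any change there.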