Hi,
On 14/12/2020 15:49, Xiaoguang Wang wrote:
io_iopoll_complete() does not hold completion_lock when completing polled
io, so io_wq_submit_work() cannot call io_req_complete() directly to
complete polled io; otherwise there may be concurrent access to cqring,
defer_list, etc., which is not safe. Commit dad1b1242fd5 ("io_uring: always
let io_iopoll_complete() complete polled io") fixed this issue, but
Pavel reported that IOPOLL, apart from rw, can also do buf reg/unreg requests
(IORING_OP_PROVIDE_BUFFERS or IORING_OP_REMOVE_BUFFERS), so that fix, which
assumes a rw request, is not good.
Given that io_iopoll_complete() is always called under uring_lock, we can
also take uring_lock here for polled io to fix this issue.
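The locking rule described above can be modelled in userspace. The following
is a minimal sketch, not kernel code: struct ring_ctx, ring_init(), post_cqe(),
reap_polled() and worker_complete() are hypothetical names, and pthread mutexes
stand in for uring_lock and completion_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define CQ_ENTRIES 64

/* Hypothetical userspace stand-in for the ring context (assumption). */
struct ring_ctx {
	bool iopoll;				/* models IORING_SETUP_IOPOLL */
	pthread_mutex_t uring_lock;		/* models ctx->uring_lock */
	pthread_mutex_t completion_lock;	/* models ctx->completion_lock */
	int cq[CQ_ENTRIES];			/* models the completion ring */
	unsigned int cq_tail;
};

static void ring_init(struct ring_ctx *ctx, bool iopoll)
{
	ctx->iopoll = iopoll;
	ctx->cq_tail = 0;
	pthread_mutex_init(&ctx->uring_lock, NULL);
	pthread_mutex_init(&ctx->completion_lock, NULL);
}

/* Appends one result; the caller must provide mutual exclusion. */
static void post_cqe(struct ring_ctx *ctx, int res)
{
	assert(ctx->cq_tail < CQ_ENTRIES);
	ctx->cq[ctx->cq_tail++] = res;
}

/* Models the polling reaper: it runs under uring_lock only. */
static void reap_polled(struct ring_ctx *ctx, int res)
{
	pthread_mutex_lock(&ctx->uring_lock);
	post_cqe(ctx, res);
	pthread_mutex_unlock(&ctx->uring_lock);
}

/*
 * Models the worker error path: for a polled ring, also take uring_lock,
 * so the worker and the reaper never touch the ring at the same time.
 */
static void worker_complete(struct ring_ctx *ctx, int res)
{
	if (ctx->iopoll)
		pthread_mutex_lock(&ctx->uring_lock);

	pthread_mutex_lock(&ctx->completion_lock);
	post_cqe(ctx, res);
	pthread_mutex_unlock(&ctx->completion_lock);

	if (ctx->iopoll)
		pthread_mutex_unlock(&ctx->uring_lock);
}
```

In this model, completion_lock alone would not protect the worker against
reap_polled(), because the reaper never takes it; uring_lock is the only lock
both paths share.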
One thing I don't like is that io_wq_submit_work() won't be able to
publish an event while someone is polling in io_uring_enter(ENTER_GETEVENTS),
because both take the lock. The problem arises when the poller waits
for an event that is currently in io-wq (i.e. in io_wq_submit_work()).
The polling loop will eventually exit, so that's not a deadlock, but
latency, etc. would be huge.
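To illustrate the concern, here is a small userspace sketch, assuming a
default pthread mutex as a stand-in for uring_lock. worker_can_publish_now()
is a hypothetical helper, not a kernel function; it only reports whether a
worker would have to wait behind whoever currently holds the lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Returns 1 if the lock is free right now, 0 if the caller would have to
 * wait for the current holder (the poller, in the scenario above).
 */
static int worker_can_publish_now(pthread_mutex_t *uring_lock)
{
	int err = pthread_mutex_trylock(uring_lock);

	if (err == EBUSY)
		return 0;	/* lock is held; the worker would block here */

	assert(err == 0);	/* no other error is expected for a default mutex */
	pthread_mutex_unlock(uring_lock);
	return 1;
}
```

While the poller holds the lock, the helper returns 0, which corresponds to
the window in which the worker's event cannot be published.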
In this patch, we only hold uring_lock for polled io in the error path, so I
think it normally may not be an issue. The critical section also seems small,
so it should not result in huge latency either.
I also noticed that the current code already takes uring_lock in the
io_wq_submit_work() call chain:
==> io_wq_submit_work()
====> io_issue_sqe()
======> io_provide_buffers()
========> io_ring_submit_lock(ctx, !force_nonblock);
Regards,
Xiaoguang Wang
Fixes: dad1b1242fd5 ("io_uring: always let io_iopoll_complete() complete polled io")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@xxxxxxxxxxxxxxxxx>
---
fs/io_uring.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f53356ced5ab..eab3d2b7d232 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6354,19 +6354,24 @@ static struct io_wq_work *io_wq_submit_work(struct io_wq_work *work)
 	}
 	if (ret) {
+		bool iopoll_enabled = req->ctx->flags & IORING_SETUP_IOPOLL;
+
 		/*
-		 * io_iopoll_complete() does not hold completion_lock to complete
-		 * polled io, so here for polled io, just mark it done and still let
-		 * io_iopoll_complete() complete it.
+		 * io_iopoll_complete() does not hold completion_lock to complete polled
+		 * io, so here for polled io, we can not call io_req_complete() directly,
+		 * otherwise there maybe concurrent access to cqring, defer_list, etc,
+		 * which is not safe. Given that io_iopoll_complete() is always called
+		 * under uring_lock, so here for polled io, we also get uring_lock to
+		 * complete it.
 		 */
-		if (req->ctx->flags & IORING_SETUP_IOPOLL) {
-			struct kiocb *kiocb = &req->rw.kiocb;
+		if (iopoll_enabled)
+			mutex_lock(&req->ctx->uring_lock);
-			kiocb_done(kiocb, ret, NULL);
-		} else {
-			req_set_fail_links(req);
-			io_req_complete(req, ret);
-		}
+		req_set_fail_links(req);
+		io_req_complete(req, ret);
+
+		if (iopoll_enabled)
+			mutex_unlock(&req->ctx->uring_lock);
 	}
 	return io_steal_work(req);