On 14/12/2020 15:49, Xiaoguang Wang wrote:
> io_iopoll_complete() does not hold completion_lock to complete polled
> io, so in io_wq_submit_work(), we can not call io_req_complete() directly,
> to complete polled io, otherwise there maybe concurrent access to cqring,
> defer_list, etc, which is not safe. Commit dad1b1242fd5 ("io_uring: always
> let io_iopoll_complete() complete polled io") has fixed this issue, but
> Pavel reported that IOPOLL apart from rw can do buf reg/unreg requests(
> IORING_OP_PROVIDE_BUFFERS or IORING_OP_REMOVE_BUFFERS), so the fix is
> not good.
> 
> Given that io_iopoll_complete() is always called under uring_lock, so here
> for polled io, we can also get uring_lock to fix this issue.

One thing I don't like is that io_wq_submit_work() won't be able to
publish an event while someone is polling via
io_uring_enter(ENTER_GETEVENTS), because both take the lock. The problem
is when the poller waits for an event that is currently in io-wq
(i.e. io_wq_submit_work()). The polling loop will eventually exit, so
that's not a deadlock, but latency, etc. would be huge.

> 
> Fixes: dad1b1242fd5 ("io_uring: always let io_iopoll_complete() complete polled io")
> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@xxxxxxxxxxxxxxxxx>
> ---
>  fs/io_uring.c | 25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index f53356ced5ab..eab3d2b7d232 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -6354,19 +6354,24 @@ static struct io_wq_work *io_wq_submit_work(struct io_wq_work *work)
>  	}
>  
>  	if (ret) {
> +		bool iopoll_enabled = req->ctx->flags & IORING_SETUP_IOPOLL;
> +
>  		/*
> -		 * io_iopoll_complete() does not hold completion_lock to complete
> -		 * polled io, so here for polled io, just mark it done and still let
> -		 * io_iopoll_complete() complete it.
> +		 * io_iopoll_complete() does not hold completion_lock to complete polled
> +		 * io, so here for polled io, we can not call io_req_complete() directly,
> +		 * otherwise there maybe concurrent access to cqring, defer_list, etc,
> +		 * which is not safe. Given that io_iopoll_complete() is always called
> +		 * under uring_lock, so here for polled io, we also get uring_lock to
> +		 * complete it.
>  		 */
> -		if (req->ctx->flags & IORING_SETUP_IOPOLL) {
> -			struct kiocb *kiocb = &req->rw.kiocb;
> +		if (iopoll_enabled)
> +			mutex_lock(&req->ctx->uring_lock);
>  
> -			kiocb_done(kiocb, ret, NULL);
> -		} else {
> -			req_set_fail_links(req);
> -			io_req_complete(req, ret);
> -		}
> +		req_set_fail_links(req);
> +		io_req_complete(req, ret);
> +
> +		if (iopoll_enabled)
> +			mutex_unlock(&req->ctx->uring_lock);
>  	}
>  
>  	return io_steal_work(req);
> 

-- 
Pavel Begunkov