On 10/30/19 8:18 AM, Jens Axboe wrote:
> On 10/30/19 8:02 AM, Bijan Mottahedeh wrote:
>>> OK, so I still don't quite see where the issue is. Your setup has more
>>> than one CPU per poll queue, and I can reproduce the issue quite easily
>>> here with a similar setup.
>>
>> That's probably why I couldn't reproduce this in a vm. This time I set
>> up one poll queue in an 8 CPU vm and reproduced it.
>
> You definitely do need poll queue sharing to hit this.
>
>>> Below are some things that are given:
>>>
>>> 1) If we fail to submit the IO, io_complete_rw_iopoll() is ultimately
>>>    invoked _from_ the submission path. This means that the result is
>>>    readily available by the time we go and check:
>>>
>>>        if (req->result == -EAGAIN)
>>>
>>>    in __io_submit_sqe().
>>
>> Is that always true?
>>
>> Let's say the operation was __io_submit_sqe()->io_read().
>>
>> By "failing to submit the io", do you mean that
>> io_read()->call_read_iter() returned success or failure? Are you saying
>> that req->result was set from kiocb_done() or later in the block layer?
>
> By "failed to submit" I mean that req->result == -EAGAIN. We set that in
> io_complete_rw_iopoll(), which is called off bio_wouldblock_error() in
> the block layer. This is different from a ret != 0 return, which would
> have been some sort of other failure we encountered, failing to submit
> the request. For that error, we just end the request. For the
> bio_wouldblock_error(), we need to retry submission.
>
>> Anyway I assume that io_read() would have to return success since
>> otherwise __io_submit_sqe() would immediately return and not check
>> req->result:
>>
>>        if (ret)
>>                return ret;
>
> Right, if ret != 0, then we have a fatal error for that request.
> ->result will not have been set at all for that case.
>
>> So if io_read() did return success, are we guaranteed that setting
>> req->result = -EAGAIN would always happen before the check?
>
> Always, since it happens inline from when you attempt to issue the read.
>
>> Also, is it possible that we can be preempted in __io_submit_sqe() after
>> the call to io_read() but before the -EAGAIN check?
>
> Sure, we're not disabling preemption. But that shouldn't have any
> bearing on this case.
>>
>>> This is a submission time failure, not
>>>    something that should be happening from a completion path after the
>>>    IO has been submitted successfully.
>>>
>>> 2) If we succeed in submitting the request, given that we have other
>>>    tasks polling, the request can complete any time. It can very well be
>>>    complete by the time we call io_iopoll_req_issued(), and this is
>>>    perfectly fine. We know the request isn't going away, as we're
>>>    holding a reference to it. kiocb has the same lifetime, as it's
>>>    embedded in the io_kiocb request. Note that this isn't the same
>>>    io_ring_ctx at all, some other task with its own io_ring_ctx just
>>>    happens to find our request when doing polling on the same queue
>>>    itself.
>>
>> Ah yes, it's a different io_ring_ctx, different poll list etc. For my
>
> Exactly
>
>> own clarity, I assume all contexts are mapping the same actual sq/cq
>> rings?
>
> The ring/context isn't mapped to a particular ring, a specific IO is
> mapped to a specific queue at the time it's being submitted. That
> depends on the IO type and where that task happens to be running at the
> time the IO is being submitted.

This might just do it, except it looks like there's a bug in sbitmap
where we don't always flush deferred clears. I'll look into that now.
diff --git a/fs/io_uring.c b/fs/io_uring.c
index af1937d66aee..f3ca2ee44dbd 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1212,6 +1212,7 @@ static int io_prep_rw(struct io_kiocb *req, const struct sqe_submit *s,
 
 		kiocb->ki_flags |= IOCB_HIPRI;
 		kiocb->ki_complete = io_complete_rw_iopoll;
+		req->result = 0;
 	} else {
 		if (kiocb->ki_flags & IOCB_HIPRI)
 			return -EINVAL;

-- 
Jens Axboe
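For readers outside the thread, the following is a minimal, stand-alone
user-space sketch of the submission-path logic being discussed, under the
assumption that it mirrors the flow described above: io_complete_rw_iopoll()
runs inline off bio_wouldblock_error() and records -EAGAIN in req->result,
and the caller checks that field after issuing the request, which is why the
hunk above clears req->result before submission. The struct and helper names
here are illustrative stand-ins, not the kernel's actual code.

/*
 * Illustrative sketch only; fake_req, fake_issue() and friends are
 * hypothetical stand-ins for io_kiocb, io_read(), etc.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_req {
	int result;		/* stands in for io_kiocb's result field */
};

/* Stands in for io_complete_rw_iopoll(): records the outcome. */
static void fake_complete_iopoll(struct fake_req *req, int res)
{
	req->result = res;	/* runs inline from the submission attempt */
}

/*
 * Stands in for issuing the read: a would-block condition is reported
 * via req->result (as with bio_wouldblock_error()), not via the return
 * value; a nonzero return would be a fatal submission error instead.
 */
static int fake_issue(struct fake_req *req, bool would_block)
{
	if (would_block)
		fake_complete_iopoll(req, -EAGAIN);
	return 0;
}

static void submit(struct fake_req *req, bool would_block)
{
	req->result = 0;	/* clear stale state, as in the hunk above */

	if (fake_issue(req, would_block))
		return;		/* fatal error: just end the request */

	if (req->result == -EAGAIN)
		printf("would-block: retry submission later\n");
	else
		printf("issued; any poller on the queue may reap it\n");
}

int main(void)
{
	struct fake_req req;

	submit(&req, true);	/* -EAGAIN path */
	submit(&req, false);	/* normal iopoll path */
	return 0;
}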