On 10/30/19 8:18 AM, Jens Axboe wrote:
> On 10/30/19 8:02 AM, Bijan Mottahedeh wrote:
>>> OK, so I still don't quite see where the issue is. Your setup has more
>>> than one CPU per poll queue, and I can reproduce the issue quite easily
>>> here with a similar setup.
>>
>> That's probably why I couldn't reproduce this in a vm. This time I set
>> up one poll queue in an 8 CPU vm and reproduced it.
>
> You definitely do need poll queue sharing to hit this.
>
>>> Below are some things that are given:
>>>
>>> 1) If we fail to submit the IO, io_complete_rw_iopoll() is ultimately
>>>    invoked _from_ the submission path. This means that the result is
>>>    readily available by the time we go and check:
>>>
>>>        if (req->result == -EAGAIN)
>>>
>>>    in __io_submit_sqe().
>>
>> Is that always true?
>>
>> Let's say the operation was __io_submit_sqe()->io_read().
>>
>> By "failing to submit the io", do you mean that
>> io_read()->call_read_iter() returned success or failure? Are you saying
>> that req->result was set from kiocb_done() or later in the block layer?
>
> By "failed to submit" I mean that req->result == -EAGAIN. We set that in
> io_complete_rw_iopoll(), which is called off bio_wouldblock_error() in
> the block layer. This is different from a ret != 0 return, which would
> have been some sort of other failure we encountered, failing to submit
> the request. For that error, we just end the request. For the
> bio_wouldblock_error(), we need to retry submission.
>
>> Anyway I assume that io_read() would have to return success since
>> otherwise __io_submit_sqe() would immediately return and not check
>> req->result:
>>
>>        if (ret)
>>                return ret;
>
> Right, if ret != 0, then we have a fatal error for that request.
> ->result will not have been set at all for that case.
>
>> So if io_read() did return success, are we guaranteed that setting
>> req->result = -EAGAIN would always happen before the check?
>
> Always, since it happens inline from when you attempt to issue the read.
>
>> Also, is it possible that we can be preempted in __io_submit_sqe() after
>> the call to io_read() but before the -EAGAIN check?
>
> Sure, we're not disabling preemption. But that shouldn't have any
> bearing on this case.
>>
>>> This is a submission time failure, not
>>>    something that should be happening from a completion path after the
>>>    IO has been submitted successfully.
>>>
>>> 2) If we succeed in submitting the request, given that we have other
>>>    tasks polling, the request can complete any time. It can very well be
>>>    complete by the time we call io_iopoll_req_issued(), and this is
>>>    perfectly fine. We know the request isn't going away, as we're
>>>    holding a reference to it. kiocb has the same lifetime, as it's
>>>    embedded in the io_kiocb request. Note that this isn't the same
>>>    io_ring_ctx at all, some other task with its own io_ring_ctx just
>>>    happens to find our request when doing polling on the same queue
>>>    itself.
>>
>> Ah yes, it's a different io_ring_ctx, different poll list etc. For my
>
> Exactly
>
>> own clarity, I assume all contexts are mapping the same actual sq/cq
>> rings?
>
> The ring/context isn't mapped to a particular ring, a specific IO is
> mapped to a specific queue at the time it's being submitted. That
> depends on the IO type and where that task happens to be running at the
> time the IO is being submitted.

This might just do it, except it looks like there's a bug in sbitmap
where we don't always flush deferred clears. I'll look into that now.
diff --git a/fs/io_uring.c b/fs/io_uring.c
index af1937d66aee..f3ca2ee44dbd 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1212,6 +1212,7 @@ static int io_prep_rw(struct io_kiocb *req, const struct sqe_submit *s,
 
 		kiocb->ki_flags |= IOCB_HIPRI;
 		kiocb->ki_complete = io_complete_rw_iopoll;
+		req->result = 0;
 	} else {
 		if (kiocb->ki_flags & IOCB_HIPRI)
 			return -EINVAL;

-- 
Jens Axboe
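For readers outside the thread, the following is a minimal, stand-alone
user-space sketch of the submission-path logic being discussed, under the
assumption that it mirrors the flow described above: io_complete_rw_iopoll()
runs inline off bio_wouldblock_error() and records -EAGAIN in req->result,
and the caller checks that field after issuing the request, which is why the
hunk above clears req->result before submission. The struct and helper names
here are illustrative stand-ins, not the kernel's actual code.

/*
 * Illustrative sketch only; fake_req, fake_issue() and friends are
 * hypothetical stand-ins for io_kiocb, io_read(), etc.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_req {
	int result;		/* stands in for io_kiocb's result field */
};

/* Stands in for io_complete_rw_iopoll(): records the outcome. */
static void fake_complete_iopoll(struct fake_req *req, int res)
{
	req->result = res;	/* runs inline from the submission attempt */
}

/*
 * Stands in for issuing the read: a would-block condition is reported
 * via req->result (as with bio_wouldblock_error()), not via the return
 * value; a nonzero return would be a fatal submission error instead.
 */
static int fake_issue(struct fake_req *req, bool would_block)
{
	if (would_block)
		fake_complete_iopoll(req, -EAGAIN);
	return 0;
}

static void submit(struct fake_req *req, bool would_block)
{
	req->result = 0;	/* clear stale state, as in the hunk above */

	if (fake_issue(req, would_block))
		return;		/* fatal error: just end the request */

	if (req->result == -EAGAIN)
		printf("would-block: retry submission later\n");
	else
		printf("issued; any poller on the queue may reap it\n");
}

int main(void)
{
	struct fake_req req;

	submit(&req, true);	/* -EAGAIN path */
	submit(&req, false);	/* normal iopoll path */
	return 0;
}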