Re: [RFC 0/2] io_uring: examine request result only after completion

Bijan Mottahedeh <bijan.mottahedeh@xxxxxxxxxx> · Wed, 30 Oct 2019 07:02:47 -0700

OK, so I still don't quite see where the issue is. Your setup has more
than one CPU per poll queue, and I can reproduce the issue quite easily
here with a similar setup.

That's probably why I couldn't reproduce this in a VM. This time I set
up one poll queue in an 8-CPU VM and reproduced it.

Below are some things that are given:

1) If we fail to submit the IO, io_complete_rw_iopoll() is ultimately
    invoked _from_ the submission path. This means that the result is
    readily available by the time we go and check:

    if (req->result == -EAGAIN)

    in __io_submit_sqe().

Is that always true?

Let's say the operation was __io_submit_sqe()->io_read().

By "failing to submit the IO", do you mean that
io_read()->call_read_iter() returned success or failure? Are you saying
that req->result was set from kiocb_done() or later in the block layer?

Anyway, I assume that io_read() would have to return success, since
otherwise __io_submit_sqe() would return immediately and not check
req->result:

        if (ret)
                return ret;

So if io_read() did return success, are we guaranteed that setting
req->result = -EAGAIN always happens before the check?

Also, is it possible that we can be preempted in __io_submit_sqe() after 
the call to io_read() but before the -EAGAIN check?

    This is a submission time failure, not
    something that should be happening from a completion path after the
    IO has been submitted successfully.
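On the ordering and preemption questions above: one common way to make a "did it fail?" check safe against a concurrent completer is to read the result only after completion has been published. The C11 sketch below assumes a hypothetical `fake_req` with an atomic `completed` flag; it is not the io_uring code, and `fake_complete()`, `fake_result_if_done()`, `fake_poller()` and `fake_submit_and_wait()` are illustrative names only. The completer stores the result and then sets the flag with release ordering; the submitter loads the flag with acquire ordering before touching `result`. If completion happened synchronously in the submission path (point 1), the flag is already set when the check runs; if the submitter is preempted before the check, it simply sees "not done yet" instead of a stale value.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request; NOT the real struct io_kiocb. */
struct fake_req {
    int result;            /* written by whoever completes the request */
    atomic_bool completed; /* published with release after result is set */
};

/* Completion side: write the result first, then publish completion. */
static void fake_complete(struct fake_req *req, int res)
{
    /* A second completion of the same request would be a bug. */
    assert(!atomic_load_explicit(&req->completed, memory_order_relaxed));
    req->result = res;
    atomic_store_explicit(&req->completed, true, memory_order_release);
}

/*
 * Submission side: only look at req->result once the acquire load has
 * observed completion. Returns false while the request is in flight, in
 * which case the caller must not draw any conclusion from result.
 */
static bool fake_result_if_done(struct fake_req *req, int *out)
{
    if (!atomic_load_explicit(&req->completed, memory_order_acquire))
        return false;
    *out = req->result;
    return true;
}

/* Simulates another task's poller finding and completing our request. */
static void *fake_poller(void *arg)
{
    fake_complete(arg, -EAGAIN);
    return NULL;
}

/* Hand the request to a concurrent "poller" and wait for its result. */
static int fake_submit_and_wait(struct fake_req *req)
{
    pthread_t t;
    int res;

    req->result = 0;
    atomic_init(&req->completed, false);
    if (pthread_create(&t, NULL, fake_poller, req) != 0)
        return -1; /* thread creation failed in this sketch */
    /* Reading req->result here without the flag would race with fake_poller. */
    while (!fake_result_if_done(req, &res))
        ; /* spin; real code would block or poll instead */
    pthread_join(t, NULL);
    return res;
}
```

Usage: the completion path calls `fake_complete(req, res)`, and the submitter treats `-EAGAIN` as "not submitted" only when `fake_result_if_done()` returns true.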

2) If we succeed in submitting the request, given that we have other
    tasks polling, the request can complete any time. It can very well be
    complete by the time we call io_iopoll_req_issued(), and this is
    perfectly fine. We know the request isn't going away, as we're
    holding a reference to it. kiocb has the same lifetime, as it's
    embedded in the io_kiocb request. Note that this isn't the same
    io_ring_ctx at all, some other task with its own io_ring_ctx just
    happens to find our request when doing polling on the same queue
    itself.
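Point 2 relies on the submitter's reference keeping the request, and the kiocb embedded in it, alive even if another task's poller completes it first. A minimal sketch of that lifetime rule, assuming hypothetical `fake_io_kiocb`/`fake_kiocb` types and a two-reference scheme (one for the submitter, one for the completion path); the real io_uring refcounting may differ. The completion callback recovers the outer request from the embedded kiocb with a `container_of`-style macro, and memory is freed only when the last reference is dropped. The reference only guarantees that the memory stays valid; it does not by itself make `result` safe to read concurrently, which still needs completion ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real struct kiocb / struct io_kiocb differ. */
struct fake_kiocb {
    int ki_flags;
};

struct fake_io_kiocb {
    struct fake_kiocb rw; /* embedded: same lifetime as the request */
    atomic_int refs;
    int result;
};

#define fake_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static int fake_frees; /* counts how often a request was actually freed */

static struct fake_io_kiocb *fake_req_alloc(void)
{
    struct fake_io_kiocb *req = calloc(1, sizeof(*req));

    if (req)
        atomic_init(&req->refs, 2); /* submitter ref + completion ref */
    return req;
}

static void fake_req_put(struct fake_io_kiocb *req)
{
    int old = atomic_fetch_sub_explicit(&req->refs, 1, memory_order_acq_rel);

    assert(old > 0); /* catches a put with no reference held */
    if (old == 1) {  /* last reference: free request and embedded kiocb */
        fake_frees++;
        free(req);
    }
}

/* Completion callback, run by whichever task's poller finds the IO. */
static void fake_complete_rw(struct fake_kiocb *kiocb, int res)
{
    struct fake_io_kiocb *req =
        fake_container_of(kiocb, struct fake_io_kiocb, rw);

    req->result = res;
    fake_req_put(req); /* drop the completion reference only */
}
```

Usage: the completion path runs `fake_complete_rw(&req->rw, res)` and the submitter calls `fake_req_put(req)` when it is done; whichever call comes second frees the request, so an early completion cannot pull the memory out from under the submitter.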

Ah yes, it's a different io_ring_ctx, different poll list, etc. For my
own clarity, I assume all contexts are mapping the same actual sq/cq rings?

We would definitely get in trouble if we submitted the request
successfully, but returned -EAGAIN because we thought we didn't.

In my testing, what I seem to see is double completions on the block
layer side, and double issues. I can't quite get that to match up with
anything...
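The failure mode described above (a request that was submitted successfully but treated as -EAGAIN) can be shown with a deliberately simplified, single-threaded toy model. Everything here is hypothetical (`toy_req`, `toy_device_issue()`, `toy_device_drain()`, `toy_submit()`), and it is not a diagnosis of the actual bug: it only shows that if the submitter decides "not submitted" from a stale `result == -EAGAIN` instead of from what the issue path actually reported, the same request is issued twice, and each issue later completes, which would look like both double issues and double completions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model of one request; not io_uring code. */
struct toy_req {
    int result;      /* may hold a stale value from an earlier attempt */
    int issues;      /* times the request reached the device */
    int completions; /* times the device completed it */
};

/* Device side: the toy device always accepts the request. */
static bool toy_device_issue(struct toy_req *req)
{
    req->issues++;
    return true;
}

/* Every accepted issue eventually produces exactly one completion. */
static void toy_device_drain(struct toy_req *req)
{
    assert(req->issues >= 1); /* draining with nothing issued is a caller bug */
    req->completions = req->issues;
    req->result = 0;
}

/*
 * Submit once and retry only if the submitter believes the issue failed.
 * With use_stale_result set, that belief comes from req->result, which
 * nothing wrote for this attempt, so a leftover -EAGAIN triggers a retry
 * of a request the device already accepted.
 */
static void toy_submit(struct toy_req *req, bool use_stale_result)
{
    bool accepted = toy_device_issue(req);
    bool looks_failed = use_stale_result ? (req->result == -EAGAIN)
                                         : !accepted;

    if (looks_failed)
        toy_device_issue(req); /* second issue of the same request */
}
```

With a leftover `-EAGAIN`, `toy_submit(&req, true)` issues the request twice and draining yields two completions; `toy_submit(&req, false)` issues and completes it once.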

I'll keep digging, hopefully I'll get some deeper understanding of what
exactly the issue is shortly. I was hoping I'd get that by writing my
thoughts in this email, but alas that didn't happen yet.