OK, so I still don't quite see where the issue is. Your setup has more
than one CPU per poll queue, and I can reproduce the issue quite easily
here with a similar setup.
That's probably why I couldn't reproduce this in a vm. This time I set
up one poll queue in a 8 cpu vm and reproduced it.
Below are some things that are given:
1) If we fail to submit the IO, io_complete_rw_iopoll() is ultimately
invoked _from_ the submission path. This means that the result is
readily available by the time we go and check:
if (req->result == -EAGAIN)
in __io_submit_sqe().
Is that always true?
Let's say the operation was __io_submit_sqe()->io_read()
By "failing to submit the io", do you mean that
io_read()->call_read_iter() returned success or failure? Are you saying
that req->result was set from kiocb_done() or later in the block layer?
Anyway I assume that io_read() would have to return success since
otherwise __io_submit_sqe() would immediately return and not check
req->result:
if (ret)
return ret;
So if io_read() did return success, are we guaranteed that setting
req->result = -EAGAIN would always happen before the check?
Also, is it possible that we can be preempted in __io_submit_sqe() after
the call to io_read() but before the -EAGAIN check?
This is a submission time failure, not
something that should be happening from a completion path after the
IO has been submitted successfully.
2) If the succeed in submitting the request, given that we have other
tasks polling, the request can complete any time. It can very well be
complete by the time we call io_iopoll_req_issued(), and this is
perfectly fine. We know the request isn't going away, as we're
holding a reference to it. kiocb has the same lifetime, as it's
embedded in the io_kiocb request. Note that this isn't the same
io_ring_ctx at all, some other task with its own io_ring_ctx just
happens to find our request when doing polling on the same queue
itself.
Ah yes, it's a different io_ring_ctx, different poll list etc. For my
own clarity, I assume all contexts are mapping the same actual sq/cq rings?
We would definitely get in trouble if we submitted the request
successfully, but returned -EAGAIN because we thought we didn't.
In my testing, what I seem to see is double completions on the block
layer side, and double issues. I can't quite get that to match up with
anything...
I'll keep digging, hopefully I'll get some deeper understanding of what
exactly the issue is shortly. I was hoping I'd get that by writing my
thoughts in this email, but alas that didn't happen yet.