On Wed, Apr 06 2022 at 9:21P -0400, Ming Lei <ming.lei@xxxxxxxxxx> wrote:

> On Wed, Apr 06, 2022 at 06:58:28AM -0600, Jens Axboe wrote:
> > On 4/5/22 9:57 PM, Ming Lei wrote:
> > > On Tue, Apr 05, 2022 at 08:20:24PM -0600, Jens Axboe wrote:
> > >> On 4/3/22 5:45 AM, Ming Lei wrote:
> > >>> -EAGAIN still may return after io issue returns, and REQ_F_REISSUE is
> > >>> set in io_complete_rw_iopoll(), but the req never gets chance to be
> > >>> handled. io_iopoll_check doesn't handle this situation, and io hang can
> > >>> be caused.
> > >>>
> > >>> Current dm io polling may return -EAGAIN after bio submission is
> > >>> returned, also blk-throttle might trigger this situation too.
> > >>
> > >> I don't think this is necessarily safe. Handling REQ_F_ISSUE from within
> > >> the issue path is fine, as the request hasn't been submitted yet and
> > >> hence we know that passed in structs are still stable. Once you hit it
> > >> when polling for it, the io_uring_enter() call to submit requests has
> > >> potentially already returned, and now we're in a second call where we
> > >> are polling for requests. If we're doing eg an IORING_OP_READV, the
> > >> original iovec may no longer be valid and we cannot safely re-import
> > >> data associated with it.
> > >
> > > Yeah, this reissue is really not safe, thanks for the input.
> > >
> > > I guess the only way is to complete the cqe for this situation.
> >
> > At least if
> >
> >   io_op_defs[req->opcode].needs_async_setup
> >
> > is true it isn't safe. But can't dm appropriately retry rather than
> > bubble up the -EAGAIN off ->iopoll?

The -EAGAIN is happening at submission, but this is bio-based so it is
felt past the point of ->submit_bio return. The underlying request-based
driver has run out of tags and so the bio is errored by block core. So
it isn't felt until bio completion time.

In the case of DM, that stack trace looks like:

[168195.924803] RIP: 0010:dm_io_complete+0x1e0/0x1f0 [dm_mod]
<snip>
[168196.029002] Call Trace:
[168196.031543]  <TASK>
[168196.033737]  dm_poll_bio+0xd7/0x170 [dm_mod]
[168196.038106]  bio_poll+0xe3/0x110
[168196.041435]  iocb_bio_iopoll+0x34/0x50
[168196.045278]  io_do_iopoll+0xfb/0x400
[168196.048947]  io_iopoll_check+0x5d/0x140
[168196.052873]  __do_sys_io_uring_enter+0x3d9/0x440
[168196.057578]  do_syscall_64+0x3a/0x80
[168196.061246]  entry_SYSCALL_64_after_hwframe+0x44/0xae

But prior to that, the DM clone bio's ->bi_end_io (clone_endio) was
called with BLK_STS_AGAIN -- it's just that, by design, dm's ->poll_bio
is what triggers the final dm_io_complete() of the original polled bio
(in terms of the polling process's context).

> The thing is that not only DM has such issue.
>
> NVMe multipath has the risk, and blk-throttle/blk-cgroup may run into such
> situation too.
>
> Any situation in which submit_bio() runs into async bio submission, the
> issue may be triggered.

bio-based is always async by virtue of bios getting packed into a
request after ->submit_bio returns. But to do so an available tag is
needed.

Mike
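
For reference, here is a minimal userspace sketch (not taken from this
thread) of the kind of IORING_SETUP_IOPOLL submission under discussion:
an O_DIRECT readv against a bio-based (dm) device, followed by a
completion poll that drives io_iopoll_check() -> bio_poll() ->
dm_poll_bio() in the kernel. The device path (/dev/dm-0), buffer size
and offset are illustrative assumptions, not details from this report.

/* build: gcc -o iopoll-repro iopoll-repro.c -luring */
#define _GNU_SOURCE		/* for O_DIRECT */
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct iovec iov;
	void *buf;
	int fd, ret;

	/* IOPOLL requires O_DIRECT; /dev/dm-0 is a hypothetical dm target */
	fd = open("/dev/dm-0", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT wants an aligned buffer */
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	iov.iov_base = buf;
	iov.iov_len = 4096;

	ret = io_uring_queue_init(8, &ring, IORING_SETUP_IOPOLL);
	if (ret) {
		fprintf(stderr, "queue_init: %s\n", strerror(-ret));
		return 1;
	}

	/* submit one IORING_OP_READV, as in the scenario Jens describes */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_readv(sqe, fd, &iov, 1, 0);
	io_uring_submit(&ring);

	/*
	 * This enters the kernel poll loop (io_iopoll_check). In the failure
	 * case described above, the bio completes with BLK_STS_AGAIN and the
	 * CQE res is -EAGAIN, or the poll never finds a completion and hangs.
	 */
	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!ret) {
		printf("cqe res = %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	close(fd);
	return 0;
}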