Re: [RFC PATCH] io_uring: reissue in case -EAGAIN is returned after io issue returns

Ming Lei <ming.lei@xxxxxxxxxx> · Wed, 6 Apr 2022 21:21:56 +0800

On Wed, Apr 06, 2022 at 06:58:28AM -0600, Jens Axboe wrote:
> On 4/5/22 9:57 PM, Ming Lei wrote:
> > On Tue, Apr 05, 2022 at 08:20:24PM -0600, Jens Axboe wrote:
> >> On 4/3/22 5:45 AM, Ming Lei wrote:
> >>> -EAGAIN still may return after io issue returns, and REQ_F_REISSUE is
> >>> set in io_complete_rw_iopoll(), but the req never gets chance to be handled.
> >>> io_iopoll_check doesn't handle this situation, and io hang can be caused.
> >>>
> >>> Current dm io polling may return -EAGAIN after bio submission is
> >>> returned, also blk-throttle might trigger this situation too.
> >>
> >> I don't think this is necessarily safe. Handling REQ_F_REISSUE from within
> >> the issue path is fine, as the request hasn't been submitted yet and
> >> hence we know that passed in structs are still stable. Once you hit it
> >> when polling for it, the io_uring_enter() call to submit requests has
> >> potentially already returned, and now we're in a second call where we
> >> are polling for requests. If we're doing eg an IORING_OP_READV, the
> >> original iovec may no longer be valid and we cannot safely re-import
> >> data associated with it.
> > 
> > Yeah, this reissue is really not safe, thanks for the input.
> > 
> > I guess the only way is to complete the cqe for this situation.
> 
> At least if
> 
> io_op_defs[req->opcode].needs_async_setup
> 
> is true it isn't safe. But can't dm appropriately retry rather than
> bubble up the -EAGAIN off ->iopoll?

The thing is that DM is not the only one with this issue.

NVMe multipath carries the same risk, and blk-throttle/blk-cgroup may run
into this situation too.

The issue may be triggered in any situation where submit_bio() ends up
submitting the bio asynchronously.
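
To make the constraint discussed above concrete, here is a rough userspace
model of the decision, for illustration only. It is not kernel code: every
name prefixed with model_ is hypothetical, and only the needs_async_setup
flag name comes from this thread. It assumes a late -EAGAIN (seen from the
poll path, after submission has returned) is completed as a CQE whenever the
opcode needs async setup, and is only considered for reissue otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's real structures. */
enum model_action {
	MODEL_REISSUE,       /* retry the request */
	MODEL_COMPLETE_CQE,  /* post the result to userspace as-is */
};

struct model_op_def {
	/* Mirrors the flag named in the thread: the op imports user data
	 * (e.g. an iovec) that may be gone once submission has returned. */
	bool needs_async_setup;
};

struct model_req {
	const struct model_op_def *def;
};

/*
 * Decide what to do with a result that shows up from the poll path,
 * i.e. after the submitting call has potentially already returned.
 */
static enum model_action
model_handle_late_result(const struct model_req *req, int res)
{
	assert(req != NULL && req->def != NULL);

	/* Anything other than -EAGAIN is simply completed. */
	if (res != -EAGAIN)
		return MODEL_COMPLETE_CQE;

	/* Re-importing user data is not safe here, so give up on the
	 * retry and let userspace see -EAGAIN in the CQE. */
	if (req->def->needs_async_setup)
		return MODEL_COMPLETE_CQE;

	/* Assumption: with nothing to re-import, a reissue is possible. */
	return MODEL_REISSUE;
}
```

With this model, a readv-like op (needs_async_setup set) that gets a late
-EAGAIN is completed with the error, while an op without that flag is a
candidate for reissue.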

Thanks,
Ming