Re: [RFC 0/2] io_uring: examine request result only after completion

Jens Axboe <axboe@xxxxxxxxx> · Thu, 24 Oct 2019 11:09:25 -0600

On 10/24/19 3:18 AM, Bijan Mottahedeh wrote:
> Running an fio test consistently crashes the kernel with the trace included
> below.  The root cause seems to be the code in __io_submit_sqe() that
> checks the result of a request for -EAGAIN in polled mode, without
> ensuring first that the request has completed:
> 
> 	if (ctx->flags & IORING_SETUP_IOPOLL) {
> 		if (req->result == -EAGAIN)
> 			return -EAGAIN;

I'm a little confused, because we should be holding the submission
reference to the request still at this point. So how is it going away?
I must be missing something...
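The model below is a simplified userspace version of the lifetime rule Jens is describing. It assumes a scheme where the submitter and the completion path each hold one reference on the request. Under that scheme, a completion that runs early still writes into live memory, and the request is freed only when the last reference is dropped. `struct model_req`, `model_req_alloc()`, `model_req_put()` and `model_complete()` are hypothetical names. This is not io_uring's actual code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a request; not the real struct io_kiocb. */
struct model_req {
	atomic_int refs;	/* one submission ref + one completion ref */
	int result;
	bool completed;
};

/* Number of requests freed so far, so the lifetime can be observed. */
static int model_frees;

static struct model_req *model_req_alloc(void)
{
	struct model_req *req = calloc(1, sizeof(*req));

	if (req)
		atomic_init(&req->refs, 2);	/* submitter + completion path */
	return req;
}

static void model_req_put(struct model_req *req)
{
	int old = atomic_fetch_sub(&req->refs, 1);

	assert(old > 0);	/* the refcount must never underflow */
	if (old == 1) {		/* last reference dropped: safe to free */
		free(req);
		model_frees++;
	}
}

/* Completion side: record the result, then drop its own reference. */
static void model_complete(struct model_req *req, int res)
{
	req->result = res;
	req->completed = true;
	model_req_put(req);
}
```

Suppose the submitter's reference is effectively gone before the completion runs, for example because the request is recycled and reissued while still in flight. In that case the writes in `model_complete()` would land in freed memory. That is the kind of use-after-free the report describes.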

> The request will be immediately resubmitted in io_sq_wq_submit_work(),
> potentially before the first submission has completed.  This creates
> a race where the original completion may set REQ_F_IOPOLL_COMPLETED in
> a freed submission request, overwriting the poison bytes and causing the
> panic below.
> 
> 	do {
> 		ret = __io_submit_sqe(ctx, req, s, false);
> 		/*
> 		 * We can get EAGAIN for polled IO even though
> 		 * we're forcing a sync submission from here,
> 		 * since we can't wait for request slots on the
> 		 * block side.
> 		 */
> 		if (ret != -EAGAIN)
> 			break;
> 		cond_resched();
> 	} while (1);
> 
> The suggested fix is to move a submitted request to the poll list
> unconditionally in polled mode.  The request can then be retried if
> necessary once the original submission has indeed completed.
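The suggested ordering can be sketched in userspace as follows, assuming a singly linked poll list. Submission queues the request without ever reading `->result`. Only the reaper inspects the result, and only after it has seen `completed`; at that point it decides whether to reissue on -EAGAIN. All of the following are hypothetical stand-ins, not io_uring's real poll-list code: `struct poll_req`, `model_submit()`, `model_device_poll()`, `model_reap()`, and the fake device that rejects the first `busy_attempts` issues.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_EAGAIN 11	/* stand-in for EAGAIN in this userspace model */

/* Hypothetical request and poll list; not io_uring's real types. */
struct poll_req {
	int result;
	bool completed;
	int attempts;
	struct poll_req *next;
};

struct poll_list {
	struct poll_req *head;
};

/* The model device fails the first `busy_attempts` issues with -EAGAIN. */
static int busy_attempts = 1;

/* Submission: issue and queue unconditionally; never peek at ->result. */
static void model_submit(struct poll_list *pl, struct poll_req *req)
{
	req->completed = false;
	req->attempts++;
	req->next = pl->head;
	pl->head = req;
}

/* Stand-in for the hardware finishing every in-flight request. */
static void model_device_poll(struct poll_list *pl)
{
	for (struct poll_req *r = pl->head; r; r = r->next) {
		if (!r->completed) {
			r->result = r->attempts <= busy_attempts ? -MODEL_EAGAIN : 0;
			r->completed = true;
		}
	}
}

/* Reaper: inspect ->result only for completed requests; retry -EAGAIN. */
static int model_reap(struct poll_list *pl)
{
	struct poll_req **pp = &pl->head;
	struct poll_req *retry = NULL;
	int done = 0;

	while (*pp) {
		struct poll_req *r = *pp;

		if (!r->completed) {	/* still in flight: leave it alone */
			pp = &r->next;
			continue;
		}
		*pp = r->next;		/* unlink the completed request */
		if (r->result == -MODEL_EAGAIN) {
			r->next = retry;	/* safe to reissue: it has finished */
			retry = r;
		} else {
			done++;
		}
	}
	while (retry) {			/* reissue only after the walk is over */
		struct poll_req *r = retry;

		retry = r->next;
		model_submit(pl, r);
	}
	return done;
}
```

Deferring the retry to the reaper costs a little latency. In exchange, a request is never reissued while its previous issue is still in flight.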
>
> This bug also raises a question, since REQ_F_IOPOLL_COMPLETED is set
> in io_complete_rw_iopoll() from interrupt context.  NVMe polled queues,
> however, are not supposed to generate interrupts, so the reason for this
> apparent inconsistency is not clear.

It's because you're not running with poll queues for NVMe, hence you're
throwing a lot of performance away. Load nvme with poll_queues=X (or boot
with nvme.poll_queues=X, if built in) to have a set of separate queues
for polling. These don't have IRQs enabled, and it'll work much faster
for you.
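A small helper can read the module parameter back to check how a running system is configured. This assumes the nvme driver exposes `poll_queues` under /sys/module/nvme/parameters/. Both that path and the helper `read_poll_queues()` are assumptions for illustration. A value of 0 would mean there are no dedicated poll queues. In that case IOPOLL requests are served by IRQ-driven queues, which would explain the interrupt-context completions seen in the report.

```c
#include <stdio.h>

/* Assumed sysfs location of the nvme poll_queues module parameter. */
#define NVME_POLL_QUEUES_PATH "/sys/module/nvme/parameters/poll_queues"

/*
 * Return the poll_queues value read from `path`, or -1 if the file is
 * missing or unparsable (e.g. nvme not loaded, or no such parameter).
 */
static int read_poll_queues(const char *path)
{
	FILE *f = fopen(path, "r");
	unsigned int n;
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%u", &n) == 1;
	fclose(f);
	return ok ? (int)n : -1;
}
```

A caller would use `read_poll_queues(NVME_POLL_QUEUES_PATH)`. This reports the configured parameter, which may not match the number of poll queues the driver actually set up.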

-- 
Jens Axboe