On 10/24/19 3:18 AM, Bijan Mottahedeh wrote:
> Running an fio test consistently crashes the kernel with the trace included
> below.  The root cause seems to be the code in __io_submit_sqe() that
> checks the result of a request for -EAGAIN in polled mode, without
> ensuring first that the request has completed:
>
> 	if (ctx->flags & IORING_SETUP_IOPOLL) {
> 		if (req->result == -EAGAIN)
> 			return -EAGAIN;

I'm a little confused, because we should be holding the submission
reference to the request still at this point. So how is it going away?
I must be missing something...

> The request will be immediately resubmitted in io_sq_wq_submit_work(),
> potentially before the first submission has completed.  This creates
> a race where the original completion may set REQ_F_IOPOLL_COMPLETED in
> a freed submission request, overwriting the poisoned bits, causing the
> panic below.
>
> 	do {
> 		ret = __io_submit_sqe(ctx, req, s, false);
> 		/*
> 		 * We can get EAGAIN for polled IO even though
> 		 * we're forcing a sync submission from here,
> 		 * since we can't wait for request slots on the
> 		 * block side.
> 		 */
> 		if (ret != -EAGAIN)
> 			break;
> 		cond_resched();
> 	} while (1);
>
> The suggested fix is to move a submitted request to the poll list
> unconditionally in polled mode.  The request can then be retried if
> necessary once the original submission has indeed completed.
>
> This bug raises an issue however since REQ_F_IOPOLL_COMPLETED is set
> in io_complete_rw_iopoll() from interrupt context.  NVMe polled queues
> however are not supposed to generate interrupts so it is not clear what
> is the reason for this apparent inconsistency.

It's because you're not running with poll queues for NVMe, hence you're
throwing a lot of performance away. Load nvme with poll_queues=X (or boot
with nvme.poll_queues=X, if built in) to have a set of separate queues
for polling. These don't have IRQs enabled, and it'll work much faster
for you.

-- 
Jens Axboe
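
For reference, the poll_queues setting mentioned above could be applied roughly as follows. This is only a sketch: the count of 4 is an arbitrary placeholder for X, and the reload assumes nvme is built as a module and that nothing in use (such as the root filesystem) sits on an NVMe device.

```shell
# Reload the nvme driver with dedicated poll queues.
# "4" is an assumed example value, not a recommendation.
modprobe -r nvme
modprobe nvme poll_queues=4

# Check the value the loaded driver reports.
cat /sys/module/nvme/parameters/poll_queues

# If nvme is built into the kernel, pass the equivalent on the
# kernel command line instead:
#   nvme.poll_queues=4
```
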