On 10/25/19 8:18 AM, Jens Axboe wrote: > On 10/25/19 8:07 AM, Jens Axboe wrote: >> On 10/25/19 7:46 AM, Bijan Mottahedeh wrote: >>> >>> On 10/24/19 3:31 PM, Jens Axboe wrote: >>>> On 10/24/19 1:18 PM, Bijan Mottahedeh wrote: >>>>> On 10/24/19 10:09 AM, Jens Axboe wrote: >>>>>> On 10/24/19 3:18 AM, Bijan Mottahedeh wrote: >>>>>>> Running an fio test consistenly crashes the kernel with the trace included >>>>>>> below. The root cause seems to be the code in __io_submit_sqe() that >>>>>>> checks the result of a request for -EAGAIN in polled mode, without >>>>>>> ensuring first that the request has completed: >>>>>>> >>>>>>> if (ctx->flags & IORING_SETUP_IOPOLL) { >>>>>>> if (req->result == -EAGAIN) >>>>>>> return -EAGAIN; >>>>>> I'm a little confused, because we should be holding the submission >>>>>> reference to the request still at this point. So how is it going away? >>>>>> I must be missing something... >>>>> I don't think the submission reference is going away... >>>>> >>>>> I *think* the problem has to do with the fact that >>>>> io_complete_rw_iopoll() which sets REQ_F_IOPOLL_COMPLETED is being >>>>> called from interrupt context in my configuration and so there is a >>>>> potential race between updating the request there and checking it in >>>>> __io_submit_sqe(). >>>>> >>>>> My first workaround was to simply poll for REQ_F_IOPOLL_COMPLETED in the >>>>> code snippet above: >>>>> >>>>> if (req->result == --EAGAIN) { >>>>> >>>>> poll for REQ_F_IOPOLL_COMPLETED >>>>> >>>>> return -EAGAIN; >>>>> >>>>> } >>>>> >>>>> and that got rid of the problem. >>>> But that will not work at all for a proper poll setup, where you don't >>>> trigger any IRQs... It only happens to work for this case because you're >>>> still triggering interrupts. But even in that case, it's not a real >>>> solution, but I don't think that's the argument here ;-) >>> >>> Sure. >>> >>> I'm just curious though as how it would break the poll case because >>> io_complete_rw_iopoll() would still be called though through polling, >>> REQ_F_IOPOLL_COMPLETED would be set, and so io_iopoll_complete() >>> should be able to reliably check req->result. >> >> It'd break the poll case because the task doing the submission is >> generally also the one that finds and reaps completion. Hence if you >> block that task just polling on that completion bit, you are preventing >> that very task from going and reaping completions. The condition would >> never become true, and you are now looping forever. >> >>> The same poll test seemed to run ok with nvme interrupts not being >>> triggered. Anyway, no argument that it's not needed! >> >> A few reasons why it would make progress: >> >> - You eventually trigger a timeout on the nvme side, as blk-mq finds the >> request hasn't been completed by an IRQ. But that's a 30 second ordeal >> before that event occurs. >> >> - There was still interrupts enabled. >> >> - You have two threads, one doing submission and one doing completions. >> Maybe using SQPOLL? If that's the case, then yes, it'd still work as >> you have separate threads for submission and completion. >> >> For the "generic" case of just using one thread and IRQs disabled, it'd >> deadlock. >> >>>> I see what the race is now, it's specific to IRQ driven polling. We >>>> really should just disallow that, to be honest, it doesn't make any >>>> sense. But let me think about if we can do a reasonable solution to this >>>> that doesn't involve adding overhead for a proper setup. >>> >>> It's a nonsensical config in a way and so disallowing it would make >>> the most sense. >> >> Definitely. The nvme driver should not set .poll() if it doesn't have >> non-irq poll queues. Something like this: > > Actually, we already disable polling if we don't have specific poll > queues: > > if (set->nr_maps > HCTX_TYPE_POLL && > set->map[HCTX_TYPE_POLL].nr_queues) > blk_queue_flag_set(QUEUE_FLAG_POLL, q); > > Did you see any timeouts in your tests? I wonder if the use-after-free > triggered when the timeout found the request while you had the busy-spin > logic we discussed previously. Ah, but we still have fops->iopoll() set for that case. So we just won't poll for it, it'll get completed by IRQ. So I do think we need to handle this case in io_uring. I'll get back to you. -- Jens Axboe