On Thu, Apr 20, 2023 at 10:00:07AM -0700, Bart Van Assche wrote:
> I'm fine with not inserting requeued requests at the head of the queue.
> Inserting requeued requests at the head of the queue only preserves the
> original submission order if a single request is requeued.

Yes.

> If multiple
> requests are requeued inserting at the head of the queue will cause
> inversion of the order of the requeued requests.
>
> This implies that the I/O scheduler or disk controller (if no I/O scheduler
> is configured) will become responsible for optimizing the request order if
> requeuing happens.

I think we need to understand why these requeues even happen.
Unfortunately blk_mq_requeue_request has quite a few callers, so they'll
need a bit of an audit.

Quite a few are about resource constraints in the hardware and/or
driver.  In this case I suspect it is essential that they are
prioritized over incoming new commands in the way I suggested before.

A different case is nvme_retry_req with the CRD field set, in which
case we want to wait some time before retrying this particular command,
so having new commands bypass it makes sense.

Another one is the SCSI ALUA transition delay, in which case we want
to wait before sending commands to the LU again.  In this case we
really don't want to resend any commands until the delayed kick of the
requeue list.

So I'm really not sure what the right thing to do is here.  But I'm
pretty sure just skipping head inserts for zoned locked writes is even
more wrong than what we do right now.  And I also don't see what it
would be useful for.  All zoned writes should either be locked by
higher layers, or even better just use zone append and get a new
location assigned when requeuing, as discussed before.
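
For illustration, the ordering effect described in the quoted text can be
reproduced with a toy model.  This is a plain Python sketch, not kernel
code: the deque merely stands in for a dispatch list, and
requeue_at_head/requeue_at_tail and the request names are hypothetical
helpers invented for this example, not blk-mq functions.

```python
from collections import deque


def requeue_at_head(queue, requeued):
    """Toy model: put each requeued request at the head, one at a time."""
    for req in requeued:
        queue.appendleft(req)
    return queue


def requeue_at_tail(queue, requeued):
    """Toy model: put each requeued request at the tail, one at a time."""
    for req in requeued:
        queue.append(req)
    return queue


# A single requeued request stays ahead of newly submitted requests.
single = requeue_at_head(deque(["new1", "new2"]), ["w1"])

# Several requeued requests end up in reverse order relative to each other.
multi = requeue_at_head(deque(["new1", "new2"]), ["w1", "w2", "w3"])

# Tail insertion keeps their relative order, but behind the new requests.
tail = requeue_at_tail(deque(["new1", "new2"]), ["w1", "w2", "w3"])

print(list(single))
print(list(multi))
print(list(tail))
```

In this model a single head insert keeps w1 first, three head inserts
leave the queue starting with w3, w2, w1, and tail inserts keep w1, w2,
w3 in order but behind new1 and new2.  It says nothing about when each
requeue reason should be prioritized, which is the open question above.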