On Thu, May 25, 2023 at 08:06:31AM +0900, Damien Le Moal wrote: > On 5/25/23 02:56, Bart Van Assche wrote: > > On 5/23/23 17:31, Ming Lei wrote: > >> On Tue, May 23, 2023 at 10:19:34AM -0700, Bart Van Assche wrote: > >>> The mq-deadline scheduler restricts the queue depth to one per zone for zoned > >>> storage so at any time there is at most one write command (REQ_OP_WRITE) in > >>> flight per zone. > >> > >> But if the write queue depth is 1 per zone, the requeued request won't > >> be re-ordered at all given no other write request can be issued from > >> scheduler in this zone before this requeued request is completed. > >> > >> So why bother to requeue the BLK_STS_RESOURCE request via scheduler? > > > > Hi Ming, > > > > It seems like my previous email was not clear enough. The mq-deadline > > scheduler restricts the queue depth per zone for commands passed to the > > SCSI core. It does not restrict how many requests a filesystem can > > submit per zone to the block layer. Without this patch there is a risk > > of reordering if a request is requeued, e.g. by the SCSI core, and other > > requests are pending for the same zone. > > Yes there is, but the contract we established for zoned devices in the block > layer, from the start of the support, is that users *must* write sequentially. > The block layer does not attempt, generally speaking, to reorder requests. > > When mq-deadline is used, the scheduler lba reordering *may* reorder writes, > thus hiding potential bugs in the user write sequence for a zone. That is fine. > However, once a write request is dispatched, we should keep the assumption that > it is a well formed one, namely directed at the zone write pointer. So any > consideration of requeue solving write ordering issues is moot to me. > > Furthermore, when the requeue happens, the target zone is still locked and the > only write request that can be in flight for that target zones is that one being > requeued. Add to that the above assumption that the request is the one we must > dispatch first, there are absolutely zero chances of seeing a reordering happen > for writes to a particular zone. I really do not see the point of requeuing that > request through the IO scheduler at all. Totally agree, that is exactly what I meant. > > In general, even for reads, requeuing through the scheduler is I think a really > bad idea as that can potentially significantly increase the request latency > (time to completion), with the user seeing long tail latencies. E.g. if the > request has high priority or a short CDL time limit, requeuing through the > scheduler will go against the user indicated urgency for that request and > degrade the effectivness of latency control easures such as IO priority and CDL. > > Requeues should be at the head of the dispatch queue, not through the scheduler. It is pretty easy to run into STS_RESOURCE for some scsi devices, requeuing via schedule does add more overhead for these devices. > > As long as we keep zone write locking for zoned devices, requeue to the head of > the dispatch queue is fine. But maybe this work is preparatory to removing zone > write locking ? If that is the case, I would like to see that as well to get the > big picture. Otherwise, the latency concerns I raised above are in my opinion, a > blocker for this change. Yeah, looks Bart needs to add more words about requirement & motivation of this patchset. Thanks, Ming