On Fri, Sep 08, 2017 at 09:53:53AM -0700, Damien Le Moal wrote: > Ming, > > On 9/8/17 05:43, Ming Lei wrote: > > Hi Damien, > > > > On Fri, Sep 08, 2017 at 01:16:38AM +0900, Damien Le Moal wrote: > >> In the case of a ZBC disk used with scsi-mq, zone write locking does > >> not prevent write reordering in sequential zones. Unlike the legacy > >> case, zone locking can only be done after the command request is > >> removed from the scheduler dispatch queue. That is, at the time of > >> zone locking, the write command may already be out of order. > > > > Per my understanding, for legacy case, it can be quite tricky to let > > the existed I/O scheduler guarantee the write order for ZBC disk. > > I guess requeue still might cause write reorder even in legacy path, > > since requeue can happen in both scsi_request_fn() and scsi_io_completion() > > with q->queue_lock released, meantime new rq belonging to the same > > zone can come and be inserted to queue. > > Yes, the write ordering will always depend on the scheduler doing the > right thing. But both cfq, deadline and even noop do the right thing > there, even considering the aging case. The next write for a zone will > always be the oldest in the queue for that zone, if it is not, it means > that the application did not write sequentially. Extensive testing in > the legacy case never showed a problem due to the scheduler itself. OK, I suggest to document this guarantee of no write reorder for ZBC somewhere, so that people will keep it in mind when trying to change the current code. > > scsi_requeue_command() does the unprep (zone unlock) and requeue while > holding the queue lock. So this is atomic with new write command > insertion. Requeued commands are added to the dispatch queue head, and > since a zone will only have a single write in-flight, there is no > reordering possible. The next write command for a zone to go again is > the last requeued one or the next in lba order. It works. One special case is write with FLUSH/FUA, which may be added to front of q->queue_head directly. Suppose one write with FUA is just comes between requeue and run queue, write reorder may be triggered. If this assumption is true, there might be such issue on blk-mq too. -- Ming