Re: [PATCH v3 2/7] block: Send requeued requests to the I/O scheduler

Ming Lei <ming.lei@xxxxxxxxxx> · Thu, 25 May 2023 08:53:22 +0800

On Thu, May 25, 2023 at 08:06:31AM +0900, Damien Le Moal wrote:
> On 5/25/23 02:56, Bart Van Assche wrote:
> > On 5/23/23 17:31, Ming Lei wrote:
> >> On Tue, May 23, 2023 at 10:19:34AM -0700, Bart Van Assche wrote:
> >>> The mq-deadline scheduler restricts the queue depth to one per zone for zoned
> >>> storage so at any time there is at most one write command (REQ_OP_WRITE) in
> >>> flight per zone.
> >>
> >> But if the write queue depth is 1 per zone, the requeued request won't
> >> be re-ordered at all given no other write request can be issued from
> >> scheduler in this zone before this requeued request is completed.
> >>
> >> So why bother to requeue the BLK_STS_RESOURCE request via scheduler?
> > 
> > Hi Ming,
> > 
> > It seems like my previous email was not clear enough. The mq-deadline 
> > scheduler restricts the queue depth per zone for commands passed to the 
> > SCSI core. It does not restrict how many requests a filesystem can 
> > submit per zone to the block layer. Without this patch there is a risk 
> > of reordering if a request is requeued, e.g. by the SCSI core, and other 
> > requests are pending for the same zone.
> 
> Yes there is, but the contract we established for zoned devices in the block
> layer, from the start of the support, is that users *must* write sequentially.
> The block layer does not attempt, generally speaking, to reorder requests.
> 
> When mq-deadline is used, the scheduler lba reordering *may* reorder writes,
> thus hiding potential bugs in the user write sequence for a zone. That is fine.
> However, once a write request is dispatched, we should keep the assumption that
> it is a well formed one, namely directed at the zone write pointer. So any
> consideration of requeue solving write ordering issues is moot to me.
> 
> Furthermore, when the requeue happens, the target zone is still locked and the
> only write request that can be in flight for that target zones is that one being
> requeued. Add to that the above assumption that the request is the one we must
> dispatch first, there are absolutely zero chances of seeing a reordering happen
> for writes to a particular zone. I really do not see the point of requeuing that
> request through the IO scheduler at all.

Totally agree, that is exactly what I meant.

> 
> In general, even for reads, requeuing through the scheduler is I think a really
> bad idea as that can potentially significantly increase the request latency
> (time to completion), with the user seeing long tail latencies. E.g. if the
> request has high priority or a short CDL time limit, requeuing through the
> scheduler will go against the user indicated urgency for that request and
> degrade the effectivness of latency control easures such as IO priority and CDL.
> 
> Requeues should be at the head of the dispatch queue, not through the scheduler.

It is pretty easy to run into STS_RESOURCE for some scsi devices,
requeuing via schedule does add more overhead for these devices.

> 
> As long as we keep zone write locking for zoned devices, requeue to the head of
> the dispatch queue is fine. But maybe this work is preparatory to removing zone
> write locking ? If that is the case, I would like to see that as well to get the
> big picture. Otherwise, the latency concerns I raised above are in my opinion, a
> blocker for this change.

Yeah, looks Bart needs to add more words about requirement & motivation
of this patchset.

Thanks,
Ming