On Fri, Jan 12 2018 at 1:54pm -0500,
Bart Van Assche <Bart.VanAssche@xxxxxxx> wrote:

> On Fri, 2018-01-12 at 13:06 -0500, Mike Snitzer wrote:
> > OK, you have the stage: please give me a pointer to your best
> > explanation of the several.
>
> Since the previous discussion about this topic occurred more than a month
> ago it could take more time to look up an explanation than to explain it
> again. Anyway, here we go. As you know a block layer request queue needs to
> be rerun if one or more requests are waiting and a previous condition that
> prevented the request from being executed has been cleared. For the dm-mpath
> driver, examples of such conditions are no tags available, a path that is
> busy (see also pgpath_busy()), path initialization that is in progress
> (pg_init_in_progress) or a request that completes with an error status,
> e.g. if the SCSI core calls __blk_mq_end_request(req, error) with
> error != 0. For some of these conditions, e.g. when path initialization
> completes, a callback function in the dm-mpath driver is called and it is
> possible to explicitly rerun the queue. I agree that for such scenarios a
> delayed queue run should not be triggered. For other scenarios, e.g. if a
> SCSI initiator submits a SCSI request over a fabric and the SCSI target
> replies with "BUSY", the SCSI core will end the I/O request with status
> BLK_STS_RESOURCE after the maximum number of retries has been reached (see
> also scsi_io_completion()). In that last case, if a SCSI target sends a
> "BUSY" reply over the wire back to the initiator, there is no other
> approach for the SCSI initiator to figure out whether it can queue another
> request than to resubmit the request. The worst possible strategy is to
> resubmit a request immediately because that will cause a significant
> fraction of the fabric bandwidth to be used just for replying "BUSY" to
> requests that can't be processed immediately.
>
> The intention of commit 6077c2d706097c0 was to address the last-mentioned
> case. It may be possible to move the delayed queue rerun from
> dm_queue_rq() into dm_requeue_original_request(). But I think it would be
> wrong to rerun the queue immediately in case a SCSI target system returns
> "BUSY".

OK, thank you very much for this. Really helps.

For starters, multipath_clone_and_map() could do a fair amount more with
the insight that a SCSI "BUSY" was transmitted back. If both blk-mq being
out of tags and SCSI "BUSY" simply return BLK_STS_RESOURCE, then dm-mpath
doesn't have the ability to behave more intelligently.

Anyway, armed with this info I'll have a think about what we might do to
tackle this problem head on.

Thanks,
Mike