On Sat, Jan 27, 2018 at 10:12:43PM +0000, Bart Van Assche wrote:
> On Sat, 2018-01-27 at 14:09 -0500, Mike Snitzer wrote:
> > Ming let me know that he successfully tested this V3 patch using both
> > your test (fio to both mpath and underlying path) and Bart's (02-mq with
> > can_queue in guest).
> >
> > Would be great if you'd review and verify this fix works for you too.
> >
> > Ideally we'd get a fix for this regression staged for 4.16 inclusion.
> > This V3 patch seems like the best option we have at this point.
>
> Hello Mike,
>
> There are several issues with the patch at the start of this thread:
> - It is an unnecessary change of the block layer API. Queue stalls can
>   already be addressed with the current block layer API, namely by inserting
>   a blk_mq_delay_run_hw_queue() call before returning BLK_STS_RESOURCE.

Again, both Jens and I concluded that it is a generic issue, which needs
a generic solution.

https://marc.info/?l=linux-kernel&m=151638176727612&w=2

Otherwise, the handling of every BLK_STS_RESOURCE return in drivers would
need to change. Do we really want to do that?

Not to mention, the request hasn't been added to the dispatch list yet
when .queue_rq() runs, so strictly speaking it is not correct to call
blk_mq_delay_run_hw_queue() in .queue_rq(), and the current block layer
API can't handle this well enough.

> - The patch at the start of this thread complicates code further that is
>   already too complicated, namely the blk-mq core.

That is just your opinion; I don't agree.

> - The patch at the start of this thread introduces a regression in the
>   SCSI core, namely a queue stall if a request completion occurs concurrently
>   with the newly added BLK_MQ_S_SCHED_RESTART test in the blk-mq core.

This patch only moves the blk_mq_delay_run_hw_queue() call from
scsi_queue_rq() to blk-mq. Again, please explain in detail how this V3
patch introduces this regression in SCSI.

Actually this patch should fix a race on SCSI-MQ: when scsi_queue_rq()
calls blk_mq_delay_run_hw_queue(), the request isn't on the dispatch list
yet, so in theory the request may not be visible when
__blk_mq_run_hw_queue() runs. Don't expect the 3ms delay to cover that;
it is fragile to depend on timing to deal with a race.

Maybe this could be an LSF/MM topic proposal...

Thanks,
Ming