Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

Ming Lei <ming.lei@xxxxxxxxxx> · Thu, 9 Nov 2017 10:05:08 +0800

On Wed, Nov 08, 2017 at 04:41:35PM +0000, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 20:06 -0700, Jens Axboe wrote:
> > At this point, I have no idea what Bart's setup looks like. Bart, it
> > would be REALLY helpful if you could tell us how you are reproducing
> > your hang. I don't know why this has to be dragged out.
> 
> Hello Jens,
> 
> It is a disappointment to me that you have allowed Ming to evaluate other
> approaches than reverting "blk-mq: don't handle TAG_SHARED in restart". That

I have mentioned in another email to Jens, that I agree to revert that
patch because of TAG_WAITING's issue in Jens's test case.

> patch namely replaces an algorithm that is trusted by the community with an
> algorithm of which even Ming acknowledged that it is racy. A quote from [1]:
> "IO hang may be caused if all requests are completed just before the current
> SCSI device is added to shost->starved_list". I don't know of any way to fix
> that race other than serializing request submission and completion by adding
> locking around these actions, which is something we don't want. Hence my
> request to revert that patch.

That can't be the reason for this revert.

This issue[1] is fixed by '[PATCH] SCSI: don't get target/host busy_count in
scsi_mq_get_budget()', follows the idea:

	- we add sdev into shost->starved_list in scsi_target_queue_ready(),
	  and the return value of BLK_STS_RESOURCE is set

	- atomic_read(&sdev->device_busy) is checked to see if there is
	  pending request, queue will be run if it is zero, otherwise we
	  depend on scsi_end_request() from pending request to restart queue.

	- you may mention sdev->device_busy may become 0 just after the
	  check, then the completion still see the sdev in
	  shost->starved_list and do the restart, and no IO hang

If you think something above is wrong, please comment on it directly.
Without this patch, no need any out-of-tree patch, IO hang can be
triggered in test 01 of srp-test. After this patch is applied on
V4.14-rc4, no IO hang can be observed any more.

> 
> Regarding the test I run, here is a summary of what I mentioned in previous
> e-mails:
> * I modified the SRP initiator such that the SCSI target queue depth is
>   reduced to one by setting starget->can_queue to 1 from inside
>   scsi_host_template.target_alloc.
> * With that modified SRP initiator I run the srp-test software as follows
>   until something breaks:
>   while ./run_tests -f xfs -d -e deadline -r 60; do :; done
> 
> Today a system with at least one InfiniBand HCA is required to run that test.
> When I have the time I will post the SRP initiator and target patches on the
> linux-rdma mailing list that make it possible to run that test against the
> SoftRoCE driver (drivers/infiniband/sw/rxe). The only hardware required to
> use that driver is an Ethernet adapter.

The thing is that we still don't know the root cause for your issue, and
keeping the restart for TAG_SHARED can be thought as a workaround. Maybe
it is same with Jens, maybe others, we don't know, and even without any
log provided, such as sched_tags or tags.

It is easy to see > 20% IOPS drops with restart for TAG_SHARED in 8
luns scsi debug test.

-- 
Ming